Re: [R] by function does not separate output from function with multiple parts

2023-10-25 Thread Leonard Mada via R-help

Dear John,

Printing inside the function is problematic. Your function itself does 
NOT print the labels - those are added only later, when by() prints the 
results it returns.


Just as a clarification:

F = factor(rep(1:2, 2))
by(data.frame(V = 1:4, F = F), F, function(x) { print(x); return(NULL); } )
#   V F
# 1 1 1
# 3 3 1
#   V F
# 2 2 2
# 4 4 2
# F: 1 <- this is NOT printed inside the function
# NULL
# -
# F: 2
# NULL

### Return Results
by(data.frame(V = 1:4, F = F), F, function(x) { return(x); } )
# F: 1
#   V F
# 1 1 1
# 3 3 1
# --
# F: 2
#   V F
# 2 2 2
# 4 4 2
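
### Labels from inside the function
If the labels are really needed while printing from inside the function, 
a minimal sketch could look like this (the result is assigned, so that 
by() does not print it a second time):

tmp = by(data.frame(V = 1:4, F = F), F, function(x) {
    cat("F:", as.character(x$F[1]), "\n"); # print the group label manually
    print(x);
    invisible(NULL);
})
# F: 1
#   V F
# 1 1 1
# 3 3 1
# F: 2
#   V F
# 2 2 2
# 4 4 2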

Maybe others on the list can offer further assistance.

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issue from R-devel: subset on table

2023-10-21 Thread Leonard Mada via R-help

Another solution is also possible - see below.


On 10/21/2023 10:38 PM, Leonard Mada wrote:

My mistake!

It actually does something else, which is incorrect. One could still 
use the following (although the code is more difficult to read):


subset(tmp <- table(sample(1:10, 100, T)), tmp > 10)


2) Alternative solution
Enhance subset.default to also accept formulas, e.g.:

subset.default = function (x, subset, ...)
{
    if(inherits(subset, "formula")) {
        subset = subset[[2]];
        subset = eval(subset, list("." = x));
    } else if (! is.logical(subset))
        stop("'subset' must be logical")
    x[subset & ! is.na(subset)]
}

# it works now: but results depend on sample()
subset(table(sample(1:10, 100, T)), ~ . > 10)
subset(table(sample(1:10, 100, T)), ~ . > 10 & . < 13)
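
A reproducible check of the formula method (just a sketch; set.seed makes 
the sample() results repeatable):

set.seed(1)
tbl = table(sample(1:10, 100, TRUE))
subset(tbl, ~ . > 10)
subset(tbl, ~ . > 10 & . < 13)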





Sincerely,


Leonard


On 10/21/2023 10:26 PM, Leonard Mada wrote:

Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#  comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23



Note: The result was ok only by chance! But it is incorrect in general.
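
To see why it only looked right: 'table' here is merely the character 
string "table", and the comparison is done lexically:

'table' > 10
# [1] TRUE
# the single TRUE is recycled, so nothing is actually filtered out;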




Works with the letters-example as well.

Sincerely,

Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issue from R-devel: subset on table

2023-10-21 Thread Leonard Mada via R-help

My mistake!


It actually does something else, which is incorrect. One could still use 
the following (although the code is more difficult to read):


subset(tmp <- table(sample(1:10, 100, T)), tmp > 10)


Sincerely,


Leonard


On 10/21/2023 10:26 PM, Leonard Mada wrote:

Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#  comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23



Note: The result was ok only by chance! But it is incorrect in general.




Works with the letters-example as well.

Sincerely,

Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Issue from R-devel: subset on table

2023-10-21 Thread Leonard Mada via R-help

Dear List Members,

There was recently an issue on R-devel (which I noticed only very late):
https://stat.ethz.ch/pipermail/r-devel/2023-October/082943.html

It is possible to use subset as well, almost as initially stated:

subset(table(sample(1:5, 100, T)), table > 10)
# Error in table > 10 :
#  comparison (>) is possible only for atomic and list types

subset(table(sample(1:5, 100, T)), 'table' > 10)
#  1  2  3  4  5
# 21 13 15 28 23

Works with the letters-example as well.

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Leonard Mada via R-help

Dear Rui,

On 10/18/2023 8:45 PM, Rui Barradas wrote:

split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(mol, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)])
  }
  s
}


There is a glitch in the code of the first function: mol is hardcoded 
instead of the argument x. After correcting for that glitch, the times 
are similar.


Note:
- grep("[[:digit:]]", ...) is almost twice as slow as grep("[0-9]", ...)!

- corrected results below;

Sincerely,

Leonard
###

split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(x, regex, "#") |>
  strsplit("#|[[:digit:]]") |>
  lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
  }
  s
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol1 <- rep(mol, 1)

system.time(
  split_chem_elements(mol1)
)
#   user  system elapsed
#   0.58    0.00    0.58

system.time(
  split.symbol.character(mol1)
)
#   user  system elapsed
#   0.67    0.00    0.67
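
A rough check of the grep() note above (only a sketch: rep(mol, 1) is too 
small to time, so a larger test vector is assumed; timings vary by platform):

x <- rep(c("C", "Cl3", "Al", "H16", "O"), 2e5)
system.time(grep("[[:digit:]]", x))
system.time(grep("[0-9]", x))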

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to test for numeric digits?

2023-10-18 Thread Leonard Mada via R-help

Dear Rui,

Thank you for your reply.

I do have actually access to the chemical symbols: I have started to 
refactor and enhance the Rpdb package, see Rpdb::elements:

https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs 
to iterate through all chemical symbols (in decreasing nchar). Elements 
like C, and especially O, P or S, appear late in the regex expression - 
but are quite common in chemistry.


The alternative regex is (in this respect) simpler. It actually works 
(once you know about the workaround).


Q: My question was whether there is anything like is.numeric, but 
applied element-wise, i.e. testing/parsing each element of a character 
vector.

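For example, a regex-based test is vectorised and avoids the warning 
entirely (a sketch):

x = c("Li", "Na", "K", "2", "Rb", "Ca", "3")
grepl("^[0-9]+$", x)
# FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE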

Sincerely,


Leonard


On 10/18/2023 6:53 PM, Rui Barradas wrote:

At 15:59 on 18/10/2023, Leonard Mada via R-help wrote:

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there
any better ways?

I was working to extract chemical elements from a formula, something
like this:
split.symbol.character = function(x, rm.digits = TRUE) {
      # Perl is partly broken in R 4.3, but this works:
      regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
      # stringi::stri_split(x, regex = regex);
      s = strsplit(x, regex, perl = TRUE);
      if(rm.digits) {
      s = lapply(s, function(s) {
          isNotD = is.na(suppressWarnings(as.numeric(s)));
          s = s[isNotD];
      });
      }
      return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hello,

If you want to extract chemical elements symbols, the following might work.
It uses the periodic table in GitHub package chemr and a package stringr
function.


devtools::install_github("paleolimbot/chemr")



split_chem_elements <- function(x) {
data(pt, package = "chemr", envir = environment())
el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
pat <- paste(el, collapse = "|")
stringr::str_extract_all(x, pat)
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C"  "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#>  [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"


It is also possible to rewrite the function without calls to non base
packages but that will take some more work.

Hope this helps,

Rui Barradas




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Best way to test for numeric digits?

2023-10-18 Thread Leonard Mada via R-help

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there 
any better ways?


I was working to extract chemical elements from a formula, something 
like this:

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
    # stringi::stri_split(x, regex = regex);
    s = strsplit(x, regex, perl = TRUE);
    if(rm.digits) {
    s = lapply(s, function(s) {
        isNotD = is.na(suppressWarnings(as.numeric(s)));
        s = s[isNotD];
    });
    }
    return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create new data frame with conditional sums

2023-10-16 Thread Leonard Mada via R-help

Dear Jason,

The code could look something like:

dummyData = data.frame(Tract=seq(1, 10, by=1),
    Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
    Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))

# Define the cutoffs
# - allow for duplicate entries;
by = 0.03; # by = 0.01;
cutoffs <- seq(0, 0.20, by = by)

# Create a new column with cutoffs
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
    labels = cutoffs[-1], ordered_result = TRUE)

# Sort data
# - we could actually order only the columns:
#   Totpop & Cutoff;
dummyData = dummyData[order(dummyData$Cutoff), ]

# Result
cs = cumsum(dummyData$Totpop)

# Only last entry:
# - I do not have a nice one-liner, but this should do it:
isLast = rev(! duplicated(rev(dummyData$Cutoff)))

data.frame(Total = cs[isLast],
    Cutoff = dummyData$Cutoff[isLast])


Sincerely,

Leonard


On 10/15/2023 7:41 PM, Leonard Mada wrote:

Dear Jason,


I do not think that the solution based on aggregate offered by GPT was 
correct. That quasi-solution only aggregates for every individual level.



As I understand, you want the cumulative sum. The idea was proposed by 
Bert: you only need to sort first based on the cutoff (e.g. using an 
ordered factor), and then extract only the last value for each level. 
If Pct is unique, then you can skip this last step and use the cumsum 
directly (but on the sorted data set).



Alternatives: see the solutions with loops or with sapply.


Sincerely,


Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create new data frame with conditional sums

2023-10-15 Thread Leonard Mada via R-help

Dear Jason,


I do not think that the solution based on aggregate offered by GPT was 
correct. That quasi-solution only aggregates for every individual level.



As I understand, you want the cumulative sum. The idea was proposed by 
Bert: you only need to sort first based on the cutoff (e.g. using an 
ordered factor), and then extract only the last value for each level. If 
Pct is unique, then you can skip this last step and use the cumsum 
directly (but on the sorted data set).



Alternatives: see the solutions with loops or with sapply.
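
For instance, a minimal sapply sketch of this idea (assuming Jason's 
dummyData with columns Pct and Totpop, and 0.03-wide cutoffs):

cutoffs = seq(0.03, 0.21, by = 0.03)
totals  = sapply(cutoffs, function(ct) sum(dummyData$Totpop[dummyData$Pct <= ct]))
data.frame(Cutoff = cutoffs, Total = totals)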


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Pkg-Collaboratos] BioShapes Almost-Package

2023-09-04 Thread Leonard Mada via R-help
Thank you very much for all the responses; especially Duncan's guidance. 
I will add some further ideas on workflows below.


There were quite a few views on GitHub; but there is not much to see, as 
there is absolutely no documentation.  I have added in the meantime a 
basic example:

https://github.com/discoleo/BioShapes/blob/main/Examples.Bioshapes.png
The actual code can do a lot more.



Some ideas on workflows:

1. Most code is written in C/C++; the R code is a thin wrapper around 
the C/C++ functions.
In that case it is practical to embed the documentation with the R code - 
as there is no complex R code anyway. The same may apply to small packages.


2. Complex R code
The comments may clutter the code. It is also difficult to maintain this 
documentation, as the comments are less easily readable. Separating the 
documentation from the code is a good idea.


Unfortunately, this is not so obvious when you start working on your 
first package.


Many thanks,

Leonard


On 9/4/2023 5:47 AM, Jeff Newmiller wrote:

Leonard... the reason roxygen exists is to allow markup in source files to be 
used to automatically generate the numerous files required by standard R 
packages as documented in Writing R Extensions.

If your goal is to not use source files this way then the solution is to not 
use roxygen at all. Just create those files yourself by directly editing them 
from scratch.

On September 3, 2023 7:06:09 PM PDT, Leonard Mada via R-help 
 wrote:

Thank you Bert.


Clarification:

Indeed, I am using an add-on package: for that package it seems customary -
at least from what I have seen - to include the entire documentation as
comments in the R source files. (But maybe I am wrong.)


I will try to find some time over the next few days to explore in more
detail the R documentation. Although, I do not know how this will
interact with the add-on package.


Sincerely,


Leonard


On 9/4/2023 4:58 AM, Bert Gunter wrote:

1. R-package-devel is where queries about package protocols should go.

2. But...
"Is there a succinct, but sufficiently informative description of
documentation tools?"
"Writing R Extensions" (shipped with R) is *the* reference for R
documentation. Whether it's sufficiently "succinct" for you, I cannot
say.

"I find that including the documentation in the source files is very
distracting."
?? R documentation (.Rd) files are separate from source (.R) files.
Inline documentation in source files is an "add-on" capability
provided by optional packages if one prefers to do this. Such packages
parse the source files to extract the documentation into the .Rd
files. So not sure what you mean here. Apologies if I have misunderstood.

" I would prefer to have only basic comments in the source
files and an expanded documentation in a separate location."
If I understand you correctly, this is exactly what the R package
process specifies. Again, see the "Writing R Extensions" manual for
details.

Also, if you wish to have your package on CRAN, it requires that the
package documents all functions in the package as specified by the
"Writing ..." manual.

Again, further questions and elaboration should go to the
R-package-devel list, although I think the manual is really the
authoritative resource to follow.

Cheers,
Bert



On Sun, Sep 3, 2023 at 5:06 PM Leonard Mada via R-help
 wrote:

 Dear R-List Members,

 I am looking for collaborators to further develop the BioShapes
 almost-package. I added a brief description below.

 A.) BioShapes (Almost-) Package

 The aim of the BioShapes quasi-package is to facilitate the
 generation
 of graphical objects resembling biological and chemical entities,
 enabling the construction of diagrams based on these objects. It
 currently includes functions to generate diagrams depicting viral
 particles, liposomes, double helix / DNA strands, various cell types
 (like neurons, brush-border cells and duct cells), Ig-domains, as
 well
 as more basic shapes.

 It should offer researchers in the field of biological and chemical
 sciences a tool to easily generate diagrams depicting the studied
 biological processes.

 The package lacks a proper documentation and is not yet released on
 CRAN. However, it is available on GitHub:
 https://github.com/discoleo/BioShapes

 Although there are 27 unique cloners on GitHub, I am still looking
 for
 contributors and collaborators. I would appreciate any
 collaborations to
 develop it further. I can be contacted both by email and on GitHub.


 B.) Documentation Tools

 Is there a succinct, but sufficiently informative description of
 documentation tools?
 I find that including the documentation in the source files is very
 distracting. I would prefer to have only basic comments in the source
 files and an expanded documentation in a separate locat

Re: [R] [Pkg-Collaboratos] BioShapes Almost-Package

2023-09-03 Thread Leonard Mada via R-help
Thank you Bert.


Clarification:

Indeed, I am using an add-on package: for that package it seems customary - 
at least from what I have seen - to include the entire documentation as 
comments in the R source files. (But maybe I am wrong.)


I will try to find some time over the next few days to explore in more 
detail the R documentation. Although, I do not know how this will 
interact with the add-on package.


Sincerely,


Leonard


On 9/4/2023 4:58 AM, Bert Gunter wrote:
> 1. R-package-devel is where queries about package protocols should go.
>
> 2. But...
> "Is there a succinct, but sufficiently informative description of
> documentation tools?"
> "Writing R Extensions" (shipped with R) is *the* reference for R 
> documentation. Whether it's sufficiently "succinct" for you, I cannot 
> say.
>
> "I find that including the documentation in the source files is very
> distracting."
> ?? R documentation (.Rd) files are separate from source (.R) files. 
> Inline documentation in source files is an "add-on" capability 
> provided by optional packages if one prefers to do this. Such packages 
> parse the source files to extract the documentation into the .Rd 
> files. So not sure what you mean here. Apologies if I have misunderstood.
>
> " I would prefer to have only basic comments in the source
> files and an expanded documentation in a separate location."
> If I understand you correctly, this is exactly what the R package 
> process specifies. Again, see the "Writing R Extensions" manual for 
> details.
>
> Also, if you wish to have your package on CRAN, it requires that the 
> package documents all functions in the package as specified by the 
> "Writing ..." manual.
>
> Again, further questions and elaboration should go to the 
> R-package-devel list, although I think the manual is really the 
> authoritative resource to follow.
>
> Cheers,
> Bert
>
>
>
> On Sun, Sep 3, 2023 at 5:06 PM Leonard Mada via R-help 
>  wrote:
>
> Dear R-List Members,
>
> I am looking for collaborators to further develop the BioShapes
> almost-package. I added a brief description below.
>
> A.) BioShapes (Almost-) Package
>
> The aim of the BioShapes quasi-package is to facilitate the
> generation
> of graphical objects resembling biological and chemical entities,
> enabling the construction of diagrams based on these objects. It
> currently includes functions to generate diagrams depicting viral
> particles, liposomes, double helix / DNA strands, various cell types
> (like neurons, brush-border cells and duct cells), Ig-domains, as
> well
> as more basic shapes.
>
> It should offer researchers in the field of biological and chemical
> sciences a tool to easily generate diagrams depicting the studied
> biological processes.
>
> The package lacks a proper documentation and is not yet released on
> CRAN. However, it is available on GitHub:
> https://github.com/discoleo/BioShapes
>
> Although there are 27 unique cloners on GitHub, I am still looking
> for
> contributors and collaborators. I would appreciate any
> collaborations to
> develop it further. I can be contacted both by email and on GitHub.
>
>
> B.) Documentation Tools
>
> Is there a succinct, but sufficiently informative description of
> documentation tools?
> I find that including the documentation in the source files is very
> distracting. I would prefer to have only basic comments in the source
> files and an expanded documentation in a separate location.
>
> This question may be more appropriate for the R-package-devel list. I
> can move the 2nd question to that list.
>
> ###
>
> As the biological sciences are very vast, I would be very happy for
> collaborators on the development of this package. Examples with
> existing
> shapes are available in (but are unfortunately not documented):
>
> Man/examples/Examples.Man.R
> R/Examples.R
> R/Examples.Cells.R
> tests/experimental/*
>
>
> Many thanks,
>
> Leonard
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [Pkg-Collaboratos] BioShapes Almost-Package

2023-09-03 Thread Leonard Mada via R-help

Dear R-List Members,

I am looking for collaborators to further develop the BioShapes 
almost-package. I added a brief description below.


A.) BioShapes (Almost-) Package

The aim of the BioShapes quasi-package is to facilitate the generation 
of graphical objects resembling biological and chemical entities, 
enabling the construction of diagrams based on these objects. It 
currently includes functions to generate diagrams depicting viral 
particles, liposomes, double helix / DNA strands, various cell types 
(like neurons, brush-border cells and duct cells), Ig-domains, as well 
as more basic shapes.


It should offer researchers in the field of biological and chemical 
sciences a tool to easily generate diagrams depicting the studied 
biological processes.


The package lacks a proper documentation and is not yet released on 
CRAN. However, it is available on GitHub:

https://github.com/discoleo/BioShapes

Although there are 27 unique cloners on GitHub, I am still looking for 
contributors and collaborators. I would appreciate any collaborations to 
develop it further. I can be contacted both by email and on GitHub.



B.) Documentation Tools

Is there a succinct, but sufficiently informative description of 
documentation tools?
I find that including the documentation in the source files is very 
distracting. I would prefer to have only basic comments in the source 
files and an expanded documentation in a separate location.


This question may be more appropriate for the R-package-devel list. I 
can move the 2nd question to that list.


###

As the biological sciences are very vast, I would be very happy for 
collaborators on the development of this package. Examples with existing 
shapes are available in (but are unfortunately not documented):


Man/examples/Examples.Man.R
R/Examples.R
R/Examples.Cells.R
tests/experimental/*


Many thanks,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Query on finding root

2023-08-28 Thread Leonard Mada via R-help

Dear R-Users,

Just out of curiosity:
Which of the 2 methods is the better one?

The results seem to differ slightly.


fun = function(u){((26104.50*u^0.03399381)/((1-u)^0.107)) - 28353.7}

uniroot(fun, c(0,1))
# 0.6048184

curve(fun(x), 0, 1)
abline(v=0.3952365, col="red")
abline(v=0.6048184, col="red")
abline(h=0, col="blue")



# log-transformed equation, using log(1 - u):
fun = function(u){ (0.03399381*log(u) - 0.107*log(1-u)) - 
log(28353.7/26104.50) }
# the same, with log1p(-u) (more accurate for u close to 0):
fun = function(u){ (0.03399381*log(u) - 0.107*log1p(-u)) - 
log(28353.7/26104.50) }


uniroot(fun, c(0,1))
# 0.6047968

curve(fun(x), 0, 1)
abline(v=0.3952365, col="red")
abline(v=0.6047968, col="red")
abline(h=0, col="blue")
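
One quick check (a sketch): tighten uniroot's convergence tolerance, which 
defaults to .Machine$double.eps^0.25 (~ 1.2E-4) and is therefore large 
enough to explain the small discrepancy between the two roots:

fun = function(u){((26104.50*u^0.03399381)/((1-u)^0.107)) - 28353.7}
uniroot(fun, c(0,1), tol = 1e-12)$root
# residuals of the two reported roots in the original equation:
fun(0.6048184); fun(0.6047968)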

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-18 Thread Leonard Mada via R-help
Dear Bert,


On 8/19/2023 2:47 AM, Bert Gunter wrote:
> "Values of type 2^(-n) (and its binary complement) are exactly 
> represented as floating point numbers and do not generate the error. 
> However, values away from such special x-values will generate errors:"
>
> That was exactly my point: The size of errors depends on the accuracy 
> of binary representation of floating point numbers and their arithmetic.
>
> But you previously said:
> "The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it gets
> worse in a continuous manner."
>
> That is wrong and disagrees with what you say above.
>
> -- Bert


On "average", the error increases. But it does NOT increase monotonically:

  x = 2^(-20) * 1.1 # is still relatively close to the exact value!
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2
# 58672303, not 0, nor close to 0;


Sincerely,


Leonard


>
> On Fri, Aug 18, 2023 at 4:34 PM Leonard Mada  wrote:
>
> Dear Bert,
>
>
> Values of type 2^(-n) (and its binary complement) are exactly
> represented as floating point numbers and do not generate the
> error. However, values away from such special x-values will
> generate errors:
>
>
> # exactly represented:
> x = 9.53674316406250e-07
> y <- 1 - x^2/2;
> 1/(1 - y) - 2/x^2
>
> # almost exact:
> x = 9.536743164062502e-07
> y <- 1 - x^2/2;
> 1/(1 - y) - 2/x^2
>
> x = 9.536743164062498e-07
> y <- 1 - x^2/2;
> 1/(1 - y) - 2/x^2
>
> # the result behaves far better around values
> # which can be represented exactly,
> # but fails drastically for other values!
> x = 2^(-20) * 1.1
> y <- 1 - x^2/2;
> 1/(1 - y) - 2/x^2
> # 58672303 instead of 0!
>
>
> Sincerely,
>
>
> Leonard
>
>
> On 8/19/2023 2:06 AM, Bert Gunter wrote:
>> "The ugly thing is that the error only gets worse as x decreases.
>> The
>> value neither drops to 0, nor does it blow up to infinity; but it
>> gets
>> worse in a continuous manner."
>>
>> If I understand you correctly, this is wrong:
>>
>> > x <- 2^(-20) ## considerably less then 1e-4 !!
>> > y <- 1 - x^2/2;
>> > 1/(1 - y) - 2/x^2
>> [1] 0
>>
>> It's all about the accuracy of the binary approximation of
>> floating point numbers (and their arithmetic)
>>
>> Cheers,
>> Bert
>>
>>
>> On Fri, Aug 18, 2023 at 3:25 PM Leonard Mada via R-help
>>  wrote:
>>
>> I have added some clarifications below.
>>
>> On 8/18/2023 10:20 PM, Leonard Mada wrote:
>> > [...]
>> > After more careful thinking, I believe that it is a
>> limitation due to
>> > floating points:
>> > [...]
>> >
>> > The problem really stems from the representation of 1 -
>> x^2/2 as shown
>> > below:
>> > x = 1E-4
>> > print(1 - x^2/2, digits=20)
>> > print(0.5, digits=20) # fails
>> > # 0.50003039
>>
>> The floating point representation of 1 - x^2/2 is the real
>> culprit:
>> # 0.50003039
>>
>> The 3039 at the end is really an error due to the floating point
>> representation. However, this error blows up when inverting
>> the value:
>> x = 1E-4;
>> y = 1 - x^2/2;
>> 1/(1 - y) - 2/x^2
>> # 1.215494
>> # should be 1/(x^2/2) - 2/x^2 = 0
>>
>>
>> The ugly thing is that the error only gets worse as x
>> decreases. The
>> value neither drops to 0, nor does it blow up to infinity;
>> but it gets
>> worse in a continuous manner. At least the reason has become
>> now clear.
>>
>>
>> >
>> > Maybe some functions of type cos1p and cos1n would be handy
>> for such
>> > computations (to replace the manual series expansion):
>> > cos1p(x) = 1 + cos(x)
>> > cos1n(x) = 1 - cos(x)
>> > Though, I do not have yet the big picture.
>> >
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>> >
>> >
>> > On 8/17/2023 1:57 PM, Martin 

Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-18 Thread Leonard Mada via R-help
Dear Bert,


Values of type 2^(-n) (and its binary complement) are exactly 
represented as floating point numbers and do not generate the error. 
However, values away from such special x-values will generate errors:


# exactly represented:
x = 9.53674316406250e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

# almost exact:
x = 9.536743164062502e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

x = 9.536743164062498e-07
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2

# the result behaves far better around values
# which can be represented exactly,
# but fails drastically for other values!
x = 2^(-20) * 1.1
y <- 1 - x^2/2;
1/(1 - y) - 2/x^2
# 58672303 instead of 0!


Sincerely,


Leonard


On 8/19/2023 2:06 AM, Bert Gunter wrote:
> "The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it gets
> worse in a continuous manner."
>
> If I understand you correctly, this is wrong:
>
> > x <- 2^(-20) ## considerably less then 1e-4 !!
> > y <- 1 - x^2/2;
> > 1/(1 - y) - 2/x^2
> [1] 0
>
> It's all about the accuracy of the binary approximation of floating 
> point numbers (and their arithmetic)
>
> Cheers,
> Bert
>
>
> On Fri, Aug 18, 2023 at 3:25 PM Leonard Mada via R-help 
>  wrote:
>
> I have added some clarifications below.
>
> On 8/18/2023 10:20 PM, Leonard Mada wrote:
> > [...]
> > After more careful thinking, I believe that it is a limitation
> due to
> > floating points:
> > [...]
> >
> > The problem really stems from the representation of 1 - x^2/2 as
> shown
> > below:
> > x = 1E-4
> > print(1 - x^2/2, digits=20)
> > print(0.5, digits=20) # fails
> > # 0.50003039
>
> The floating point representation of 1 - x^2/2 is the real culprit:
> # 0.50003039
>
> The 3039 at the end is really an error due to the floating point
> representation. However, this error blows up when inverting the value:
> x = 1E-4;
> y = 1 - x^2/2;
> 1/(1 - y) - 2/x^2
> # 1.215494
> # should be 1/(x^2/2) - 2/x^2 = 0
>
>
> The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it
> gets
> worse in a continuous manner. At least the reason has become now
> clear.
>
>
> >
> > Maybe some functions of type cos1p and cos1n would be handy for
> such
> > computations (to replace the manual series expansion):
> > cos1p(x) = 1 + cos(x)
> > cos1n(x) = 1 - cos(x)
> > Though, I do not have yet the big picture.
> >
>
> Sincerely,
>
>
> Leonard
>
> >
> >
> > On 8/17/2023 1:57 PM, Martin Maechler wrote:
> >>>>>>> Leonard Mada
> >>>>>>>  on Wed, 16 Aug 2023 20:50:52 +0300 writes:
> >>  > Dear Iris,
> >>  > Dear Martin,
> >>
> >>  > Thank you very much for your replies. I add a few comments.
> >>
> >>  > 1.) Correct formula
> >>  > The formula in the Subject Title was correct. A small
> glitch
> >> swept into
> >>  > the last formula:
> >>  > - 1/(cos(x) - 1) - 2/x^2
> >>  > or
> >>  > 1/(1 - cos(x)) - 2/x^2 # as in the subject title;
> >>
> >>  > 2.) log1p
> >>  > Actually, the log-part behaves much better. And when it
> fails,
> >> it fails
> >>  > completely (which is easy to spot!).
> >>
> >>  > x = 1E-6
> >>  > log(x) -log(1 - cos(x))/2
> >>  > # 0.3465291
> >>
> >>  > x = 1E-8
> >>  > log(x) -log(1 - cos(x))/2
> >>  > # Inf
> >>  > log(x) - log1p(- cos(x))/2
> >>  > # Inf => fails as well!
> >>  > # although using only log1p(cos(x)) seems to do the trick;
> >>  > log1p(cos(x)); log(2)/2;
> >>
> >>  > 3.) 1/(1 - cos(x)) - 2/x^2
> >>  > It is possible to convert the formula to one which is
> >> numerically more
> >>  > stable. It is also possible to compute it manually, but it
> >> involves much
> >>  > more work and is also error prone:
> >>
> >>  > (x^2 - 2 + 2*cos(x)) 

Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-18 Thread Leonard Mada via R-help

I have added some clarifications below.

On 8/18/2023 10:20 PM, Leonard Mada wrote:

[...]
After more careful thinking, I believe that it is a limitation due to 
floating points:

[...]

The problem really stems from the representation of 1 - x^2/2 as shown 
below:

x = 1E-4
print(1 - x^2/2, digits=20)
print(0.5, digits=20) # fails
# 0.50003039


The floating point representation of 1 - x^2/2 is the real culprit:
# 0.50003039

The 3039 at the end is really an error due to the floating point 
representation. However, this error blows up when inverting the value:

x = 1E-4;
y = 1 - x^2/2;
1/(1 - y) - 2/x^2
# 1.215494
# should be 1/(x^2/2) - 2/x^2 = 0


The ugly thing is that the error only gets worse as x decreases. The 
value neither drops to 0, nor does it blow up to infinity; but it gets 
worse in a continuous manner. At least the reason has now become clear.





Maybe some functions of type cos1p and cos1n would be handy for such 
computations (to replace the manual series expansion):

cos1p(x) = 1 + cos(x)
cos1n(x) = 1 - cos(x)
Though, I do not have yet the big picture.
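
For what it is worth, a minimal sketch of cos1n based on the half-angle 
identity 1 - cos(x) = 2*sin(x/2)^2 (as Iris suggested earlier in the thread):

cos1n = function(x) 2 * sin(x/2)^2; # equals 1 - cos(x), without the cancellation
x = 1E-4
1/cos1n(x) - 2/x^2
# ~ 0.1666667, i.e. the correct limit 1/6, where the direct formula failed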



Sincerely,


Leonard




On 8/17/2023 1:57 PM, Martin Maechler wrote:

Leonard Mada
 on Wed, 16 Aug 2023 20:50:52 +0300 writes:

 > Dear Iris,
 > Dear Martin,

 > Thank you very much for your replies. I add a few comments.

 > 1.) Correct formula
 > The formula in the Subject Title was correct. A small glitch 
swept into

 > the last formula:
 > - 1/(cos(x) - 1) - 2/x^2
 > or
 > 1/(1 - cos(x)) - 2/x^2 # as in the subject title;

 > 2.) log1p
 > Actually, the log-part behaves much better. And when it fails, 
it fails

 > completely (which is easy to spot!).

 > x = 1E-6
 > log(x) -log(1 - cos(x))/2
 > # 0.3465291

 > x = 1E-8
 > log(x) -log(1 - cos(x))/2
 > # Inf
 > log(x) - log1p(- cos(x))/2
 > # Inf => fails as well!
 > # although using only log1p(cos(x)) seems to do the trick;
 > log1p(cos(x)); log(2)/2;

 > 3.) 1/(1 - cos(x)) - 2/x^2
 > It is possible to convert the formula to one which is 
numerically more
 > stable. It is also possible to compute it manually, but it 
involves much

 > more work and is also error prone:

 > (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
 > And applying L'Hospital:
 > (2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
 > # and a 2nd & 3rd & 4th time
 > 1/6

 > The big problem was that I did not expect it to fail for x = 
1E-4. I

 > thought it is more robust and works maybe until 1E-5.
 > x = 1E-5
 > 2/x^2 - 2E+10
 > # -3.814697e-06

 > This is the reason why I believe that there is room for 
improvement.


 > Sincerely,
 > Leonard

Thank you, Leonard.
Yes, I agree that it is amazing how much your formula suffers from
(a generalization of) "cancellation" --- leading you to think
there was a problem with cos() or log() or .. in R.
But really R uses the system builtin libmath library, and the
problem is really the inherent instability of your formula.

Indeed your first approximation was not really much more stable:

## 3.) 1/(1 - cos(x)) - 2/x^2
## It is possible to convert the formula to one which is numerically 
more
## stable. It is also possible to compute it manually, but it 
involves much

## more work and is also error prone:
## (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
## MM: but actually, that approximation does not seem better (close 
to the breakdown region):

f1 <- \(x) 1/(1 - cos(x)) - 2/x^2
f2 <- \(x) (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
curve(f1, 1e-8, 1e-1, log="xy", n=2^10)
curve(f2, add = TRUE, col=2,   n=2^10)
## Zoom in:
curve(f1, 1e-4, 1e-1, log="xy",n=2^9)
curve(f2, add = TRUE, col=2,   n=2^9)
## Zoom in much more in y-direction:
yl <- 1/6 + c(-5, 20)/10
curve(f1, 1e-4, 1e-1, log="x", ylim=yl, n=2^9)
abline(h = 1/6, lty=3, col="gray")
curve(f2, add = TRUE, n=2^9, col=adjustcolor(2, 1/2))

Now, you can use the Rmpfr package (interface to the GNU MPFR
multiple-precision C library) to find out more :

if(!requireNamespace("Rmpfr")) install.packages("Rmpfr")
M <- function(x, precBits=128) Rmpfr::mpfr(x, precBits)

(xM <- M(1e-8))# yes, only ~ 16 dig accurate
## 1.20922560830128472675327e-8
M(10, 128)^-8 # would of course be more accurate,
## but we want the calculation for the double precision number 1e-8

## Now you can draw "the truth" into the above plots:
curve(f1, 1e-4, 1e-1, log="xy",n=2^9)
curve(f2, add = TRUE, col=2,   n=2^9)
## correct:
curve(f1(M(x, 256)), add = TRUE, col=4, lwd=2, n=2^9)
abline(h = 1/6, lty=3, col="gray")

But, indeed we take note  how much it is the formula instability:
Also MPFR needs a lot of extra bits precision before it gets to
the correct numbers:

xM <- c(M(1e-8,  80), M(1e-8,  96), M(1e-8, 112),
 M(1e-8, 128), M(1e-8, 180), M(1e-8, 256))
## to and round back to 70 bits for display:
R <- \(x) Rmpfr::roundMpfr(x, 70)

Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-18 Thread Leonard Mada via R-help

Dear Martin,

Thank you very much for your analysis.

I add only a small comment:
- the purpose of the modified formula was to apply l'Hospital;
- there are other ways to transform the formula; although applying 
l'Hospital once is probably more robust than simple transformations (but 
the computations are also more tedious and error prone);


After more careful thinking, I believe that it is a limitation due to 
floating points:

x = 1E-4
1/(-x^2/2 - x^4/24) + 2/x^2
1/6

y = 1 - x^2/2 - x^4/24;
1/(cos(x) - 1) + 2/x^2
1/(y - 1) + 2/x^2
# -1.215494
# correct: 1/6

We need the 3rd term for the correct computation of cos(x) in this 
problem: but this is x^4 / 24, which for 1E-4 requires precision at 
least up to 1E-16 / 24, or ~ 1E-18. I did not think about that 
initially. The trigonometric functions skip one term, and are therefore 
much uglier than the log. The problem really stems from the representation 
of 1 - x^2/2 as shown below:

x = 1E-4
print(1 - x^2/2, digits=20)
print(0.5, digits=20) # fails
# 0.50003039

Maybe some functions of type cos1p and cos1n would be handy for such 
computations (to replace the manual series expansion):

cos1p(x) = 1 + cos(x)
cos1n(x) = 1 - cos(x)
Though, I do not have yet the big picture.


Sincerely,


Leonard


On 8/17/2023 1:57 PM, Martin Maechler wrote:

Leonard Mada
 on Wed, 16 Aug 2023 20:50:52 +0300 writes:

 > Dear Iris,
 > Dear Martin,

 > Thank you very much for your replies. I add a few comments.

 > 1.) Correct formula
 > The formula in the Subject Title was correct. A small glitch swept into
 > the last formula:
 > - 1/(cos(x) - 1) - 2/x^2
 > or
 > 1/(1 - cos(x)) - 2/x^2 # as in the subject title;

 > 2.) log1p
 > Actually, the log-part behaves much better. And when it fails, it fails
 > completely (which is easy to spot!).

 > x = 1E-6
 > log(x) -log(1 - cos(x))/2
 > # 0.3465291

 > x = 1E-8
 > log(x) -log(1 - cos(x))/2
 > # Inf
 > log(x) - log1p(- cos(x))/2
 > # Inf => fails as well!
 > # although using only log1p(cos(x)) seems to do the trick;
 > log1p(cos(x)); log(2)/2;

 > 3.) 1/(1 - cos(x)) - 2/x^2
 > It is possible to convert the formula to one which is numerically more
 > stable. It is also possible to compute it manually, but it involves much
 > more work and is also error prone:

 > (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
 > And applying L'Hospital:
 > (2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
 > # and a 2nd & 3rd & 4th time
 > 1/6

 > The big problem was that I did not expect it to fail for x = 1E-4. I
 > thought it is more robust and works maybe until 1E-5.
 > x = 1E-5
 > 2/x^2 - 2E+10
 > # -3.814697e-06

 > This is the reason why I believe that there is room for improvement.

 > Sincerely,
 > Leonard

Thank you, Leonard.
Yes, I agree that it is amazing how much your formula suffers from
(a generalization of) "cancellation" --- leading you to think
there was a problem with cos() or log() or .. in R.
But really R uses the system builtin libmath library, and the
problem is really the inherent instability of your formula.

Indeed your first approximation was not really much more stable:

## 3.) 1/(1 - cos(x)) - 2/x^2
## It is possible to convert the formula to one which is numerically more
## stable. It is also possible to compute it manually, but it involves much
## more work and is also error prone:
## (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
## MM: but actually, that approximation does not seem better (close to the 
breakdown region):
f1 <- \(x) 1/(1 - cos(x)) - 2/x^2
f2 <- \(x) (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
curve(f1, 1e-8, 1e-1, log="xy", n=2^10)
curve(f2, add = TRUE, col=2,   n=2^10)
## Zoom in:
curve(f1, 1e-4, 1e-1, log="xy",n=2^9)
curve(f2, add = TRUE, col=2,   n=2^9)
## Zoom in much more in y-direction:
yl <- 1/6 + c(-5, 20)/10
curve(f1, 1e-4, 1e-1, log="x", ylim=yl, n=2^9)
abline(h = 1/6, lty=3, col="gray")
curve(f2, add = TRUE, n=2^9, col=adjustcolor(2, 1/2))

Now, you can use the Rmpfr package (interface to the GNU MPFR
multiple-precision C library) to find out more :

if(!requireNamespace("Rmpfr")) install.packages("Rmpfr")
M <- function(x, precBits=128) Rmpfr::mpfr(x, precBits)

(xM <- M(1e-8))# yes, only ~ 16 dig accurate
## 1.20922560830128472675327e-8
M(10, 128)^-8 # would of course be more accurate,
## but we want the calculation for the double precision number 1e-8

## Now you can draw "the truth" into the above plots:
curve(f1, 1e-4, 1e-1, log="xy",n=2^9)
curve(f2, add = TRUE, col=2,   n=2^9)
## correct:
curve(f1(M(x, 256)), add = TRUE, col=4, lwd=2, n=2^9)
abline(h = 1/6, lty=3, col="gray")

But, indeed we take note  how much it is the formula instability:
Also MPFR needs a lot of extra bits precision before it gets to
the correct numbers:

xM <- c(M(1e-8,  80), M(1e-8,  96), M(1e-8, 112),
 

Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-16 Thread Leonard Mada via R-help
Dear Iris,
Dear Martin,

Thank you very much for your replies. I add a few comments.

1.) Correct formula
The formula in the Subject Title was correct. A small glitch swept into 
the last formula:
- 1/(cos(x) - 1) - 2/x^2
or
1/(1 - cos(x)) - 2/x^2 # as in the subject title;

2.) log1p
Actually, the log-part behaves much better. And when it fails, it fails 
completely (which is easy to spot!).

x = 1E-6
log(x) -log(1 - cos(x))/2
# 0.3465291

x = 1E-8
log(x) -log(1 - cos(x))/2
# Inf
log(x) - log1p(- cos(x))/2
# Inf => fails as well!
# although using only log1p(cos(x)) seems to do the trick;
log1p(cos(x)); log(2)/2;
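
As a side note, rewriting the log-part via 1 - cos(x) = 2*sin(x/2)^2 
(Iris' suggestion below) keeps it finite (a sketch):
x = 1E-8
log(x) - (log(2) + 2*log(abs(sin(x/2))))/2
# ~ 0.3465736 = log(2)/2, instead of Inf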

3.) 1/(1 - cos(x)) - 2/x^2
It is possible to convert the formula to one which is numerically more 
stable. It is also possible to compute it manually, but it involves much 
more work and is also error prone:

(x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
And applying L'Hospital:
(2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
# and a 2nd & 3rd & 4th time
1/6

The big problem was that I did not expect it to fail for x = 1E-4. I 
thought it was more robust and would work down to maybe 1E-5.
x = 1E-5
2/x^2 - 2E+10
# -3.814697e-06

This is the reason why I believe that there is room for improvement.

Sincerely,

Leonard


On 8/16/2023 9:51 AM, Iris Simmons wrote:
> You could rewrite
>
> 1 - cos(x)
>
> as
>
> 2 * sin(x/2)^2
>
> and that might give you more precision?
>
> On Wed, Aug 16, 2023, 01:50 Leonard Mada via R-help 
>  wrote:
>
> Dear R-Users,
>
> I tried to compute the following limit:
> x = 1E-3;
> (-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
> # 0.4299226
> log(2)/2 + 1/12
> # 0.4299069
>
> However, the result diverges as x decreases:
> x = 1E-4
> (-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
> # 0.9543207
> # correct: 0.4299069
>
> I expected the precision to remain good with x = 1E-4 or x = 1E-5.
>
> This part blows up - probably some significant loss of precision of
> cos(x) when x -> 0:
> 1/(cos(x) - 1) - 2/x^2
>
> Maybe there is some room for improvement.
>
> Sincerely,
>
> Leonard
> ==
> The limit was part of the integral:
> up = pi/5;
> integrate(function(x) 1 / sin(x)^3 - 1/x^3 - 1/(2*x), 0, up)
> (log( (1 - cos(up)) / (1 + cos(up)) ) +
>      + 1/(cos(up) - 1) + 1/(cos(up) + 1) + 2*log(2) - 1/3) / 4 +
>      + (1/(2*up^2) - log(up)/2);
>
> # see:
> 
> https://github.com/discoleo/R/blob/master/Math/Integrals.Trig.Fractions.Poly.R
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-15 Thread Leonard Mada via R-help

Dear R-Users,

I tried to compute the following limit:
x = 1E-3;
(-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
# 0.4299226
log(2)/2 + 1/12
# 0.4299069

However, the result diverges as x decreases:
x = 1E-4
(-log(1 - cos(x)) - 1/(cos(x)-1)) / 2 - 1/(x^2) + log(x)
# 0.9543207
# correct: 0.4299069

I expected the precision to remain good with x = 1E-4 or x = 1E-5.

This part blows up - probably some significant loss of precision of 
cos(x) when x -> 0:

1/(cos(x) - 1) - 2/x^2

Maybe there is some room for improvement.

Sincerely,

Leonard
==
The limit was part of the integral:
up = pi/5;
integrate(function(x) 1 / sin(x)^3 - 1/x^3 - 1/(2*x), 0, up)
(log( (1 - cos(up)) / (1 + cos(up)) ) +
    + 1/(cos(up) - 1) + 1/(cos(up) + 1) + 2*log(2) - 1/3) / 4 +
    + (1/(2*up^2) - log(up)/2);

# see:
https://github.com/discoleo/R/blob/master/Math/Integrals.Trig.Fractions.Poly.R

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help
Dear Bert,

Thank you for the suggestion. Indeed, there are various solutions and 
workarounds. However, there is still a bug in strsplit.

2.) gsub
I would try to avoid gsub on a Wikipedia-sized corpus: using strsplit 
directly should be far more efficient.

3.) Punctuation marks
Abbreviations and "word1-word2" may be a problem:
gsub("([[:punct:]])", "\\1 ", "A.B.C.", perl=T)
# "A. B. C. "

I do not yet have an intuition whether the spaces in "A. B. C. " would 
adversely affect the language model. But this goes off-topic.

Sincerely,

Leonard


On 5/6/2023 1:35 AM, Bert Gunter wrote:
> Primarily for my own amusement, here is a way to do what I think you 
> wanted without look-aheads/behinds
>
> strsplit(gsub("([[:punct:]])"," \\1 ","a bc,def, adef,x; ,,gh"), " +")
> [[1]]
>  [1] "a"    "bc"   ","    "def"  ","    "adef" ","    "x"  ";"
> [10] ","    ","    "gh"
>
> I certainly would *not* claim that it is in any way superior to 
> anything that has already been suggested -- indeed, probably the 
> contrary. But it's simple (as am I).
>
> Cheers,
> Bert
>
> On Fri, May 5, 2023 at 2:54 PM Leonard Mada via R-help 
>  wrote:
>
> Dear Avi,
>
> Punctuation marks are used in various NLP language models. Preserving
> the "," is therefore useful in such scenarios and Regex are useful to
> accomplish this (especially if you have sufficient experience with
> such
> expressions).
>
> I observed only an odd behaviour using strsplit: the example
> string is
> constructed; but it is always wise to test a Regex expression against
> various scenarios. It is usually hard to predict what special
> cases will
> occur in a specific corpus.
>
> strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
> # "a"  "bc"  ","  "def"  ","  ""  "adef"  ","  ","  "gh"
>
> stringi::stri_split("a bc,def, adef ,,gh", regex="
> |(?=,)|(?<=,)(?![ ])")
> # "a"    "bc"   ","    "def"  ","    "adef"  "" ","    "," "gh"
>
> stringi::stri_split("a bc,def, adef ,,gh", regex=" |(?<! )(?=,)|(?<=,)(?![ ])")
> # "a"    "bc"   ","    "def"  ","    "adef"  ","    "," "gh"
>
> # Expected:
> # "a"  "bc"   ","  "def"   ","  "adef"  ","   ","  "gh"
> # see 2nd instance of stringi::stri_split
>
>
> Sincerely,
>
>
> Leonard
>
>
> On 5/5/2023 11:20 PM, avi.e.gr...@gmail.com wrote:
> > Leonard,
> >
> > It can be helpful to spell out your intent in English or some of
> us have to go back to the documentation to remember what some of
> the operators do.
> >
> > Your text being searched seems to be an example of items between
> comas with an optional space after some commas and in one case,
> nothing between commas.
> >
> > So what is your goal for the example, and in general? You
> mention a bit unclearly at the end some of what you expect and I
> think it would be clearer if you also showed exactly the output
> you would want.
> >
> > I saw some other replies that addressed what you wanted and am
> going to reply in another direction.
> >
> > Why do things the hard way using things like lookahead or look
> behind? Would several steps get you the result way more clearly?
> >
> > For the sake of argument, you either want what reading in a CSV
> file would supply, or something else. Since you are not simply
> splitting on commas, it sounds like something else. But what
> exactly else? Something as simple as this on just a comma produces
> results including empty strings and embedded leading or trailing
> spaces:
> >
> > strsplit("a bc,def, adef ,,gh", ",")
> > [[1]]
> > [1] "a bc"   "def"    " adef " ""       "gh"
> >
> > That can of course be handled by, for example, trimming the
> result after unlisting the odd way strsplit returns results:
> >
> >

Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help

Dear Avi,

Punctuation marks are used in various NLP language models. Preserving 
the "," is therefore useful in such scenarios and Regex are useful to 
accomplish this (especially if you have sufficient experience with such 
expressions).


I observed only an odd behaviour using strsplit: the example string is 
constructed; but it is always wise to test a Regex expression against 
various scenarios. It is usually hard to predict what special cases will 
occur in a specific corpus.


strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a"  "bc"  ","  "def"  ","  ""  "adef"  ","  ","  "gh"

stringi::stri_split("a bc,def, adef ,,gh", regex=" |(?=,)|(?<=,)(?![ ])")
# "a"    "bc"   ","    "def"  ","    "adef"  "" ","    "," "gh"

stringi::stri_split("a bc,def, adef ,,gh", regex=" |(?<! )(?=,)|(?<=,)(?![ ])")

# "a"    "bc"   ","    "def"  ","    "adef"  ","    ","    "gh"

# Expected:
# "a"  "bc"   ","  "def"   ","  "adef"  ","   ","  "gh"
# see 2nd instance of stringi::stri_split
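
One workaround for now (a sketch): split with the first pattern and drop 
the empty strings afterwards, which yields the expected tokens:

s = strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=TRUE)[[1]]
s[nzchar(s)]
# "a"  "bc"  ","  "def"  ","  "adef"  ","  ","  "gh"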


Sincerely,


Leonard


On 5/5/2023 11:20 PM, avi.e.gr...@gmail.com wrote:

Leonard,

It can be helpful to spell out your intent in English or some of us have to go 
back to the documentation to remember what some of the operators do.

Your text being searched seems to be an example of items between comas with an 
optional space after some commas and in one case, nothing between commas.

So what is your goal for the example, and in general? You mention a bit 
unclearly at the end some of what you expect and I think it would be clearer if 
you also showed exactly the output you would want.

I saw some other replies that addressed what you wanted and am going to reply 
in another direction.

Why do things the hard way using things like lookahead or look behind? Would 
several steps get you the result way more clearly?

For the sake of argument, you either want what reading in a CSV file would 
supply, or something else. Since you are not simply splitting on commas, it 
sounds like something else. But what exactly else? Something as simple as this 
on just a comma produces results including empty strings and embedded leading 
or trailing spaces:

strsplit("a bc,def, adef ,,gh", ",")
[[1]]
[1] "a bc"   "def"    " adef " ""       "gh"

That can of course be handled by, for example, trimming the result after 
unlisting the odd way strsplit returns results:

library("stringr")
str_squish(unlist(strsplit("a bc,def, adef ,,gh", ",")))

[1] "a bc" "def"  "adef" "" "gh"

Now do you want the empty string to be something else, such as an NA? That can 
be done too with another step.

And a completely different variant can be used to read in your one-line CSV as 
text using standard overkill tools:


read.table(text="a bc,def, adef ,,gh", sep=",")

 V1  V2 V3 V4 V5
1 a bc def  adef  NA gh

The above is a vector of texts. But if you simply want to reassemble your 
initial string cleaned up a bit, you can use paste to put back commas, as in a 
variation of the earlier example:


paste(str_squish(unlist(strsplit("a bc,def, adef ,,gh", ","))), collapse=",")

[1] "a bc,def,adef,,gh"

So my question is whether using advanced methods is really necessary for your 
case, or even particularly efficient. If efficiency matters, often, it is 
better to use tools without regular expressions such as paste0() when they meet 
your needs.

Of course, unless I know what you are actually trying to do, my remarks may be 
not useful.



-Original Message-
From: R-help  On Behalf Of Leonard Mada via R-help
Sent: Thursday, May 4, 2023 5:00 PM
To: R-help Mailing List 
Subject: [R] Regex Split?

Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:
strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a""bc"   ",""def"  ",""" "adef" ",""," "gh"

strsplit("a bc,def, adef ,,gh", " |(?
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regex Split?

2023-05-05 Thread Leonard Mada via R-help
Dear Bill,


Indeed, there are other cases as well - as documented.


Various Regex sites give the warning to avoid the legacy syntax 
"[[:<:]]", so this is the alternative syntax:
strsplit(split="\\b(?=\\w)", "One, two; three!", perl=TRUE)
# "O"  "n"  "e"  ", " "t"  "w"  "o"  "; " "t"  "h"  "r"  "e"  "e" "!"

gsub("\\b(?=\\w)", "#", "One, two; three!", perl=TRUE)
# "#One, #two; #three!"


Sincerely,


Leonard


On 5/5/2023 6:19 PM, Bill Dunlap wrote:
> https://eu01.z.antigena.com/l/BgIBOxsm88PwDTBiTTrQ784MFk2oGZVOA3RMHiarAZuyoEemKrcnpfJeD8X0FgxRDG33qHZho~NriRCbhv9_Ffr3EOfqn2vpaNUAlCDjQ8nOyVUgPM2iGnHi-qpN54kl1YVO_gHimn0m2ZJ68ntGtysras~0mRMDuAgwbTXsQcQ~
>  
> (from 2016, still labelled 'UNCONFIRMED") contains some other examples 
> of strsplit misbehaving when using 0-length perl look-behinds.  E.g.,
>
> > strsplit(split="[[:<:]]", "One, two; three!", perl=TRUE)[[1]]
>  [1] "O"  "n"  "e"  ", " "t"  "w"  "o"  "; " "t"  "h"  "r"  "e"  "e"  "!"
> > gsub(pattern="[[:<:]]", "#", "One, two; three!", perl=TRUE)
> [1] "#One, #two; #three!"
>
> The bug report includes the comment
> It may be possible that strsplit is not using the startoffset argument
> to pcre_exec
>
>pcre/pcre/doc/html/pcreapi.html
>  A non-zero starting offset is useful when searching for another match
>  in the same subject by calling pcre_exec() again after a previous
>  success. Setting startoffset differs from just passing over a
>  shortened string and setting PCRE_NOTBOL in the case of a pattern that
>  begins with any kind of lookbehind.
>
> or it could be something else.
>
>
> On Fri, May 5, 2023 at 3:25 AM Ivan Krylov  wrote:
>
> On Thu, 4 May 2023 23:59:33 +0300
> Leonard Mada via R-help  wrote:
>
> > strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
> > # "a"    "bc"   ","    "def"  ","    "" "adef" "," "," "gh"
> >
> > strsplit("a bc,def, adef ,,gh", " |(? perl=T)
> > # "a"    "bc"   ","    "def"  ","    "" "adef" "," "," "gh"
> >
> > strsplit("a bc,def, adef ,,gh", " |(? > perl=T)
> > # "a"    "bc"   ","    "def"  ","    "" "adef" "," "," "gh"
> >
> >
> > Is this correct?
>
> Perl seems to return the results you expect:
>
> $ perl -E '
>  say("$_:\n ", join " ", map qq["$_"], split $_, q[a bc,def, adef
> ,,gh])
>  for (
>   qr[ |(?=,)|(?<=,)(?![ ])],
>   qr[ |(?   qr[ |(? )'
> (?^u: |(?=,)|(?<=,)(?![ ])):
>  "a" "bc" "," "def" "," "adef" "," "," "gh"
> (?^u: |(?  "a" "bc" "," "def" "," "adef" "," "," "gh"
> (?^u: |(?  "a" "bc" "," "def" "," "adef" "," "," "gh"
>
> The same thing happens when I ask R to replace the separators instead
> of splitting by them:
>
> sapply(setNames(nm = c(
>  " |(?=,)|(?<=,)(?![ ])",
>  " |(?  " |(? ), gsub, '[]', "a bc,def, adef ,,gh", perl = TRUE)
> #               |(?=,)|(?<=,)(?![ ])         |(? )(?=,)|(?<=,)(?![ ])
> # "a[]bc[],[]def[],[]adef[],[],[]gh"
> "a[]bc[],[]def[],[]adef[],[],[]gh"
> #        |(? # "a[]bc[],[]def[],[]adef[],[],[]gh"
>
> I think that something strange happens when the delimeter pattern
> matches more than once in the same place:
>
> gsub(
>  '(?=<--)|(?<=-->)', '[]', 'split here --><-- split here',
>  perl = TRUE
> )
> # [1] "split here -->[]<-- split here"
>
> (Both Perl's split() and s///g agree with R's gsub() here, although I
> would have accepted "split here -->[][]<-- split her

[R] Regex Split?

2023-05-04 Thread Leonard Mada via R-help

Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:
strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a"    "bc"   ","    "def"  ","    "" "adef" ","    "," "gh"

strsplit("a bc,def, adef ,,gh", " |(?

- the first one could also return "", "," (but probably not; not fully sure about this);



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split String in regex while Keeping Delimiter

2023-04-13 Thread Leonard Mada via R-help

Dear Emily,

I have written a more robust version of the function:
extract.nonLetters = function(x, rm.space = TRUE, normalize = TRUE, sort = TRUE) {
    # note: the function now operates on its argument x (not on a global str);
    if(normalize) x = stringi::stri_trans_nfc(x);
    ch = strsplit(x, "", fixed = TRUE);
    ch = unique(unlist(ch));
    if(sort) ch = sort(ch);
    pat = if(rm.space) "^[a-zA-Z ]" else "^[a-zA-Z]";
    isLetter = grepl(pat, ch);
    ch = ch[ ! isLetter];
    return(stringi::stri_escape_unicode(ch));
}
extract.nonLetters(str)
# "\\u2013" "+"

This code ("\u2013") is included in the expanded Regex expression:
tokens = strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)
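A quick check on the two example strings from the thread (the dash below is the en-dash, \u2013):

str = c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
    "leucocyten \u2013 grampositieve coccen +")
strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)
# [[1]] "leucocyten +"  "gramnegatieve staven +++"  "grampositieve staven ++"
# [[2]] "leucocyten –"  "grampositieve coccen +"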


Sincerely,

Leonard


On 4/13/2023 9:40 PM, Leonard Mada wrote:

Dear Emily,

Using a look-behind solves the split problem in this case. (Note: 
Using Regex is in most/many cases the simplest solution.)


str = c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
"leucocyten – grampositieve coccen +")

tokens = strsplit(str, "(?<=[-+])\\s++", perl=TRUE)

PROBLEM
The current expression does NOT work for a different reason: the "-" 
is coded using a NON-ASCII character.


I have written a small utility function to approximately extract 
"non-standard" characters:

### Identify non-ASCII Characters
# beware: the filtering and the sorting may break the codes;
extract.nonLetters = function(x, rm.space = TRUE, sort=FALSE) {
    code = as.numeric(unique(unlist(lapply(x, charToRaw))));
    isLetter =
    (code >= 97 & code <= 122) |
    (code >= 65 & code <= 90);
    code = code[ ! isLetter];
    if(rm.space) {
    # removes only simple space!
    code = code[code != 32];
    }
    if(sort) code = sort(code);
    return(code);
}
extract.nonLetters(str, sort = FALSE)
# 43 226 128 147

Note:
- the code for "+" is 43, and for simple "-" is 45: as.numeric 
(charToRaw("+-"));
- "226 128 147" codes something else, but it is not trivial to get the 
Unicode code Point;
https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192=128=dec 



The following is a more comprehensive Regex expression, which accepts 
many variants of "-":

tokens = strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)

Sincerely,

Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Split String in regex while Keeping Delimiter

2023-04-13 Thread Leonard Mada via R-help

Dear Emily,

Using a look-behind solves the split problem in this case. (Note: Using 
Regex is in most/many cases the simplest solution.)


str = c("leucocyten + gramnegatieve staven +++ grampositieve staven ++",
"leucocyten – grampositieve coccen +")

tokens = strsplit(str, "(?<=[-+])\\s++", perl=TRUE)

PROBLEM
The current expression does NOT work for a different reason: the "-" is 
coded using a NON-ASCII character.


I have written a small utility function to approximately extract 
"non-standard" characters:

### Identify non-ASCII Characters
# beware: the filtering and the sorting may break the codes;
extract.nonLetters = function(x, rm.space = TRUE, sort=FALSE) {
    code = as.numeric(unique(unlist(lapply(x, charToRaw))));
    isLetter =
    (code >= 97 & code <= 122) |
    (code >= 65 & code <= 90);
    code = code[ ! isLetter];
    if(rm.space) {
    # removes only simple space!
    code = code[code != 32];
    }
    if(sort) code = sort(code);
    return(code);
}
extract.nonLetters(str, sort = FALSE)
# 43 226 128 147

Note:
- the code for "+" is 43, and for simple "-" is 45: as.numeric 
(charToRaw("+-"));
- "226 128 147" codes something else, but it is not trivial to get the 
Unicode code Point;

https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192=128=dec

The following is a more comprehensive Regex expression, which accepts 
many variants of "-":

tokens = strsplit(str, "(?<=[-+\u2010-\u2014])\\s++", perl=TRUE)

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Default Generic function for: args(name, default = TRUE)

2023-03-08 Thread Leonard Mada via R-help
Dear Bert,

Thank you for the idea.

It works, although it is a little bit ugly. The original code also 
generated an ugly warning. I have modified it slightly:

is.function.generic = function(name) {
     # TODO: is.function.generic();
     # - this version is a little bit ugly;
     # - S4: if(isGeneric(name));
     length(do.call(.S3methods, list(name))) > 0;
}

The latest code is on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R

Sincerely,

Leonard

### initial variant:

is.function.generic = function(name) {
     length(.S3methods(name)) > 0;
}
is.function.generic(plot)
# [1] TRUE
# Warning message:
# In .S3methods(name) :
#  generic function 'name' dispatches methods for generic 'plot'


On 3/8/2023 9:24 PM, Bert Gunter wrote:
> ?.S3methods
>
> f <- function()(2)
> > length(.S3methods(f))
> [1] 0
> > length(.S3methods(print))
> [1] 206
>
> There may be better ways, but this is what came to my mind.
> -- Bert
>
> On Wed, Mar 8, 2023 at 11:09 AM Leonard Mada via R-help 
>  wrote:
>
> Dear R-Users,
>
> I want to change the args() function to return by default the
> arguments
> of the default generic function:
> args = function(name, default = TRUE) {
>      # TODO: && is.function.generic();
>      if(default) {
>      fn = match.call()[[2]];
>      fn = paste0(as.character(fn), ".default");
>      name = fn;
>      }
>      .Internal(args(name));
> }
>
> Is there a nice way to find out if a function is generic:
> something like
> is.function.generic()?
>
> Many thanks,
>
> Leonard
> ===
>
> Note:
> - the latest version of this code will be on GitHub: [edited]
> https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> 
> https://stat.ethz.ch/mailman/listinfo/r-help
>
> PLEASE do read the posting guide
> 
> http://www.R-project.org/posting-guide.html
>
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Default Generic function for: args(name, default = TRUE)

2023-03-08 Thread Leonard Mada via R-help

Dear Gregg,

Thank you for the fast response.

I believe though that isGeneric works only for S4-functions:

isGeneric("plot")
# FALSE

I still try to get it to work.

Sincerely,

Leonard


On 3/8/2023 9:13 PM, Gregg Powell wrote:

Yes, there is a way to check if a function is generic. You can use the 
isGeneric function to check if a function is generic. Here's an updated version 
of the args function that includes a check for generic functions:


args = function(name, default = TRUE) {
    if(default && isGeneric(name)) {
        fn = paste0(as.character(name), ".default")
        name = fn
    }
    .Internal(args(name))
}

r/
Gregg





--- Original Message ---
On Wednesday, March 8th, 2023 at 12:09 PM, Leonard Mada via R-help 
 wrote:



Dear R-Users,

I want to change the args() function to return by default the arguments
of the default generic function:
args = function(name, default = TRUE) {
# TODO: && is.function.generic();
if(default) {
fn = match.call()[[2]];
fn = paste0(as.character(fn), ".default");
name = fn;
}
.Internal(args(name));
}

Is there a nice way to find out if a function is generic: something like
is.function.generic()?

Many thanks,

Leonard
===

Note:
- the latest version of this code will be on GitHub: [edited]
https://github.com/discoleo/R/blob/master/Stat/Tools.Code.R

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Default Generic function for: args(name, default = TRUE)

2023-03-08 Thread Leonard Mada via R-help

Dear R-Users,

I want to change the args() function to return by default the arguments 
of the default generic function:

args = function(name, default = TRUE) {
    # TODO: && is.function.generic();
    if(default) {
    fn = match.call()[[2]];
    fn = paste0(as.character(fn), ".default");
    name = fn;
    }
    .Internal(args(name));
}

Is there a nice way to find out if a function is generic: something like 
is.function.generic()?


Many thanks,

Leonard
===

Note:
- the latest version of this code will be on GitHub:
https://github.com/discoleo/R/commits/master/Stat/Tools.Code.R

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Generic Function read?

2023-02-28 Thread Leonard Mada via R-help

Dear R-Users,

I noticed that *read* is not a generic function, although it could 
benefit from the functionality available for generic functions:


read = function(file, ...) UseMethod("read")

methods(read)
 # [1] read.csv read.csv2    read.dcf read.delim read.delim2  
read.DIF read.fortran

 # [8] read.ftable  read.fwf read.socket  read.table

The users would still need to call the full function name. But it seems 
useful to be able to find rapidly what formats can be read; including 
with other packages (e.g. for Excel, SAS, ... - although most packages 
do not adhere to the generic naming convention, but maybe they will 
change in the future).


Note:
This should be possible (even though impractical), but actually does NOT 
work:

read = function(file, ...) UseMethod("read")
file = "file.csv"
class(file) = c("csv", class(file));
read(file)

Should it not work?

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Covid Mutations: Cumulative?

2023-01-30 Thread Leonard Mada via R-help

Dear R-Users,

Did anyone follow more closely the SARS Cov-2 lineages?

I have done a quick check of Cov-2 mutations on the list downloaded from 
NCBI (see GitHub page below); but it seems that the list contains the 
cumulative mutations only for B.1 => B.1.1, but not after the B.1.1 branch:

# B.1 => B.1.1 seems cumulative
diff.lineage("B.1.1", "B.1", data=z)
# but B.1.1 => B.1.1.529 is NOT cumulative anymore;
diff.lineage("B.1.1.529", "B.1.1", data=z)
diff.lineage("B.1.1.529", "BA.2", data=z)
diff.lineage("B.1.1.529", "BA.5", data=z)

# Column id: B(oth) = present in both lineages:
        V   Mutation    P    AA Pos AAi AAm Polymorphism id
899 B.1.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
900 B.1.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
901 B.1.1    S:D614G    S D614G 614   D   G        FALSE  B
902 B.1.1    N:R203K    N R203K 203   R   K        FALSE  1
903 B.1.1    N:R203R    N R203R 203   R   R         TRUE  1
904 B.1.1    N:G204R    N G204R 204   G   R        FALSE  1
896   B.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
897   B.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
898   B.1    S:D614G    S D614G 614   D   G        FALSE  B
# B.1.1.529 and branches do not have any of the defining mutations of B.1.1;

I have uploaded the code on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Infx/Cov2.Variants.R

1.) Does anyone have a better picture of what is going on?
The sub-variants should have cumulative mutations. This should be the 
logic for the sub-lineages, and I deduce it also from the data/posts on 
the GitHub pango page:

https://github.com/cov-lineages/pango-designation/issues/361


2.) Cumulative List

It may be that NCBI kept only the new mutations, as the number of 
mutations increased.



Does anyone know if there is a full cumulative list?

Alternatively, there might be a list or package with the full lineage 
encoding. There is a list on the Pango GitHub project, but I hope to 
skip at least this step; the synonyms in the NCBI file seem uglier to 
process.



Note:

This question may be more oriented towards Bioconductor; but I haven't 
found any real Covid packages on Bioconductor.



Thank you very much for any help.


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Covid-19 Variants & Lineages

2023-01-24 Thread Leonard Mada via R-help

Dear Ivan,


Thank you very much.


Indeed, I missed the download button. The csv file seems to contain all 
the mutations in a usable format.
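For the archive, a minimal import sketch based on your suggestion (the URL and the column name are taken from your mail quoted below):

lin = read.csv("https://www.ncbi.nlm.nih.gov/genomes/VirusVariation/activ/?report=download_lineages")
str(lin)                  # inspect the available columns
head(lin$aa_definition)   # the mutation definitions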



Sincerely,


Leonard


On 1/24/2023 11:29 PM, Ivan Krylov wrote:

On Tue, 24 Jan 2023 22:26:34 +0200
Leonard Mada via R-help  wrote:


The data on the NCBI page "Explore in SARS-CoV-2 Variants Overview"
seems very difficult to download:
https://www.ncbi.nlm.nih.gov/activ
E.g.: (in the lower-left corner, but impossible to copy)
NSP1: S135R
NSP13: R392C

The page has a "download" button, which requests
https://www.ncbi.nlm.nih.gov/genomes/VirusVariation/activ/?report=download_lineages
and offers to save it as "lineages.csv".

I think that the information you're looking for is available if you
feed this URL to read.csv() and look at the aa_definition column.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Covid-19 Variants & Lineages

2023-01-24 Thread Leonard Mada via R-help

Dear R-Users,

1.) Is there a package which gives the full code of a Covid-19 
lineage/variant?


E.g. Omicron = B.1.1.529, while the BA.x lineages correspond to specific 
subtypes of Omicron:

BA.x:
BA.1 = B.1.1.529.1;
BA.1.1 = B.1.1.529.1.1;
BA.1.1.5 = B.1.1.529.1.1.5;

Is there any package to offer such trans-coding functionality? And 
possibly warn if the lineage has been withdrawn?


The full list is available on GitHub:
https://github.com/cov-lineages/pango-designation/blob/master/lineage_notes.txt
Some of the lineages are reassigned or withdrawn. It seems feasible to 
process this list.


2.) Covid Mutations
Is there a package to retrieve the full list of mutations of a specific 
lineage/variant?


E.g. each node in the "tree" B.1.1.529.1.1.5 accumulates 1 or more new 
mutations. It is probably very uncommon for a mutation to get mutated 
back; so the mutations accumulate.


The data on the NCBI page "Explore in SARS-CoV-2 Variants Overview" 
seems very difficult to download:

https://www.ncbi.nlm.nih.gov/activ
E.g.: (in the lower-left corner, but impossible to copy)
NSP1: S135R
NSP13: R392C
[...]


Maybe there is a package already offering such functionality. I am now 
looking over the documentation of the COVID19.Analytics package, but I 
may miss the relevant functions.



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R emulation of FindRoot in Mathematica

2023-01-22 Thread Leonard Mada via R-help
Dear Troels,


There might be an error in one of the eqs:

# [modified] TODO: check;
mg2atp <- 10^(-7)*Mg*mgatp;

This version works:
x0 = c(
     atp = 0.008,
     adp = 0.1,
     pi = 0.003,
     pcr = 0.042,
     cr = 0.004,
     lactate = 0.005
) / 30;
# solved the positive value
x0[1] = 1E-6;

x = multiroot(solve.AcidSpecies, x0, H = 4E-8)
print(x)

# Results:
# atp  adp   pi  pcr cr  lactate
# 4.977576e-04 3.254998e-06 5.581774e-08 4.142785e-09 5.011807e-10 
4.973691e-03


Sincerely,


Leonard


On 1/23/2023 2:24 AM, Leonard Mada wrote:
> Dear Troels,
>
> I send you an updated version of the response. I think that a hybrid 
> approach is probably the best solution:
> - Input / Starting values = free: ATP, ADP, Crea, CreaP, lactate, 
> inorganic phosphate;
> - Output: diff(Total, given total value);
> - I assume that the pH is given;
> - I did NOT check all individual eqs;
>
> library(rootSolve)
>
> solve.AcidSpecies = function(x, H, Mg=0.0006, K = 0.12) {
>     # ... the eqs: ...;
>     ATPTotal = KaATPH * ATP * H + KaATPH2 * KaATPH * ATP * H^2 +
>     + KaATPH3 * KaATPH2 * KaATPH * ATP * H^3 + KdATPMg * ATP * Mg +
>     + KdATPHMg * ATP * H * Mg + KdATPMg2 * ATP * Mg^2 + KdATPK * ATP * K;
>     ### Output:
>     total = c(
>         ATPTotal - 0.008,
>         ADPTotal - 1E-5,
>         CreaTotal - 0.004,
>         CreaPTotal - 0.042,
>         PiTotal - 0.003,
>         LactateTotal - 0.005);
>     return(total);
> }
>
> KaATPH = 10^6.494; # ...
> x0 = c(ATP = 0.008, ADP = 1E-5,
>     Crea = 0.004, CreaP = 0.042, Pi = 0.003, Lactate = 0.005) / 2;
> x = multiroot(solve.AcidSpecies, x0, H = 4E-8);
>
>
> print(x)
>
>
> I think that it is possible to use the eqs directly as provided 
> initially (with some minor corrections). You only need to output the 
> known totals (as a diff), see full code below.
>
>
> Sincerely,
>
>
> Leonard
>
>
>
> library(rootSolve)
>
> solve.AcidSpecies = function(x, H, Mg=0.0006, k = 0.12) {
>     # with(list(x), { seems NOT to work with multiroot });
>     atp = x[1]; adp = x[2]; pi = x[3];
>     pcr = x[4]; cr = x[5]; lactate = x[6];
>
> ###
> hatp <- 10^6.494*H*atp
> hhatp <- 10^3.944*H*hatp
> hhhatp <- 10^1.9*H*hhatp
> atp  <- 10*H*hhhatp
> mgatp <- 10^4.363*atp*Mg
> mghatp <- 10^2.299*hatp*Mg
> mg2atp <- 10^1-7*Mg*mgatp
> katp <- 10^0.959*atp*k
>
> hadp <- 10^6.349*adp*H
> hhadp <- 10^3.819*hadp*H
> hhhadp <- 10*H*hhadp
> mgadp <- 10^3.294*Mg*adp
> mghadp <- 10^1.61*Mg*hadp
> mg2adp <- 10*Mg*mgadp
> kadp <- 10^0.82*k*adp
>
> hpi <- 10^11.616*H*pi
> hhpi <- 10^6.7*H*hpi
> hhhpi <- 10^1.962*H*hhpi
> mgpi <- 10^3.4*Mg*pi
> mghpi <- 10^1.946*Mg*hpi
> mghhpi <- 10^1.19*Mg*hhpi
> kpi <- 10^0.6*k*pi
> khpi <- 10^1.218*k*hpi
> khhpi <- 10^-0.2*k*hhpi
>
> hpcr <- 10^14.3*H*pcr
> hhpcr <- 10^4.5*H*hpcr
> hhhpcr <- 10^2.7*H*hhpcr
> pcr <- 100*H*hhhpcr
> mghpcr <- 10^1.6*Mg*hpcr
> kpcr <- 10^0.74*k*pcr
> khpcr <- 10^0.31*k*hpcr
> khhpcr <- 10^-0.13*k*hhpcr
>
> hcr <- 10^14.3*H*cr
> hhcr <- 10^2.512*H*hcr
>
> hlactate <- 10^3.66*H*lactate
> mglactate <- 10^0.93*Mg*lactate
>
> tatp <- atp + hatp + hhatp + hhhatp + mgatp + mghatp + mg2atp + katp
> tadp <- adp + hadp + hhadp + hhhadp + mghadp + mgadp + mg2adp + kadp
> tpi  <- pi + hpi + hhpi + hhhpi + mgpi + mghpi + mghhpi + kpi + khpi + 
> khhpi
> tpcr <- pcr + hpcr + hhpcr + hhhpcr + pcr + mghpcr + kpcr + khpcr 
> + khhpcr
> tcr  <- cr + hcr + hhcr
> tlactate <- lactate + hlactate + mglactate
> # tmg <- Mg + mgatp + mghatp + mg2atp + mgadp + mghadp + mg2adp + mgpi +
> #    kghpi + mghhpi + mghpcr + mglactate
> # tk <- k + katp + kadp + kpi + khpi + khhpi + kpcr + khpcr + khhpcr
>
>
> total = c(
>     tatp - 0.008,
>     tadp - 0.1,
>     tpi - 0.003,
>     tpcr - 0.042,
>     tcr - 0.004,
>     tlactate - 0.005)
> return(total);
> # })
> }
>
> # conditions
>
> x0 = c(
>     atp = 0.008,
>     adp = 0.1,
>     pi = 0.003,
>     pcr = 0.042,
>     cr = 0.004,
>     lactate = 0.005
> ) / 3;
> # tricky to get a positive value !!!
> x0[1] = 0.001; # still NOT positive;
>
> x = multiroot(solve.AcidSpecies, x0, H = 4E-8)
>
>
> On 1/23/2023 12:37 AM, Leonard Mada wrote:
>> Dear Troels,
>>
>> The system that you mentioned needs to be transformed first. The 
>> equations are standard acid-base equilibria-type equations in 
>> analytic chemistry.
>>
>> ATP + H <-> ATPH
>> ATPH + H <-> ATPH2
>> ATPH2 + H <-> ATPH3
>> [...]
>> The total amount of [ATP] is provided, while the concentration of the 
>> intermediates are unknown.
>>
>> Q.) It was unclear from your description:
>> Do you know the pH?
>> Or is the pH also unknown?
>>
>> I believe that the system is exactly solvable. The "multivariable" 
>> system/solution may be easier to write down: but is uglier to solve, 
>> as the "system" is under-determined. You can use optim in such cases, 
>> see eg. an example were I use it:
>> https://github.com/discoleo/R/blob/master/Stat/Polygons.Examples.R
>>
>>
>> a2 = optim(c(0.9, 0.5), 

Re: [R] R emulation of FindRoot in Mathematica

2023-01-22 Thread Leonard Mada via R-help
Dear Troels,

I send you an updated version of the response. I think that a hybrid 
approach is probably the best solution:
- Input / Starting values = free: ATP, ADP, Crea, CreaP, lactate, 
inorganic phosphate;
- Output: diff(Total, given total value);
- I assume that the pH is given;
- I did NOT check all individual eqs;

library(rootSolve)

solve.AcidSpecies = function(x, H, Mg=0.0006, K = 0.12) {
     # ... the eqs: ...;
     ATPTotal = KaATPH * ATP * H + KaATPH2 * KaATPH * ATP * H^2 +
     + KaATPH3 * KaATPH2 * KaATPH * ATP * H^3 + KdATPMg * ATP * Mg +
     + KdATPHMg * ATP * H * Mg + KdATPMg2 * ATP * Mg^2 + KdATPK * ATP * K;
     ### Output:
     total = c(
         ATPTotal - 0.008,
         ADPTotal - 1E-5,
         CreaTotal - 0.004,
         CreaPTotal - 0.042,
         PiTotal - 0.003,
         LactateTotal - 0.005);
     return(total);
}

KaATPH = 10^6.494; # ...
x0 = c(ATP = 0.008, ADP = 1E-5,
     Crea = 0.004, CreaP = 0.042, Pi = 0.003, Lactate = 0.005) / 2;
x = multiroot(solve.AcidSpecies, x0, H = 4E-8);


print(x)


I think that it is possible to use the eqs directly as provided 
initially (with some minor corrections). You only need to output the 
known totals (as a diff), see full code below.


Sincerely,


Leonard



library(rootSolve)

solve.AcidSpecies = function(x, H, Mg=0.0006, k = 0.12) {
     # with(list(x), { seems NOT to work with multiroot });
     atp = x[1]; adp = x[2]; pi = x[3];
     pcr = x[4]; cr = x[5]; lactate = x[6];

###
hatp <- 10^6.494*H*atp
hhatp <- 10^3.944*H*hatp
hhhatp <- 10^1.9*H*hhatp
atp  <- 10*H*hhhatp
mgatp <- 10^4.363*atp*Mg
mghatp <- 10^2.299*hatp*Mg
mg2atp <- 10^1-7*Mg*mgatp
katp <- 10^0.959*atp*k

hadp <- 10^6.349*adp*H
hhadp <- 10^3.819*hadp*H
hhhadp <- 10*H*hhadp
mgadp <- 10^3.294*Mg*adp
mghadp <- 10^1.61*Mg*hadp
mg2adp <- 10*Mg*mgadp
kadp <- 10^0.82*k*adp

hpi <- 10^11.616*H*pi
hhpi <- 10^6.7*H*hpi
hhhpi <- 10^1.962*H*hhpi
mgpi <- 10^3.4*Mg*pi
mghpi <- 10^1.946*Mg*hpi
mghhpi <- 10^1.19*Mg*hhpi
kpi <- 10^0.6*k*pi
khpi <- 10^1.218*k*hpi
khhpi <- 10^-0.2*k*hhpi

hpcr <- 10^14.3*H*pcr
hhpcr <- 10^4.5*H*hpcr
hhhpcr <- 10^2.7*H*hhpcr
pcr <- 100*H*hhhpcr
mghpcr <- 10^1.6*Mg*hpcr
kpcr <- 10^0.74*k*pcr
khpcr <- 10^0.31*k*hpcr
khhpcr <- 10^-0.13*k*hhpcr

hcr <- 10^14.3*H*cr
hhcr <- 10^2.512*H*hcr

hlactate <- 10^3.66*H*lactate
mglactate <- 10^0.93*Mg*lactate

tatp <- atp + hatp + hhatp + hhhatp + mgatp + mghatp + mg2atp + katp
tadp <- adp + hadp + hhadp + hhhadp + mghadp + mgadp + mg2adp + kadp
tpi  <- pi + hpi + hhpi + hhhpi + mgpi + mghpi + mghhpi + kpi + khpi + khhpi
tpcr <- pcr + hpcr + hhpcr + hhhpcr + pcr + mghpcr + kpcr + khpcr + 
khhpcr
tcr  <- cr + hcr + hhcr
tlactate <- lactate + hlactate + mglactate
# tmg <- Mg + mgatp + mghatp + mg2atp + mgadp + mghadp + mg2adp + mgpi +
#    kghpi + mghhpi + mghpcr + mglactate
# tk <- k + katp + kadp + kpi + khpi + khhpi + kpcr + khpcr + khhpcr


total = c(
     tatp - 0.008,
     tadp - 0.1,
     tpi - 0.003,
     tpcr - 0.042,
     tcr - 0.004,
     tlactate - 0.005)
return(total);
# })
}

# conditions

x0 = c(
     atp = 0.008,
     adp = 0.1,
     pi = 0.003,
     pcr = 0.042,
     cr = 0.004,
     lactate = 0.005
) / 3;
# tricky to get a positive value !!!
x0[1] = 0.001; # still NOT positive;

x = multiroot(solve.AcidSpecies, x0, H = 4E-8)


On 1/23/2023 12:37 AM, Leonard Mada wrote:
> Dear Troels,
>
> The system that you mentioned needs to be transformed first. The 
> equations are standard acid-base equilibria-type equations in analytic 
> chemistry.
>
> ATP + H <-> ATPH
> ATPH + H <-> ATPH2
> ATPH2 + H <-> ATPH3
> [...]
> The total amount of [ATP] is provided, while the concentration of the 
> intermediates are unknown.
>
> Q.) It was unclear from your description:
> Do you know the pH?
> Or is the pH also unknown?
>
> I believe that the system is exactly solvable. The "multivariable" 
> system/solution may be easier to write down: but is uglier to solve, 
> as the "system" is under-determined. You can use optim in such cases, 
> see eg. an example were I use it:
> https://github.com/discoleo/R/blob/master/Stat/Polygons.Examples.R
>
>
> a2 = optim(c(0.9, 0.5), polygonOptim, d=x)
> # where the function polygonOptim() computes the distance between the 
> starting-point & ending point of the polygon;
> # (the polygon is defined only by the side lengths and optim() tries 
> to compute the angles);
> # optimal distance = 0, when the polygon is closed;
> # Note: it is possible to use more than 2 starting values in the 
> example above (the version with optim) works quit well;
> # - but you need to "design" the function that is optimized for your 
> particular system, e.g.
> #   by returning: c((ATPTotal - value)^2, (ADPTotal - value)^2, ...);
>
>
> S.1.) Exact Solution
> ATP system: You can express all components as eqs of free ATP, [ATP], 
> and [H], [Mg], [K].
> ATPH = KaATPH * ATP * H;
> ATPH2 = KaATPH2 * ATPH * H
> = KaATPH2 * KaATPH * ATP * H^2;
> [...]
>
> Then you substitute these into 

Re: [R] return value of {....}

2023-01-12 Thread Leonard Mada via R-help

Dear Akshay,

The best response was given by Andrew. "{...}" is not a closure.

This is unusual for someone used to C-type languages. But I will try to 
explain some of the rationale.


In the case that "{...}" was a closure, then external variables would 
need to be explicitly declared before the closure (in order to reuse 
those values):

intermediate = c()
{
    intermediate = ...;
    result = someFUN(intermediate);
}

1.) Interactive Sessions
This is cumbersome in interactive sessions. For example: you often 
compute the mean or the variance as intermediary results, and will need 
them later on as well. They could have been computed outside the 
"closure", but writing code in interactive sessions may not always be 
straightforward.


2.) Missing arguments
f = function(x, y) {
    if(missing(y)) {
        # assuming x = matrix
        y = x[,2]; x = x[,1];
    }
}
It would be much more cumbersome to define/use a temporary tempY.

I hope this gives a better perspective why this is indeed a useful 
feature - even if it is counterintuitive.
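A minimal illustration of both points (the block returns the value of its last expression, and the intermediates remain accessible afterwards):

x = { tmp = rnorm(10); mean(tmp) }
x      # the value of the last expression inside { }
tmp    # still available: { } did not create a new scope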


Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with integrate(function(x) x^3 / sin(x), -pi/2, pi/2)

2023-01-07 Thread Leonard Mada via R-help

Dear List-Members,

I encounter a problem while trying to integrate the following function:

integrate(function(x) x^3 / sin(x), -pi/2, pi/2)
# Error in integrate(function(x) x^3/sin(x), -pi/2, pi/2) :
#  non-finite function value

# the value should be finite:
curve(x^3 / sin(x), -pi/2, pi/2)
integrate(function(x) x^3 / sin(x), -pi/2, 0)
integrate(function(x) x^3 / sin(x), 0, pi/2)
# but this does NOT work:
integrate(function(x) x^3 / sin(x), -pi/2, pi/2, subdivisions=4096)
integrate(function(x) x^3 / sin(x), -pi/2, pi/2, subdivisions=4097)
# works:
integrate(function(x) x^3 / sin(x), -pi/2, pi/2 + 1E-10)


# Note: works directly with other packages

pracma::integral(function(x) x^3 / sin(x), -pi/2, pi/2 )
# 3.385985
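The likely cause - an assumption on my side, I have not checked the sources - is that the quadrature rule evaluates the integrand at the midpoint x = 0, where x^3 / sin(x) gives NaN (0/0), although the limit is 0. Guarding that single point is already sufficient:

f = function(x) ifelse(x == 0, 0, x^3 / sin(x));
integrate(f, -pi/2, pi/2)
# ~ 3.385985, in agreement with pracma::integral() above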


I hope that integrate() gets improved in base R as well.


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extract from a list of lists

2022-12-28 Thread Leonard Mada via R-help

Dear Terry,

The following approach may be more suitable:

fits <- lapply(argument, function)
fits.df = do.call(rbind, fits);

It works if all the lists returned by "function" have the same number of 
elements.

Example:
fits = lapply(seq(3), function(id) {
    list(
        beta = rnorm(1), loglik = id^2,
        iter = sample(seq(100, 200), 1), id = id);
})
fits.df = do.call(rbind, fits);
fits.df

I have added an id in case the function returns a variable number of "rows".

Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Hidden Problems with Clustering Algorithms

2022-11-21 Thread Leonard Mada via R-help

Dear R-Users,

Hidden Problems with Clustering Algorithms

I stumbled recently upon a presentation about hierarchical clustering. 
Unfortunately, it contains a hidden problem of clustering algorithms. 
The problem is deeper and I think that it warrants a closer inspection 
by the statistical community.


The presentation is available online. Both the scaled & non-scaled 
versions show the problem.


de.NBI course - Advanced analysis of quantitative proteomics data using 
R: 03b Clustering Part2

[Note: it's more like introductory notes to basic statistics]
https://www.youtube.com/watch?v=7e1uW_BhljA
times:
- at 6:15 - 6:28 & 6:29 - 7:10 [2 versions, both non-scaled]
- at 5:51 - 6:10 [the scaled version]
- same problem at 7:56;

PROBLEM

Non-Scaled Version: (e.g. the one at 6:15)
- the upper 2 rows are split into various sub-clusters;
- the top tree: a cluster is formed by the right-right sub-tree (some 17 
"genes" or similar "activities" / "expressions");
- the left-most 2 "genes" are actually over-expressed "genes" and 
functionally really belong to the previous/right sub-cluster;


Scaled-Version: (at 5:52)
- the left-most 2 "genes" are over-expressed at the same time with the 
right cluster, and not otherwise;


Unfortunately, the 2 over-expressed genes (outliers or extreme values) are 
split off from the relevant cluster and inserted as a separate 
main-branch in the top dendrogram. Switching only the main left & right 
branches in the top tree would only mask this problem. The 2 
pseudo-outliers are really the (probably) upper values in the larger 
cluster of over-expressed "genes" (all the dark genes should belong to 
the same cluster).


The middle sub-cluster shows really NO activity (some 16 "genes"). The 
main branches in the top tree should really split between this 
*NO*-activity cluster and the cluster showing activity (including the 2 
massively over-expressed genes). The problem is present in the scaled 
version as well.


The hierarchical clustering algorithm fails. I have not analysed the 
data, but some problems may contribute to this:
- "gene expression" or "activity" may not be linear, but exponential or 
follow some power rule: a logarithmic transformation (or some other 
transformation) may have been useful;

- simple distances between clusters may be too inaccurate;
- the variance in the low-activity (middle) cluster may be very low 
(almost 0!), while the variance in the high-activity cluster may be much 
higher: the Mahalanobis distance or joining the sub-clusters based on 
some z/t-test taking into account the different variances may be more 
robust;
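A purely synthetic sketch of the first point (this is NOT the data from the presentation): 16 "no activity" genes, 17 "active" genes and 2 strongly over-expressed genes; after a log-transform the 2 extreme genes re-join the active cluster, and the main split becomes activity vs no activity:

set.seed(1)
m = rbind(
    matrix(rnorm(16*10, mean =   1, sd = 0.1), nrow = 16),  # no activity
    matrix(rnorm(17*10, mean =  20, sd = 2  ), nrow = 17),  # active
    matrix(rnorm( 2*10, mean = 100, sd = 10 ), nrow =  2)); # over-expressed
grp.raw = cutree(hclust(dist(m)),      k = 2);
grp.log = cutree(hclust(dist(log(m))), k = 2);
table(raw = grp.raw, log = grp.log)
# raw: the 2 extreme genes form their own top branch;
# log: the top split separates the "no activity" cluster from the active genes;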


These questions should be addressed by more senior statisticians.

I hope that the presentation remains on-line as is, as the clustering 
problem is really easy to see and to analyse. It is impossible to detect 
and visualise such anomalies in a heatmap with 1,000 gene-expressions or 
with 10,000 genes, or with 500-1000 samples. It is very obvious on this 
small heatmap.


I do not know if there are any robust tools to validate the generated 
trees. Inspecting by "eye" a dendrogram with > 1,000 genes and hundreds 
of samples is really futile.


Sincerely,

Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Leonard Mada via R-help

Dear Andrew,


Thank you for the fast reply. I forgot about strwrap. Though my problem 
is slightly different.



I do have the actual vector. Of course, I could first join the strings - 
but this then involves both a join and a split (done by strwrap). Maybe 
it's possible to avoid the join and the split. My 2nd approach may also 
be fine, but I have not tested it thoroughly (and I may miss an 
existing solution).



Sincerely,


Leonard


On 10/29/2022 12:51 AM, Andrew Simmons wrote:

I would suggest using strwrap(), the documentation at ?strwrap has
plenty of details and examples.
For paragraphs, I would usually do something like:

strwrap(x = , width = 80, indent = 4)

On Fri, Oct 28, 2022 at 5:42 PM Leonard Mada via R-help
 wrote:

Dear R-Users,

text = "
What is the best way to split/cut a vector of strings into lines of
preferred width?
I have come up with a simple solution, albeit naive, as it involves many
arithmetic divisions.
I have an alternative idea which avoids this problem.
But I may miss some existing functionality!"

# Long vector of strings:
str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
lenWords = nchar(str);

# simple, but naive solution:
# - it involves many divisions;
cut.character.int = function(n, w) {
  ncm = cumsum(n);
  nwd = ncm %/% w;
  count = rle(nwd)$lengths;
  pos = cumsum(count);
  posS = pos[ - length(pos)] + 1;
  posS = c(1, posS);
  pos = rbind(posS, pos);
  return(pos);
}

npos = cut.character.int(lenWords, w=30);
# lets print the results;
for(id in seq(ncol(npos))) {
 len = npos[2, id] - npos[1, id];
 cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n"));
}


The first solution performs an arithmetic division on all string
lengths. It is possible to find out the total length and divide only the
last element of the cumsum. Something like this should work (although it
is not properly tested).


w = 30;
cumlen = cumsum(lenWords);
max = tail(cumlen, 1) %/% w + 1;
pos = cut(cumlen, seq(0, max) * w);
count = rle(as.numeric(pos))$lengths;
# everything else is the same;
pos = cumsum(count);
posS = pos[ - length(pos)] + 1;
posS = c(1, posS);
pos = rbind(posS, pos);

npos = pos; # then print


The cut() may be optimized as well, as the cumsum is sorted ascending. I
did not evaluate the efficiency of the code either.

But do I miss some existing functionality?


Note:

- technically, the cut() function should probably return a vector of
indices (something like: rep(seq_along(count), count)), but it was more
practical to have both the start and end positions.


Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Leonard Mada via R-help

Dear R-Users,

text = "
What is the best way to split/cut a vector of strings into lines of 
preferred width?
I have come up with a simple solution, albeit naive, as it involves many 
arithmetic divisions.

I have an alternative idea which avoids this problem.
But I may miss some existing functionality!"

# Long vector of strings:
str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
lenWords = nchar(str);

# simple, but naive solution:
# - it involves many divisions;
cut.character.int = function(n, w) {
    ncm = cumsum(n);
    nwd = ncm %/% w;
    count = rle(nwd)$lengths;
    pos = cumsum(count);
    posS = pos[ - length(pos)] + 1;
    posS = c(1, posS);
    pos = rbind(posS, pos);
    return(pos);
}

npos = cut.character.int(lenWords, w=30);
# lets print the results;
for(id in seq(ncol(npos))) {
   len = npos[2, id] - npos[1, id];
   cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n"));
}


The first solution performs an arithmetic division on all string 
lengths. It is possible to find out the total length and divide only the 
last element of the cumsum. Something like this should work (although it 
is not properly tested).



w = 30;
cumlen = cumsum(lenWords);
max = tail(cumlen, 1) %/% w + 1;
pos = cut(cumlen, seq(0, max) * w);
count = rle(as.numeric(pos))$lengths;
# everything else is the same;
pos = cumsum(count);
posS = pos[ - length(pos)] + 1;
posS = c(1, posS);
pos = rbind(posS, pos);

npos = pos; # then print


The cut() may be optimized as well, as the cumsum is sorted ascending. I 
did not evaluate the efficiency of the code either.
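For example, a possible sketch of that optimization (assuming R >= 3.3.0 for the left.open argument of findInterval; for values strictly inside the breaks it gives the same grouping as the cut() call above):

grp = findInterval(cumlen, seq(0, max) * w, left.open = TRUE);
count = rle(grp)$lengths;  # same as rle(as.numeric(pos))$lengths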


But do I miss some existing functionality?


Note:

- technically, the cut() function should probably return a vector of 
indices (something like: rep(seq_along(count), count)), but it was more 
practical to have both the start and end positions.



Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [External Help] Multivariate Polynomials in R

2022-10-02 Thread Leonard Mada via R-help

Dear R Users,

I have written some R code for multivariate polynomials in R. I am 
looking forward for some help in redesigning and improving the code.


Although this code was not planned initially to be released as a 
package, the functionality has become quite versatile over time. I will 
provide some examples below. If anyone is interested in multivariate 
polynomials and has some spare time, or has some students interested to 
learn some interesting math, feel free to contact me.


The immediate focus should be on:
1) Writing/improving the automatic tests;
2) Redesigning the code (and build an R package);

As the code has grown in size, I am very cautious to change anything, 
until proper tests are written. I have started to write some test code 
(the link to the GitHub page is below), but I am not yet very confident 
how to properly write the tests and also lack the time as well. I will 
appreciate any expertise and help on this topic.
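For orientation, a first automated test could look like the sketch below (testthat is only an assumption for the framework; that eval.pm() returns a plain numeric value is inferred from the examples at the end of this mail):

library(testthat)
test_that("toPoly.pm / eval.pm round-trip", {
    p = toPoly.pm("x^2*y + b*z - R");
    expect_equal(eval.pm(p, list(x=1, y=2, z=3, b=4, R=5)), 1^2*2 + 4*3 - 5);
})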


Ultimately, I hope to be able to focus more on the math topics. I will 
post a separate call for some of these topics.


CODE DETAILS

The source files are on GitHub:
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.R
- (all) files named Polynomials.Helper.XXX.R are needed; (~ 25 files, 
including the test files);

- if requested, I can also upload a zip file with all these source files;
- the code started as some helper scripts (which is why all those files 
are mixed with other files);


The multivariate polynomials are stored as data.frames and R's 
aggregate() function is the workhorse: but it proved sufficiently fast 
and the code works well even with polynomials with > 10,000 monomials. I 
have some older Java code which used a TreeMap (sorted map), but I do 
not maintain that code anymore. I was very reserved initially regarding 
the efficiency of the data frame; but it worked well! And it proved very 
useful for sub-setting specific monomials!


I have attached some concrete examples below.

Sincerely,

Leonard


### Examples

source("Polynomials.Helper.R")
# - requires also the other Helper scripts;
# - not strictly needed (but are loaded automatically):
#   library(polynom)
#   library(pracma)

### Example 1:
n = 2; # Power "n" will be evaluated automatically
p1 = toPoly.pm("x^n*y + b*z - R")
p2 = toPoly.pm("y^n*z + b*x - R")
p3 = toPoly.pm("z^n*x + b*y - R")

pR = solve.lpm(p1, p2, p3, xn=c("z", "y"))
str(pR) # 124 monomials
# tweaking manually can improve the results;
pR = solve.lpm(p1, p2, p3, xn=c("y", "z"))
str(pR)
# pR[[2]]$Rez: 19 monomials: much better;

pR2 = div.pm(pR[[2]]$Rez, "x^3 + b*x - R", "x")
# Divisible!
str(pR2)
# Order 12 polynomial in x (24 monomials);

### Note:
# - the P[12] contains the distinct roots:
#   it is the minimal order polynomial;
# - the trivial solution (x^3 + b*x = R) was removed;
# - this is the naive way to solve this system (but good as Demo);

# print the coefficients of x;
# (will be used inside the function coeff.S3Ht below)
pR2 = pR2$Rez;
pR2$coeff = - pR2$coeff; # positive leading coeff;
toCoeff(pR2, "x")

### Quick Check
solve.S3Ht = function(R, b) {
    coeff = coeff.S3Ht(R, b);
    x = roots(coeff); # library(pracma)
    # Note: pracma uses leading to free coeff;
    z = b*x^11 - R*x^10 - 2*R^2*b*x^5 + 2*R^2*b^2*x^3 + R*b^4*x^2 - R*b^5;
    z = z / (- R^2*x^6 - R*b^2*x^5 + 3*R*b^3*x^3 - b^6);
    y = (R - z^2*x) / b;
    sol = cbind(x, y, z);
    return(sol);
}
coeff.S3Ht = function(R, b) {
    coeff = c(b^2, - 2*R*b, R^2 - b^3, 3*R*b^2,
    - 3*R^2*b + b^4, R^3 - 4*R*b^3,
    2*R^2*b^2 - b^5, 5*R*b^4,
    R^4 - R^2*b^3 + b^6, - 3*R^3*b^2 - 3*R*b^5,
    - R^4*b + 3*R^2*b^4 - b^7,
    2*R^3*b^3 - R*b^6,
    - R^2*b^5 + b^8);
    return(coeff);
}

R = 5; b = -2;
sol = solve.S3Ht(R, b)
# all 12 sets of solutions:
x = sol[,1]; y = sol[,2]; z = sol[,3];

### Test:
x^2*y + b*z
y^2*z + b*x
z^2*x + b*y

id = 1;
eval.pm(p1, list(x=x[id], y=y[id], z=z[id], b=b, R=R))


##

### Example 2:

n = 5
p1 = toPoly.pm("(x + a)^n + (y + a)^n - R1")
p2 = toPoly.pm("(x + b)*(y + b) - R2")

# Very Naive way to solve:
pR = solve.pm(p1, p2, "y")
str(pR)
table(pR$Rez$x)
# Order 10 with 109 monomials;
# [very naive!]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regexpr: R takes very long with non-existent pattern

2022-05-18 Thread Leonard Mada via R-help
Dear Bert,


The variable patt does not exist in the R environment.


I was pasting the code for an R function into the R console and had a 
syntax error on one line. The following lines were then executed as 
plain R code. The variable patt was not previously defined.


Though x was a different object and the long execution time may 
originate there.

x = original xml with the Pubmed abstracts


Sincerely,


Leonard



On 5/19/2022 3:31 AM, Bert Gunter wrote:
> Doubt that I can help, but what does "not defined" mean? -- NA, "", " 
> " ? Something else?
> I would guess that if it's NA, you should get an immediate error.
> If it's "" , that's a legitimate pattern and would result in matches 
> of 0 length for everything, which might trigger an error in other 
> parts of your code.
> All a guess, though.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along 
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Wed, May 18, 2022 at 5:08 PM Leonard Mada via R-help 
>  wrote:
>
> Dear R Users,
>
>
> I have run the following command in R:
>
> # x = larger vector of strings (1200 Pubmed abstracts);
> # patt = not defined;
> npos = regexpr(patt, x, perl=TRUE);
> # Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found
>
>
> The problem:
>
> R becomes unresponsive and it takes 1-2 minutes to return the
> error. The
> operation completes almost instantaneously with a valid pattern.
>
> Is there a reason for this behavior?
>
> Tested with R 4.2.0 on MS Windows 10.
>
>
> I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone
> wants to check:
>
> - see file: Example_Abstracts_Title_Pubmed.csv;
>
> https://github.com/discoleo/R/tree/master/TextMining/Pubmed
>
> The variable patt was not defined due to an error: but it took
> very long
> to exit the operation and report the error.
>
>
> Many thanks,
>
>
> Leonard
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regexpr: R takes very long with non-existent pattern

2022-05-18 Thread Leonard Mada via R-help

Dear Andrew,


I screwed it a little bit up. The object was not a string vector, but an 
xml object (the original xml with the abstracts).


str(x)
List of 2
 $ node:
 $ doc :
 - attr(*, "class")= chr [1:2] "xml_document" "xml_node"


I pasted the R code for a function but had an error, which stopped the 
parsing of the function. But the next lines were still executed:


npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found


Variable x was actually the xml object - my mistake. It still takes 1-2 
minutes to generate the final error.


Is regexpr trying to parse the xml with as.character first (I have not 
checked this)?


It makes more sense to first parse the regex expression.


Sincerely,


Leonard

On 5/19/2022 3:26 AM, Andrew Simmons wrote:

Hello,


I tried this myself, something like:


dat <- utils::read.csv(
    "https://raw.githubusercontent.com/discoleo/R/master/TextMining/Pubmed/Example_Abstracts_Title_Pubmed.csv",
    check.names = FALSE
)


regexpr(patt, dat$Abstract, perl = TRUE)
regexpr(patt, dat$Title, perl = TRUE)


and I can't reproduce your issue. Mine seems to raise the error within
a second or less that object 'patt' does not exist. I'm using R 4.2.0
and Windows 11, though that shouldn't be making a difference: if you
look at Sys.info(), it's still Windows 10 with a build version of
22000. Don't really know what else to say, have you tried it again
since?


Regards,
 Andrew Simmons

On Wed, May 18, 2022 at 5:09 PM Leonard Mada via R-help
 wrote:

Dear R Users,


I have run the following command in R:

# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found


The problem:

R becomes unresponsive and it takes 1-2 minutes to return the error. The
operation completes almost instantaneously with a valid pattern.

Is there a reason for this behavior?

Tested with R 4.2.0 on MS Windows 10.


I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone
wants to check:

- see file: Example_Abstracts_Title_Pubmed.csv;

https://github.com/discoleo/R/tree/master/TextMining/Pubmed

The variable patt was not defined due to an error: but it took very long
to exit the operation and report the error.


Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] regexpr: R takes very long with non-existent pattern

2022-05-18 Thread Leonard Mada via R-help

Dear R Users,


I have run the following command in R:

# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found


The problem:

R becomes unresponsive and it takes 1-2 minutes to return the error. The 
operation completes almost instantaneously with a valid pattern.


Is there a reason for this behavior?

Tested with R 4.2.0 on MS Windows 10.


I have uploaded a set with 1200 Pubmed abstracts on Github, if anyone 
wants to check:


- see file: Example_Abstracts_Title_Pubmed.csv;

https://github.com/discoleo/R/tree/master/TextMining/Pubmed

The variable patt was not defined due to an error: but it took very long 
to exit the operation and report the error.



Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to convert category (or range/group) into continuous?

2022-01-18 Thread Leonard Mada via R-help

Dear Marna,


If you want to extract the middle of those intervals, please find below 
an improved variant of Luigi's code.



Note:
- it is more efficient to process the levels of a factor, instead of all 
the individual strings;
- I envision that there are benefits in a large data frame (> 1 million 
rows) - although I have not explicitly checked it;

- the code also handles better the open/closed intervals;
- the returned data structure may require some tweaking (currently 
returns a data.frame);




### Middle of an Interval
mid.factor = function(x, inf.to = NULL, split.str=",") {
    lvl0 = levels(x); lvl = lvl0;
    lvl = sub("^[(\\[]", "", lvl);
    lvl = sub("[])]$", "", lvl); # tricky;
    lvl = strsplit(lvl, split.str);
    lvl = lapply(lvl, function(x) as.numeric(x));
    if( ! is.null(inf.to)) {
        FUN = function(x) {
            if(any(x == Inf)) 1
            else if(any(x == - Inf)) -1
            else 0;
        }
        whatInf = sapply(lvl, FUN);
        # TODO: more advanced;
        lvl[whatInf == -1] = inf.to[1];
        lvl[whatInf ==  1] = inf.to[2];
    }
    mid = sapply(lvl, mean);
    lvl = data.frame(lvl=lvl0, mid=mid);
    merge(data.frame(lvl=x), lvl, by="lvl");
}


# uses the daT data frame;
# requires a factor:
# - this is probably the case with the original data;
daT$group = as.factor(daT$group);
mid.factor(daT$group);
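
A small, self-contained check (daT itself is not reproduced here, so the intervals below are only an arbitrary example built with cut()):

x = cut(c(1, 5, 12, 20), breaks = c(0, 10, 30));
mid.factor(x);
# one row per observation: the interval level and its midpoint (5 or 20);

# open-ended intervals: replace (10, Inf] by the finite interval c(10, 90);
mid.factor(cut(c(1, 20, 50), breaks = c(0, 10, Inf)), inf.to = list(NULL, c(10, 90)));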


I have uploaded this code also on my GitHub list of useful data tools:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Method Guidance

2022-01-14 Thread Leonard Mada via R-help

Dear Jeff,


I am sending an updated version of the code.


The initial version assumed that the time points correspond to an 
integer sequence. The code would fail for arbitrary times.



The new code is robust. I still assume that the data is in column-format 
and that you want the time to the previous "A"-event, even if there are 
other non-A events in between.



The code is similar, but we cannot use seq(0, x-1) anymore. Instead, we 
will repeat the time point of the previous A-event. (Last-A Carried forward)



# jrdf = the data frame from my previous mail;
cumEvent = cumsum(jrdf$Event_A);
# we cannot use the actual values of the cumsum,
# but will use the number of (same) values to the previous event;
freqEvent = rle(cumEvent);
freqEvent = freqEvent$lengths;

# repeat the time-points
timesA = jrdf$Time[jrdf$Event_A == 1];
sameTime = rep(timesA, freqEvent);
timeToA = jrdf$Time - sameTime;

### Step 2:
# extract/view the times (as before);
timeToA[jrdf$Event_B >= 1];
# Every Time to A: e.g. for multiple extractions;
cbind(jrdf, timeToA);
# Time to A only for B: set non-B to 0;


# Note:
- the rle() function might be less known;
- it is "equivalent" to:
tbl = table(cumEvent);
# to be on the safe side (as the cumsum is increasing):
id = order(as.numeric(names(tbl)));
tbl = tbl[id];


Hope this helps,


Leonard


On 1/14/2022 3:30 AM, Leonard Mada wrote:

Dear Jeff,


My answer is a little bit late, but I hope it helps.


jrdf = read.table(text="Time   Event_A    Event_B   Lag_B
1  1 1    0
2  0 1    1
3  0 0    0
4  1 0    0
5  0 1    1
6  0 0    0
7  0 1    3
8  1 1    0
9  0 0    0
10 0 1    2",
header=TRUE, stringsAsFactors=FALSE)

Assuming that:
- Time, Event_A, Event_B are given;
- Lag_B needs to be computed;

Step 1:
- compute time to previous Event A;

tmp = jrdf[, c(1,2)];
# add an extra event so last rows are not lost:
tmp = rbind(tmp, c(nrow(tmp) + 1, 1));

timeBetweenA = diff(tmp$Time[tmp$Event_A > 0]);
timeToA = unlist(sapply(timeBetweenA, function(x) seq(0, x-1)))


### Step 2:
# - extract the times;
timeToA[jrdf$Event_B >= 1];
cbind(jrdf, timeToA);


Sincerely,


Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Method Guidance

2022-01-13 Thread Leonard Mada via R-help

Dear Jeff,


My answer is a little bit late, but I hope it helps.


jrdf = read.table(text="Time   Event_AEvent_B   Lag_B
1  1 10
2  0 11
3  0 00
4  1 00
5  0 11
6  0 00
7  0 13
8  1 10
9  0 00
10 0 12",
header=TRUE, stringsAsFactors=FALSE)

Assuming that:
- Time, Event_A, Event_B are given;
- Lag_B needs to be computed;

Step 1:
- compute time to previous Event A;

tmp = jrdf[, c(1,2)];
# add an extra event so last rows are not lost:
tmp = rbind(tmp, c(nrow(tmp) + 1, 1));

timeBetweenA = diff(tmp$Time[tmp$Event_A > 0]);
timeToA = unlist(sapply(timeBetweenA, function(x) seq(0, x-1)))


### Step 2:
# - extract the times;
timeToA[jrdf$Event_B >= 1];
cbind(jrdf, timeToA);


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum every n (4) observations by group

2021-12-20 Thread Leonard Mada via R-help

Dear Miluji,


something like this could help:

sapply(tapply(x$Value, x$ID, cumsum),
    function(x) x[seq(4, length(x), by=4)] - c(0, x[head(seq(4, length(x), by=4), -1)]))


1.) Step 1:

Compute the cumsum for each ID:

tapply(x$Value, x$ID, cumsum)


2.) Step 2:

- iterate over the resulting list and select every 4th value;

- you can either run a diff on these values or directly subtract the previous
(n-4) cumulative sum;


Note:

- you may wish to check that the number of values per group is a multiple of 4;

- alternative: you can do a LOCF (last observation carried forward);
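
Putting both steps together on a self-contained toy set (using the same ID / Value columns as above):

x = data.frame(ID = rep(c("A", "B"), each = 8), Value = 1:16);
sapply(tapply(x$Value, x$ID, cumsum),
    function(v) v[seq(4, length(v), by=4)] - c(0, v[head(seq(4, length(v), by=4), -1)]))
# should give a 2 x 2 matrix of the 4-observation sums:
# column A = 10, 26;  column B = 42, 58;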


Hope this code example helps.


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to create a proper S4 class?

2021-11-17 Thread Leonard Mada via R-help

Dear Martin,


thank you very much for the guidance.


Ultimately, I got it running. But, for mysterious reasons, it was 
challenging:


- I skipped for now the inheritance (and used 2 explicit non-inherited 
slots): this is still unresolved; [*]


- the code is definitely cleaner;


[*] Mysterious errors, like:

"Error in cbind(deparse.level, ...) :
  cbind for agentMatrix is only defined for 2 agentMatrices"


One last question pops up:

If B inherits from A, how can I down-cast back to A?

b = new("B", someA);

??? as.A(b) ???

Is there a direct method?

I could not explore this yet, as I am still struggling with the inheritance.
The information would be useful, though: it helps in deciding the design
of the data structures. [Actually, all base methods should work natively
as well - but it is good to have an explicit fallback in any case.]
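
Note on the down-cast: as far as I can tell, the standard S4 coercion as(object, "ParentClass") does exactly this. A minimal sketch with dummy classes (untested with agentMatrix):

setClass("A", slots = c(x = "numeric"));
setClass("B", contains = "A", slots = c(y = "numeric"));
b = new("B", x = c(1, 2, 3), y = c(4, 5, 6));
a = as(b, "A");   # down-cast: keeps only the slots defined in A
class(a);         # "A"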



Sincerely,


Leonard


On 11/17/2021 5:48 PM, Martin Morgan wrote:

Hi Leonard --

Remember that a class can have 'has a' and 'is a' relationships. For instance, 
a People class might HAVE slots name and age

.People <- setClass(
 "People",
 slots = c(name = "character", age = "numeric")
)

while an Employees class might be described as an 'is a' relationship -- all 
employeeds are people -- while also having slots like years_of_employment and 
job_title

.Employees <- setClass(
 "Employees",
 contains = "People",
 slots = c(years_of_employment = "numeric", job_title = "character")
)

I've used .People and .Employees to capture the return value of setClass(), and 
these can be used as constructors

people <- .People(
name = c("Simon", "Andre"),
age = c(30, 60)
)

employees = .Employees(
 people, # unnamed arguments are class(es) contained in 'Employees'
 years_of_employment = c(3, 30),
 job_title = c("hard worker", "manager")
)

I would not suggest using attributes in addition to slots. Rather, embrace the 
paradigm and represent attributes as additional slots. In practice it is often 
helpful to write a constructor function that might transform from formats 
useful for users to formats useful for programming, and that can be easily 
documented.

Employees <-
 function(name, age, years_of_employment, job_title)
{
 ## implement sanity checks here, or in validity methods
 people <- .People(name = name, age = age)
 .Employees(people, years_of_employment = years_of_employment, job_title = 
job_title)
}

plot() and lines() are both S3 generics, and the rules for S3 generics using S4 
objects are described in the help page ?Methods_for_S3. Likely you will want to 
implement a show() method; show() is an S4 method, so see ?Methods_Details. 
Typically this should use accessors rather than relying on direct slot access, 
e.g.,

person_names <- function(x) x@name
employee_names <- person_names

The next method implemented is often the [ (single bracket subset) function; 
this is relatively complicated to get right, but worth exploring.

I hope that gets you a little further along the road.

Martin Morgan

On 11/16/21, 11:34 PM, "R-help on behalf of Leonard Mada via R-help" 
 wrote:

 Dear List-Members,


 I want to create an S4 class with 2 data slots, as well as a plot and a
 line method.


 Unfortunately I lack any experience with S4 classes. I have put together
 some working code - but I presume that it is not the best way to do it.
 The actual code is also available on Github (see below).


 1.) S4 class
 - should contain 2 data slots:
 Slot 1: the agents:
   = agentMatrix class (defined externally, NetlogoR S4 class);
 Slot 2: the path traveled by the agents:
= a data frame: (x, y, id);
   - my current code: defines only the agents ("t");
 setClass("agentsWithPath", contains = c(t="agentMatrix"));

 1.b.) Attribute with colors specific for each agent
 - should be probably an attribute attached to the agentMatrix and not a
 proper data slot;
 Note:
 - it is currently an attribute on the path data.frame, but I will
 probably change this once I get the S4 class properly implemented;
 - the agentMatrix does NOT store the colors (which are stored in another
 class - but it is useful to have this information available with the
 agents);

 2.) plot & line methods for this class
 plot.agentsWithPath;
 lines.agentsWithPath;


 I somehow got stuck with the S4 class definition. Though it may be a
 good opportunity to learn about S4 classes (and it is probably better
 suited as an S4 class than polynomials).


 The GitHub code draws the agents, but was somehow hacked together. For
 anyone interested:

 https://github.com/discoleo/R/blob/master/Stat/ABM.Models.Particles.R


 Many thanks,


 Leonard

 __

[R] How to create a proper S4 class?

2021-11-16 Thread Leonard Mada via R-help

Dear List-Members,


I want to create an S4 class with 2 data slots, as well as a plot and a 
line method.



Unfortunately I lack any experience with S4 classes. I have put together 
some working code - but I presume that it is not the best way to do it. 
The actual code is also available on Github (see below).



1.) S4 class
- should contain 2 data slots:
Slot 1: the agents:
 = agentMatrix class (defined externally, NetlogoR S4 class);
Slot 2: the path traveled by the agents:
  = a data frame: (x, y, id);
 - my current code: defines only the agents ("t");
setClass("agentsWithPath", contains = c(t="agentMatrix"));

1.b.) Attribute with colors specific for each agent
- should be probably an attribute attached to the agentMatrix and not a 
proper data slot;

Note:
- it is currently an attribute on the path data.frame, but I will 
probably change this once I get the S4 class properly implemented;
- the agentMatrix does NOT store the colors (which are stored in another 
class - but it is useful to have this information available with the 
agents);


2.) plot & line methods for this class
plot.agentsWithPath;
lines.agentsWithPath;


I somehow got stuck with the S4 class definition. Though it may be a 
good opportunity to learn about S4 classes (and it is probably better 
suited as an S4 class than polynomials).



The GitHub code draws the agents, but was somehow hacked together. For 
anyone interested:


https://github.com/discoleo/R/blob/master/Stat/ABM.Models.Particles.R


Many thanks,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Dispatching on 2 arguments?

2021-11-06 Thread Leonard Mada via R-help

Dear List-members,


I would like to experiment with dispatching on 2 arguments and have a 
few questions.



p1 = data.frame(x=1:3, coeff=1)
class(p1) = c("pm", class(p1));

I want to replace a variable in a polynomial with either:
another polynomial, another variable (given as a character string), or a
specific numeric value.



1.) Can I dispatch on 2 arguments?


replace.pm.? = function(p1, p2, ...) {...}
or classic:
replace.pm = function(p1, p2, ...) {
    if(is.numeric(p2) || is.complex(p2)) return(replace.pm.numeric(p1, 
p2, ...));

    if(is.character(p2)) return(replace.pm.character(p1, p2=p2, ...));
   else ...

}
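
For reference, S4 generics can dispatch on both arguments. A minimal, untested sketch (setOldClass() registers the S3 class "pm" for S4 dispatch; the generic is called replacePm here only to avoid clashing with the S3 replace.pm):

setOldClass(c("pm", "data.frame"));
setGeneric("replacePm", function(p1, p2, ...) standardGeneric("replacePm"));
setMethod("replacePm", signature("pm", "numeric"),
    function(p1, p2, ...) replace.pm.numeric(p1, p2, ...));
setMethod("replacePm", signature("pm", "character"),
    function(p1, p2, ...) replace.pm.character(p1, p2=p2, ...));
# Note: S4 "numeric" does not cover complex values; those would need their own method.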


I will provide some realistic examples below.


2.) Advantages / Disadvantages of each method
What are the advantages or disadvantages to each of these methods?
I do not yet understand what should be the best design.


Real example:


### Quintic
p1 = toPoly.pm("x^5 - 5*K*x^3 - 5*(K^2 + K)*x^2 - 5*K^3*x - K^4 - 6*K^3 
+ 5*K^2 - K")

# fractional powers: [works as well]
r = toPoly.pm("K^(4/5) + K^(3/5) + K^(1/5)")
# - we just found a root of a non-trivial quintic!
#   all variables/monomials got cancelled;
replace.pm(p1, r, "x", pow=1)

# more formal
r = toPoly.pm("k^4*m^4 + k^3*m^3 + k*m")
# m^5 = 1; # m = any of the 5 roots of unity of order 5;
pR = p1;
pR = replace.pm(pR, r, xn="x") # poly
pR = replace.pm(pR, "K", xn="k", pow=5) # character
pR = replace.pm(pR, 1, xn="m", pow=5) # value
pR # the roots worked! [no remaining rows]
# - we just found ALL 5 roots!


The code is on Github (see below).


Sincerely,


Leonard

=

# very experimental code
# some names & arguments may change;

source("Polynomials.Helper.R")
# also required, but are loaded automatically if present in wd;
# source("Polynomials.Helper.Parser.R")
# source("Polynomials.Helper.Format.R")
### not necessary for this Test (just loaded)
# source("Polynomials.Helper.D.R")
# source("Polynomials.Helper.Factorize.R")
# the libraries pracma & polynom are not really required for this test 
either;


### Github:
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Parser.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Format.R
# not necessary for this Test
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.D.R
https://github.com/discoleo/R/blob/master/Math/Polynomials.Helper.Factorize.R

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] names.data.frame?

2021-11-03 Thread Leonard Mada via R-help
Thank you very much.


Indeed, NextMethod() is the correct way and is working fine.


There are some alternatives (as pointed out). Although I am still trying 
to figure out what would be the best design strategy of such a class.


Note:

- I wanted to exclude the "coeff" column from the returned names;
names.pm = function(p) {
     nms = NextMethod();
     # excludes the Coefficients:
     id = match("coeff", nms);
     return(nms[ - id]);
}


This is why I also hesitate regarding what to use: dimnames or names.


Sincerely,


Leonard


On 11/3/2021 8:54 PM, Andrew Simmons wrote:
> First, your signature for names.pm is wrong. It
> should look something more like:
>
>
> names.pm <- function (x)
> {
> }
>
>
> As for the body of the function, you might do something like:
>
>
> names.pm <- function (x)
> {
>     NextMethod()
> }
>
>
> but you don't need to define a names method if you're just going to 
> call the next method. I would suggest not defining a names method at all.
>
>
> As a side note, I would suggest making your class through the methods 
> package, with methods::setClass("pm", ...)
> See the documentation for setClass for more details, it's the 
> recommended way to define classes in R.
>
> On Wed, Nov 3, 2021 at 2:36 PM Leonard Mada via R-help 
>  wrote:
>
> Dear List members,
>
>
> Is there a way to access the default names() function?
>
>
> I tried the following:
>
> # Multi-variable polynomial
>
> p = data.frame(x=1:3, coeff=1)
>
> class(p) = c("pm", class(p));
>
>
> names.pm = function(p) {
> # .Primitive("names")(p) # does NOT function
> # .Internal("names")(p) # does NOT function
> # nms = names.default(p) # does NOT exist
> # nms = names.data.frame(p) # does NOT exist
> # nms = names(p); # obvious infinite recursion;
> nms = names(unclass(p));
> }
>
>
> Alternatively:
>
> Would it be better to use dimnames.pm instead
> of names.pm?
>
> I am not fully aware of the advantages and disadvantages of
> dimnames vs
> names.
>
>
> Sincerely,
>
>
> Leonard
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] names.data.frame?

2021-11-03 Thread Leonard Mada via R-help

Dear List members,


Is there a way to access the default names() function?


I tried the following:

# Multi-variable polynomial

p = data.frame(x=1:3, coeff=1)

class(p) = c("pm", class(p));


names.pm = function(p) {
# .Primitive("names")(p) # does NOT function
# .Internal("names")(p) # does NOT function
# nms = names.default(p) # does NOT exist
# nms = names.data.frame(p) # does NOT exist
# nms = names(p); # obvious infinite recursion;
nms = names(unclass(p));
}


Alternatively:

Would it be better to use dimnames.pm instead of names.pm?

I am not fully aware of the advantages and disadvantages of dimnames vs 
names.



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to use ifelse without invoking warnings

2021-10-09 Thread Leonard Mada via R-help

Dear Ravi,


I have uploaded on GitHub a version which handles also constant values 
instead of functions.



Regarding named arguments: this is actually handled automatically as well:

eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, y=2, x)
# [1]  1  4  9 16 25  6 14  8 18 10
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x=2, x)
# [1]  4  4  4  4  4  2 14  2 18  2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., list(FUN[[1]], 0, 1), 
y=2, x)

 # [1]  0  0  0  0  0  1 14  1 18  1


But it still needs proper testing and maybe optimization: it is possible 
to run sapply on the filtered sequence (but I did not want to break 
anything now).



Sincerely,


Leonard



On 10/9/2021 9:26 PM, Leonard Mada wrote:

Dear Ravi,


I wrote a small replacement for ifelse() which avoids such unnecessary 
evaluations (it bothered me a few times as well - so I decided to try 
a small replacement).



### Example:
x = 1:10
FUN = list();
FUN[[1]] = function(x, y) x*y;
FUN[[2]] = function(x, y) x^2;
FUN[[3]] = function(x, y) x;
# lets run multiple conditions
# eval.by.formula(conditions, FUN.list, ... (arguments for FUN) );
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x, x-1)
# Example 2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, 2, x)


### Disclaimer:
- NOT properly tested;


The code for the function is below. Maybe someone can experiment with 
the code and improve it further. There are a few issues / open 
questions, like:


1.) Best Name: eval.by.formula, ifelse.formula, ...?

2.) Named arguments: not yet;

3.) Fixed values inside FUN.list

4.) Format of expression for conditions:

expression(cond1, cond2, cond3) vs cond1 ~ cond2 ~ cond3 ???

5.) Code efficiency

- some tests on large data sets & optimizations are warranted;


Sincerely,


Leonard

===

The latest code is on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R

[...]



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to use ifelse without invoking warnings

2021-10-09 Thread Leonard Mada via R-help

Dear Ravi,


I wrote a small replacement for ifelse() which avoids such unnecessary 
evaluations (it bothered me a few times as well - so I decided to try a 
small replacement).



### Example:
x = 1:10
FUN = list();
FUN[[1]] = function(x, y) x*y;
FUN[[2]] = function(x, y) x^2;
FUN[[3]] = function(x, y) x;
# lets run multiple conditions
# eval.by.formula(conditions, FUN.list, ... (arguments for FUN) );
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, x, x-1)
# Example 2
eval.by.formula((x > 5 & x %% 2) ~ (x <= 5) ~ ., FUN, 2, x)


### Disclaimer:
- NOT properly tested;


The code for the function is below. Maybe someone can experiment with 
the code and improve it further. There are a few issues / open 
questions, like:


1.) Best Name: eval.by.formula, ifelse.formula, ...?

2.) Named arguments: not yet;

3.) Fixed values inside FUN.list

4.) Format of expression for conditions:

expression(cond1, cond2, cond3) vs cond1 ~ cond2 ~ cond3 ???

5.) Code efficiency

- some tests on large data sets & optimizations are warranted;


Sincerely,


Leonard

===

The latest code is on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R


eval.by.formula = function(e, FUN.list, ..., default=NA) {
    tok = split.formula(e);
    if(length(tok) == 0) return();
    FUN = FUN.list;
    # Argument List
    clst = substitute(as.list(...))[-1];
    len  = length(clst);
    clst.all = lapply(clst, eval);
    eval.f = function(idCond) {
    sapply(seq(length(isEval)), function(id) {
        if(isEval[[id]] == FALSE) return(default);
        args.l = lapply(clst.all, function(a) if(length(a) == 1) a 
else a[[id]]);

        do.call(FUN[[idCond]], args.l);
    });
    }
    # eval 1st condition:
    isEval = eval(tok[[1]]);
    rez = eval.f(1);
    if(length(tok) == 1) return(rez);
    # eval remaining conditions
    isEvalAll = isEval;
    for(id in seq(2, length(tok))) {
    if(tok[[id]] == ".") {
        # Remaining conditions: tok == ".";
        # makes sense only on the last position
        if(id < length(tok)) warning("\".\" is not last!");
        isEval = ! isEvalAll;
        rez[isEval] = eval.f(id)[isEval];
        next;
    }
    isEval = rep(FALSE, length(isEval));
    isEval[ ! isEvalAll] = eval(tok[[id]])[ ! isEvalAll];
    isEvalAll[isEval] = isEval[isEval];
    rez[isEval] = eval.f(id)[isEval];
    }
    return(rez);
}


# current code uses the formula format:
# cond1 ~ cond2 ~ cond3

# tokenizes a formula into its parts delimited by "~"
# Note:
# - tokenization is automatic for ",";
# - but call MUST then use FUN(expression(_conditions_), other_args, ...);
split.formula = function(e) {
    tok = list();
    while(length(e) > 0) {
    if(e[[1]] == "~") {
        if(length(e) == 2) { tok = c(NA, e[[2]], tok); break; }
        tok = c(e[[3]], tok);
        e = e[[2]];
    } else {
        tok = c(e, tok); break;
    }
    }
    return(tok);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Extracting Comments from Functions/Packages

2021-10-07 Thread Leonard Mada via R-help

Dear R Users,


I wrote a minimal parser to extract strings and comments from the 
function definitions.



The string extraction works fine. But there are no comments:

a.) Are the comments stripped from the compiled packages?

b.) Alternatively: Is the deparse() not suited for this task?

b.2.) Is deparse() parsing the function/expression itself?

[see code for extract.str.fun() function below]
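
A small experiment which may explain the missing comments (assuming default options; please correct me if I am wrong): deparse() works on the parse tree, which never stores comments; comments survive only in the srcref attribute, and packages are normally installed with keep.source.pkgs = FALSE, so there is nothing left to extract.

f = eval(parse(text = "function(x) {\n  # a comment\n  x + 1\n}", keep.source = TRUE));
any(grepl("#", deparse(f)));                        # expected FALSE: deparse() drops comments
any(grepl("#", as.character(attr(f, "srcref"))));   # expected TRUE: the srcref keeps them
getOption("keep.source.pkgs");                      # usually FALSE => no srcrefs in packages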


### All strings in "base"
extract.str.pkg("base")
# type = 2 for Comments:
extract.str.pkg("base", type=2)
extract.str.pkg("sp", type=2)
extract.str.pkg("NetLogoR", type=2)

The code for the 2 functions (extract.str.pkg & extract.str.fun) and the 
code for the parse.simple() parser are below.



Sincerely,


Leonard

===

The latest code is on GitHub:

https://github.com/discoleo/R/blob/master/Stat/Tools.Formulas.R


### Code to process functions in packages:
extract.str.fun = function(fn, pkg, type=1, strip=TRUE) {
    fn = as.symbol(fn); pkg = as.symbol(pkg);
    fn = list(substitute(pkg ::: fn));
    # deparse
    s = paste0(do.call(deparse, fn), collapse="");
    npos = parse.simple(s);
    extract.str(s, npos[[type]], strip=strip)
}
extract.str.pkg = function(pkg, type=1, exclude.z = TRUE, strip=TRUE) {
    nms = ls(getNamespace(pkg));
    l = lapply(nms, function(fn) extract.str.fun(fn, pkg, type=type, 
strip=strip));

    if(exclude.z) {
        hasStr = sapply(l, function(s) length(s) >= 1);
        nms = nms[hasStr];
        l = l[hasStr];
    }
    names(l) = nms;
    return(l);
}

### minimal Parser:
# - proof of concept;
# - may be useful to process non-conformant R "code", e.g.:
#   "{\"abc\" + \"bcd\"} {FUN}"; (still TODO)
# Warning:
# - not thoroughly checked &
#   may be a little buggy!

parse.simple = function(x, eol="\n") {
    len = nchar(x);
    n.comm = list(integer(0), integer(0));
    n.str  = list(integer(0), integer(0));
    is.hex = function(ch) {
        # Note: only for 1 character!
        return((ch >= "0" && ch <= "9") ||
            (ch >= "A" && ch <= "F") ||
            (ch >= "a" && ch <= "f"));
    }
    npos = 1;
    while(npos <= len) {
        s = substr(x, npos, npos);
        # State: COMMENT
        if(s == "#") {
            n.comm[[1]] = c(n.comm[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                if(substr(x, npos, npos) == eol) break;
            }
            n.comm[[2]] = c(n.comm[[2]], npos);
            npos = npos + 1; next;
        }
        # State: STRING
        if(s == "\"" || s == "'") {
            n.str[[1]] = c(n.str[[1]], npos);
            while(npos < len) {
                npos = npos + 1;
                se = substr(x, npos, npos);
                if(se == "\\") {
                    npos = npos + 1;
                    # simple escape vs Unicode:
                    if(substr(x, npos, npos) != "u") next;
                    len.end = min(len, npos + 4);
                    npos = npos + 1;
                    isAllHex = TRUE;
                    while(npos <= len.end) {
                        se = substr(x, npos, npos);
                        if( ! is.hex(se)) { isAllHex = FALSE; break; }
                        npos = npos + 1;
                    }
                    if(isAllHex) next;
                }
                if(se == s) break;
            }
            n.str[[2]] = c(n.str[[2]], npos);
            npos = npos + 1; next;
        }
        npos = npos + 1;
    }
    return(list(str = n.str, comm = n.comm));
}


extract.str = function(s, npos, strip=FALSE) {
    if(length(npos[[1]]) == 0) return(character(0));
    strip.FUN = if(strip) {
            function(id) {
                if(npos[[1]][[id]] + 1 < npos[[2]][[id]]) {
                    nStart = npos[[1]][[id]] + 1;
                    nEnd = npos[[2]][[id]] - 1; # TODO: Error with 
malformed string

                    return(substr(s, nStart, nEnd));
                } else {
                    return("");
                }
            }
        } else function(id) substr(s, npos[[1]][[id]], npos[[2]][[id]]);
    sapply(seq(length(npos[[1]])), strip.FUN);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Descriptive Statistics: useful hacks

2021-10-05 Thread Leonard Mada via R-help
Dear R users,


I wrote in the meantime a new function:

apply.html(html, XPATH, FUN, ...)


This function applies FUN to the nodes selected using XPATH. However, I 
wonder if there is a possibility to use more simple selectors (e.g. 
jQuery). Although I am not an expert with jQuery, it may be easier for 
end users than XPATH.


Package htmltools does not seem to offer support to import a native html 
file, nor do I see any functions using jQuery selectors. I do not seem 
to find any such packages. I would be glad for any hints.
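
One possible lead (not yet tested in this context): rvest accepts CSS selectors and translates them internally to XPath (via the selectr package), which may already cover most jQuery-style use cases:

library(xml2)
library(rvest)
doc   = read_html("Example.html");                   # hypothetical input file
nodes = html_elements(doc, css = "table td.header"); # CSS selector instead of XPath
xml_text(nodes);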


Many thanks,


Leonard

===

Latest code is on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R


Notes:

1.) as.html() currently imports only a few types, but it could be easily 
extended to fully generic html;

Note: the export as shiny app may not work with a fully generic html; I 
have not yet explored all the implications!

2.) I am still struggling to understand how to best design the option: 
with.tags = TRUE.

3.) llammas.FUN: Was implemented at great expense and at the last 
minute, but unfortunately is still incomplete and important visual 
styles are missing. Help is welcomed.


On 10/3/2021 1:00 AM, Leonard Mada wrote:
> Dear R Users,
>
>
> I have started to compile some useful hacks for the generation of nice 
> descriptive statistics. I hope that these functions & hacks are useful 
> to the wider R community. I hope that package developers also get some 
> inspiration from the code or from these ideas.
>
>
> I have started to review various packages focused on descriptive 
> statistics - although I am still at the very beginning.
>
>
> ### Hacks / Code
> - split table headers in 2 rows;
> - split results over 2 rows: view.gtsummary(...);
> - add abbreviations as footnotes: add.abbrev(...);
>
> The results are exported as a web page (using shiny) and can be 
> printed as a pdf document. See the following pdf example:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.Example_1.pdf
>  
>
>
>
> ### Example
> # currently focused on package gtsummary
> library(gtsummary)
> library(xml2)
>
> mtcars %>%
>     # rename2():
>     # - see file Tools.Data.R;
>     # - behaves in most cases the same as dplyr::rename();
>     rename2("HP" = "hp", "Displ" = disp, "Wt (klb)" = "wt", "Rar" = 
> drat) %>%
>     # as.factor.df():
>     # - see file Tools.Data.R;
>     # - encode as (ordered) factor;
>     as.factor.df("cyl", "Cyl ") %>%
>     # the Descriptive Statistics:
>     tbl_summary(by = cyl) %>%
>     modify_header(update = header) %>%
>     add_p() %>%
>     add_overall() %>%
>     modify_header(update = header0) %>%
>     # Hack: split long statistics !!!
>     view.gtsummary(view=FALSE, len=8) %>%
>     add.abbrev(
>         c("Displ", "HP", "Rar", "Wt (klb)" = "Wt"),
>         c("Displacement (in^3)", "Gross horsepower", "Rear axle ratio",
>         "Weight (1000 lbs)"));
>
>
> The required functions are on Github:
> https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R 
>
>
>
> The functions rename2() & as.factor.df() are only data-helpers and can 
> be found also on Github:
> https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R
>
>
> Note:
>
> 1.) The function add.abbrev() operates on the generated html-code:
>
> - the functionality is more generic and could be used easily with 
> other packages that export web pages as well;
>
> 2.) Split statistics: is an ugly hack. I plan to redesign the 
> functionality using xml-technologies. But I have already too many 
> side-projects.
>
> 3.) as.factor.df(): traditionally, one would create derived data-sets 
> or add a new column with the variable as factor (as the user may need 
> the numeric values for further analysis). But it looked nicer as a 
> single block of code.
>
>
> Sincerely,
>
>
> Leonard
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Descriptive Statistics: useful hacks

2021-10-02 Thread Leonard Mada via R-help

Dear R Users,


I have started to compile some useful hacks for the generation of nice 
descriptive statistics. I hope that these functions & hacks are useful 
to the wider R community. I hope that package developers also get some 
inspiration from the code or from these ideas.



I have started to review various packages focused on descriptive 
statistics - although I am still at the very beginning.



### Hacks / Code
- split table headers in 2 rows;
- split results over 2 rows: view.gtsummary(...);
- add abbreviations as footnotes: add.abbrev(...);

The results are exported as a web page (using shiny) and can be printed 
as a pdf document. See the following pdf example:


https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.Example_1.pdf


### Example
# currently focused on package gtsummary
library(gtsummary)
library(xml2)

mtcars %>%
    # rename2():
    # - see file Tools.Data.R;
    # - behaves in most cases the same as dplyr::rename();
    rename2("HP" = "hp", "Displ" = disp, "Wt (klb)" = "wt", "Rar" = 
drat) %>%

    # as.factor.df():
    # - see file Tools.Data.R;
    # - encode as (ordered) factor;
    as.factor.df("cyl", "Cyl ") %>%
    # the Descriptive Statistics:
    tbl_summary(by = cyl) %>%
    modify_header(update = header) %>%
    add_p() %>%
    add_overall() %>%
    modify_header(update = header0) %>%
    # Hack: split long statistics !!!
    view.gtsummary(view=FALSE, len=8) %>%
    add.abbrev(
        c("Displ", "HP", "Rar", "Wt (klb)" = "Wt"),
        c("Displacement (in^3)", "Gross horsepower", "Rear axle ratio",
        "Weight (1000 lbs)"));


The required functions are on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.DescriptiveStatistics.R 




The functions rename2() & as.factor.df() are only data-helpers and can 
be found also on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


Note:

1.) The function add.abbrev() operates on the generated html-code:

- the functionality is more generic and could be used easily with other 
packages that export web pages as well;


2.) Split statistics: is an ugly hack. I plan to redesign the 
functionality using xml-technologies. But I have already too many 
side-projects.


3.) as.factor.df(): traditionally, one would create derived data-sets or 
add a new column with the variable as factor (as the user may need the 
numeric values for further analysis). But it looked nicer as a single 
block of code.
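
Regarding note 1: the general idea of operating on the generated html, as a minimal xml2 sketch (this is not the actual add.abbrev() code, just the principle):

library(xml2)
doc  = read_html("<table><tr><td>HP</td><td>100</td></tr></table>");
node = xml_find_first(doc, "//td[text() = 'HP']");
xml_text(node) = "HP*";          # mark the abbreviated header
write_html(doc, "Table.html");   # hypothetical output file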



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Word-Wrapper Library/Package?

2021-09-29 Thread Leonard Mada via R-help
Many thanks for the hint.


The function is actually a wrapper for:

stringi:::stri_wrap

I will need to have a closer look to this one (documentation and maybe 
also peek into the code):

ret <- .Call(C_stri_wrap, str, width, cost_exponent, indent,
     exdent, prefix, initial, whitespace_only, use_length,
     locale)


I thought at some point about the stringi package, but somehow 
overlooked it.


Sincerely,


Leonard


On 9/30/2021 1:09 AM, CALUM POLWART wrote:
> Have you looked at stringr::str_wrap or its parent function 
> stringi::stri_wrap ?
>
> It applies an algorithm for the wrap. But it doesn't vectorise the 
> lines they are returned with \n for new lines, but you could apply a 
> string split to that result...
>
>
> On 29 Sep 2021 04:57, Andrew Simmons  wrote:
>
> 'strwrap' should wrap at the target column, so I think it's behaving
> correctly. You could do + 1 if you're expecting it to wrap
> immediately
> after the target column.
>
> As far as splitting while trying to minimize a penalty, I don't think
> strwrap can do that, and I don't know of any packages that do such
> a thing.
> If such a thing exists in another language, there's probably an R
> package
> with a similar name containing ports of such functions, that might
> be your
> best bet. I hope this helps.
>
> On Tue, Sep 28, 2021, 23:51 Leonard Mada  wrote:
>
> > Thank you Andrew.
> >
> >
> > I will explore this function more, although I am struggling to
> get it to
> > work properly:
> >
> > strwrap("Abc. B. Defg", 7)
> > # [1] "Abc." "B."   "Defg"
> >
> > # both "Abc. B." and "B. Defg" are 7 characters long.
> >
> > strwrap(paste0(rep("ab", 7), collapse=""), 7)
> > # [1] "ababababababab"
> >
> >
> > Can I set an absolute maximum width?
> >
> > It would be nice to have an algorithm that computes a penalty
> for the
> > split and selects the split with the smallest penalty (when no
> obvious
> > split is possible).
> >
> >
> > Sincerely,
> >
> >
> > Leonard
> >
> >
> >
> > On 9/29/2021 6:30 AM, Andrew Simmons wrote:
> >
> > I think what you're looking for is 'strwrap', it's in package base.
> >
> > On Tue, Sep 28, 2021, 22:26 Leonard Mada via R-help
> 
> > wrote:
> >
> >> Dear R-Users,
> >>
> >>
> >> Does anyone know any package or library that implements
> functions for
> >> word wrapping?
> >>
> >>
> >> I did implement a very rudimentary one (Github link below), but
> would
> >> like to avoid to reinvent the wheel. Considering that
> word-wrapping is a
> >> very common task, it should be available even in base R (e.g. in a
> >> "format" module/package).
> >>
> >>
> >> Sincerely,
> >>
> >>
> >> Leonard
> >>
> >> ===
> >>
> >> The latest versions of the functions are on Github:
> >>
> >> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> >> # Note:
> >> # - the function implementing word wrapping: split.N.line(...);
> >> # - for the example below: the functions defined in
> Tools.CRAN.R are
> >> required;
> >>
> >>
> >> Examples:
> >> ### Search CRAN
> >> library(pkgsearch)
> >>
> >> searchCran = function(s, from=1, len=60, len.print=20, extend="*",
> >>  sep=" ", sep.h="-") {
> >>  if( ! is.null(extend)) s = paste0(s, extend);
> >>  x = advanced_search(s, size=len, from=from);
> >>  if(length(x$package_data) == 0) {
> >>  cat("No packages found!", sep="\n");
> >>  } else {
> >>  scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h);
> >>  }
> >>  invisible(x)
> >> }
> >>
> >> # with nice formatting & printing:
> >> x = searchCran("text", from=60, sep.h="-")
> >>
> >> scroll.pkg(x, start=20, len=21, sep.h = "-*")

Re: [R] Word-Wrapper Library/Package?

2021-09-28 Thread Leonard Mada via R-help
Thank you Andrew.


I will explore this function more, although I am struggling to get it to 
work properly:

strwrap("Abc. B. Defg", 7)
# [1] "Abc." "B."   "Defg"

# both "Abc. B." and "B. Defg" are 7 characters long.

strwrap(paste0(rep("ab", 7), collapse=""), 7)
# [1] "ababababababab"


Can I set an absolute maximum width?

It would be nice to have an algorithm that computes a penalty for the 
split and selects the split with the smallest penalty (when no obvious 
split is possible).
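
As far as I can tell from the documentation (not yet verified in detail), stri_wrap() already uses a penalty-based (dynamic programming) algorithm; the greedy, strwrap-like behaviour is used only when cost_exponent <= 0. A single word longer than width will still overflow, though.

library(stringi)
stri_wrap("Abc. B. Defg", width = 7, cost_exponent = 2);   # minimize the squared raggedness
stri_wrap("Abc. B. Defg", width = 7, cost_exponent = 0);   # greedy algorithm (strwrap-like)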


Sincerely,


Leonard



On 9/29/2021 6:30 AM, Andrew Simmons wrote:
> I think what you're looking for is 'strwrap', it's in package base.
>
> On Tue, Sep 28, 2021, 22:26 Leonard Mada via R-help  wrote:
>
> Dear R-Users,
>
>
> Does anyone know any package or library that implements functions for
> word wrapping?
>
>
> I did implement a very rudimentary one (Github link below), but would
> like to avoid to reinvent the wheel. Considering that
> word-wrapping is a
> very common task, it should be available even in base R (e.g. in a
> "format" module/package).
>
>
> Sincerely,
>
>
> Leonard
>
> ===
>
> The latest versions of the functions are on Github:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
> # Note:
> # - the function implementing word wrapping: split.N.line(...);
> # - for the example below: the functions defined in Tools.CRAN.R are
> required;
>
>
> Examples:
> ### Search CRAN
> library(pkgsearch)
>
> searchCran = function(s, from=1, len=60, len.print=20, extend="*",
>      sep=" ", sep.h="-") {
>  if( ! is.null(extend)) s = paste0(s, extend);
>  x = advanced_search(s, size=len, from=from);
>  if(length(x$package_data) == 0) {
>      cat("No packages found!", sep="\n");
>  } else {
>      scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h);
>  }
>  invisible(x)
> }
>
> # with nice formatting & printing:
> x = searchCran("text", from=60, sep.h="-")
>
> scroll.pkg(x, start=20, len=21, sep.h = "-*")
> # test of sep.h=NULL vs ...
>
>
> Notes:
>
> 1.) split.N.line:
>
> - was implemented to output a pre-specified number of lines (kind of
> "maxLines"), but this is not required from an actual word-wrapper;
>
> - it was an initial design decision when implementing the
> format.lines()
> function; but I plan to implement a 1-pass exact algorithm during the
> next few days anyway;
>
> 2.) Refactoring
>
> - I will also move the formatting code to a new file: probably
> Tools.Formatting.R;
>
> - the same applies for the formatting code for ftable (currently
> in file
> Tools.Data.R);
>
> 3.) Package gridtext
>
> - seems to have some word-wrapping functionality, but does not
> seem to
> expose it;
>
> - I am also currently focused on character-based word wrapping
> (e.g. for
> RConsole);
>
>
>
>         [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list --
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Word-Wrapper Library/Package?

2021-09-28 Thread Leonard Mada via R-help
Dear R-Users,


Does anyone know any package or library that implements functions for 
word wrapping?


I did implement a very rudimentary one (Github link below), but would 
like to avoid to reinvent the wheel. Considering that word-wrapping is a 
very common task, it should be available even in base R (e.g. in a 
"format" module/package).


Sincerely,


Leonard

===

The latest versions of the functions are on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
# Note:
# - the function implementing word wrapping: split.N.line(...);
# - for the example below: the functions defined in Tools.CRAN.R are 
required;


Examples:
### Search CRAN
library(pkgsearch)

searchCran = function(s, from=1, len=60, len.print=20, extend="*",
         sep=" ", sep.h="-") {
     if( ! is.null(extend)) s = paste0(s, extend);
     x = advanced_search(s, size=len, from=from);
     if(length(x$package_data) == 0) {
         cat("No packages found!", sep="\n");
     } else {
         scroll.pkg(x, len=len.print, sep=sep, sep.h=sep.h);
     }
     invisible(x)
}

# with nice formatting & printing:
x = searchCran("text", from=60, sep.h="-")

scroll.pkg(x, start=20, len=21, sep.h = "-*")
# test of sep.h=NULL vs ...


Notes:

1.) split.N.line:

- was implemented to output a pre-specified number of lines (kind of 
"maxLines"), but this is not required from an actual word-wrapper;

- it was an initial design decision when implementing the format.lines() 
function; but I plan to implement a 1-pass exact algorithm during the 
next few days anyway;

2.) Refactoring

- I will also move the formatting code to a new file: probably 
Tools.Formatting.R;

- the same applies for the formatting code for ftable (currently in file 
Tools.Data.R);

3.) Package gridtext

- seems to have some word-wrapping functionality, but does not seem to 
expose it;

- I am also currently focused on character-based word wrapping (e.g. for 
RConsole);



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help


On 9/27/2021 1:06 AM, Leonard Mada wrote:
>
> Dear Bill,
>
>
> Does list.files() always sort the results?
>
> It seems so. The option: full.names = FALSE does not have any effect: 
> the results seem always sorted.
>
>
> Maybe it is better to process the files in an unsorted order: as 
> stored on the disk?
>

After some more investigations:

This took only a few seconds:

sapply(list.dirs(path=path, full.name=F, recursive=F),
     function(f) length(list.files(path = paste0(path, "/", f), 
full.names = FALSE, recursive = TRUE)))

# maybe with caching, but the difference is enormous


Seems BH contains *by far* the most files: 11701 files.

But excluding it from processing had only a linear effect: still 377 s.


I had a look at src/main/platform.c, but do not fully understand it.


Sincerely,


Leonard


>
> Sincerely,
>
>
> Leonard
>
>
> On 9/25/2021 8:13 PM, Bill Dunlap wrote:
>> On my Windows 10 laptop I see evidence of the operating system 
>> caching information about recently accessed files.  This makes it 
>> hard to say how the speed might be improved.  Is there a way to clear 
>> this cache?
>>
>> > system.time(L1 <- size.f.pkg(R.home("library")))
>>    user  system elapsed
>>    0.48    2.81   30.42
>> > system.time(L2 <- size.f.pkg(R.home("library")))
>>    user  system elapsed
>>    0.35    1.10    1.43
>> > identical(L1,L2)
>> [1] TRUE
>> > length(L1)
>> [1] 30
>> > length(dir(R.home("library"),recursive=TRUE))
>> [1] 12949
>>
>> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help  wrote:
>>
>> Dear List Members,
>>
>>
>> I tried to compute the file sizes of each installed package and the
>> process is terribly slow.
>>
>> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>>
>>
>> 1.) Package Sizes
>>
>>
>> system.time({
>>      x = size.pkg(file=NULL);
>> })
>> # elapsed time: 509 s !!!
>> # 512 Packages; 1.64 GB;
>> # R 4.1.1 on MS Windows 10
>>
>>
>> The code for the size.pkg() function is below and the latest
>> version is
>> on Github:
>>
>> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
>>
>>
>> Questions:
>> Is there a way to get the file size faster?
>> It takes long on Windows as well, but of the order of 10-20 s,
>> not 10
>> minutes.
>> Do I miss something?
>>
>>
>> 1.b.) Alternative
>>
>> It came to my mind to read first all file sizes and then use
>> tapply or
>> aggregate - but I do not see why it should be faster.
>>
>> Would it be meaningful to benchmark each individual package?
>>
>> Although I am not very inclined to wait 10 minutes for each new
>> try out.
>>
>>
>> 2.) Big Packages
>>
>> Just as a note: there are a few very large packages (in my list
>> of 512
>> packages):
>>
>> 1  123,566,287   BH
>> 2  113,578,391   sf
>> 3  112,252,652    rgdal
>> 4   81,144,868   magick
>> 5   77,791,374 openNLPmodels.en
>>
>> I suspect that sf & rgdal have a lot of duplicated data structures
>> and/or duplicate code and/or duplicated libraries - although I am
>> not an
>> expert in the field and did not check the sources.
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>> ===
>>
>>
>> # Package Size:
>> size.f.pkg = function(path=NULL) {
>>  if(is.null(path)) path = R.home("library");
>>  xd = list.dirs(path = path, full.names = FALSE, recursive =
>> FALSE);
>>  size.f = function(p) {
>>      p = paste0(path, "/", p);
>>      sum(file.info(list.files(path=p,
>> pattern=".",
>>          full.names = TRUE, all.files = TRUE, recursive =
>> TRUE))$size);
>>  }
>>  sapply(xd, size.f);
>> }
>>
>> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>>  x = size.f.pkg(path=path);
>>  x = as.data.frame(x);
>>  names(x) = "Size"

Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help
Dear Bill,


Does list.files() always sort the results?

It seems so. The option: full.names = FALSE does not have any effect: 
the results seem always sorted.


Maybe it is better to process the files in an unsorted order: as stored 
on the disk?


Sincerely,


Leonard


On 9/25/2021 8:13 PM, Bill Dunlap wrote:
> On my Windows 10 laptop I see evidence of the operating system caching 
> information about recently accessed files.  This makes it hard to say 
> how the speed might be improved.  Is there a way to clear this cache?
>
> > system.time(L1 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.48    2.81   30.42
> > system.time(L2 <- size.f.pkg(R.home("library")))
>    user  system elapsed
>    0.35    1.10    1.43
> > identical(L1,L2)
> [1] TRUE
> > length(L1)
> [1] 30
> > length(dir(R.home("library"),recursive=TRUE))
> [1] 12949
>
> On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help  wrote:
>
> Dear List Members,
>
>
> I tried to compute the file sizes of each installed package and the
> process is terribly slow.
>
> It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.
>
>
> 1.) Package Sizes
>
>
> system.time({
>      x = size.pkg(file=NULL);
> })
> # elapsed time: 509 s !!!
> # 512 Packages; 1.64 GB;
> # R 4.1.1 on MS Windows 10
>
>
> The code for the size.pkg() function is below and the latest
> version is
> on Github:
>
> https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R
>
>
> Questions:
> Is there a way to get the file size faster?
> It takes long on Windows as well, but of the order of 10-20 s, not 10
> minutes.
> Do I miss something?
>
>
> 1.b.) Alternative
>
> It came to my mind to read first all file sizes and then use
> tapply or
> aggregate - but I do not see why it should be faster.
>
> Would it be meaningful to benchmark each individual package?
>
> Although I am not very inclined to wait 10 minutes for each new
> try out.
>
>
> 2.) Big Packages
>
> Just as a note: there are a few very large packages (in my list of
> 512
> packages):
>
> 1  123,566,287   BH
> 2  113,578,391   sf
> 3  112,252,652    rgdal
> 4   81,144,868   magick
> 5   77,791,374 openNLPmodels.en
>
> I suspect that sf & rgdal have a lot of duplicated data structures
> and/or duplicate code and/or duplicated libraries - although I am
> not an
> expert in the field and did not check the sources.
>
>
> Sincerely,
>
>
> Leonard
>
> ===
>
>
> # Package Size:
> size.f.pkg = function(path=NULL) {
>  if(is.null(path)) path = R.home("library");
>  xd = list.dirs(path = path, full.names = FALSE, recursive =
> FALSE);
>  size.f = function(p) {
>      p = paste0(path, "/", p);
>  sum(file.info(list.files(path=p,
> pattern=".",
>          full.names = TRUE, all.files = TRUE, recursive =
> TRUE))$size);
>  }
>  sapply(xd, size.f);
> }
>
> size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
>  x = size.f.pkg(path=path);
>  x = as.data.frame(x);
>  names(x) = "Size"
>  x$Name = rownames(x);
>  # Order
>  if(sort) {
>      id = order(x$Size, decreasing=TRUE)
>      x = x[id,];
>  }
>  if( ! is.null(file)) {
>      if( ! is.character(file)) {
>          print("Error: Size NOT written to file!");
>      } else write.csv(x, file=file, row.names=FALSE);
>  }
>  return(x);
> }
>
> __
> R-help@r-project.org mailing list --
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading File Sizes: very slow!

2021-09-26 Thread Leonard Mada via R-help

Dear Bill,


- using the MS Windows folder Properties dialog: ~ 15 s;

[Windows new start, 1st operation, bulk size]

- using R / file.info() (2nd operation): still 523.6 s

[and R seems mostly unresponsive during this time]


Unfortunately, I do not know how to clear any cache.

[The cache may play a role only for smaller sizes? But I am rather not 
inclined to run the ~ 10 minutes procedure multiple times.]



Sincerely,


Leonard


On 9/26/2021 5:49 AM, Richard O'Keefe wrote:

On a $150 second-hand laptop with 0.9GB of library,
and a single-user installation of R so only one place to look
LIBRARY=$HOME/R/x86_64-pc-linux-gnu-library/4.0
cd $LIBRARY
echo "kbytes package"
du -sk * | sort -k1n

took 150 msec to report the disc space needed for every package.

That'

On Sun, 26 Sept 2021 at 06:14, Bill Dunlap  wrote:

On my Windows 10 laptop I see evidence of the operating system caching
information about recently accessed files.  This makes it hard to say how
the speed might be improved.  Is there a way to clear this cache?


system.time(L1 <- size.f.pkg(R.home("library")))

user  system elapsed
0.482.81   30.42

system.time(L2 <- size.f.pkg(R.home("library")))

user  system elapsed
0.351.101.43

identical(L1,L2)

[1] TRUE

length(L1)

[1] 30

length(dir(R.home("library"),recursive=TRUE))

[1] 12949

On Sat, Sep 25, 2021 at 8:12 AM Leonard Mada via R-help <
r-help@r-project.org> wrote:


Dear List Members,


I tried to compute the file sizes of each installed package and the
process is terribly slow.

It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.


1.) Package Sizes


system.time({
  x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10


The code for the size.pkg() function is below and the latest version is
on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


Questions:
Is there a way to get the file size faster?
It takes long on Windows as well, but of the order of 10-20 s, not 10
minutes.
Do I miss something?


1.b.) Alternative

It came to my mind to read first all file sizes and then use tapply or
aggregate - but I do not see why it should be faster.

Would it be meaningful to benchmark each individual package?

Although I am not very inclined to wait 10 minutes for each new try out.


2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512
packages):

1  123,566,287   BH
2  113,578,391   sf
3  112,252,652rgdal
4   81,144,868   magick
5   77,791,374 openNLPmodels.en

I suspect that sf & rgdal have a lot of duplicated data structures
and/or duplicate code and/or duplicated libraries - although I am not an
expert in the field and did not check the sources.


Sincerely,


Leonard

===


# Package Size:
size.f.pkg = function(path=NULL) {
  if(is.null(path)) path = R.home("library");
  xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
  size.f = function(p) {
  p = paste0(path, "/", p);
  sum(file.info(list.files(path=p, pattern=".",
  full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
  }
  sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
  x = size.f.pkg(path=path);
  x = as.data.frame(x);
  names(x) = "Size"
  x$Name = rownames(x);
  # Order
  if(sort) {
  id = order(x$Size, decreasing=TRUE)
  x = x[id,];
  }
  if( ! is.null(file)) {
  if( ! is.character(file)) {
  print("Error: Size NOT written to file!");
  } else write.csv(x, file=file, row.names=FALSE);
  }
  return(x);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reading File Sizes: very slow!

2021-09-25 Thread Leonard Mada via R-help

Dear List Members,


I tried to compute the file sizes of each installed package and the 
process is terribly slow.


It took ~ 10 minutes for 512 packages / 1.6 GB total size of files.


1.) Package Sizes


system.time({
        x = size.pkg(file=NULL);
})
# elapsed time: 509 s !!!
# 512 Packages; 1.64 GB;
# R 4.1.1 on MS Windows 10


The code for the size.pkg() function is below and the latest version is 
on Github:


https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


Questions:
Is there a way to get the file size faster?
It takes long on Windows as well, but of the order of 10-20 s, not 10 
minutes.

Am I missing something?


1.b.) Alternative

It came to my mind to read first all file sizes and then use tapply or 
aggregate - but I do not see why it should be faster.


Would it be meaningful to benchmark each individual package?

Although I am not very inclined to wait 10 minutes for each new try out.
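
A minimal sketch of that single-pass idea (the helper name size.pkg2 is 
purely illustrative; it still has to stat every file, so any speed-up 
would need benchmarking):

size.pkg2 = function(path = R.home("library")) {
    files = list.files(path, full.names = TRUE, recursive = TRUE, all.files = TRUE);
    sz    = file.size(files);
    # package = first path component below 'path'
    pkg   = sub("/.*$", "", substring(files, nchar(path) + 2));
    sort(tapply(sz, pkg, sum), decreasing = TRUE);
}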


2.) Big Packages

Just as a note: there are a few very large packages (in my list of 512 
packages):


1  123,566,287   BH
2  113,578,391   sf
3  112,252,652    rgdal
4   81,144,868   magick
5   77,791,374 openNLPmodels.en

I suspect that sf & rgdal have a lot of duplicated data structures 
and/or duplicate code and/or duplicated libraries - although I am not an 
expert in the field and did not check the sources.



Sincerely,


Leonard

===


# Package Size:
size.f.pkg = function(path=NULL) {
    if(is.null(path)) path = R.home("library");
    xd = list.dirs(path = path, full.names = FALSE, recursive = FALSE);
    size.f = function(p) {
        p = paste0(path, "/", p);
        sum(file.info(list.files(path=p, pattern=".",
            full.names = TRUE, all.files = TRUE, recursive = TRUE))$size);
    }
    sapply(xd, size.f);
}

size.pkg = function(path=NULL, sort=TRUE, file="Packages.Size.csv") {
    x = size.f.pkg(path=path);
    x = as.data.frame(x);
    names(x) = "Size"
    x$Name = rownames(x);
    # Order
    if(sort) {
        id = order(x$Size, decreasing=TRUE)
        x = x[id,];
    }
    if( ! is.null(file)) {
        if( ! is.character(file)) {
            print("Error: Size NOT written to file!");
        } else write.csv(x, file=file, row.names=FALSE);
    }
    return(x);
}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installed packages: Bioconductor vs CRAN?

2021-09-24 Thread Leonard Mada via R-help

[working version]

On 9/25/2021 2:55 AM, Leonard Mada wrote:

Dear List Members,


Is there a way to extract if an installed package is from Bioconductor 
or if it is a regular Cran package?



The information seems to be *not* available in:

installed.packages()


### [updated]

# Basic Info:
info.pkg = function(pkg=NULL, fields="Repository") {
    if(is.null(pkg)) { pkg = installed.packages(fields=fields); }
    else {
        all.pkg = installed.packages();
        pkg = all.pkg[all.pkg[,1] %in% pkg, ];
    }
    p = pkg;
    p = as.data.frame(p);
    p = p[ , c("Package", "Version", "Built", fields, "Imports")];
    return(p);
}


I will think later how to improve the filtering of Bioconductor 
packages. Probably based on biocViews.
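
One possible sketch along those lines (a heuristic only: it assumes a 
biocViews field marks a Bioconductor package and Repository == "CRAN" 
marks a CRAN package; the name repo.pkg is purely illustrative):

repo.pkg = function() {
    p = as.data.frame(installed.packages(fields = c("Repository", "biocViews")),
        stringsAsFactors = FALSE);
    p$Source = ifelse( ! is.na(p$biocViews), "Bioconductor",
        ifelse( ! is.na(p$Repository) & p$Repository == "CRAN", "CRAN", "Other"));
    p[ , c("Package", "Version", "Source")];
}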



Many thanks,


Leonard





Sincerely,


Leonard

===

I started to write some utility functions to analyse installed 
packages. The latest version is on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


# Basic Info:
info.pkg = function(pkg=NULL) {
    if(is.null(pkg)) { pkg = installed.packages(); }
    else {
        all.pkg = installed.packages();
        pkg = all.pkg[all.pkg[,1] %in% pkg, ];
    }
    p = pkg;
    p = as.data.frame(p);
    p = p[ , c("Package", "Version", "Built", "Imports")];
    return(p);
}
# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
    p = info.pkg(pkg);
    ### Imported packages
    imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"))
    imp = unlist(imp)
    imp = imp[ ! is.na(imp)]
    # Cleanup:
    imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp, 
perl=TRUE)

    imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
    # Tabulate:
    tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
    names(tbl)[1] = "Name";
    if(sort) {
        id = order(tbl$Freq, decreasing=TRUE);
        tbl = tbl[id,];
    }
    return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
    if(is.null(x)) x = info.pkg();
    if(quote) {
        pkg = paste0("\\Q", pkg, "\\E");
    }
    # TODO: Use word delimiters?
    # "(

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installed packages: Bioconductor vs CRAN?

2021-09-24 Thread Leonard Mada via R-help

Dear Bert,


Indeed, this seems to work:

installed.packages(fields="Repository")


I still need to figure out what variants to expect.
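
One quick way to see which variants actually occur on a given 
installation (NA should correspond to packages whose DESCRIPTION has no 
Repository field, e.g. Bioconductor or locally built packages):

repos = installed.packages(fields = "Repository")[ , "Repository"];
table(repos, useNA = "ifany")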


Sincerely,


Leonard


On 9/25/2021 3:31 AM, Leonard Mada wrote:

Dear Bert,


The DESCRIPTION file contains additional useful information, e.g.:

1.) Package EBImage:
biocViews: Visualization
Packaged: 2021-05-19 23:53:29 UTC; biocbuild


2.) deSolve
Repository: CRAN


I have verified a few of the CRAN packages, and they seem to include 
the tag:


Repository: CRAN


The Bioconductor packages are different (see e.g. EBImage).

I am wondering if there is already a method to extract this info?


Sincerely,


Leonard


On 9/25/2021 3:06 AM, Bert Gunter wrote:


The help file tells you that installed.packages() looks at the
DESCRIPTION files of packages.
Section 1.1.1 of "Writing R Extensions" tells you what information is
in such files.


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 24, 2021 at 4:56 PM Leonard Mada via R-help
 wrote:

Dear List Members,


Is there a way to extract if an installed package is from Bioconductor
or if it is a regular Cran package?


The information seems to be *not* available in:

installed.packages()


Sincerely,


Leonard

===

I started to write some utility functions to analyse installed 
packages.

The latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


# Basic Info:
info.pkg = function(pkg=NULL) {
  if(is.null(pkg)) { pkg = installed.packages(); }
  else {
  all.pkg = installed.packages();
  pkg = all.pkg[all.pkg[,1] %in% pkg, ];
  }
  p = pkg;
  p = as.data.frame(p);
  p = p[ , c("Package", "Version", "Built", "Imports")];
  return(p);
}
# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
  p = info.pkg(pkg);
  ### Imported packages
  imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"))
  imp = unlist(imp)
  imp = imp[ ! is.na(imp)]
  # Cleanup:
  imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp,
perl=TRUE)
  imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
  # Tabulate:
  tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
  names(tbl)[1] = "Name";
  if(sort) {
  id = order(tbl$Freq, decreasing=TRUE);
  tbl = tbl[id,];
  }
  return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
  if(is.null(x)) x = info.pkg();
  if(quote) {
  pkg = paste0("\\Q", pkg, "\\E");
  }
  # TODO: Use word delimiters?
  # "(https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installed packages: Bioconductor vs CRAN?

2021-09-24 Thread Leonard Mada via R-help

Dear Bert,


The DESCRIPTION file contains additional useful information, e.g.:

1.) Package EBImage:
biocViews: Visualization
Packaged: 2021-05-19 23:53:29 UTC; biocbuild


2.) deSolve
Repository: CRAN


I have verified a few of the CRAN packages, and they seem to include the 
tag:


Repository: CRAN


The Bioconductor packages are different (see e.g. EBImage).

I am wondering if there is already a method to extract this info?


Sincerely,


Leonard


On 9/25/2021 3:06 AM, Bert Gunter wrote:


The help file tells you that installed.packages() looks at the
DESCRIPTION files of packages.
Section 1.1.1 of "Writing R Extensions" tells you what information is
in such files.


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Sep 24, 2021 at 4:56 PM Leonard Mada via R-help
 wrote:

Dear List Members,


Is there a way to extract if an installed package is from Bioconductor
or if it is a regular Cran package?


The information seems to be *not* available in:

installed.packages()


Sincerely,


Leonard

===

I started to write some utility functions to analyse installed packages.
The latest version is on Github:
https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


# Basic Info:
info.pkg = function(pkg=NULL) {
  if(is.null(pkg)) { pkg = installed.packages(); }
  else {
  all.pkg = installed.packages();
  pkg = all.pkg[all.pkg[,1] %in% pkg, ];
  }
  p = pkg;
  p = as.data.frame(p);
  p = p[ , c("Package", "Version", "Built", "Imports")];
  return(p);
}
# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
  p = info.pkg(pkg);
  ### Imported packages
  imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"))
  imp = unlist(imp)
  imp = imp[ ! is.na(imp)]
  # Cleanup:
  imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp,
perl=TRUE)
  imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
  # Tabulate:
  tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
  names(tbl)[1] = "Name";
  if(sort) {
  id = order(tbl$Freq, decreasing=TRUE);
  tbl = tbl[id,];
  }
  return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
  if(is.null(x)) x = info.pkg();
  if(quote) {
  pkg = paste0("\\Q", pkg, "\\E");
  }
  # TODO: Use word delimiters?
  # "(https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Installed packages: Bioconductor vs CRAN?

2021-09-24 Thread Leonard Mada via R-help

Dear List Members,


Is there a way to extract if an installed package is from Bioconductor 
or if it is a regular Cran package?



The information seems to be *not* available in:

installed.packages()


Sincerely,


Leonard

===

I started to write some utility functions to analyse installed packages. 
The latest version is on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.CRAN.R


# Basic Info:
info.pkg = function(pkg=NULL) {
    if(is.null(pkg)) { pkg = installed.packages(); }
    else {
        all.pkg = installed.packages();
        pkg = all.pkg[all.pkg[,1] %in% pkg, ];
    }
    p = pkg;
    p = as.data.frame(p);
    p = p[ , c("Package", "Version", "Built", "Imports")];
    return(p);
}
# Imported packages:
imports.pkg = function(pkg=NULL, sort=TRUE) {
    p = info.pkg(pkg);
    ### Imported packages
    imp = lapply(p$Imports, function(s) strsplit(s, "[,][ ]*"))
    imp = unlist(imp)
    imp = imp[ ! is.na(imp)]
    # Cleanup:
    imp = sub("[ \n\r\t]*+\\([-,. >=0-9\n\t\r]++\\) *+$", "", imp, 
perl=TRUE)

    imp = sub("^[ \n\r\t]++", "", imp, perl=TRUE);
    # Tabulate:
    tbl = as.data.frame(table(imp), stringsAsFactors=FALSE);
    names(tbl)[1] = "Name";
    if(sort) {
        id = order(tbl$Freq, decreasing=TRUE);
        tbl = tbl[id,];
    }
    return(tbl);
}

match.imports = function(pkg, x=NULL, quote=FALSE) {
    if(is.null(x)) x = info.pkg();
    if(quote) {
        pkg = paste0("\\Q", pkg, "\\E");
    }
    # TODO: Use word delimiters?
    # "(https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [Questionnaire] Standardized Options: Justify/Alignment

2021-09-19 Thread Leonard Mada via R-help

Dear R users,


I have started to work on an improved version of the format.ftable 
function. The code and ideas should be reused to improve other R 
functions (enabling more advanced format of the character output).


However, there are a number of open questions. These are focused on 
standardizing the names and the options of the various arguments used to 
format the output. A separate post will address questions related to 
technical and design decisions.



Note:

The arguments are passed to various helper functions. It may be tedious 
to modify once implemented. Furthermore, package developers should be 
encouraged to use the standardized names and options as well (and may 
use the helper functions as well).



The users are also encouraged to test the various options. Some code to 
enable testing is available at the end of this post.



Structure of "Questionnaire":

a.) Answers should follow a Likert-type scale:
# Strongly disagree (on option ...);
# Disagree (on option ...);
# Agree (on option ...);
# Strongly agree (on option ...);

b.) Motivation: ...

c.) Other comments: ...


### The "Questionnaire"


1.) "MiddleTop", "MiddleBottom" vs "Middle"

Example problem: positioning 3 lines of text on 4 rows;

Workaround: user can easily prepend or append a newline to the relevant 
names, forcing the desired behaviour. However, there is a helper 
function to merge (cbind) 2 string matrices and it may be tedious for a 
user to modify all names. [but this is less used in ftable]


Disadvantages: the 2 variants "break" pmatch()!


2.) Lower case vs Upper case

Motivation: the options for most named algorithms are uppercase and are 
likely to remain uppercase;


Example: option = c("MyName1", "MyName2", "Fields1", "Fields2", ...);

Existing options in format.ftable: are lowercase, "left", "right", 
*"centre"*;



3.) Standardized Names

3.a.) Arguments:
justify = ... or align = ...?
pos = ... or position = ... or valign = ... ?

3.b.) Options:
- "left", "right", "centre" vs "center"?
- using both "centre" and "center" break pmatch();
- "top", "bottom", "middle";

Native-English speakers should review this question as well.

Note:
The new function enables to justify differently both the row-names and 
the factor levels:
- there are actually 2 arguments: justify="left", justify.lvl="c"; # 
with centre vs center issue!



I do not know if there is any facility to run such questionnaires through 
R. My resources are also rather limited - if anyone is willing to 
provide help, I would be very happy.



Sincerely,


Leonard


===

### Test Code

The latest version of the ftable2 function (contains a fix) and the 
needed helper functions are available on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


### Some Data
mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, +Inf), right=FALSE)
# Alternative:
# mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, 8), include.lowest=TRUE)
tbl = with(mtcars, table(cyl, hp, carbCtg, gear))
id = c(1,3,4);

# Note: the names can be modified to test various scenarios
xnm = c("Long\nname: ", "", "Extremely\nlong\nname: ")
xnm = paste0(xnm, names(dimnames(tbl))[id])
names(dimnames(tbl))[id] = xnm;
ftbl = ftable(tbl, row.vars = id)

### Test: FTABLE
ftable2(ftbl, sep=" | ", justify="left", justify.lvl="c", pos="Top", 
split="\n")


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Improvement: function cut

2021-09-18 Thread Leonard Mada via R-help
Hello Andrew,


I add this info as a completion (so other users can get a better 
understanding):

If we want to perform a survival analysis, then the intervals should be 
closed on the right, but we should also include the first time point (as 
per Intention-to-Treat):

[0, 4]  (4, 8]  (8, 12]  (12, 16]

[0, 4]  (4, 8]  (8, 12]  (12, 16]  (16, 20]


So the series is extendible to the right without any errors!

But the 1st interval (which is the same in both series) is different 
from the other intervals: [0, 4].
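
For reference, obtaining these bins with current base R requires 
remembering include.lowest = TRUE every time:

levels(cut(0:16, seq(0, 16, by = 4), include.lowest = TRUE))
# "[0,4]"  "(4,8]"  "(8,12]" "(12,16]"
levels(cut(0:20, seq(0, 20, by = 4), include.lowest = TRUE))
# "[0,4]"  "(4,8]"  "(8,12]" "(12,16]" "(16,20]"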


I feel that this should have been the default behaviour for cut().

Note:

I was induced to think about a different situation in my previous 
message, as you constructed open intervals on the right, and also 
extended to the right. But survival analysis should be as described in 
this mail and should probably be the default.


Sincerely,


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> I disagree, I don't really think it's too long or ugly, but if you 
> think it is, you could abbreviate it as 'i'.
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> data.frame(
>     cut(x, breaks1, right = FALSE, i = TRUE),
>     cut(x, breaks2, right = FALSE, i = TRUE),
>     check.names = FALSE
> )
>
>
> I hope this helps.
>
> On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada  <mailto:leo.m...@syonic.eu>> wrote:
>
> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data one
> expects to have also the ends of the interval: the argument
> "include.lowest" is both ugly and too long.
>
> [The test-code on the ftable thread contains this error! I have
> run through this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there
> would be any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>> While it is not explicitly mentioned anywhere in the
>> documentation for .bincode, I suspect 'include.lowest = FALSE' is
>> the default to keep the definitions of the bins consistent. For
>> example:
>>
>>
>> x <- 0:20
>> breaks1 <- seq.int(0, 16, 4)
>> breaks2 <- seq.int(0, 20, 4)
>> cbind(
>>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
>> )
>>
>>
>> by having 'include.lowest = TRUE' with different ends, you can
>> get inconsistent behaviour. While this probably wouldn't be an
>> issue with 'real' data, this would seem like something you'd want
>> to avoid by default. The definitions of the bins are
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16]
>>
>>
>> and
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16)
>> [16, 20]
>>
>>
>> so you can see where the inconsistent behaviour comes from. You
>> might be able to get R-core to add argument 'warn', but probably
>> not to change the default of 'include.lowest'. I hope this helps
>>
>>
>> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada > <mailto:leo.m...@syonic.eu>> wrote:
>>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the
>> default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not
>> included and run that test.
>>
>>
>>     Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>> Regarding your first point, argument 'include.lowest'
>>> already handles this specific case, see ?.bincode
>>>
>>> Your second point, maybe it could be helpful, but since both
>>> 'cut.default' and '.bincode' return NA if a value isn't
>>> within a bin, you could make something like this on your own.
>>> Might be worth pitching to R-bugs on the wishlist.
>>>
>>>
>>>
>>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>> mailto:r-help@r-project.org>> wrote:
>>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for func

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help

The warn should be in cut() => .bincode().

It should be generated whenever a real value (excluding NA, NaN, or +/- 
Inf) is not included in any of the bins.



If the user writes a script and doesn't want any warnings: he can select 
warn = FALSE. But otherwise it would be very helpful to catch 
immediately the error (and not after a number of steps or miss the error 
altogether).
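
A minimal sketch of such a wrapper in user code, until something 
equivalent exists in base R (the name cut2w is purely illustrative):

cut2w = function(x, breaks, ..., warn = TRUE) {
    f = cut(x, breaks, ...);
    if(warn) {
        # finite, non-NA values that ended up outside every bin
        dropped = is.na(f) & is.finite(x);
        if(any(dropped)) warning(sum(dropped),
            " value(s) not covered by any interval!");
    }
    f;
}

cut2w(0:20, seq(0, 16, by = 4))
# Warning: 5 value(s) not covered by any interval!  (here: 0 and 17:20)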



Leonard


On 9/18/2021 1:28 AM, Jeff Newmiller wrote:

Re your objection that "the user has to suspect that some values were not 
included" applies equally to your proposed warn option. There are a lot of ways to 
introduce NAs... in real projects all analysts should be suspecting this problem.

On September 17, 2021 3:01:35 PM PDT, Leonard Mada via R-help 
 wrote:

Thank you Andrew.


Is there any reason not to make: include.lowest = TRUE the default?


Regarding the NA:

The user still has to suspect that some values were not included and run
that test.


Leonard


On 9/18/2021 12:53 AM, Andrew Simmons wrote:

Regarding your first point, argument 'include.lowest' already handles
this specific case, see ?.bincode

Your second point, maybe it could be helpful, but since both
'cut.default' and '.bincode' return NA if a value isn't within a bin,
you could make something like this on your own.
Might be worth pitching to R-bugs on the wishlist.



On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
mailto:r-help@r-project.org>> wrote:

 Hello List members,


 the following improvements would be useful for function cut (and
 .bincode):


 1.) Argument: Include extremes
 extremes = TRUE
 if(right == FALSE) {
     # include also right for last interval;
 } else {
     # include also left for first interval;
 }


 2.) Argument: warn = TRUE

 Warn if any values are not included in the intervals.


 Motivation:
 - reduce risk of errors when using function cut();


 Sincerely,


 Leonard

 __
 R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
 To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 <https://stat.ethz.ch/mailman/listinfo/r-help>
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 <http://www.R-project.org/posting-guide.html>
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Why would you want to merge different factors?

It makes no sense on real data. Even if some names are the same, the 
factors are not the same!


The only real-data application that springs to mind is censoring (right 
or left, depending on the choice): but here we have both open and closed 
intervals, e.g. to the right (in the same data-set).


Leonard


On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> I disagree, I don't really think it's too long or ugly, but if you 
> think it is, you could abbreviate it as 'i'.
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> data.frame(
>     cut(x, breaks1, right = FALSE, i = TRUE),
>     cut(x, breaks2, right = FALSE, i = TRUE),
>     check.names = FALSE
> )
>
>
> I hope this helps.
>
> On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada  <mailto:leo.m...@syonic.eu>> wrote:
>
> Hello Andrew,
>
>
> But "cut" generates factors. In most cases with real data one
> expects to have also the ends of the interval: the argument
> "include.lowest" is both ugly and too long.
>
> [The test-code on the ftable thread contains this error! I have
> run through this error a couple of times.]
>
>
> The only real situation that I can imagine to be problematic:
>
> - if the interval goes to +Inf (or -Inf): I do not know if there
> would be any effects when including +Inf (or -Inf).
>
>
> Leonard
>
>
> On 9/18/2021 1:14 AM, Andrew Simmons wrote:
>> While it is not explicitly mentioned anywhere in the
>> documentation for .bincode, I suspect 'include.lowest = FALSE' is
>> the default to keep the definitions of the bins consistent. For
>> example:
>>
>>
>> x <- 0:20
>> breaks1 <- seq.int(0, 16, 4)
>> breaks2 <- seq.int(0, 20, 4)
>> cbind(
>>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
>> )
>>
>>
>> by having 'include.lowest = TRUE' with different ends, you can
>> get inconsistent behaviour. While this probably wouldn't be an
>> issue with 'real' data, this would seem like something you'd want
>> to avoid by default. The definitions of the bins are
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16]
>>
>>
>> and
>>
>>
>> [0, 4)
>> [4, 8)
>> [8, 12)
>> [12, 16)
>> [16, 20]
>>
>>
>> so you can see where the inconsistent behaviour comes from. You
>> might be able to get R-core to add argument 'warn', but probably
>> not to change the default of 'include.lowest'. I hope this helps
>>
>>
>> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada > <mailto:leo.m...@syonic.eu>> wrote:
>>
>> Thank you Andrew.
>>
>>
>> Is there any reason not to make: include.lowest = TRUE the
>> default?
>>
>>
>> Regarding the NA:
>>
>> The user still has to suspect that some values were not
>> included and run that test.
>>
>>
>>     Leonard
>>
>>
>> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>>> Regarding your first point, argument 'include.lowest'
>>> already handles this specific case, see ?.bincode
>>>
>>> Your second point, maybe it could be helpful, but since both
>>> 'cut.default' and '.bincode' return NA if a value isn't
>>> within a bin, you could make something like this on your own.
>>> Might be worth pitching to R-bugs on the wishlist.
>>>
>>>
>>>
>>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>>> mailto:r-help@r-project.org>> wrote:
>>>
>>> Hello List members,
>>>
>>>
>>> the following improvements would be useful for function
>>> cut (and .bincode):
>>>
>>>
>>> 1.) Argument: Include extremes
>>> extremes = TRUE
>>> if(right == FALSE) {
>>>     # include also right for last interval;
>>> } else {
>>>     # include also left for first interval;
>>> }
>>>
>>>
>>> 2.) Argument: w

Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Hello Andrew,


But "cut" generates factors. In most cases with real data one expects to 
have also the ends of the interval: the argument "include.lowest" is 
both ugly and too long.

[The test-code on the ftable thread contains this error! I have run 
through this error a couple of times.]


The only real situation that I can imagine to be problematic:

- if the interval goes to +Inf (or -Inf): I do not know if there would 
be any effects when including +Inf (or -Inf).


Leonard


On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> While it is not explicitly mentioned anywhere in the documentation for 
> .bincode, I suspect 'include.lowest = FALSE' is the default to keep 
> the definitions of the bins consistent. For example:
>
>
> x <- 0:20
> breaks1 <- seq.int(0, 16, 4)
> breaks2 <- seq.int(0, 20, 4)
> cbind(
>     .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
>     .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> )
>
>
> by having 'include.lowest = TRUE' with different ends, you can get 
> inconsistent behaviour. While this probably wouldn't be an issue with 
> 'real' data, this would seem like something you'd want to avoid by 
> default. The definitions of the bins are
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16]
>
>
> and
>
>
> [0, 4)
> [4, 8)
> [8, 12)
> [12, 16)
> [16, 20]
>
>
> so you can see where the inconsistent behaviour comes from. You might 
> be able to get R-core to add argument 'warn', but probably not to 
> change the default of 'include.lowest'. I hope this helps
>
>
> On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada  <mailto:leo.m...@syonic.eu>> wrote:
>
> Thank you Andrew.
>
>
> Is there any reason not to make: include.lowest = TRUE the default?
>
>
> Regarding the NA:
>
> The user still has to suspect that some values were not included
> and run that test.
>
>
> Leonard
>
>
> On 9/18/2021 12:53 AM, Andrew Simmons wrote:
>> Regarding your first point, argument 'include.lowest' already
>> handles this specific case, see ?.bincode
>>
>> Your second point, maybe it could be helpful, but since both
>> 'cut.default' and '.bincode' return NA if a value isn't within a
>> bin, you could make something like this on your own.
>> Might be worth pitching to R-bugs on the wishlist.
>>
>>
>>
>> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
>> mailto:r-help@r-project.org>> wrote:
>>
>> Hello List members,
>>
>>
>> the following improvements would be useful for function cut
>> (and .bincode):
>>
>>
>> 1.) Argument: Include extremes
>> extremes = TRUE
>> if(right == FALSE) {
>>     # include also right for last interval;
>> } else {
>>     # include also left for first interval;
>> }
>>
>>
>> 2.) Argument: warn = TRUE
>>
>> Warn if any values are not included in the intervals.
>>
>>
>> Motivation:
>> - reduce risk of errors when using function cut();
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>> __
>> R-help@r-project.org <mailto:R-help@r-project.org> mailing
>> list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> <https://stat.ethz.ch/mailman/listinfo/r-help>
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> <http://www.R-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible
>> code.
>>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help
Thank you Andrew.


Is there any reason not to make: include.lowest = TRUE the default?


Regarding the NA:

The user still has to suspect that some values were not included and run 
that test.


Leonard


On 9/18/2021 12:53 AM, Andrew Simmons wrote:
> Regarding your first point, argument 'include.lowest' already handles 
> this specific case, see ?.bincode
>
> Your second point, maybe it could be helpful, but since both 
> 'cut.default' and '.bincode' return NA if a value isn't within a bin, 
> you could make something like this on your own.
> Might be worth pitching to R-bugs on the wishlist.
>
>
>
> On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help 
> mailto:r-help@r-project.org>> wrote:
>
> Hello List members,
>
>
> the following improvements would be useful for function cut (and
> .bincode):
>
>
> 1.) Argument: Include extremes
> extremes = TRUE
> if(right == FALSE) {
>     # include also right for last interval;
> } else {
>     # include also left for first interval;
> }
>
>
> 2.) Argument: warn = TRUE
>
> Warn if any values are not included in the intervals.
>
>
> Motivation:
> - reduce risk of errors when using function cut();
>
>
> Sincerely,
>
>
> Leonard
>
> __
> R-help@r-project.org <mailto:R-help@r-project.org> mailing list --
> To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> <https://stat.ethz.ch/mailman/listinfo/r-help>
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> <http://www.R-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Improvement: function cut

2021-09-17 Thread Leonard Mada via R-help

Hello List members,


the following improvements would be useful for function cut (and .bincode):


1.) Argument: Include extremes
extremes = TRUE
if(right == FALSE) {
   # include also right for last interval;
} else {
   # include also left for first interval;
}


2.) Argument: warn = TRUE

Warn if any values are not included in the intervals.


Motivation:
- reduce risk of errors when using function cut();


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R Code] Split long names in format.ftable

2021-09-17 Thread Leonard Mada via R-help

Dear List members,


I have uploaded an improved version on Github. The function is now fully 
functional:

- justify: left, right, cent...: TODO centre vs center;
- sep: separator when printing;
- pos: Top, Bottom; TODO: Middle;

see:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R

I will address some open questions in a separate post.


### Test
# Required:
# - all functions from Github / section "Formatting":
#   from space.builder() to ftable2();


### Data
mtcars$carbCtg = cut(mtcars$carb, c(1, 2, 4, 8), right=FALSE)
tbl = with(mtcars, table(cyl, hp, carbCtg, gear))
id = c(1,3,4);
xnm = c("Long\nname: ", "", "Extremely\nlong\nname: ")
xnm = paste0(xnm, names(dimnames(tbl))[id]);
names(dimnames(tbl))[id] = xnm;
ftbl = ftable(tbl, row.vars = id);

### FTABLE
ftable2(ftbl, sep="|"); # works nicely
ftable2(ftbl, sep=" ")
ftable2(ftbl, sep=" | ")
ftable2(ftbl, sep=" | ", justify="left")
ftable2(ftbl, sep=" | ", justify="cent") # TODO: center vs centre
ftable2(ftbl, sep=" | ", justify="left", justify.lvl="c")


Sincerely,


Leonard


On 9/15/2021 11:14 PM, Leonard Mada wrote:

Dear List members,


I have uploaded an improved version on Github:
- new option: align top vs bottom;

Functions:
split.names: splits and aligns the names;
merge.align: aligns 2 string matrices;
ftable2: enhanced version of format.ftable (proof of concept);
see:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


It makes sense to have such functionality in base R as well: it may be 
useful in various locations to format character output.



Sincerely,


Leonard


On 9/14/2021 8:18 PM, Leonard Mada wrote:

Dear List members,


I wrote some code to split long names in format.ftable. I hope it 
will be useful to others as well.



Ideally, this code should be implemented natively in R. I will 
provide in the 2nd part of the mail a concept how to actually 
implement the code in R. This may be interesting to R-devel as well.


[...]


C.) split.names Function

This function may be useful in other locations as well, particularly 
to split names/labels used in axes and legends in various plots. But 
I do not have much knowledge of the graphics engine in R.



Sincerely,


Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R Code] Split long names in format.ftable

2021-09-15 Thread Leonard Mada via R-help

Dear List members,


I have uploaded an improved version on Github:
- new option: align top vs bottom;

Functions:
split.names: splits and aligns the names;
merge.align: aligns 2 string matrices;
ftable2: enhanced version of format.ftable (proof of concept);
see:
https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


It makes sense to have such functionality in base R as well: it may be 
useful in various locations to format character output.



Sincerely,


Leonard


On 9/14/2021 8:18 PM, Leonard Mada wrote:

Dear List members,


I wrote some code to split long names in format.ftable. I hope it will 
be useful to others as well.



Ideally, this code should be implemented natively in R. I will provide 
in the 2nd part of the mail a concept how to actually implement the 
code in R. This may be interesting to R-devel as well.



### Helper function

# Split the actual names

split.names = function(names, extend=0, justify="Right", 
blank.rm=FALSE, split.ch = "\n", detailed=TRUE) {
    justify = if(is.null(justify)) 0 else pmatch(justify, c("Left", 
"Right"));

    str = strsplit(names, split.ch);
    if(blank.rm) str = lapply(str, function(s) s[nchar(s) > 0]);
    nr  = max(sapply(str, function(s) length(s)));
    nch = lapply(str, function(s) max(nchar(s)));
    chf = function(nch) paste0(rep(" ", nch), collapse="");
    ch0 = sapply(nch, chf);
    mx  = matrix(rep(ch0, each=nr), nrow=nr, ncol=length(names));
    for(nc in seq(length(names))) {
        n = length(str[[nc]]);
        # Justifying
        s = sapply(seq(n), function(nr) paste0(rep(" ", nch[[nc]] - 
nchar(str[[nc]][nr])), collapse=""));
        s = if(justify == 2) paste0(s, str[[nc]]) else 
paste0(str[[nc]], s);

        mx[seq(nr + 1 - length(str[[nc]]), nr) , nc] = s;
    }
    if(extend > 0) {
        mx = cbind(mx, matrix("", nr=nr, ncol=extend));
    }
    if(detailed) attr(mx, "nchar") = unlist(nch);
    return(mx);
}

### ftable with name splitting
# - this code should be ideally integrated inside format.ftable;
ftable2 = function(ftbl, print=TRUE, quote=FALSE, ...) {
    ftbl2 = format(ftbl, quote=quote, ...);
    row.vars = names(attr(ftbl, "row.vars"))
    nr = length(row.vars);
    nms = split.names(row.vars, extend = ncol(ftbl2) - nr);
    ftbl2 = rbind(ftbl2[1,], nms, ftbl2[-c(1,2),]);
    # TODO: update width of factor labels;
    # - new width available in attr(nms, "nchar");
    if(print) {
        cat(t(ftbl2), sep = c(rep(" ", ncol(ftbl2) - 1), "\n"))
    }
    invisible(ftbl2);
}

I have uploaded this code also on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


B.) Detailed Concept
# - I am ignoring any variants;
# - the splitting is actually done in format.ftable;
# - we set only an attribute in ftable;
ftable = function(..., split.ch="\n") {
   [...]
   attr(ftbl, "split.ch") = split.ch; # set an attribute "split.ch"
   return(ftbl);
}

format.ftable(ftbl, ..., split.ch) {
if(is.missing(split.ch)) {
   # check if the split.ch attribute is set and use it;
} else {
   # use the explicitly provided split.ch: if( ! is.null(split.ch))
}
   [...]
}


C.) split.names Function

This function may be useful in other locations as well, particularly 
to split names/labels used in axes and legends in various plots. But I 
do not have much knowledge of the graphics engine in R.



Sincerely,


Leonard




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Evaluating lazily 'f<-' ?

2021-09-15 Thread Leonard Mada via R-help
;>> Definition" claims that the above would be equivalent to:
>>>>
>>>>
>>>> `<-`(df, `padding<-`(df, value = `right<-`(padding(df),
>>>> value = 1)))
>>>>
>>>>
>>>> but that is not correct, and you can tell by using
>>>> `substitute` as you were above. There isn't a way to do
>>>> what you want with the syntax you provided, you'll have to
>>>> do something different. You could add a `which` argument to
>>>> each style function, and maybe put the code for `match.arg`
>>>> in a separate function:
>>>>
>>>>
>>>> match.which <- function (which)
>>>> match.arg(which, c("bottom", "left", "top", "right"),
>>>> several.ok = TRUE)
>>>>
>>>>
>>>> padding <- function (x, which)
>>>> {
>>>>     which <- match.which(which)
>>>>     # more code
>>>> }
>>>>
>>>>
>>>> border <- function (x, which)
>>>> {
>>>>     which <- match.which(which)
>>>>     # more code
>>>> }
>>>>
>>>>
>>>> some_other_style <- function (x, which)
>>>> {
>>>>     which <- match.which(which)
>>>>     # more code
>>>> }
>>>>
>>>>
>>>> I hope this helps.
>>>>
>>>> On Mon, Sep 13, 2021 at 12:17 PM Leonard Mada
>>>> mailto:leo.m...@syonic.eu>> wrote:
>>>>
>>>> Hello Andrew,
>>>>
>>>>
>>>> this could work. I will think about it.
>>>>
>>>>
>>>> But I was thinking more generically. Suppose we have a
>>>> series of functions:
>>>> padding(), border(), some_other_style();
>>>> Each of these functions has the parameter "right" (or
>>>> the group of parameters c("right", ...)).
>>>>
>>>>
>>>> Then I could design a function right(FUN) that assigns
>>>> the value to this parameter and evaluates the function
>>>> FUN().
>>>>
>>>>
>>>> There are a few ways to do this:
>>>>
>>>> 1.) Other parameters as ...
>>>> right(FUN, value, ...) = value; and then pass "..." to FUN.
>>>> right(value, FUN, ...) = value; # or is this the
>>>> syntax? (TODO: explore)
>>>>
>>>> 2.) Another way:
>>>> right(FUN(...other parameters already specified...)) =
>>>> value;
>>>> I wanted to explore this 2nd option: but avoid
>>>> evaluating FUN, unless the parameter "right" is
>>>> injected into the call.
>>>>
>>>> 3.) Option 3:
>>>> The option you mentioned.
>>>>
>>>>
>>>> Independent of the method: there are still
>>>> weird/unexplained behaviours when I try the initial
>>>> code (see the latest mail with the improved code).
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>>
>>>> Leonard
>>>>
>>>>
>>>> On 9/13/2021 6:45 PM, Andrew Simmons wrote:
>>>>> I think you're trying to do something like:
>>>>>
>>>>> `padding<-` <- function (x, which, value)
>>>>> {
>>>>>     which <- match.arg(which, c("bottom", "left",
>>>>> "top", "right"), several.ok = TRUE)
>>>>>     # code to pad to each side here
>>>>> }
>>>>>
>>>>> Then you could use it like
>>>>>
>>>>> df <- data.frame(x=1:5, y = sample(1:5, 5))
>>>>> padding(df, "right") <- 1

[R] [R Code] Split long names in format.ftable

2021-09-14 Thread Leonard Mada via R-help

Dear List members,


I wrote some code to split long names in format.ftable. I hope it will 
be useful to others as well.



Ideally, this code should be implemented natively in R. I will provide 
in the 2nd part of the mail a concept how to actually implement the code 
in R. This may be interesting to R-devel as well.



### Helper function

# Split the actual names

split.names = function(names, extend=0, justify="Right", blank.rm=FALSE, 
split.ch = "\n", detailed=TRUE) {
    justify = if(is.null(justify)) 0 else pmatch(justify, c("Left", 
"Right"));

    str = strsplit(names, split.ch);
    if(blank.rm) str = lapply(str, function(s) s[nchar(s) > 0]);
    nr  = max(sapply(str, function(s) length(s)));
    nch = lapply(str, function(s) max(nchar(s)));
    chf = function(nch) paste0(rep(" ", nch), collapse="");
    ch0 = sapply(nch, chf);
    mx  = matrix(rep(ch0, each=nr), nrow=nr, ncol=length(names));
    for(nc in seq(length(names))) {
        n = length(str[[nc]]);
        # Justifying
        s = sapply(seq(n), function(nr) paste0(rep(" ", nch[[nc]] - 
nchar(str[[nc]][nr])), collapse=""));
        s = if(justify == 2) paste0(s, str[[nc]]) else 
paste0(str[[nc]], s);

        mx[seq(nr + 1 - length(str[[nc]]), nr) , nc] = s;
    }
    if(extend > 0) {
        mx = cbind(mx, matrix("", nr=nr, ncol=extend));
    }
    if(detailed) attr(mx, "nchar") = unlist(nch);
    return(mx);
}
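
A small usage example (output shown approximately; names are padded and 
aligned at the bottom of the returned character matrix):

split.names(c("Extremely\nlong\nname", "cyl"), detailed = FALSE)
#      [,1]        [,2]
# [1,] "Extremely" "   "
# [2,] "     long" "   "
# [3,] "     name" "cyl"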

### ftable with name splitting
# - this code should be ideally integrated inside format.ftable;
ftable2 = function(ftbl, print=TRUE, quote=FALSE, ...) {
    ftbl2 = format(ftbl, quote=quote, ...);
    row.vars = names(attr(ftbl, "row.vars"))
    nr = length(row.vars);
    nms = split.names(row.vars, extend = ncol(ftbl2) - nr);
    ftbl2 = rbind(ftbl2[1,], nms, ftbl2[-c(1,2),]);
    # TODO: update width of factor labels;
    # - new width available in attr(nms, "nchar");
    if(print) {
        cat(t(ftbl2), sep = c(rep(" ", ncol(ftbl2) - 1), "\n"))
    }
    invisible(ftbl2);
}

I have uploaded this code also on Github:

https://github.com/discoleo/R/blob/master/Stat/Tools.Data.R


B.) Detailed Concept
# - I am ignoring any variants;
# - the splitting is actually done in format.ftable;
# - we set only an attribute in ftable;
ftable = function(..., split.ch="\n") {
   [...]
   attr(ftbl, "split.ch") = split.ch; # set an attribute "split.ch"
   return(ftbl);
}

format.ftable(ftbl, ..., split.ch) {
if(is.missing(split.ch)) {
   # check if the split.ch attribute is set and use it;
} else {
   # use the explicitly provided split.ch: if( ! is.null(split.ch))
}
   [...]
}


C.) split.names Function

This function may be useful in other locations as well, particularly to 
split names/labels used in axes and legends in various plots. But I do 
not have much knowledge of the graphics engine in R.



Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fastest way to extract rows of smaller matrix many times by index to make larger matrix? and multiply columsn of matrix by vector

2021-09-14 Thread Leonard Mada via R-help

Hello Nevil,


you could test something like:


# the Matrix
m = matrix(1:1000, ncol=10)
m = t(m)

# Extract Data
idcol = sample(seq(100), 100, TRUE); # now columns
for(i in 1:100) {
    m2 = m[ , idcol];
}
m2 = t(m2); # transpose back


It may be faster, although I did not benchmark it.
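
A rough way to check it with system.time (results will of course depend 
on the actual sizes and on how many indices are extracted at a time):

m  = matrix(1:1000, ncol = 10);   # 100 x 10
mt = t(m);                        # 10 x 100
id = sample(seq(100), 1e5, TRUE);
system.time(for(i in 1:100) m1 <- m[id, ]);
system.time(for(i in 1:100) m2 <- t(mt[ , id]));
identical(m1, m2)  # TRUE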


There may be more complex variants. Maybe it is warranted to try for 
10^7 extractions:


- e.g. extracting one row and replacing all occurrences of that row;


Sincerely,


Leonard





It seems I cannot extract digested mail anymore. I hope though that the 
message is processed properly.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help
   series of functions:
>>> padding(), border(), some_other_style();
>>> Each of these functions has the parameter "right" (or the
>>> group of parameters c("right", ...)).
>>>
>>>
>>> Then I could design a function right(FUN) that assigns the
>>> value to this parameter and evaluates the function FUN().
>>>
>>>
>>> There are a few ways to do this:
>>>
>>> 1.) Other parameters as ...
>>> right(FUN, value, ...) = value; and then pass "..." to FUN.
>>> right(value, FUN, ...) = value; # or is this the syntax?
>>> (TODO: explore)
>>>
>>> 2.) Another way:
>>> right(FUN(...other parameters already specified...)) = value;
>>> I wanted to explore this 2nd option: but avoid evaluating
>>> FUN, unless the parameter "right" is injected into the call.
>>>
>>> 3.) Option 3:
>>> The option you mentioned.
>>>
>>>
>>> Independent of the method: there are still weird/unexplained
>>> behaviours when I try the initial code (see the latest mail
>>> with the improved code).
>>>
>>>
>>> Sincerely,
>>>
>>>
>>> Leonard
>>>
>>>
>>> On 9/13/2021 6:45 PM, Andrew Simmons wrote:
>>>> I think you're trying to do something like:
>>>>
>>>> `padding<-` <- function (x, which, value)
>>>> {
>>>>     which <- match.arg(which, c("bottom", "left", "top",
>>>> "right"), several.ok = TRUE)
>>>>     # code to pad to each side here
>>>> }
>>>>
>>>> Then you could use it like
>>>>
>>>> df <- data.frame(x=1:5, y = sample(1:5, 5))
>>>> padding(df, "right") <- 1
>>>>
>>>> Does that work as expected for you?
>>>>
>>>> On Mon, Sep 13, 2021, 11:28 Leonard Mada via R-help
>>>> mailto:r-help@r-project.org>> wrote:
>>>>
>>>> I try to clarify the code:
>>>>
>>>>
>>>> ###
>>>> right = function(x, val) {print("Right");};
>>>> padding = function(x, right, left, top, bottom)
>>>> {print("Padding");};
>>>> 'padding<-' = function(x, ...) {print("Padding = ");};
>>>> df = data.frame(x=1:5, y = sample(1:5, 5)); # anything
>>>>
>>>> ### Does NOT work as expected
>>>> 'right<-' = function(x, value) {
>>>>  print("This line should be the first printed!")
>>>>  print("But ERROR: x was already evaluated, which
>>>> printed \"Padding\"");
>>>>  x = substitute(x); # x was already evaluated
>>>> before substitute();
>>>>  return("Nothing"); # do not now what the behaviour
>>>> should be?
>>>> }
>>>>
>>>> right(padding(df)) = 1;
>>>>
>>>> ### Output:
>>>>
>>>> [1] "Padding"
>>>> [1] "This line should be the first printed!"
>>>> [1] "But ERROR: x was already evaluated, which printed
>>>> \"Padding\""
>>>> [1] "Padding = " # How did this happen ???
>>>>
>>>>
>>>> ### Problems:
>>>>
>>>> 1.) substitute(x): did not capture the expression;
>>>> - the first parameter of 'right<-' was already
>>>> evaluated, which is not
>>>> the case with '%f%';
>>>> Can I avoid evaluating this parameter?
>>>> How can I avoid to evaluate it and capture the
>>>> expression: "right(...)"?
>>>>
>>>>
>>>> 2.) Unexpected
>>>> 'padding<-' was also called!
>>>> I did not know this

Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help
Hello,


I have found the evaluation: it is described in the section on 
subsetting. The forced evaluation makes sense for subsetting.


On 9/13/2021 9:42 PM, Leonard Mada wrote:
>
> Hello Andrew,
>
>
> I try now to understand the evaluation of the expression:
>
> e = expression(r(x) <- 1)
>
> # parameter named "value" seems to be required;
> 'r<-' = function(x, value) {print("R");}
> eval(e, list(x=2))
> # [1] "R"
>
> # both versions work
> 'r<-' = function(value, x) {print("R");}
> eval(e, list(x=2))
> # [1] "R"
>
>
> ### the Expression
> e[[1]][[1]] # "<-", not "r<-"
> e[[1]][[2]] # "r(x)"
>
>
> The evaluation of "e" somehow calls "r<-", but evaluates also the 
> argument of r(...). I am still investigating what is actually happening.
>

The forced evaluation is relevant for subsetting, e.g.:
expression(r(x)[3] <- 1)
expression(r(x)[3] <- 1)[[1]][[2]]
# r(x)[3] # the evaluation details are NOT visible in the expression per se;
# Note: indeed, it makes sens to first evaluate r(x) and then to perform 
the subsetting;


However, in the case of a non-subsetted expression:
r(x) <- 1;
It would make sense to evaluate lazily r(x) if no subsetting is involved 
(more precisely "r<-"(x, value) ).

Would this have any impact on the current code?


Sincerely,


Leonard


>
> Sincerely,
>
>
> Leonard
>
>
> On 9/13/2021 9:15 PM, Andrew Simmons wrote:
>> R's parser doesn't work the way you're expecting it to. When doing an 
>> assignment like:
>>
>>
>> padding(right(df)) <- 1
>>
>>
>> it is broken into small stages. The guide "R Language Definition" 
>> claims that the above would be equivalent to:
>>
>>
>> `<-`(df, `padding<-`(df, value = `right<-`(padding(df), value = 1)))
>>
>>
>> but that is not correct, and you can tell by using `substitute` as 
>> you were above. There isn't a way to do what you want with the syntax 
>> you provided, you'll have to do something different. You could add a 
>> `which` argument to each style function, and maybe put the code for 
>> `match.arg` in a separate function:
>>
>>
>> match.which <- function (which)
>> match.arg(which, c("bottom", "left", "top", "right"), several.ok = TRUE)
>>
>>
>> padding <- function (x, which)
>> {
>>     which <- match.which(which)
>>     # more code
>> }
>>
>>
>> border <- function (x, which)
>> {
>>     which <- match.which(which)
>>     # more code
>> }
>>
>>
>> some_other_style <- function (x, which)
>> {
>>     which <- match.which(which)
>>     # more code
>> }
>>
>>
>> I hope this helps.
>>
>> On Mon, Sep 13, 2021 at 12:17 PM Leonard Mada > <mailto:leo.m...@syonic.eu>> wrote:
>>
>> Hello Andrew,
>>
>>
>> this could work. I will think about it.
>>
>>
>> But I was thinking more generically. Suppose we have a series of
>> functions:
>> padding(), border(), some_other_style();
>> Each of these functions has the parameter "right" (or the group
>> of parameters c("right", ...)).
>>
>>
>> Then I could design a function right(FUN) that assigns the value
>> to this parameter and evaluates the function FUN().
>>
>>
>> There are a few ways to do this:
>>
>> 1.) Other parameters as ...
>> right(FUN, value, ...) = value; and then pass "..." to FUN.
>> right(value, FUN, ...) = value; # or is this the syntax? (TODO:
>> explore)
>>
>> 2.) Another way:
>> right(FUN(...other parameters already specified...)) = value;
>> I wanted to explore this 2nd option: but avoid evaluating FUN,
>> unless the parameter "right" is injected into the call.
>>
>> 3.) Option 3:
>> The option you mentioned.
>>
>>
>> Independent of the method: there are still weird/unexplained
>> behaviours when I try the initial code (see the latest mail with
>> the improved code).
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>
>> On 9/13/2021 6:45 PM, Andrew Simmons wrote:
>>> I think you're trying to do something like:
>>>
>>> `padding<-` <- function (x, which, value)
>>> {
>>>     which <- match.ar

Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help
Hello Andrew,


I try now to understand the evaluation of the expression:

e = expression(r(x) <- 1)

# parameter named "value" seems to be required;
'r<-' = function(x, value) {print("R");}
eval(e, list(x=2))
# [1] "R"

# both versions work
'r<-' = function(value, x) {print("R");}
eval(e, list(x=2))
# [1] "R"


### the Expression
e[[1]][[1]] # "<-", not "r<-"
e[[1]][[2]] # "r(x)"


The evaluation of "e" somehow calls "r<-", but evaluates also the 
argument of r(...). I am still investigating what is actually happening.
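
For reference, the documented behaviour of an ordinary replacement 
function, as a minimal self-contained sketch (the attribute-based body is 
only for illustration): r(x) <- value behaves essentially like 
x <- 'r<-'(x, value = value), which is why the argument ends up evaluated.

'r<-' = function(x, value) { attr(x, "r") = value; x }
x = 1:3
r(x) <- 99
attributes(x)
# $r
# [1] 99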


Sincerely,


Leonard


On 9/13/2021 9:15 PM, Andrew Simmons wrote:
> R's parser doesn't work the way you're expecting it to. When doing an 
> assignment like:
>
>
> padding(right(df)) <- 1
>
>
> it is broken into small stages. The guide "R Language Definition" 
> claims that the above would be equivalent to:
>
>
> `<-`(df, `padding<-`(df, value = `right<-`(padding(df), value = 1)))
>
>
> but that is not correct, and you can tell by using `substitute` as you 
> were above. There isn't a way to do what you want with the syntax you 
> provided, you'll have to do something different. You could add a 
> `which` argument to each style function, and maybe put the code for 
> `match.arg` in a separate function:
>
>
> match.which <- function (which)
> match.arg(which, c("bottom", "left", "top", "right"), several.ok = TRUE)
>
>
> padding <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
>
> border <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
>
> some_other_style <- function (x, which)
> {
>     which <- match.which(which)
>     # more code
> }
>
>
> I hope this helps.
>
> On Mon, Sep 13, 2021 at 12:17 PM Leonard Mada  <mailto:leo.m...@syonic.eu>> wrote:
>
> Hello Andrew,
>
>
> this could work. I will think about it.
>
>
> But I was thinking more generically. Suppose we have a series of
> functions:
> padding(), border(), some_other_style();
> Each of these functions has the parameter "right" (or the group of
> parameters c("right", ...)).
>
>
> Then I could design a function right(FUN) that assigns the value
> to this parameter and evaluates the function FUN().
>
>
> There are a few ways to do this:
>
> 1.) Other parameters as ...
> right(FUN, value, ...) = value; and then pass "..." to FUN.
> right(value, FUN, ...) = value; # or is this the syntax? (TODO:
> explore)
>
> 2.) Another way:
> right(FUN(...other parameters already specified...)) = value;
> I wanted to explore this 2nd option: but avoid evaluating FUN,
> unless the parameter "right" is injected into the call.
>
> 3.) Option 3:
> The option you mentioned.
>
>
> Independent of the method: there are still weird/unexplained
> behaviours when I try the initial code (see the latest mail with
> the improved code).
>
>
> Sincerely,
>
>
> Leonard
>
>
>     On 9/13/2021 6:45 PM, Andrew Simmons wrote:
>> I think you're trying to do something like:
>>
>> `padding<-` <- function (x, which, value)
>> {
>>     which <- match.arg(which, c("bottom", "left", "top",
>> "right"), several.ok = TRUE)
>>     # code to pad to each side here
>> }
>>
>> Then you could use it like
>>
>> df <- data.frame(x=1:5, y = sample(1:5, 5))
>> padding(df, "right") <- 1
>>
>> Does that work as expected for you?
>>
>> On Mon, Sep 13, 2021, 11:28 Leonard Mada via R-help
>> mailto:r-help@r-project.org>> wrote:
>>
>> I try to clarify the code:
>>
>>
>> ###
>> right = function(x, val) {print("Right");};
>> padding = function(x, right, left, top, bottom)
>> {print("Padding");};
>> 'padding<-' = function(x, ...) {print("Padding = ");};
>> df = data.frame(x=1:5, y = sample(1:5, 5)); # anything
>>
>> ### Does NOT work as expected
>> 'right<-' = function(x, value) {
>>  print("This line should be the first printed!")
>>  print("But ERROR: x was already evaluated, which printed
>> \"Padding\"&quo

Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help
Hello Andrew,


this could work. I will think about it.


But I was thinking more generically. Suppose we have a series of functions:
padding(), border(), some_other_style();
Each of these functions has the parameter "right" (or the group of 
parameters c("right", ...)).


Then I could design a function right(FUN) that assigns the value to this 
parameter and evaluates the function FUN().


There are a few ways to do this:

1.) Other parameters as ...
right(FUN, value, ...) = value; and then pass "..." to FUN.
right(value, FUN, ...) = value; # or is this the syntax? (TODO: explore)

2.) Another way:
right(FUN(...other parameters already specified...)) = value;
I wanted to explore this 2nd option, but avoid evaluating FUN until 
the parameter "right" is injected into the call.

3.) Option 3:
The option you mentioned.
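
To make option 1 concrete, a rough, hypothetical sketch (the names, the
signature, and the cat() output below are purely illustrative, not an
existing API):

### Option 1, sketched
right = function(FUN, x, value, ...) {
    # inject the "right" argument and forward everything else to FUN
    FUN(x, right = value, ...)
}
padding = function(x, right = 0, left = 0, top = 0, bottom = 0) {
    cat("padding: right =", right, "\n")
    invisible(x)
}
df = data.frame(x = 1:5, y = sample(1:5, 5))
right(padding, df, 1)
# padding: right = 1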


Independent of the method: there are still weird/unexplained behaviours 
when I try the initial code (see the latest mail with the improved code).


Sincerely,


Leonard


On 9/13/2021 6:45 PM, Andrew Simmons wrote:
> I think you're trying to do something like:
>
> `padding<-` <- function (x, which, value)
> {
>     which <- match.arg(which, c("bottom", "left", "top", "right"), 
> several.ok = TRUE)
>     # code to pad to each side here
> }
>
> Then you could use it like
>
> df <- data.frame(x=1:5, y = sample(1:5, 5))
> padding(df, "right") <- 1
>
> Does that work as expected for you?
>
> On Mon, Sep 13, 2021, 11:28 Leonard Mada via R-help 
> mailto:r-help@r-project.org>> wrote:
>
> I try to clarify the code:
>
>
> ###
> right = function(x, val) {print("Right");};
> padding = function(x, right, left, top, bottom) {print("Padding");};
> 'padding<-' = function(x, ...) {print("Padding = ");};
> df = data.frame(x=1:5, y = sample(1:5, 5)); # anything
>
> ### Does NOT work as expected
> 'right<-' = function(x, value) {
>  print("This line should be the first printed!")
>  print("But ERROR: x was already evaluated, which printed
> \"Padding\"");
>  x = substitute(x); # x was already evaluated before substitute();
>  return("Nothing"); # do not know what the behaviour should be?
> }
>
> right(padding(df)) = 1;
>
> ### Output:
>
> [1] "Padding"
> [1] "This line should be the first printed!"
> [1] "But ERROR: x was already evaluated, which printed \"Padding\""
> [1] "Padding = " # How did this happen ???
>
>
> ### Problems:
>
> 1.) substitute(x): did not capture the expression;
> - the first parameter of 'right<-' was already evaluated, which is
> not
> the case with '%f%';
> Can I avoid evaluating this parameter?
> How can I avoid evaluating it and capture the expression:
> "right(...)"?
>
>
> 2.) Unexpected
> 'padding<-' was also called!
> I did not know this. Is it feature or bug?
> R 4.0.4
>
>
> Sincerely,
>
>
> Leonard
>
>
> On 9/13/2021 4:45 PM, Duncan Murdoch wrote:
> > On 13/09/2021 9:38 a.m., Leonard Mada wrote:
> >> Hello,
> >>
> >>
> >> I can include code for "padding<-" as well, but the error is
> before that,
> >> namely in 'right<-':
> >>
> >> right = function(x, val) {print("Right");};
> >> # more options:
> >> padding = function(x, right, left, top, bottom)
> {print("Padding");};
> >> 'padding<-' = function(x, ...) {print("Padding = ");};
> >> df = data.frame(x=1:5, y = sample(1:5, 5));
> >>
> >>
> >> ### Does NOT work
> >> 'right<-' = function(x, val) {
> >>         print("Already evaluated and also does not use 'val'");
> >>         x = substitute(x); # x was evaluated before
> >> }
> >>
> >> right(padding(df)) = 1;
> >
> > It "works" (i.e. doesn't generate an error) for me, when I correct
> > your typo:  the second argument to `right<-` should be `value`, not
> > `val`.
> >
> > I'm still not clear whether it does what you want with that fix,
> > because I don't really understand what you want.
> >
> > Duncan Murdoch
> >
> >>
> >>
> >> I want to capture the assignment event i

Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help

I try to clarify the code:


###
right = function(x, val) {print("Right");};
padding = function(x, right, left, top, bottom) {print("Padding");};
'padding<-' = function(x, ...) {print("Padding = ");};
df = data.frame(x=1:5, y = sample(1:5, 5)); # anything

### Does NOT work as expected
'right<-' = function(x, value) {
    print("This line should be the first printed!")
    print("But ERROR: x was already evaluated, which printed \"Padding\"");
    x = substitute(x); # x was already evaluated before substitute();
    return("Nothing"); # do not now what the behaviour should be?
}

right(padding(df)) = 1;

### Output:

[1] "Padding"
[1] "This line should be the first printed!"
[1] "But ERROR: x was already evaluated, which printed \"Padding\""
[1] "Padding = " # How did this happen ???


### Problems:

1.) substitute(x): did not capture the expression;
- the first parameter of 'right<-' was already evaluated, which is not 
the case with '%f%';

Can I avoid evaluating this parameter?
How can I avoid evaluating it and capture the expression "right(...)"?


2.) Unexpected
'padding<-' was also called!
I did not know this. Is it a feature or a bug?
R 4.0.4
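
Regarding problem 2): the "Subset assignment" section of the R Language
Definition documents that a nested replacement call is rewritten so that
the inner function is evaluated to fetch the object and the outer
replacement function is called to write the result back; so this looks
like documented behaviour rather than a bug. A simplified sketch of the
steps, using the definitions above (the names `*tmp*`, tmp2 and tmp3 are
illustrative; the interpreter does this internally):

### Sketch: roughly the steps behind  right(padding(df)) <- 1
`*tmp*` = df
tmp2 = padding(`*tmp*`)                       # prints "Padding"
tmp3 = `right<-`(tmp2, value = 1)             # prints the two messages
df   = `padding<-`(`*tmp*`, value = tmp3)     # prints "Padding = "
rm(`*tmp*`)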


Sincerely,


Leonard


On 9/13/2021 4:45 PM, Duncan Murdoch wrote:

On 13/09/2021 9:38 a.m., Leonard Mada wrote:

Hello,


I can include code for "padding<-" as well, but the error is before that,
namely in 'right<-':

right = function(x, val) {print("Right");};
# more options:
padding = function(x, right, left, top, bottom) {print("Padding");};
'padding<-' = function(x, ...) {print("Padding = ");};
df = data.frame(x=1:5, y = sample(1:5, 5));


### Does NOT work
'right<-' = function(x, val) {
        print("Already evaluated and also does not use 'val'");
        x = substitute(x); # x was evaluated before
}

right(padding(df)) = 1;


It "works" (i.e. doesn't generate an error) for me, when I correct 
your typo:  the second argument to `right<-` should be `value`, not 
`val`.


I'm still not clear whether it does what you want with that fix, 
because I don't really understand what you want.


Duncan Murdoch




I want to capture the assignment event inside "right<-" and then call
the function padding() properly.

I haven't thought yet if I should use:

padding(x, right, left, ... other parameters);

or

padding(x, parameter) <- value;


It also depends if I can properly capture the unevaluated expression
inside "right<-":

'right<-' = function(x, val) {

# x is automatically evaluated when using 'f<-'!

# but not when implementing as '%f%' = function(x, y);

}


Many thanks,


Leonard


On 9/13/2021 4:11 PM, Duncan Murdoch wrote:

On 12/09/2021 10:33 a.m., Leonard Mada via R-help wrote:

How can I avoid evaluation?

right = function(x, val) {print("Right");};
padding = function(x) {print("Padding");};
df = data.frame(x=1:5, y = sample(1:5, 5));

### OK
'%=%' = function(x, val) {
       x = substitute(x);
}
right(padding(df)) %=% 1; # but ugly

### Does NOT work
'right<-' = function(x, val) {
       print("Already evaluated and also does not use 'val'");
       x = substitute(x); # is evaluated before
}

right(padding(df)) = 1


That doesn't make sense.  You don't have a `padding<-` function, and
yet you are trying to call right<- to assign something to padding(df).

I'm not sure about your real intention, but assignment functions by
their nature need to evaluate the thing they are assigning to, since
they are designed to modify objects, not create new ones.

To create a new object, just use regular assignment.

Duncan Murdoch




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help

Hello,


I can include code for "padding<-" as well, but the error is before that, 
namely in 'right<-':


right = function(x, val) {print("Right");};
# more options:
padding = function(x, right, left, top, bottom) {print("Padding");};
'padding<-' = function(x, ...) {print("Padding = ");};
df = data.frame(x=1:5, y = sample(1:5, 5));


### Does NOT work
'right<-' = function(x, val) {
      print("Already evaluated and also does not use 'val'");
      x = substitute(x); # x was evaluated before
}

right(padding(df)) = 1;


I want to capture the assignment event inside "right<-" and then call 
the function padding() properly.


I haven't yet decided whether I should use:

padding(x, right, left, ... other parameters);

or

padding(x, parameter) <- value;


It also depends on whether I can properly capture the unevaluated expression 
inside "right<-":


'right<-' = function(x, val) {

# x is automatically evaluated when using 'f<-'!

# but not when implementing as '%f%' = function(x, y);

}
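
A small probe of that difference (a sketch; what substitute() reports
inside a replacement function is an internal detail of the assignment
rewriting and should not be relied upon):

### Probe: substitute() in 'f<-' vs. in an operator
'probe<-' = function(x, value) { print(substitute(x)); x }
'%probe%' = function(x, value) { print(substitute(x)); invisible(value) }
y = 1
probe(y) <- 2   # typically prints `*tmp*`, not the original expression
y %probe% 2     # prints y, the unevaluated expression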


Many thanks,


Leonard


On 9/13/2021 4:11 PM, Duncan Murdoch wrote:

On 12/09/2021 10:33 a.m., Leonard Mada via R-help wrote:

How can I avoid evaluation?

right = function(x, val) {print("Right");};
padding = function(x) {print("Padding");};
df = data.frame(x=1:5, y = sample(1:5, 5));

### OK
'%=%' = function(x, val) {
      x = substitute(x);
}
right(padding(df)) %=% 1; # but ugly

### Does NOT work
'right<-' = function(x, val) {
      print("Already evaluated and also does not use 'val'");
      x = substitute(x); # is evaluated before
}

right(padding(df)) = 1


That doesn't make sense.  You don't have a `padding<-` function, and 
yet you are trying to call right<- to assign something to padding(df).


I'm not sure about your real intention, but assignment functions by 
their nature need to evaluate the thing they are assigning to, since 
they are designed to modify objects, not create new ones.


To create a new object, just use regular assignment.

Duncan Murdoch


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Evaluating lazily 'f<-' ?

2021-09-13 Thread Leonard Mada via R-help

How can I avoid evaluation?

right = function(x, val) {print("Right");};
padding = function(x) {print("Padding");};
df = data.frame(x=1:5, y = sample(1:5, 5));

### OK
'%=%' = function(x, val) {
    x = substitute(x);
}
right(padding(df)) %=% 1; # but ugly

### Does NOT work
'right<-' = function(x, val) {
    print("Already evaluated and also does not use 'val'");
    x = substitute(x); # is evaluated before
}

right(padding(df)) = 1
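
For comparison, the %=% form never forces its left-hand side, so the
whole nested call can be captured and dissected (a sketch; the printed
pieces are the unevaluated expressions):

### Sketch: dissecting the captured call in the %=% form
'%=%' = function(x, value) {
    expr = substitute(x)    # the unevaluated call: right(padding(df))
    print(expr[[1]])        # right
    print(expr[[2]])        # padding(df)
    invisible(value)
}
right(padding(df)) %=% 1
# right
# padding(df)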


Sincerely,


Leonard

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.