Re: [R] [External] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

2022-04-08 Thread Eric Berger
library(dplyr)
my_df |> group_by(my_category) |> summarise(my_z = cor(my_x, my_y))
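
For a base-R route, here is a minimal sketch (not from the thread; it rebuilds the example data from the original post so that it runs on its own). The idea is either to hand each group's rows to the function as a data frame, or to split both vectors by the grouping variable and pair the pieces with mapply(). The names res_by, res_split and res_mapply are only illustrative.

```r
# Example data, rebuilt as in the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# by(): FUN receives each group's rows as a data frame
res_by <- by(my_df, my_df$my_category, function(d) cor(d$my_x, d$my_y))

# split() + sapply(): same idea, but returns a plain named vector
res_split <- sapply(split(my_df, my_df$my_category),
                    function(d) cor(d$my_x, d$my_y))

# mapply(): split each vector by group, then pair the pieces up
res_mapply <- mapply(cor,
                     split(my_x, my_category),
                     split(my_y, my_category))
```

The same pattern should carry over to any function of two or more vectors: replace cor() inside the anonymous function, or pass further split() results to mapply(). Note that the native pipe |> used above needs R 4.1.0 or later.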


On Sat, Apr 9, 2022 at 4:37 AM Richard M. Heiberger  wrote:

> look at
> ?mapply
> Apply a Function to Multiple List or Vector Arguments
>
> to see if that meets your needs
>
> > On Apr 08, 2022, at 21:26, Kelly Thompson  wrote:
> >
> > #Q. How can I "apply" a function that takes two or more vectors as
> > arguments, such as cor(x, y), over a "category" or "grouping variable"
> > or "index"?
> > #I'm using cor() as an example, I'd like to find a way to do this for
> > any function that takes 2 or more vectors as arguments.
> >
> >
> > #create example data
> >
> > my_category <- rep ( c("a","b","c"),  4)
> >
> > set.seed(12345)
> > my_x <- rnorm(12)
> >
> > set.seed(54321)
> > my_y <- rnorm(12)
> >
> > my_df <- data.frame(my_category, my_x, my_y)
> >
> > #review data
> > my_df
> >
> > #If i wanted to get the correlation of x and y grouped by category, I
> > could use this code and loop:
> >
> > my_category_unique <- unique(my_category)
> >
> > my_results <- vector("list", length(my_category_unique) )
> > names(my_results) <- my_category_unique
> >
> > #start i loop
> > for (i in 1:length(my_category_unique) ) {
> >   my_criteria_i <- my_category == my_category_unique[i]
> >   my_x_i <- my_x[which(my_criteria_i)]
> >   my_y_i <- my_y[which(my_criteria_i)]
> >   my_correl_i <- cor(x = my_x_i, y = my_y_i)
> >   my_results[i] <- list(my_correl_i)
> > } # end i loop
> >
> > #review results
> > my_results
> >
> > #Q. Is there a better or more "elegant" way to do this, using by(),
> > aggregate(), apply(), or some other function?
> >
> > #This does not work and results in this error message: "Error in
> > FUN(dd[x, ], ...) : incompatible dimensions"
> > by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
> >
> > #This does not work and results in this error message: "Error in
> > cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like
> > 'x' "
> > by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor
> > (my_df$x, my_df$y) } )
> >
> >
> > #if I wanted the mean of x by category, I could use by() or aggregate():
> > by (data = my_x, INDICES = my_category, FUN = mean)
> >
> > aggregate(x = my_x, by = list(my_category), FUN = mean)
> >
> > #Thanks!
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



Re: [R] error with more 100 forked processes

2022-04-08 Thread Henrik Bengtsson
One reason you may hit the limit at around 100 workers is that other
connections are already open, e.g. file connections, capture.output(),
etc.
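
As a rough check (a sketch, not part of the original reply), base R's showConnections() lists the connections a session currently holds. With all = TRUE it also includes the three standard ones (stdin, stdout, stderr), so the row count gives an idea of how much of the connection table is already taken before any workers are started. The name open_conns is only illustrative.

```r
# One row per connection currently known to this R session,
# including stdin, stdout and stderr when all = TRUE
open_conns <- showConnections(all = TRUE)

# Number of connection slots already in use
nrow(open_conns)
```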

If you want to use *forked* processing with more than 125 workers
using bare-bones R, you can use parallel::mclapply() and friends,
because they don't use socket connections to communicate between the
main process and the workers.
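
A minimal sketch of that route, with an arbitrary toy task and a deliberately small worker count (both are assumptions for illustration, not values from the thread):

```r
library(parallel)

# Illustrative worker count; on a large machine this could be much higher
n_workers <- 2L

# mclapply() forks the current process instead of opening socket
# connections to separate worker processes
res <- mclapply(1:8, function(i) i^2, mc.cores = n_workers)

unlist(res)
```

Forking is not available on Windows, where mclapply() only accepts mc.cores = 1.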

If you don't need *forked* processing per se, there are other
alternatives, as already pointed out above.

As the author of the future framework (https://www.futureverse.org/),
I obviously suggest you try that one. It's on CRAN and installs out of
the box on all OSes. You get several alternatives for parallel
backends. For *forked* processing, call plan(multicore) at the top of
your script, and it'll parallelize via the parallel::mclapply()
framework internally, so you won't have the connection limitation to
worry about(*). You can also use plan(future.callr::callr) to
parallelize via the callr package, which also doesn't have the
connection limitation. Your code will be the same regardless of which
you end up using. For the front end, there's
future.apply::future_lapply() et al. (parallel versions of base lapply
functions), furrr::future_map() et al. (parallel versions of purrr's
map functions), and foreach w/ doFuture if you like the
y <- foreach(...) %dopar% { ... } style.
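
As a small sketch of the above, assuming the future and future.apply packages are installed from CRAN (the task and the worker count of 2 are placeholders chosen only for illustration):

```r
library(future)
library(future.apply)

# Forked backend; where forking is unsupported (e.g. on Windows),
# future falls back to sequential processing
plan(multicore, workers = 2)

# Same call shape as base lapply()
res <- future_lapply(1:4, function(i) i^2)

unlist(res)
```

Switching to another backend should only require changing the plan() line; the future_lapply() call stays as it is.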

(*) But there are other issues with forked processing, e.g. it might
not be compatible with multi-threaded code used by some packages. This
is a problem independent of futures per se.

Hope this helps

Henrik

On Fri, Apr 8, 2022 at 2:19 PM Ivan Krylov  wrote:
>
> On Fri, 8 Apr 2022 22:02:25 +0200
> Guido Kraemer via R-help  wrote:
>
> >  > cl <- makeForkCluster(128)
> > Error in UseMethod("sendData") :
> >   no applicable method for 'sendData' applied to an object of class "NULL"
>
> In order to communicate with the workers, R creates connection objects.
> Unfortunately, the memory for connection objects in R has a
> statically-defined limit of 128. (A few connections are used by
> default, and a few more will likely be used by user code during the
> actual program run.)
>
> Try increasing the limit in #define NCONNECTIONS in
> src/main/connections.c and re-compiling R.
>
> See also: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28
> According to Henrik Bengtsson, R should work well even with as many
> as 16381 possible connections, but then you may run into OS limits on
> file descriptors.
>
>
> --
> Best regards,
> Ivan
>


[R] error with more 100 forked processes

2022-04-08 Thread Guido Kraemer via R-help
I am trying to run a parallel job on a computer with many CPUs and get 
the following error:


> library(parallel)
> cl <- makeForkCluster(128)
Error in UseMethod("sendData") :
  no applicable method for 'sendData' applied to an object of class "NULL"

If I scale down to 100 CPUs, it doesn't produce an error. I can reproduce
this with a self-compiled R 4.1.3 on Ubuntu 20.04 and Manjaro, as well
as with the R binaries that come with both distributions.



--
Guido Kraemer
Max Planck Institute for Biogeochemistry Jena
Department for Biogeochemical Integration
Hans-Knöll-Str. 10
07745 Jena
Germany

phone: +49 3641 576293
e-mail: gkrae...@bgc-jena.mpg.de



Re: [ESS] [External] Emacs 28.1 Released

2022-04-08 Thread Marc Schwartz via ESS-help
Hi,

Vincent, you might want to connect with David Caldwell on the issues that you 
are facing. I noted that on this page:

  https://emacsformacosx.com/about

he notes that there are changes to the app launcher that may be relevant to 
what you are experiencing. Whether there is an intentional change in behavior, 
or possible bug, may be something to evaluate.

If people find value in your distributions, given ease of use considerations, 
it seems to be worthwhile to continue to support them.

For Rich, note that since last month, with the Emacs 27.2 release, the macOS 
Emacs binaries from David Caldwell are universal, supporting Apple silicon.

More generally, for some time, even with Emacs 26.x, I had been using the 
MELPA-based ESS code, rather than the 18.10.2 release code, along with some 
tweaks in my .emacs to try to work around some of the ESS issues that became 
evident with Emacs 27.x. ESS 18.10.2 is now over 3 years old, which is circa 
Emacs 26.1.

Regards,

Marc


On April 8, 2022 at 11:44:50 AM, Richard M. Heiberger via ESS-help 
(ess-help@r-project.org) wrote:

> Thank you Vincent for your distribution. Please keep it up.
>
> I used to do my own setup until xxx time ago and have used yours (both 
> Windows and Mac) since.
> I override some of your personalizations with my own preferences and it 
> works.
>
> When I switched my Mac to the M1 last year I had some difficulties with the 
> Rosetta emulation of Emacs itself,
> so Simon Urbanek was kind enough to give me a native compilation of Emacs for 
> the M1.
> I substituted that into what was otherwise your distribution and have been 
> quite happy.
>
> For the future, I am very happy not to deal directly with the issues you 
> mention in this email.
>
> Rich
>
>
> > On Apr 08, 2022, at 11:11, Vincent Goulet via ESS-help wrote:
> >
> > Hi,
> >
> > Thanks Marc (and Richard L through GitLab) for the heads up.
> >
> > I tried building my Emacs distribution (on macOS) and stumbled on a weird 
> > problem: the 'site-lisp' directory within the application (e.g. 
> > /Applications/Emacs/Emacs.app/Contents/Resources/site-lisp) is not included 
> > in 'load-path' by default. Since this is where I bundle extensions, they 
> > are not recognized by Emacs. Perhaps the issue is upstream with David 
> > Caldwell's compilation; I'll have to check. I haven't yet taken the time to 
> > check on Windows.
> >
> > That said, ESS 18.10.2 does not compile with Emacs 28.1. It appears it is 
> > time to move forward to the development version of ESS. Those, like me, who 
> > prefer the good ol' stable ESS 18.10 are otherwise stuck on Emacs 27.x. ;-)
> >
> > Over the past few years, Emacs has moved consistently towards the ELPA 
> > package management system. Pretty much anyone able to use Emacs should now 
> > be able to install extensions easily. Org has deprecated the .zip 
> > distribution. Same for ESS de facto, at least currently. This leads me to 
> > question whether maintaining my distribution is still that useful. Any 
> > thoughts?
> >
> > (For anyone not familiar, my Emacs distributions for macOS and Windows are 
> > stock GNU Emacs with ESS, AUCTeX, Org and some very minor configuration; 
> > see https://vigou3.gitlab.io/emacs-modified-macos and
> > https://vigou3.gitlab.io/emacs-modified-windows.)
> >
> > Best,
> >
> > v.
> >
> > Vincent Goulet
> > Professeur titulaire
> > École d'actuariat, Université Laval

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


Re: [R] [External] add equation and rsqared to plot

2022-04-08 Thread Bert Gunter
Thanks, Bill.

This is a subtlety I certainly did not understand.


Bert
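
For reference, a minimal sketch of the points made in the quoted messages below, written so that it can be checked without opening a graphics device. The example expressions ex1 and ex2 are simplified stand-ins, not the ones from the help page. The last step is my reading of why atop(ex1, ex2) did not work in the original question: ex1 and ex2 are expression vectors, and it is their first element (a call) that needs to be spliced in.

```r
x <- "first line"
y <- "second line"

# quote() keeps the symbols x and y as they are
e_quote <- quote(atop(x, y))

# bquote() and substitute() insert the values of x and y
e_bquote <- bquote(atop(.(x), .(y)))
e_subst  <- substitute(atop(x, y), list(x = x, y = y))

# quote() of a string returns the string itself, not a call,
# so plotmath is not involved in that case
is.call(quote(phi^epsilon))
is.character(quote("string"))

# Combining stored expressions: extract the call with [[1]]
ex1 <- expression(alpha^2)
ex2 <- expression(beta[i])
e_atop <- bquote(atop(.(ex1[[1]]), .(ex2[[1]])))
```

Passing e_bquote or e_atop as main = in plot(), or as the label in text(), should then draw the two pieces stacked.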

On Fri, Apr 8, 2022 at 10:08 AM Bill Dunlap  wrote:
>
> plotmath also accepts names and calls, which it treats as though they were 
> single-element expressions.  That is why quote() generally works.  
> quote("string") or quote(123) does not invoke plotmath, as quote returns a 
> literal string or number when given such a thing.
>
> plot(0:1,0:1,type="n")
> text(.2, .6, expression(phi^epsilon))
> text(.2, .4, quote(phi^epsilon))
> text(.7, .6, expression(1234567890123456))
> text(.7, .4, quote(1234567890123456))
>
> -Bill
>
> On Fri, Apr 8, 2022 at 9:49 AM Bert Gunter  wrote:
>>
>> Yes, I also find it somewhat confusing. Perhaps this will help. I
>> apologize beforehand if I have misunderstood and you already know all
>> this.
>>
>> The key is to realize that plotmath works with **expressions**,
>> unevaluated forms that include special plotmath keywords, like 'atop',
>> and symbols. So...
>>
>> ## simple example with plotmath used in plot's title
>>
>> ## This will produce an error, as 'atop' is not an R function:
>> plot(1,1, main = atop(x,y))
>>
>> ## To make this work, we need an expression on the rhs of 'main ='.
>> ## A simple way to do this is to use quote():
>>
>> plot(1,1,main = quote(atop(x,y)))
>>
>> ## Note that this produces 'x' above 'y' **without quoting x and y**.
>> ## That's because this is an expression that plotmath parses and
>> ## evaluates according to its own rules, shown in ?plotmath
>>
>> ## Now suppose we have:
>> x <- 'first line'
>> y <- 'second line'
>>
>> ## and we want to display these quoted strings instead of 'x' and 'y'
>> ## in the title
>>
>> ## Then this will *not* work -- it gives the same result as before:
>> plot(1,1,main = quote(atop(x,y)))
>>
>> ## So what is needed here is R's "computing on the language"
>> ## capability to substitute the quoted strings for x and y in the
>> ## expression. Here are two simple ways to do this:
>>
>> ## First using substitute()
>>
>> plot(1,1, main = substitute(atop(x,y), list (x =x, y = y)))
>>
>> ## Second, using bquote()
>>
>> plot(1,1, main = bquote(atop(.(x), .(y))))
>>
>> ## More complicated expressions can be built up using plotmath's rules.
>> ## But you need to be careful about distinguishing plotmath expressions and
>> ## ordinary R expressions. For example:
>>
>> x <- pi/4  ## a number
>>
>> ## WRONG -- will display as written. bquote() is the same as quote() here.
>> plot(1,1, main = bquote(sin(pi/4) == round(x,2)))
>>
>> ## WRONG -- substitutes the value of x (shown to the session's default
>> ## digits) into the previous expression, but round() is still not
>> ## evaluated. This is a mistake in using bquote
>> plot(1,1, main = bquote(sin(pi/4) == round(.(x), 2)))
>>
>> ## RIGHT -- use of bquote
>> plot(1,1, main = bquote(sin(pi/4) == .(round(x,2))))
>> ## or -- using substitute
>> plot(1,1, main = substitute(sin(pi/4) == x, list(x = round(x,2))))
>>
>> Hope this is helpful and, again, apologies if I have misunderstood.
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>> On Fri, Apr 8, 2022 at 7:42 AM PIKAL Petr  wrote:
>> >
>> > Hallo David
>> >
>> > Fair enough. Thanks for your explanation, which told me what should be 
>> > done. It works perfectly for my example, but I am still confused about how 
>> > to get expressions passed to atop (or other functions) evaluated, and the 
>> > help page does not enlighten me, so I am still puzzled.
>> >
>> > When I borrow example from help,
>> >
>> > plot(1:10, type="n", xlab="", ylab="", main = "plot math & numbers")
>> > theta <- 1.23 ; mtext(bquote(hat(theta) == .(theta)), line= .25)
>> > for(i in 2:9)
>> >   text(i, i+1, substitute(list(xi, eta) == group("(",list(x,y),")"),
>> >                           list(x = i, y = i+1)))
>> >
>> > #this is OK
>> > ex1 <- expression("  first: {f * minute}(x) " == {f * minute}(x))
>> > ex2 <- expression("   second: {f * second}(x) "== {f * second}(x))
>> > text(1, 9.6, ex1, adj=0)
>> > text(1, 9.0, ex2, adj=0)
>> >
>> > #and this is not
>> > text(2, 8, expression(atop(ex1, ex2)))
>> > text(2, 7, substitute( atop(ex1, ex2), list(ex1=ex1,ex2=ex2)))
>> >
>> > #and this works
>> > text(2, 6, expression(atop(1,2)))
>> >
>> > I tried to use eval when calling atop, but it did not work either. 
>> > Therefore some hint in help page could be quite handy.
>> >
>> > Best regards
>> > Petr Pikal
>> >
>> > S pozdravem | Best Regards
>> > RNDr. Petr PIKAL
>> > Vedoucí Výzkumu a vývoje | Research Manager
>> > PRECHEZA a.s.
>> > nábř. Dr. Edvarda Beneše 1170/24 | 750 02 Přerov | Czech Republic
>> > Tel: +420 581 252 256 | GSM: +420 724 008 364
>> > petr.pi...@precheza.cz | www.precheza.cz
>> >

Re: [R] pam() with more general dissimilarity / distance

2022-04-08 Thread Martin Maechler
I was asked in private, but reply in public,
so others can also find this answer in the future:

On Fri, Apr 8, 2022 at 1:11 PM  . wrote :
>  Hello
> dear Dr. Maechler
> I have a question about "pam" function in the cluster package. In this
> function, we choose one of the  euclidean or manhattan distances to
> calculate dissimilarity but in the mixed typed data sets the true index may
> be jaccard or other indicators.
> How can we allocate the "true" metric for each variable?
> Best regards
>

Yes, you can use pam() in two ways; see this part of the help page:

  Arguments:

       x: data matrix or data frame, or dissimilarity matrix or object,
          depending on the value of the ‘diss’ argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable.  All
          variables must be numeric.  Missing values (NAs) _are_
          allowed, as long as every pair of observations has at least
          one case not missing.

          In case of a dissimilarity matrix, ‘x’ is typically the
          output of daisy or dist.  Also a vector of length
          n*(n-1)/2 is allowed (where n is the number of observations),
          and will be interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs) are _not_
          allowed.

So, you can first use   dx <- daisy(x, ...)   to compute the correct
distance between your observational units.
After that, you can use the computed distance / dissimilarity matrix
(the `dx`) in your call to pam():

px <- pam(dx, k=., )
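
A minimal sketch of that two-step approach on a small made-up mixed-type data set (the data, the column names, and k = 2 are assumptions for illustration only). With metric = "gower", daisy() handles numeric and factor columns together, and the type argument can mark a 0/1 column as asymmetric binary, which gives Jaccard-like treatment for that variable:

```r
library(cluster)

# Made-up mixed-type data: one numeric, one factor, one 0/1 variable
x <- data.frame(
  height = c(1.60, 1.70, 1.80, 1.65, 1.90, 1.75),
  smoker = factor(c("yes", "no", "no", "yes", "no", "yes")),
  active = c(1, 0, 1, 1, 0, 0)
)

# Step 1: dissimilarities with a per-variable treatment;
# 'active' is declared asymmetric binary
dx <- daisy(x, metric = "gower", type = list(asymm = "active"))

# Step 2: cluster on the dissimilarity object
px <- pam(dx, k = 2)

px$clustering
```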

I hope this helps you.
With best regards,
Martin

--
Martin Maechler
ETH Zurich