Dear R experts, please excuse me for writing to the mailing list without subscribing. I have a somewhat urgent problem that relates to R.
I have to process large amounts of data with R. I'm in an international collaboration and the data processing protocol is fixed, that is, a specific set of R commands has to be used. I wrote a Perl program that manages the creation of data subsets from my database and feeds these subsets to an R process via pipes.

This worked all right; however, I wanted to speed things up by exploiting the fact that I have a dual-core machine. So I rewrote my Perl driver program to use two threads, each starting its own R instance, getting data off a queue and feeding it to its R process.

This also worked, except that I noticed something very peculiar: the total processing time was almost exactly the same in both cases. I did some tests to look into this, and it seems that R needs twice the time to do the exact same thing if there are two instances of it running.

I don't understand how this is possible. Maybe there is a thread-safety issue with the R backend, meaning that the two R *interpreter* instances are talking to the same backend, which is capable of processing only one thing at a time?

Technical details: the OS was Ubuntu 9.04 running on a Core 2 Duo E7300, and the R version used was the default one from the Ubuntu repository.

Please see http://www.perlmonks.org/?node_id=792460 for an extended discussion of the problem, and especially http://www.perlmonks.org/?node_id=793506 for excerpts of output and actual code.

Thanks for your answers in advance,
Péter Juhász

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
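
The timing comparison described in the post can be sketched roughly as follows. This is not the Perl driver from the linked PerlMonks threads, only a minimal illustration in Python of the measurement itself: start one CPU-bound child process, then two at once, and compare wall-clock times. The worker command and the function name `time_concurrent` are assumptions made for this sketch; the worker is a plain Python loop standing in for an R job.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for one R job: a CPU-bound loop in a child process.
# To time real R, a command for the actual job would be substituted here.
WORKER_CMD = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]


def time_concurrent(n_procs, cmd=WORKER_CMD):
    """Start n_procs copies of cmd at once; return (wall_seconds, exit_codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n_procs)]
    codes = [p.wait() for p in procs]  # block until every child has exited
    return time.perf_counter() - start, codes


if __name__ == "__main__":
    t1, _ = time_concurrent(1)
    t2, _ = time_concurrent(2)
    print(f"1 process:   {t1:.2f} s")
    print(f"2 processes: {t2:.2f} s (ratio {t2 / t1:.2f})")
```

If the two processes really run on separate cores, the ratio should stay close to 1; a ratio close to 2 would suggest that they are being serialized somewhere (for example by CPU affinity, a shared resource, or the driver itself), which is the symptom described above.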