Prof Brian Ripley wrote:

> Why not use exp(y*log(x)) if it is adequate for your purposes? It is
> faster under Windows.
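For strictly positive x the two forms are mathematically equivalent, since x^y = exp(y*log(x)). A minimal sketch of the substitution (the values here are invented for illustration; any speed difference is platform-dependent):

```r
x <- c(0.5, 1.2, 7.3)    # some strictly positive measurements
b <- 2.57                # an allometric exponent

pow_direct <- x^b              # the ordinary power operator
pow_explog <- exp(b * log(x))  # equivalent for x > 0; faster on some runtimes

all.equal(pow_direct, pow_explog)  # TRUE up to floating-point rounding
```

The substitution is only safe when x is known to be strictly positive: for negative x, log() returns NaN (with a warning), whereas ^ still handles, e.g., integer exponents.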
I will try... Thank you for your advice.

> There really is no value in using millions of cases in LVQ or LDA or, I
> suspect, random forests. But a difference of a few minutes means that
> this is well under 20% of the total time unless your statistical analysis
> is very much speedier than mine.

No, sorry: the millions of cases are predictions made from a training set
built around circa 2,000 items. With a method that combines lvq, lda and
random forest, I have a prediction rate of about 5,000 items/sec, which is
roughly 3 to 4 minutes for 1,000,000 items; adding the time to load the
dataset, that is a little less than 9 minutes... but more than double that
with the slow ^. OK, what is 10 minutes in a lifetime ;-)

On Tue, 18 Nov 2003, Philippe Grosjean wrote:

> Prof Brian Ripley wrote:
> > Your subject line is seriously misleading: this is not `in R' but
> > rather in the pre-compiled binary of R on one OS (Windows) against one
> > particular runtime (which was actually changed long before R 1.7.1).
>
> OK, I have not tested on other platforms... However, as a consequence
> this is also in R, as soon as R is compiled against the slower routines
> [on Windows only].
>
> > You could not do this by an `R package': that cannot change the runtime
> > code in use. You (or someone else) could build R against an alternative
> > runtime library system, but it might be easier to use a better OS.
>
> I compile my own version of R 1.8.0 against MinGW 2.0.1 for this
> reason... and I really agree with you: "it might be easier to use a
> better OS". However, you should first convince the hundreds of people I
> target with my R code. They are biologists, ecologists,
> oceanographers,... and most of them use Windows, not Linux/Unix. So, I am
> forced to use Windows myself.
>
> > I have yet to see any real statistics application in which this made
> > any noticeable difference. With modern processors one is talking about
> > 10x faster than a few milliseconds unless the datasets are going to
> > take many seconds even to load into R. If you have such an application
> > (a real example where ^ takes more than say 20% of the total time of
> > the statistical analysis, and the total time is minutes) please share
> > it.
>
> Here it is: I am working with very large datasets of zooplankton,
> containing, among others, results from image analyses of each
> individual. It is very common in biology to transform/recode/calculate
> (or whatever you call it) raw data according to precalibrated allometric
> relationships. These have the general form of Huxley's equation:
>
>     y = a * x^b
>
> Now, you see what I mean: I have to transform about 17 measurements this
> way for each individual in my multi-million-entry dataset (note that I do
> not compute the whole dataset at once), before using methods like LDA,
> learning vector quantization (actually, your code from the VR bundle), or
> random forest. In this case, especially with lda or lvq, which are pretty
> fast, it really makes a difference of minutes on my P4 2.8 GHz with 1 GB
> of memory... and Windows XP.
>
> OK, I can understand that the R-core team does not have time to waste on
> this problem, especially because they use a better OS. However, I know a
> lot of people (the ones that will use my code to analyze their own
> zooplankton series) who would benefit from my "own faster MinGW
> 2.0-compiled R 1.8.0 Windows version", or a better solution. So what? Do
> I have to distribute it myself?
>
> Do I have to spot this problem in my benchmark test at
> http://www.sciviews.org/other/benchmark.htm (25.7 sec for the whole test
> with R 1.8.0 from CRAN against 11.9 sec for R 1.8.0 compiled with MinGW
> 2.0.1 under Windows on the same computer)? I have not updated it since R
> version 1.7.0 to avoid publishing such a bad result.
> And I have not posted my own compiled R version online, because it is
> neither a good practice, nor a good solution...
>
> I am looking for a better solution.
>
> Best,
>
> Philippe Grosjean

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,         +44 1865 272866 (PA)
Oxford OX1 3TG, UK          Fax: +44 1865 272595

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
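The allometric recoding described in the quoted message can be sketched in R as follows. This is a hypothetical illustration: the measurement names, the (a, b) coefficients, and the chunk size are all invented, and only three of the 17 measurements are shown; the point is that each column is recoded with y = a * x^b, written in the exp/log form that is equivalent for positive data:

```r
set.seed(1)

# Hypothetical chunk of a zooplankton dataset: n individuals and three of
# the 17 image-analysis measurements (all strictly positive).
n <- 100000
meas <- data.frame(length = runif(n, 0.1, 5),
                   area   = runif(n, 0.01, 2),
                   perim  = runif(n, 0.2, 8))

# Precalibrated allometric coefficients (invented values): one (a, b) pair
# per measurement, following Huxley's equation y = a * x^b.
a <- c(length = 0.07, area = 1.30, perim = 1.45)
b <- c(length = 2.57, area = 0.90, perim = 1.80)

# Recode every measurement with y = a * x^b, written as a * exp(b * log(x)),
# the form equivalent for x > 0 that avoids the slow ^ on some runtimes.
recoded <- as.data.frame(mapply(function(x, a, b) a * exp(b * log(x)),
                                meas, a, b))
```

Processing the dataset chunk by chunk, as the message describes, just means repeating this on successive blocks of rows; system.time() around the two variants (^ versus exp/log) shows whether the substitution pays off on a given platform.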