Hi--

While I agree that we cannot agree on the ideal algorithms, we should be taking practical steps to implement microarrays in the clinic. I think we can all agree that our algorithms have some degree of efficacy over and above conventional diagnostic techniques. If patients are dying from lack of diagnostic accuracy, I think we have to work hard to use this technology to help them, if we can. I think we can, even now.
What if I offer, in my clinic, a service for cancer patients to compare their affy data to an existing set of data, to predict their prognosis or response to chemotherapy? I think people will line up out the door for such a service. Knowing what we as a group of array analyzers know, wouldn't we all want this kind of service available if we or a loved one got cancer?

Can our programs deal with 1,000 .cel files? 10,000 files? I think our programs are pretty good, but what we need is DATA. We must be careful what we wish for--we might get it!

So how do we measure whether analyzing 10,000 .cel files with library(affy) is feasible? I'm assuming that advanced hardware would be required for such a task. What are the critical components of such a platform? How much money would a feasible system for array analysis cost?

I was just looking ahead two or three years--where is all this genomic array research headed? I guess I'm concerned about scalability. Is anyone really working on implementing affy on a cluster/Beowulf? That sounds like a real challenge.

Regards,
Michael Benjamin, MD

-----Original Message-----
From: Liaw, Andy [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 03, 2003 9:47 PM
To: 'Michael Benjamin'
Subject: RE: [BioC] R performance questions

Another point about benchmarking: As has been discussed on R-help before, benchmarks can be misleading, like the one you mentioned. It measures linear algebra tasks, etc., but those typically account for a very small portion of "average" tasks. Doug Bates also pointed out that the eigen() example used in that benchmark is computing mostly meaningless results.

In our experience, learning to use R more efficiently gives us the most mileage, but large and fast hardware wouldn't hurt...

Cheers,
Andy

> -----Original Message-----
> From: Michael Benjamin [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 03, 2003 7:32 PM
> To: 'Liaw, Andy'
> Subject: RE: [BioC] R performance questions
>
>
> Thanks.
> Mike
>
> -----Original Message-----
> From: Liaw, Andy [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 03, 2003 8:17 AM
> To: 'Michael Benjamin'
> Subject: RE: [BioC] R performance questions
>
> Hi Michael,
>
> Just one comment about SVM. If you use the svm() function in the e1071
> package to train a linear SVM, it will be rather slow. That's a known
> limitation of libsvm, which the svm() function uses. If you are willing
> to go outside of R, the "bsvm" package by C.J. Lin (same person who
> wrote libsvm) will train a linear SVM in a much more efficient manner.
>
> HTH,
> Andy
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of
> > Michael Benjamin
> > Sent: Tuesday, December 02, 2003 10:30 PM
> > To: [EMAIL PROTECTED]
> > Subject: [BioC] R performance questions
> >
> >
> > Hi, all--
> >
> > I wanted to start a thread on R speed/benchmarking. There is a nice R
> > benchmarking overview at http://www.sciviews.org/other/benchmark.htm,
> > along with a free script so you can see how your machine stacks up.
> >
> > Looks like R is substantially faster than S-PLUS.
> >
> > My problem is this: with 512 MB of RAM and an overclocked AMD Athlon
> > XP 1800+, running at 588 SPEC-FP 2000, it still takes FOREVER to
> > analyze multiple .cel files using affy (expresso). Running svm takes
> > a mighty long time with more than 500 genes, 150 samples.
> >
> > Questions:
> > 1) Would adding RAM or processing speed improve performance the most?
> > 2) Is it possible to run R on a cluster without rewriting my
> > high-level code? In other words,
> > 3) What are we going to do when we start collecting terabytes of array
> > data to analyze? There will come a "breaking point" at which desktop
> > systems can't perform these analyses fast enough for large quantities
> > of data. What then?
> >
> > Michael Benjamin, MD
> > Winship Cancer Institute
> > Emory University,
> > Atlanta, GA
> >
> > _______________________________________________
> > Bioconductor mailing list
> > [EMAIL PROTECTED]
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
> >
>

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
