In message <[EMAIL PROTECTED]>, r-help- [EMAIL PROTECTED] writes
>Can someone give me an example (perhaps in a private response, since I'm off
>topic here) where one actually needs all cases in a large data set ("large"
>being > 1e6, say) to do a STATISTICAL analysis? By "statistical" I exclude,
>say, searching for some particular characteristic like an adverse event in a
>medical or customer repair database, etc. Maybe a definition of
>"statistical" is: anything that cannot be routinely done in a single pass
>database query.
If the dimensionality of the data is large, you may need a large number of cases too. An example from my own experience is quadratic discriminant analysis (with regularization) for classifying symbols in an OCR program. With 200 classes and 100 features, I'd really like many millions of cases. I've been using about 20,000 per class, or 4 million in total, but with 40 million it would probably work better. Compared to many applications in pattern recognition and data mining, I think this is a fairly small example.

--
Graham Jones, author of SharpEye Music Reader
http://www.visiv.co.uk
21e Balnakeil, Durness, Lairg, Sutherland, IV27 4PT, Scotland, UK

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
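To put rough numbers on the dimensionality argument in the OCR example, here is a minimal Python sketch. It assumes plain QDA with one full mean vector and one symmetric covariance matrix per class (class priors ignored) and counts the free parameters involved. The `regularize` helper is a hypothetical illustration of one common scheme, Friedman-style shrinkage of each class covariance toward the pooled covariance; the post does not say which regularization was actually used.

```python
def qda_param_count(n_classes: int, n_features: int) -> int:
    """Free parameters in an unregularized QDA model: one mean vector and
    one symmetric covariance matrix per class (class priors ignored)."""
    per_class = n_features + n_features * (n_features + 1) // 2
    return n_classes * per_class


def regularize(cov_k, cov_pooled, alpha):
    """Hypothetical Friedman-style shrinkage: returns
    alpha * cov_k + (1 - alpha) * cov_pooled, element by element.
    alpha = 1 gives plain QDA; alpha = 0 gives a shared (LDA-like) covariance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_k, row_p)]
            for row_k, row_p in zip(cov_k, cov_pooled)]


if __name__ == "__main__":
    n_classes, n_features = 200, 100
    per_class_params = qda_param_count(1, n_features)  # 100 + 5050 = 5150
    print(qda_param_count(n_classes, n_features))      # 1030000

    # Each class's covariance is estimated only from that class's own cases.
    for per_class_cases in (20_000, 200_000):
        total = n_classes * per_class_cases
        ratio = per_class_cases / per_class_params
        print(f"{total:,} cases in total, {ratio:.1f} cases per parameter per class")
```

With 100 features each class needs 100 + 100·101/2 = 5,150 parameters, about 1.03 million over 200 classes. So 20,000 cases per class (4 million in total) gives only about 3.9 cases per parameter, and 40 million would raise that to about 38.8, which is consistent with the sense that many more cases would help. Shrinkage of this kind trades some per-class flexibility for stability when that ratio is small.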