RE: [R] Massive clustering job?

2004-12-17 Thread Dan Bolser
Sent: Wednesday, December 15, 2004 6:37 AM To: R mailing list Subject: [R] Massive clustering job? Hi, I have ~40,000 rows in a database, each of which contains an id column and 20 additional columns of count data. I want to cluster the rows based on these count vectors. There are ~1.6 billion

[R] Massive clustering job?

2004-12-15 Thread Dan Bolser
Hi, I have ~40,000 rows in a database, each of which contains an id column and 20 additional columns of count data. I want to cluster the rows based on these count vectors. There are ~1.6 billion possible 'distances' between pairs of vectors (cells in my distance matrix), so I need to do
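As a rough check on the scale described in the post, the following R sketch counts the pairwise distances for n = 40,000 rows and estimates the memory a distance object would need. The row count is the approximate figure from the post; the 8 bytes per value assumes double-precision storage, and the variable names are illustrative only.

```r
n <- 40000                       # approximate row count from the post

cells_full   <- n^2              # all cells of a full n x n distance matrix
pairs_unique <- n * (n - 1) / 2  # distinct unordered pairs (lower triangle only)

bytes_per_value <- 8             # assumption: double-precision values
gib <- function(bytes) bytes / 1024^3

cat("full matrix cells:",
    format(cells_full, big.mark = ",", scientific = FALSE), "\n")
cat("unique pairs:     ",
    format(pairs_unique, big.mark = ",", scientific = FALSE), "\n")
cat("lower triangle:   ",
    round(gib(pairs_unique * bytes_per_value), 1), "GiB\n")
```

The ~1.6 billion figure corresponds to the cells of the full matrix; the number of distinct pairs is about half that, which is still too large for most distance-matrix-based methods on ordinary hardware.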

Re: [R] Massive clustering job?

2004-12-15 Thread Christian Hennig
Dear Dan, I would think about transforming your columns (square root, log?) so that methods which operate on n*p matrices and assume roughly elliptical within-cluster distributions, such as kmeans or clara, or, after dimension reduction, EMclust or fixmahal, can be applied. Maybe you can
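A minimal sketch of the first part of this suggestion, assuming a square-root transform followed by `kmeans` from base R. The data are simulated here (two hypothetical Poisson groups) because the real table is not available, and the choice of two clusters is an assumption made purely for illustration.

```r
set.seed(1)                      # reproducible simulated data

# Hypothetical stand-in for the real table: 2,000 rows x 20 count columns,
# drawn from two Poisson groups with different means.
n_per_group <- 1000
counts <- rbind(
  matrix(rpois(n_per_group * 20, lambda = 3),  ncol = 20),
  matrix(rpois(n_per_group * 20, lambda = 12), ncol = 20)
)

x <- sqrt(counts)                # square-root transform of the counts

# kmeans works on the n x p matrix directly; no n x n distance matrix is built.
fit <- kmeans(x, centers = 2, nstart = 5)

# Cross-tabulate fitted clusters against the simulated group labels.
print(table(cluster = fit$cluster, group = rep(1:2, each = n_per_group)))
```

Because `kmeans` only needs the data matrix and the cluster centres, its memory use grows with n*p rather than with the number of pairs.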

RE: [R] Massive clustering job?

2004-12-15 Thread Wiener, Matthew
It sounds like clara in package cluster might help. Regards, Matt Wiener
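For illustration, a hedged sketch of what a `clara` call might look like on data of this size. The counts are simulated (two hypothetical Poisson groups), and `k`, `samples`, `sampsize` and the square-root transform are assumptions chosen for the example, not values recommended in the thread.

```r
library(cluster)                 # provides clara()

set.seed(2)
n <- 40000                       # row count matching the scale in the post

# Hypothetical data: first half of the rows has mean 2, second half mean 10.
# matrix() fills column-wise, so the recycled lambda vector applies per row.
lambda <- rep(c(2, 10), each = n / 2)
counts <- matrix(rpois(n * 20, lambda = lambda), ncol = 20)

# clara clusters repeated subsamples with PAM and then assigns every row to
# the nearest medoid, so it never builds the full n x n distance matrix.
fit <- clara(sqrt(counts), k = 2, samples = 10, sampsize = 200)

print(table(fit$clustering))     # cluster sizes
print(dim(fit$medoids))          # one medoid row per cluster, 20 columns
```

Larger values of `samples` and `sampsize` generally give more stable medoids at the cost of run time.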