Sent: Wednesday, December 15, 2004 6:37 AM
To: R mailing list
Subject: [R] Massive clustering job?
Hi,
I have ~40,000 rows in a database, each of which contains an id column and
20 additional columns of count data.
I want to cluster the rows based on these count vectors.
There are ~1.6 billion possible 'distances' between pairs of vectors
(cells in my distance matrix), so I need to do
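To put the figure above in perspective: for n = 40,000 rows a full n x n matrix has n^2 = 1.6 billion cells. Distances are symmetric with a zero diagonal, so only n(n-1)/2, about 800 million, are distinct, and that lower triangle is what R's dist objects store. The short Python sketch below assumes 8-byte double-precision storage (R's numeric type) and estimates the memory each layout needs. The function name distance_matrix_size is a hypothetical helper, not part of any package.

```python
def distance_matrix_size(n, bytes_per_value=8):
    """Cell counts and approximate memory (GB) for pairwise distances of n rows."""
    full_cells = n * n               # every ordered pair, diagonal included
    unique_pairs = n * (n - 1) // 2  # lower triangle only (symmetric, zero diagonal)
    return {
        "full_cells": full_cells,
        "unique_pairs": unique_pairs,
        "full_gb": full_cells * bytes_per_value / 1e9,
        "unique_gb": unique_pairs * bytes_per_value / 1e9,
    }


if __name__ == "__main__":
    s = distance_matrix_size(40_000)
    print(f"full matrix:    {s['full_cells']:,} cells, ~{s['full_gb']:.1f} GB")
    print(f"lower triangle: {s['unique_pairs']:,} pairs, ~{s['unique_gb']:.1f} GB")
```

This prints 1,600,000,000 cells (~12.8 GB) for the full matrix and 799,980,000 pairs (~6.4 GB) for the lower triangle, so even the compact form is large before any clustering starts.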
Dear Dan,
I would consider transforming your columns (square root, log?) so that
methods which operate on n*p data matrices and assume roughly elliptical
within-cluster distributions, such as kmeans or clara, or, after dimension
reduction, EMclust or fixmahal, can be applied.
Maybe you can
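As a rough illustration of the transform-then-kmeans suggestion, the following Python sketch applies a square-root (or log(1 + x)) transform to count vectors and then runs plain Lloyd's k-means, which only compares rows with k centres, never with each other. It is a stand-in under stated assumptions, not R's kmeans() (whose default algorithm is Hartigan-Wong): the farthest-first initialisation, the helper names (sqrt_transform, log_transform, kmeans) and the three-column toy counts are all hypothetical, used here in place of the 20 real columns.

```python
import math


def sqrt_transform(rows):
    """Square-root transform of count data (one option suggested above)."""
    return [[math.sqrt(c) for c in row] for row in rows]


def log_transform(rows):
    """log(1 + count) transform; the +1 avoids log(0) for zero counts."""
    return [[math.log1p(c) for c in row] for row in rows]


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centres(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point whose nearest existing centre is farthest away."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(squared_euclidean(p, c) for c in centres))
        centres.append(list(far))
    return centres


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means on an n x p list of lists (hypothetical helper)."""
    centres = farthest_first_centres(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every point.
        new_labels = [
            min(range(k), key=lambda j: squared_euclidean(p, centres[j]))
            for p in points
        ]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centres


if __name__ == "__main__":
    # Hypothetical toy counts: two obvious groups, 3 columns instead of 20.
    rows = [[0, 1, 2], [1, 0, 1], [2, 1, 0], [50, 60, 55], [45, 52, 61], [58, 49, 50]]
    labels, centres = kmeans(sqrt_transform(rows), k=2)
    print(labels)
```

Each iteration costs O(n·k·p) time and needs only O(n + k·p) memory beyond the data itself, so 40,000 rows by 20 columns with a modest k stays small, unlike the full distance matrix.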
It sounds like clara in package cluster might help.
Regards,
Matt Wiener
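The idea behind clara (Clustering LARge Applications) is to run k-medoids on several random subsamples, score each candidate set of medoids against all rows, and keep the best set. A toy, pure-Python sketch of that idea follows; it is not the cluster package's implementation. The helpers pam and clara here are hypothetical, the swap phase is a simplified first-improvement variant of PAM, and n_samples and sample_size only play the role of clara()'s samples and sampsize arguments, with defaults chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean dissimilarity between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids):
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Toy k-medoids: greedy BUILD, then first-improvement SWAP (hypothetical)."""
    medoids = []
    for _ in range(k):
        # BUILD: add the point that lowers the total cost the most.
        best = min((p for p in points if p not in medoids),
                   key=lambda c: total_cost(points, medoids + [c]))
        medoids.append(best)
    current = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < current - 1e-12:
                    medoids, current, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """CLARA-style clustering: k-medoids on subsamples, scored on all points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)  # judged on every row, not just the sample
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    # Assign each row to its nearest winning medoid.
    labels = [min(range(k), key=lambda j: dist(p, best_medoids[j])) for p in points]
    return best_medoids, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 30 rows each.
    group_a = [[i % 3, i % 5] for i in range(30)]
    group_b = [[100 + i % 3, 100 + i % 5] for i in range(30)]
    medoids, labels = clara(group_a + group_b, k=2, sample_size=20)
    print(medoids, labels)
```

No distance matrix over all rows is ever built: PAM only works inside each small sample, and scoring a candidate medoid set against the full data costs O(n·k) distance evaluations.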
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dan Bolser
Sent: Wednesday, December 15, 2004 6:37 AM
To: R mailing list
Subject: [R] Massive clustering job?