Re: [R] millions of comparisons, speed wanted

2005-12-17 Thread Adrian DUSA
The daisy function is _very_ good! I have been able to use it for nominal variables as well, simply by daisy(input) * ncol(input). Now, for a very large number of rows (say 5000), daisy runs for about 3 minutes using the swap space. I probably need more RAM (only 512 MB on my computer). But at least I
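A minimal sketch of that trick on made-up data (the column types and sizes are assumptions, not the actual input): when every column is a factor, daisy() uses simple matching, so the Gower dissimilarity times the number of columns gives the count of differing positions.

    library(cluster)

    ## hypothetical 0/1 data, stored as factors so daisy() treats them as nominal
    set.seed(1)
    input <- as.data.frame(matrix(sample(0:1, 200 * 10, replace = TRUE), ncol = 10))
    input[] <- lapply(input, factor)

    d <- daisy(input)                      # Gower dissimilarities in [0, 1]
    n_diff <- as.matrix(d) * ncol(input)   # rescale to number of differing columns

    n_diff[upper.tri(n_diff, diag = TRUE)] <- NA   # keep each pair only once
    head(which(n_diff == 1, arr.ind = TRUE))       # pairs differing in one column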

Re: [R] millions of comparisons, speed wanted

2005-12-16 Thread Martin Maechler
I have not taken the time to look into this example, but daisy() from the (recommended, hence part of R) package 'cluster' is more flexible than dist(), particularly in the case of NAs and for (a mixture of continuous and) categorical variables. It uses a version of Gower's formula in
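To illustrate the flexibility being described (the data below are invented for the example): daisy() accepts a data frame mixing numeric and factor columns, and Gower's formula simply skips a variable for any pair of rows where it is missing, whereas dist() requires purely numeric input.

    library(cluster)

    ## hypothetical mixed data: one continuous and one nominal column, with an NA
    x <- data.frame(height = c(1.70, 1.82, NA, 1.65),
                    group  = factor(c("a", "b", "a", "b")))

    daisy(x, metric = "gower")   # pairwise Gower dissimilarities, NAs handled per pair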

[R] millions of comparisons, speed wanted

2005-12-15 Thread Adrian DUSA
Dear all, I have a 10-column matrix which has 2^10 = 1024 unique rows, with values 0 and 1. What I would like to do is a little complicated; put simply, for a subset (say 1000 rows) I want to perform pairwise comparisons between all rows and find out which rows differ by only one
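The message is cut off here, but a naive version of the comparison it describes, on hypothetical data, might look like the sketch below: count, for every pair of rows, how many of the 10 binary columns differ, and keep the pairs that differ in exactly one. This is the slow, position-by-position approach the replies try to speed up.

    ## hypothetical 0/1 data
    set.seed(1)
    input <- unique(matrix(sample(0:1, 1000 * 10, replace = TRUE), ncol = 10))

    pairs <- combn(nrow(input), 2)   # all pairs of row indices
    n_diff <- apply(pairs, 2, function(p) sum(input[p[1], ] != input[p[2], ]))

    head(t(pairs[, n_diff == 1, drop = FALSE]))   # row pairs differing in one column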

Re: [R] millions of comparisons, speed wanted

2005-12-15 Thread Liaw, Andy
Just an untested idea: if the data are all 0/1, you could use dist(input, method = "manhattan") and then check which entries equal 1. This should be much faster than creating all pairs of rows and checking position by position. HTH, Andy From: Adrian DUSA Dear all, I have a 10-column matrix
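A sketch of this suggestion on made-up data (the matrix itself is an assumption): for 0/1 rows, the Manhattan distance between two rows is exactly the number of positions in which they differ, so the pairs at distance 1 are the ones wanted.

    ## hypothetical 0/1 data
    set.seed(1)
    input <- unique(matrix(sample(0:1, 1000 * 10, replace = TRUE), ncol = 10))

    d <- as.matrix(dist(input, method = "manhattan"))
    d[upper.tri(d, diag = TRUE)] <- NA      # keep each pair only once
    head(which(d == 1, arr.ind = TRUE))     # row pairs differing in one position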