Danny,
Here's another approach that doesn't use sorting. Instead, after
calculating the distances from a case, it picks a threshold on distance
and counts how many cases fall within it. It then searches over
thresholds to find one that yields the desired number of cases. Then
case n
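A minimal sketch of the threshold idea, written in Python rather than R and assuming Euclidean distance (the thread does not fix a distance measure). `neighbors_by_threshold` is a hypothetical name. For one case it bisects on the threshold until the number of cases within it just reaches k, without ever sorting the distances:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbors_by_threshold(data, i, k=5, iters=60):
    """Return indices of (at least) the k cases nearest to case i, found by
    bisecting on a distance threshold instead of sorting the distances.
    Assumes len(data) - 1 >= k."""
    # Distances from case i to every other case; case i itself is excluded.
    dists = {j: euclidean(data[i], row) for j, row in enumerate(data) if j != i}
    # Invariant: at least k cases lie within hi (initially all other cases).
    lo, hi = 0.0, max(dists.values())
    for _ in range(iters):
        mid = (lo + hi) / 2
        count = sum(1 for d in dists.values() if d <= mid)
        if count >= k:
            hi = mid  # enough cases inside; try a smaller threshold
        else:
            lo = mid  # too few cases inside; raise the threshold
    # Ties (or near-ties) at the final threshold can yield more than k cases.
    return [j for j, d in dists.items() if d <= hi]
```

Each bisection step costs one counting pass over the n distances, so a fixed number of steps replaces the O(n log n) sort with a few dozen O(n) passes.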
A quick and dirty clustering method (I think it's due to Hartigan; at
least, I recall first seeing it in his book on clustering) is to pick a
random set of seed cases and then make one pass through the data,
assigning each case to the seed closest to it. Then you can compute
your distance matrices
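A minimal sketch of the one-pass seed assignment, in Python and assuming Euclidean distance; `assign_to_seeds` and `neighbors_within_groups` are hypothetical names. The second function assumes the distance matrices are then computed separately within each group, which keeps every matrix small. Note that this is an approximation: a case's true nearest neighbor may have been assigned to a different seed.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_seeds(data, n_seeds, rng_seed=0):
    """Pick n_seeds random cases as seeds, then make one pass through the
    data, putting each case in the group of its nearest seed.
    Returns {seed_index: [case indices]}."""
    rng = random.Random(rng_seed)
    seeds = rng.sample(range(len(data)), n_seeds)
    groups = {s: [] for s in seeds}
    for j, row in enumerate(data):
        nearest = min(seeds, key=lambda s: euclidean(data[s], row))
        groups[nearest].append(j)
    return groups


def neighbors_within_groups(data, groups, k=5):
    """Approximate k nearest neighbors: search only the case's own group."""
    result = {}
    for members in groups.values():
        for i in members:
            others = [j for j in members if j != i]  # exclude the case itself
            others.sort(key=lambda j: euclidean(data[i], data[j]))
            result[i] = others[:k]
    return result
```

With g groups of roughly equal size, the within-group matrices hold about n^2 / g entries in total instead of n^2.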
Danny -
The flip answer is that it depends on how much memory your computer has.
One can readily calculate the number of entries in the pairwise
distance matrix that you would like to calculate and ask whether
it will fit in the physical memory installed in your computer.
It is 50,000 x 50,000 x 8 bytes
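The arithmetic, as a small Python sketch; `distance_matrix_bytes` is a hypothetical helper, and 8 bytes per entry assumes double-precision storage. The lower-triangle option reflects that a symmetric distance matrix only needs n(n-1)/2 entries (R's `dist()` object, for example, stores only the lower triangle):

```python
def distance_matrix_bytes(n_cases, bytes_per_entry=8, lower_triangle_only=False):
    """Memory needed to hold a pairwise distance matrix of n_cases cases."""
    if lower_triangle_only:
        entries = n_cases * (n_cases - 1) // 2  # symmetric, diagonal dropped
    else:
        entries = n_cases * n_cases
    return entries * bytes_per_entry


full = distance_matrix_bytes(50_000)
half = distance_matrix_bytes(50_000, lower_triangle_only=True)
print(f"full: {full:,} bytes ({full / 2**30:.1f} GiB)")
# full: 20,000,000,000 bytes (18.6 GiB)
print(f"lower triangle: {half:,} bytes ({half / 2**30:.1f} GiB)")
# lower triangle: 9,999,800,000 bytes (9.3 GiB)
```

Even the halved form needs roughly 10 GB, before counting any copies made while computing it.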
I've only begun investigating R as a substitute for SPSS.
I need to identify, for each CASE, the closest (or most similar) 5
other CASES (not including itself, as it is automatically the closest). I
have a fairly large matrix (50,000 cases by 50 vars). In SPSS, I can use Correlate >
Distan
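A minimal brute-force sketch of the question itself, in Python and assuming Euclidean distance (the question leaves the similarity measure open); `k_nearest_all` is a hypothetical name. It never builds the full matrix: distances are generated one row at a time and `heapq.nsmallest` keeps only the k smallest, skipping the case itself:

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_all(data, k=5):
    """For each case, the indices of its k nearest other cases.

    Distances for one case are produced lazily, so memory stays O(k) per
    case instead of the O(n^2) of a full distance matrix. Ties are broken
    by the lower case index."""
    result = []
    for i, row in enumerate(data):
        candidates = ((euclidean(row, other), j)
                      for j, other in enumerate(data) if j != i)  # skip self
        result.append([j for _, j in heapq.nsmallest(k, candidates)])
    return result
```

Memory is no longer the bottleneck, but the work is still n^2 distance computations (about 2.5 billion for 50,000 cases), which is slow in pure Python; the same row-at-a-time pattern applies in any language.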