I originally defined:
rankbykey1=: +/@( =@[ ( [ * ((i.~\:~)"_1)@:( (*"_1 _))) ] )
rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )
key=: 1 1,1 0,1 1,1 0,1 0,0 1,:1 1
data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
key rankbykey1 data
2 1 1 2 0 0 0
key rankbykey2 data
2 1 1 2 0 0 0
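To see what the two tines of rankbykey2 compute, here is a small J sketch that evaluates them separately on the sample key and data. The names perm and ranks are illustrative only and are not part of the original definitions.

```j
key=: 1 1,1 0,1 1,1 0,1 0,0 1,:1 1
data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
NB. right tine: row indices 0..6 regrouped by key (groups in order of first appearance)
perm=: ; key </. i. # data
NB. left tine: rank within each group (0 = largest row), still in grouped order
ranks=: ; (i.~ \:~)&.> key </. data
perm              NB. 0 2 6 1 3 4 5
ranks             NB. 2 1 0 1 2 0 0
NB. grading by perm moves each rank back to the row it came from
ranks /: perm     NB. 2 1 1 2 0 0 0
```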
I cannot figure out how to use ~: to express this function as Raul
Miller suggested. Does anyone see a specific solution using it?
Martin Neitzel gave a better solution for the first part by avoiding
unnecessary boxing/unboxing.
rankbykey3=: (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@]) )
key rankbykey3 data
2 1 1 2 0 0 0
On my large data set, rankbykey3 runs in just under 24 seconds, as
compared to just over 24 seconds for rankbykey2 (rankbykey1 runs out of
memory).
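One plausible reason rankbykey1 exhausts memory: self-classify (=) on the key builds a boolean table with one row per distinct key and one column per data row, and *"_1 _ then multiplies the whole data table by each of those rows. The intermediate array therefore has (distinct keys) × (rows) × (columns) atoms. A sketch showing the shapes on the sample tables:

```j
key=: 1 1,1 0,1 1,1 0,1 0,0 1,:1 1
data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
$ = key                  NB. 3 7: one boolean row per distinct key
$ (= key) *"_1 _ data    NB. 3 7 2: a masked copy of data for every key
```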
Using {~ or i.~@[ does not give the correct answer in general.
key (;@(<@(i.~\:~)/.) {~ ;@([</.i.@#@]) ) data
2 0 0 1 1 2 0
key (;@(<@(i.~\:~)/.) /: i.~@] ) data
2 1 0 0 1 2 0
key (;@(<@(i.~\:~)/.) {~ i.~@] ) data
2 1 0 0 2 0 1
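The i.~@[ variant (Martin Neitzel's suggestion, quoted below) appears to fail for a structural reason: grading the self-index of the key reproduces exactly the grouping permutation, so ranks /: i.~ key is the same as ranks {~ that permutation, i.e. the {~ variant. A sketch checking this on the sample key (perm and ranks are illustrative names):

```j
key=: 1 1,1 0,1 1,1 0,1 0,0 1,:1 1
data=: 0 1,1 1,0 2,0 2,2 0,0 0,:1 1
perm=: ; key </. i. # key          NB. 0 2 6 1 3 4 5
ranks=: ; key <@(i.~ \:~)/. data   NB. 2 1 0 1 2 0 0, grouped order
i.~ key                            NB. 0 1 0 1 1 5 0: first occurrence of each key row
(/: i.~ key) -: perm               NB. 1: grading the self-index gives perm back
ranks /: i.~ key                   NB. 2 0 0 1 1 2 0, same as ranks {~ perm
```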
The distinct keys are not ordered, so (;@(<@(i.~\:~)/.) /: [) or
(;@(<@(i.~\:~)/.) \: [) would not work for my situation. (I am applying
it to a large table that is ordered, but neither by the columns used
jointly as the key nor by the columns the ranking is applied to.)
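A small sketch of this failure, reusing the one-column values from Martin Neitzel's message but with the 0/1 labels swapped (the key 1 0 0 0 1 1 1 is hypothetical), so that the first group encountered has the larger key:

```j
k=: 1 0 0 0 1 1 1                 NB. hypothetical key: first group seen has key 1
v=: 10 200 100 300 30 20 20
k (;@(<@(i.~\:~)/.) /: [) v                 NB. 0 1 1 3 1 2 0 (wrong)
k (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@])) v    NB. 3 1 2 0 0 1 1 (rankbykey3 form)
```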
Thanks for the responses!
Jordan
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, November 10, 2009 9:25 PM
To: [email protected]
Subject: Re: [Jprogramming] rank by a key
> In the following example, collection 0 is 10 30 20 20 and has ranks
> 3 0 1 1, while collection 1 is 200 100 300 and has ranks 1 2 0. The
> problem is the extra 0 appended to make them the same size, [...]
The usual way to avoid those extra fills is to box the interim values
and flatten afterwards. This is what the left tine of your
rankbykey2 is doing, but with an unnecessarily convoluted
"pre-boxing - pre-de-boxing - post-re-boxing" sequence.
> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )
Developing it directly from your initial code
> 0 1 1 1 0 0 0 (i.~\:~)/. 10 200 100 300 30 20 20
> 3 0 1 1
> 1 2 0 0
in the naive way, I arrive at:
0 1 1 1 0 0 0 <@(i.~\:~)/. 10 200 100 300 30 20 20
+-------+-----+
|3 0 1 1|1 2 0|
+-------+-----+
0 1 1 1 0 0 0 ;@(<@(i.~\:~)/.) 10 200 100 300 30 20 20
3 0 1 1 1 2 0
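The difference between the padded and the boxed forms shows up directly in the item counts. A sketch on the same values (k and v are just local names for the key and data above):

```j
k=: 0 1 1 1 0 0 0
v=: 10 200 100 300 30 20 20
$ k (i.~\:~)/. v          NB. 2 4: the shorter group is padded with a fill 0
# , k (i.~\:~)/. v        NB. 8 items, one more than there are values
# ; k <@(i.~\:~)/. v      NB. 7 items: boxing each group avoids the fill
```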
> [...] and the rearranging
> that happens. I want the output to be 3 1 2 0 0 1 1, so that it
> corresponds to the original right-side input.
For that step, I can offer a slight variation, too. Your right tine in
> rankbykey2=: ( ;@((i.~\:~)&.>@</.) /: ;@([</.i.@#@]) )
reshuffles the index vector 0 1 2 ... N-1 according to the key sets:
0 1 1 1 0 0 0 ;@([ </. i.@#@]) 10 200 100 300 30 20 20
0 4 5 6 1 2 3
With these values, there is no difference between simple indexing and
re-grading your rank values:
0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: ;@([</.i.@#@])) 10 200 100 300 30 20 20
3 1 2 0 0 1 1
0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) {~ ;@([</.i.@#@])) 10 200 100 300 30 20 20
3 1 2 0 0 1 1
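One reason the two results coincide: the permutation 0 4 5 6 1 2 3 happens to be its own inverse, so grading by it (/:) and indexing by it ({~) perform the same rearrangement. The corresponding permutation for the two-column sample key defined at the start is not self-inverse, which would explain why the {~ variant gives a different answer there. A sketch (p, q and r are illustrative names):

```j
p=: 0 4 5 6 1 2 3        NB. ;0 1 1 1 0 0 0 </. i.7
q=: 0 2 6 1 3 4 5        NB. the same construction for the two-column sample key
r=: 3 0 1 1 1 2 0        NB. grouped ranks from the message above
(/: p) -: p              NB. 1: p is its own inverse
r /: p                   NB. 3 1 2 0 0 1 1
r {~ p                   NB. 3 1 2 0 0 1 1, identical because /: p equals p
(/: q) -: q              NB. 0: /: q is 0 3 1 4 5 6 2
```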
My idea was to re-grade the rank values a bit more directly on the key:
0 1 1 1 0 0 0 (;@(<@(i.~\:~)/.) /: [) 10 200 100 300 30 20 20
3 1 2 0 0 1 1
already works nicely, but only because the (two) distinct key groups
happen to be introduced in ascending order. (Perhaps your keys
*ARE* always of this nature?) In the general case, self-indexing will
help:
i.~ 8 4 4 4 8 8 8
0 1 1 1 0 0 0
8 4 4 4 8 8 8 (;@(<@(i.~\:~)/.) /: i.~@[) 10 200 100 300 30 20 20
3 1 2 0 0 1 1
All in all, this is not very different from what you did, just slightly
less boxing/unboxing. Can you measure any difference?
Martin Neitzel
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm