I was having a performance problem that I traced to nub applied to boxed
arrays.

Nub sieve ~: gives the same results here whether the items are unboxed,
boxed, or doubly boxed:
   +/ ~: (?. 10000 4 4 $ 2)
9255
   +/ ~: <"2 (?. 10000 4 4 $ 2)
9255
   +/ ~: <"0 <"2 (?. 10000 4 4 $ 2)
9255

But the runtime (in seconds, as reported by 6!:2) is far longer in the
doubly boxed case:
   6!:2 '+/ ~: (?. 10000 4 4 $ 2)'
0.00105408
   6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
0.00585098
   6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
14.9312

When the items are boxed only once, the runtime grows close to linearly
with the number of items:
   6!:2 '+/ ~: <"2 (?. 1000 4 4 $ 2)'
0.000527954
   6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
0.00488113
   6!:2 '+/ ~: <"2 (?. 100000 4 4 $ 2)'
0.075351

But when the items are doubly boxed, the runtime grows nearly
quadratically:
   6!:2 '+/ ~: <"0 <"2 (?. 1000 4 4 $ 2)'
0.162159
   6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
14.9312
   6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
1106.85
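
As a rough check on the "close to linear" and "nearly quadratic" readings,
one can estimate a growth exponent from each pair of consecutive timings
above. This is only a sketch: it assumes the runtime follows a power law
t(n) = c n^k (an assumption, not something the timings prove), and the
symbols t, n, c and k are introduced here for illustration. Since each step
multiplies the item count by 10, k is just the base-10 log of the timing
ratio:

```latex
% Assumption: t(n) \approx c\,n^{k}, so a 10x increase in n gives
\[
  k \approx \log_{10}\frac{t(10n)}{t(n)}
\]
% Singly boxed (<"2), timings 0.000527954, 0.00488113, 0.075351:
\[
  \log_{10}\frac{0.00488113}{0.000527954} \approx 0.97,
  \qquad
  \log_{10}\frac{0.075351}{0.00488113} \approx 1.19
\]
% Doubly boxed (<"0 <"2), timings 0.162159, 14.9312, 1106.85:
\[
  \log_{10}\frac{14.9312}{0.162159} \approx 1.96,
  \qquad
  \log_{10}\frac{1106.85}{14.9312} \approx 1.87
\]
```

Exponents near 1 in the singly boxed case and near 2 in the doubly boxed
case are consistent with the descriptions above, though three data points
per case are too few to say more than that.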

Timings are similar with nub (~.) in place of nub sieve (~:).

Is there any J documentation that explains the performance of nub in
various scenarios?  I haven't been able to find any.

Thanks.
-Chris
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
