Here are several differently structured data items:

   d =. (n?@$3)<@]^:[&.> c=.<"0 b=. <"2 a=. ?. (n, 4 4) $ 2 [n=.50000

d is irregularly boxed: each item of c gets 0, 1 or 2 extra levels of boxing,
chosen at random.

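Applying ] with spread (S:) at the desired level (here 0) removes the boxing
first, and nub sieve on the result is much faster. The four timings below are
for d, c, b and a, in that order (assuming ts is the usual time-and-space
utility, the columns are seconds and bytes):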

   ([: ts '~: '&,)&> ']S:0 d';']S:0 c';'b';'a'
0.059811 9.70701e6
0.040415 6.29702e6
0.017502    590464
0.005743    590528
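
To make the effect of S: concrete, here is a small sketch on six 2-by-2 items
instead of 50000 4-by-4 ones. The names a2, c2 and flat are made up for this
illustration. It assumes, as in the data above, that every item holds exactly
one leaf and that all leaves have the same shape; otherwise the spread result
would no longer line up item for item with the boxed list.

```j
a2   =: ?. 6 2 2 $ 2        NB. six random 2x2 boolean matrices
c2   =: <"0 <"2 a2          NB. the same items, doubly boxed
flat =: ] S: 0 c2           NB. spread at level 0: one unboxed leaf per item
$ flat                      NB. 6 2 2
flat -: a2                  NB. 1: the boxing is gone, the items are unchanged
(~: flat) -: ~: c2          NB. 1: same nub sieve as on the boxed list
```

The sieve computed on flat can then be used to select from the boxed list,
for example (~: flat) # c2.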




On 27-06-13 04:19, Christopher Rosin wrote:
Thanks.  I'm running into this because the items in my large lists are
complex data structures that have nested boxing.

I ended up using the following as a replacement for ~:
    ns =: (] (/: /:)~ 1: , 2: ([: -. -:)/\ /:~)
It seems to give the same results as ~:, at least in the context of my
application.
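
For readers unpicking the train, the following sketch walks through what ns
does on a short numeric list. The names v, sorted, flags and pos are invented
for this walk-through; only ns itself comes from the message. Because grade
(/:) is stable, the first of several equal items sorts first, so it is the
one that receives the 1.

```j
ns =: (] (/: /:)~ 1: , 2: ([: -. -:)/\ /:~)   NB. definition from the message

v      =: 3 1 3 2 1
sorted =: /:~ v                        NB. 1 1 2 3 3
flags  =: 1 , 2 ([: -. -:)/\ sorted    NB. 1 0 1 1 0 : 1 where an item differs from its left neighbour
pos    =: /: /: v                      NB. 3 0 4 2 1 : position of each original item in sorted order
pos { flags                            NB. 1 1 0 1 0 : flags moved back to original order
(ns v) -: ~: v                         NB. 1
```

The cost is dominated by the sort and grades plus one pass of neighbour
comparisons, rather than by comparing each item against the nub found so far.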

'ns' can be much faster than ~: for large lists of deeply boxed data:
    6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
1106.85
    6!:2 '+/ ns <"0 <"2 (?. 100000 4 4 $ 2)'
2.08842

It is slower, though, on simpler unboxed data:
    6!:2 '+/ ~: (?. 100000 4 4 $ 2)'
0.0196997
    6!:2 '+/ ns (?. 100000 4 4 $ 2)'
0.0867839

I don't think that 'ns' is an exact drop-in replacement for ~:, though.
For example, with arrays of floating-point values that are close enough to
be on the edge of tolerance, I imagine 'ns' could give different results
than ~:, since it only compares neighbours in sorted order.

-Chris





On Wed, Jun 26, 2013 at 6:31 PM, Tracy Harms wrote:

In such cases it may be worthwhile to make a keying list with items that
correspond to those of the list for which you want to compute the nub. If
you calculate a simple unique value for each item, you can take the nub of
those values and rely on the correspondence to select from the original list.
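
One way to read this suggestion, sketched for the 4-by-4 boolean items used
elsewhere in this thread: each matrix has 16 bits, so it can be encoded as a
single integer and the nub sieve taken on those integers. The names items,
boxed, keys, sieve and nubbed are hypothetical, and the encoding is an
assumption that only fits boolean data of this shape.

```j
items  =: ?. 1000 4 4 $ 2     NB. 1000 random 4x4 boolean matrices
boxed  =: <"0 <"2 items       NB. the doubly boxed list we want the nub of
keys   =: #.@,"2 items        NB. one integer per matrix: its 16 bits read in base 2
sieve  =: ~: keys             NB. nub sieve on plain integers
nubbed =: sieve # boxed       NB. use the correspondence to select from the boxed list
sieve -: ~: boxed             NB. 1: same sieve as ~: applied directly
```

For other kinds of items the key function would have to be chosen so that
distinct items always get distinct keys.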
On Jun 25, 2013 11:29 PM, "Christopher Rosin" wrote:

I was having a performance problem that I traced to nub applied to boxed
arrays.

Nub sieve ~: gives the same results here whether the items are unboxed,
boxed, or doubly boxed:
    +/ ~: (?. 10000 4 4 $ 2)
9255
    +/ ~: <"2 (?. 10000 4 4 $ 2)
9255
    +/ ~: <"0 <"2 (?. 10000 4 4 $ 2)
9255
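
As a quick check on what "unboxed, boxed, or doubly boxed" means here, the
level verb L. reports the boxing depth of each form. This is a small sketch
on three items; the name m is made up for the illustration.

```j
m =: ?. 3 4 4 $ 2
L. m              NB. 0: no boxing
L. <"2 m          NB. 1: each 4x4 item in one box
L. <"0 <"2 m      NB. 2: each of those boxes boxed again
$ <"0 <"2 m       NB. 3: still a list with one entry per item
```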

But the runtime is very different in the doubly boxed case:
    6!:2 '+/ ~: (?. 10000 4 4 $ 2)'
0.00105408
    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
0.00585098
    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
14.9312

With the items boxed only once, performance appears close to linear:
    6!:2 '+/ ~: <"2 (?. 1000 4 4 $ 2)'
0.000527954
    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
0.00488113
    6!:2 '+/ ~: <"2 (?. 100000 4 4 $ 2)'
0.075351

But with doubly boxed items, performance seems to become nearly quadratic:
    6!:2 '+/ ~: <"0 <"2 (?. 1000 4 4 $ 2)'
0.162159
    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
14.9312
    6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
1106.85

Timing is similar with nub instead of nub sieve.

Is there any J documentation that explains the performance of nub in
various scenarios?  I haven't been able to find any.

Thanks.
-Chris
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm


--
Kind regards,
@@i = Arie Groeneveld
