Thanks. I'm running into this because the items in my large lists are complex data structures that have nested boxing.
I ended up using the following as a replacement for ~:

   ns =: (] (/: /:)~ 1: , 2: ([: -. -:)/\ /:~)

It seems to give the same results as ~:, at least in the context of my application. 'ns' can be much faster than ~: for large lists of deeply boxed data:

   6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
1106.85
   6!:2 '+/ ns <"0 <"2 (?. 100000 4 4 $ 2)'
2.08842

Even though it is slower on simpler unboxed data:

   6!:2 '+/ ~: (?. 100000 4 4 $ 2)'
0.0196997
   6!:2 '+/ ns (?. 100000 4 4 $ 2)'
0.0867839

I don't think that 'ns' is an exact drop-in replacement for ~: though. For example, with arrays of floating-point values that are close enough to be on the edge of tolerance, I imagine 'ns' could give different results than ~:

-Chris

On Wed, Jun 26, 2013 at 6:31 PM, Tracy Harms <[email protected]> wrote:

> In such cases it may be worthwhile to make a keying list with items that
> correspond to those of the list for which you want to compute the nub. If
> you calculate simple unique values for each item you may rely on the
> correspondence as needed.
>
> On Jun 25, 2013 11:29 PM, "Christopher Rosin" <[email protected]> wrote:
>
> > I was having a performance problem that I traced to nub applied to boxed
> > arrays.
> >
> > Nub sieve ~: gives the same results here whether the items are unboxed,
> > boxed, or doubly boxed:
> >    +/ ~: (?. 10000 4 4 $ 2)
> > 9255
> >    +/ ~: <"2 (?. 10000 4 4 $ 2)
> > 9255
> >    +/ ~: <"0 <"2 (?. 10000 4 4 $ 2)
> > 9255
> >
> > But the runtime is very different in the doubly boxed case:
> >    6!:2 '+/ ~: (?. 10000 4 4 $ 2)'
> > 0.00105408
> >    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
> > 0.00585098
> >    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
> > 14.9312
> >
> > Boxing the items only once, performance appears close to linear:
> >    6!:2 '+/ ~: <"2 (?. 1000 4 4 $ 2)'
> > 0.000527954
> >    6!:2 '+/ ~: <"2 (?. 10000 4 4 $ 2)'
> > 0.00488113
> >    6!:2 '+/ ~: <"2 (?. 100000 4 4 $ 2)'
> > 0.075351
> >
> > But doubly-boxed, performance seems to become nearly quadratic:
> >    6!:2 '+/ ~: <"0 <"2 (?. 1000 4 4 $ 2)'
> > 0.162159
> >    6!:2 '+/ ~: <"0 <"2 (?. 10000 4 4 $ 2)'
> > 14.9312
> >    6!:2 '+/ ~: <"0 <"2 (?. 100000 4 4 $ 2)'
> > 1106.85
> >
> > Timing is similar with nub instead of nub sieve.
> >
> > Is there any J documentation that explains the performance of nub in
> > various scenarios? I haven't been able to find any.
> >
> > Thanks.
> > -Chris

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
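
For reference, the tacit 'ns' verb defined earlier in this message can be read as a sort-based nub sieve: grade the items, flag each sorted item that does not match its predecessor, then move the flags back to the original positions. Below is a minimal explicit sketch of that reading. The name 'nsx' is made up for illustration, it assumes a list with at least one item, and like 'ns' it is not claimed to be an exact replacement for ~: (for instance near the edge of comparison tolerance).

```j
NB. nsx: hypothetical explicit spelling of the sort-based nub sieve
nsx =: 3 : 0
p =. /: y                      NB. grade: permutation that sorts the items of y (equal items keep their order)
s =. p { y                     NB. items of y in sorted order
f =. 1 , 2 ([: -. -:)/\ s      NB. 1 where a sorted item does not match its predecessor
f /: p                         NB. move each flag back to its item's original position
)

NB. nsx 3 1 3 2 1   gives 1 1 0 1 0, the same as   ~: 3 1 3 2 1
```

Because the grade keeps equal items in their original order, the first of each run of matching items in sorted order is also the first occurrence in the original list, which is why the flags line up with nub sieve in the simple cases.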
