Hi again,

I’m back with a speedier solution.
comaxs is a conjunction, and if P is a boolean-valued dyadic verb and L any 
monadic verb on the items of the array A, P comaxs L A does the following:

P and L are supposed to represent orderings on the elements of y. P represents 
a partial order, such that x P y says whether x is less than or equal to y for 
x and y any items of A. L is a “labelling” of the items of A, which represents 
a _total_ order compatible with P, in the sense that x P y implies that (L x) 
<: L y (or more generally that L x comes first in /:~(L x),L y).

Then the result of the computation is the set of maximal items of A for the 
partial order P, that is, items not less than any other item.

As mentioned earlier, the particular problem of inclusion of part number lists 
is a special case of this; indeed we can take

P=: +./@E.&>
L=: #@>

since, for part number lists (or any kind of lists) x and y, if x is included 
in y, then certainly #x is no larger than #y.

To measure performance we need a big, rather random dataset. Unfortunately I 
don’t know of any place I can find huge lists of part numbers, some of which 
include others, but if we look instead at lists of letters, we have a great 
data source which is the dictionary. At the following website we can find a 
quite huge file words_alpha.txt containing lots (too many for any of my devices 
probably) of english words, many of which are included in others:

https://github.com/dwyl/english-words

Running freads on it allows us to try out cmaxs on a part of the English 
language. Sadly though I’m on my phone right now, and I can’t freads downloaded 
files, so either I’ll do it at a later time or someone else can try.

The code works thanks to the previous observation used in fastmax that we can 
safely remove an element a from A that is included in another element, and no 
longer check other items for inclusion in a. The reason this verb is much 
faster than fastmax is because now we also know that if #x is greater than #y 
then we never have to check if x is included in y. (More generally, (L x) > L y 
implies -. x P y.)
We therefore sort A by decreasing list length (L value) prior to running a 
modified version of fastmax which only checks if  (i{A) P (j{A) when i > j.
The last bit if speedup if achieved by manipulating bit arrays rather than the 
actual array A itself.

Cheers,
Louis


cmaxs=: 2 : 0
 n=. #B=. 0"0 C=. 1"0 y=. (\: v) y
 while. n>i=. C i.1 do.
  B=. 1 i}B
  C=. 0 i}C
  C=. -.@:u&(i{y)&.(C&#) y
 end.
 B#y
)

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to