Hi, Louis,

I've got mixed results here. The "cmaxs" solution is, indeed, fastest
so far, but (1) even then I'm not brave enough to test it against more
than 10000 lines (certainly not "millions"), and, (2) perhaps because
of "words_alpha" (simple arrays of characters) as test subject, very
naive approach wins.

   s =. freads 'words_alpha.txt'
   lines =: <;._2 s
   lines10000 =: (10000 ?. #lines) { lines

   ts 'fastincl lines10000'
17.6488 190720
   ts 'recincl lines10000'
16.871 3.3762e7
   ts '(P cmaxs L) lines10000'
7.76267 395264
   ts 'naive lines10000'
1.47661 1.07415e9

Where "naive" follows aforementioned Perl approach, except it doesn't
stop searching after second match, and its huge memory appetite (my
fault? could it be written better?) won't allow more than ~10000
samples anyway (FWIW, Perl script, though on different randomly chosen
shuffled 10000 lines, completes in 1.1 sec and of course doesn't eat 1
Gb of RAM).

   naive =: 3 : 0
joined =. ;(,&'|')&.> y
(1&= +/"1 y ((>@:[)E.])"0 _ joined)#y
)

On the other hand, I appreciate that your solutions are very general,
thank you for shedding the light on correct path. E.g. "P" can be
easily replaced to test for unordered subsets instead. Then another
dilemma, whether repeated part numbers are allowed on one line...

Best regards,
Vadim

On Sat, Jul 6, 2019 at 5:35 AM Louis de Forcrand <ol...@bluewin.ch> wrote:
>
> Hi again,
>
> I’m back with a speedier solution.
> comaxs is a conjunction, and if P is a boolean-valued dyadic verb and L any 
> monadic verb on the items of the array A, P comaxs L A does the following:
>
> P and L are supposed to represent orderings on the elements of y. P 
> represents a partial order, such that x P y says whether x is less than or 
> equal to y for x and y any items of A. L is a “labelling” of the items of A, 
> which represents a _total_ order compatible with P, in the sense that x P y 
> implies that (L x) <: L y (or more generally that L x comes first in /:~(L 
> x),L y).
>
> Then the result of the computation is the set of maximal items of A for the 
> partial order P, that is, items not less than any other item.
>
> As mentioned earlier, the particular problem of inclusion of part number 
> lists is a special case of this; indeed we can take
>
> P=: +./@E.&>
> L=: #@>
>
> since, for part number lists (or any kind of lists) x and y, if x is included 
> in y, then certainly #x is no larger than #y.
>
> To measure performance we need a big, rather random dataset. Unfortunately I 
> don’t know of any place I can find huge lists of part numbers, some of which 
> include others, but if we look instead at lists of letters, we have a great 
> data source which is the dictionary. At the following website we can find a 
> quite huge file words_alpha.txt containing lots (too many for any of my 
> devices probably) of english words, many of which are included in others:
>
> https://github.com/dwyl/english-words
>
> Running freads on it allows us to try out cmaxs on a part of the English 
> language. Sadly though I’m on my phone right now, and I can’t freads 
> downloaded files, so either I’ll do it at a later time or someone else can 
> try.
>
> The code works thanks to the previous observation used in fastmax that we can 
> safely remove an element a from A that is included in another element, and no 
> longer check other items for inclusion in a. The reason this verb is much 
> faster than fastmax is because now we also know that if #x is greater than #y 
> then we never have to check if x is included in y. (More generally, (L x) > L 
> y implies -. x P y.)
> We therefore sort A by decreasing list length (L value) prior to running a 
> modified version of fastmax which only checks if  (i{A) P (j{A) when i > j.
> The last bit if speedup if achieved by manipulating bit arrays rather than 
> the actual array A itself.
>
> Cheers,
> Louis
>
>
> cmaxs=: 2 : 0
>  n=. #B=. 0"0 C=. 1"0 y=. (\: v) y
>  while. n>i=. C i.1 do.
>   B=. 1 i}B
>   C=. 0 i}C
>   C=. -.@:u&(i{y)&.(C&#) y
>  end.
>  B#y
> )
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to