Hmm, I’m disappointed the slight « loopiness » of my solutions makes them that much slower than the naïve approach :( There probably is however little inclusion among random words, maybe try again with, say, the first 10000 words? :)
Also, there is special code for calculating +/x E.y as x +/@:E. y; try the following modification of your verb (probably won’t be much faster but will definitely use less space): naiveS=: 3 : 0 joined=. ; (,&'|')&.> y y #~ 1 = +/@E.&joined@> y ) Will keep thinking! Louis PS: If naiveS isn’t any better try +/@:E. instead of +/@E., as only the former is actually listed in the special code section of the dictionary even though they are equivalent. > On 7 Jul 2019, at 20:15, vadim . <vadim3128m...@gmail.com> wrote: > > Hi, Louis, > > I've got mixed results here. The "cmaxs" solution is, indeed, fastest > so far, but (1) even then I'm not brave enough to test it against more > than 10000 lines (certainly not "millions"), and, (2) perhaps because > of "words_alpha" (simple arrays of characters) as test subject, very > naive approach wins. > > s =. freads 'words_alpha.txt' > lines =: <;._2 s > lines10000 =: (10000 ?. #lines) { lines > > ts 'fastincl lines10000' > 17.6488 190720 > ts 'recincl lines10000' > 16.871 3.3762e7 > ts '(P cmaxs L) lines10000' > 7.76267 395264 > ts 'naive lines10000' > 1.47661 1.07415e9 > > Where "naive" follows aforementioned Perl approach, except it doesn't > stop searching after second match, and its huge memory appetite (my > fault? could it be written better?) won't allow more than ~10000 > samples anyway (FWIW, Perl script, though on different randomly chosen > shuffled 10000 lines, completes in 1.1 sec and of course doesn't eat 1 > Gb of RAM). > > naive =: 3 : 0 > joined =. ;(,&'|')&.> y > (1&= +/"1 y ((>@:[)E.])"0 _ joined)#y > ) > > On the other hand, I appreciate that your solutions are very general, > thank you for shedding the light on correct path. E.g. "P" can be > easily replaced to test for unordered subsets instead. Then another > dilemma, whether repeated part numbers are allowed on one line... > > Best regards, > Vadim > >> On Sat, Jul 6, 2019 at 5:35 AM Louis de Forcrand <ol...@bluewin.ch> wrote: >> >> Hi again, >> >> I’m back with a speedier solution. >> comaxs is a conjunction, and if P is a boolean-valued dyadic verb and L any >> monadic verb on the items of the array A, P comaxs L A does the following: >> >> P and L are supposed to represent orderings on the elements of y. P >> represents a partial order, such that x P y says whether x is less than or >> equal to y for x and y any items of A. L is a “labelling” of the items of A, >> which represents a _total_ order compatible with P, in the sense that x P y >> implies that (L x) <: L y (or more generally that L x comes first in /:~(L >> x),L y). >> >> Then the result of the computation is the set of maximal items of A for the >> partial order P, that is, items not less than any other item. >> >> As mentioned earlier, the particular problem of inclusion of part number >> lists is a special case of this; indeed we can take >> >> P=: +./@E.&> >> L=: #@> >> >> since, for part number lists (or any kind of lists) x and y, if x is >> included in y, then certainly #x is no larger than #y. >> >> To measure performance we need a big, rather random dataset. Unfortunately I >> don’t know of any place I can find huge lists of part numbers, some of which >> include others, but if we look instead at lists of letters, we have a great >> data source which is the dictionary. At the following website we can find a >> quite huge file words_alpha.txt containing lots (too many for any of my >> devices probably) of english words, many of which are included in others: >> >> https://github.com/dwyl/english-words >> >> Running freads on it allows us to try out cmaxs on a part of the English >> language. Sadly though I’m on my phone right now, and I can’t freads >> downloaded files, so either I’ll do it at a later time or someone else can >> try. >> >> The code works thanks to the previous observation used in fastmax that we >> can safely remove an element a from A that is included in another element, >> and no longer check other items for inclusion in a. The reason this verb is >> much faster than fastmax is because now we also know that if #x is greater >> than #y then we never have to check if x is included in y. (More generally, >> (L x) > L y implies -. x P y.) >> We therefore sort A by decreasing list length (L value) prior to running a >> modified version of fastmax which only checks if (i{A) P (j{A) when i > j. >> The last bit if speedup if achieved by manipulating bit arrays rather than >> the actual array A itself. >> >> Cheers, >> Louis >> >> >> cmaxs=: 2 : 0 >> n=. #B=. 0"0 C=. 1"0 y=. (\: v) y >> while. n>i=. C i.1 do. >> B=. 1 i}B >> C=. 0 i}C >> C=. -.@:u&(i{y)&.(C&#) y >> end. >> B#y >> ) >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm