Thank you Henry, changing

   (] #~ 0.8 < [: (+/%#) every (words e.~ s:@:;: )each) a

to

   (] #~ 0.8 < [: (+/%#) every ([: e.&words s:@:;: )each) a

brings the timing down to a split second. I was hoping for such a speedup when looking through http://www.jsoftware.com/help/dictionary/special.htm (I could not find it there), but I was also (incorrectly) assuming that any speedup from n&e. would also exist with (n e. f).

Roughly the same (actually slightly faster) speedup exists without symbols (using straight boxed data), except that the speed "breaks" if I assign e.&boxeddata to a name (it doesn't break with a named e.&symboldata).

Bo, all the code is there, but it relies on public data files that I linked to. wdclippaste '' interacts with whatever is on the clipboard, and each line that uses it is preceded by instructions on what to copy to the clipboard. May require J7 or J8.

----- Original Message -----
From: "[email protected]" <[email protected]>
To: [email protected]
Cc:
Sent: Saturday, July 4, 2015 3:04 AM
Subject: Re: [Jprogramming] symbol speed

Look at http://www.jsoftware.com/jwiki/Vocabulary/SpecialCombinations#Searching_and_Matching_Items:_Precomputed_searches

You should be producing a search verb

   srch =. e.&wordlist

to get wordlist indexed once.

Alternative: crc, then sort and use I.

Henry Rich

---- 'Pascal Jasmin' via Programming <[email protected]> wrote:
> it does not appear that working with lists of crcs is any faster
>
> instead of words... i.e. dictionary LF data on clipboard:
>
>    w =. 1 2 3 4 5 6 7 8 9 10 <@(] #~ [ = #@:] every)"0 _ (;: 'a i o'), cutLF wdclippaste ''
>
> The above just keeps words of 10 letters or fewer, and places them in 10 boxes.
>
> 10 boxes of lists of CRCs (for each word size):
>
>    wn =. /:~@:(128!:3 every) each w
>
> a is the same as in the previous quoted message.
>
>    (]#~0.8<[:(+/%#)every(((wn{::~ :: 0: <:@#)e.~128!:3)every@:;:)each) a
>
> takes over 12 seconds.
> This was faster than 1 big list of CRCs.
>
>
> ----- Original Message -----
> From: 'Pascal Jasmin' via Programming <[email protected]>
> To: Programming Forum <[email protected]>
> Cc:
> Sent: Friday, July 3, 2015 9:26 PM
> Subject: [Jprogramming] symbol speed
>
> A cool use of symbols, not obvious to me until today, is a word dictionary
> used to test whether some other input is in the dictionary or not.
>
> Using this list on the clipboard
>
> https://gist.githubusercontent.com/Quackmatic/512736d51d84277594f2/raw/words
>
>    words =: s: (;: 'a i o'), cutLF wdclippaste '' NB.(adding 1 letter words)
>
> here is a list of gibberish sentences that contain 3 real sentences
>
> https://gist.githubusercontent.com/anonymous/c8fb349e9ae4fcb40cb5/raw/05a1ef03626057e1b57b5bbdddc4c2373ce4b465/challenge.txt
>
> With that new list on the clipboard
>
>    a =: (<', . ? ! : ; ') rplc~ each cutLF wdclippaste ''
>
> the following is not terrible (3 seconds or so). It filters lines where 80% of
> the words are in the dictionary.
>
>    (] #~ 0.8 < [: (+/%#) every (words e.~ s:@:;: )each) a
>
> Is there a way to make it faster?
>
> I was thinking that a 2D sparse array, where the first index is the letter count,
> the 2nd index is the crc32, and the element value is 1 if the value exists in the
> dictionary and 0 otherwise, could be ok.
>
> Even simpler would be a list of valid crc32 values, since 60k/4B is going to
> have a very low false positive rate, well suited to testing against an 80%
> threshold.
>
> Would those approaches be faster?
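Henry's "crc, then sort and use I." alternative, which is close to the list-of-valid-crc32-values idea in the original question, can be sketched as a binary search over one sorted CRC list. This is a minimal sketch, not code from the thread: the tiny dictionary and the verb name `sortedIn` are hypothetical, and it assumes that dyadic `I.` on an ascending list gives, for each item of `y`, the index of the first item of `x` that is not less than it. A 1 in the result means the word's CRC is in the list, which, as the question notes, can in rare cases be a false positive.

```j
NB. Hypothetical toy dictionary; the thread used the downloaded word list instead.
crcs =: /:~ 128!:3 every ;: 'a i o the cat sat on mat'   NB. sorted CRC32 values

NB. sortedIn (hypothetical name): x is an ascending CRC list, y the CRCs to look up.
NB. Assumes x I. y gives the index of the first item of x not less than y.
sortedIn =: 4 : 0
 i =. x I. y                        NB. insertion point of each y into x
 (i < #x) *. y = x {~ (<: #x) <. i  NB. 1 where that slot exists and holds y
)

crcs sortedIn 128!:3 every ;: 'the dog sat'
```

Each lookup is a binary search on the sorted list, so it avoids symbols entirely at the price of possible CRC collisions.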
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
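To try the faster filter from this thread without the clipboard and the linked files, it can be run on toy data. This is a minimal sketch: the small `words` list and the three lines in `a` are made-up stand-ins for the downloaded dictionary and challenge text, and `filtered` is a hypothetical name. It uses the `e.&words` form, in which `words` is bound to `e.` so that, per Henry's message, it is indexed once rather than on every call.

```j
NB. Toy stand-ins (hypothetical data) for the linked word list and challenge text.
words =: s: ;: 'a i o the cat sat on mat'
a =: 'the cat sat on the mat' ; 'xq zz the qq' ; 'i sat'

NB. For each line: split into words with ;: , convert to symbols with s: ,
NB. and test membership with e.&words (words is the bound right argument).
NB. (+/%#) is the fraction of words found; keep lines where it exceeds 0.8.
filtered =: (] #~ 0.8 < [: (+/%#) every ([: e.&words s:@:;: ) each) a
```

With these stand-ins, the first and third lines are kept and 'xq zz the qq' (1 of 4 words found) is dropped. Since `;:` is J's own word-formation verb rather than a plain whitespace split, input is safest as simple space-separated words, which is presumably why the thread replaces punctuation in `a` first.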
