Look at

http://www.jsoftware.com/jwiki/Vocabulary/SpecialCombinations#Searching_and_Matching_Items:_Precomputed_searches

You should be producing a search verb

srch =. e.&wordlist

so that wordlist is indexed only once, when srch is created, rather than on
every call.
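Here is a minimal sketch of the precomputed-search idea. It assumes a symbol dictionary `words` like the one built in the quoted message below, but uses a small made-up list instead. The input lines in `a` are also hypothetical stand-ins. `e.&words` binds the noun when `srch` is created, so the dictionary is indexed there. By contrast, `words e.~ ...` inside `each` searches the whole list again for every line.

```j
NB. Toy dictionary (hypothetical); in the thread it comes from the clipboard
words =: s: ;: 'a i o the cat sat on mat'
NB. Build the search verb once: e.&words is a precomputed search,
NB. so words is indexed here rather than on every call
srch =: e.&words
NB. 1 for each word of the line that is in the dictionary
srch s: ;: 'the cat zqx'
NB. Hypothetical input lines, standing in for a from the thread
a =: 'the cat sat on the mat' ; 'zqx vbn the plk'
NB. The filter from the thread, with srch in place of words e.~
(] #~ 0.8 < [: (+/%#) every (srch@s:@;:) each) a
```

The only change to the original filter is that `srch@s:@;:` replaces `words e.~ s:@:;:`. The rest of the train is unchanged.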


Alternative: compute the CRCs, sort them, and search the sorted list with I.
(interval index).
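Here is a minimal sketch of that alternative, using a toy boxed word list in place of the clipboard dictionary. `128!:3` is the CRC-32 foreign already used in the quoted messages. The names `dict`, `crcs` and `inDict` are hypothetical. The sketch relies on one property of `x I. y` for ascending `x`: it returns the index of the first item of `x` that is not less than `y`, so an exact match sits at that index.

```j
NB. Toy dictionary (hypothetical); in the thread it comes from the clipboard
dict =: ;: 'a i o the cat sat on mat'
NB. CRC-32 of every word, sorted once so that I. can binary-search it
crcs =: /:~ 128!:3 every dict
NB. inDict: 1 for each boxed word whose CRC occurs in crcs
NB. (a CRC collision would give a false positive)
inDict =: 3 : 0
 c =. 128!:3 every y            NB. CRC of each query word
 i =. crcs I. c                 NB. first sorted CRC not less than c
 (i < # crcs) *. c = (i <. <: # crcs) { crcs
)
inDict ;: 'the cat zqx'
```

The sort happens once, up front. After that, each lookup is a binary search of about log2 of the list length, roughly 16 steps for 60k words.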


Henry Rich

---- 'Pascal Jasmin' via Programming wrote: 
> It does not appear that working with lists of CRCs is any faster than
> working with the words themselves.
> 
> With the LF-separated dictionary data on the clipboard:
> 
>  w =. 1 2 3 4 5 6 7 8 9 10 <@(] #~ [ = #@:] every)"0 _ (;: 'a i o'), cutLF wdclippaste ''
> 
> The above keeps only words of 10 letters or fewer and places them in 10
> boxes, one per word length.
> 
> Then make 10 boxes of sorted CRC lists, one per word length:
> 
> wn =. /:~@:(128!:3 every) each w
> 
> a is the same as in the previous (quoted) message.
> 
> >  (]#~0.8<[:(+/%#)every(((wn{::~ :: 0: <:@#)e.~128!:3)every@:;:)each) a
> 
> This takes over 12 seconds, though it was faster than using one big list
> of CRCs.
> 
> 
> ----- Original Message -----
> From: 'Pascal Jasmin' via Programming
> To: Programming Forum
> Cc: 
> Sent: Friday, July 3, 2015 9:26 PM
> Subject: [Jprogramming] symbol speed
> 
> A cool use of symbols, not obvious to me until today, is a word dictionary
> used to test whether some other input is in the dictionary.
> 
> Using this list on the clipboard:
> 
> https://gist.githubusercontent.com/Quackmatic/512736d51d84277594f2/raw/words
> 
> words =: s: (;: 'a i o'),  cutLF wdclippaste ''  NB.(adding 1 letter words)
> 
> Here is a list of gibberish sentences that also contains 3 real sentences:
> 
> https://gist.githubusercontent.com/anonymous/c8fb349e9ae4fcb40cb5/raw/05a1ef03626057e1b57b5bbdddc4c2373ce4b465/challenge.txt
> 
> With that new list on the clipboard:
> 
> a =: (<', . ? ! : ; ') rplc~ each cutLF wdclippaste ''
> 
> The following is not terrible (3 seconds or so). It keeps the lines in
> which more than 80% of the words are in the dictionary:
> 
> > (] #~ 0.8 < [: (+/%#) every (words e.~ s:@:;: )each)  a
> 
> Is there a way to make it faster?
> 
> I was thinking that a 2-D sparse array could work. The first index would be
> the letter count and the second index the CRC-32. The element value would be
> 1 if the value exists in the dictionary and 0 otherwise.
> 
> Even simpler would be a list of valid CRC-32 values. With about 60k words
> among roughly 4 billion (2^32) possible values, the false-positive rate
> will be very low, which is well suited to testing against an 80% threshold.
> 
> Would those approaches be faster?
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
