It does not appear that working with lists of CRCs is any faster than
working with words. With the LF-separated dictionary on the clipboard:
w =. 1 2 3 4 5 6 7 8 9 10 <@(] #~ [ = #@:] every)"0 _ (;: 'a i o'), cutLF wdclippaste ''
The above keeps only words of 10 letters or fewer and places them in 10 boxes,
one box per word length.
Next, 10 boxes of sorted lists of CRCs (one list for each word size):
wn =. /:~@:(128!:3 every) each w
a is the same as in the quoted message below.
> (]#~0.8<[:(+/%#)every(((wn{::~ :: 0: <:@#)e.~128!:3)every@:;:)each) a
This takes over 12 seconds, which was still faster than using 1 big list of CRCs.
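
For anyone who wants to compare the two approaches without the clipboard data,
here is a minimal sketch on a toy dictionary. The names dict, line, hit, bySym
and byCrc are made up for illustration, and the toy word list is an assumption
standing in for the real one; timings on such small data say little about the
full dictionary.

```j
NB. Toy dictionary standing in for the clipboard word list (an assumption).
dict  =: ;: 'a i o on at cat mat'
words =: s: dict                                  NB. symbol version
w     =: 1 2 3 <@(] #~ [ = #@:] every)"0 _ dict   NB. box words by length
wn    =: /:~@:(128!:3 every) each w               NB. sorted CRCs per length

NB. hit: is the CRC of word y in the CRC list for words of its length?
NB. A length with no list raises an index error; :: 0: substitutes 0,
NB. so the membership test then fails (unless the CRC itself is 0).
hit   =: (wn {::~ :: 0: <:@#) e.~ 128!:3

line  =: 'cat on zzzz'                            NB. hypothetical input line
bySym =: words e.~ s: ;: line                     NB. membership via symbols
byCrc =: hit every ;: line                        NB. membership via CRCs
bySym -: byCrc                                    NB. the two masks should agree

NB. x (6!:2) y gives the average seconds over x runs of sentence y.
100 (6!:2) 'words e.~ s: ;: line'
100 (6!:2) 'hit every ;: line'
```

The parentheses around 6!:2 matter: without them 100 6 would be read as one
numeric list. The two masks should match unless a non-word happens to share a
CRC with a dictionary word of the same length.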
----- Original Message -----
From: 'Pascal Jasmin' via Programming <[email protected]>
To: Programming Forum <[email protected]>
Cc:
Sent: Friday, July 3, 2015 9:26 PM
Subject: [Jprogramming] symbol speed
A cool use of symbols, not obvious to me until today, is a word dictionary used
to test whether some other input is in the dictionary.
Using this list on the clipboard:
https://gist.githubusercontent.com/Quackmatic/512736d51d84277594f2/raw/words
words =: s: (;: 'a i o'), cutLF wdclippaste '' NB.(adding 1 letter words)
Here is a list of gibberish sentences that contains 3 real sentences:
https://gist.githubusercontent.com/anonymous/c8fb349e9ae4fcb40cb5/raw/05a1ef03626057e1b57b5bbdddc4c2373ce4b465/challenge.txt
With that new list on the clipboard:
a =: (<', . ? ! : ; ') rplc~ each cutLF wdclippaste ''
The following is not terrible (3 seconds or so). It keeps the lines where more
than 80% of the words are in the dictionary.
> (] #~ 0.8 < [: (+/%#) every (words e.~ s:@:;: )each) a
Is there a way to make it faster?
I was thinking that a 2D sparse array, where the first index is the letter
count, the second index is the CRC32, and the element value is 1 if the word
exists in the dictionary and 0 otherwise, could be OK.
Even simpler would be a list of valid CRC32 values, since 60k/4B is going to
give a very low false positive rate, well suited to testing against an 80%
threshold.
Would those approaches be faster?
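
As a rough check of the 60k/4B figure, assuming CRC32 values are spread
uniformly over all 2^32 possibilities and taking 60000 as the approximate
dictionary size (the name n is just for illustration):

```j
n =: 60000      NB. approximate dictionary size (the "60k" above)
n % 2^32        NB. chance that a random non-word's CRC32 is in the list
```

That comes to roughly 1.4e-5 per non-word, and it is an upper bound if some
dictionary words share a CRC.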
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm