Out of curiosity, I'm getting a different value for the example listed under 128!:3. Shouldn't it be the same as listed on the page?
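
As a rough cross-check outside J, here is a minimal Python sketch. It assumes that 128!:3 with no left argument computes the standard CRC-32 (the same one zlib uses) and reports it as a signed 32-bit integer; that is my assumption, not something I have confirmed from the documentation. The helper names to_signed32 and crc32_signed are made up for this sketch.

```python
import zlib

def to_signed32(u):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return u - (1 << 32) if u >= (1 << 31) else u

def crc32_signed(text):
    # In Python 3, zlib.crc32 returns the standard CRC-32 as an unsigned value.
    return to_signed32(zlib.crc32(text.encode("ascii")))

first = crc32_signed("123456789")
second = crc32_signed("assiduously avoid any and all asinine alliterations")
print(first)    # -873187034, the same as the first result in my J session
print(second)

# The second J result is below -2^31, so it cannot be a signed 32-bit value.
reported = -2855392203
print(reported + (1 << 32))   # 1439575093
```

The first string agrees with my session. The second J result lies outside the signed 32-bit range, and adding 2^32 to it gives 1439575093, so it may be worth comparing that number with the value listed on the page.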
Below is from my J session:

   f '123456789'
_873187034
   f 'assiduously avoid any and all asinine alliterations'  NB. Different from the listed example
_2855392203
   JVERSION
Engine: j803/2014-10-19-11:11:11
Library: 8.04.06
Qt IDE: 1.4.3/5.4.2
Platform: Win 64
Installer: J804 install
InstallPath: h:/utilities/j64-804

On Tue, Jul 21, 2015 at 11:48 AM, Raul Miller <[email protected]> wrote:
> You can't have an inverse crc, because crc is a lossy transformation.
> You are basically relying on statistics to avoid collisions (different
> strings with the same crc).
>
> So actual use would look something like:
>
> step one: get the distinct crcs which are in use.
>
> step two: go over the data again and for each string find its crc, and
> check that some other relevant string isn't producing the same crc.
> (If there are, you'll need further work to untangle them.)
>
> --
> Raul
>
> On Tue, Jul 21, 2015 at 10:34 AM, Mike Day <[email protected]> wrote:
>> That's neat, but it's a bit messy retrieving the actual
>> substrings rather than their encoded forms.
>>
>> This does it,
>>    10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s
>> AAAAACCCCC
>> CCCCCAAAAA
>>
>> but it would be much better with an inverse CRC;
>> however that doesn't seem to be supported in J.
>>
>> Is there a maximum window size for this approach?
>>
>> Thanks,
>>
>> Mike
>>
>> On 21/07/2015 14:37, Henry Rich wrote:
>>>
>>> For longer subsequences consider using
>>>
>>> (10 (128!:3)\ ])
>>>
>>> to reduce the size of the intermediate array.
>>>
>>> Henry Rich
>>>
>>> On 7/21/2015 12:49 AM, Vijay Lulla wrote:
>>>>
>>>> Using slightly less space
>>>>
>>>> (~. #~ 1 < #/.~)@(10 ]\ ]) s
>>>>
>>>> On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz <[email protected]> wrote:
>>>>>
>>>>> (i.~ ~: i:~) will find duplicates so how about:
>>>>>
>>>>> ~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s
>>>>> AAAAACCCCC
>>>>> CCCCCAAAAA
>>>>>
>>>>> On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough <[email protected]> wrote:
>>>>>
>>>>>> This is a problem from leetcode.com (similar to Project Euler)
>>>>>> https://leetcode.com/problems/repeated-dna-sequences/
>>>>>> The problem is to find all 10 letter repeated subsequences from a DNA
>>>>>> string (made of C,G,A,T characters).
>>>>>> My solution:
>>>>>> func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) { ])@:(<"1@:(10&(]\)))
>>>>>> e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
>>>>>> func s
>>>>>> ┌──────────┬──────────┐
>>>>>> │AAAAACCCCC│CCCCCAAAAA│
>>>>>> └──────────┴──────────┘
>>>>>>
>>>>>> It is not very pretty. Can anyone improve on it?
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
