You can't have an inverse CRC, because a CRC is a lossy (many-to-one)
transformation: in general there are far more possible strings than
CRC values. You are basically relying on statistics to avoid
collisions (different strings with the same CRC).

So actual use would look something like:

step one: get the distinct crcs which are in use.

step two: go over the data again and for each string find its crc, and
check that no other relevant string is producing the same crc.
(If one is, you'll need further work to untangle them.)
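
As a rough sketch only (not tested against large inputs), the two steps
might look like the following in J. It reuses s and the 10 (128!:3)\
idiom from the quoted messages below; the names w, c, r, m and n are
just placeholders I made up for illustration.

```j
s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
w =: 10 ]\ s                      NB. every 10-character window, one per row
c =: 10 (128!:3)\ s               NB. crc of each window
r =: (~. #~ 1 < #/.~) c           NB. step one: distinct crcs that occur more than once
m =: c e. r                       NB. step two: mark the windows whose crc is in that set
n =: (m # c) (#@~.)/. (m # w)     NB. for each such crc, how many different windows share it
*./ 1 = n                         NB. 1 if no two different candidate windows collide
~. m # w                          NB. the repeated substrings, trustworthy only if the line above is 1
```

If the check gives 0, some crc is shared by different windows, and the
windows in that group would have to be compared directly.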

-- 
Raul

On Tue, Jul 21, 2015 at 10:34 AM, Mike Day <[email protected]> wrote:
> That's neat,  but it's a bit messy retrieving the actual
> substrings rather than their encoded forms.
>
> This does it,
>    10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s
>
> AAAAACCCCC
>
> CCCCCAAAAA
>
>
> but it would be much better with an inverse CRC;
> however that doesn't seem to be supported in J.
>
>
> Is there a maximum window size for this approach?
>
> Thanks,
>
> Mike
>
>
> On 21/07/2015 14:37, Henry Rich wrote:
>>
>> For longer subsequences consider using
>>
>> (10 (128!:3)\ ])
>>
>> to reduce the size of the intermediate array.
>>
>> Henry Rich
>>
>> On 7/21/2015 12:49 AM, Vijay Lulla wrote:
>>>
>>> Using slightly less space
>>>
>>> (~. #~ 1 < #/.~)@(10 ]\ ]) s
>>>
>>> On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz <[email protected]> wrote:
>>>>
>>>> (i.~ ~: i:~) will find duplicates so how about:
>>>>
>>>>      ~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s
>>>>
>>>> AAAAACCCCC
>>>>
>>>> CCCCCAAAAA
>>>>
>>>>
>>>>
>>>> On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough <[email protected]> wrote:
>>>>
>>>>> This is a problem from leetcode.com (similar to Project Euler)
>>>>> https://leetcode.com/problems/repeated-dna-sequences/
>>>>> The problem is to find all 10 letter repeated subsequences from a DNA
>>>>> string (made of C,G,A,T characters).
>>>>> My solution:
>>>>> func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) { ])@:(<"1@:(10&(]\)))
>>>>> e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
>>>>> func s
>>>>> ┌──────────┬──────────┐
>>>>> │AAAAACCCCC│CCCCCAAAAA│
>>>>> └──────────┴──────────┘
>>>>>
>>>>>
>>>>>
>>>>> It is not very pretty. Can anyone improve on it?
>
>
>
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm