Re: [Jprogramming] Substring sequences of a string

Vijay Lulla Tue, 21 Jul 2015 12:06:33 -0700

Out of curiosity, I'm getting a different value for the example listed
under 128!:3.  Shouldn't it be the same as the one listed on the page?


Below is from my J session

   f '123456789'
_873187034
   f 'assiduously avoid any and all asinine alliterations'  NB. Different from the listed example
_2855392203
   JVERSION
Engine: j803/2014-10-19-11:11:11
Library: 8.04.06
Qt IDE: 1.4.3/5.4.2
Platform: Win 64
Installer: J804 install
InstallPath: h:/utilities/j64-804
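
A minimal sketch of one way to compare the two results, assuming the difference lies only in how a 32-bit CRC is presented as a signed integer (an assumption; nothing in this session confirms it). It reduces the two values from the session modulo 2^32 so they can be set against an unsigned listing.

```j
NB. Values copied from the session above; 4294967296 is 2^32.
NB. Assumption: the mismatch is a signed/unsigned presentation issue.
4294967296 | _873187034 _2855392203
```

By arithmetic the residues are 3421780262 and 1439575093. The first is CBF43926 in hex, the usual CRC-32 check value for '123456789'.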


On Tue, Jul 21, 2015 at 11:48 AM, Raul Miller <[email protected]> wrote:
> You can't have an inverse crc, because crc is a lossy transformation.
> You are basically relying on statistics to avoid collisions (different
> strings with the same crc).
>
> So actual use would look something like:
>
> step one: get the distinct crcs which are in use.
>
> step two: go over the data again and for each string find its crc, and
> check that some other relevant string isn't producing the same crc.
> (If there are, you'll need further work to untangle them.)
>
> --
> Raul
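
The two steps above can be sketched roughly as follows. This is only an illustration: the names w, h, dup and cand are hypothetical, and s is the example string given further down the thread.

```j
s    =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
w    =: 10 ]\ s               NB. every 10-character window, one per row
h    =: 10 (128!:3)\ s        NB. CRC of each window
dup  =: (1 < #/.~ h) # ~. h   NB. step one: CRCs that occur more than once
cand =: (h e. dup) # w        NB. step two: the windows behind those CRCs
~. cand                       NB. distinct candidate substrings
(# ~. cand) = # dup           NB. 1 if each repeated CRC maps to one substring
```

If the last line gives 0, at least two different windows share a CRC and the candidates need untangling by comparing the strings themselves.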
>
> On Tue, Jul 21, 2015 at 10:34 AM, Mike Day <[email protected]> wrote:
>> That's neat, but it's a bit messy retrieving the actual
>> substrings rather than their encoded forms.
>>
>> This does it,
>>    10(]{~i.@[+/~((I.@:(1<#/.~))@:( (128!:3)\ ]))) s
>>
>> AAAAACCCCC
>> CCCCCAAAAA
>>
>> but it would be much better with an inverse CRC;
>> however that doesn't seem to be supported in J.
>>
>>
>> Is there a maximum window size for this approach?
>>
>> Thanks,
>>
>> Mike
>>
>>
>> On 21/07/2015 14:37, Henry Rich wrote:
>>>
>>> For longer subsequences consider using
>>>
>>> (10 (128!:3)\ ])
>>>
>>> to reduce the size of the intermediate array.
>>>
>>> Henry Rich
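
A small sketch of the size difference, using the example string s from the original question. The shapes follow from s having 32 characters, which gives 23 windows of length 10.

```j
s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT'
$ 10 ]\ s            NB. 23 10 : a table of 23 windows, 10 characters each
$ 10 (128!:3)\ s     NB. 23    : a list with one CRC per window
```

The CRC version keeps one number per window instead of a full row of characters, which is where the saving comes from as the window length grows.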
>>>
>>> On 7/21/2015 12:49 AM, Vijay Lulla wrote:
>>>>
>>>> Using slightly less space
>>>>
>>>> (~. #~ 1 < #/.~)@(10 ]\ ]) s
>>>>
>>>> On Mon, Jul 20, 2015 at 11:59 PM, Tikkanz <[email protected]> wrote:
>>>>>
>>>>> (i.~ ~: i:~) will find duplicates so how about:
>>>>>
>>>>>      ~.@(#~ i.~ ~: i:~)@(10 ]\ ]) s
>>>>>
>>>>> AAAAACCCCC
>>>>> CCCCCAAAAA
>>>>>
>>>>> On Tue, Jul 21, 2015 at 3:51 PM, Jon Hough <[email protected]> wrote:
>>>>>
>>>>>> This is a problem from leetcode.com (similar to Project Euler)
>>>>>> https://leetcode.com/problems/repeated-dna-sequences/
>>>>>> The problem is to find all 10-letter substrings that occur more
>>>>>> than once in a DNA string (made of C, G, A, T characters).
>>>>>> My solution:
>>>>>> func =: (I.@:(1&<)@:>@:(1&{)@:(~. ,: <"0@:(#/.~)) { ])@:(<"1@:(10&(]\)))
>>>>>> e.g. s =: 'AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT' NB. see the link for this definition
>>>>>> func s
>>>>>> ┌──────────┬──────────┐
>>>>>> │AAAAACCCCC│CCCCCAAAAA│
>>>>>> └──────────┴──────────┘
>>>>>>
>>>>>> It is not very pretty. Can anyone improve on it?
>>
>>
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
