Raul,
I started using your getTagsContents function to extract strings from text.
It worked well on some data:
txt2 =: 'stuff1 tag1s stuff2 tag1e stuff3 tag2s stuff4 tag2e stuff5'
txt2 getTagsContents 'tag1s'; 'tag1e';'tag2s';'tag2e'
┌────────┬────────┐
│ stuff2 │ stuff4 │
└────────┴────────┘
However, if the end tag is the sequence carriage return, line feed, tab
instead of printable characters, we have a problem:
crlftb =: 13 10 9 { a.
a. i. crlftb
13 10 9
txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'
txt1
stuff1 tag1s stuff2
stuff3 tag2s stuff4
stuff5
txt1 getTagsContents 'tag1s';crlftb;'tag2s';crlftb
|assertion failure: getTagsContents
| -:&/:&;/|:locs
Why am I getting this error?
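In case it helps narrow this down, here is a minimal check (a sketch that
assumes only the crlftb and txt1 definitions above; endCount and startCount
are names I made up) that the tags are at least being found in the text.
One difference from the txt2 example is that both tag pairs here share the
same end tag, which may or may not matter.

```j
NB. definitions repeated so the snippet is self-contained
crlftb =: 13 10 9 { a.
txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'
NB. E. marks where each occurrence of a tag starts; +/ counts the marks
endCount =: +/ crlftb E. txt1
startCount =: (+/ 'tag1s' E. txt1) , +/ 'tag2s' E. txt1
```

Here endCount is 2 and startCount is 1 1, so all of the tags are present
in txt1.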
Skip
> On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
>
>> Ok... I hope I am not overlooking anything here:
>>
>> getTagsContents=: 4 :0
>> 'n m'=. $tags=. > _2 <\ y
>> locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>> assert. -:&/:&;/ |:locs NB. tags must be balanced
>> data=: _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
>> expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>> }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>> )
>>
>> If you can guarantee that tags are always balanced, you can get rid of
>> the assert statement.
>>
>> How this works:
>>
>> 1. tagged data blocks are extracted from the text (in their original order)
>> 2. expand is defined to be the compression vector on the ravel of the
>> desired result, to get those blocks
>> 3. expand #inv data gets the blocks we need
>>
>> Everything else is busywork to convert between data formats and
>> representations.
>>
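To check my understanding of step 3: as far as I can tell, #inv is the
inverse of dyadic # (copy), which is usually called expand. A toy sketch
with made-up values (mask, blocks and spread are my own names, not taken
from getTagsContents):

```j
NB. toy values, not taken from getTagsContents
mask =: 1 0 1 1
blocks =: 'ab';'cd';'ef'
NB. inv is the standard library name for ^:_1, so #inv is #^:_1 .
NB. It places the items of blocks where mask is 1 and fills the
NB. remaining positions (empty boxes for boxed data).
spread =: mask #^:_1 blocks
```

spread has four boxes, with an empty box in the position where mask is 0.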
>> To make my work easier, I make sure that:
>>
>> a. there is always text to be discarded before the first tag
>> b. the full set of tags appears at the beginning of the text I am working with
>> c. the full set of tags appears at the end of the text I am working with
>>
>> (these boundary guards are discarded from the final result).
>>
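To check my reading of a. through c., this sketch builds the guarded text
for a toy input (text and taglist are made-up stand-ins for x and y; the
last line is the guard expression from getTagsContents):

```j
NB. toy inputs, not real data
taglist =: 'tag1s';'tag1e'
text =: 'a tag1s b tag1e c'
tags =: > _2 <\ taglist            NB. 1 by 2 table: one row per start/end pair
txt =: (' ',;tags),text,;tags      NB. blank and all tags in front, all tags at the end
```

txt comes out as ' tag1stag1ea tag1s b tag1e ctag1stag1e', that is, a
leading blank and the tags in front of the text, and the tags again at
the end.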
>> Note: I hope that this is readable -- for some reason gmail has recently
>> taken to mutilating line-ends on plain text messages, so I do not know
>> how I can send plain text code.
>>
>> FYI,
>>
>> -
>>
>
--
Skip Cave
Cave Consulting LLC
Phone: 214-460-4861
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm