Raul,

I started using your getTagsContents function to extract strings from text.
It worked well on some data:

  txt2 =: 'stuff1 tag1s stuff2 tag1e stuff3 tag2s stuff4 tag2e stuff5'

  txt2 getTagsContents 'tag1s'; 'tag1e';'tag2s';'tag2e'
┌────────┬────────┐
│ stuff2 │ stuff4 │
└────────┴────────┘

However, if the end tag is the character sequence carriage return, line feed,
tab instead of printable characters, we have a problem:

   crlftb =: 13 10 9 { a.
   a. i. crlftb
13 10 9

   txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'
   txt1
stuff1 tag1s stuff2
    stuff3 tag2s stuff4
    stuff5

   txt1 getTagsContents 'tag1s';crlftb;'tag2s';crlftb
|assertion failure: getTagsContents
|   -:&/:&;/|:locs

Why am I getting this error?

Skip


> On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
>
>> Ok... I hope I am not overlooking anything here:
>>
>> getTagsContents=: 4 :0
>>  'n m'=. $tags=. > _2 <\ y
>>  locs=:  tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>>  assert. -:&/:&;/ |:locs  NB. tags must be balanced
>>  data=: _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
>>  expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>>  }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>> )
>>
>> If you can guarantee that tags are always balanced, you can get rid of
>> the assert statement.
>>
>> How this works:
>>
>> 1. tagged data blocks are extracted from the text (in their original order)
>> 2. expand is defined to be the compression vector on the ravel of the
>> desired result, to get those blocks
>> 3. expand #inv data gets the blocks we need
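
For anyone following along, here is a small sketch of what step 3 relies on,
as far as I understand it. The values of expand and data below are made-up
stand-ins, not what getTagsContents actually computes; the point is only that
#inv puts one item wherever the mask has a 1 and a fill wherever it has a 0:

```j
   expand =: 1 0 1 1          NB. hypothetical mask, not taken from the function
   data =: 'abc'              NB. hypothetical data, one item per 1 in the mask
   expand #inv data
a bc
   expand # expand #inv data  NB. compressing with the same mask recovers data
abc
```

With boxed data, as in the function, the fill would be an empty box rather
than a space.
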
>>
>> Everything else is busywork to convert between data formats and
>> representations.
>>
>> To make my work easier, I make sure that:
>>
>> a. There is always text to be discarded before the first tag
>> b. the full set of tags appears at the beginning of the text I am working with
>> c. the full set of tags appears at the end of the text I am working with
>>
>> (these boundary guards are discarded from the final result).
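
If I read the definition right, the guards in a to c correspond to the txt=.
assignment. A sketch using the tags from my first example and a shortened
sample text of my own:

```j
   tags =: > _2 <\ 'tag1s';'tag1e';'tag2s';'tag2e'
   $ tags                     NB. one row per start/end pair
2 2
   ;tags                      NB. all tags run together
tag1stag1etag2stag2e
   (' ',;tags),'stuff1 tag1s stuff2 tag1e',;tags
 tag1stag1etag2stag2estuff1 tag1s stuff2 tag1etag1stag1etag2stag2e
```

So every tag occurs at least twice in the working text, once in each guard,
whatever the input contains.
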
>>
>> Note: I hope that this is readable -- for some reason gmail has recently
>> taken to mutilating line-ends on plain text messages, so I do not know how
>> I can send plain text code.
>>
>> FYI,
>>
>> -
>>
>



-- 
Skip Cave
Cave Consulting LLC
Phone: 214-460-4861
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
