Tags must be balanced because we did not define a rule for unbalanced tags.
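
As an illustration, here is a minimal sketch of the imbalance, using the names from the quoted message and only the primitive E. (which marks where its left argument starts in its right argument). This is one reading of the assert, not a full trace of getTagsContents: in txt1 each start tag occurs once, but crlftb serves as the end tag of both pairs and occurs twice, so every pair is offered two end positions for its single start.

```j
crlftb =: 13 10 9 { a.
txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'

NB. E. yields a boolean mask of match starts; +/ sums it to a count.
+/ 'tag1s' E. txt1    NB. 1  (first start tag)
+/ 'tag2s' E. txt1    NB. 1  (second start tag)
+/ crlftb E. txt1     NB. 2  (the end tag shared by both pairs)
```

Giving each pair its own distinct end tag, as in the txt2 example, keeps the start and end counts equal.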

-- 
Raul

On Thursday, November 17, 2011, Skip Cave <[email protected]> wrote:
> Raul,
>
> I started using your getTagsContents function to extract strings from
> text.
> It worked well on some data:
>
>  txt2 =: 'stuff1 tag1s stuff2 tag1e stuff3 tag2s stuff4 tag2e stuff5'
>
>  txt2 getTagsContents 'tag1s'; 'tag1e';'tag2s';'tag2e'
> ┌────────┬────────┐
> │ stuff2 │ stuff4 │
> └────────┴────────┘
>
> However, if the end tag is the character sequence carriage return, line
> feed, tab instead of printable characters, we have a problem:
>
>   crlftb =: 13 10 9 { a.
>   a. i. crlftb
> 13 10 9
>
>   txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'
>   txt1
> stuff1 tag1s stuff2
>    stuff3 tag2s stuff4
>    stuff5
>
>  txt1 getTagsContents 'tag1s';crlftb;'tag2s';crlftb
> |assertion failure: getTagsContents
> |   -:&/:&;/|:locs
>
> Why am I getting this error?
>
> Skip
>
>
>> On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
>>
>>> Ok... I hope I am not overlooking anything here:
>>>
>>> getTagsContents=: 4 :0
>>>  'n m'=. $tags=. > _2 <\ y
>>>  locs=:  tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>>>  assert. -:&/:&;/ |:locs  NB. tags must be balanced
>>>  data=: _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
>>>  expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>>>  }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>>> )
>>>
>>> If you can guarantee that tags are always balanced, you can get rid of
>>> the assert statement.
>>>
>>> How this works:
>>>
>>> 1. tagged data blocks are extracted from the text (in their original
>>> order)
>>> 2. expand is defined to be the compression vector on the ravel of the
>>> desired result, to get those blocks
>>> 3. expand #inv data gets the blocks we need
>>>
>>> Everything else is busywork to convert between data formats and
>>> representations.
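>>>
>>> As a small illustration of steps 2 and 3, here is a sketch with made-up
>>> values (the mask and the string below are hypothetical, not taken from
>>> getTagsContents): a boolean mask applied with #inv places items at the
>>> positions holding 1 and pads the remaining positions with fill.
>>>
>>> ```j
>>> expand =: 1 0 1 1
>>> (+/expand) {. 'abcde'              NB. abc   (as many items as the mask has ones)
>>> expand #inv (+/expand) {. 'abcde'  NB. a bc  (blank fill where the mask is 0)
>>> ```
>>>
>>> In the function the items are boxed blocks rather than characters, so the
>>> padding is an empty box rather than a blank.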
>>>
>>> To make my work easier, I make sure that:
>>>
>>> a. There is always text to be discarded before the first tag
>>> b. the full set of tags appear at the beginning of the text I am working
>>> with
>>> c. the full set of tags appear at the end of the text I am working with
>>>
>>> (these boundary guards are discarded from the final result).
>>>
>>> Note: I hope that this is readable -- for some reason gmail has recently
>>> taken to mutilating line-ends on plain text messages, so I do not know how
>>> I can send plain text code.
>>>
>>> FYI,
>>>
>>> -
>>>
>>
>
>
>
> --
> Skip Cave
> Cave Consulting LLC
> Phone: 214-460-4861
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
