P.S. Brian Schott has observed that if you take out the assertion that
checks for balanced tags, the code works on the crlftb example.

This is because my code also assumes that tags cannot overlap.

I have not thought through what this would mean for other cases that would
currently be rejected.
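
A minimal sketch of the shortest-match rule from the quoted message below,
not part of getTagsContents. It assumes both start tags are 5 characters
long and that every start tag is followed by some end tag; the names s1,
s2, ends, starts, close and between are made up for this illustration.

```j
crlftb =: 13 10 9 { a.
txt1   =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'

NB. positions of each start tag, and of the shared end tag
s1   =: I. 'tag1s' E. txt1     NB. 7
s2   =: I. 'tag2s' E. txt1     NB. 29
ends =: I. crlftb E. txt1      NB. 19 41 (the one end tag matches twice)

NB. shortest match: pair each start with the nearest end at or after it
starts =: /:~ s1 , s2
close  =: ends {~ ends I. starts   NB. 19 41

NB. hypothetical helper: characters of txt1 from index x up to (not including) y
between =: 4 : '(x + i. y - x) { txt1'

NB. skip the 5-character start tags (assumed length) and box each piece
(starts + 5) <@between"0 close
```

The last line should give two boxes, ' stuff2' and ' stuff4'. If a start
tag has no end tag after it, the index from I. runs off the end of ends
and { signals an index error, so this is only a starting point.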

-- 
Raul

On Thu, Nov 17, 2011 at 9:14 AM, Raul Miller <[email protected]> wrote:

> Looking at this closer...
>
> The problem here is that you are using the same close tag for different
> start tags, which introduces ambiguity about what you really mean.
>
> I think, from looking at your test case, that in ambiguous cases you
> intend for the shortest possible match to be used.  Using this rule, it's
> possible to determine which start tag is matched by which end tag.  (And
> this ties into how I built that code: it does not support overlapping tags.)
>
> Also, I think I would assume that you are only planning on allowing for
> end-tags to be re-used and not start tags, and that should also simplify
> the task of matching the start tags with the end tags.
>
> (If the tags are all matched properly then the assertion will not fail.)
>
> FYI,
>
> --
> Raul
>
> On Thu, Nov 17, 2011 at 8:04 AM, Raul Miller <[email protected]> wrote:
>
>> Tags must be balanced because we did not define a rule for unbalanced
>> tags.
>>
>> --
>> Raul
>>
>>
>> On Thursday, November 17, 2011, Skip Cave <[email protected]> wrote:
>> > Raul,
>> >
>> > I started using your getTagsContents function to extract strings from
>> > text. It worked well on some data:
>> >
>> >  txt2 =: 'stuff1 tag1s stuff2 tag1e stuff3 tag2s stuff4 tag2e stuff5'
>> >
>> >  txt2 getTagsContents 'tag1s'; 'tag1e';'tag2s';'tag2e'
>> > ┌────────┬────────┐
>> > │ stuff2 │ stuff4 │
>> > └────────┴────────┘
>> >
>> > However, if the end tag character sequence is carriage return, line
>> > feed, tab instead of printable characters, we have a problem:
>> >
>> >   crlftb =: 13 10 9 { a.
>> >   a. i. crlftb
>> > 13 10 9
>> >
>> >   txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'
>> >   txt1
>> > stuff1 tag1s stuff2
>> >    stuff3 tag2s stuff4
>> >    stuff5
>> >
>> >  txt1 getTagsContents 'tag1s';crlftb;'tag2s';crlftb
>> > |assertion failure: getTagsContents
>> > |   -:&/:&;/|:locs
>> >
>> > Why am I getting this error?
>> >
>> > Skip
>> >
>> >
>> >> On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
>> >>
>> >>> Ok... I hope I am not overlooking anything here:
>> >>>
>> >>> getTagsContents=: 4 :0
>> >>>  'n m'=. $tags=. > _2 <\ y
>> >>>  locs=:  tags I.@E.L:0 }. txt=.(' ',;tags),x,;tags
>> >>>  assert. -:&/:&;/ |:locs  NB. tags must be balanced
>> >>>  data=: _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
>> >>>  expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>> >>>  }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>> >>> )
>> >>>
>> >>> If you can guarantee that tags are always balanced, you can get rid of
>> >>> the assert statement.
>> >>>
>> >>> How this works:
>> >>>
>> >>> 1. tagged data blocks are extracted from the text (in their original
>> >>> order)
>> >>> 2. expand is defined to be the compression vector on the ravel of the
>> >>> desired result, to get those blocks
>> >>> 3. expand #inv data gets the blocks we need
>> >>>
>> >>> Everything else is busywork to convert between data formats and
>> >>> representations.
>> >>>
>> >>> To make my work easier, I make sure that:
>> >>>
>> >>> a. there is always text to be discarded before the first tag
>> >>> b. the full set of tags appears at the beginning of the text I am
>> >>> working with
>> >>> c. the full set of tags appears at the end of the text I am working
>> >>> with
>> >>>
>> >>> (these boundary guards are discarded from the final result).
>> >>>
>> >>> Note: I hope that this is readable -- for some reason gmail has
>> >>> recently taken to mutilating line-ends on plain text messages, so I
>> >>> do not know how I can send plain text code.
>> >>>
>> >>> FYI,
>> >>>
>> >>> -
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Skip Cave
>> > Cave Consulting LLC
>> > Phone: 214-460-4861
>> > ----------------------------------------------------------------------
>> > For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
