Tags must be balanced because we did not define a rule for unbalanced tags.
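
One way to see this in Skip's example, offered as a rough sketch rather than a full diagnosis: when crlftb serves as the end tag for both pairs, a search for either end tag finds every occurrence of crlftb, so end-tag positions no longer pair off one-to-one with start-tag positions. The lines below use only the primitives E. and I. on the txt1 from the quoted message. The names starts and ends are made up for illustration and are not part of getTagsContents.

```j
crlftb =: 13 10 9 { a.
txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb, 'stuff5'

NB. E. marks where a tag begins in the text; I. turns the marks into indices.
starts =: I. 'tag1s' E. txt1   NB. 7
ends   =: I. crlftb E. txt1    NB. 19 41

NB. One start position but two candidate end positions for the tag1s pair.
(# starts) , # ends            NB. 1 2
```

By contrast, the txt2 run in the quoted message gives each pair its own end tag, and that run passes the check.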
--
Raul

On Thursday, November 17, 2011, Skip Cave <[email protected]> wrote:
> Raul,
>
> I started using your getTagContents function to extract strings from text.
> It worked well on some data:
>
>    txt2 =: 'stuff1 tag1s stuff2 tag1e stuff3 tag2s stuff4 tag2e stuff5'
>
>    txt2 getTagsContents 'tag1s'; 'tag1e';'tag2s';'tag2e'
> ┌────────┬────────┐
> │ stuff2 │ stuff4 │
> └────────┴────────┘
>
> However, if the end tag character sequence is carriage return, line feed,
> tab, instead of printable characters, we have a problem:
>
>    crlftb =: 13 10 9 { a.
>    a. i. crlftb
> 13 10 9
>
>    txt1 =: 'stuff1 tag1s stuff2', crlftb, 'stuff3 tag2s stuff4', crlftb , 'stuff5'
>    txt1
> stuff1 tag1s stuff2
> stuff3 tag2s stuff4
> stuff5
>
>    txt1 getTagsContents 'tag1s';crlftb;'tag2s';crlftb
> |assertion failure: getTagsContents
> |   -:&/:&;/|:locs
>
> Why am I getting this error?
>
> Skip
>
>
>> On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
>>
>>> Ok... I hope I am not overlooking anything here:
>>>
>>> getTagsContents=: 4 :0
>>>   'n m'=. $tags=. > _2 <\ y
>>>   locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>>>   assert. -:&/:&;/ |:locs NB. tags must be balanced
>>>   data=: _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
>>>   expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>>>   }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>>> )
>>>
>>> If you can guarantee that tags are always balanced, you can get rid of
>>> the assert statement.
>>>
>>> How this works:
>>>
>>> 1. tagged data blocks are extracted from the text (in their original
>>>    order)
>>> 2. expand is defined to be the compression vector on the ravel of the
>>>    desired result, to get those blocks
>>> 3. expand #inv data gets the blocks we need
>>>
>>> Everything else is busywork to convert between data formats and
>>> representations.
>>>
>>> To make my work easier, I make sure that:
>>>
>>> a. There is always text to be discarded before the first tag
>>> b. the full set of tags appear at the beginning of the text I am
>>>    working with
>>> c. the full set of tags appear at the end of the text I am working with
>>>
>>> (these boundary guards are discarded from the final result).
>>>
>>> Note: I hope that this is readable -- for some reason gmail has recently
>>> taken to mutilating line-ends on plain text messages, so I do not know
>>> how I can send plain text code.
>>>
>>> FYI,
>>>
>>> -
>>>
>>
>
>
>
> --
> Skip Cave
> Cave Consulting LLC
> Phone: 214-460-4861
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
