Sounds like an exercise in early XML.

2011/11/17 Skip Cave <[email protected]>
> I think the problem we are having is that the closing crlftb tag string
> appears in many other places in the file, besides as a closing tag for the
> opening tags. There are many more crlftb strings in the text than there are
> opening tag strings.
>
> So the correct statement is that the opening tags are unique, and will
> always start a required text string. Closing tags are not necessarily
> unique, and will close the required strings, as well as appear in other
> places in the file, which can be ignored. Only the first closing tag string
> that appears in the text following a unique opening tag is valid as the
> terminating tag for the text to be extracted.
>
> The function should find the opening tag, and then capture all of the text
> up to the first occurrence of the crlftb closing tag. It should ignore all
> subsequent crlftb tags until after it finds a unique opening tag, then
> again capture all of the text up to the first tag end string, which in this
> case will again be the crlftb string.
>
> It looks like the problem is in this line of the function:
>   locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>
> But I haven't gotten my head around all that it is doing as yet.
>
> Here's the whole function, with the assert line commented out.
>
> getTagsContents=: 4 :0
>   'n m'=. $tags=. > _2 <\ y
>   locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>   NB. assert. -:&/:&;/ |:locs NB. tags must be balanced
>   data=: _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
>   expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
>   }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> )
>
> Skip
>
> On Thu, Nov 17, 2011 at 1:33 PM, Skip Cave <[email protected]> wrote:
>
> > Yes. As I said in my previous post, the assert statement has been
> > commented out. It would throw an error, if the assert wasn't commented out.
> > On Nov 17, 2011 12:00 PM, "Raul Miller" <[email protected]> wrote:
> >
> >> Did you try just removing the assert?
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >> On Thu, Nov 17, 2011 at 11:24 AM, Skip Cave <[email protected]> wrote:
> >>
> >>> I stated that wrong.
> >>>
> >>>    ( ww2) getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
> >>> ┌┬──────────────────────────────────────────────────────────┐
> >>> ││_HOSTNAME           = Unknown <connected via resource mgr>│
> >>> └┴──────────────────────────────────────────────────────────┘
> >>>
> >>> It doesn't find the first tag pair, and for some reason, it captures
> >>> *part of* the string *following* the first tag pair.
> >>>
> >>> Skip
> >>>
> >>> On Thu, Nov 17, 2011 at 10:08 AM, Skip Cave <[email protected]> wrote:
> >>>
> >>> > Raul,
> >>> >
> >>> > In my application, the tag pairs will never overlap. Also, the leading
> >>> > tag of a particular string will always be unique. However, it is handy
> >>> > if I can define just the trailing tags of any tag pair to be all the
> >>> > same string. This won't always be the case, as sometimes the closing
> >>> > tag may be unique, so either case should work. Here's an example of
> >>> > some real data in my text log file. I just pulled a section of the
> >>> > text out of the middle of the log:
> >>> >
> >>> >    ww2
> >>> >
> >>> >         PROMPT_DURATION           = 1.968
> >>> >         STATUS                    = RECOGNITION
> >>> >         SERVER_HOSTNAME           = Unknown <connected via resource mgr>
> >>> >         NUM_RESULTS               = 1
> >>> >         RESULT[0]                 = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2
> >>> >         CONFIDENCE[0]
> >>> >
> >>> > Let's look at the data:
> >>> >    $ ww2
> >>> > 260
> >>> >    q: 260
> >>> > 2 2 5 13
> >>> >
> >>> > So 2*2*5 = 20 and ww2 will fit in a 13 x 20 array (13 rows of 20):
> >>> >
> >>> >    13 20 $ a. i. ww2
> >>> >  10   9  80  82  79  77  80  84  95  68  85  82  65  84  73  79  78  32  32  32
> >>> >  32  32  32  32  32  32  32  32  61  32  49  46  57  54  56  13  10   9  83  84
> >>> >  65  84  85  83  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32
> >>> >  32  32  32  32  61  32  82  69  67  79  71  78  73  84  73  79  78  13  10   9
> >>> >  83  69  82  86  69  82  95  72  79  83  84  78  65  77  69  32  32  32  32  32
> >>> >  32  32  32  32  32  32  61  32  85 110 107 110 111 119 110  32  60  99 111 110
> >>> > 110 101  99 116 101 100  32 118 105  97  32 114 101 115 111 117 114  99 101  32
> >>> > 109 103 114  62  13  10   9  78  85  77  95  82  69  83  85  76  84  83  32  32
> >>> >  32  32  32  32  32  32  32  32  32  32  32  32  32  61  32  49  13  10   9  82
> >>> >  69  83  85  76  84  91  48  93  32  32  32  32  32  32  32  32  32  32  32  32
> >>> >  32  32  32  32  32  61  32 100 116 109 102  45  57  32 100 116 109 102  45  48
> >>> >  32 100 116 109 102  45  52  32 100 116 109 102  45  55  32 100 116 109 102  45
> >>> >  50  13  10   9  67  79  78  70  73  68  69  78  67  69  91  48  93  32  32  32
> >>> >
> >>> > You can see that each line is terminated with a 13, 10, 9 character
> >>> > string (CR, LF, TAB).
> >>> >
> >>> > We check:
> >>> >
> >>> >    a. i. crlftb
> >>> > 13 10 9
> >>> >
> >>> > So the crlftb noun contains the three characters that terminate each
> >>> > line.
> >>> >
> >>> > I want to capture the row starting with 'STATUS' and the row starting
> >>> > with 'RESULT[0]'.
> >>> > Both rows terminate with the carriage return, line feed, tab sequence.
> >>> > I have commented the assert.
> >>> > statement out of your getTagsContents, so I no longer get the error.
> >>> >
> >>> > Now I run the function:
> >>> >
> >>> >    ww2 getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
> >>> > ┌┬──────────────────────────────────────────────────────────┐
> >>> > ││_HOSTNAME           = Unknown <connected via resource mgr>│
> >>> > └┴──────────────────────────────────────────────────────────┘
> >>> >
> >>> > Weird. I get the line AFTER the one I want, and it completely misses
> >>> > the second tag pair.
> >>> > Any ideas what is going on?
> >>> >
> >>> > Skip
> >>> >
> >>> > On Thu, Nov 17, 2011 at 8:17 AM, Raul Miller <[email protected]> wrote:
> >>> >
> >>> >> P.S. Brian Schott has observed that if you take out the assertion
> >>> >> that is checking for balanced tags that works on the crlftab example.
> >>> >>
> >>> >> This is because my code also has an assumption that tags cannot
> >>> >> overlap.
> >>> >>
> >>> >> I have not thought through what this would mean for other cases that
> >>> >> would currently be rejected.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm

--
Björn Helgason, Engineer
Fornustekkum II
781 Hornafirði
e-mail: [email protected]
gsm: +3546985532
twitter: @flugfiskur
http://groups.google.com/group/J-Programming
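
For reference, the rule Skip describes in the quoted message (find each unique opening tag, then take the text up to the first closing tag that follows it, ignoring every other occurrence of the closing string) can be sketched in a few lines of J. This is only a minimal sketch under stated assumptions, not Raul's getTagsContents: the names between, otag and ctag are made up for illustration, and it assumes the opening tag itself should be dropped from the result.

```j
NB. between: text of x after the first occurrence of the opening tag,
NB. up to (not including) the first closing tag that follows it.
NB. x is the text; y is a boxed pair  otag;ctag
NB. (hypothetical helper, not part of the original function)
between=: 4 : 0
  'otag ctag'=. y
  i=. (otag E. x) i. 1          NB. index of first opening tag (#x if absent)
  rest=. (i + #otag) }. x       NB. text following the opening tag
  j=. (ctag E. rest) i. 1       NB. index of first closing tag in the rest
  j {. rest                     NB. everything before that closing tag
)
```

With the nouns from the thread, a single pair would be ww2 between 'STATUS';crlftb, and several pairs could be handled one row at a time with (<@(ww2&between)"1) _2 ]\ 'STATUS';crlftb;'RESULT[0]';crlftb, which should give one box per tag pair. Because each pair is searched independently from its own opening tag, extra closing strings elsewhere in the text do not matter, and no balance check is needed.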
