I think this will fix up locs, discarding the irrelevant extra closing-tag positions:
locs=: (-@#@[ {. I. {./. ])&.>/\"1 locs
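Here is a small illustration of the train in that line, as I read it, run on made-up positions. `starts`, `ends` and the name `firstEnds` are hypothetical. Both lists are assumed to be sorted ascending, which is what `I.` on a boolean gives you:

```j
NB. Hypothetical positions: two opening tags, five closing strings.
starts=: 10 40
ends=: 5 20 25 50 60

NB. starts I. ends  interval of each end:          0 1 1 2 2
NB. {./.            first end in each interval:    5 20 50
NB. (-#starts) {.   last #starts of those groups:  20 50
firstEnds=: -@#@[ {. I. {./. ]

starts firstEnds ends        NB. 20 50

NB. A start with no later end (70 here) shifts the pairing:
10 40 70 firstEnds ends      NB. 5 20 50, so start 10 gets end 5
```

In the `locs=:` line, `&.>/\"1` applies this to each row of locs. Each row keeps its box of opening positions and gets back only the first closing position after each one.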
Note, however, that this will give wrong results if you have start "tags"
without matching end "tags".
And note also that the final line in ww2 does not end with CR,LF,TAB.
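One possible workaround for the missing terminator is to append CR,LF,TAB only when the text does not already end with it. This assumes it is acceptable to treat the end of the text as one more line terminator. `endWith`, `sample` and `fixed` are hypothetical names, and `sample` is a made-up excerpt rather than the real log:

```j
crlftb=: 13 10 9 { a.          NB. CR, LF, TAB

NB. Hypothetical helper: y with x appended, unless y already ends with x.
endWith=: 4 : 'y , x #~ -. x -: (-#x) {. y'

sample=: 'NUM_RESULTS = 1',crlftb,'CONFIDENCE[0]'
fixed=: crlftb endWith sample
crlftb -: _3 {. fixed          NB. 1: the last line is now terminated
fixed -: crlftb endWith fixed  NB. 1: applying it again changes nothing
```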
--
Raul
On Thu, Nov 17, 2011 at 6:54 PM, Skip Cave <[email protected]> wrote:
> I think the problem we are having is that the closing crlftb tag string
> appears in many other places in the file, besides as a closing tag for the
> opening tags. There are many more crlftb strings in the text than there are
> opening tag strings.
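A quick way to see this imbalance is to count matches with `E.`. The `sample` below is a hypothetical excerpt shaped like ww2, not the real log:

```j
crlftb=: 13 10 9 { a.          NB. CR, LF, TAB
sample=: 'STATUS = RECOGNITION',crlftb,'NUM_RESULTS = 1',crlftb,'RESULT[0] = dtmf-9',crlftb
+/ 'STATUS' E. sample          NB. 1 opening tag
+/ 'RESULT[0]' E. sample       NB. 1 opening tag
+/ crlftb E. sample            NB. 3 closing strings
```

A check that expects equal numbers of opening and closing matches will fail on text like this, even though every opening tag does have a closing string after it.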
>
> So the correct statement is that the opening tags are unique, and will
> always start a required text string. Closing tags are not necessarily
> unique, and will close the required strings, as well as appear in other
> places in the file, which can be ignored. Only the first closing tag string
> that appears in the text following a unique opening tag is valid as the
> terminating tag for the text to be extracted.
>
> The function should find the opening tag, and then capture all of the text
> up to the first occurrence of the crlftb closing tag. It should ignore all
> subsequent crlftb tags until it finds the next opening tag, then again
> capture all of the text up to the first closing tag after it, which in this
> case will again be the crlftb string.
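The rule described above can be sketched in J: each opening tag is paired with the first closing string after it, and every other closing string is ignored. This is a hypothetical helper named `between`, not a fix to getTagsContents. It handles one tag pair per call and assumes, as stated in this thread, that pairs never overlap. `log` is a made-up excerpt:

```j
NB. x is the boxed pair  open;close   y is the text.
NB. Result: one box per opening tag that has a later closing string,
NB. holding the text between the two tags.
between=: 4 : 0
  'otag ctag'=. x
  s=. (#otag) + I. otag E. y     NB. index just past each opening tag
  e=. I. ctag E. y               NB. index of every closing string
  k=. e I. s                     NB. first closing string at or after each s
  keep=. k < #e                  NB. drop openings with no closing string after them
  s=. keep # s
  f=. (keep # k) { e
  (s <@([ + i.@-~)"0 f) {&.> <y  NB. cut y from s up to (not including) f
)

crlftb=: 13 10 9 { a.
log=: 'STATUS = RECOGNITION',crlftb,'NUM_RESULTS = 1',crlftb,'RESULT[0] = dtmf-9',crlftb
('STATUS';crlftb) between log       NB. one box: ' = RECOGNITION'
('RESULT[0]';crlftb) between log    NB. one box: ' = dtmf-9'
```

Closing strings that come before an opening tag, or after its first match, are never used. An opening tag with no closing string after it is dropped rather than paired with a padded position.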
>
> It looks like the problem is in this line of the function:
> locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
>
> But I haven't gotten my head around all that it is doing as yet.
>
> Here's the whole function, with the assert line commented out.
>
> getTagsContents=: 4 :0
> 'n m'=. $tags=. > _2 <\ y
> locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
> NB. assert. -:&/:&;/ |:locs NB. tags must be balanced
> data=: _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
> expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
> }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> )
>
> Skip
>
> On Thu, Nov 17, 2011 at 1:33 PM, Skip Cave <[email protected]> wrote:
>
> > Yes. As I said in my previous post, the assert statement has been
> > commented out. It would throw an error if the assert were not
> > commented out.
> > On Nov 17, 2011 12:00 PM, "Raul Miller" <[email protected]> wrote:
> >
> >> Did you try just removing the assert?
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >>
> >> On Thu, Nov 17, 2011 at 11:24 AM, Skip Cave <[email protected]> wrote:
> >>
> >>> I stated that wrong.
> >>>
> >>> ( ww2) getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
> >>> ┌┬──────────────────────────────────────────────────────────┐
> >>> ││_HOSTNAME = Unknown <connected via resource mgr>│
> >>> └┴──────────────────────────────────────────────────────────┘
> >>>
> >>> It doesn't find the first tag pair, and for some reason it captures
> >>> *part of* the string *following* the first tag pair.
> >>>
> >>> Skip
> >>>
> >>> On Thu, Nov 17, 2011 at 10:08 AM, Skip Cave <[email protected]> wrote:
> >>>
> >>> > Raul,
> >>> >
> >>> > In my application, the tag pairs will never overlap. Also, the leading
> >>> > tag of a particular string will always be unique. However, it is handy
> >>> > if I can define just the trailing tags of any tag pair to be all the
> >>> > same string. This won't always be the case, as sometimes the closing
> >>> > tag may be unique, so either case should work. Here's an example of
> >>> > some real data in my text log file. I just pulled a section of the text
> >>> > out of the middle of the log:
> >>> >
> >>> > ww2
> >>> >
> >>> > PROMPT_DURATION = 1.968
> >>> > STATUS = RECOGNITION
> >>> > SERVER_HOSTNAME = Unknown <connected via resource mgr>
> >>> > NUM_RESULTS = 1
> >>> > RESULT[0] = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2
> >>> > CONFIDENCE[0]
> >>> >
> >>> > Let's look at the data:
> >>> > $ ww2
> >>> > 260
> >>> > q: 260
> >>> > 2 2 5 13
> >>> >
> >>> > So 2*2*5 = 20, and ww2 will fit in a 13 x 20 array (13 rows of 20):
> >>> >
> >>> > 13 20 $ a. i. ww2
> >>> >  10   9  80  82  79  77  80  84  95  68  85  82  65  84  73  79  78  32  32  32
> >>> >  32  32  32  32  32  32  32  32  61  32  49  46  57  54  56  13  10   9  83  84
> >>> >  65  84  85  83  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32
> >>> >  32  32  32  32  61  32  82  69  67  79  71  78  73  84  73  79  78  13  10   9
> >>> >  83  69  82  86  69  82  95  72  79  83  84  78  65  77  69  32  32  32  32  32
> >>> >  32  32  32  32  32  32  61  32  85 110 107 110 111 119 110  32  60  99 111 110
> >>> > 110 101  99 116 101 100  32 118 105  97  32 114 101 115 111 117 114  99 101  32
> >>> > 109 103 114  62  13  10   9  78  85  77  95  82  69  83  85  76  84  83  32  32
> >>> >  32  32  32  32  32  32  32  32  32  32  32  32  32  61  32  49  13  10   9  82
> >>> >  69  83  85  76  84  91  48  93  32  32  32  32  32  32  32  32  32  32  32  32
> >>> >  32  32  32  32  32  61  32 100 116 109 102  45  57  32 100 116 109 102  45  48
> >>> >  32 100 116 109 102  45  52  32 100 116 109 102  45  55  32 100 116 109 102  45
> >>> >  50  13  10   9  67  79  78  70  73  68  69  78  67  69  91  48  93  32  32  32
> >>> >
> >>> > You can see that each line is terminated with the 13 10 9 character
> >>> > sequence (CR, LF, TAB).
> >>> >
> >>> > We check:
> >>> >
> >>> > a. i. crlftb
> >>> > 13 10 9
> >>> >
> >>> > So the crlftb noun contains exactly the three characters that
> >>> > terminate each line.
> >>> >
> >>> > I want to capture the row starting with 'STATUS' and the row starting
> >>> > with 'RESULT[0]'. Both rows terminate with the carriage return, line
> >>> > feed, tab sequence. I have commented the assert. statement out of your
> >>> > getTagsContents, so I no longer get the error.
> >>> >
> >>> > Now I run the function:
> >>> >
> >>> > ww2 getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
> >>> > ┌┬──────────────────────────────────────────────────────────┐
> >>> > ││_HOSTNAME = Unknown <connected via resource mgr>│
> >>> > └┴──────────────────────────────────────────────────────────┘
> >>> >
> >>> > Weird. I get the line AFTER the one I want, and it completely misses
> >>> > the second tag pair.
> >>> > Any ideas what is going on?
> >>> >
> >>> > Skip
> >>> >
> >>> >
> >>> >
> >>> > On Thu, Nov 17, 2011 at 8:17 AM, Raul Miller <[email protected]> wrote:
> >>> >
> >>> >> P.S. Brian Schott has observed that if you take out the assertion
> >>> >> that checks for balanced tags, the code works on the crlftab example.
> >>> >>
> >>> >> This is because my code also assumes that tags cannot overlap.
> >>> >>
> >>> >> I have not thought through what this would mean for other cases that
> >>> >> would currently be rejected.
> >>>
> >>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>