Raul,
In my application, the tag pairs will never overlap, and the leading tag
of each pair will always be unique. However, it would be handy to be able
to use the same string as the trailing tag of several tag pairs. That won't
always be the case, since sometimes the closing tag is unique too, so both
cases should work. Here's an example of some real data from my text log
file. I just pulled a section out of the middle of the log:
ww2
PROMPT_DURATION = 1.968
STATUS = RECOGNITION
SERVER_HOSTNAME = Unknown <connected via resource mgr>
NUM_RESULTS = 1
RESULT[0] = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2
CONFIDENCE[0]
Let's look at the data:
$ ww2
260
q: 260
2 2 5 13
So 2*2*5 = 20 and ww2 will fit in a 13 x 20 array (13 rows of 20 codes):
13 20 $ a. i. ww2
10 9 80 82 79 77 80 84 95 68 85 82 65 84 73 79 78 32 32 32
32 32 32 32 32 32 32 32 61 32 49 46 57 54 56 13 10 9 83 84
65 84 85 83 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
32 32 32 32 61 32 82 69 67 79 71 78 73 84 73 79 78 13 10 9
83 69 82 86 69 82 95 72 79 83 84 78 65 77 69 32 32 32 32 32
32 32 32 32 32 32 61 32 85 110 107 110 111 119 110 32 60 99 111 110
110 101 99 116 101 100 32 118 105 97 32 114 101 115 111 117 114 99 101 32
109 103 114 62 13 10 9 78 85 77 95 82 69 83 85 76 84 83 32 32
32 32 32 32 32 32 32 32 32 32 32 32 32 61 32 49 13 10 9 82
69 83 85 76 84 91 48 93 32 32 32 32 32 32 32 32 32 32 32 32
32 32 32 32 32 61 32 100 116 109 102 45 57 32 100 116 109 102 45 48
32 100 116 109 102 45 52 32 100 116 109 102 45 55 32 100 116 109 102 45
50 13 10 9 67 79 78 70 73 68 69 78 67 69 91 48 93 32 32 32
You can see that each line is terminated by the three-character sequence
13 10 9 (CR, LF, TAB).
To check the crlftb noun:
a. i. crlftb
13 10 9
So the crlftb noun contains exactly the three characters that terminate each line.
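As a cross-check outside J, here is a minimal Python sketch of the same observation. It is only an illustration: LINES and SAMPLE are a hand-built stand-in for ww2 made from the log lines quoted above, with the padding before each "=" collapsed to a single space, and the names TERMINATOR, LINES, SAMPLE and rows are hypothetical ones chosen for this sketch.

```python
# Stand-in for ww2: the log lines quoted above, each ended by CR, LF, TAB.
# Assumption: the run of spaces before "=" is collapsed to one space.
TERMINATOR = "\r\n\t"  # the same three characters as the J noun crlftb

LINES = [
    "PROMPT_DURATION = 1.968",
    "STATUS = RECOGNITION",
    "SERVER_HOSTNAME = Unknown <connected via resource mgr>",
    "NUM_RESULTS = 1",
    "RESULT[0] = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2",
]
SAMPLE = "".join(line + TERMINATOR for line in LINES)

# Character codes of the terminator, comparable to  a. i. crlftb  in J.
terminator_codes = [ord(ch) for ch in TERMINATOR]
print(terminator_codes)

# Splitting on the full three-character sequence recovers the rows.
# The final piece is the empty string after the last terminator, so drop it.
rows = SAMPLE.split(TERMINATOR)[:-1]
print(rows == LINES)
```

Splitting on the whole three-character sequence, rather than on CR LF alone, keeps the TAB from ending up at the front of the following row.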
I want to capture the row starting with 'STATUS' and the row starting with
'RESULT[0]'.
Both rows terminate with the carriage return, line feed, tab sequence.
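To make the behavior I am after concrete, here is a minimal sketch in Python. It is not J and it is not Raul's getTagsContents; it only illustrates the rule "each unique start tag is closed by the nearest following end tag, and several start tags may share one end tag". The function name get_tag_contents, the noun sample, and the single-space padding in the sample data are assumptions made for this sketch.

```python
def get_tag_contents(text, pairs):
    """For each (open_tag, close_tag) pair, return the text between the first
    occurrence of open_tag and the nearest close_tag that follows it.
    A pair whose open or close tag cannot be found yields None."""
    results = []
    for open_tag, close_tag in pairs:
        start = text.find(open_tag)
        if start < 0:
            results.append(None)
            continue
        body_start = start + len(open_tag)
        # Nearest close tag after the open tag: the shortest possible match.
        end = text.find(close_tag, body_start)
        results.append(text[body_start:end] if end >= 0 else None)
    return results


CRLFTB = "\r\n\t"  # CR, LF, TAB, as in the J noun crlftb

# Hypothetical stand-in for ww2, built from the log lines quoted above
# (padding before "=" collapsed to one space).
sample = CRLFTB.join([
    "PROMPT_DURATION = 1.968",
    "STATUS = RECOGNITION",
    "SERVER_HOSTNAME = Unknown <connected via resource mgr>",
    "NUM_RESULTS = 1",
    "RESULT[0] = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2",
]) + CRLFTB

# Two unique start tags sharing the same end tag.
pairs = [("STATUS", CRLFTB), ("RESULT[0]", CRLFTB)]
captured = get_tag_contents(sample, pairs)
print(captured)
```

Because each start tag is searched for independently and closed by the first terminator after it, the shared end tag causes no ambiguity in this sketch. Note that 'RESULT[0]' does not match inside 'NUM_RESULTS', since the full tag including '[0]' has to be present.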
I have commented the assert. statement out of your getTagsContents, so I no
longer get the assertion error.
Now I run the function:
ww2 getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
┌┬──────────────────────────────────────────────────────────┐
││_HOSTNAME = Unknown <connected via resource mgr>│
└┴──────────────────────────────────────────────────────────┘
Weird. I get the line AFTER the one I want, and it completely misses the
second tag pair.
Any ideas what is going on?
Skip
On Thu, Nov 17, 2011 at 8:17 AM, Raul Miller wrote:
> P.S. Brian Schott has observed that if you take out the assertion that is
> checking for balanced tags that works on the crlftab example.
>
> This is because my code also has an assumption that tags cannot overlap.
>
> I have not thought through what this would mean for other cases that would
> currently be rejected.
>
> --
> Raul
>
> On Thu, Nov 17, 2011 at 9:14 AM, Raul Miller wrote:
>
> > Looking at this closer...
> >
> > The problem, here, is that you are using the same close tag for different
> > start tags. And this introduces ambiguities, about what you really mean.
> >
> > I think, from looking at your test case, that in ambiguous cases you
> > intend for the shortest possible match to be used. Using this rule, it's
> > possible to determine which start tag is matched by which end tag. (And
> > this ties into how I built that code: it does not support overlapping
> > tags.)
> >
> > Also, I think I would assume that you are only planning on allowing for
> > end-tags to be re-used and not start tags, and that should also simplify
> > the task of matching the start tags with the end tags.
> >
> > (If the tags are all matched properly then the assertion will not fail.)
> >
> > FYI,
> >
> > --
> > Raul
> >
> > On Thu, Nov 17, 2011 at 8:04 AM, Raul Miller wrote:
> >
> >> Tags must be balanced because we did not define a rule for unbalanced
> >> tags.
> >>
> >> --
> >> Raul
> >>
> >>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm