Re: [Jprogramming] Finding multiple sequential strings

Skip Cave Thu, 17 Nov 2011 08:09:39 -0800

Raul,

In my application, the tag pairs will never overlap. Also, the leading tag
of each pair will always be unique. However, it is handy if I can define the
trailing tags of all the tag pairs to be the same string. That won't always
be the case, since a closing tag is sometimes unique, so both cases should
work. Here's an example of some real data from my text log file. I just
pulled a section out of the middle of the log:


   ww2

    PROMPT_DURATION           = 1.968
    STATUS                    = RECOGNITION
    SERVER_HOSTNAME           = Unknown <connected via resource mgr>
    NUM_RESULTS               = 1
    RESULT[0]                 = dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2
    CONFIDENCE[0]

Let's look at the data:
   $ ww2
260
   q: 260
2 2 5 13

So 2*2*5 = 20, and ww2 will fit in a 13 x 20 array (13 rows of 20 columns):

    13 20  $ a. i. ww2
 10   9  80  82  79  77 80  84  95  68  85  82  65  84  73  79  78  32  32  32
 32  32  32  32  32  32 32  32  61  32  49  46  57  54  56  13  10   9  83  84
 65  84  85  83  32  32 32  32  32  32  32  32  32  32  32  32  32  32  32  32
 32  32  32  32  61  32 82  69  67  79  71  78  73  84  73  79  78  13  10   9
 83  69  82  86  69  82 95  72  79  83  84  78  65  77  69  32  32  32  32  32
 32  32  32  32  32  32 61  32  85 110 107 110 111 119 110  32  60  99 111 110
110 101  99 116 101 100 32 118 105  97  32 114 101 115 111 117 114  99 101  32
109 103 114  62  13  10  9  78  85  77  95  82  69  83  85  76  84  83  32  32
 32  32  32  32  32  32 32  32  32  32  32  32  32  61  32  49  13  10   9  82
 69  83  85  76  84  91 48  93  32  32  32  32  32  32  32  32  32  32  32  32
 32  32  32  32  32  61 32 100 116 109 102  45  57  32 100 116 109 102  45  48
 32 100 116 109 102  45 52  32 100 116 109 102  45  55  32 100 116 109 102  45
 50  13  10   9  67  79 78  70  73  68  69  78  67  69  91  48  93  32  32  32
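
The same way of choosing a display shape can be tried on a smaller string.
This is only a sketch: the noun s and the name shape are made up for the
illustration and stand in for ww2 and its 13 20 shape.

```j
NB. Hypothetical 12-character string standing in for ww2.
s =: 'abcdefghijkl'
# s                  NB. 12
q: # s               NB. 2 2 3

NB. Group the prime factors into two dimensions: 3 rows of 2*2 columns.
shape =: 3 , */ 2 2
shape $ a. i. s      NB. 3 by 4 table of character codes
```

Any grouping of the factors whose product equals the length works; the
choice only changes how wide the table is.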

You can see that each line is terminated with the character sequence
13, 10, 9 (CR, LF, TAB).

My crlftb noun contains those same three characters. To check:
   a. i. crlftb
13 10 9
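
For anyone following along without my data, here is a minimal sketch of how
a noun like crlftb could be built and used to locate the line terminators.
The definition of crlftb from character codes and the noun sample are
assumptions for the illustration, not taken from my actual script.

```j
NB. Assumption: crlftb holds the characters with codes 13 10 9 (CR, LF, TAB).
crlftb =: 13 10 9 { a.
a. i. crlftb                 NB. 13 10 9

NB. Hypothetical three-line sample laid out like the log excerpt.
sample =: 'A = 1', crlftb, 'B = 2', crlftb, 'C'

NB. E. marks where crlftb begins in sample; I. turns the marks into indices.
I. crlftb E. sample          NB. 5 13
```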

I want to capture the row starting with 'STATUS' and the row starting with
'RESULT[0]'. Both rows terminate with the carriage return, line feed, tab
sequence. I have commented the assert. statement out of your
getTagsContents, so I no longer get the error.

Now I run the function:

    ww2 getTagsContents 'STATUS';crlf;'RESULT[0]';crlf
┌┬──────────────────────────────────────────────────────────┐
││_HOSTNAME           = Unknown <connected via resource mgr>│
└┴──────────────────────────────────────────────────────────┘

Weird. I get the line AFTER the one I want, and it completely misses the
second tag pair.
Any ideas what is going on?
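
To make the expected result concrete, here is a rough sketch of the
extraction I am after, written independently of getTagsContents. It does not
diagnose the behavior above. The verb name lineFrom and the noun sample are
hypothetical, and crlftb is assumed to be the CR, LF, TAB string.

```j
NB. Assumption: crlftb holds the characters with codes 13 10 9 (CR, LF, TAB).
crlftb =: 13 10 9 { a.

NB. lineFrom (hypothetical name): x is the text, y is a start tag.
NB. Returns the text from the first occurrence of the tag up to, but not
NB. including, the next crlftb. A missing tag gives an empty result.
lineFrom =: 4 : 0
  start =. {. (I. y E. x), # x             NB. first tag position, else end of text
  rest  =. start }. x                      NB. text from the tag onward
  stop  =. {. (I. crlftb E. rest), # rest  NB. next terminator, else end of text
  stop {. rest
)

NB. Hypothetical sample laid out like the log excerpt.
sample =: (9 { a.), 'STATUS = RECOGNITION', crlftb, 'NUM_RESULTS = 1', crlftb
sample =: sample, 'RESULT[0] = dtmf-9 dtmf-0', crlftb, 'CONFIDENCE[0]'

sample lineFrom 'STATUS'        NB. STATUS = RECOGNITION
sample lineFrom 'RESULT[0]'     NB. RESULT[0] = dtmf-9 dtmf-0

NB. One box per tag, similar in spirit to the boxed result shown above.
(<sample) lineFrom&.> 'STATUS';'RESULT[0]'
```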

Skip


On Thu, Nov 17, 2011 at 8:17 AM, Raul Miller wrote:

> P.S. Brian Schott has observed that if you take out the assertion that
> checks for balanced tags, the code works on the crlftab example.
>
> This is because my code also has an assumption that tags cannot overlap.
>
> I have not thought through what this would mean for other cases that would
> currently be rejected.
>
> --
> Raul
>
> On Thu, Nov 17, 2011 at 9:14 AM, Raul Miller wrote:
>
> > Looking at this closer...
> >
> > The problem here is that you are using the same close tag for different
> > start tags.  This introduces ambiguities about what you really mean.
> >
> > I think, from looking at your test case, that in ambiguous cases you
> > intend for the shortest possible match to be used.  Using this rule, it's
> > possible to determine which start tag is matched by which end tag.  (And
> > this ties into how I built that code: it does not support overlapping
> > tags.)
> >
> > Also, I think I would assume that you are only planning on allowing for
> > end-tags to be re-used and not start tags, and that should also simplify
> > the task of matching the start tags with the end tags.
> >
> > (If the tags are all matched properly then the assertion will not fail.)
> >
> > FYI,
> >
> > --
> > Raul
> >
> > On Thu, Nov 17, 2011 at 8:04 AM, Raul Miller wrote:
> >
> >> Tags must be balanced because we did not define a rule for unbalanced
> >> tags.
> >>
> >> --
> >> Raul
> >>
> >>
>
>
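
The shortest-match rule that Raul describes in the quoted text above, where
several unique start tags share one end tag, might be sketched as follows.
This is not Raul's code. The nouns text, starts, ends and stops and the tags
'AA' and 'CC' are invented for the illustration, and the sketch assumes
every start tag has an end tag somewhere after it.

```j
NB. Assumption: crlftb holds the characters with codes 13 10 9 (CR, LF, TAB).
crlftb =: 13 10 9 { a.
text   =: 'AA = 1', crlftb, 'BB = 2', crlftb, 'CC = 3', crlftb

NB. Positions where each hypothetical start tag begins, merged and sorted.
starts =: /:~ (I. 'AA' E. text), I. 'CC' E. text     NB. 0 18
NB. Positions of the shared end tag.
ends   =: I. crlftb E. text                          NB. 6 15 24

NB. For each start, dyadic I. finds the first end at or after it.
stops  =: ends {~ ends I. starts                     NB. 6 24

NB. Box each span: drop up to the start, then take (stop - start) characters.
starts (4 : '<(y - x) {. x }. text')"0 stops
```

The last sentence yields two boxes, 'AA = 1' and 'CC = 3'. The 'BB' line is
skipped because no start tag was given for it.
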
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
