On second thought, this line is unnecessary:
    start=. end getFirst start

It's a holdover from an earlier version.  It's harmless, but useless for
this kind of data.
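
For anyone puzzling over getFirst, here is a small sketch of what it computes. The positions are made up for illustration and are not offsets from the real log. For each interval marked out by the left argument, it keeps the first right-argument value that lands in that interval, and it drops the group that precedes the first boundary:

```j
getFirst=: -@#@[ {. I. {./. ]

NB. hypothetical boundaries (left) and candidate positions (right)
bounds=: 3 10 20
cands=: 1 5 7 12 25 30

NB. bounds I. cands   assigns the candidates to intervals 0 1 1 2 3 3
NB. {./.              keeps the first candidate of each interval: 1 5 12 25
NB. -@#@[ {.          keeps the last three, dropping the pre-boundary 1
r=: bounds getFirst cands   NB. 5 12 25
```

In the extractor the boundaries are the event starts (or the tag ends) and the candidates are tag or line-end positions, so the same pattern picks one position per event.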

Aai's comments about speed got me thinking about this again....

-- 
Raul

On Sat, Nov 26, 2011 at 4:50 AM, Raul Miller <[email protected]> wrote:

> OK, well, since you have something working from Arie, I do not think I
> have much to add.
>
> That said, given my current understanding of your requirements, I think I
> would write the extractor something like this:
>
> advN=:2 :0  NB. adverb that takes N noun left arguments
>   if.L.n   do. d=. (<m),D [ 'N D M'=. n
>     if.N-1 do. advN ((N-1);d;M)
>     else.      d 1 :M end.
>   else. advN (n;m;0 :0) end.
> )
>
> getFirst=: -@#@[ {. I. {./. ]
>
> taggedEvents=: '' advN 3
>   'tags lineEnd eventStart'=. m
>   lines=. I. lineEnd E. y
>   events=. I. eventStart E. y
>   r=. i.(#events),0
>   for_TAG. tags do. tag=. >TAG
>     start=. events getFirst (#tag)+I. tag E. y
>     end=. start getFirst lines
>     start=. end getFirst start
>     data=. start <@:{&y@(+i.)"0 end-start
>     r=. r,. data (<: events I. start)} ($events) $ a:
>   end.
> )
>
> ww1t=: 1!:1 <'ww1t.txt'
>
> start=: 'start{'
> linend=: CR,LF,TAB
> tags=: 'STATUS';'RESULT[0]';'CONFIDENCE[0]';'UTTERANCE_FILENAME'
> V=: tags linend start taggedEvents
>
> With these definitions, the content can be extracted with either
>    V ww1t
> or
>    tags linend start taggedEvents ww1t
>
> Note that I have assumed that the first event will be preceded by some
> text which does not contain any tags.  You will get an error if this
> assumption is violated.  If you need to process files that do not contain a
> preamble, you should add one to the text (adding a space in front should
> work fine).
>
> --
> Raul
>
> On Fri, Nov 25, 2011 at 4:36 PM, Skip Cave <[email protected]> wrote:
>
>> Raul,
>>
>> I want every event in the text logs to generate a boxed row in the output,
>> even if none of the requested parameters are in the event. Every event will
>> start with the 'start{' text string and end with the '}end' text string.
>> I want to uniquely number each event (or boxed row) in the output. Since I
>> have several large log file sets to analyze, I will need to be able to
>> offset the event numbers in a specific output by a constant, so that every
>> event across all output sets has a unique event number.
>>
>> Skip
>>
>> On Thu, Nov 24, 2011 at 6:00 AM, Raul Miller <[email protected]> wrote:
>>
>> > Ok, this suggests a completely different design.
>> >
>> > That said, when you say "number each row in the output" do you mean event
>> > number or do you mean line number?  I agree that event number is implicit.
>> >
>> > --
>> > Raul
>> >
>> > On Wed, Nov 23, 2011 at 11:47 PM, Skip Cave <[email protected]> wrote:
>> >
>> > > Raul,
>> > >
>> > > I'm using tags3 for your function:
>> > >
>> > >   tags3
>> > > ┌──────┬───┬─────────┬───┬───────────────────────────┬───┐
>> > > │STATUS│   │RESULT[0]│   │CONFIDENCE[0]             =│   │
>> > > └──────┴───┴─────────┴───┴───────────────────────────┴───┘
>> > >
>> > > The empty boxes actually carry the CR, LF, TAB character string defining
>> > > the closing tag for each parameter.
>> > >
>> > > tags6 is the tag string for *Arie's* function:
>> > >  tags6
>> > > ┌──────┬──────┬─────────┬─────────────┬──────────────────┐
>> > > │start{│STATUS│RESULT[0]│CONFIDENCE[0]│UTTERANCE_FILENAME│
>> > > └──────┴──────┴─────────┴─────────────┴──────────────────┘
>> > >
>> > > Arie uses the first element of his tag string to define the string that
>> > > starts each event. In our case that is the 'start{' string which identifies
>> > > the start of each event. Arie assumes that every line ends in CR, LF, TAB.
>> > > So he doesn't need to have the closing tag specified for each parameter.
>> > >
>> > > The more I look over the data, the more I think that the function should
>> > > capture EVERY event in the log. Every event starts with 'start{' and ends
>> > > with '}end', so it is easy to spot all the events. If a specific event has
>> > > NO matching parameter tags in it, then the output will have a row of empty
>> > > boxes in it. The number of boxed columns in the output will be the number
>> > > of parameters asked for in the tag list. The number of boxed rows in the
>> > > output will be the total number of events in the whole text log file.
>> > >
>> > > I think I also need to number each row in the output. However, that is one
>> > > thing I CAN do myself.
>> > >
>> > > Skip
>> > >
>> > > On Wed, Nov 23, 2011 at 8:44 PM, Raul Miller <[email protected]> wrote:
>> > >
>> > > > There are other reasons why mine might stop (like missing end tags).
>> > > >
>> > > > What definition are you using for tags6?
>> > > >
>> > > > --
>> > > > Raul
>> > > >
>> > > >
>> > > ----------------------------------------------------------------------
>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> > >
>> >
>>
>>
>>
>> --
>> Skip Cave
>> Cave Consulting LLC
>> Phone: 214-460-4861
>>
>
>
