If the program detects an assert failure, it should find the whole tag
sequence (tag1s, tag1e, tag2s, tag2e, etc.) and skip over that entire bad
tag sequence. It should then find the next appearance of the first start
tag (tag1s) and process it as usual.

Right now, when the assert fails, the whole program stops in the middle of
processing, with no clue where the failure was. In a perfect world, the
program would also note the position of the failed text in a global
variable, so I could inspect the failure later, as well as find out how
many bad tag sets there were in the run. Generally the problem is a mangled
log file. I probably won't be able to fix it anyway, so just skipping over
the bad tag set is the best option.
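
A minimal sketch of a log-level version of this, assuming the logs stay
boxed one per box (as in ww1) and that no tag sequence spans two logs. The
names extractSkippingBad and badLogs are hypothetical and not part of Raul's
code. Each call to getTagsContents (as defined later in this thread) is
wrapped in try./catch., so a log that fails the assert is skipped whole and
its index is noted in a global. It does not resume at the next tag1s inside
a bad log.

```j
NB. Hypothetical wrapper, not part of Raul's code.
NB. x: list of boxed logs, y: boxed tag list (start;end pairs).
extractSkippingBad=: 4 : 0
 badLogs=: i.0                  NB. global: indices of logs that failed
 res=. (0 , <. -: # y) $ a:     NB. empty table, one column per tag pair
 for_i. i. # x do.
  try.
   res=. res , (> i { x) getTagsContents y
  catch.
   badLogs=: badLogs , i        NB. note which log was skipped
  end.
 end.
 res
)
NB. usage: ww1x=: ww1 extractSkippingBad tags4
```

After a run, badLogs would hold the indices of the skipped logs and #badLogs
the number of failures, so the garbled logs could be inspected later.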

Skip

On Mon, Nov 21, 2011 at 2:47 PM, Raul Miller <[email protected]> wrote:

> That assert is checking for unbalanced tags.  You probably have two start
> tags followed by one end tag.
>
> What do you want the program to do for this kind of thing?
>
> --
> Raul
>
> On Mon, Nov 21, 2011 at 3:43 PM, Skip Cave <[email protected]>
> wrote:
>
> > Raul's getTagsContents function works great on my data. Here's the
> > function:
> >
> > getTagsContents=: 4 :0
> >  'n m'=. $tags=. > _2 <\ y
> >  locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
> >  assert. -:&/:&;/ |:locs  NB. tags must be balanced
> >  data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
> >  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> >  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > )
> >
> >
> >
> > However, I have a few logs that got garbled, and they fail Raul's assert
> > test:
> >
> > ww1 is a boxed array with 1000 text log files in it, one log file per box
> >
> >   $ww1
> > 1000
> >   $ ; ww1
> > 32842565
> >
> > tags4 is a noun containing the four tag pairs that bracket the text that I
> > need to extract using Raul's getTagsContents function
> >
> >   tags4
> >
> > ┌──────┬───┬─────────┬───┬───────────────────────────┬───┬──────────────────┬───┐
> > │STATUS│   │RESULT[0]│   │CONFIDENCE[0]             =│   │UTTERANCE_FILENAME│   │
> > └──────┴───┴─────────┴───┴───────────────────────────┴───┴──────────────────┴───┘
> >
> > now we test:
> >
> >   ww1x =: (;ww1) getTagsContents  tags4
> > |assertion failure: getTagsContents
> > |   -:&/:&;/|:locs
> >
> >    ww1x =: (;  365 {. ww1) getTagsContents  tags4  NB. This works
> >
> >   ww1x =: (; 366 { ww1) getTagsContents   tags4
> > |assertion failure: getTagsContents
> > |   -:&/:&;/|:locs
> >
> > There's the culprit - box no 366 in ww1
> >
> > There are also a couple of other garbled logs in ww1 that fail the
> > assertion test.
> >
> > Is there any way to build getTagsContents so that if a specific boxed log
> > fails the assertion, the function will skip that boxed log and go on to
> > the next one?
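
One possible shape for this, sketched with J's adverse conjunction (::)
rather than a change inside getTagsContents. The names emptyTable and
safeTags are hypothetical. The idea is that when the left verb signals an
error, including the assertion failure, the right verb runs on the same
arguments instead; here it returns an empty table with one column per tag
pair. This assumes each log is processed on its own and that tag sequences
never span two logs.

```j
NB. Hypothetical helpers, not part of Raul's code.
NB. Fallback: an empty boxed table with one column per tag pair in y.
emptyTable=: (0 , <.@-:@#@]) $ a:"_
NB. Same as getTagsContents, except that an error yields the empty table.
safeTags=: getTagsContents :: emptyTable
NB. usage: run each boxed log on its own, then raze the per-log tables:
NB.   ww1x=: ; safeTags&tags4&.> ww1
```

Unlike an explicit try./catch. loop, this form silently drops the failing
logs and does not record which ones they were.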
> >
> > Skip
> > .
> > On Sat, Nov 19, 2011 at 1:01 PM, Skip Cave <[email protected]>
> > wrote:
> >
> > > Raul
> > >
> > > That works like a charm! It gets all the parameters, and puts them in the
> > > right columns. Now I'll try it on a larger data file with real data in it:
> > >
> > >   $ww
> > > 10         NB. ww has ten log files in it, one box per log file.
> > >    $;ww
> > > 969059  NB. ww unboxed and raveled is a long text string of catenated log
> > > files. Each log file has lots of events in it, and each event has lots of
> > > parameters.
> > >
> > >    a. i. crlftb
> > > 13 10 9      NB. The noun crlftb has CR, LF, Tab in it.
> > >
> > > NB. This is the terminator string for all the lines in the log file.
> > >
> > > NB. I want the parameters on every line starting with STATUS, RESULT[0],
> > > and CONFIDENCE[0]
> > >
> > > tags2 =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb
> > >    tags2
> > > ┌──────┬───┬─────────┬───┬─────────────┬───┐
> > > │STATUS│   │RESULT[0]│   │CONFIDENCE[0]│   │
> > > └──────┴───┴─────────┴───┴─────────────┴───┘
> > >
> > > NB. Now the acid test:
> > >
> > >  txt9 =:  (; ww) getTagsContents tags2
> > >    $txt9
> > > 120 3
> > >
> > > So there were 120 events in all the log files that had at least one of the
> > > three parameter values we wanted in them.
> > >
> > > Let's take a look:
> > >
> > >   cleanString1 10 {. 100 }. txt9
> > > ┌───────────┬───────────┬─────────────────┐
> > > │           │           │[0][__MRCP_GID] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_STR] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │RECOGNITION│main menu  │75               │
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_GID] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_STR] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │RECOGNITION│ninety five│64               │
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_GID] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_STR] 0│
> > > ├───────────┼───────────┼─────────────────┤
> > > │RECOGNITION│yes        │86               │
> > > ├───────────┼───────────┼─────────────────┤
> > > │           │           │[0][__MRCP_GID] 0│
> > > └───────────┴───────────┴─────────────────┘
> > >
> > > Yes! that's it!
> > >
> > > Raul, Frasier, Björn, Linda, *thanks to all of you* for helping me on
> > > this problem.
> > >
> > > Now I have to do this same thing to a few thousand log files instead of
> > > just 10. Then I need to do all kinds of analysis on the resulting data. I
> > > think I know enough J to do the analysis part, but I still may have to ask
> > > a question or two if I get stuck.
> > >
> > > I'll let you all know how it goes....
> > >
> > > Skip
> > >
> > >
> > > On Sat, Nov 19, 2011 at 11:24 AM, Raul Miller <[email protected]> wrote:
> > >
> > >> Note that this isn't really a new function -- it's the same one that you
> > >> posted (or would have posted, I think, if you had posted the last line of
> > >> it).  Except, mine was from a version that had =: instead of =. for its
> > >> intermediate results.  That's bad for production code, but it does let us
> > >> see what the bug is:
> > >>
> > >>   _4 ]\ expand #inv (+/expand){.data
> > >>
> > >> ┌────────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> > >> │param1              │param2             │param3                   │param5              │
> > >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1    =  12345  │param2    =   NONE │param3   =   hello world │                    │
> > >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │                    │                   │param1  = 34567          │param3  = hello bob │
> > >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param5   - zero one │                   │                         │param5 = two three  │
> > >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1 = 6789       │param2 = SOME      │                         │                    │
> > >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1              │param2             │param3                   │param5              │
> > >> └────────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
> > >>
> > >> I am not defining "expand" properly.  Thus, parameters are being
> > >> misplaced.
> > >>
> > >> If I use an alternate definition for expand, it seems to get the
> > >> parameters
> > >> into the right places:
> > >>
> > >>   expand=: ;0 1 2 3 e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> > >>   _4 ]\ expand #inv (+/expand){.data
> > >>
> > >> ┌───────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> > >> │param1             │param2             │param3                   │param5              │
> > >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1    =  12345 │param2    =   NONE │param3   =   hello world │                    │
> > >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1  = 34567    │                   │param3  = hello bob      │param5   - zero one │
> > >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │                   │                   │                         │param5 = two three  │
> > >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1 = 6789      │param2 = SOME      │                         │                    │
> > >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> > >> │param1             │param2             │param3                   │param5              │
> > >> └───────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
> > >>
> > >> ...and this also lets me clean up some unneeded stuff (I no longer need to
> > >> add the blank tags to the text I am working with, and so I no longer need
> > >> to drop those rows from the result... except it blows up if no tags are
> > >> present, so I can't get rid of that entirely).
> > >>
> > >> Anyways, here's how it looks with this definition for expand:
> > >>
> > >> getTagsContents=: 4 :0
> > >>  'n m'=. $tags=. > _2 <\ y
> > >>   locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
> > >>   assert. -:&/:&;/ |:locs  NB. tags must be balanced
> > >>   data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
> > >>  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> > >>  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > >> )
> > >>
> > >> --
> > >> Raul
> > >>
> > >>
> >
> >
> > --
> > Skip Cave
> > Cave Consulting LLC
> > Phone: 214-460-4861
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
>



-- 
Skip Cave
Cave Consulting LLC
Phone: 214-460-4861
