That assert is checking for unbalanced tags.  You probably have two start
tags followed by one end tag.

What do you want the program to do for this kind of thing?
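
If skipping the garbled logs is acceptable, one possible approach is to run the function on each boxed log separately and trap the failure, rather than running it on the whole razed text. A minimal sketch, assuming getTagsContents is defined as in the quoted message below; safeGet is a hypothetical name, and returning an empty table for a failed log is an assumption about what "skip" should mean here:

```j
NB. safeGet (hypothetical name): x is the text of one log, y is the tag list.
NB. Assumes getTagsContents is defined as in the quoted message.
safeGet=: 4 : 0
  try. x getTagsContents y
  catch. (0,-:#y) $ a:   NB. any error, including the assert: empty table, one column per tag pair
  end.
)
```

Something like  ww1x=: ; ww1 safeGet each < tags4  would then apply it to every box and raze the per-log tables into one. Note that catch. traps every error, not only the assertion failure, and that a tag pair can no longer match across a log boundary when the logs are processed one at a time.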

-- 
Raul

On Mon, Nov 21, 2011 at 3:43 PM, Skip Cave <[email protected]> wrote:

> Raul's getTagsContents function works great on my data. Here's the
> function:
>
> getTagsContents=: 4 :0
>  'n m'=. $tags=. > _2 <\ y
>  locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
>  assert. -:&/:&;/ |:locs  NB. tags must be balanced
>  data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
>  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
>  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> )
>
>
>
> However, I have a few logs that got garbled, and they fail Raul's assert
> test:
>
> ww1 is a boxed array with 1000 text log files in it, one log file per box
>
>   $ww1
> 1000
>   $ ; ww1
> 32842565
>
> tags4 is a noun containing the four tag pairs that bracket the text that I
> need to extract using Raul's getTagsContents function
>
>   tags4
>
> ┌──────┬───┬─────────┬───┬───────────────────────────┬───┬──────────────────┬───┐
> │STATUS│   │RESULT[0]│   │CONFIDENCE[0]             =│   │UTTERANCE_FILENAME│   │
> └──────┴───┴─────────┴───┴───────────────────────────┴───┴──────────────────┴───┘
>
> now we test:
>
>   ww1x =: (;ww1) getTagsContents  tags4
> |assertion failure: getTagsContents
> |   -:&/:&;/|:locs
>
>    ww1x =: (;  365 {. ww1) getTagsContents  tags4  NB. This works
>
>   ww1x =: (; 366 { ww1) getTagsContents   tags4
> |assertion failure: getTagsContents
> |   -:&/:&;/|:locs
>
> There's the culprit - box no 366 in ww1
>
> There are also a couple of other garbled logs in ww1 that fail the
> assertion test.
>
> Is there any way to build getTagsContents so that, if a specific boxed log
> fails the assertion, the function will skip that boxed log and go on to
> the next one?
>
> Skip
> .
> On Sat, Nov 19, 2011 at 1:01 PM, Skip Cave <[email protected]>
> wrote:
>
> > Raul
> >
> > That works like a charm! It gets all the parameters, and puts them in the
> > right columns. Now I'll try it on a larger data file with real data in it:
> >
> >   $ww
> > 10         NB. ww has ten log files in it, one box per log file.
> >    $;ww
> > 969059  NB. ww unboxed and raveled is a long text string of catenated log
> > files. Each log file has lots of events in it, and each event has lots of
> > parameters.
> >
> >    a. i. crlftb
> > 13 10 9      NB. The noun crlftb has CR, LF, Tab in it.
> >
> > NB. This is the terminator string for all the lines in the log file.
> >
> > NB. I want the parameters on every line starting with STATUS, RESULT[0],
> > and CONFIDENCE[0]
> >
> > tags2 =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb
> >    tags2
> > ┌──────┬───┬─────────┬───┬─────────────┬───┐
> > │STATUS│   │RESULT[0]│   │CONFIDENCE[0]│   │
> > └──────┴───┴─────────┴───┴─────────────┴───┘
> >
> > NB. Now the acid test:
> >
> >  txt9 =:  (; ww) getTagsContents tags2
> >    $txt9
> > 120 3
> >
> > So across all the log files there were 120 events that had at least one
> > of the three parameter values we wanted.
> >
> > Let's take a look:
> >
> >   cleanString1 10 {. 100 }. txt9
> > ┌───────────┬───────────┬─────────────────┐
> > │           │           │[0][__MRCP_GID] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_STR] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │RECOGNITION│main menu  │75               │
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_GID] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_STR] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │RECOGNITION│ninety five│64               │
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_GID] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_STR] 0│
> > ├───────────┼───────────┼─────────────────┤
> > │RECOGNITION│yes        │86               │
> > ├───────────┼───────────┼─────────────────┤
> > │           │           │[0][__MRCP_GID] 0│
> > └───────────┴───────────┴─────────────────┘
> >
> > Yes! that's it!
> >
> > Raul, Frasier, Björn, Linda, *thanks to all of you* for helping me on
> > this problem.
> >
> > Now I have to do this same thing to a few thousand log files instead of
> > just 10. Then I need to do all kinds of analysis on the resulting data. I
> > think I know enough J to do the analysis part, but I still may have to ask
> > a question or two, if I get stuck.
> >
> > I'll let you all know how it goes....
> >
> > Skip
> >
> >
> > On Sat, Nov 19, 2011 at 11:24 AM, Raul Miller <[email protected]> wrote:
> >
> >> Note that this isn't really a new function -- it's the same one that you
> >> posted (or would have posted, I think, if you had posted the last line of
> >> it).  Except, mine was from a version that had =: instead of =. for its
> >> intermediate results.  That's bad for production code, but it does let us
> >> see what the bug is:
> >>
> >>   _4 ]\ expand #inv (+/expand){.data
> >>
> >>
> >> ┌────────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> >> │param1              │param2             │param3                   │param5              │
> >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1    =  12345  │param2    =   NONE │param3   =   hello world │                    │
> >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │                    │                   │param1  = 34567          │param3  = hello bob │
> >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param5   - zero one │                   │                         │param5 = two three  │
> >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1 = 6789       │param2 = SOME      │                         │                    │
> >> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1              │param2             │param3                   │param5              │
> >> └────────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
> >>
> >> I am not defining "expand" properly.  Thus, parameters are being
> >> misplaced.
> >>
> >> If I use an alternate definition for expand, it seems to get the
> >> parameters
> >> into the right places:
> >>
> >>   expand=: ;0 1 2 3 e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> >>   _4 ]\ expand #inv (+/expand){.data
> >>
> >> ┌───────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> >> │param1             │param2             │param3                   │param5              │
> >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1    =  12345 │param2    =   NONE │param3   =   hello world │                    │
> >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1  = 34567    │                   │param3  = hello bob      │param5   - zero one │
> >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │                   │                   │                         │param5 = two three  │
> >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1 = 6789      │param2 = SOME      │                         │                    │
> >> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> >> │param1             │param2             │param3                   │param5              │
> >> └───────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
> >>
> >> ...and this also lets me clean up some unneeded stuff: I no longer need to
> >> add the blank tags to the text I am working with, and so I no longer need
> >> to drop those rows from the result.  Except it blows up if no tags are
> >> present, so I can't get rid of that entirely...
> >>
> >> Anyways, here's how it looks with this definition for expand:
> >>
> >> getTagsContents=: 4 :0
> >>  'n m'=. $tags=. > _2 <\ y
> >>   locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
> >>   assert. -:&/:&;/ |:locs  NB. tags must be balanced
> >>   data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
> >>  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> >>  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> >> )
> >>
> >> --
> >> Raul
> >>
> >>
>
>
> --
> Skip Cave
> Cave Consulting LLC
> Phone: 214-460-4861
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>