Raul
That works like a charm! It gets all the parameters, and puts them in the
right columns. Now I'll try it on a larger data file with real data in it:
$ww
10 NB. ww has ten log files in it, one box per log file.
$;ww
969059 NB. ww unboxed and raveled is a long text string of catenated log
NB. files. Each log file has lots of events in it, and each event has lots
NB. of parameters.
a. i. crlftb
13 10 9 NB. The noun crlftb has CR, LF, Tab in it.
NB. This is the terminator string for all the lines in the log file.
NB. I want the parameters on every line starting with STATUS, RESULT[0],
NB. and CONFIDENCE[0]
tags2 =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb
tags2
┌──────┬───┬─────────┬───┬─────────────┬───┐
│STATUS│ │RESULT[0]│ │CONFIDENCE[0]│ │
└──────┴───┴─────────┴───┴─────────────┴───┘
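For anyone following along, here is a minimal sketch of how the tag list is put together and how it splits into start/terminator pairs. The definition of crlftb is an assumption (only its character codes are shown above); the pairing step mirrors the first line of getTagsContents, and the name pairs is just for illustration.

```j
NB. Assumed definition: only  a. i. crlftb  is shown above, not how crlftb was built
crlftb =: 13 10 9 { a.
tags2  =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb

NB. Pair each start tag with its terminator: one row per tag,
NB. column 0 = start tag, column 1 = terminator string
pairs =: > _2 <\ tags2
$ pairs          NB. 3 2
```

Each row of pairs is one (tag, terminator) combination, which is why the tag list has to have an even number of boxes.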
NB. Now the acid test:
txt9 =: (; ww) getTagsContents tags2
$txt9
120 3
So there were 120 events across all the log files that had at least one of
the three parameter values we wanted.
Let's take a look:
cleanString1 10 {. 100 }. txt9
┌───────────┬───────────┬─────────────────┐
│ │ │[0][__MRCP_GID] 0│
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_STR] 0│
├───────────┼───────────┼─────────────────┤
│RECOGNITION│main menu │75 │
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_GID] 0│
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_STR] 0│
├───────────┼───────────┼─────────────────┤
│RECOGNITION│ninety five│64 │
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_GID] 0│
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_STR] 0│
├───────────┼───────────┼─────────────────┤
│RECOGNITION│yes │86 │
├───────────┼───────────┼─────────────────┤
│ │ │[0][__MRCP_GID] 0│
└───────────┴───────────┴─────────────────┘
Yes! That's it!
Raul, Frasier, Björn, Linda, *thanks to all of you* for helping me on this
problem.
Now I have to do this same thing to a few thousand log files instead of
just 10. Then I need to do all kinds of analysis on the resulting data. I
think I know enough J to do the analysis part, but I still may have to ask
a question or two, if I get stuck.
I'll let you all know how it goes....
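As a first step toward that analysis, something like the following might work. It is only a sketch on a made-up table: t is hypothetical data shaped like the cleaned-up rows above, and it assumes that empty cells really are empty strings and that the CONFIDENCE column holds plain numbers once the stray rows are dropped.

```j
NB. Hypothetical 3-column table, shaped like the cleaned-up rows shown earlier
t =: 3 3 $ '';'';'[0][__MRCP_GID] 0';'RECOGNITION';'main menu';'75';'RECOGNITION';'yes';'86'

NB. Keep only the rows whose first column (STATUS) is non-empty
full =: (0 < #@> {."1 t) # t
$ full           NB. 2 3

NB. Convert the third column (CONFIDENCE) to numbers; 0 stands in for bad fields
conf =: , 0 ". > 2 {"1 full

NB. Average confidence over the kept rows
avg =: (+/ % #) conf
```

If the real cells turn out to be padded with blanks, the emptiness test would need to trim them first.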
Skip
On Sat, Nov 19, 2011 at 11:24 AM, Raul Miller wrote:
> Note that this isn't really a new function -- it's the same one that you
> posted (or would have posted, I think, if you had posted the last line of
> it). Except, mine was from a version that had =: instead of =. for its
> intermediate results. That's bad for production code, but it does let us
> see what the bug is:
>
> _4 ]\ expand #inv (+/expand){.data
>
> ┌────────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> │param1              │param2             │param3                   │param5              │
> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1 = 12345      │param2 = NONE      │param3 = hello world     │                    │
> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │                    │                   │param1 = 34567           │param3 = hello bob  │
> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param5 - zero one   │                   │                         │param5 = two three  │
> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1 = 6789       │param2 = SOME      │                         │                    │
> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1              │param2             │param3                   │param5              │
> └────────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>
> I am not defining "expand" properly. Thus, parameters are being misplaced.
>
> If I use an alternate definition for expand, it seems to get the parameters
> into the right places:
>
> expand=: ;0 1 2 3 e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> _4 ]\ expand #inv (+/expand){.data
>
> ┌───────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
> │param1             │param2             │param3                   │param5              │
> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1 = 12345     │param2 = NONE      │param3 = hello world     │                    │
> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1 = 34567     │                   │param3 = hello bob       │param5 - zero one   │
> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │                   │                   │                         │param5 = two three  │
> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1 = 6789      │param2 = SOME      │                         │                    │
> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
> │param1             │param2             │param3                   │param5              │
> └───────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>
> ...and this also lets me clean up some unneeded stuff (I no longer need to
> add the blank tags to the text I am working with, and so I no longer need
> to drop those rows from the result... except it blows up if no tags are
> present, so I can't get rid of that entirely).
>
> Anyways, here's how it looks with this definition for expand:
>
> getTagsContents=: 4 :0
> 'n m'=. $tags=. > _2 <\ y
> locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags [email protected]:0 }. txt=. ' ',x,;tags
> assert. -:&/:&;/ |:locs NB. tags must be balanced
> data=. _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
> expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> )
>
> --
> Raul
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm