Raul's getTagsContents function works great on my data. Here's the function:
getTagsContents=: 4 :0
'n m'=. $tags=. > _2 <\ y
locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
assert. -:&/:&;/ |:locs NB. tags must be balanced
data=. _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
}: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
)
However, I have a few logs that got garbled, and they fail Raul's assert
test:
ww1 is a boxed array with 1000 text log files in it, one log file per box
$ww1
1000
$ ; ww1
32842565
tags4 is a noun containing the four tag pairs that bracket the text that I
need to extract using Raul's getTagsContents function
tags4
┌──────┬───┬─────────┬───┬───────────────────────────┬───┬──────────────────┬───┐
│STATUS│   │RESULT[0]│   │CONFIDENCE[0] =            │   │UTTERANCE_FILENAME│   │
└──────┴───┴─────────┴───┴───────────────────────────┴───┴──────────────────┴───┘
Now we test:
ww1x =: (;ww1) getTagsContents tags4
|assertion failure: getTagsContents
| -:&/:&;/|:locs
ww1x =: (; 365 {. ww1) getTagsContents tags4 NB. This works
ww1x =: (; 366 } ww1) getTagsContents tags4
|assertion failure: getTagsContents
| -:&/:&;/|:locs
There's the culprit - box no 366 in ww1
There are also a couple of other garbled logs in ww1 that fail the assertion
test.
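One way to list every failing box in a single pass, rather than narrowing it down by hand, might be to wrap the call in try./catch. so that an error becomes a 0 instead of stopping execution. This is only a sketch: isOK and failedLogs are names I made up for illustration, and it assumes getTagsContents is already defined as above.

```j
NB. isOK and failedLogs are hypothetical helpers, not part of Raul's code.

NB. x is one log's text, y is the boxed tag list.
NB. Returns 1 if getTagsContents completes, 0 if it signals any error
NB. (an assert. failure is caught like any other error).
isOK=: 4 : 0
  try. 1 [ x getTagsContents y
  catch. 0
  end.
)

NB. x is the boxed list of logs, y is the boxed tag list.
NB. Returns the indices of the boxes for which getTagsContents fails.
failedLogs=: 4 : 'I. -. (isOK&y)@> x'
```

With the nouns above, ww1 failedLogs tags4 should then give the indices of the garbled logs. Note that this runs getTagsContents once per box just to test it, so it costs about as much as the extraction itself.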
Is there any way to build getTagsContents so that, if a specific boxed log
fails the assertion,
the function will skip that boxed log and go on to the next one?
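A possible approach, offered as a sketch rather than a tested solution: leave getTagsContents alone, run it on each box separately inside try./catch., and let a failing log contribute an empty table. The names safeTags and allTags are hypothetical, and this assumes each good log is balanced on its own, since the logs are no longer catenated before processing.

```j
NB. safeTags and allTags are hypothetical wrappers, not part of Raul's code.

NB. x is one log's text, y is the boxed tag list (n tag pairs).
NB. On any error, including the assert. failure, it returns an
NB. empty 0-by-n boxed table so the log contributes no rows.
safeTags=: 4 : 0
  try. x getTagsContents y
  catch. (0 , -: # y) $ a:
  end.
)

NB. x is the boxed list of logs, y is the boxed tag list.
NB. Runs each log separately and stacks the per-log tables.
allTags=: 4 : '; (safeTags&y)&.> x'
```

Usage would be ww1x =: ww1 allTags tags4 in place of (;ww1) getTagsContents tags4. The garbled logs are skipped silently, so it may be worth recording which boxes failed before relying on the result.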
Skip
On Sat, Nov 19, 2011 at 1:01 PM, Skip Cave <[email protected]> wrote:
> Raul
>
> That works like a charm! It gets all the parameters, and puts them in the
> right columns. Now I'll try it on a larger data file with real data in it:
>
> $ww
> 10 NB. ww has ten log files in it, one box per log file.
> $;ww
> 969059 NB. ww unboxed and raveled is a long text string of catenated log
> files. Each log file has lots of events in it, and each event has lots of
> parameters.
>
> a. i. crlftb
> 13 10 9 NB. The noun crlftb has CR, LF, Tab in it.
>
> NB. This is the terminator string for all the lines in the log file.
>
> NB. I want the parameters on every line starting with STATUS, RESULT[0],
> and CONFIDENCE[0]
>
> tags2 =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb
> tags2
> ┌──────┬───┬─────────┬───┬─────────────┬───┐
> │STATUS│ │RESULT[0]│ │CONFIDENCE[0]│ │
> └──────┴───┴─────────┴───┴─────────────┴───┘
>
> NB. Now the acid test:
>
> txt9 =: (; ww) getTagsContents tags2
> $txt9
> 120 3
>
> So there were 120 events across all the log files that had at least one of
> the three parameter values we wanted.
>
> Let's take a look:
>
> cleanString1 10 {. 100 }. txt9
> ┌───────────┬───────────┬─────────────────┐
> │ │ │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│main menu │75 │
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│ninety five│64 │
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│yes │86 │
> ├───────────┼───────────┼─────────────────┤
> │ │ │[0][__MRCP_GID] 0│
> └───────────┴───────────┴─────────────────┘
>
> Yes! that's it!
>
> Raul, Frasier, Björn, Linda, *thanks to all of you* for helping me on
> this problem.
>
> Now I have to do this same thing to a few thousand log files instead of
> just 10. Then I need to do all kinds of analysis on the resulting data. I
> think I know enough J to do the analysis part, but I still may have to ask
> a question or two, if I get stuck.
>
> I'll let you all know how it goes....
>
> Skip
>
>
> On Sat, Nov 19, 2011 at 11:24 AM, Raul Miller <[email protected]> wrote:
>
>> Note that this isn't really a new function -- it's the same one that you
>> posted (or would have posted, I think, if you had posted the last line of
>> it). Except, mine was from a version that had =: instead of =. for its
>> intermediate results. That's bad, for production code, but it does let us
>> see what the bug is:
>>
>> _4 ]\ expand #inv (+/expand){.data
>>
>> ┌────────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
>> │param1              │param2             │param3                   │param5              │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 12345      │param2 = NONE      │param3 = hello world     │                    │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │                    │                   │param1 = 34567           │param3 = hello bob  │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param5 - zero one   │                   │                         │param5 = two three  │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 6789       │param2 = SOME      │                         │                    │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1              │param2             │param3                   │param5              │
>> └────────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>>
>> I am not defining "expand" properly. Thus, parameters are being
>> misplaced.
>>
>> If I use an alternate definition for expand, it seems to get the
>> parameters
>> into the right places:
>>
>> expand=: ;0 1 2 3 e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
>> _4 ]\ expand #inv (+/expand){.data
>>
>> ┌───────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
>> │param1             │param2             │param3                   │param5              │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 12345     │param2 = NONE      │param3 = hello world     │                    │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 34567     │                   │param3 = hello bob       │param5 - zero one   │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │                   │                   │                         │param5 = two three  │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 6789      │param2 = SOME      │                         │                    │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1             │param2             │param3                   │param5              │
>> └───────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>>
>> ...and this also lets me clean up some unneeded stuff: I no longer need to
>> add the blank tags to the text I am working with, and so I no longer need
>> to drop those rows from the result. Except it blows up if no tags are
>> present, so I can't get rid of that entirely.
>>
>> Anyways, here's how it looks with this definition for expand:
>>
>> getTagsContents=: 4 :0
>> 'n m'=. $tags=. > _2 <\ y
>> locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
>> assert. -:&/:&;/ |:locs NB. tags must be balanced
>> data=. _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
>> expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
>> }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>> )
>>
>> --
>> Raul
>>
>>
--
Skip Cave
Cave Consulting LLC
Phone: 214-460-4861
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm