Raul's getTagsContents function works great on my data. Here's the function:

getTagsContents=: 4 :0
 'n m'=. $tags=. > _2 <\ y
 locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
  assert. -:&/:&;/ |:locs  NB. tags must be balanced
  data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
 expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
 }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
)



However, I have a few logs that got garbled, and they fail Raul's assert
test:

ww1 is a boxed array with 1000 text log files in it, one log file per box

   $ww1
1000
   $ ; ww1
32842565

tags4 is a noun containing the four tag pairs that bracket the text that I
need to extract using Raul's getTagsContents function

   tags4
┌──────┬───┬─────────┬───┬───────────────────────────┬───┬──────────────────┬───┐
│STATUS│   │RESULT[0]│   │CONFIDENCE[0]             =│   │UTTERANCE_FILENAME│   │
└──────┴───┴─────────┴───┴───────────────────────────┴───┴──────────────────┴───┘

now we test:

   ww1x =: (;ww1) getTagsContents  tags4
|assertion failure: getTagsContents
|   -:&/:&;/|:locs

   ww1x =: (;  365 {. ww1) getTagsContents  tags4  NB. This works

   ww1x =: (; 366 { ww1) getTagsContents   tags4
|assertion failure: getTagsContents
|   -:&/:&;/|:locs

There's the culprit - box no 366 in ww1

There are also a couple of other garbled logs in ww1 that fail the assertion
test.
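
One way to find all the bad boxes at once might be to test each box
separately and record which ones complete. A minimal sketch, assuming an
assertion failure can be trapped like any other error with the adverse
conjunction (::); okFor and failing are hypothetical names, not part of
Raul's code:

```j
NB. Sketch only: okFor and failing are hypothetical helper names.
NB. x okFor y  ->  1 if (x getTagsContents y) completes, 0 if it signals an error
okFor=: 1:@getTagsContents :: 0:

NB. x failing y  ->  indices of the boxes of x that fail with tags y
NB. x: list of boxed logs, y: boxed tag pairs (such as tags4)
failing=: 4 : 'I. -. x okFor&> <y'
```

With these defined, ww1 failing tags4 would list the indices of the boxes
that do not parse on their own.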

Is there any way to build getTagsContents so that if a specific boxed log
fails the assertion, the function skips that log and goes on to the next
one?
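
A rough sketch of one way to skip the bad logs, assuming it is acceptable
to run getTagsContents on each box separately instead of on the raveled
text. tagsOrEmpty and allTags are hypothetical names; a log that signals
any error (not just the assertion) simply contributes no rows:

```j
NB. Sketch only: tagsOrEmpty and allTags are hypothetical helper names.
NB. x tagsOrEmpty y -> x getTagsContents y, or an empty 0-by-n table of
NB. boxes (n = number of tag pairs) if getTagsContents signals an error.
tagsOrEmpty=: 4 : 0
 try. x getTagsContents y
 catch. (0 , -: # y) $ a:
 end.
)

NB. x: list of boxed logs, y: boxed tag pairs.
NB. Run each log on its own, then stack the per-log tables into one.
allTags=: 4 : '; x tagsOrEmpty&.> <y'
```

Then ww1x =: ww1 allTags tags4 should give one table built from the logs
that parse, with the garbled ones left out silently.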

Skip
On Sat, Nov 19, 2011 at 1:01 PM, Skip Cave <[email protected]> wrote:

> Raul
>
> That works like a charm! It gets all the parameters, and puts them in the
> right columns. Now I'll try it on a larger data file with real data in it:
>
>   $ww
> 10         NB. ww has ten log files in it, one box per log file.
>    $;ww
> 969059  NB. ww unboxed and raveled is a long text string of catenated log
> files. Each log file has lots of events in it, and each event has lots of
> parameters.
>
>    a. i. crlftb
> 13 10 9      NB. The noun crlftb has CR, LF, Tab in it.
>
> NB. This is the terminator string for all the lines in the log file.
>
> NB. I want the parameters on every line starting with STATUS, RESULT[0],
> and CONFIDENCE[0]
>
> tags2 =: 'STATUS'; crlftb ; 'RESULT[0]'; crlftb ; 'CONFIDENCE[0]' ; crlftb
>    tags2
> ┌──────┬───┬─────────┬───┬─────────────┬───┐
> │STATUS│   │RESULT[0]│   │CONFIDENCE[0]│   │
> └──────┴───┴─────────┴───┴─────────────┴───┘
>
> NB. Now the acid test:
>
>  txt9 =:  (; ww) getTagsContents tags2
>    $txt9
> 120 3
>
> So there were 120 events in all the log files that had at least one of the
> three parameter values we wanted.
>
> Let's take a look:
>
>   cleanString1 10 {. 100 }. txt9
> ┌───────────┬───────────┬─────────────────┐
> │           │           │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│main menu  │75               │
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│ninety five│64               │
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_GID] 0│
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_STR] 0│
> ├───────────┼───────────┼─────────────────┤
> │RECOGNITION│yes        │86               │
> ├───────────┼───────────┼─────────────────┤
> │           │           │[0][__MRCP_GID] 0│
> └───────────┴───────────┴─────────────────┘
>
> Yes! that's it!
>
> Raul, Frasier, Björn, Linda, *thanks to all of you* for helping me on
> this problem.
>
> Now I have to do this same thing to a few thousand log files instead of
> just 10. Then I need to do all kinds of analysis on the resulting data. I
> think I know enough J to do the analysis part, but I still may have to ask
> a question or two, if I get stuck.
>
> I'll let you all know how it goes....
>
> Skip
>
>
> On Sat, Nov 19, 2011 at 11:24 AM, Raul Miller <[email protected]>wrote:
>
>> Note that this isn't really a new function -- it's the same one that you
>> posted (or would have posted, I think, if you had posted the last line of
>> it).  Except, mine was from a version that had =: instead of =. for its
>> intermediate results.  That's bad for production code, but it does let us
>> see what the bug is:
>>
>>   _4 ]\ expand #inv (+/expand){.data
>>
>> ┌────────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
>> │param1              │param2             │param3                   │param5              │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1    =  12345  │param2    =   NONE │param3   =   hello world │                    │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │                    │                   │param1  = 34567          │param3  = hello bob │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param5   - zero one │                   │                         │param5 = two three  │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 6789       │param2 = SOME      │                         │                    │
>> ├────────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1              │param2             │param3                   │param5              │
>> └────────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>>
>> I am not defining "expand" properly.  Thus, parameters are being
>> misplaced.
>>
>> If I use an alternate definition for expand, it seems to get the
>> parameters
>> into the right places:
>>
>>   expand=: ;0 1 2 3 e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
>>   _4 ]\ expand #inv (+/expand){.data
>>
>> ┌───────────────────┬───────────────────┬─────────────────────────┬────────────────────┐
>> │param1             │param2             │param3                   │param5              │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1    =  12345 │param2    =   NONE │param3   =   hello world │                    │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1  = 34567    │                   │param3  = hello bob      │param5   - zero one │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │                   │                   │                         │param5 = two three  │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1 = 6789      │param2 = SOME      │                         │                    │
>> ├───────────────────┼───────────────────┼─────────────────────────┼────────────────────┤
>> │param1             │param2             │param3                   │param5              │
>> └───────────────────┴───────────────────┴─────────────────────────┴────────────────────┘
>>
>> ...and this also lets me clean up some unneeded stuff (I no longer need to
>> add the blank tags to the text I am working with, and so I no longer need
>> to drop those rows from the result)... except it blows up if no tags are
>> present, so I can't get rid of that entirely.
>>
>> Anyways, here's how it looks with this definition for expand:
>>
>> getTagsContents=: 4 :0
>>  'n m'=. $tags=. > _2 <\ y
>>   locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt=. ' ',x,;tags
>>   assert. -:&/:&;/ |:locs  NB. tags must be balanced
>>   data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
>>  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
>>  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
>> )
>>
>> --
>> Raul
>>
>>


-- 
Skip Cave
Cave Consulting LLC
Phone: 214-460-4861
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm