There are other reasons why mine might stop (like missing end tags). What definition are you using for tags6?
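
A minimal sketch of one way to check for missing end tags, assuming the tags are given as a boxed list alternating start and end tags (the layout tags3 uses) and that each tag is a plain string of two or more characters. tagCounts is a hypothetical helper, not part of getTagsContents. It only counts how often each tag occurs in the text, so a row with unequal counts points at an unbalanced pair, but it says nothing about overlap or ordering.

```j
NB. tagCounts is a hypothetical helper (an assumption, not from this thread).
NB. x: text to search
NB. y: boxed list of tags, alternating start tag, end tag
NB. result: 2-column table of occurrence counts, one row per start/end pair
tagCounts=: 4 : '_2 ]\ (+/@E.&x)@> y'
```

For example, '<a>x</a><a>y' tagCounts '<a>';'</a>' should report two start tags against one end tag, which is the kind of imbalance that would trip the assert.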

--
Raul

On Wed, Nov 23, 2011 at 6:43 PM, Skip Cave <[email protected]> wrote:
> Both Arie and Raul posted updated functions. I will test each one on the
> data I posted at:
> https://www.opendrive.com/files?51418263_gn47v
>
> I will try Arie's first:
>
> ww1A1 =: (ww1t) getFieldsV21 tags6
> $ww1A1
> 4917 4
>
> 5 {. 500 }. ww1A1
>
> ┌───────────────────┬────┬────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ RECOGNITION       │ no │ 73 │ C:\Nuance\V8.5.0\mrcp\logs\2011\10October\29\02-03-23-vx1prn123-7b42060a_00001cd4_4eab5eeb_0022_0000\utt04.wav │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ ABORTED           │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> └───────────────────┴────┴────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> Looks good!
>
> Now we try Raul's function:
>
> tags3
> ┌──────┬───┬─────────┬───┬───────────────────────────┬───┐
> │STATUS│ │RESULT[0]│ │CONFIDENCE[0] =│ │
> └──────┴───┴─────────┴───┴───────────────────────────┴───┘
> ww1R =: cleanString1 (ww1t) getTagsContents tags3
> Ignoring overlapped tags on line(s): 6 25 41 158 194 215 258 282 287 299
> 307 341 381 414 441 443 452 481 484 571 574 610 677 712 748 811 855 1236
> 1268 1303 1350 1382 1449 1590 1635 1671 1707 1713 1725 1733 1767 1807 1840
> 1867 1869 1878 1907 1910 1997 2000 ...
> |syntax error: getTagsContents
> | smoutput'Ignoring overlapped tags on line(s): ',":1+(I.txt=LF)I.
> $ww1R
> $ ww1R
>
>
> So Raul's function still stops and aborts when encountering mismatched
> tags, though I haven't tried to look at the actual failing data. Raul says
> it should skip over the mismatched tags, but it is stopping when it hits
> one.
>
> Skip
>
> On Wed, Nov 23, 2011 at 11:21 AM, Raul Miller <[email protected]> wrote:
>
> > Here's a variation which emits a warning when tags overlap:
> >
> > dups=: ~.@#~ i.@# ~: i.~
> >
> > getTagsContents=: 4 :0
> >   'n m'=. $tags=. > _2 <\ y
> >   txt=. ' ',x,;tags
> >   locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags [email protected]:0 }. txt
> >   overlapped=. dups;{:"1 locs
> >   if. #overlapped do.
> >     smoutput 'Ignoring overlapped tags on line(s): ',":1+(I.txt=LF) I. overlapped
> >     locs=. (#~L:0 ([email protected]:0 dups@;)@:({:"1)) locs
> >   end.
> >   assert. -:&/:&;/ |:locs NB. tags must be balanced
> >   data=. _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
> >   expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> >   }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > )
> >
> > I should also note that a pair of overlapped tags might span two tag
> > sequences.
> > And I suspect that deleting all damaged sequences (all tag
> > sequences which would have contained damaged tags) would just about
> > double the complexity of the program -- and I doubt that that's worth
> > doing, given that the system already allows damaged tags.
> >
> > --
> > Raul
> >
> > On Tue, Nov 22, 2011 at 10:54 AM, Raul Miller <[email protected]> wrote:
> >
> > > This version ignores duplicate tags.
> > >
> > > Note that it's not precisely what you asked for -- it is not deleting
> > > the entire tag sequence, it's only skipping over the conflicted tags.
> > > If there is another tag in the sequence which is not conflicted, it
> > > will still show up. This is because I do not identify the sequences
> > > until later.
> > >
> > > dups=: ~.@#~ i.@# ~: i.~
> > >
> > > getTagsContents=: 4 :0
> > >   'n m'=. $tags=. > _2 <\ y
> > >   txt=. ' ',x,;tags
> > >   locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags [email protected]:0 }. txt
> > >   locs=. (#~L:0 ([email protected]:0 dups@;)@:({:"1)) locs
> > >   assert. -:&/:&;/ |:locs NB. tags must be balanced
> > >   data=. _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
> > >   expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> > >   }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > > )
> > >
> > > Note that another approach might be to use a different technique to
> > > extract the tag contents. If I used character indices to extract them,
> > > then I could relax the restriction that tags cannot overlap.
> > >
> > > FYI,
> > >
> > > --
> > > Raul
> > >
> > >
> > > On Mon, Nov 21, 2011 at 4:44 PM, Skip Cave <[email protected]> wrote:
> > >
> > >> If the program detects an assert failure, it should find the whole tag
> > >> sequence (tag1s, tag1e, tag2s, tag2e, etc.) and should skip over that
> > >> entire bad tag sequence. It should find the next appearance of the
> > >> first start tag (tag1s) and process it as usual.
> > >>
> > >> Right now, when the assert fails, the whole program stops in the
> > >> middle of processing, with no clue where the failure was. In a perfect
> > >> world, the program would also note the position of the failed text in
> > >> a global variable, so I could inspect the failure later, as well as
> > >> find out how many bad tag sets there were in the run. Generally the
> > >> problem is a mangled log file. I probably won't be able to fix it
> > >> anyway, so just skipping over the bad tag set is the best option.
> > >>
> > >> Skip
> > >>
> > >> On Mon, Nov 21, 2011 at 2:47 PM, Raul Miller <[email protected]> wrote:
> > >>
> > >> > That assert is checking for unbalanced tags. You probably have two
> > >> > start tags followed by one end tag.
> > >> >
> > >> > What do you want the program to do for this kind of thing?
> > >> >
> > >> > --
> > >> > Raul
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
