Re: [Jprogramming] Finding multiple sequential strings

Raul Miller Wed, 23 Nov 2011 18:45:44 -0800

There are other reasons why mine might stop (like missing end tags).

What definition are you using for tags6?


-- 
Raul

On Wed, Nov 23, 2011 at 6:43 PM, Skip Cave <[email protected]> wrote:

> Both Arie and Raul posted updated functions. I will test each one on the
> data I posted at:
> https://www.opendrive.com/files?51418263_gn47v
>
> I will try Arie's first:
>
>   ww1A1 =: (ww1t) getFieldsV21 tags6
>   $ww1A1
> 4917 4
>
>  5 {. 500 }. ww1A1
>
> ┌───────────────────┬────┬────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │ RECOGNITION       │ no │ 73 │ C:\Nuance\V8.5.0\mrcp\logs\2011\10October\29\02-03-23-vx1prn123-7b42060a_00001cd4_4eab5eeb_0022_0000\utt04.wav │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ ABORTED           │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> ├───────────────────┼────┼────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> │ NO_SPEECH_TIMEOUT │    │    │                                                                                                                │
> └───────────────────┴────┴────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> Looks good!
>
> Now we try Raul's function:
>
>    tags3
> ┌──────┬───┬─────────┬───┬───────────────────────────┬───┐
> │STATUS│   │RESULT[0]│   │CONFIDENCE[0]             =│   │
> └──────┴───┴─────────┴───┴───────────────────────────┴───┘
>   ww1R =: cleanString1 (ww1t) getTagsContents tags3
> Ignoring overlapped tags on line(s): 6 25 41 158 194 215 258 282 287 299
> 307 341 381 414 441 443 452 481 484 571 574 610 677 712 748 811 855 1236
> 1268 1303 1350 1382 1449 1590 1635 1671 1707 1713 1725 1733 1767 1807 1840
> 1867 1869 1878 1907 1910 1997 2000 ...
> |syntax error: getTagsContents
> |       smoutput'Ignoring overlapped tags on line(s): ',":1+(I.txt=LF)I.
>    $ww1R
> $ ww1R
>
>
> So Raul's function still stops and aborts when encountering mismatched
> tags, though I haven't tried to look at the actual failing data. Raul says
> it should skip over the mismatched tags, but it is stopping when it hits
> one.
>
> Skip
>
> On Wed, Nov 23, 2011 at 11:21 AM, Raul Miller <[email protected]> wrote:
>
> > Here's a variation which emits a warning when tags overlap:
> >
> > dups=: ~.@#~ i.@# ~: i.~
> >
> > getTagsContents=: 4 :0
> >  'n m'=. $tags=. > _2 <\ y
> >  txt=. ' ',x,;tags
> >  locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt
> >  overlapped=. dups;{:"1 locs
> >  if. #overlapped do.
> >   smoutput 'Ignoring overlapped tags on line(s): ',":1+(I.txt=LF) I. overlapped
> >    locs=. (#~L:0 (-.@e.L:0 dups@;)@:({:"1)) locs
> >  end.
> >  assert. -:&/:&;/ |:locs  NB. tags must be balanced
> >  data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
> >  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> >  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > )
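> >
> > A small sketch of how the line numbers in that warning are derived,
> > using a made-up three-line text (txt and the positions 0 3 7 here are
> > hypothetical sample data, not taken from the log):
> >
> > ```j
> >    txt=: 'ab',LF,'cde',LF,'f'   NB. hypothetical sample text
> >    1 + (I. txt=LF) I. 0 3 7     NB. character positions -> line numbers
> > 1 2 3
> > ```
> >
> > I. txt=LF lists the newline positions, and dyadic I. counts how many of
> > them come before each character position, so adding 1 gives a 1-based
> > line number.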
> >
> > I should also note that a pair of overlapped tags might span two tag
> > sequences.  And I suspect that deleting all damaged sequences (all tag
> > sequences which would have contained damaged tags) would just about
> > double the complexity of the program -- and I doubt that that's worth
> > doing, given that the system already allows damaged tags.
> >
> > --
> > Raul
> >
> > On Tue, Nov 22, 2011 at 10:54 AM, Raul Miller <[email protected]> wrote:
> >
> > > This version ignores duplicate tags.
> > >
> > > Note that it's not precisely what you asked for -- it is not deleting
> > > the entire tag sequence, it's only skipping over the conflicted tags.
> > > If there is another tag in the sequence which is not conflicted, it
> > > will still show up.  This is because I do not identify the sequences
> > > until later.
> > >
> > > dups=: ~.@#~ i.@# ~: i.~
> > >
> > > getTagsContents=: 4 :0
> > >  'n m'=. $tags=. > _2 <\ y
> > >  txt=. ' ',x,;tags
> > >  locs=. (-@#@[ {. I. {./. ])&.>/\"1 tags I.@E.L:0 }. txt
> > >  locs=. (#~L:0 (-.@e.L:0 dups@;)@:({:"1)) locs
> > >  assert. -:&/:&;/ |:locs  NB. tags must be balanced
> > >  data=. _2 {:\  ((/:~ ; locs) I. i.#txt) </.  txt
> > >  expand=. ;(i.n) e.L:0 (<;.1~ 1,2>:/\]) ,I. |:>(e.L:0~ /:~@;) {."1 locs
> > >  }: (#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> > > )
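> > >
> > > A quick illustration of dups on a made-up list (the numbers are
> > > hypothetical, not tag positions from the log): it returns each value
> > > that occurs more than once, listed once.
> > >
> > > ```j
> > >    dups=: ~.@#~ i.@# ~: i.~
> > >    dups 3 1 3 2 1 3
> > > 3 1
> > > ```
> > >
> > > The fork i.@# ~: i.~ marks every item that is not the first occurrence
> > > of its value, and ~.@#~ keeps the marked items and removes repeats.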
> > >
> > > Note that another approach might be to use a different technique to
> > > extract the tag contents.  If I used character indices to extract them,
> > > then I could relax the restriction that tags cannot overlap.
> > >
> > > FYI,
> > >
> > > --
> > > Raul
> > >
> > >
> > > On Mon, Nov 21, 2011 at 4:44 PM, Skip Cave <[email protected]> wrote:
> > >
> > >> If the program detects an assert failure, it should find the whole tag
> > >> sequence (tag1s, tag1e, tag2s, tag2e, etc.) and skip over that entire
> > >> bad tag sequence. It should then find the next appearance of the first
> > >> start tag (tag1s) and process it as usual.
> > >>
> > >> Right now, when the assert fails, the whole program stops in the middle
> > >> of processing, with no clue where the failure was. In a perfect world,
> > >> the program would also note the position of the failed text in a global
> > >> variable, so I could inspect the failure later, as well as find out how
> > >> many bad tag sets there were in the run. Generally the problem is a
> > >> mangled log file. I probably won't be able to fix it anyway, so just
> > >> skipping over the bad tag set is the best option.
> > >>
> > >> Skip
> > >>
> > >> On Mon, Nov 21, 2011 at 2:47 PM, Raul Miller <[email protected]> wrote:
> > >>
> > >> > That assert is checking for unbalanced tags.  You probably have two
> > >> > start tags followed by one end tag.
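> > >> >
> > >> > As a rough illustration, assuming the assert amounts to comparing
> > >> > the grade of the start-tag positions with the grade of the end-tag
> > >> > positions (the positions below are made up for the example):
> > >> >
> > >> > ```j
> > >> >    starts=: 3 20 41     NB. hypothetical start-tag positions
> > >> >    ends=: 12 55         NB. hypothetical end-tag positions
> > >> >    (/: starts) -: /: ends
> > >> > 0
> > >> >    (/: 3 20) -: /: 12 55
> > >> > 1
> > >> > ```
> > >> >
> > >> > With three starts and two ends the grades have different lengths,
> > >> > so the match fails and the assert stops the program.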
> > >> >
> > >> > What do you want the program to do for this kind of thing?
> > >> >
> > >> > --
> > >> > Raul
> >
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
