Re: Jane Austen's peculiarity

James Hale Tue, 11 Aug 2015 07:33:22 -0700

Of course I couldn't resist a tinker. I too am into text manipulation/searching 
and wondered how I would go about this.
I looked at the repeat loops and realised they would run much faster if they 
were inverted, as I am sure the list of verbs is shorter than the number of 
lines of text being searched.
I also wanted to use a "repeat for each" construct, as this is usually orders of 
magnitude faster.
But this meant I needed the line count, and adding a counter seemed 
counterproductive.
So I settled on using the lineoffset function.


Here was my go...

on mouseUp
   put empty into fld "COOKED"
   put empty into fld "STARTT"
   put empty into fld "STOPT"
   put empty into lCooked1
   put "started : " & the long time into fld "STARTT"
   put the milliseconds into st
   put fld "TEKST" into TEKST
   put fld "WERBS" into WERBS   
   put 0 into acounter   
   put the number of lines of TEKST into numlines
    
   repeat for each line KWERBS in WERBS
      put "was " &  KWERBS into FRAZE
      put "were " & KWERBS into FRAZE2
      put 0 into loffesta
      put 0 into loffestb
      
      put 1 into lcounta
      put 1 into lcountb
      repeat while lcounta <> 0
         put lineoffset(FRAZE,TEKST,loffesta) into lcounta
         if lcounta = 0 then
            exit repeat
         end if
         put lcounta + loffesta  into thelinea
         put thelinea & " : " &  line thelinea of TEKST & cr after lCooked1
         -- lineoffset counts from the skipped lines, so skip up to the absolute line just found
         put thelinea into loffesta
      end repeat
      
      repeat while lcountb <> 0
         put lineoffset(FRAZE2,TEKST,loffestb) into lcountb
         if lcountb = 0 then
            exit repeat
         end if
         put lcountb + loffestb into thelineb
         put thelineb & " : " & line thelineb of TEKST & cr after lCooked1
         put thelineb into loffestb
      end repeat      
   end repeat   
   put the number of lines of lCooked1 & " found"
   put lcooked1 into fld "Cooked"
   put "finished : " & the long time into fld "STOPT"
   put the milliseconds into nd
   put nd - st into fld "TIMET"
end mouseUp
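
The core of the lineoffset approach can be isolated as a small function. This is only a sketch: the name matchingLineNumbers and its parameters are hypothetical, not part of the stack. It relies on lineoffset returning a line number relative to the lines skipped, so a running absolute position has to be kept.

```livecode
-- Hypothetical helper (not in the original stack): returns the absolute
-- line numbers of every line of pText that contains pPhrase, one per line.
function matchingLineNumbers pPhrase, pText
   local tSkip, tFound, tResult
   put 0 into tSkip
   repeat forever
      -- the result is counted from the first line after the tSkip skipped lines
      put lineoffset(pPhrase, pText, tSkip) into tFound
      if tFound = 0 then exit repeat
      -- convert the relative result into an absolute line number
      add tFound to tSkip
      put tSkip & return after tResult
   end repeat
   -- drop the trailing return, if anything was found
   if tResult is not empty then delete the last char of tResult
   return tResult
end matchingLineNumbers
```

Calling matchingLineNumbers("was returned", fld "TEKST") would give the numbers of the lines containing that phrase, each of which can then be fetched with "line N of TEKST" as in the handler.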


I haven't tried returning to the original repeat order to see if this was 
faster, but running the above on Richmond's sample stack for the "WAS/WERE" case 
delivered a result of three lines:

2663 : officers, who in comparison with the stranger, were become "stupid,
731 : was returned in due form. Miss Bennet's pleasing manners grew on the
4116 : were returned, and to lament over his absence from the Netherfield ball.

in 89 msec on my Mac running LC7.1Dp1

I was then going to examine colourising the found chunks when I realised that 
the supplied text had line breaks within each paragraph.
This means none of the proposed solutions (including Richmond's own) will find 
the desired phrase if it falls across one of these line breaks.
For my solution using lineoffset this is a dead end WHILE these line breaks 
within a paragraph remain.
For the other solutions a simple expedient is to increase the number of FRAZEs 
to four...

put "was " &  KWERBS into FRAZE
put "was" & cr  &  KWERBS into FRAZE2
put "were " & KWERBS into FRAZE3
put "were"  & cr & KWERBS into FRAZE4

This addition makes each of the extra FRAZEs two "lines" and thus an invalid 
argument for the lineoffset function.

Or so I thought.
However, given the unpredictability of the formatting of the text, this was 
much too simplistic a solution.
It breaks down where paragraphs are indented using spaces!

So keeping the formatting as read in is problematic without knowing the 
formatting used.
But if the focus is the actual text, then perhaps the "fancy" formatting is not 
important.

Processing the text BEFORE searching so as to remove embedded line breaks and 
space padding allows my original code to work fine.

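Inserting the following before the REPEATS does the trick (at least with the 
example text):

   -- mark every line break with a placeholder unlikely to occur in the text
   replace return with "^&*" in TEKST
   -- collapse runs of whitespace into a single space
   put "\s+" into lmultispace
   put replacetext (TEKST,lmultispace," ") into TEKST
   -- a doubled placeholder is a paragraph break; a single one is a line break within a paragraph
   replace "^&*^&*" with return in TEKST
   replace "^&*" with " " in TEKST
   replace return with return & return in TEKST

The only downside is that the time to execute went from 89 msec to 616 msec.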

Your mileage may vary.

NOTE: My method does not identify multiple instances of the FRAZE within a 
single line; however, once it is found in a line it would be simple to see if it 
occurred again.
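
A minimal sketch of such a check, assuming a hypothetical function name (countInLine) and using the built-in offset function with its optional third argument, which skips characters in the same way lineoffset skips lines:

```livecode
-- Hypothetical helper (not in the original stack): counts non-overlapping
-- occurrences of pPhrase within a single line of text.
function countInLine pPhrase, pLine
   local tSkip, tFound, tCount
   put 0 into tSkip
   put 0 into tCount
   repeat forever
      -- the result is counted from the first char after the tSkip skipped chars
      put offset(pPhrase, pLine, tSkip) into tFound
      if tFound = 0 then exit repeat
      add 1 to tCount
      -- skip up to the last char of this match before searching again
      add tFound + length(pPhrase) - 1 to tSkip
   end repeat
   return tCount
end countInLine
```

Inside either lineoffset loop, countInLine(FRAZE, line thelinea of TEKST) could then be appended to the output for each line found.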

Thanks for the diversion Richmond.

James