Re: regex question in matchChunk function

Peter Brigham MD Tue, 15 Dec 2009 14:21:40 -0800

Here is one way. These are utility functions I use constantly for text processing. offsets(str,cntr) returns a comma-delimited list of all the offsets of str in cntr. lineOffsets(str,cntr) does the same with lineoffsets. Then you can iterate over the list of offsets to do whatever you want to each instance of str in cntr. I keep them in a utility stack that is in the stacksInUse, so they are available to all stacks. I don't use regex, as I have never gotten the regex syntax to stick in my head firmly enough to find it natural, and in any case doing it by script turns out to be as fast or faster.


-- Peter


Peter M. Brigham
[email protected]
http://home.comcast.net/~pmbrig

---------

function offsets str,cntr
   -- returns a comma-delimited list of
   -- all the offsets of str in cntr
   put "" into oList
   put 0 into startPoint
   repeat
      put offset(str,cntr,startPoint) into os
      if os = 0 then exit repeat
      add os to startPoint
      put startPoint & "," after oList
   end repeat
   if char -1 of oList = "," then delete last char of oList
   if oList = "" then return "0"
   return oList
end offsets

function lineOffsets str,cntr
   -- returns a comma-delimited list of
   -- all the lineoffsets of str in cntr
   put offsets(str,cntr) into charList
   if charList = "0" then return "0"
   put the number of items of charList into nbr
   put "" into oList
   repeat for each item n in charList
      put the number of lines of (char 1 to n of cntr) \
               & "," after oList
   end repeat
   if char -1 of oList = "," then delete char -1 of oList
   return oList
end lineOffsets

---------
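
As a rough illustration of iterating over that list for the boxing task in the question, here is a minimal sketch. The handler name boxAllInstances and the variables tText, tOffsetList, tLen, tStart and tEnd are made up for the example; the field name "StoryText" is taken from the original question. It assumes the offsets function above is available in the message path. Note that offset() finds substrings, not whole words, so a whole-word check would still have to be added, and this version overwrites any existing textStyle rather than merging with it.

```livecode
on boxAllInstances pWord
   -- hypothetical handler: boxes every occurrence of pWord in fld "StoryText"
   if pWord is empty then exit boxAllInstances
   put the text of fld "StoryText" into tText
   put offsets(pWord, tText) into tOffsetList
   -- offsets() returns "0" when nothing was found
   if tOffsetList = "0" then exit boxAllInstances
   put the number of chars of pWord into tLen
   repeat for each item tStart in tOffsetList
      -- each item is the char position where an occurrence begins
      put tStart + tLen - 1 into tEnd
      set the textStyle of char tStart to tEnd of fld "StoryText" to "box"
   end repeat
end boxAllInstances
```

Calling it once per word, for example inside "repeat for each line tWord in tDiffWords", would cover the whole word list.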

On Dec 15, 2009, at 1:46 PM, Chris Sheffield wrote:

I am not very familiar with regular expressions, and I'm wondering if someone more knowledgeable could give me a hint as to how to accomplish this.
Given a passage of text, I need to find every instance of certain words within that text and draw a box around them. The box drawing I can handle just fine by including "box" in the textStyle of the found chunk. But it's finding the instances that I'm struggling with. Here is my code. Big warning! This should not be run as is, if anyone wants to attempt it. The second repeat will go forever.
repeat for each line tWord in tDiffWords
   repeat until matchChunk(tStoryText, "(?i)\b(" & tWord & ")\b", tStartChar, tEndChar) is false
      put the textStyle of char tStartChar to tEndChar of fld "StoryText" into tStyle
      if tStyle is empty or tStyle is "plain" then
         put "box" into tStyle
      else
         put comma & "box" after tStyle
      end if
      set the textStyle of char tStartChar to tEndChar of fld "StoryText" to tStyle
   end repeat
end repeat
What I need is some way to use the matchChunk function and continue the search where the last search ended. I read through some regex documentation and came across "\G", but this doesn't seem to work in Rev. But maybe I'm not putting it in the right place in my search string.
Can anyone help? Is there a way to do this? Or can someone recommend another method of accomplishing the same thing? Keep in mind that this needs to search whole words in a story passage, and we're dealing with all kinds of punctuation here, including hyphens, em dashes, etc.
Thanks,
Chris
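
One possible way to continue a matchChunk search where the previous one ended is to run it on the remainder of the text and shift the returned positions back. The following is only a sketch under that assumption: the handler name boxWholeWord and the variables tPattern and tSkip are invented for illustration, and it sets the style to "box" outright instead of merging with an existing textStyle as the loop in the question does. It also assumes the word contains no regex metacharacters.

```livecode
on boxWholeWord pWord
   -- hypothetical handler: boxes each whole-word match of pWord in fld "StoryText"
   if pWord is empty then exit boxWholeWord
   put the text of fld "StoryText" into tStoryText
   put "(?i)\b(" & pWord & ")\b" into tPattern
   put 0 into tSkip -- number of chars already searched
   repeat while matchChunk(char (tSkip + 1) to -1 of tStoryText, tPattern, tStartChar, tEndChar)
      -- positions are relative to the substring, so shift them back
      add tSkip to tStartChar
      add tSkip to tEndChar
      set the textStyle of char tStartChar to tEndChar of fld "StoryText" to "box"
      -- the next search starts just after this match, so the loop ends
      put tEndChar into tSkip
   end repeat
end boxWholeWord
```

Because every match is at least one character long, tSkip grows on each pass and the loop stops once no match remains in the rest of the text.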

--
Chris Sheffield
Read Naturally, Inc.
www.readnaturally.com

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution

