Re: matchText and accented characters

Chris Sheffield Wed, 17 Oct 2007 07:45:23 -0700

Thanks, Ken. Using the hex equivalents is an interesting suggestion. I may look into that further.

As for replacing the accented characters with their non-accented equivalents, that is also something I've done in the past, but the problem here is that this is Mac/PC cross platform, so it's quite a few extra lines of code.

So I decided to simply try the offset function, with wholeMatches set to true (although I can't really determine if wholeMatches affects offset or not), and that seems to be working fine for me. Still testing it out to make sure, but so far so good.
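
For anyone following along, here is a rough sketch of the offset approach. The field reference and the variable names (tNeedle, tPos, tFound) are only placeholders for illustration, not code from the actual project. As far as I can tell, the docs describe wholeMatches as applying to lineOffset, wordOffset, and itemOffset rather than to offset itself, so treat that part as an assumption and test it on both platforms.

```livecode
on mouseUp
   -- placeholder search string; the accented char is matched literally
   put "fiancé" into tNeedle
   -- offset returns the position of the first matching char, or 0 if none
   put offset(tNeedle, field 1) into tPos
   if tPos > 0 then
      -- pull the matched text back out of the field by position
      put char tPos to (tPos + length(tNeedle) - 1) of field 1 into tFound
      answer "Found" && quote & tFound & quote && "at char" && tPos
   else
      answer "No match."
   end if
end mouseUp
```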


Thanks again for the suggestions.


On Oct 16, 2007, at 5:59 PM, Ken Ray wrote:

On Tue, 16 Oct 2007 12:18:54 -0600, Chris Sheffield wrote:

Thanks, Andres. But that didn't seem to fix the problem. That
property, according to the docs, only seems to apply to the numToChar
and charToNum functions. I did try it just to make sure.


The issue is that PCRE (which is the lib that Rev uses) *optionally*
supports locales, so I don't know if any locales were compiled into the
code that Rev uses. If you knew what you were looking for, you could
replace the accented characters with their hex equivalents and you'd
get a match:

  put matchChunk(fld 1,".*(fianc\x8E).*",tStart,tEnd)

In this case "\x8E" means "use hex code 8E", which is character code 142
in decimal, which is é (at least on my Mac; the code for é is different
on Windows). To determine this, I ran this code:

  put baseConvert(charToNum("é"),10,16)

which gave me "8E". So if you know specifically the characters to
match, you can use this.

On the other hand, if you have a big chunk of text and you don't know
if there are accented chars or not, I would personally run it the
"brute force" way:

1) put a copy of the text into another variable
2) replace the accented chars with their non-accented counterparts - a
dozen or so lines like:
       - replace "é" with "e" in myVar
       - replace "ó" with "o" in myVar
       - etc.
3) run your 'matchChunk' on the second "clean" variable using
non-accented text (look for "fiance" and not "fiancé")
4) if you get a hit, use the startChar/endChar variables from the
'matchChunk' to extract the text from the *first* variable (the one
with the accented text)
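
The four steps above could be sketched roughly as follows. This is only an illustration: the function name findIgnoringAccents and its parameter names are made up, and only two replacements are shown. It assumes single-byte text and that every replacement swaps exactly one char for one char, so the positions found in the clean copy line up with the original. The pattern needs a parenthesized group, since matchChunk reports positions for groups.

```livecode
function findIgnoringAccents pText, pPattern, @rStart, @rEnd
   -- step 1: work on a copy so the original text stays untouched
   put pText into tClean
   -- step 2: one-for-one replacements keep the char positions aligned
   replace "é" with "e" in tClean
   replace "ó" with "o" in tClean
   -- ...add the remaining accented chars here
   -- step 3: match the non-accented pattern against the clean copy
   if matchChunk(tClean, pPattern, rStart, rEnd) then
      -- step 4: extract from the ORIGINAL text using the same positions
      return char rStart to rEnd of pText
   end if
   return empty
end findIgnoringAccents

on mouseUp
   -- search for "fiance" and get back "fiancé" if that is what is in the field
   put findIgnoringAccents(fld 1, "(fiance)", tStart, tEnd) into tHit
   if tHit is not empty then answer tHit && "(chars" && tStart && "to" && tEnd & ")"
end mouseUp
```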

Just my 2 cents,

Ken Ray
Sons of Thunder Software, Inc.
Email: [EMAIL PROTECTED]
Web Site: http://www.sonsothunder.com/
_______________________________________________
use-revolution mailing list
[email protected]

Please visit this url to subscribe, unsubscribe and manage your subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution

