On Friday, October 17, 2003, at 07:10 AM, Ivers, Doug E wrote:
> This may be a common need...
> I want to parse text into a list of words.
> I don't know how to pull out an arbitrary number of captures in a Revolution regex. I use a regex that captures the first word and the rest of the string after it, and I loop on that remainder.
> For the apostrophe, a simple model would be to assume a word is a sequence of letters that may include embedded apostrophes.
> Is there any reason why the same regex won't work with European languages? Do other languages have characters that, like the English apostrophe, should be considered part of a word?
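A minimal sketch of that word model and the capture-and-loop approach, written in Python's re rather than Revolution's own regex functions (which are not shown here). It assumes ASCII letters only, and the names FIRST_WORD, WORD, words_by_looping and words_findall are hypothetical. Group 1 of FIRST_WORD captures the first word and group 2 the rest of the string, which is fed back into the next pass.

```python
import re

# Assumed word model: a run of ASCII letters that may contain embedded
# apostrophes ("don't", "rock'n'roll"). A leading or trailing apostrophe,
# as in "dogs'", is not treated as part of the word.
WORD = r"[A-Za-z]+(?:'[A-Za-z]+)*"

# Skip any non-letters, capture the first word (group 1), then capture
# everything after it (group 2). DOTALL lets group 2 span newlines.
FIRST_WORD = re.compile(r"[^A-Za-z]*(" + WORD + r")(.*)", re.DOTALL)


def words_by_looping(text):
    """Take the first word, then repeat on the remainder until none is left."""
    words = []
    rest = text
    while True:
        m = FIRST_WORD.match(rest)
        if m is None:  # no letters remain
            break
        words.append(m.group(1))
        rest = m.group(2)
    return words


def words_findall(text):
    """Same word model, but let the regex engine return every match at once."""
    return re.findall(WORD, text)


print(words_by_looping("Don't stop the rock'n'roll, dogs' friends!"))
```

The loop rescans a shorter copy of the string on each pass, so where a language offers a find-all-matches call (re.findall here), using it with the same word pattern is simpler.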
You can look at the PCRE documentation web page (start about halfway down) at the definition of \w, the "word" character matcher. It will match some high (non-ASCII) characters if the locale is set correctly. However, it also matches the underscore. You might be better off working out exactly which characters you want to match in a particular encoding and matching them with \xhh escapes. (PCRE has a very limited UTF-8 mode, but we don't seem to have a way to turn that on; I'd prefer a full-width Unicode mode when it comes, anyway.)
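The difference between \w and an explicit \xhh character class can be seen in a small Python sketch. Python's re is not PCRE, but \w, character classes, and \xhh escapes behave the same way for this case. The Latin-1 (ISO-8859-1) encoding and the byte ranges chosen below are assumptions for illustration; with no locale in effect, Python's \w on bytes is ASCII-only.

```python
import re

# Assumed encoding: ISO-8859-1 (Latin-1), where "é" is byte 0xE9
# and "ü" is byte 0xFC.
sample = "café über snake_case".encode("latin-1")

# For bytes patterns without re.LOCALE, \w means [A-Za-z0-9_]:
# it breaks words at the accented bytes and keeps the underscore.
with_w = re.findall(rb"\w+", sample)

# Explicit class: ASCII letters plus the Latin-1 letter ranges
# 0xC0-0xD6, 0xD8-0xF6 and 0xF8-0xFF (skipping 0xD7 "×" and 0xF7 "÷").
# The underscore is deliberately left out.
LATIN1_LETTERS = rb"[A-Za-z\xc0-\xd6\xd8-\xf6\xf8-\xff]+"
explicit = re.findall(LATIN1_LETTERS, sample)

print(with_w)
print(explicit)
```

Here with_w yields b'caf', b'ber' and b'snake_case', while the explicit class keeps the accented words whole and splits at the underscore.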
Dar Scott
_______________________________________________ use-revolution mailing list [EMAIL PROTECTED] http://lists.runrev.com/mailman/listinfo/use-revolution
