The other main issue is that Rev does not support all the fine nuances of Perl-style RegEx, though the docs say it does.

A problem is that their documentation doesn't match what their functions do. A table summarizing the regular expression codes found in nearly all programs that implement regular expressions can be seen at: http://revolution.lexicall.org/wiki/tiki-index.php?page=RegularExpressions

What is missing in the Rev docs:
{}        The braces force the preceding character (or parenthesized
          group) to match a specific number of times.
          Ex:  (rat){3}    matches ratratrat
               rat{3}      matches rattt
               rat{2,5}    matches ratt or rattt or ratttt or rattttt
                           (between 2 and 5 t's)

The feature is implemented, though:
put "_" & replaceText("AAAAAAA","A{3}","")   -> _A
put "_" & replaceText("AAAAAAA","A{4}","")   -> _AAA
put "_" & replaceText("AAAAAAA","A{5}","")   -> _AA
put "_" & replaceText("AAAAAAA","A{6}","")   -> _A
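
As a cross-check outside Rev, the same brace patterns can be run through another regex engine. This is only a sketch: it uses Python's re module, which is not the PCRE library behind replaceText, and the helper name strip_runs is made up for the example. The number of A's left over agrees with the Rev results.

```python
import re

def strip_runs(n, subject="AAAAAAA"):
    """Delete every non-overlapping run of exactly n A's from subject."""
    return "_" + re.sub("A{%d}" % n, "", subject)

for n in (3, 4, 5, 6):
    print(n, strip_runs(n))  # _A, _AAA, _AA, _A

# The braces bind to the preceding item only: a group or a single character.
print(bool(re.fullmatch(r"(rat){3}", "ratratrat")))  # True
print(bool(re.fullmatch(r"rat{3}", "rattt")))        # True
print(bool(re.fullmatch(r"rat{2,5}", "rat")))        # False, only one t
```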

There is an error in their documentation:
[ABC]|[XYZ] matches “AY” or “CX”, but not “AA” or “ZB”.
should be:
[ABC][XYZ] matches “AY” or “CX”, but not “AA” or “ZB”. (i.e., their example is not an appropriate illustration of "|")
Fortunately, the function itself behaves normally:
put "AYCXAA" into tText; put replaceText(tText, "[ABC][XYZ]", "")   -> AA
put "AYCXAA" into tText; put replaceText(tText, "[ABC]|[XYZ]", "")  -> empty

The correct example is
(AY|CX)   matches “AY” or “CX”
or a more telling one
(mouse|mice) matches mouse or mice.
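
The difference between two classes in sequence and two classes joined by "|" can be sketched in Python as well. Again, this uses Python's re module as a stand-in engine, not Rev itself, and the variable names are only illustrative.

```python
import re

subject = "AYCXAA"

# Two classes in sequence: one character from each, so each match is two characters.
pair = re.sub(r"[ABC][XYZ]", "", subject)
print(repr(pair))    # 'AA'

# Two classes joined by "|": either class alone, so each match is one character.
either = re.sub(r"[ABC]|[XYZ]", "", subject)
print(repr(either))  # ''

# Alternation between whole words is where "|" is really useful.
words = [w for w in ("mouse", "mice", "moose") if re.fullmatch(r"mouse|mice", w)]
print(words)         # ['mouse', 'mice']
```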

I don't remember the details, but I ran into problems trying to use look-around features, for instance. I've come to the conclusion that I should try a simple version of what I want first in the Message Box, then put it into my script.

I was surprised to see Mark use \s and \S, as they are not mentioned in the documentation (which hasn't been updated to reflect the changes made to the function in version 2.5). Full information about these special codes can be found below.

Interestingly, the start and end of text can also be represented by \A and \Z. They work in Revolution, but produce yet another behaviour. Honestly, I was pleased to read that regular expressions had been improved (version 2.6?)... but there are obviously some more problems to fix.

put "_" & replaceText(" A C","^ *","")  -> _A C
put "_" & replaceText("A C","^ *","")   -> _C

put "_" & replaceText(" A C","\A ","")   -> _A C   (space before A C)
put "_" & replaceText("A C","\A ","")     -> _A C   (no space)
put "_" & replaceText("A C","\A *","")    -> _
put "_" & replaceText(" A C","\A *","")    -> _
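
For comparison, here is a sketch that feeds the same patterns and subjects to Python's re module. It is not the engine Rev uses, and the helper name rev_style is invented for the example, but ^ and \A have the same meaning there. Every case comes out as _A C, which suggests that the _C and _ results are Rev quirks rather than standard regex behaviour.

```python
import re

def rev_style(pattern, subject):
    """Mimic: put "_" & replaceText(subject, pattern, "")"""
    return "_" + re.sub(pattern, "", subject)

cases = [
    (r"^ *", " A C"),
    (r"^ *", "A C"),
    (r"\A ", " A C"),
    (r"\A ", "A C"),
    (r"\A *", "A C"),
    (r"\A *", " A C"),
]

for pattern, subject in cases:
    # Only a leading run of spaces can match, so the rest is left alone.
    print(repr(pattern), repr(subject), "->", rev_style(pattern, subject))
```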


I tried the word-boundary codes (\b and \B) and they seem to behave strangely as well:

put "_" & replaceText(" A C","\B *","")   -> _A C
put "_" & replaceText(" A C","\b *","")   -> _
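
The same two calls can be sketched in Python's re module for comparison (a different engine from the one in Rev, with the made-up helper name rev_style). There, the \B result is also _A C, but the \b pattern only removes the space after the A, because that is the only place where a word edge is followed by a space.

```python
import re

def rev_style(pattern, subject):
    """Mimic: put "_" & replaceText(subject, pattern, "")"""
    return "_" + re.sub(pattern, "", subject)

# \B holds only at the very start of " A C" (both sides are non-word there),
# so only the leading space is removed.
print(repr(rev_style(r"\B *", " A C")))  # '_A C'

# \b holds on each side of A and C; the only space that follows such an edge
# is the one between A and C, so that is the only character removed.
print(repr(rev_style(r"\b *", " A C")))  # '_ AC'
```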

------------------------------------------------------------------------

 \b and \B    NaV. \b matches the empty string at the edge of a word;
              \B matches the empty string if not at the edge of a word.
              Ex: \bcomput will match "computer" or "computing", but not
              "supercomputer", since there are no spaces or punctuation
              between "super" and "computer". \Bcomput will not match
              "computer" or "computing", unless it is part of a bigger
              word such as "supercomputer" or "recomputing".

 \w and \W    NaV. \w matches word-constituent characters (letters, "_",
              & digits); \W matches characters that are not
              word-constituent.
              Ex: a\wz matches "abz", "aTz", "a5z", "a_z", or any
              three-character string starting with "a", ending with "z",
              and whose second character was either a letter (upper- or
              lower-case), a number, or the underscore.
              a\Wz would not match "abz", "aTz", "a5z", or "a_z". It
              would match "a%z", "a z", "a?z" or any three-character
              string starting with "a" and ending with "z" and whose
              second character was not a letter, number, or
              underscore. (This means the second character must
              either be a symbol or a whitespace character.)

 \d and \D    NaV. \d matches any digit. \D matches any character except
              a digit.
              Ex: a\Dz matches "abz", "aTz" or "a%z", not "a2z", "a5z" or
              "a9z". \D+ matches any non-null string which contains no
              numeric characters.

 \s and \S    NaV. \s matches exactly one character of whitespace.
              (Whitespace is defined as spaces, tabs, newlines, or any
              character which would not use ink if printed on a printer.)
              \S matches any character that is not whitespace.
              Ex: a\sz would match any three-character string starting
              with "a" and ending with "z" and whose second character was
              a space, tab, or newline. a\Sz would match any
              three-character string starting with "a" and ending with
              "z" whose second character was not a space, tab or newline.
              (Thus, the second character could be a letter, number or
              symbol.)

 \nnn         NaV. This is used for specifying control characters that
              have no typed equivalent, where nnn is the character's
              code in octal. For example, \007 would find all subjects
              with an embedded ASCII "bell" character. (The bell is
              specified by an ASCII value of 7.) You will rarely need to
              use the octal metacharacter.

 \A and \Z    Beginning and End of string. (equivalents of ^ and $)
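
The examples in this table can be tried out quickly in any engine that follows the Perl conventions. The sketch below uses Python's re module, so it says nothing about what Rev actually does with these codes; the helper name matches and the sample strings are only illustrative.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

samples = ["abz", "aTz", "a5z", "a_z", "a%z", "a z", "a?z"]
word_hits = matches(r"a\wz", samples)     # letter, digit or underscore in the middle
nonword_hits = matches(r"a\Wz", samples)  # anything else in the middle
digit_free = matches(r"a\Dz", ["abz", "aTz", "a%z", "a2z", "a5z", "a9z"])
space_hits = matches(r"a\sz", ["a z", "a\tz", "a\nz", "abz"])

print(word_hits)     # ['abz', 'aTz', 'a5z', 'a_z']
print(nonword_hits)  # ['a%z', 'a z', 'a?z']
print(digit_free)    # ['abz', 'aTz', 'a%z']

# \b needs a word edge before "comput"; \B needs the absence of one.
edge = [w for w in ("computer", "supercomputer") if re.search(r"\bcomput", w)]
inner = [w for w in ("computer", "supercomputer") if re.search(r"\Bcomput", w)]
print(edge, inner)   # ['computer'] ['supercomputer']

# \007 is the octal code for the ASCII bell character.
bell = re.search(r"\007", "ding\x07dong")
print(bell.start())  # 4
```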


------------------------------------------------------------------------
Marielle Lange (PhD),  Psycholinguist

Alternative emails: [EMAIL PROTECTED], [EMAIL PROTECTED]
Homepage http://homepages.lexicall.org/mlange/
Easy access to lexical databases                    http://lexicall.org
Supporting Education Technologists                  http://revolution.lexicall.org


_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
