In a regular expression search, if [:space:] and [:digit:] work only when followed by +, * or {}, that's a bug.

[:space:] by itself, all alone, should recognize exactly one "white" character.

Square brackets enclose a "character class". A character class is a list of "atoms". An atom is a matchable entity. The only thing the square brackets do is allow any one of the atoms inside to match at a given point in a string. And the whole square bracket list is itself an atom (note: just one atom!). It's simply an atom that can match different target characters at different positions in a target string.

"+" and "*" are "quantifiers". A quantifier means "how many of".

So an atom (including a square bracket atom) plus a quantifier means "how many of this". But an atom without any quantifier at all ought to be interpreted as "one of this".
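As a rough illustration of an atom with and without a quantifier, here is a small Python sketch. Python's re module is only a stand-in for the search engine under discussion, and it does not understand the [:space:] and [:digit:] names, so the sketch assumes \s and [0-9] as the closest equivalents.

```python
import re

text = "a  b 42"  # two spaces after "a", one space after "b"

# A bare atom with no quantifier matches exactly one character per match.
single_spaces = re.findall(r"\s", text)
single_digits = re.findall(r"[0-9]", text)

# The same atoms followed by the + quantifier match whole runs instead.
space_runs = re.findall(r"\s+", text)
digit_runs = re.findall(r"[0-9]+", text)

print(single_spaces, single_digits)
print(space_runs, digit_runs)
```

The unquantified patterns still find every white character and every digit, just one at a time, which is the behaviour argued for above.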

I think maybe what Uwe meant was that it is not possible to search for :space: by itself, that is, without the square brackets. This is (almost) true. A bare :space: will not locate a white character in a target string.

(Without the square brackets, :space: is 7 atoms, a colon, a character s, a character p, ... all in the exact order listed, and it will match that literal 7 character sequence in a target string. But that was not the original intent.)

But :space: inside square brackets is not a 7 character sequence. Square brackets create a special context in which :space: is recognized as a shorthand for the list of white characters of the current alphabet, and the whole list becomes a single atom. And because it is a valid atom, a quantifier is allowed but should *not* be required.
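The difference between the bare 7 character sequence and the class can be sketched in Python. This is only an approximation: Python's re module has no [:space:] name, so \s is assumed here as the stand-in for the bracketed form, and the target string is made up for the example.

```python
import re

# Hypothetical target: contains a real tab and the literal text ":space:".
target = "tab\there, and the literal text :space: too"

# Without brackets, the pattern is seven literal characters in a fixed order.
literal = re.search(r":space:", target)

# Stand-in for the bracketed class: a single atom that matches
# exactly one white character, with no quantifier needed.
first_white = re.search(r"\s", target)

print(repr(literal.group()), literal.start())
print(repr(first_white.group()), first_white.start())
```

The first search only ever finds the literal text, while the second finds the tab, which is the white character the original search was after.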

Another thing Uwe might have been trying to say is that it is not valid to quantify a quantifier. This is completely true. It is not valid to try to match +* for example. This would mean something like "any number of at least one of", except that it's meaningless. A regular expression can match a count of things, but it's not possible to just match a count, much less a count of counts.

To search for literal plus characters, you must "escape" the + character with a backslash: "\+" (but without the quotes). This strips off the quantifier meaning of the plus character, reverting it temporarily to its literal alphabet character meaning. Thus (all without the quotes) "\+*" means "any number of consecutive literal plus characters, including none", "\++" means "one or more consecutive literal plus characters", and "\+\+" means "exactly two consecutive literal plus characters".
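The escaped-plus patterns can be checked with a short Python sketch. Python's re module is assumed here purely as a convenient test bed, and other engines may report a bare quantifier differently. The helper name full is made up for the example.

```python
import re

def full(pattern, s):
    """Return True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

samples = ("", "+", "++", "+++")

any_number = [full(r"\+*", s) for s in samples]     # zero or more literal +
at_least_one = [full(r"\++", s) for s in samples]   # one or more literal +
exactly_two = [full(r"\+\+", s) for s in samples]   # exactly two literal +

# A quantifier with nothing to quantify is rejected by this engine.
try:
    re.compile(r"+*")
    bare_quantifier_ok = True
except re.error:
    bare_quantifier_ok = False

print(any_number, at_least_one, exactly_two, bare_quantifier_ok)
```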

But [:space:] is not a quantifier, it's an atom, and it is legal, but not required, to quantify an atom.


When I've written my own book on regular expressions, it will make all of this crystal clear. ;)


Uwe Fischer wrote:

Andrew Douglas Pitonyak wrote:

What else can I say besides "Does [:space:] work with regular expressions?"

I can use regular expressions, but I can not make it find a space using this syntax, which is documented. I also tested [:digit:], which does not work for me. [0-9] works just fine, however. In other words, I can use some regular expressions, just not all. I see no issues for this. Depending on the answer, I will open an issue.

I am using 2.02 on Linux. I investigated this based on a question here:
http://www.oooforum.org/forum/viewtopic.phtml?p=154379#154379


please use [:space:]+ or [:space:]* as search term.
[:space:] by itself is a regular expression for "any white space" (look up Wiki or Google what a white space is). You cannot search for a regular expression by itself, in the same sense as you cannot search for something as "between 3 and 6 times". You always must give a parameter what you mean by using the regular expression. You can find all this in a very short list in Online Help. Be aware that a complete discussion of "regular expressions" can fill a book of 500 pages, see amazon.com for that keyword.

Regards
Uwe
