Robert Haas <robertmh...@gmail.com> writes:
> I think the right way to imagine this is as though the regular
> expression were being matched to the source text in left-to-right
> fashion.

No, it isn't.  You are headed down the garden path that leads to a
Perl-style definition-by-implementation, and in particular you are going
to end up with an implementation that fails to satisfy the POSIX
standard.  POSIX requires an *overall longest* match (at least for cases
where all quantifiers are greedy), and that sometimes means that the
quantifiers can't be processed strictly left-to-right greedy.  An
example of this is 

regression=# select substring('aaaaaabab' from '(a*(ab)*)');
 substring 
-----------
 aaaaaabab
(1 row)

If the a* is allowed to match as much as it wants, the (ab)* will not be
able to match at all, and then you fail to find the longest possible
overall match.

I suspect that it is possible to construct similar cases where, for an
all-non-greedy pattern, finding the overall shortest match sometimes
requires that individual quantifiers eat more than the local minimum.
I've not absorbed enough caffeine yet this morning to produce an example
though.

I probably shouldn't guess too much at Henry Spencer's thought
processes, but I think that he was looking for an extension of this
POSIX concept to mixed-greediness cases, ie you first define what the
overall RE matches and then let the individual quantifiers fight it out
as to which one gets how much of that.  The particular way he did that
is obviously leaving a lot of people unsatisfied, but I think we need to
keep looking for rules of that sort, and not revert to defining the
behavior by a search algorithm.

> I think it's right to view every RE construct as greedy unless it's got
> an explicit "not-greedy" flag attached to it; after all, that's the
> traditional behavior of REs from time immemorial.  Someone could
> invent a non-greedy form of alternation if they were so inclined.

I think you can do that already: "(foo|bar){1,1}?" (if this doesn't
result in a non-greedy alternation then it's a bug).  The notation is
a bit ugly admittedly.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to