Robert Haas <robertmh...@gmail.com> writes: > I think the right way to imagine this is as though the regular > expression were being matched to the source text in left-to-right > fashion.
No, it isn't. You are headed down the garden path that leads to a Perl-style definition-by-implementation, and in particular you are going to end up with an implementation that fails to satisfy the POSIX standard. POSIX requires an *overall longest* match (at least for cases where all quantifiers are greedy), and that sometimes means that the quantifiers can't be processed strictly left-to-right greedy. An example of this is regression=# select substring('aaaaaabab' from '(a*(ab)*)'); substring ----------- aaaaaabab (1 row) If the a* is allowed to match as much as it wants, the (ab)* will not be able to match at all, and then you fail to find the longest possible overall match. I suspect that it is possible to construct similar cases where, for an all-non-greedy pattern, finding the overall shortest match sometimes requires that individual quantifiers eat more than the local minimum. I've not absorbed enough caffeine yet this morning to produce an example though. I probably shouldn't guess too much at Henry Spencer's thought processes, but I think that he was looking for an extension of this POSIX concept to mixed-greediness cases, ie you first define what the overall RE matches and then let the individual quantifiers fight it out as to which one gets how much of that. The particular way he did that is obviously leaving a lot of people unsatisfied, but I think we need to keep looking for rules of that sort, and not revert to defining the behavior by a search algorithm. > I think it's right to view every RE construct as greedy unless it's got > an explicit "not-greedy" flag attached to it; after all, that's the > traditional behavior of REs from time immemorial. Someone could > invent a non-greedy form of alternation if they were so inclined. I think you can do that already: "(foo|bar){1,1}?" (if this doesn't result in a non-greedy alternation then it's a bug). The notation is a bit ugly admittedly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers