On 8/8/07, bill lam <[EMAIL PROTECTED]> wrote:
> Raul Miller wrote:
> > Perhaps
> >    '.*\((.+?)\&\([EMAIL PROTECTED]) flag'
>
> Thank you, it works although I cannot yet understand why.

The underlying problem is that pcre works serially: it tries
starting positions from left to right and accepts the first match
it finds.  (You can build regular expression engines that work in
parallel, treating all possibilities equally, but those make the
treatment of grouping parentheses a somewhat significant problem
in the general case.)

So, the non-greedy .+? pattern matches the shortest possible
match that works, but that doesn't say anything about where
it starts.

The .* at the beginning is greedy, and matches as much as possible,
which forces the .+? sub-pattern to start as late as possible.
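
The effect is easy to see with a small experiment.  This is only a
sketch: it uses Python's re module rather than pcre from J (both are
backtracking engines and should agree on this case), and the sample
text and the simplified patterns are made up for illustration, not
the ones from this thread.

```python
import re

# Hypothetical sample text and simplified patterns (not the original ones).
text = "('dog', 'mouse'), ('cat' ; dog) flag"

# Without a leading .* the match begins at the first "(" in the text,
# and .+? then has to stretch until ") flag" can follow it.
lazy_only = re.search(r"\((.+?)\) flag", text)

# With a greedy .* in front, the engine gives characters back from the
# end of the text, so the "(" that gets used is the last one that works.
greedy_prefix = re.search(r".*\((.+?)\) flag", text)

print(lazy_only.group(1))
print(greedy_prefix.group(1))
```

The first search captures everything from the first open parenthesis
onward; the second captures only the contents of the last group
before " flag".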

> What is search pattern if it needs to count matching parenthesis, eg
> ('dog', 'mouse'), (('cat' ; dog)&([EMAIL PROTECTED]) flag

Regular expressions cannot count matching parentheses (although,
hypothetically speaking, a regular expression implementation can
easily include "non-regular expression features"), but you can
enumerate the more common possibilities.

The best general approach here is probably to break your sentence up
into relevant "words" and use something that can count or use a stack
to match things up.  And note that while you want to pick your regular
expression(s) to minimize the work required by the rest of the system,
you also often wind up choosing something that "mostly works" rather
than something which is a precise fit for the requirements as you
originally imagined them.
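
As a rough illustration of the stack idea, here is a minimal sketch in
Python (not J).  The function name balanced_groups is invented for
this example, and it assumes parentheses never appear inside quoted
strings; a real tokenizer would have to deal with that.

```python
def balanced_groups(text):
    """Return (start, end) index pairs for each balanced (...) group.

    Assumption: every parenthesis counts, even inside quotes.
    Unmatched closing parentheses are skipped.
    """
    stack = []   # positions of "(" that are still open
    groups = []  # completed pairs, in the order they close
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            groups.append((stack.pop(), i))
    return groups


sample = "(a), ((b ; c)) flag"
found = [sample[start:end + 1] for start, end in balanced_groups(sample)]
print(found)
```

Inner groups close first, so they show up in the result before the
groups that contain them.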

But, here's how you might match on two of the more common possibilities:

'(([^()]+|\([^()]+\))\&\([EMAIL PROTECTED]) flag'

On the one hand, this won't match at all for a number of cases.  On
the other hand, greedy matches work just fine for the cases that it
will match.
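
To make that trade-off concrete, here is a sketch in Python using a
simplified pattern of the same shape.  The pattern, the helper name
inner and the sample strings are assumptions for illustration; this
is not the exact expression above.

```python
import re

# Hypothetical simplified pattern: "(" followed by either text with no
# parentheses or exactly one parenthesized group, then ") flag".
pattern = re.compile(r"\(([^()]+|\([^()]+\))\) flag")


def inner(text):
    """Return what the outer parentheses enclose, or None if no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(inner("('dog', 'mouse'), ('cat' ; dog) flag"))  # no nesting
print(inner("(('cat' ; dog)) flag"))                  # one nested level
print(inner("((('cat'))) flag"))                      # two levels: None
```

Each extra level of nesting needs another alternative in the pattern,
which is why this only enumerates the common cases.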

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
