Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch

hanumizzle wrote:
 On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:
 
  Chris wrote:
   I need a pattern that  matches a string that has the same number of '('
   as ')':
   findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
   '((2x+2)sin(x))', '(log(2)/log(5))' ]
   Can anybody help me out?
 
  This is not possible with regular expressions - they can't remember
  how many parens they already encountered.

 Remember that regular expressions are used to represent regular
 grammars. Most regex engines actually aren't regular in that they
 support fancy things like look-behind/ahead and capture groups...IIRC,
 these cannot be part of a true regular expression library.

Certainly true, and it always gives me a hard time because I don't know
to which extend a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times

 With that said, the quote-unquote regexes in Lua have a special
 feature that supports balanced expressions. I believe Python has a
 PCRE lib somewhere; you may be able to use the experimental ??{ }
 construct in that case.

Even if it has - I'm not sure if it really does you good, for several
reasons:

 - regexes - even enhanced ones - don't build trees. But that is what
you ultimately want
   from an expression like sin(log(x))

 - even if they are more powerful these days, the theory of context
free grammars still applies.
   so if what you need isn't LL(k) but LR(k), how do you specify that
to the regex engine?

 - the regexes are useful because of their compact notations, parsers
allow for better structured outcome 


Diez

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Theerasak Photha
On 8 Oct 2006 01:49:50 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:

 Even if it has - I'm not sure if it really does you good, for several
 reasons:

  - regexes - even enhanced ones - don't build trees. But that is what
 you ultimately want
from an expression like sin(log(x))

  - even if they are more powerful these days, the theory of context
 free grammars still applies.
so if what you need isn't LL(k) but LR(k), how do you specify that
 to the regex engine?

  - the regexes are useful because of their compact notations, parsers
 allow for better structured outcome

Just wait for Perl 6 :D

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread bearophileHUGS
Tim Chase:
 It still doesn't solve the aforementioned problem
 of things like ')))((('  which is balanced, but psychotic. :)

This may solve the problem:

def balanced(txt):
d = {'(':1, ')':-1}
tot = 0
for c in txt:
tot += d.get(c, 0)
if tot  0:
return False
return tot == 0

print balanced(42^((2x+2)sin(x)) + (log(2)/log(5))) # True
print balanced(42^((2x+2)sin(x) + (log(2)/log(5))) # False
print balanced(42^((2x+2)sin(x))) + (log(2)/log(5))) # False
print balanced()))((() # False

A possibile alternative for Py 2.5. The dict solution looks better, but
this may be faster:

def balanced2(txt):
tot = 0
for c in txt:
tot += 1 if c==( else (-1 if c==) else 0)
if tot  0:
return False
return tot == 0

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

  The dict solution looks better, but this may be faster:

it's slightly faster, but both your alternatives are about 10x slower 
than a straightforward:

def balanced(txt):
 return txt.count(() == txt.count())

/F

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Mirco Wahab
Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times

Right, in perl, this would be a no-brainer,
its documented all over the place, like:

   my $re;

   $re = qr{
(?:
  (? [^\\()]+ | \\. )
   |
 \( (??{ $re }) \)
)*
}xs;

where you have a 'delayed execution'
of the

  (??{ $re })

which in the end makes the whole a thing
recursive one, it gets expanded and
executed if the match finds its way
to it.

Above regex will match balanced parens,
as in:

   my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
   my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
   my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';

if you do:

   print ok \n if $good =~ /^$re$/;
   print ok \n if $bad1 =~ /^$re$/;
   print ok \n if $bad2 =~ /^$re$/;


This in some depth documented e.g. in
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
(topic: Recursive Regexes)

Regards

M.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Diez B. Roggisch
Mirco Wahab schrieb:
 Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times
 
 Right, in perl, this would be a no-brainer,
 its documented all over the place, like:
 
my $re;
 
$re = qr{
 (?:
   (? [^\\()]+ | \\. )
|
  \( (??{ $re }) \)
 )*
 }xs;
 
 where you have a 'delayed execution'
 of the
 
   (??{ $re })
 
 which in the end makes the whole a thing
 recursive one, it gets expanded and
 executed if the match finds its way
 to it.
 
 Above regex will match balanced parens,
 as in:
 
my $good = 'a + (b / (c - 2)) * (d ^ (e+f))  ';
my $bad1 = 'a + (b / (c - 2)  * (d ^ (e+f))  ';
my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';
 
 if you do:
 
print ok \n if $good =~ /^$re$/;
print ok \n if $bad1 =~ /^$re$/;
print ok \n if $bad2 =~ /^$re$/;
 
 
 This in some depth documented e.g. in
 http://japhy.perlmonk.org/articles/tpj/2004-summer.html
 (topic: Recursive Regexes)

That clearly is a recursive grammar rule, and thus it can't be regular 
anymore :) But first of all, I find it ugly - the clean separation of 
lexical and syntactical analysis is better here, IMHO - and secondly, 
what are the properties of that parsing? Is it LL(k), LR(k), backtracking?

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread bearophileHUGS
Fredrik Lundh wrote:

 it's slightly faster, but both your alternatives are about 10x slower
 than a straightforward:
 def balanced(txt):
  return txt.count(() == txt.count())

I know, but if you read my post again you see that I have shown those
solutions to mark )))((( as bad expressions. Just counting the parens
isn't enough.

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Roy Smith
Diez B. Roggisch [EMAIL PROTECTED] wrote:
 Certainly true, and it always gives me a hard time because I don't know
 to which extend a regular expression nowadays might do the job because
 of these extensions. It was so much easier back in the old times

What old times?  I've been working with regex for mumble years and there's 
always been the problem that every implementation supports a slightly 
different syntax.  Even back in the good old days, grep, awk, sed, and ed 
all had slightly different flavors.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-08 Thread Theerasak Photha
On 10/8/06, Roy Smith [EMAIL PROTECTED] wrote:
 Diez B. Roggisch [EMAIL PROTECTED] wrote:
  Certainly true, and it always gives me a hard time because I don't know
  to which extend a regular expression nowadays might do the job because
  of these extensions. It was so much easier back in the old times

 What old times?  I've been working with regex for mumble years and there's
 always been the problem that every implementation supports a slightly
 different syntax.  Even back in the good old days, grep, awk, sed, and ed
 all had slightly different flavors.

Which grep? Which awk? :)

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Diez B. Roggisch

Chris wrote:
 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?

This is not possible with regular expressions - they can't remember
how many parens they already encountered.

You will need a real parser for this - pyparsing seems to be the most
popular choice today, I personally like spark. I'm sure you find an
example-grammar that will parse simple arithmetical expressions like
the one above.

Diez

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread John Machin
Chris wrote:
 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?


No, there is so such pattern. You will have to code up a function.

Consider what your spec really is: '42^((2x+2)sin(x)) +
(log(2)/log(5))' has the same number of left and right parentheses; so
does the zero-length string; so does ') + (' -- perhaps you need to add
'and starts with a ('

Consider what you are going to do with input like this:

print '(' + some_text + ')'

Maybe you need to do some lexical analysis and work at the level of
tokens rather than individual characters.

Which then raises the usual question: you have a perception that
regular expressions are the solution -- to what problem??

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread hanumizzle
On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch [EMAIL PROTECTED] wrote:

 Chris wrote:
  I need a pattern that  matches a string that has the same number of '('
  as ')':
  findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
  '((2x+2)sin(x))', '(log(2)/log(5))' ]
  Can anybody help me out?

 This is not possible with regular expressions - they can't remember
 how many parens they already encountered.

Remember that regular expressions are used to represent regular
grammars. Most regex engines actually aren't regular in that they
support fancy things like look-behind/ahead and capture groups...IIRC,
these cannot be part of a true regular expression library.

With that said, the quote-unquote regexes in Lua have a special
feature that supports balanced expressions. I believe Python has a
PCRE lib somewhere; you may be able to use the experimental ??{ }
construct in that case.

-- Theerasak
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Roy Smith
In article [EMAIL PROTECTED],
 Chris [EMAIL PROTECTED] wrote:

 I need a pattern that  matches a string that has the same number of '('
 as ')':
 findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
 '((2x+2)sin(x))', '(log(2)/log(5))' ]
 Can anybody help me out?
 
 Thanks for any help!

Why does it need to be a regex?  There is a very simple and well-known 
algorithm which does what you want.

Start with i=0.  Walk the string one character at a time, incrementing i 
each time you see a '(', and decrementing it each time you see a ')'.  At 
the end of the string, the count should be back to 0.  If at any time 
during the process, the count goes negative, you've got mis-matched 
parentheses.

The algorithm runs in O(n), same as a regex.

Regex is a wonderful tool, but it's not the answer to all problems.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some regular expression help

2006-10-07 Thread Tim Chase
 Why does it need to be a regex?  There is a very simple and well-known 
 algorithm which does what you want.
 
 Start with i=0.  Walk the string one character at a time, incrementing i 
 each time you see a '(', and decrementing it each time you see a ')'.  At 
 the end of the string, the count should be back to 0.  If at any time 
 during the process, the count goes negative, you've got mis-matched 
 parentheses.
 
 The algorithm runs in O(n), same as a regex.
 
 Regex is a wonderful tool, but it's not the answer to all problems.

Following Roy's suggestion, one could use something like:

  s = '42^((2x+2)sin(x)) + (log(2)/log(5))'
  d = {'(':1, ')':-1}
  sum(d.get(c, 0) for c in s)
0


If you get a sum()  0, then you have too many (, and if you 
have sum()  0, you have too many ) characters.  A sum() of 0 
means there's the same number of parens.  It still doesn't solve 
the aforementioned problem of things like ')))((('  which is 
balanced, but psychotic. :)

-tkc




-- 
http://mail.python.org/mailman/listinfo/python-list