Re: A better RE?

2006-03-10 Thread Eddie Corns
Magnus Lycka [EMAIL PROTECTED] writes:

> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of

Just a small point - what does "in order" mean here? If it means that e.g. 1362
is not valid then you're stuck because it's context sensitive and hence not
regular.

I can't see how any of the fancy extensions could help here but maybe I'm just
lacking insight.

Now if [\1-7] worked you'd be home and dry.

Eddie
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A better RE?

2006-03-10 Thread Fredrik Lundh
Eddie Corns wrote:


>> I want an re that matches strings like "21MAR06 31APR06 1236",
>> where the last part is day numbers (1-7), i.e. it can contain
>> the numbers 1-7, in order, only one of each, and at least one
>> digit. I want it as three groups. I was thinking of
>
> Just a small point - what does "in order" mean here? If it means that e.g. 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.
>
> I can't see how any of the fancy extensions could help here but maybe I'm
> just lacking insight.

import re

p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")

def test(s):
    m = p.match(s)
    print repr(s), "=", m and m.groups() or "none"

test("")
test("1236")
test("1362")
test("12345678")

prints

'' = none
'1236' = ('1236',)
'1362' = none
'12345678' = none

/F





Re: A better RE?

2006-03-10 Thread Jim

Eddie Corns wrote:
> Just a small point - what does "in order" mean here? If it means that e.g. 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.
I'm not seeing that.  Any finite language is regular -- as a last
resort you could list all ascending sequences of 7 or fewer digits (but
perhaps I misunderstood the original poster's requirements).
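
The "list them all" argument can be checked by brute force. The sketch below is Python 3 and not from the thread; the names `ascending`, `listed` and `compact` are made up for illustration. It assumes "ascending sequences" means the non-empty subsequences of 1234567, which is how the original requirement reads, and compares the enumerated alternation with a lookahead-style pattern.

```python
import re
from itertools import combinations

DIGITS = "1234567"

# Every non-empty subsequence of "1234567". combinations() keeps the input
# order, so each result is strictly ascending and has no repeated digit.
ascending = [
    "".join(combo)
    for size in range(1, len(DIGITS) + 1)
    for combo in combinations(DIGITS, size)
]

# The finite language written out literally as one alternation.
listed = re.compile("|".join(ascending))

# A compact pattern for the same language: lookahead plus optional digits.
compact = re.compile(r"(?=[1-7])1?2?3?4?5?6?7?")

print(len(ascending), "strings in the language")
for s in ["1236", "1362", "", "12345678"]:
    # fullmatch() requires the whole string to match, not just a prefix.
    print(repr(s), bool(listed.fullmatch(s)), bool(compact.fullmatch(s)))
```

With only 2**7 - 1 candidate strings, the explicit list is small enough to compile directly, which is what makes the "last resort" workable here.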

Jim



Re: A better RE?

2006-03-10 Thread Eddie Corns
Fredrik Lundh [EMAIL PROTECTED] writes:

> Eddie Corns wrote:
>
>>> I want an re that matches strings like "21MAR06 31APR06 1236",
>>> where the last part is day numbers (1-7), i.e. it can contain
>>> the numbers 1-7, in order, only one of each, and at least one
>>> digit. I want it as three groups. I was thinking of
>>
>> Just a small point - what does "in order" mean here? If it means that e.g. 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
>>
>> I can't see how any of the fancy extensions could help here but maybe I'm
>> just lacking insight.
>
> import re
>
> p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")
>
> def test(s):
>     m = p.match(s)
>     print repr(s), "=", m and m.groups() or "none"
>
> test("")
> test("1236")
> test("1362")
> test("12345678")
>
> prints
>
> '' = none
> '1236' = ('1236',)
> '1362' = none
> '12345678' = none
>
> /F

I know I know!  I cancelled the article about a minute after posting it.

Eddie


Re: A better RE?

2006-03-10 Thread Eddie Corns
Jim [EMAIL PROTECTED] writes:


> Eddie Corns wrote:
>> Just a small point - what does "in order" mean here? If it means that e.g. 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
> I'm not seeing that.  Any finite language is regular -- as a last
> resort you could list all ascending sequences of 7 or fewer digits (but
> perhaps I misunderstood the original poster's requirements).

No, that's what I did.  Just carelessness on my part, time I had a holiday!

Eddie



Re: A better RE?

2006-03-10 Thread Paul McGuire
Magnus Lycka [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have good and not overly complex RE for
> this?
>
> P.S. I know the "now you have two problems" reply...

For the pyparsing-inclined, here are two versions, along with several
examples on how to extract the fields from the returned ParseResults object.
The second version is more rigorous in enforcing the days-of-week rules on
the 3rd field.

Note that the month field is already limited to valid month abbreviations,
and the same technique used to validate the days-of-week field could be used
to ensure that the date fields are valid dates (no 31st of FEB, etc.), that
the second date is after the first, etc.
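
As a rough illustration of the kind of date check described above, done here with the standard library rather than with pyparsing: the sketch is Python 3, the helper names `parse_date` and `check_range` are hypothetical, and it assumes two-digit years mean 20YY. Incidentally, the thread's own sample value 31APR06 is not a real date, since April has 30 days.

```python
import datetime

MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()

def parse_date(text):
    """Turn 'DDMONYY' into a datetime.date; raise ValueError if impossible."""
    if len(text) != 7 or not (text[:2] + text[5:]).isdigit():
        raise ValueError("expected DDMONYY, got %r" % text)
    day, mon, year = int(text[:2]), text[2:5], int(text[5:])
    if mon not in MONTHS:
        raise ValueError("unknown month %r" % mon)
    # Assumption: two-digit years are in the 2000s.
    # datetime.date() itself rejects days such as 31 April or 30 February.
    return datetime.date(2000 + year, MONTHS.index(mon) + 1, day)

def check_range(start_text, end_text):
    """Parse both dates and require the second to be after the first."""
    start, end = parse_date(start_text), parse_date(end_text)
    if end <= start:
        raise ValueError("end date must be after start date")
    return start, end

print(check_range("21MAR06", "30APR06"))
try:
    check_range("21MAR06", "31APR06")
except ValueError as exc:
    print("rejected:", exc)
```

A function like `check_range` could be called from a parse action in the same way as the days-of-week test, turning its ValueError into a ParseException.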

-- Paul
Download pyparsing at http://pyparsing.sourceforge.net.


data  = "21MAR06 31APR06 1236"
data2 = "21MAR06 31APR06 1362"

from pyparsing import *

# define format of an entry
month = oneOf("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
date = Combine( Word(nums,exact=2) + month + Word(nums,exact=2) )
daysOfWeek = Word("1234567")
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

# extract entry data
e = entry.parseString(data)

# various ways to access the results
print e.startDate, e.endDate, e.weekDays
print "%(startDate)s : %(endDate)s : %(weekDays)s" % e
print e.asList()
print e
print

# get more rigorous in testing for valid days of week field
def rigorousDayOfWeekTest(s,l,toks):
    # remove duplicates from toks[0], sort, then compare to original
    tmp = "".join(sorted(dict([(ll,0) for ll in toks[0]]).keys()))
    if tmp != toks[0]:
        raise ParseException(s,l,"Invalid days of week field")

daysOfWeek.setParseAction(rigorousDayOfWeekTest)
entry = date.setResultsName("startDate") + \
        date.setResultsName("endDate") + \
        daysOfWeek.setResultsName("weekDays") + \
        lineEnd

print entry.parseString(data)
print entry.parseString(data2) # <-- raises ParseException




Re: A better RE?

2006-03-10 Thread Magnus Lycka
Schüle Daniel wrote:
> >>> txt = "21MAR06 31APR06 1236"
>
> >>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
> # non capturing group (?:...)
>
> >>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))
>
> >>> p.match(txt).group(1)
> '21MAR06'
>
> >>> p.match(txt).group(2)
> '31APR06'
>
> >>> p.match(txt).group(3)
> '1236'
 

Excellent. Thanks!


Re: A better RE?

2006-03-10 Thread Magnus Lycka
Fredrik Lundh wrote:
> Magnus Lycka wrote:
> r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"
 

Thanks a lot. (I knew about {3} of course, I was in a hurry
when I posted since I was close to missing my train...)


A better RE?

2006-03-09 Thread Magnus Lycka
I want an re that matches strings like "21MAR06 31APR06 1236",
where the last part is day numbers (1-7), i.e. it can contain
the numbers 1-7, in order, only one of each, and at least one
digit. I want it as three groups. I was thinking of

r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"

but that will match even if the third group is empty,
right? Does anyone have good and not overly complex RE for
this?
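
The suspicion about the empty third group is easy to confirm in isolation. A small Python 3 check, using only the third group from the pattern above (the variable name `days` is made up here):

```python
import re

# Only the day-number group from the pattern, with nothing before or after it.
days = re.compile(r"(1?2?3?4?5?6?7?)")

# Every element is optional, so the empty string is a successful match.
m = days.match("")
print(m is not None, repr(m.group(1)))

# Without an anchor or lookahead, out-of-order input is not rejected either:
# the match simply stops at the first digit that does not fit the sequence.
m = days.match("1362")
print(repr(m.group(1)))
```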

P.S. I know the "now you have two problems" reply...


Re: A better RE?

2006-03-09 Thread Fredrik Lundh
Magnus Lycka wrote:

> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have good and not overly complex RE for
> this?

how about (untested)

r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"

where {3} means require three copies of the previous RE part, and
(?=[1234567]) means require at least one of 1-7, but don't move
forward if it matches.
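
A quick way to see what the two pieces do, written as a Python 3 sketch rather than anything posted in the thread. It assumes a single space between the fields, and it adds a trailing `$` so that out-of-order digits are rejected instead of being partially matched; both are assumptions about the intended input.

```python
import re

pattern = re.compile(
    r"(\d\d[A-Z]{3}\d\d) "   # {3}: exactly three capital letters
    r"(\d\d[A-Z]{3}\d\d) "
    r"(?=[1-7])"             # lookahead: next char must be 1-7, nothing is consumed
    r"(1?2?3?4?5?6?7?)$"     # each digit optional, but only in this order
)

samples = [
    "21MAR06 31APR06 1236",   # well-formed
    "21MAR06 31APR06 ",       # empty day field
    "21MAR06 31APR06 1362",   # digits out of order
]

for line in samples:
    m = pattern.match(line)
    print(repr(line), "->", m.groups() if m else None)
```

The lookahead is what rules out the empty day field: the optional digits alone would happily match nothing, but `(?=[1-7])` fails first when no digit follows the second space.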

/F





Re: A better RE?

2006-03-09 Thread Schüle Daniel
Magnus Lycka wrote:
> I want an re that matches strings like "21MAR06 31APR06 1236",
> where the last part is day numbers (1-7), i.e. it can contain
> the numbers 1-7, in order, only one of each, and at least one
> digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty,
> right? Does anyone have good and not overly complex RE for
> this?
>
> P.S. I know the "now you have two problems" reply...

>>> txt = "21MAR06 31APR06 1236"

>>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
# non capturing group (?:...)

>>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))

>>> p.match(txt).group(1)
'21MAR06'

>>> p.match(txt).group(2)
'31APR06'

>>> p.match(txt).group(3)
'1236'
