Re: A better RE?
Magnus Lycka [EMAIL PROTECTED] writes:

> I want an re that matches strings like
> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
> can contain the numbers 1-7, in order, only one of each, and at least
> one digit. I want it as three groups. I was thinking of

Just a small point - what does "in order" mean here? If it means that e.g. 1362 is not valid then you're stuck because it's context sensitive and hence not regular.

I can't see how any of the fancy extensions could help here but maybe I'm just lacking insight.

Now if [\1-7] worked you'd be home and dry.

Eddie
--
http://mail.python.org/mailman/listinfo/python-list
Re: A better RE?
Eddie Corns wrote:

>> I want an re that matches strings like
>> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
>> can contain the numbers 1-7, in order, only one of each, and at least
>> one digit. I want it as three groups. I was thinking of
>
> Just a small point - what does "in order" mean here? If it means that e.g. 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.
>
> I can't see how any of the fancy extensions could help here but maybe I'm
> just lacking insight.

    import re

    # lookahead requires at least one digit 1-7; $ rejects anything trailing
    p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")

    def test(s):
        m = p.match(s)
        print(repr(s), "=", m and m.groups() or "none")

    test("")
    test("1236")
    test("1362")
    test("12345678")

prints

    '' = none
    '1236' = ('1236',)
    '1362' = none
    '12345678' = none

/F
Re: A better RE?
Eddie Corns wrote:

> Just a small point - what does "in order" mean here? If it means that e.g. 1362
> is not valid then you're stuck because it's context sensitive and hence not
> regular.

I'm not seeing that. Any finite language is regular -- as a last resort you could list all ascending sequences of 7 or fewer digits (but perhaps I misunderstood the original poster's requirements).

Jim
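Jim's "last resort" can be checked directly. The following is a minimal sketch in Python 3, assuming the requirement is "a non-empty, strictly ascending selection of the digits 1-7". The names `ascending`, `enumerated`, `compact` and `accepts` are illustrative and do not come from the thread. It lists every valid string, builds a brute-force alternation from them, and compares it with the compact optional-digit pattern that appears elsewhere in the thread.

```python
import re
from itertools import combinations

DIGITS = "1234567"

# Every non-empty ascending selection of the digits 1-7 (2**7 - 1 = 127 strings).
# combinations() yields items in input order, so each string is already ascending.
ascending = [
    "".join(combo)
    for size in range(1, len(DIGITS) + 1)
    for combo in combinations(DIGITS, size)
]

# Brute-force regex: one alternative per valid string.
enumerated = re.compile("|".join(ascending))

# Compact pattern from the thread, for comparison.
compact = re.compile(r"(?=[1234567])1?2?3?4?5?6?7?")


def accepts(pattern, text):
    """True if the pattern matches the whole of text."""
    return pattern.fullmatch(text) is not None
```

Both patterns accept "1236" and reject "1362", so the language is finite and therefore regular. The enumeration is only useful as a cross-check, since the compact pattern says the same thing in far fewer characters.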
Re: A better RE?
Fredrik Lundh [EMAIL PROTECTED] writes:

> Eddie Corns wrote:
>
>>> I want an re that matches strings like
>>> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
>>> can contain the numbers 1-7, in order, only one of each, and at least
>>> one digit. I want it as three groups. I was thinking of
>>
>> Just a small point - what does "in order" mean here? If it means that e.g. 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
>>
>> I can't see how any of the fancy extensions could help here but maybe I'm
>> just lacking insight.
>
>     import re
>
>     p = re.compile("(?=[1234567])(1?2?3?4?5?6?7?)$")
>
>     def test(s):
>         m = p.match(s)
>         print(repr(s), "=", m and m.groups() or "none")
>
>     test("")
>     test("1236")
>     test("1362")
>     test("12345678")
>
> prints
>
>     '' = none
>     '1236' = ('1236',)
>     '1362' = none
>     '12345678' = none
>
> /F

I know I know! I cancelled the article about a minute after posting it.

Eddie
Re: A better RE?
Jim [EMAIL PROTECTED] writes:

> Eddie Corns wrote:
>
>> Just a small point - what does "in order" mean here? If it means that e.g. 1362
>> is not valid then you're stuck because it's context sensitive and hence not
>> regular.
>
> I'm not seeing that. Any finite language is regular -- as a last resort you
> could list all ascending sequences of 7 or fewer digits (but perhaps I
> misunderstood the original poster's requirements).

No, that's what I did. Just carelessness on my part, time I had a holiday!

Eddie
Re: A better RE?
Magnus Lycka [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

> I want an re that matches strings like
> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
> can contain the numbers 1-7, in order, only one of each, and at least
> one digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty, right? Does
> anyone have good and not overly complex RE for this?
>
> P.S. I know the "now you have two problems" reply...

For the pyparsing-inclined, here are two versions, along with several examples of how to extract the fields from the returned ParseResults object. The second version is more rigorous in enforcing the days-of-week rules on the 3rd field.

Note that the month field is already limited to valid month abbreviations, and the same technique used to validate the days-of-week field could be used to ensure that the date fields are valid dates (no 31st of FEB, etc.), that the second date is after the first, etc.

-- Paul

Download pyparsing at http://pyparsing.sourceforge.net.

    data = "21MAR06 31APR06 1236"
    data2 = "21MAR06 31APR06 1362"

    from pyparsing import *

    # define format of an entry
    month = oneOf("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC")
    date = Combine( Word(nums,exact=2) + month + Word(nums,exact=2) )
    daysOfWeek = Word("1234567")
    entry = date.setResultsName("startDate") + \
            date.setResultsName("endDate") + \
            daysOfWeek.setResultsName("weekDays") + \
            lineEnd

    # extract entry data
    e = entry.parseString(data)

    # various ways to access the results
    print(e.startDate, e.endDate, e.weekDays)
    print("%(startDate)s : %(endDate)s : %(weekDays)s" % e)
    print(e.asList())
    print(e)
    print()

    # get more rigorous in testing for valid days of week field
    def rigorousDayOfWeekTest(s,l,toks):
        # remove duplicates from toks[0], sort, then compare to original
        tmp = "".join(sorted(dict([(ll,0) for ll in toks[0]]).keys()))
        if tmp != toks[0]:
            raise ParseException(s,l,"Invalid days of week field")

    daysOfWeek.setParseAction(rigorousDayOfWeekTest)
    # rebuild entry so that it picks up the parse action
    entry = date.setResultsName("startDate") + \
            date.setResultsName("endDate") + \
            daysOfWeek.setResultsName("weekDays") + \
            lineEnd

    print(entry.parseString(data))
    print(entry.parseString(data2))  # <-- raises ParseException
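The check inside `rigorousDayOfWeekTest` (deduplicate, sort, compare with the original) does not depend on pyparsing. Here is a minimal standalone sketch in Python 3. The function name `is_valid_weekdays` is hypothetical, and the extra guard for empty or non-1-7 input is an assumption added because there is no `Word("1234567")` in front of it to filter the characters.

```python
def is_valid_weekdays(field):
    """Return True if field is a non-empty, strictly ascending run of digits 1-7."""
    if not field or any(ch not in "1234567" for ch in field):
        return False
    # Deduplicate and sort; a valid field is unchanged by that round trip.
    return "".join(sorted(set(field))) == field


for sample in ("1236", "1362", "1123", "8", ""):
    print(repr(sample), is_valid_weekdays(sample))
```

Only "1236" passes. "1362" is out of order, "1123" repeats a digit, "8" is outside the 1-7 range, and the empty string has no digit at all.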
Re: A better RE?
Schüle Daniel wrote:

> >>> txt = "21MAR06 31APR06 1236"
> >>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
> # non capturing group (?:...)
> >>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))
> >>> p.match(txt).group(1)
> '21MAR06'
> >>> p.match(txt).group(2)
> '31APR06'
> >>> p.match(txt).group(3)
> '1236'

Excellent. Thanks!
Re: A better RE?
Fredrik Lundh wrote:

> Magnus Lycka wrote:
>
> r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"

Thanks a lot. (I knew about {3} of course, I was in a hurry when I posted since I was close to missing my train...)
A better RE?
I want an re that matches strings like 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it can contain the numbers 1-7, in order, only one of each, and at least one digit. I want it as three groups.

I was thinking of

r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"

but that will match even if the third group is empty, right? Does anyone have good and not overly complex RE for this?

P.S. I know the "now you have two problems" reply...
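The worry about the empty third group can be confirmed with a short experiment. This is a sketch in Python 3, and the names `as_posted` and `three_letters` are illustrative. It also shows a second issue with the pattern as posted: `[A-Z]` matches a single letter, so a three-letter month never fits. The `{3}` variant is the correction suggested in a reply in this thread.

```python
import re

# Pattern as posted: [A-Z] matches one letter, so "MAR" cannot match.
as_posted = re.compile(r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)")
print(as_posted.match("21MAR06 31APR06 1236"))  # None

# With three letters for the month, the empty third group is accepted.
three_letters = re.compile(r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (1?2?3?4?5?6?7?)")
m = three_letters.match("21MAR06 31APR06 ")
print(m.groups())  # ('21MAR06', '31APR06', '')
```

Because every digit in the third group is optional, the group can match the empty string, which is exactly the case the question is about.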
Re: A better RE?
Magnus Lycka wrote:

> I want an re that matches strings like
> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
> can contain the numbers 1-7, in order, only one of each, and at least
> one digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty, right? Does
> anyone have good and not overly complex RE for this?

how about (untested)

    r"(\d\d[A-Z]{3}\d\d) (\d\d[A-Z]{3}\d\d) (?=[1234567])(1?2?3?4?5?6?7?)"

where {3} means "require three copies of the previous RE part", and (?=[1234567]) means "require at least one of 1-7, but don't move forward if it matches".

/F
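The suggested pattern can be tried out as follows. This is a minimal sketch in Python 3, and the names `ENTRY` and `parse_entry` are hypothetical. The use of `fullmatch()` is an assumption added here so that trailing characters are rejected, since the pattern as suggested has no end anchor.

```python
import re

ENTRY = re.compile(
    r"(\d\d[A-Z]{3}\d\d) "   # group 1: start date, e.g. 21MAR06
    r"(\d\d[A-Z]{3}\d\d) "   # group 2: end date, e.g. 31APR06
    r"(?=[1234567])"         # lookahead: next char must be 1-7, consumes nothing
    r"(1?2?3?4?5?6?7?)"      # group 3: each digit optional, in fixed order
)


def parse_entry(line):
    """Return the three groups, or None if the whole line does not fit."""
    m = ENTRY.fullmatch(line)  # fullmatch() is an addition: no trailing junk
    return m.groups() if m else None


print(parse_entry("21MAR06 31APR06 1236"))
print(parse_entry("21MAR06 31APR06 "))
```

The lookahead is not a capturing group, so the pattern still yields exactly three groups. The second call returns None because the lookahead finds no digit after the second date.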
Re: A better RE?
Magnus Lycka wrote:

> I want an re that matches strings like
> 21MAR06 31APR06 1236, where the last part is day numbers (1-7), i.e. it
> can contain the numbers 1-7, in order, only one of each, and at least
> one digit. I want it as three groups. I was thinking of
>
> r"(\d\d[A-Z]\d\d) (\d\d[A-Z]\d\d) (1?2?3?4?5?6?7?)"
>
> but that will match even if the third group is empty, right? Does
> anyone have good and not overly complex RE for this?
>
> P.S. I know the "now you have two problems" reply...

>>> txt = "21MAR06 31APR06 1236"
>>> m = '(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)'
# non capturing group (?:...)
>>> p = re.compile(r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (m,m))
>>> p.match(txt).group(1)
'21MAR06'
>>> p.match(txt).group(2)
'31APR06'
>>> p.match(txt).group(3)
'1236'
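One thing to watch with this pattern is that it has no end anchor, so `match()` accepts a valid prefix of an out-of-order day field. A short sketch in Python 3 illustrates this. The names `MONTH`, `p` and `bad` are illustrative, and the month alternation is copied as posted.

```python
import re

MONTH = "(?:JAN|FEB|MAR|APR|MAI|JUN|JUL|AUG|SEP|OCT|NOV|DEZ)"  # as posted
p = re.compile(
    r"(\d\d%s\d\d) (\d\d%s\d\d) (?=[1234567])(1?2?3?4?5?6?7?)" % (MONTH, MONTH)
)

bad = "21MAR06 31APR06 1362"    # days out of order
print(p.match(bad).group(3))    # 136: match() stops before the stray 2
print(p.fullmatch(bad))         # None: fullmatch() demands the whole string
```

Using `fullmatch()`, or appending `$` to the pattern, closes that gap. Note also that the alternation as posted contains MAI and DEZ, so input using the English abbreviations MAY or DEC will not match unless the list is adjusted.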