_______________________________
>From: eryksun <eryk...@gmail.com>
>To: Ed Owens <eowens0...@gmx.com>
>Cc: "tutor@python.org" <tutor@python.org>
>Sent: Thursday, December 6, 2012 3:08 AM
>Subject: Re: [Tutor] Regular expressions question
>
>On Wed, Dec 5, 2012 at 7:13 PM, Ed Owens <eowens0...@gmx.com> wrote:
>>>>> str(string)
>> '[<div class="wx-timestamp">\n<div class="wx-subtitle wx-timestamp">Updated:
>> Dec 5, 2012, 5:08pm EST</div>\n</div>]'
>>>>> m = re.search('":\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', str(string))
>>>>> print m
>> None
>
>You need a raw string for the boundary marker \b (i.e the boundary
>between \w and \W), else it creates a backspace control character.
>Also, I don't see why you have ": at the start of the expression. This
>works:
>
>    >>> s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
>    >>> m = re.search(r'\b(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
>    >>> m.group(1)
>    'Dec 5, 2012, 5:08pm EST'
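A minimal sketch that shows the backspace problem directly (Python 3 syntax, whereas the session above is Python 2; `rest`, `bad` and `good` are just illustrative names). Only the `\b` prefix differs between the two patterns:

```python
import re

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
rest = r'(\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<'

# In an ordinary string literal '\b' is the backspace character (\x08),
# so the regex engine receives a literal backspace, not a word boundary.
bad = '\b' + rest
# In a raw string the backslash survives, so re sees \b (word boundary).
good = r'\b' + rest

print(len('\b'), len(r'\b'))        # 1 2
print(re.search(bad, s))            # None: s contains no backspace
print(re.search(good, s).group(1))  # Dec 5, 2012, 5:08pm EST
```

Using raw strings for every regex pattern avoids this class of bug altogether.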
Lately I started using named groups (after I failed to understand some regexes of my own that I had written several months earlier). The downside is that the regexes easily get quite long, but one could use the re.VERBOSE flag to make them more readable.

    >>> m = re.search(r'\b(?P<date>\w+\s+\d+,\s+\d+,\s+\d+:\d+.m\s+\w+)<', s)
    >>> m.group("date")
    'Dec 5, 2012, 5:08pm EST'

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
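A minimal sketch of what the re.VERBOSE version of the named-group pattern above might look like (Python 3 syntax; `DATE_RE` is an illustrative name). The pattern is the same, just spread out and commented:

```python
import re

# Same pattern as the named-group example, written with re.VERBOSE:
# whitespace in the pattern is ignored and '#' starts a comment.
DATE_RE = re.compile(r"""
    \b
    (?P<date>
        \w+ \s+           # month name, e.g. 'Dec'
        \d+ , \s+         # day, e.g. '5,'
        \d+ , \s+         # year, e.g. '2012,'
        \d+ : \d+ .m \s+  # time, e.g. '5:08pm'
        \w+               # time zone, e.g. 'EST'
    )
    <                     # start of the closing </div> tag
""", re.VERBOSE)

s = 'Updated: Dec 5, 2012, 5:08pm EST</div>'
m = DATE_RE.search(s)
print(m.group("date"))  # Dec 5, 2012, 5:08pm EST
```

One thing to watch: in verbose mode a literal space or '#' in the pattern must be escaped (or put inside a character class), since otherwise it is ignored or starts a comment.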