Hey everyone, I was wondering if there is a way to use the datetime module to check for variations on a month name when performing a regex match?
In the script below, I created a regex pattern that checks for dates in the following format: "August 31, 2007". If there's a match, I print the captured date and the line from which it was extracted.

While it works in this isolated case, it struck me as not very flexible. What happens when I inevitably get data that has dates formatted in a different way? Do I have to create some type of library that contains variations on each month name (e.g., January, Jan., 01, 1...) and use that to parse each line? Or is there some way to use datetime to check for date patterns when using regex? Is there a "best practice" in this area that I'm unaware of?

Apologies if this question has been answered elsewhere. I wasn't sure how to research this topic (beyond the standard datetime docs), but I'd be happy to RTM if someone can point me to some resources.

Any suggestions are welcome (including optimizations of the code below).

Regards,
Serdar

#!/usr/bin/env python

import re, sys

sourcefile = open(sys.argv[1],'r')

# Matches dates such as "August 31, 2007"
pattern = re.compile(r'(?P<month>January|February|March|April|May|June|July|August|September|October|November|December)\s(?P<day>\d{1,2}),\s(?P<year>\d{4})')

# Marks the line at which to stop reading
pattern2 = re.compile(r'Return to List')

counter = 0

for line in sourcefile:
    x = pattern.search(line)
    break_point = pattern2.match(line)

    if x:
        counter +=1
        print "%s %d, %d <== %s" % ( x.group('month'), int(x.group('day')), int(x.group('year')), line ),
    elif break_point:
        break

print counter
sourcefile.close()
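
One direction that might address the question above, offered only as a rough sketch: let a loose regex pick out anything that looks like a date, then hand each candidate to datetime.strptime and try a short list of formats until one fits. The names CANDIDATE_FORMATS, LOOSE_DATE, parse_date and find_dates are made up for illustration, the format list is an assumption about which variations might show up, and %B/%b depend on the current locale (English month names are assumed here). It is written for Python 3.

```python
import re
from datetime import datetime

# Hypothetical list of formats to try, in order; extend it as new
# variations turn up in the data.
CANDIDATE_FORMATS = [
    "%B %d, %Y",   # August 31, 2007
    "%b %d, %Y",   # Aug 31, 2007
    "%b. %d, %Y",  # Aug. 31, 2007
    "%m/%d/%Y",    # 08/31/2007
]

# Deliberately loose: a word (optionally followed by a dot), a day and
# a four-digit year, or a slash-separated numeric date. Validation is
# left to strptime rather than to the regex.
LOOSE_DATE = re.compile(
    r"\b[A-Za-z]{3,9}\.?\s\d{1,2},\s\d{4}|\b\d{1,2}/\d{1,2}/\d{4}"
)


def parse_date(text):
    """Return a date for the first format that fits, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def find_dates(line):
    """Yield (matched_text, date) for each candidate that parses."""
    for match in LOOSE_DATE.finditer(line):
        parsed = parse_date(match.group(0))
        if parsed is not None:
            yield match.group(0), parsed
```

With something like this, the loop in the script would call find_dates(line) instead of pattern.search(line). The regex no longer has to list every month name, and a string that merely looks like a date (say, "Smarch 31, 2007") is rejected by strptime rather than counted.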
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor