> I have a regex that matches dates in various formats. I've tested the regex
> in a reliable testbed, and it seems to match what I want (dates in formats
> like "1 Jan 2010" and "January 1, 2010" and also "January 2008"). It's just
> that using re.findall with it is giving me weird output. I'm using Python
> 2.6.5 here, and I've put in line breaks for clarity's sake:
>
> >>> import re
>
> >>> date_regex =
> >>> re.compile(r"([0-3]?[0-9])?((\s*)|(\t*))((Jan\.?u?a?r?y?)|(Feb\.?r?u?a?r?y?)|(Mar\.?c?h?)|(Apr\.?i?l?)|(May)|(Jun[e.]?)|(Jul[y.]?)|(Aug\.?u?s?t?)|(Sep[t.]?\.?e?m?b?e?r?)|(Oct\.?o?b?e?r?)|(Nov\.?e?m?b?e?r?)|(Dec\.?e?m?b?e?r?))((\s*)|(\t*))(2?0?[0-3]?[0-9]\,?)?((\s*)|(\t*))(2?0?[01][0-9])")
Note that this will also match '1 Janry 2010', because every letter after
"Jan" is optional on its own. Is that intended?
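To show what I mean, here is a small sketch comparing the January fragment
from your pattern with a stricter alternative. The stricter pattern and the
names loose_january and strict_january are my own suggestion, not something
from your post, and I have anchored both with $ only to make the comparison
clear. The print calls are written so they should run on Python 2.6 as well
as Python 3.

```python
import re

# The January fragment from the original pattern: each letter after "Jan"
# is optional independently, so partial spellings are accepted.
loose_january = re.compile(r"Jan\.?u?a?r?y?$")

# A stricter variant (my assumption of what is wanted): only "Jan",
# "Jan." or "January" are accepted.
strict_january = re.compile(r"Jan(?:\.|uary)?$")

for candidate in ("Jan", "Jan.", "January", "Janry", "Janu"):
    loose = bool(loose_january.match(candidate))
    strict = bool(strict_january.match(candidate))
    print("%-8s loose=%-5s strict=%s" % (candidate, loose, strict))
```

If the looser matching is deliberate (say, to catch typos), the original
fragments are fine as they are.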
<snip>two examples</snip>
> >>> test_output = re.findall(date_regex, "The date was January 1, 2008. But
> >>> it was not January 2, 2008.")
>
> >>> print test_output
> [('', ' ', ' ', '', 'January', 'January', '', '', '', '', '', '', '', '', '',
> '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'), ('', ' ', ' ', '',
> 'January', 'January', '', '', '', '', '', '', '', '', '', '', '', ' ', ' ',
> '', '2,', ' ', ' ', '', '2008')]
>
> A friend says: " I think that the problem is that every time that you have a
> parenthesis you get an output. Maybe there is a way to suppress this."
>
> My friend's explanation speaks to the empties, but maybe not to the two
> Januaries. Either way, what I want is for re.findall, or some other re method
> that perhaps I haven't properly explored, to return the matches and just the
> matches.
>
> I've read the documentation, googled various permutations etc, and I can't
> figure it out. Any help much appreciated.
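The docs say: "If one or more groups are present in the pattern, return a list
of groups; this will be a list of tuples if the pattern has more than one
group." So your friend is right: every pair of parentheses is a group, and
each group gets its own slot in the output.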
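In fact, your last example shows exactly this: the result is a list of two
tuples. Each tuple holds the individual group matches for one date, and the
two list elements correspond to your two date matches. The two Januaries come
from nesting: the outer month group and the inner January group both capture
the same text.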
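A minimal sketch of that behaviour, using short made-up patterns rather than
your full regex (the text and variable names here are only for illustration):

```python
import re

text = "January 2008 and March 2010"

# No groups: findall returns each whole match as a string.
plain = re.findall(r"[A-Z][a-z]+ \d{4}", text)
print(plain)    # ['January 2008', 'March 2010']

# Two groups: findall returns one tuple per match, one slot per group.
grouped = re.findall(r"([A-Z][a-z]+) (\d{4})", text)
print(grouped)  # [('January', '2008'), ('March', '2010')]

# Nested groups: the outer and inner group both capture "Jan", and the
# group that did not take part in the match shows up as ''.
nested = re.findall(r"((Jan)|(Feb)) (\d{4})", "Jan 2008")
print(nested)   # [('Jan', 'Jan', '', '2008')]
```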
You could solve this by wrapping the entire regex in one more group (so
r"(([0-3 .... [0-9]))"; I would even use a named group), and then picking out
the first tuple element of each list element. With the extra outer group,
findall returns:
[(' January 1, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '',
'', '', '', '', '', '', '', ' ', ' ', '', '1,', ' ', ' ', '', '2008'),
(' January 2, 2008', '', ' ', ' ', '', 'January', 'January', '', '', '', '',
'', '', '', '', '', '', '', ' ', ' ', '', '2,', ' ', ' ', '', '2008')]
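Here is a rough sketch of how picking out the whole match could look. To keep
it short I use a simplified stand-in for your pattern (an assumption on my
part, it only knows January and February), and the variable names are made up.
The third option, non-capturing groups written as (?:...), is an alternative I
am adding; it is probably the "way to suppress this" your friend was after.

```python
import re

text = "The date was January 1, 2008. But it was not January 2, 2008."

# Simplified stand-in for the original pattern (hypothetical, for brevity).
outer_grouped = re.compile(r"((January|February) (\d{1,2}), (\d{4}))")

# Option 1: extra outer group, then keep only the first tuple element.
first_elements = [groups[0] for groups in outer_grouped.findall(text)]

# Option 2: a named outer group, read back through finditer.
named = re.compile(r"(?P<date>(January|February) (\d{1,2}), (\d{4}))")
by_name = [m.group("date") for m in named.finditer(text)]

# Option 3: non-capturing groups, so findall returns whole matches directly.
non_capturing = re.compile(r"(?:January|February) \d{1,2}, \d{4}")
whole_matches = non_capturing.findall(text)

print(first_elements)
print(by_name)
print(whole_matches)
# Each line prints ['January 1, 2008', 'January 2, 2008']
```

With your real pattern the matches would also include the leading space, as
in the output above, so you may want to call .strip() on each result.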
Hth,
Evert