[PHP] preg_match and dates
I have absolutely no control over the source file.

The source file is an xml file (er, sort of, it doesn't follow any particular DTD) and has a tag called VERBATIM_DATE in each record. It looks to be required in their output, as every record so far has it, but w/o a DTD it's hard to know. Time of day, on the other hand, is not required and sometimes (usually) the tag is missing.

Here's the beauty - VERBATIM_DATE in the same xml file uses multiple different formats, e.g.

    12 March 1945
    14 Mar 1967
    Apr 1999
    12-03-2005
    Before 1904
    Winter or Spring 1977

etc.

It does seem that if there is a day, the day is always first - but sometimes it has a space as a delimiter, sometimes - as a delimiter, and sometimes it has both, e.g.

    10-15 Dec 1934
    12 March-03 April 1956

What I'm trying to do is write a preg_match pattern for each case I come across - if the date matches the pattern, it then gets parsed according to that pattern to get me an acceptable YYYY-MM-DD (not sure how I'll deal with the season case yet ... but I'm serious, that kind of thing is in there several times).

To at least get started though, is there a wildcard defined that says match a month? E.g.

    /^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4,4})$/

where MONTH_MATCH is some special magic that matches Mar, March, Apr, April etc.?

If you must know - it's data from a biology vertebrate museum. Thousands of records may match a given query. Most of them look fairly easily parsable, and it does look like when a day is specified, it is always first and the year is always last.

The data is needed by me, so I'm planning on having the script die if it comes across a date I don't have a regex to parse, before it does anything, so I can add the appropriate regex as necessary. But damn - you'd think a vertebrate museum would have cleaned up their DB somewhat.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] preg_match and dates
Michael A. Peters wrote:

> What I'm trying to do is write a preg_match pattern for each case I come across - if the date matches the pattern, it then gets parsed according to that pattern to get me an acceptable YYYY-MM-DD (not sure how I'll deal with the season case yet ... but I'm serious, that kind of thing is in there several times).
>
> To at least get started though, is there a wildcard defined that says match a month? E.g.
>
>     /^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4,4})$/
>
> where MONTH_MATCH is some special magic that matches Mar, March, Apr, April etc.?

Just write one yourself.

--
Per Jessen, Zürich (6.1°C)
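For what it's worth, PCRE has no predefined class for month names, so "write one yourself" comes down to an alternation. The following is only a sketch of what that could look like: the function names monthPattern() and dayMonthYearPattern() are made up for illustration, and it assumes English month names, either in full or as three-letter abbreviations.

```php
<?php
// Hypothetical helper: PCRE has no "month" class, so the names are spelled out.
// Each alternative accepts the three-letter abbreviation or the full English name.
function monthPattern(): string
{
    return '(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?'
         . '|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)';
}

// Day first, then month, then a four-digit year, separated by whitespace or "-".
function dayMonthYearPattern(): string
{
    return '/^([0-9]{1,2})[\s-](' . monthPattern() . ')[\s-]([0-9]{4})$/i';
}

// Try the pattern on a few sample strings and report what was captured.
foreach (['12 March 1945', '14-Mar-1967', '12 Marzipan 1945'] as $text) {
    $ok = preg_match(dayMonthYearPattern(), $text, $m) === 1;
    echo $text, ' => ', $ok ? "day {$m[1]}, month {$m[2]}, year {$m[3]}" : 'no match', "\n";
}
```

Keeping the month alternation in its own function means the same sub-pattern can be dropped into other date formats later. Abbreviations such as "Sept" are not covered and would need their own alternative.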
Re: [PHP] preg_match and dates
Michael A. Peters wrote:

> This is what I have so far -
>
>     // day, month name, four-digit year
>     $pattern[] = '/^([0-9]{1,2})[\s-]([A-Z][a-z]*)[\s-]([0-9]{4,4})$/i';
>     $clean[] = '\\3-\\2-\\1';
>     // month name and four-digit year only; the day defaults to 01
>     $pattern[] = '/^([A-Z][a-z]*)[\s-]([0-9]{4,4})$/';
>     $clean[] = '\\2-\\1-01';
>     $foo = preg_replace($pattern, $clean, $verb_date);

If I were you, I'd write several regexes, one for each date format you wish to recognize. It makes the regexes much easier to read, and you can still write sub-expressions for catching e.g. months and then reuse those in your main regexes.

/Per Jessen
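The "one regex per format, with a shared month sub-expression" idea could be sketched roughly as below. None of this is from the thread: monthNumbers() and normalizeVerbatimDate() are hypothetical names, the all-numeric format is assumed to be day-month-year (going by the observation that the day always comes first), and a missing day is filled in as 01, as in the quoted replacement string.

```php
<?php
// Hypothetical lookup: full English month names and their three-letter
// abbreviations (lower case) mapped to month numbers.
function monthNumbers(): array
{
    $names = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
              'August', 'September', 'October', 'November', 'December'];
    $map = [];
    foreach ($names as $i => $name) {
        $map[strtolower($name)] = $i + 1;
        $map[strtolower(substr($name, 0, 3))] = $i + 1;
    }
    return $map;
}

// Returns "YYYY-MM-DD", or null when no known format matches.
function normalizeVerbatimDate(string $text): ?string
{
    $months = monthNumbers();
    // Shared sub-expression, reused by every format that contains a month name.
    $month = '(?:' . implode('|', array_keys($months)) . ')';
    $formats = [
        // "12 March 1945", "14-Mar-1967"
        '/^(?<d>[0-9]{1,2})[\s-](?<m>' . $month . ')[\s-](?<y>[0-9]{4})$/i',
        // "Apr 1999"
        '/^(?<m>' . $month . ')[\s-](?<y>[0-9]{4})$/i',
        // "12-03-2005", assumed to be day-month-year
        '/^(?<d>[0-9]{1,2})-(?<n>[0-9]{1,2})-(?<y>[0-9]{4})$/',
    ];
    foreach ($formats as $regex) {
        if (preg_match($regex, trim($text), $hit) !== 1) {
            continue;
        }
        $m = isset($hit['m']) ? $months[strtolower($hit['m'])] : (int) $hit['n'];
        $d = isset($hit['d']) ? (int) $hit['d'] : 1;
        // Reject impossible dates such as 31 Feb instead of passing them on.
        if (!checkdate($m, $d, (int) $hit['y'])) {
            return null;
        }
        return sprintf('%04d-%02d-%02d', $hit['y'], $m, $d);
    }
    return null;
}
```

A caller that wants the script to stop on an unknown format could check for null and die() with the offending string, then add a new entry to $formats for it. Ranges and season strings such as "10-15 Dec 1934" or "Winter or Spring 1977" are deliberately left unmatched here.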
Re: [PHP] preg_match and dates
Per Jessen wrote:

> Michael A. Peters wrote:
>
> > What I'm trying to do is write a preg_match pattern for each case I come across - if the date matches the pattern, it then gets parsed according to that pattern to get me an acceptable YYYY-MM-DD (not sure how I'll deal with the season case yet ... but I'm serious, that kind of thing is in there several times).
> >
> > To at least get started though, is there a wildcard defined that says match a month? E.g.
> >
> >     /^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4,4})$/
> >
> > where MONTH_MATCH is some special magic that matches Mar, March, Apr, April etc.?
>
> Just write one yourself.

This is what I have so far -

    // day, month name, four-digit year
    $pattern[] = '/^([0-9]{1,2})[\s-]([A-Z][a-z]*)[\s-]([0-9]{4,4})$/i';
    $clean[] = '\\3-\\2-\\1';
    // month name and four-digit year only; the day defaults to 01
    $pattern[] = '/^([A-Z][a-z]*)[\s-]([0-9]{4,4})$/';
    $clean[] = '\\2-\\1-01';
    $foo = preg_replace($pattern, $clean, $verb_date);

That was enough for me to discover that some collectors used two-digit years, and I can't differentiate 1902 from 2002, so I'll have to flag those and bug the curator to fix 'em.

I'd rather have ([A-Z][a-z]*) replaced with something that makes sure it is a valid short or long month. Writing one myself is not impossible, but if there is a date wildcard (or a tried and proven pattern) built into php that can match a month, then it is better to use it, no?
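Flagging the two-digit years rather than guessing a century might look something like this. It is a rough sketch only: hasTwoDigitYear() and splitByYearWidth() are invented names, and the check leans on the observation that the year is always the last thing in the string.

```php
<?php
// True when the string ends in exactly two digits, e.g. "12 Mar 02".
// The lookbehind rules out the tail of a longer number such as "1945".
function hasTwoDigitYear(string $text): bool
{
    return preg_match('/(?<![0-9])[0-9]{2}$/', trim($text)) === 1;
}

// Split a list of verbatim dates into [usable, flagged-for-the-curator].
function splitByYearWidth(array $dates): array
{
    $usable = [];
    $flagged = [];
    foreach ($dates as $date) {
        if (hasTwoDigitYear($date)) {
            $flagged[] = $date;
        } else {
            $usable[] = $date;
        }
    }
    return [$usable, $flagged];
}

[$usable, $flagged] = splitByYearWidth(['14 Mar 1967', '3 Apr 02', '12-03-05']);
echo 'needs curator: ', implode(', ', $flagged), "\n";
```

The flagged strings can be collected into a report for the curator, while the usable ones go on to the per-format regexes.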