I have absolutely no control over the source file.

The source file is an XML file (er, sort of; it doesn't follow any particular DTD) and has a tag called VERBATIM_DATE in each record. It looks to be required in their output, since every record so far has it, but without a DTD it's hard to know. Time of day, on the other hand, is not required, and sometimes (usually) the tag is missing.

Here's the beauty: VERBATIM_DATE in the same XML file uses multiple different formats, e.g.:

12 March 1945
14 Mar 1967
Apr 1999
Before 1904
Winter or Spring 1977


It does seem that if there is a day, the day is always first, but the delimiter is sometimes a space, sometimes a hyphen, and sometimes both, e.g.:

10-15 Dec 1934
12 March-03 April 1956

What I'm trying to do is write a preg_match pattern for each case I come across. If a date matches a pattern, it then gets parsed according to that pattern to give me an acceptable YYYY-MM-DD. (I'm not sure how I'll deal with the season case yet, but I'm serious, that kind of thing is in there several times.)

To at least get started though, is there a wildcard defined that says match a month?



where MONTH is some special magic that matches Mar March Apr April etc. ?
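
As far as I know, PCRE has no built-in token for month names, so one option is to build MONTH myself as an alternation. Below is a minimal sketch of that idea, assuming English month names and a single space or hyphen as the delimiter. The names MONTH, monthNumber() and parseDayMonthYear() are made up for illustration and are not part of PHP.

```php
<?php
// Hypothetical MONTH fragment: full names plus common abbreviations.
// The outer parentheses form the only capturing group.
const MONTH = '(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?'
            . '|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?'
            . '|Nov(?:ember)?|Dec(?:ember)?)';

// Map a month name or abbreviation to its number via the first three letters.
function monthNumber(string $name): ?int
{
    static $months = [
        'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4,  'may' => 5,  'jun' => 6,
        'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
    ];
    return $months[strtolower(substr($name, 0, 3))] ?? null;
}

// Parse "DD MONTH YYYY" (space or hyphen delimited) into YYYY-MM-DD.
// Returns null when the string does not match or is not a real calendar date.
function parseDayMonthYear(string $s): ?string
{
    $pattern = '/^(\d{1,2})[ -]' . MONTH . '[ -](\d{4})$/i';
    if (preg_match($pattern, trim($s), $m) !== 1) {
        return null;
    }
    $month = monthNumber($m[2]);
    if ($month === null || !checkdate($month, (int) $m[1], (int) $m[3])) {
        return null;
    }
    return sprintf('%04d-%02d-%02d', $m[3], $month, $m[1]);
}
```

With this, parseDayMonthYear('12 March 1945') would give '1945-03-12', while something like 'Before 1904' falls through as null and needs its own pattern.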

If you must know, it's data from a vertebrate biology museum. Thousands of records may match a given query. Most of them look fairly easy to parse, and it does look like when a day is specified, it is always first and the year is always last.

I need the data, so I'm planning to have the script die, before it does anything else, if it comes across a date I don't have a regex to parse. That way I can add the appropriate regex as necessary. But damn, you'd think a vertebrate museum would have cleaned up their DB somewhat.
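
The "check everything first" pass could look roughly like the sketch below. It assumes the VERBATIM_DATE strings have already been pulled out of the XML into an array; unparsedDates() and the pattern list are hypothetical, and the month fragment is deliberately loose (it would accept "Marzipan"), so treat it as a starting point only.

```php
<?php
// Loose month fragment, assumed for illustration: a three-letter prefix
// followed by any lowercase letters.
$month = '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*';

// One pattern per format seen so far; more get added as new formats turn up.
$patterns = [
    '/^\d{1,2} ' . $month . ' \d{4}$/',   // 12 March 1945, 14 Mar 1967
    '/^' . $month . ' \d{4}$/',           // Apr 1999
];

// Return every date string that none of the known patterns matches.
function unparsedDates(array $dates, array $patterns): array
{
    $unmatched = [];
    foreach ($dates as $date) {
        $known = false;
        foreach ($patterns as $pattern) {
            if (preg_match($pattern, trim($date)) === 1) {
                $known = true;
                break;
            }
        }
        if (!$known) {
            $unmatched[] = $date;
        }
    }
    return $unmatched;
}

$dates = ['12 March 1945', 'Apr 1999', 'Before 1904', 'Winter or Spring 1977'];
$bad = unparsedDates($dates, $patterns);
```

In the real script, a non-empty $bad would be the signal to print the offending strings and die before touching any other data.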

PHP General Mailing List (http://www.php.net/)