Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
Ben Bennett [EMAIL PROTECTED] wrote:
> On Wed, Jul 16, 2003 at 12:05:00PM +0200, Claus Färber wrote:
> > Not all languages use suffixes when writing numbers. In German, for example, you'd just write 14. Juli. (Actually, it's spoken vierzehn*ter* Juli, but one does not write 14ter Juli; in spoken language, the suffix is also used for the month as in vierzehnter siebter or 14ter 7ter).
>
> Great! Someone who can tell me real German usage! In the example above, is the . after 14 optional? I assume so...

Actually, the dot (.) has a function similar to the suffix in other languages (e.g. -st, -nd, -th in English), i.e. it indicates an ordinal number. It is not optional in standard German orthography (that does not mean that everyone always adheres to this rule, of course).

> Can you clarify what AM/PM and BC/AD forms would be expected?

AM/PM is not used in German-speaking countries. We simply use the 24 hour system when writing down times. (Spoken language is different and often ambiguous but I don't think that's relevant for a parser class.)

BC is 'vor Christus' or 'v. Chr.'. AD is 'nach Christus' or 'n. Chr.'. Sometimes 'AD' is used, too (very rare; usually only ecclesiastic texts).

I'd suggest /v(?:or|\.)?\s*Chr(?:ist(?:us)|\.)?/ and /n(?:ach|\.)?\s*Chr(?:ist(?:us)|\.)?/ as patterns to cover most non-standard variants.

Claus
--
http://www.faerber.muc.de
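A minimal sketch of how the German conventions described in this message might be checked in Perl. The two era patterns are copied from the message; the helper names `german_era` and `german_day_month`, the case-insensitive flag, and the anchoring are assumptions made for illustration and are not part of any DateTime module.

```perl
use strict;
use warnings;

# Era patterns as quoted in the message, compiled case-insensitively
# (the /i flag is an assumption, not part of the quoted patterns).
my $bc_re = qr/v(?:or|\.)?\s*Chr(?:ist(?:us)|\.)?/i;
my $ad_re = qr/n(?:ach|\.)?\s*Chr(?:ist(?:us)|\.)?/i;

# Hypothetical helper: classify a standalone era marker as 'BC' or 'AD'.
sub german_era {
    my ($text) = @_;
    return 'BC' if $text =~ /^\s*$bc_re\s*$/;
    return 'AD' if $text =~ /^\s*$ad_re\s*$/;
    return;    # not a recognised era marker
}

# Hypothetical helper: split "14. Juli" into (day, month name).
# The dot is required, following the orthography rule described above.
sub german_day_month {
    my ($text) = @_;
    return $text =~ /^\s*(\d{1,2})\.\s*(\S+)\s*$/ ? ($1, $2) : ();
}

print join(' ', german_day_month('14. Juli')), "\n";
print german_era('v. Chr.'), "\n";
```

A lenient parser could make the dot optional, since the message notes that not everyone follows the rule.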
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Tue, Jul 15, 2003 at 01:56:53PM +1000, Iain Truskett wrote:
> * Ben Bennett ([EMAIL PROTECTED]) [15 Jul 2003 13:10]:
> [...]
> My quibble; the name. I'm not a huge fan of ::Simple and ::Lite. Unfortunately, I can't think of a nice alternate for it.

Ok. I will think about that (suggestions welcomed).

> Sounds good. Ignoring unknown day names?

I think so. I haven't decided yet.

> > Omissions from Date::Parse:
> > - July 14th will not be parsed (I don't have localized info on the numeric suffixes)
>
> How about you just assume /\d{1,2}\w+/?

Perhaps, I will play with it when the rest is finished. Input from people who speak other languages would be appreciated. I think that would be okay in French, I am a bit concerned about how it behaves with non-Latin languages.

> > This will use the DT::F::Locales to get the localized forms of the days and months.
>
> What happens in the event of input being in an unknown locale? As in we don't know what locale this is in rather than we don't have locale data for xy_XX.

Erm... maybe later I will make something that can deal with ambiguous locales. That seems like a non-Simple.pm task (I realize that it isn't that hard to do, but may be slow).

> Not really. The best one can do is have it so dates that can only be one type and not the other are done correctly. Ambiguity is part of the reason ISO8601 and W3CDTF have their order specified and why rfc(2)822 uses the month _name_.

If you have the locale then I think you should be able to assume ordering.

> If Simple is to be simple, I'm not sure it can really handle such things. The idea of Simple modules is to have as little of an interface as possible. Inner complexity and outer simplicity.

See above.

-ben
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Tue, Jul 15, 2003 at 12:14:00AM -0400, John Siracusa wrote:
> On 7/14/03 11:10 PM, Ben Bennett wrote:
> [...]
> Sweet, someone took the bai--...er, picked up the baton ;)

Well I have been playing around with the idea for a while, but when the locale stuff got in I decided it was time to stop fiddling and get something useful together.

> I'm sure you're already doing this, but just in case, make sure to allow for single-digit numbers where there is no ambiguity. This is essential for handling user-created input. Examples:
>
>     9/3/2002 (e.g. don't require 09/03/2002)
>     1:02 (e.g. don't require 01:02)

Yes.

> and maybe even:
>
>     200210131:02

No! Egads :-) Actually I wasn't accepting the form 200210130102 either (I will accept 20021013T0102). Should I?

> but that may make some people break out in hives, so whatever :)

Scratch, scratch...

> Also, don't forget about the optional . in a.m. and p.m. I'm not quite sure how that'd get localized, but the point is that the localized am/pm thingies must be regexes, not constant strings (or, okay, a regex constructed out of a list of constant strings, if you want :)

Yeah, I was trying to work that out. It appears not to be in the raw locale data, so I was considering just accepting the am/pm stuff with optional inserted periods, even for other locales. I still have to survey all locales to see if that is even reasonable. The other choice would be to special case BC and AD to allow the dotted form, but that seems a bit restrictive.

> > Which leads to my problem, there appears to be no simple way to get the date order to differentiate m/d/y from d/m/y.
>
> Don't. Make it a setting. I've been trying to think of what to name this setting, but have no good ideas. Here are some bad ones instead:

It would be a setting... locale.

>     DT::F::Simple->use_mmdd(1);
>     DT::F::Simple->use_ddmm(1);
>     DT::F::Simple->mode('us');
>     DT::F::Simple->mode('euro');
>     DT::F::Simple->euro_mode(1);
>     DT::F::Simple->us_mode(1);
>
> Gah, that's horrible :) Someone out there must have some sort of pre-existing vocabulary to describe the date format differences. Is it just regional, or are there ISO numbers to reference or what?

I can add an optional additional parameter dmy_mode (defaults to your locale if undef) but I really think inferring it from the locale is fine.

Speaking of which, what interface do people want?

    my $us_parser = DateTime::Format::Simple->new(locale => 'en_US');
    my $dt = $us_parser->parse_datetime('2/11/74');

Or:

    my $dt = DateTime::Format::Simple->parse_datetime(string => '2/11/74', locale => 'en_US');

Note that it will always be legal to call:

    my $dt = DateTime::Format::Simple->parse_datetime('2/11/74');

And some locale will be assumed (probably en_US). Another choice would be to allow both forms (which I may do to allow user flexibility).

-ben
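A rough sketch of how one class could accept all three calling styles discussed in this message. Everything here is hypothetical: `My::SimpleDateParser` is a stand-in name, the locale-to-order table is an assumed three-entry sample, and the result is a plain hash rather than a DateTime object so the example runs with core Perl only. Two-digit years are returned as given.

```perl
use strict;
use warnings;

package My::SimpleDateParser;    # hypothetical stand-in, not the real module

# Assumed sample table; a real module would derive this from locale data.
my %ORDER_FOR = (
    en_US => 'mdy',
    en_GB => 'dmy',
    de_DE => 'dmy',
);

sub new {
    my ($class, %args) = @_;
    my $locale = $args{locale} // 'en_US';
    my $order  = $ORDER_FOR{$locale} // 'mdy';
    return bless { locale => $locale, order => $order }, $class;
}

# Accepts: $parser->parse_datetime($string)
#          Class->parse_datetime($string)
#          Class->parse_datetime(string => $string, locale => $locale)
sub parse_datetime {
    my $invocant = shift;
    my %args = @_ == 1 ? (string => $_[0]) : @_;
    my $self = ref $invocant ? $invocant : $invocant->new(%args);

    my @n = ($args{string} // '') =~ m{^\s*(\d{1,2})[-./](\d{1,2})[-./](\d{2,4})\s*$}
        or return;

    # Map the three numbers onto day/month/year using the locale's order.
    my %f;
    @f{ split //, $self->{order} } = @n;
    return { year => $f{y}, month => $f{m}, day => $f{d} };
}

package main;

my $us_parser = My::SimpleDateParser->new(locale => 'en_US');
my $us = $us_parser->parse_datetime('2/11/74');
my $gb = My::SimpleDateParser->parse_datetime(string => '2/11/74', locale => 'en_GB');
printf "en_US: month %d, day %d\n", $us->{month}, $us->{day};
printf "en_GB: month %d, day %d\n", $gb->{month}, $gb->{day};
```

The class-method form simply builds a throwaway parser object, which is one way to support both styles without duplicating the parsing logic.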
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Mon, Jul 14, 2003 at 11:39:37PM -0500, Dave Rolsky wrote:
> On Mon, 14 Jul 2003, Ben Bennett wrote:
> > Which leads to my problem, there appears to be no simple way to get the date order to differentiate m/d/y from d/m/y. I can look at the time formats and try to work it out, but that seems a bit dodgy if you ever change the parser, plus I assume that I am not the only person who will want to know that. So could we break it out as an explicit method?
>
> We'd have to look at the _actual_ format strings to do this, but it's certainly possible.

Ok, I will play around with this and see if all of the locales have understandable short forms.

> > Also the start of week information (and the weekend start and end) seem pretty useful for the financial stuff. Would it be reasonable to add them to the Locale objects?
>
> Probably.

Cool, I may add that in the future if needed.

-ben
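One way the "look at the actual format strings" idea could work, sketched under the assumption that a locale's short date format is available as a strftime-style pattern such as '%m/%d/%y'. The function name `date_order` and the specifier table are illustrative only; the thread does not say what form the locale data actually takes.

```perl
use strict;
use warnings;

# Assumed mapping from strftime-style specifiers to date fields.
my %FIELD_FOR = (
    d => 'd', e => 'd',              # day of month
    m => 'm', b => 'm', B => 'm',    # month number or name
    y => 'y', Y => 'y',              # two- or four-digit year
);

# Hypothetical helper: return 'mdy', 'dmy', 'ymd', ... for a format
# string, or undef if it does not contain all three date fields.
sub date_order {
    my ($format) = @_;
    my $order = '';
    my %seen;
    while ($format =~ /%([A-Za-z%])/g) {
        my $field = $FIELD_FOR{$1} or next;    # skip time fields, %%, etc.
        $order .= $field unless $seen{$field}++;
    }
    return length($order) == 3 ? $order : undef;
}

print date_order('%m/%d/%y'), "\n";
print date_order('%d.%m.%Y'), "\n";
```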
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On 7/15/03 8:05 AM, Ben Bennett wrote:
> On Tue, Jul 15, 2003 at 12:14:00AM -0400, John Siracusa wrote:
> > I'm sure you're already doing this, but just in case, make sure to allow for single-digit numbers where there is no ambiguity. [...]
>
> Yes.

(Also stuff like 10/25/2003 5 p.m. Just checking :)

> Actually I wasn't accepting the form 200210130102 either (I will accept 20021013T0102). Should I?

Yes, definitely for the 200210130102 (or 20021013010259 or 20021013010259.12345) versions. Those aren't ambiguous at all, as far as I can see. I can take or leave the T :)

> I can add an optional additional parameter dmy_mode (defaults to your locale if undef) but I really think inferring it from the locale is fine.

Yeah, that sounds better than trying to come up with names for the setting.

> Speaking of which, what interface do people want?
>
>     my $us_parser = DateTime::Format::Simple->new(locale => 'en_US');
>     my $dt = $us_parser->parse_datetime('2/11/74');
>
> Or:
>
>     my $dt = DateTime::Format::Simple->parse_datetime(string => '2/11/74', locale => 'en_US');

I'd use the first. (Actually, I'd use DateTime->parse(), which would use the first for me :) But I don't see why it can't support both.

-John
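The fixed-width numeric forms mentioned here can be matched with a single pattern because every field has a known width. The following is only a sketch: `parse_compact` is a hypothetical name, it returns a plain hash, and it does no range checking on the fields.

```perl
use strict;
use warnings;

# Hypothetical helper: parse YYYYMMDD[T]hhmm[ss[.fraction]].
# The T is optional, as are seconds and fractional seconds.
sub parse_compact {
    my ($text) = @_;
    my @f = $text =~ /^(\d{4})(\d{2})(\d{2})T?(\d{2})(\d{2})(?:(\d{2})(\.\d+)?)?$/
        or return;

    my %dt;
    # Missing optional fields come back undef; treat them as zero.
    @dt{qw(year month day hour minute second fraction)} =
        map { defined $_ ? $_ + 0 : 0 } @f;
    return \%dt;
}

for my $s (qw(200210130102 20021013T0102 20021013010259.12345)) {
    my $dt = parse_compact($s);
    printf "%s => %04d-%02d-%02d %02d:%02d:%02d\n",
        $s, @{$dt}{qw(year month day hour minute second)};
}
```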
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Tue, 15 Jul 2003, Ben Bennett wrote:
> > > Omissions from Date::Parse:
> > > - July 14th will not be parsed (I don't have localized info on the numeric suffixes)
> >
> > How about you just assume /\d{1,2}\w+/?
>
> Perhaps, I will play with it when the rest is finished. Input from people who speak other languages would be appreciated. I think that would be okay in French, I am a bit concerned about how it behaves with non-Latin languages.

It works as long as they're using Arabic numerals. If people want to write dates with native numerals (Chinese, for example) that's beyond the scope of common parsing, and not your problem.

> > What happens in the event of input being in an unknown locale? As in we don't know what locale this is in rather than we don't have locale data for xy_XX.
>
> Erm... maybe later I will make something that can deal with ambiguous locales. That seems like a non-Simple.pm task (I realize that it isn't that hard to do, but may be slow).

No, just default to the 'root' locale (which is really en_US). _That's_ simple ;)

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
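A small sketch of the "just assume a suffix" approach. One caveat with the pattern as proposed is that \w also matches digits, so a four-digit year such as 2003 would pass as a day plus suffix. The variant below restricts the suffix to letters and also allows the trailing dot used in German. The name `day_from_ordinal` is hypothetical, and the letter class only covers ASCII unless the input has been decoded to Perl characters.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the day number from tokens such as
# "14th", "1st", "1er", "14." or plain "14". Returns undef otherwise.
sub day_from_ordinal {
    my ($token) = @_;
    # [^\W\d_] is "a word character that is not a digit or underscore",
    # i.e. a letter.
    if ($token =~ /^(\d{1,2})(?:[^\W\d_]+|\.)?$/) {
        return $1 + 0;
    }
    return undef;
}

for my $token (qw(14th 1st 1er 14. 2003)) {
    my $day = day_from_ordinal($token);
    printf "%-5s => %s\n", $token, defined $day ? $day : 'not a day';
}
```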
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Tue, 15 Jul 2003, Ben Bennett wrote:
> > We'd have to look at the _actual_ format strings to do this, but it's certainly possible.
>
> Ok, I will play around with this and see if all of the locales have understandable short forms.

Actually, I was thinking that this would be done when generating the locale modules. It shouldn't be _too_ hard, I think.

> > > Also the start of week information (and the weekend start and end) seem pretty useful for the financial stuff. Would it be reasonable to add them to the Locale objects?
> >
> > Probably.
>
> Cool, I may add that in the future if needed.

Again, see above. You're welcome to tweak the generator code in the DateTime::Locale CVS.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Tue, Jul 15, 2003 at 11:40:16AM -0500, Dave Rolsky wrote:
> On Tue, 15 Jul 2003, Ben Bennett wrote:
> Actually, I was thinking that this would be done when generating the locale modules. It shouldn't be _too_ hard, I think.

Sorry, that was where I was intending to fiddle, I just wanted to make sure it was possible to do it for all of the locales (or at least a reasonable number of them).

> Again, see above. You're welcome to tweak the generator code in the DateTime::Locale CVS.

Cool.

-ben
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
Dave Rolsky wrote:
> On Tue, 15 Jul 2003, Ben Bennett wrote:
> > > 200210131:02
> >
> > No! Egads :-) Actually I wasn't accepting the form 200210130102 either (I will accept 20021013T0102). Should I?
>
> Is the former form unambiguous? If so, you might as well accept it.

200210131:02 is (more or less) unambiguous: 2002-10-13T01:02. But 200210121:02 is ambiguous: 2002-10-12T01:02, 2002-10-01T21:02, 2002-01-01T21:02? Best not to accept either, I'd think.

Eugene
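The ambiguity can be checked mechanically by trying every way of splitting the digits between the year and the colon into a one- or two-digit month, day, and hour. This is an illustrative sketch only: `interpretations` is a hypothetical name, and the range checks are coarse (days up to 31 for every month).

```perl
use strict;
use warnings;

# Hypothetical helper: list every plausible reading of a string of the
# form YYYY<digits>:MM where <digits> is month, day and hour run together.
sub interpretations {
    my ($text) = @_;
    my ($year, $rest, $min) = $text =~ /^(\d{4})(\d+):(\d{2})$/ or return;

    my @found;
    for my $ml (1, 2) {
        for my $dl (1, 2) {
            for my $hl (1, 2) {
                next unless $ml + $dl + $hl == length($rest);
                my ($m, $d, $h) = unpack "A$ml A$dl A$hl", $rest;
                next unless $m >= 1 && $m <= 12;
                next unless $d >= 1 && $d <= 31;
                next unless $h <= 23 && $min <= 59;
                push @found,
                    sprintf '%04d-%02d-%02dT%02d:%02d', $year, $m, $d, $h, $min;
            }
        }
    }
    return @found;
}

for my $input ('200210131:02', '200210121:02') {
    my @readings = interpretations($input);
    printf "%s has %d reading(s): %s\n",
        $input, scalar @readings, join(', ', @readings);
}
```

Under these checks the first string has one valid reading and the second has the three listed in the message, which supports rejecting the form outright.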
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On 7/14/03 11:10 PM, Ben Bennett wrote:
> I am taking a whack at DT::F::Simple (please speak up now if anyone else wants to claim this project) that can parse things that are similar to the ones that Date::Parse can do.

Sweet, someone took the bai--...er, picked up the baton ;)

> Namely:
> - Rough ISO8601 strings (only complete datetimes and dates)
> - Dates with '-', '.', or '/' separators (either by month number or localized short or long name)
> - Times with ':' or '-' separator and optional localized AM/PM
> - Day names will be used to sanity check the parsed date if present
> - I will use localized BC/AD if present

I'm sure you're already doing this, but just in case, make sure to allow for single-digit numbers where there is no ambiguity. This is essential for handling user-created input. Examples:

    9/3/2002 (e.g. don't require 09/03/2002)
    1:02 (e.g. don't require 01:02)

and maybe even:

    200210131:02

but that may make some people break out in hives, so whatever :)

Also, don't forget about the optional . in a.m. and p.m. I'm not quite sure how that'd get localized, but the point is that the localized am/pm thingies must be regexes, not constant strings (or, okay, a regex constructed out of a list of constant strings, if you want :)

> Which leads to my problem, there appears to be no simple way to get the date order to differentiate m/d/y from d/m/y.

Don't. Make it a setting. I've been trying to think of what to name this setting, but have no good ideas. Here are some bad ones instead:

    DT::F::Simple->use_mmdd(1);
    DT::F::Simple->use_ddmm(1);
    DT::F::Simple->mode('us');
    DT::F::Simple->mode('euro');
    DT::F::Simple->euro_mode(1);
    DT::F::Simple->us_mode(1);

Gah, that's horrible :) Someone out there must have some sort of pre-existing vocabulary to describe the date format differences. Is it just regional, or are there ISO numbers to reference or what?

-John
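A sketch of the "regex constructed out of a list of constant strings" idea, with an optional period allowed after each letter so that "pm", "p.m." and "PM" all match. The helper names `dotted_alternation` and `to_24_hour` are made up for this example, and the English AM/PM strings are hard-coded where a real parser would take them from locale data. Single-digit hours are accepted, as asked for above.

```perl
use strict;
use warnings;

# Build a case-insensitive regex from constant strings, allowing an
# optional period after every character ("AM" matches "a.m." and "am").
sub dotted_alternation {
    my (@words) = @_;
    my @alts = map {
        join '', map { quotemeta($_) . '\.?' } split //, $_
    } @words;
    my $alt = join '|', @alts;
    return qr/(?:$alt)/i;
}

my $am_re = dotted_alternation('AM');
my $pm_re = dotted_alternation('PM');

# Hypothetical helper: convert "5 p.m." or "1:02 AM" to "hh:mm" (24 hour).
sub to_24_hour {
    my ($text) = @_;
    my ($hour, $min, $ampm) =
        $text =~ /^\s*(\d{1,2})(?::(\d{2}))?\s*($am_re|$pm_re)\s*$/
        or return;
    $hour %= 12;                          # 12 a.m. becomes 0
    $hour += 12 if $ampm =~ /^$pm_re$/;   # 12 p.m. becomes 12 again
    return sprintf '%02d:%02d', $hour, $min // 0;
}

print to_24_hour($_), "\n" for '5 p.m.', '1:02 AM', '12:30pm';
```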
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Mon, 14 Jul 2003, Ben Bennett wrote:
> Which leads to my problem, there appears to be no simple way to get the date order to differentiate m/d/y from d/m/y. I can look at the time formats and try to work it out, but that seems a bit dodgy if you ever change the parser, plus I assume that I am not the only person who will want to know that. So could we break it out as an explicit method?

We'd have to look at the _actual_ format strings to do this, but it's certainly possible.

> Also the start of week information (and the weekend start and end) seem pretty useful for the financial stuff. Would it be reasonable to add them to the Locale objects?

Probably.

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
On Mon, 14 Jul 2003, Joshua Hoblitt wrote:
> > My quibble; the name. I'm not a huge fan of ::Simple and ::Lite. Unfortunately, I can't think of a nice alternate for it.
>
> I usually think of ::Simple as referring to a reduced interface. Maybe ::Basic is a better namespace.

I like ::Common, since it's supposed to handle common formats (for some value of common).

-dave

/*=== House Absolute Consulting www.houseabsolute.com ===*/
Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...
> > > My quibble; the name. I'm not a huge fan of ::Simple and ::Lite. Unfortunately, I can't think of a nice alternate for it.
> >
> > I usually think of ::Simple as referring to a reduced interface. Maybe ::Basic is a better namespace.
>
> I like ::Common, since it's supposed to handle common formats (for some value of common).

How about DateTime::Format::Common::Basic::Simple?

-J