Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-16 Thread Claus Färber
Ben Bennett [EMAIL PROTECTED] wrote:
 On Wed, Jul 16, 2003 at 12:05:00PM +0200, Claus Färber wrote:
 Not all languages use suffixes when writing numbers. In German, for
 example, you'd just write 14. Juli. (Actually, it's spoken
 vierzehn*ter* Juli, but one does not write 14ter Juli; in spoken
 language, the suffix is also used for the month as in vierzehnter
 siebter or 14ter 7ter).

 Great!  Someone who can tell me real German usage!  In the example
 above, is the . after 14 optional?  I assume so...

Actually, the dot (.) has a function similar to the suffix in other
languages (e.g. -st, -nd, -th in English), i.e. it indicates an ordinal
number. It is not optional in standard German orthography (that does not
mean that everyone always adheres to this rule, of course).

 Can you clarify what AM/PM and BC/AD forms would be expected?

AM/PM is not used in German-speaking countries. We simply use the 24-hour
system when writing down times. (Spoken language is different and
often ambiguous, but I don't think that's relevant for a parser class.)

BC is 'vor Christus' or 'v. Chr.'.
AD is 'nach Christus' or 'n. Chr.' Sometimes 'AD' is used, too (very
rare; usually only ecclesiastic texts).

I'd suggest /v(?:or|\.)?\s*Chr(?:ist(?:us)|\.)?/ and
/n(?:ach|\.)?\s*Chr(?:ist(?:us)|\.)?/ as patterns to cover most
non-standard variants.
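
As a rough check of the two patterns above, the sketch below anchors them
and runs them over a few spellings. The patterns are copied unchanged; the
helper name era_of() and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Patterns copied from the message above, anchored so the whole
# string has to match.
my $bc = qr/^v(?:or|\.)?\s*Chr(?:ist(?:us)|\.)?$/;
my $ad = qr/^n(?:ach|\.)?\s*Chr(?:ist(?:us)|\.)?$/;

# era_of() is a hypothetical helper, not part of any DateTime module.
sub era_of {
    my ($text) = @_;
    return 'BC' if $text =~ $bc;
    return 'AD' if $text =~ $ad;
    return;
}

for my $sample ('vor Christus', 'v. Chr.', 'v.Chr', 'n. Chr.', 'nach Christus', 'AD') {
    my $era = era_of($sample);
    printf "%-14s => %s\n", $sample, defined $era ? $era : 'no match';
}
```

As written, the inner group needs the full "istus", so "vor Christ" is
not accepted, and a plain "AD" would need its own alternative.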

Claus
-- 
http://www.faerber.muc.de



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Ben Bennett
On Tue, Jul 15, 2003 at 01:56:53PM +1000, Iain Truskett wrote:
 * Ben Bennett ([EMAIL PROTECTED]) [15 Jul 2003 13:10]:
[...] 

 My quibble; the name. I'm not a huge fan of ::Simple and ::Lite.
 Unfortunately, I can't think of a nice alternate for it.

Ok. I will think about that (suggestions welcomed).
 
 Sounds good. Ignoring unknown day names?

I think so.  I haven't decided yet.

  Omissions from Date::Parse:
   - July 14th will not be parsed (I don't have localized info on the
 numeric suffixes)
 
 How about you just assume /\d{1,2}\w+/?

Perhaps. I will play with it when the rest is finished.  Input from
people who speak other languages would be appreciated.  I think that
would be okay in French, but I am a bit concerned about how it behaves
with non-Latin languages.
 
  This will use the DT::F::Locales to get the localized forms of the
  days and months.
 
 What happens in the event of input being in an unknown locale? As in we
 don't know what locale this is in rather than we don't have locale
 data for xy_XX.

Erm... maybe later I will make something that can deal with ambiguous
locales.  That seems like a non-Simple.pm task (I realize that it
isn't that hard to do, but may be slow).
 
 Not really. The best one can do is have it so dates that can only be one
 type and not the other are done correctly. Ambiguity is part of the
 reason ISO8601 and W3CDTF have their order specified and why rfc(2)822
 uses the month _name_.

If you have the locale then I think you should be able to assume
ordering.
 
 If Simple is to be simple, I'm not sure it can really handle such
 things. The idea of Simple modules is to have as little of an
 interface as possible. Inner complexity and outer simplicity.

See above.

-ben


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Ben Bennett
On Tue, Jul 15, 2003 at 12:14:00AM -0400, John Siracusa wrote:
 On 7/14/03 11:10 PM, Ben Bennett wrote:
[...]
 Sweet, someone took the bai--...er, picked up the baton ;)

Well I have been playing around with the idea for a while, but when
the locale stuff got in I decided it was time to stop fiddling and get
something useful together.
 
 I'm sure you're already doing this, but just in case, make sure to allow for
 single-digit numbers where there is no ambiguity.  This is essential for
 handling user-created input.  Examples:
 
 9/3/2002 (e.g. don't require 09/03/2002)
 1:02 (e.g. don't require 01:02)

Yes.

 and maybe even:
 
 200210131:02

No!  Egads :-)  Actually I wasn't accepting the form 200210130102
either (I will accept 20021013T0102).  Should I?

 but that may make some people break out in hives, so whatever :)

Scratch, scratch...

 Also,
 don't forget about the optional . in a.m. and p.m.  I'm not quite sure
 how that'd get localized, but the point is that the localized am/pm thingies
 must be regexes, not constant strings (or, okay, a regex constructed out of
 a list of constant strings, if you want :)

Yeah, I was trying to work that out.  It appears not to be in the raw
locale data, so I was considering just accepting the am/pm stuff with
optional inserted periods, even for other locales.  I still have to
survey all locales to see if that is even reasonable.  The other
choice would be to special case BC and AD to allow the dotted form,
but that seems a bit restrictive.
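
One way to read the "regex constructed out of a list of constant strings"
idea is sketched below: each plain string is turned into a pattern that
allows an optional period after every letter. The name meridiem_regex()
and the literal 'AM'/'PM' inputs are assumptions for the sketch; real
values would come from the locale data.

```perl
use strict;
use warnings;

# Build one case-insensitive regex from a list of plain AM/PM strings.
# meridiem_regex() is a hypothetical helper name.
sub meridiem_regex {
    my (@forms) = @_;
    my @alternatives;
    for my $form (@forms) {
        # "AM" becomes A\.?M\.? so each letter may carry a period.
        push @alternatives, join '', map { quotemeta($_) . '\.?' } split //, $form;
    }
    my $joined = join '|', @alternatives;
    return qr/(?:$joined)/i;
}

my $am = meridiem_regex('AM');
my $pm = meridiem_regex('PM');

for my $input ('5 p.m.', '5pm', '11:30 A.M', '17:00') {
    my $kind = $input =~ /\d\s*$am/ ? 'AM'
             : $input =~ /\d\s*$pm/ ? 'PM'
             :                        'none';
    print "$input => $kind\n";
}
```

The sketch does not check what follows the marker, so a real parser
would probably want a word boundary or end-of-field check after it.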
 
  Which leads to my problem, there appears to be no simple way to get
  the date order to differentiate m/d/y from d/m/y.
 
 Don't.  Make it a setting.  I've been trying to think of what to name this
 setting, but have no good ideas.  Here are some bad ones instead:

It would be a setting... locale.

 DT::F::Simple->use_mmdd(1);
 DT::F::Simple->use_ddmm(1);
 
 DT::F::Simple->mode('us');
 DT::F::Simple->mode('euro');
 
 DT::F::Simple->euro_mode(1);
 DT::F::Simple->us_mode(1);
 
 Gah, that's horrible :)  Someone out there must have some sort of
 pre-existing vocabulary to describe the date format differences.  Is it just
 regional, or are there ISO numbers to reference or what?

I can add an optional additional parameter dmy_mode (defaults to your
locale if undef) but I really think inferring it from the locale is
fine.

Speaking of which, what interface do people want?

  my $us_parser = DateTime::Format::Simple->new(locale => 'en_US');
  my $dt = $us_parser->parse_datetime('2/11/74');

Or:

  my $dt = DateTime::Format::Simple->parse_datetime(string => '2/11/74',
                                                    locale => 'en_US');

Note that it will always be legal to call:
  my $dt = DateTime::Format::Simple->parse_datetime('2/11/74');

And some locale will be assumed (probably en_US).

Another choice would be to allow both forms (which I may do to allow
user flexibility).
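
A minimal sketch of how both calling forms could share one method is given
below. My::SimpleParser is a hypothetical stand-in, not the proposed
module: it returns a plain hash reference instead of a DateTime object,
its locale table is hard-coded, and two-digit years are left as typed.

```perl
use strict;
use warnings;

package My::SimpleParser;   # hypothetical stand-in, not the real module

# Field order per locale; a tiny hard-coded table for the sketch only.
my %order_for = (en_US => 'mdy', en_GB => 'dmy', de_DE => 'dmy');

sub new {
    my ($class, %args) = @_;
    my $locale = defined $args{locale} ? $args{locale} : 'en_US';
    return bless { locale => $locale }, $class;
}

# Works as an object method, or as a class method with either a bare
# string or string => ..., locale => ... pairs.
sub parse_datetime {
    my $self = shift;
    my %args = @_ == 1 ? (string => $_[0]) : @_;
    $self = $self->new(locale => $args{locale}) unless ref $self;

    # One or two digits for the first two fields, so 9/3/2002 is accepted.
    my @n = $args{string} =~ m{^(\d{1,2})[-./](\d{1,2})[-./](\d{2,4})$} or return;
    my $order = $order_for{ $self->{locale} } || 'mdy';
    my %date;
    @date{ split //, $order } = @n;
    return { day => $date{d} + 0, month => $date{m} + 0, year => $date{y} + 0 };
}

package main;

my $us      = My::SimpleParser->new(locale => 'en_US');
my $us_date = $us->parse_datetime('2/11/74');
my $gb_date = My::SimpleParser->parse_datetime(string => '2/11/74', locale => 'en_GB');
print "US: month $us_date->{month}, day $us_date->{day}\n";
print "GB: month $gb_date->{month}, day $gb_date->{day}\n";
```

Here the class-method form simply builds a throwaway object, so the
locale-to-order lookup lives in one place.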

-ben


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Ben Bennett
On Mon, Jul 14, 2003 at 11:39:37PM -0500, Dave Rolsky wrote:
 On Mon, 14 Jul 2003, Ben Bennett wrote:
 
  Which leads to my problem, there appears to be no simple way to get
  the date order to differentiate m/d/y from d/m/y.  I can look at the
  time formats and try to work it out, but that seems a bit dodgy if you
  ever change the parser, plus I assume that I am not the only person
  who will want to know that.  So could we break it out as an explicit
  method?
 
 We'd have to look at the _actual_ format strings to do this, but it's
 certainly possible.

Ok, I will play around with this and see if all of the locales have
understandable short forms.

 
  Also the start of week information (and the weekend start and end) seem
  pretty useful for the financial stuff.  Would it be reasonable to add
  them to the Locale objects?
 
 Probably.

Cool, I may add that in the future if needed.

-ben


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread John Siracusa
On 7/15/03 8:05 AM, Ben Bennett wrote:
 On Tue, Jul 15, 2003 at 12:14:00AM -0400, John Siracusa wrote:
 I'm sure you're already doing this, but just in case, make sure to allow for
 single-digit numbers where there is no ambiguity. [...]
 
 Yes.

(Also stuff like 10/25/2003 5 p.m.  Just checking :)

 Actually I wasn't accepting the form 200210130102
 either (I will accept 20021013T0102).  Should I?

Yes, definitely for the 200210130102 (or 20021013010259 or
20021013010259.12345) versions.  Those aren't ambiguous at all, as far as
I can see.  I can take or leave the T :)
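
The all-digit forms mentioned above can be covered by one pattern, roughly
as sketched here. parse_compact() is a hypothetical name; it only counts
digits and does no range checking.

```perl
use strict;
use warnings;

# Compact datetime: YYYYMMDD, optional "T", HHMM, optional SS and fraction.
my $compact = qr/^(\d{4})(\d{2})(\d{2})T?(\d{2})(\d{2})(?:(\d{2})(?:\.(\d+))?)?$/;

# Returns a hash reference of fields, or nothing if the text does not fit.
sub parse_compact {
    my ($text) = @_;
    my @f = $text =~ $compact or return;
    my %dt;
    @dt{qw(year month day hour minute)} = @f[0 .. 4];
    $dt{second}   = defined $f[5] ? $f[5] : '00';
    $dt{fraction} = defined $f[6] ? $f[6] : '0';
    return \%dt;
}

for my $input ('200210130102', '20021013T0102', '20021013010259.12345') {
    my $dt = parse_compact($input);
    print "$input => $dt->{year}-$dt->{month}-$dt->{day} ",
          "$dt->{hour}:$dt->{minute}:$dt->{second}\n";
}
```

Because every field has a fixed width, the form with a colon and a
single-digit hour (200210131:02) does not match this pattern.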

 I can add an optional additional parameter dmy_mode (defaults to your
 locale if undef) but I really think inferring it from the locale is
 fine.

Yeah, that sounds better than trying to come up with names for the setting.

 Speaking of which, what interface do people want?
 
 my $us_parser = DateTime::Format::Simple->new(locale => 'en_US');
 my $dt = $us_parser->parse_datetime('2/11/74');
 
 Or:
 
 my $dt = DateTime::Format::Simple->parse_datetime(string => '2/11/74',
                                                   locale => 'en_US');

I'd use the first.  (Actually, I'd use DateTime-parse(), which would use
the first for me :)  But I don't see why it can't support both.

-John



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Dave Rolsky
On Tue, 15 Jul 2003, Ben Bennett wrote:

   Omissions from Date::Parse:
- July 14th will not be parsed (I don't have localized info on the
  numeric suffixes)
 
  How about you just assume /\d{1,2}\w+/?

 Perhaps. I will play with it when the rest is finished.  Input from
 people who speak other languages would be appreciated.  I think that
 would be okay in French, but I am a bit concerned about how it behaves
 with non-Latin languages.

It works as long as they're using Arabic numerals.  If people want to
write dates with native numerals (Chinese, for example) that's beyond the
scope of common parsing, and not your problem.
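
A small sketch of the "just assume a suffix" approach follows. It accepts
a day number followed by letters (14th, 1er) or by a period (the German
14.), without knowing any locale's suffixes. day_number() is a
hypothetical helper, and the sketch assumes ASCII digits and, unless the
input has been decoded as Unicode text, ASCII letters.

```perl
use strict;
use warnings;

# day_number() pulls the day out of a token such as "14th", "14." or "14".
# [0-9] is used instead of \d so only ASCII digits are accepted, and
# [^\W\d_]+ means "one or more letters".
sub day_number {
    my ($token) = @_;
    return unless $token =~ /^([0-9]{1,2})(?:\.|[^\W\d_]+)?$/;
    my $day = $1 + 0;
    return if $day < 1 || $day > 31;
    return $day;
}

for my $token ('14th', '14.', '1er', '14', '2003') {
    my $day = day_number($token);
    print "$token => ", (defined $day ? $day : 'not a day'), "\n";
}
```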

  What happens in the event of input being in an unknown locale? As in we
  don't know what locale this is in rather than we don't have locale
  data for xy_XX.

 Erm... maybe later I will make something that can deal with ambiguous
 locales.  That seems like a non-Simple.pm task (I realize that it
 isn't that hard to do, but may be slow).

No, just default to the 'root' locale (which is really en_US).  _That's_
simple ;)


-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Dave Rolsky
On Tue, 15 Jul 2003, Ben Bennett wrote:

  We'd have to look at the _actual_ format strings to do this, but it's
  certainly possible.

 Ok, I will play around with this and see if all of the locales have
 understandable short forms.

Actually, I was thinking that this would be done when generating the
locale modules.  It shouldn't be _too_ hard, I think.
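
To illustrate what such a generator step might look like, the sketch below
derives the field order from a short date pattern. It assumes the pattern
is strftime-style (for example "%m/%d/%y"); that input format and the
name date_order() are assumptions, not an existing DateTime::Locale
method.

```perl
use strict;
use warnings;

# Map strftime-style conversion letters to the field they stand for.
my %field_for = (
    d => 'd', e => 'd',            # day of month
    m => 'm', b => 'm', B => 'm',  # month number or name
    y => 'y', Y => 'y',            # year
);

# Returns "mdy", "dmy", "ymd", ... or nothing if a field is missing.
sub date_order {
    my ($pattern) = @_;
    my ($order, %seen) = ('');
    while ($pattern =~ /%(.)/g) {
        my $field = $field_for{$1} or next;
        $order .= $field unless $seen{$field}++;
    }
    return unless length($order) == 3;
    return $order;
}

for my $pattern ('%m/%d/%y', '%d.%m.%Y', '%Y-%m-%d') {
    print "$pattern => ", date_order($pattern), "\n";
}
```

Storing the result once per locale at generation time would keep the
parser from having to inspect format strings itself.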

   Also the start of week information (and the weekend start and end) seem
   pretty useful for the financial stuff.  Would it be reasonable to add
   them to the Locale objects?
 
  Probably.

 Cool, I may add that in the future if needed.

Again, see above.  You're welcome to tweak the generator code in the
DateTime::Locale CVS.


-dave



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Ben Bennett
On Tue, Jul 15, 2003 at 11:40:16AM -0500, Dave Rolsky wrote:
 On Tue, 15 Jul 2003, Ben Bennett wrote:
 
 
 Actually, I was thinking that this would be done when generating the
 locale modules.  It shouldn't be _too_ hard, I think.

Sorry, that was what I was intending to fiddle with; I just wanted to
make sure it was possible to do it for all of the locales (or at least
a reasonable number of them).
 
 Again, see above.  You're welcome to tweak the generator code in the
 DateTime::Locale CVS.

Cool.

-ben


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-15 Thread Eugene van der Pijll
Dave Rolsky wrote:
 On Tue, 15 Jul 2003, Ben Bennett wrote:
 
   200210131:02
 
  No!  Egads :-)  Actually I wasn't accepting the form 200210130102
  either (I will accept 20021013T0102).  Should I?
 
 Is the former form unambiguous?  If so, you might as well accept it.

200210131:02 is (more or less) unambiguous: 2002-10-13T01:02.

But 200210121:02 is ambiguous: 2002-10-12T01:02, 2002-10-01T21:02,
2002-01-01T21:02?

Best not to accept either, I'd think.
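
The ambiguity can be made concrete by listing every way to split the
digits when month, day and hour may each be one or two digits long. The
helper readings() below is hypothetical and checks simple ranges only
(no days-per-month logic).

```perl
use strict;
use warnings;

# List every reading of "YYYY" + month + day + hour followed by ":MM".
sub readings {
    my ($text) = @_;
    my ($year, $rest, $minute) = $text =~ /^(\d{4})(\d{3,6}):(\d{2})$/ or return;
    my @found;
    for my $m_len (1, 2) {
        for my $d_len (1, 2) {
            my $h_len = length($rest) - $m_len - $d_len;
            next if $h_len < 1 || $h_len > 2;
            my ($month, $day, $hour) = unpack "A$m_len A$d_len A$h_len", $rest;
            next if $month < 1 || $month > 12;
            next if $day < 1   || $day > 31;
            next if $hour > 23;
            push @found, sprintf '%04d-%02d-%02dT%02d:%s',
                $year, $month, $day, $hour, $minute;
        }
    }
    return @found;
}

print "$_\n" for readings('200210121:02');
```

For 200210121:02 this yields the same three readings listed above, while
200210131:02 yields only 2002-10-13T01:02, because the other splits
would need an hour of 31.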

Eugene


Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-14 Thread John Siracusa
On 7/14/03 11:10 PM, Ben Bennett wrote:
 I am taking a whack at DT::F::Simple (please speak up now if anyone
 else wants to claim this project) that can parse things that are
 similar to the ones that Date::Parse can do.

Sweet, someone took the bai--...er, picked up the baton ;)

 Namely:
 - Rough ISO8601 strings (only complete datetimes and dates)
 - Dates with '-', '.', or '/' separators (either by month number or
  localized short or long name)
 - Times with ':' or '-' separator and optional localized AM/PM
 - Day names will be used to sanity check the parsed date if present
 - I will use localized BC/AD if present
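
The day-name sanity check in the list above could look roughly like this.
It is only a sketch: the English names are hard-coded where a real parser
would use the locale data, day_name_matches() is a hypothetical helper,
and the core Time::Local module does the weekday arithmetic.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# English names only; a real parser would take these from the locale data.
my @day_names = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# Accepts full names or any prefix of three or more letters ("Mon", "Tues").
# timegm() dies on impossible dates, so validate the date first.
sub day_name_matches {
    my ($name, $year, $month, $day) = @_;
    return 0 if length($name) < 3;
    my $wday = (gmtime(timegm(0, 0, 0, $day, $month - 1, $year)))[6];
    return index(lc($day_names[$wday]), lc($name)) == 0 ? 1 : 0;
}

print day_name_matches('Mon', 2003, 7, 14) ? "consistent\n" : "mismatch\n";
print day_name_matches('Tue', 2003, 7, 14) ? "consistent\n" : "mismatch\n";
```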

I'm sure you're already doing this, but just in case, make sure to allow for
single-digit numbers where there is no ambiguity.  This is essential for
handling user-created input.  Examples:

9/3/2002 (e.g. don't require 09/03/2002)
1:02 (e.g. don't require 01:02)

and maybe even:

200210131:02

but that may make some people break out in hives, so whatever :)  Also,
don't forget about the optional . in a.m. and p.m.  I'm not quite sure
how that'd get localized, but the point is that the localized am/pm thingies
must be regexes, not constant strings (or, okay, a regex constructed out of
a list of constant strings, if you want :)

 Which leads to my problem, there appears to be no simple way to get
 the date order to differentiate m/d/y from d/m/y.

Don't.  Make it a setting.  I've been trying to think of what to name this
setting, but have no good ideas.  Here are some bad ones instead:

DT::F::Simple->use_mmdd(1);
DT::F::Simple->use_ddmm(1);

DT::F::Simple->mode('us');
DT::F::Simple->mode('euro');

DT::F::Simple->euro_mode(1);
DT::F::Simple->us_mode(1);

Gah, that's horrible :)  Someone out there must have some sort of
pre-existing vocabulary to describe the date format differences.  Is it just
regional, or are there ISO numbers to reference or what?

-John



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-14 Thread Dave Rolsky
On Mon, 14 Jul 2003, Ben Bennett wrote:

 Which leads to my problem, there appears to be no simple way to get
 the date order to differentiate m/d/y from d/m/y.  I can look at the
 time formats and try to work it out, but that seems a bit dodgy if you
 ever change the parser, plus I assume that I am not the only person
 who will want to know that.  So could we break it out as an explicit
 method?

We'd have to look at the _actual_ format strings to do this, but it's
certainly possible.

 Also the start of week information (and the weekend start and end) seem
 pretty useful for the financial stuff.  Would it be reasonable to add
 them to the Locale objects?

Probably.


-dave



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-14 Thread Dave Rolsky
On Mon, 14 Jul 2003, Joshua Hoblitt wrote:

  My quibble; the name. I'm not a huge fan of ::Simple and ::Lite.
  Unfortunately, I can't think of a nice alternate for it.

 I usually think of ::Simple as referring to a reduced interface.  Maybe
 ::Basic is a better namespace.

I like ::Common, since it's supposed to handle common formats (for some
value of common).


-dave



Re: DateTime::Format::Simple and Indication of month/day/year or d/m/y in Locales...

2003-07-14 Thread Joshua Hoblitt
   My quibble; the name. I'm not a huge fan of ::Simple and ::Lite.
   Unfortunately, I can't think of a nice alternate for it.
 
  I usually think of ::Simple as referring to a reduced interface.  Maybe
  ::Basic is a better namespace.

 I like ::Common, since it's supposed to handle common formats (for some
 value of common).

How about DateTime::Format::Common::Basic::Simple?

-J
