Re: [whatwg] Parsing RFC3339 constructs

2009-08-30 Thread Ian Hickson
On Thu, 20 Aug 2009, Christoph Päper wrote:
 Ian Hickson:
  On Tue, 11 Aug 2009, Nils Dagsson Moskopp wrote:
   On Tuesday, 11.08.2009, 07:27 +, Ian Hickson wrote:
On Tue, 11 Aug 2009, Julian Reschke wrote:
 Ian Hickson wrote:
   - the literal letters T and Z must be uppercase
  It simplifies processing a tiny amount.
 So for a tiny win, you change the format?
By a tiny amount, yes.
   
   It will be interesting to see if parsers choose to also accept lowercase
   letters. I'd half-expect that to work, not least because there may
   already be RFC-compliant libraries in the wild.
  
  The spec explicitly points out that implementors shouldn't naively use
  ISO8601 libraries.
 
 That is not naivety! It is a standard's duty to correctly integrate other
 standards. If HTML 5 uses a subset of ISO 8601 then content processors must be
 able to use generic ISO-conformant parsers.

No, sorry, ISO-8601 is too vague to make that possible. It doesn't define 
error handling, for instance.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Parsing RFC3339 constructs

2009-08-20 Thread Christoph Päper

Ian Hickson:

On Tue, 11 Aug 2009, Nils Dagsson Moskopp wrote:

On Tuesday, 11.08.2009, 07:27 +, Ian Hickson wrote:

On Tue, 11 Aug 2009, Julian Reschke wrote:

Ian Hickson wrote:

  - the literal letters T and Z must be uppercase

It simplifies processing a tiny amount.

So for a tiny win, you change the format?

By a tiny amount, yes.


It will be interesting to see if parsers choose to also accept lowercase
letters. I'd half-expect that to work, not least because there may
already be RFC-compliant libraries in the wild.


The spec explicitly points out that implementors shouldn't naively use
ISO8601 libraries.


That is not naivety! It is a standard's duty to correctly integrate  
other standards. If HTML 5 uses a subset of ISO 8601 then content  
processors must be able to use generic ISO-conformant parsers.  
Content providers OTOH may be required to restrict their generators.  
As soon as they discover what else is accepted by browsers etc.,  
though, they will use it. Ergo HTML 5 should, from the beginning,  
support whatever parts of ISO 8601 common libraries cover already.  
Unless, of course, we presume that HTML implementors will use  
homebrewed code only.


Re: [whatwg] Parsing RFC3339 constructs

2009-08-15 Thread Ian Hickson
On Tue, 11 Aug 2009, Nils Dagsson Moskopp wrote:
 On Tuesday, 11.08.2009, 07:27 +, Ian Hickson wrote:
  On Tue, 11 Aug 2009, Julian Reschke wrote:
   Ian Hickson wrote:
On Mon, 27 Apr 2009, Asbjørn Ulsberg wrote:
 On Mon, 27 Apr 2009 12:59:11 +0200, Julian Reschke 
 julian.resc...@gmx.de
 wrote:
 - the literal letters T and Z must be uppercase
  Any technical reason why they have to?
 Any reason why they don't?

It simplifies processing a tiny amount.
   
   So for a tiny win, you change the format?
  
  By a tiny amount, yes.
 
 It will be interesting to see if parsers choose to also accept lowercase 
 letters. I'd half-expect that to work, not least because there may 
 already be RFC-compliant libraries in the wild.

The spec explicitly points out that implementors shouldn't naively use 
ISO8601 libraries.


 So if they do by the time HTML n is the standard, will the uppercase 
 restriction be removed in HTML n+1 ?

HTML5 itself will have to change if the implementations don't implement 
what it says.


Re: [whatwg] Parsing RFC3339 constructs

2009-08-11 Thread Ian Hickson
On Tue, 11 Aug 2009, Julian Reschke wrote:
 Ian Hickson wrote:
  On Mon, 27 Apr 2009, Asbjørn Ulsberg wrote:
   On Mon, 27 Apr 2009 12:59:11 +0200, Julian Reschke julian.resc...@gmx.de
   wrote:
   - the literal letters T and Z must be uppercase
Any technical reason why they have to?
   Any reason why they don't?
  
  It simplifies processing a tiny amount.
 
 So for a tiny win, you change the format?

By a tiny amount, yes.


Re: [whatwg] Parsing RFC3339 constructs

2009-08-11 Thread Nils Dagsson Moskopp
On Tuesday, 11.08.2009, 07:27 +, Ian Hickson wrote:
 On Tue, 11 Aug 2009, Julian Reschke wrote:
  Ian Hickson wrote:
   On Mon, 27 Apr 2009, Asbjørn Ulsberg wrote:
On Mon, 27 Apr 2009 12:59:11 +0200, Julian Reschke 
julian.resc...@gmx.de
wrote:
- the literal letters T and Z must be uppercase
 Any technical reason why they have to?
Any reason why they don't?
   
   It simplifies processing a tiny amount.
  
  So for a tiny win, you change the format?
 
 By a tiny amount, yes.

It will be interesting to see if parsers choose to also accept lowercase
letters. I'd half-expect that to work, not least because there may
already be RFC-compliant libraries in the wild.

So if they do by the time HTML n is the standard, will the uppercase
restriction be removed in HTML n+1 ?

Cheers
-- 
Nils Dagsson Moskopp
http://dieweltistgarnichtso.net



Re: [whatwg] Parsing RFC3339 constructs

2009-08-10 Thread Ian Hickson
On Mon, 27 Apr 2009, Asbjørn Ulsberg wrote:
 On Mon, 27 Apr 2009 12:59:11 +0200, Julian Reschke julian.resc...@gmx.de 
 wrote:
  
- the literal letters T and Z must be uppercase
 
  Any technical reason why they have to?
 
 Any reason why they don't?

It simplifies processing a tiny amount.


  It would help people understand what the difference to RFC 3339 is.
 
 Indeed, and this is exactly what we did in RFC 4287, as I've pointed out 
 previously. And I can't say that date parsing has proven to be an issue 
 there at all, even with the little work we did on narrowing down and 
 tightening the syntax. Section 3.3. of RFC 4287 says:
 
A Date construct is an element whose content MUST conform
to the date-time production in [RFC3339].  In addition,
an uppercase T character MUST be used to separate date
and time, and an uppercase Z character MUST be present
in the absence of a numeric time zone offset.
 
 Perhaps HTML5 needs more detailing than this for parsing, but not 
 referencing RFC 3339 just for the sake of not referencing RFC 3339 
 doesn't make much sense imho.
 
 For authoring (and parsing, in fact), RFC 3339 plus a couple of 
 additional guidelines have proven to be enough for implementors of RFC 
 4287, so I assume HTML5 could be better off doing the same, no?

HTML5 now references ISO8601 directly in a non-normative note explaining 
why ISO8601 isn't referenced normatively.


Re: [whatwg] Parsing RFC3339 constructs

2009-06-30 Thread Ian Hickson
On Fri, 5 Jun 2009, Julian Reschke wrote:
 Ian Hickson wrote:
  On Fri, 5 Jun 2009, Julian Reschke wrote:
   Ian Hickson wrote:
 Michael(tm) Smith wrote:
  It seems pretty clear that there isn't anything else to refer 
  to for the date/time parsing rules -- but to me at least, 
  specifying those rules seems orthogonal to specifying the 
  date/time syntax, and I would think the syntax could just be 
  defined by making reference to the productions[1] in RFC 3339 
  (instead of completely redefining them), while stating any 
  exceptions.
  
  [1] http://tools.ietf.org/html/rfc3339#section-5.6
  
  I think the exceptions might just amount to:
  
- the literal letters T and Z must be uppercase
 Any technical reason why they have to?
Not really. We just need a separator.
   So why make it different from RFC 3339?
  
  Limiting the syntax to the simplest possible syntax was an intentional 
  design choice intended to ease the burden on implementors and authors. 
  In practice, pretty much every time we've made syntax 
  case-insensitive, we've ended up having trouble because of it.
 
 If this was a totally new syntax, I would agree.
 
 But as something based on ISO8601 (and thereby also RFC 3339) it appears 
 to be a bad idea to make it less compatible just for that reason.

We've seriously simplified the ISO-8601 syntax in many more ways than just 
this. This was a conscious design decision.


  The HTML5 spec defines exactly how to parse dates. Implementors are 
  required to implement what the spec describes, so reusing libraries is 
  implicitly not likely to be useful here. RFC3339 isn't even a 
  particularly important one in the grand scheme of things (ISO8601 
  comes to mind as a much higher-profile example).
 
 I think it's unfortunate that HTML5 doesn't allow using an off-the-shelf 
 parser. But if it doesn't, and the temptation *will* be there to use 
 them, I'd recommend stating it very clearly.

Done.



Re: [whatwg] Parsing RFC3339 constructs

2009-06-30 Thread Julian Reschke

Ian Hickson wrote:

If this was a totally new syntax, I would agree.

But as something based on ISO8601 (and thereby also RFC 3339) it appears 
to be a bad idea to make it less compatible just for that reason.


We've seriously simplified the ISO-8601 syntax in many more ways than just 
this. This was a conscious design decision.


Yes, the same decision was made for RFC 3339 (and the similar W3C Note). 
I was recommending to stay closer to those, not to ISO8601.



...


BR, Julian


Re: [whatwg] Parsing RFC3339 constructs

2009-06-05 Thread Ian Hickson
On Fri, 5 Jun 2009, Julian Reschke wrote:
 Ian Hickson wrote:
  Michael(tm) Smith wrote:
  It seems pretty clear that there isn't anything else to refer to for 
  the date/time parsing rules -- but to me at least, specifying those 
  rules seems orthogonal to specifying the date/time syntax, and I 
  would think the syntax could just be defined by making reference to 
  the productions[1] in RFC 3339 (instead of completely redefining 
  them), while stating any exceptions.
 
  [1] http://tools.ietf.org/html/rfc3339#section-5.6
 
  I think the exceptions might just amount to:
 
- the literal letters T and Z must be uppercase
  Any technical reason why they have to?
  
  Not really. We just need a separator.
 
 So why make it different from RFC 3339?

Limiting the syntax to the simplest possible syntax was an intentional 
design choice intended to ease the burden on implementors and authors. In 
practice, pretty much every time we've made syntax case-insensitive, we've 
ended up having trouble because of it.


- a year must be four or more digits, and must be greater than zero
  a year must be four or more digits -- sounds like an alternative 
  format that an additional RFC, updating RFC 3339 could specify.
 
  must be greater than zero -- that's not syntax :-)
 
  So yes, I think referring to RFC 3339, even if it's just a narrative 
  mention, would be good.
  
  Why?
 
 Because it explains to readers how this is different. That is important
 because it's natural to look for existing libraries to parse date formats.

The HTML5 spec defines exactly how to parse dates. Implementors are 
required to implement what the spec describes, so reusing libraries is 
implicitly not likely to be useful here. RFC3339 isn't even a particularly 
important one in the grand scheme of things (ISO8601 comes to mind as a 
much higher-profile example).

I'm certainly not proposing to go through every date format spec and 
explain how the rules in HTML5 differ from those rules. That is the kind 
of material that belongs in support documents.



Re: [whatwg] Parsing RFC3339 constructs

2009-06-05 Thread Julian Reschke

Ian Hickson wrote:

On Fri, 5 Jun 2009, Julian Reschke wrote:

Ian Hickson wrote:

Michael(tm) Smith wrote:
It seems pretty clear that there isn't anything else to refer to for 
the date/time parsing rules -- but to me at least, specifying those 
rules seems orthogonal to specifying the date/time syntax, and I 
would think the syntax could just be defined by making reference to 
the productions[1] in RFC 3339 (instead of completely redefining 
them), while stating any exceptions.


[1] http://tools.ietf.org/html/rfc3339#section-5.6

I think the exceptions might just amount to:

  - the literal letters T and Z must be uppercase

Any technical reason why they have to?

Not really. We just need a separator.

So why make it different from RFC 3339?


Limiting the syntax to the simplest possible syntax was an intentional 
design choice intended to ease the burden on implementors and authors. In 
practice, pretty much every time we've made syntax case-insensitive, we've 
ended up having trouble because of it.


If this was a totally new syntax, I would agree.

But as something based on ISO8601 (and thereby also RFC 3339) it appears 
to be a bad idea to make it less compatible just for that reason.



  - a year must be four or more digits, and must be greater than zero
a year must be four or more digits -- sounds like an alternative 
format that an additional RFC, updating RFC 3339 could specify.


must be greater than zero -- that's not syntax :-)

So yes, I think referring to RFC 3339, even if it's just a narrative 
mention, would be good.

Why?

Because it explains to readers how this is different. That is important
because it's natural to look for existing libraries to parse date formats.


The HTML5 spec defines exactly how to parse dates. Implementors are 
required to implement what the spec describes, so reusing libraries is 
implicitly not likely to be useful here. RFC3339 isn't even a particularly 
important one in the grand scheme of things (ISO8601 comes to mind as a 
much higher-profile example).


I think it's unfortunate that HTML5 doesn't allow using an off-the-shelf 
parser. But if it doesn't, and the temptation *will* be there to use 
them, I'd recommend stating it very clearly.


I'm certainly not proposing to go through every date format spec and 
explain how the rules in HTML5 differ from those rules. That is the kind 
of material that belongs in support documents.


BR, Julian



Re: [whatwg] Parsing RFC3339 constructs

2009-06-04 Thread Ian Hickson
On Mon, 27 Apr 2009, Julian Reschke wrote:
 Michael(tm) Smith wrote:
  Ian Hickson i...@hixie.ch, 2009-04-25 05:35 +:
  On Fri, 2 Jan 2009, Asbjørn Ulsberg wrote:
  Reading the spec, I have to wonder: Does HTML5 need to specify as 
  much as it does inline? Can't more of it be referenced to ISO 8601 
  or even better; RFC 3339? I really fancy how Atom (RFC 4287) has 
  defined date constructs: 
  http://www.atompub.org/rfc4287.html#date.constructs Does RFC 
  3339 not define date and time in a satisfactory manner to use directly 
  in HTML5?
  The problem isn't so much the syntax definitions as the parsing 
  definitions. We need very specific parsing rules; it's not clear that 
  there is anything to refer to that does the job we need here.
  
  It seems pretty clear that there isn't anything else to refer to for 
  the date/time parsing rules -- but to me at least, specifying those 
  rules seems orthogonal to specifying the date/time syntax, and I would 
  think the syntax could just be defined by making reference to the 
  productions[1] in RFC 3339 (instead of completely redefining them), 
  while stating any exceptions.
  
  [1] http://tools.ietf.org/html/rfc3339#section-5.6
  
  I think the exceptions might just amount to:
  
- the literal letters T and Z must be uppercase
 
 Any technical reason why they have to?

Not really. We just need a separator.


- a year must be four or more digits, and must be greater than zero
 
 a year must be four or more digits -- sounds like an alternative 
 format that an additional RFC, updating RFC 3339 could specify.
 
 must be greater than zero -- that's not syntax :-)
 
 So yes, I think referring to RFC 3339, even if it's just a narrative 
 mention, would be good.

Why?


 Ian replied:
  I don't understand what that would gain us.
 
 It would help people understand what the difference to RFC 3339 is.

Why is that important or desirable? It seems that comparisons to other 
specs would be better placed in other documents. HTML5 doesn't even 
describe how it differs from its previous version (HTML4), why would it 
include descriptions of differences from otherwise unrelated RFCs?


Re: [whatwg] Parsing RFC3339 constructs

2009-04-27 Thread Julian Reschke
Michael(tm) Smith wrote:
 Ian Hickson i...@hixie.ch, 2009-04-25 05:35 +:
 
 On Fri, 2 Jan 2009, Asbjørn Ulsberg wrote:
 Reading the spec, I have to wonder: Does HTML5 need to specify as much 
 as it does inline? Can't more of it be referenced to ISO 8601 or even 
 better; RFC 3339? I really fancy how Atom (RFC 4287) has defined date 
 constructs: http://www.atompub.org/rfc4287.html#date.constructs Does 
 RFC 3339 not define date and time in a satisfactory manner to use 
 directly in HTML5?
 The problem isn't so much the syntax definitions as the parsing 
 definitions. We need very specific parsing rules; it's not clear that 
 there is anything to refer to that does the job we need here.
 
 It seems pretty clear that there isn't anything else to refer to
 for the date/time parsing rules -- but to me at least, specifying
 those rules seems orthogonal to specifying the date/time syntax,
 and I would think the syntax could just be defined by making
 reference to the productions[1] in RFC 3339 (instead of completely
 redefining them), while stating any exceptions.
 
 [1] http://tools.ietf.org/html/rfc3339#section-5.6
 
 I think the exceptions might just amount to:
 
   - the literal letters T and Z must be uppercase

Any technical reason why they have to?

   - a year must be four or more digits, and must be greater than zero

a year must be four or more digits -- sounds like an alternative
format that an additional RFC, updating RFC 3339 could specify.

must be greater than zero -- that's not syntax :-)

So yes, I think referring to RFC 3339, even if it's just a narrative
mention, would be good.

Ian replied:
 I don't understand what that would gain us.

It would help people understand what the difference to RFC 3339 is.

BR, Julian



Re: [whatwg] Parsing RFC3339 constructs

2009-04-27 Thread Asbjørn Ulsberg
On Mon, 27 Apr 2009 12:59:11 +0200, Julian Reschke julian.resc...@gmx.de 
wrote:

   - the literal letters T and Z must be uppercase

 Any technical reason why they have to?

Any reason why they don't?

 It would help people understand what the difference to RFC 3339 is.

Indeed, and this is exactly what we did in RFC 4287, as I've pointed out 
previously. And I can't say that date parsing has proven to be an issue there 
at all, even with the little work we did on narrowing down and tightening the 
syntax. Section 3.3. of RFC 4287 says:

   A Date construct is an element whose content MUST conform
   to the date-time production in [RFC3339].  In addition,
   an uppercase T character MUST be used to separate date
   and time, and an uppercase Z character MUST be present
   in the absence of a numeric time zone offset.
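
The RFC 4287 approach quoted above can be sketched as a two-stage check: first match the RFC 3339 date-time production (which, per the note in its section 5.6, also permits lowercase "t" and "z"), then apply the two extra restrictions. This is only an illustration in Python; the names RFC3339_DATE_TIME and is_atom_date are hypothetical, and the check is purely syntactic (it does not validate field ranges such as month 13).

```python
import re

# RFC 3339 section 5.6 date-time shape. The RFC notes that "T" and "Z"
# may alternatively be lowercase, so both cases are accepted here.
RFC3339_DATE_TIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"              # full-date
    r"[Tt]"                                    # date/time separator
    r"[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?" # partial-time, optional fraction
    r"(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"          # time-offset
)

def is_atom_date(text: str) -> bool:
    """Hypothetical helper: RFC 3339 date-time plus the RFC 4287 restrictions."""
    if RFC3339_DATE_TIME.fullmatch(text) is None:
        return False
    # RFC 4287 section 3.3: uppercase T separator, and uppercase Z
    # when there is no numeric offset. Index 10 is the separator.
    return text[10] == "T" and not text.endswith("z")
```

With this sketch, "2009-04-27T12:59:11+02:00" passes, while "2009-04-27t12:59:11Z" is valid RFC 3339 but is rejected by the Atom restrictions.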

Perhaps HTML5 needs more detailing than this for parsing, but not referencing 
RFC 3339 just for the sake of not referencing RFC 3339 doesn't make much sense 
imho.

For authoring (and parsing, in fact), RFC 3339 plus a couple of additional 
guidelines have proven to be enough for implementors of RFC 4287, so I assume 
HTML5 could be better off doing the same, no?

-- 
Asbjørn Ulsberg -=|=-  asbj...@ulsberg.no
«He's a loathsome offensive brute, yet I can't look away»


Re: [whatwg] Parsing RFC3339 constructs

2009-04-25 Thread Michael(tm) Smith
Ian Hickson i...@hixie.ch, 2009-04-25 05:35 +:

 On Fri, 2 Jan 2009, Asbjørn Ulsberg wrote:
  
  Reading the spec, I have to wonder: Does HTML5 need to specify as much 
  as it does inline? Can't more of it be referenced to ISO 8601 or even 
  better; RFC 3339? I really fancy how Atom (RFC 4287) has defined date 
  constructs: http://www.atompub.org/rfc4287.html#date.constructs Does 
  RFC 3339 not define date and time in a satisfactory manner to use 
  directly in HTML5?
 
 The problem isn't so much the syntax definitions as the parsing 
 definitions. We need very specific parsing rules; it's not clear that 
 there is anything to refer to that does the job we need here.

It seems pretty clear that there isn't anything else to refer to
for the date/time parsing rules -- but to me at least, specifying
those rules seems orthogonal to specifying the date/time syntax,
and I would think the syntax could just be defined by making
reference to the productions[1] in RFC 3339 (instead of completely
redefining them), while stating any exceptions.

[1] http://tools.ietf.org/html/rfc3339#section-5.6

I think the exceptions might just amount to:

  - the literal letters T and Z must be uppercase

  - a year must be four or more digits, and must be greater than zero
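
As a rough illustration of what those two exceptions would mean in practice, here is a minimal Python sketch. It assumes the RFC 3339 date-time shape with only the two changes listed; STRICT and matches_proposed_syntax are hypothetical names, and this is not the HTML5 algorithm itself.

```python
import re

# Hypothetical pattern: the RFC 3339 date-time shape with the two
# exceptions applied (uppercase T/Z only, year of four or more digits).
STRICT = re.compile(
    r"(?P<year>[0-9]{4,})-[0-9]{2}-[0-9]{2}"
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:\.[0-9]+)?"
    r"(?:Z|[+-][0-9]{2}:[0-9]{2})"
)

def matches_proposed_syntax(text: str) -> bool:
    m = STRICT.fullmatch(text)
    # "Greater than zero" constrains the value rather than the shape,
    # so it is checked on the parsed year after the pattern matches.
    return m is not None and int(m.group("year")) > 0
```

Under these assumptions "12009-04-25T05:35:00Z" is accepted (five-digit year), while "0000-01-01T00:00:00Z" and "2009-04-25t05:35:00z" are not.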

-- 
Michael(tm) Smith
http://people.w3.org/mike/


Re: [whatwg] Parsing RFC3339 constructs

2009-04-25 Thread Ian Hickson
On Sat, 25 Apr 2009, Michael(tm) Smith wrote:
 
 It seems pretty clear that there isn't anything else to refer to
 for the date/time parsing rules -- but to me at least, specifying
 those rules seems orthogonal to specifying the date/time syntax,
 and I would think the syntax could just be defined by making
 reference to the productions[1] in RFC 3339 (instead of completely
 redefining them), while stating any exceptions.
 
 [1] http://tools.ietf.org/html/rfc3339#section-5.6
 
 I think the exceptions might just amount to:
 
   - the literal letters T and Z must be uppercase
 
   - a year must be four or more digits, and must be greater than zero

I don't understand what that would gain us.



[whatwg] Parsing RFC3339 constructs

2009-04-24 Thread Ian Hickson
On Fri, 2 Jan 2009, Asbjørn Ulsberg wrote:
 
 Reading the spec, I have to wonder: Does HTML5 need to specify as much 
 as it does inline? Can't more of it be referenced to ISO 8601 or even 
 better; RFC 3339? I really fancy how Atom (RFC 4287) has defined date 
 constructs: http://www.atompub.org/rfc4287.html#date.constructs Does 
 RFC 3339 not define date and time in a satisfactory manner to use 
 directly in HTML5?

The problem isn't so much the syntax definitions as the parsing 
definitions. We need very specific parsing rules; it's not clear that 
there is anything to refer to that does the job we need here.
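
To make the distinction concrete: a syntax definition says which strings are valid, whereas parsing rules say exactly what a consumer does at each step, including on bad input. The following Python sketch shows that step-by-step style for a plain date only. It is an assumption-laden illustration, not the algorithm from the HTML5 spec; collect_digits, days_in_month and parse_date are hypothetical names.

```python
from typing import Optional, Tuple

def collect_digits(s: str, pos: int) -> Tuple[str, int]:
    """Collect a run of ASCII digits starting at pos; return (digits, new_pos)."""
    start = pos
    while pos < len(s) and s[pos] in "0123456789":
        pos += 1
    return s[start:pos], pos

def days_in_month(year: int, month: int) -> int:
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31

def parse_date(s: str) -> Optional[Tuple[int, int, int]]:
    """Parse YYYY-MM-DD; return (year, month, day), or None on any deviation."""
    digits, pos = collect_digits(s, 0)
    if len(digits) < 4:                      # year: four or more digits
        return None
    year = int(digits)
    if year <= 0:                            # year must be greater than zero
        return None
    if pos >= len(s) or s[pos] != "-":
        return None
    digits, pos = collect_digits(s, pos + 1)
    if len(digits) != 2:                     # month: exactly two digits
        return None
    month = int(digits)
    if not 1 <= month <= 12:
        return None
    if pos >= len(s) or s[pos] != "-":
        return None
    digits, pos = collect_digits(s, pos + 1)
    if len(digits) != 2:                     # day: exactly two digits
        return None
    day = int(digits)
    if not 1 <= day <= days_in_month(year, month):
        return None
    if pos != len(s):                        # trailing characters are an error
        return None
    return year, month, day
```

Every failure path is explicit and returns the same result, so two independent implementations of these steps cannot disagree about what happens with input such as "2009-4-24" or "2009-02-29".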
