-----Original Message-----
>From: WeBMartians <[EMAIL PROTECTED]>
>
>Can the standard state that datetime strings support ISO 8601:2004 and leave 
>it at that? (just a thought)

Considering the considerable number of formats that ISO 8601:2004 encompasses, 
I don't think that would be a good idea.  The following format possibilities 
for specifying instants in time, while acceptable to ISO 8601:2004, would not be 
worth adding to HTML 5 under any use case I can imagine.

I really can't see why they bothered with the century format except as a 
supplemental field for pre-Y2K applications that used a 2-digit year field.

Decimal minutes and hours are allowed in ISO 8601:2004, but lead to problems 
with rounding once one gets to increments smaller than 0.0001 min and 0.00001 h 
respectively, so should probably not be adopted.
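
To put numbers on that, here is a small Python sketch. It assumes instants are 
stored to the nearest millisecond (an assumption on my part, matching the 
three-digit limit on fractional seconds used further down), and the helper 
names are hypothetical. It converts a decimal fraction of a minute or hour to 
milliseconds using exact rational arithmetic, so any value that is not a whole 
number of milliseconds shows up as a fraction.

```python
from fractions import Fraction

MS_PER_MINUTE = 60_000
MS_PER_HOUR = 3_600_000


def minute_fraction_to_ms(text):
    """Exact number of milliseconds in a decimal fraction of a minute."""
    return Fraction(text) * MS_PER_MINUTE


def hour_fraction_to_ms(text):
    """Exact number of milliseconds in a decimal fraction of an hour."""
    return Fraction(text) * MS_PER_HOUR


for text in ("0.0001", "0.00001"):
    print(text, "min =", minute_fraction_to_ms(text), "ms")
for text in ("0.00001", "0.000001"):
    print(text, "h =", hour_fraction_to_ms(text), "ms")
```

0.0001 min and 0.00001 h come out as whole milliseconds (6 and 36); one more 
decimal place gives 3/5 ms and 18/5 ms, which would have to be rounded.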

The alternate ways of representing 1985-04-12: the ordinal date (1985-102) and 
the week date (1985-W15-5) don't offer anything that would justify the extra 
complexity that would be required of the parser.
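
For anyone who wants to check the equivalence of the three representations, 
this short Python sketch derives the ordinal and week forms from the calendar 
date using only the standard datetime module. The variable names are mine.

```python
from datetime import date

d = date(1985, 4, 12)

# Ordinal date: year plus day-of-year, zero padded to three digits.
ordinal = "%04d-%03d" % (d.year, d.timetuple().tm_yday)

# Week date: ISO year, ISO week number, ISO weekday (Monday = 1).
iso_year, iso_week, iso_weekday = d.isocalendar()
week = "%04d-W%02d-%d" % (iso_year, iso_week, iso_weekday)

print(d.isoformat(), ordinal, week)
```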

In case the specification ever develops a use case for a comma separated list 
of datetime values, then the use of comma as an alternate to the period for the 
decimal point should not be allowed.
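
A quick Python illustration of the ambiguity, using a hypothetical 
comma-separated list of two datetime values (the values themselves are made up 
for the example):

```python
with_comma = ["2007-03-01T10:15:30,5Z", "2007-03-02T08:00:00Z"]
with_period = ["2007-03-01T10:15:30.5Z", "2007-03-02T08:00:00Z"]

# Join each list with commas, then split it again the way a list parser would.
comma_pieces = ",".join(with_comma).split(",")
period_pieces = ",".join(with_period).split(",")

print(comma_pieces)   # the fractional part is torn off as its own item
print(period_pieces)  # the two original values come back intact
```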

HTML 4 currently allows only:
####-##-##T##:##:##Z
####-##-##T##:##:##+##:##
####-##-##T##:##:##-##:##

These three formats are a subset of those suggested by the W3C datetime note 
[1] which itself is a subset of the choices available under ISO 8601.
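
As a rough sketch, the three HTML 4 patterns can be checked with a single 
regular expression. The function name is hypothetical, and like the patterns 
themselves this only checks the shape of the string, not whether the field 
values make sense.

```python
import re

HTML4_DATETIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}"    # ####-##-##
    r"T[0-9]{2}:[0-9]{2}:[0-9]{2}"   # T##:##:##
    r"(?:Z|[+-][0-9]{2}:[0-9]{2})"   # Z, +##:## or -##:##
)


def is_html4_datetime(text):
    """True if text has exactly one of the three HTML 4 shapes."""
    return HTML4_DATETIME.fullmatch(text) is not None


print(is_html4_datetime("1985-04-12T10:15:30Z"))
print(is_html4_datetime("1985-04-12T10:15Z"))
```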

The HTML 5 draft currently extends this to allow:
 * leaving out the seconds (valid under the W3C datetime note)
 * adding a decimal point and one or more digits for fractional seconds (valid 
under the W3C datetime note)
 * specifying just the date (valid under the W3C datetime note)
 * specifying just the time (valid under ISO 8601, but not the W3C datetime 
note)
 * replacing the "T" separator between the time and date with whitespace (NOT 
VALID under ISO 8601)
 * including some optional whitespace in some specific places (NOT VALID under 
ISO 8601)

Unless allowing the whitespace is needed to back standardize an existing 
implementation's handling of <ins>/<del>, it should be removed from the draft 
as it is not ISO 8601 compliant and complicates the job of the parser, though I 
grant it does improve the human readability slightly.

As a side issue, unless attribute values are restricted to ISO/IEC 646 
characters, ISO 8601 appears to call for also accepting U+2010 HYPHEN as a 
separator between the fields of a date value, and U+2212 MINUS for the sign in 
a time zone designator or extended year representation.
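
One possible way to handle this, sketched in Python, is to fold the two 
characters into HYPHEN-MINUS before parsing. This is only an illustration of 
the idea, not something the draft specifies, and the names are hypothetical.

```python
# Map U+2010 HYPHEN and U+2212 MINUS SIGN to U+002D HYPHEN-MINUS.
ASCII_EQUIVALENTS = str.maketrans({"\u2010": "-", "\u2212": "-"})


def normalize_separators(text):
    """Return text with HYPHEN and MINUS replaced by HYPHEN-MINUS."""
    return text.translate(ASCII_EQUIVALENTS)


print(normalize_separators("1985\u201004\u201012T10:15:30\u221205:00"))
```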

The following is a list of steps for a parser that accepts a wide range of 
valid ISO 8601 datetime formats, assuming that extended years are of the format 
±YYYYY and that at most 3 digits are allowed after the FULL STOP in the 
seconds.  Variables <date>, <time>, and <timezone> are booleans for whether a 
date, time, or timezone was present in the designation.  Variables <year>, 
<month>, <day>, <hour>, <minute>, <second>, <millisecond>, <zonehour>, and 
<zoneminute> contain the correct results, but range checking for nonsense values 
such as 2007-02-29 or even 2007-13-42 is not done, nor is converting them to 
the value stored in the DOM.

0. Set <month> and <day> to 1; set <hour>, <minute>, <second>, and 
<millisecond> to 0; set <date>, <time>, and <timezone> to invalid; advance to 
the first non-whitespace character.
1. If the next character is not PLUS, MINUS, or HYPHEN-MINUS, skip to step 5.
2. If the next character is not PLUS, set <year> to -1, else set <year> to 1.
3. Advance 1 character; if the next five characters are not in the set DIGIT 
ZERO .. DIGIT NINE, abort as invalid.
4. Set <date> to valid; multiply <year> by the value given by the next five 
characters; advance 5 characters; skip to step 7.
5. If the next four characters are not in the set DIGIT ZERO .. DIGIT NINE, 
skip to step 17.
6. Set <date> to valid; set <year> to the value given by the next four 
characters; advance 4 characters.
7. If the next character is whitespace, COMMA, or end of string, end as valid.
8. If the next character is not HYPHEN or HYPHEN-MINUS, abort as invalid.
9. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
10. Set <month> to the value given by the next two characters; advance 2 
characters.
11. If the next character is whitespace, COMMA, or end of string, end as valid.
12. If the next character is not HYPHEN or HYPHEN-MINUS, abort as invalid.
13. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
14. Set <day> to the value given by the next two characters; advance 2 
characters.
15. If the next character is whitespace, COMMA, or end of string, end as valid.
16. If the next character is not LATIN CAPITAL LETTER T, abort as invalid.
17. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
18. Set <time> to valid; set <hour> to the value given by the next two 
characters; advance 2 characters.
19. If the next character is whitespace, COMMA, or end of string, end as valid.
20. If the next character is not COLON, abort as invalid.
21. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
22. Set <minute> to the value given by the next two characters; advance 2 
characters.
23. If the next character is whitespace, COMMA, or end of string, end as valid.
24. If the next character is not COLON, abort as invalid.
25. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
26. Set <second> to the value given by the next two characters; advance 2 
characters.
27. If the next character is whitespace, COMMA, or end of string, end as valid.
28. If the next character is not FULL STOP, abort as invalid.
29. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as 
invalid.
30. Set <millisecond> to 100 times the value given by the next character; 
advance 1 character.
31. If the next character is whitespace, COMMA, or end of string, end as valid. 
32. If the next character is PLUS, MINUS, HYPHEN-MINUS, or LATIN CAPITAL LETTER 
Z, skip to step 41.
33. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as 
invalid.
34. Add to <millisecond> 10 times the value given by the next character; 
advance 1 character.
35. If the next character is whitespace, COMMA, or end of string, end as valid. 
36. If the next character is PLUS, MINUS, HYPHEN-MINUS, or LATIN CAPITAL LETTER 
Z, skip to step 41.
37. If the next character is not in the set DIGIT ZERO .. DIGIT NINE, abort as 
invalid.
38. Add to <millisecond> the value given by the next character; advance 1 
character.
39. If the next character is whitespace, COMMA, or end of string, end as valid. 
40. If the next character is not PLUS, MINUS, HYPHEN-MINUS, or LATIN CAPITAL 
LETTER Z, abort as invalid.
41. Set <timezone> to valid, set <zoneminute> to 0.
42. If the next character is not LATIN CAPITAL LETTER Z, skip to step 45.
43. Set <zonehour> to 0; advance 1 character.
44. If the next character is whitespace, COMMA, or end of string, end as valid, 
else abort as invalid.
45. If the next character is not PLUS, set <zonehour> to -1, else set 
<zonehour> to 1.
46. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
47. Multiply <zonehour> by the value given by the next two characters; advance 
2 characters.
48. If the next character is whitespace, COMMA, or end of string, skip to step 
#. 
49. If the next character is not COLON, abort as invalid.
50. If the next two characters are not in the set DIGIT ZERO .. DIGIT NINE, 
abort as invalid.
51. Set <zoneminute> to the value given by the next two characters; advance 2 
characters.

Note that steps 1 to 4 in the above are only needed if years in expanded format 
(±YYYYY) are used.  With the exception of centuries, ordinal dates, week dates, 
decimal fractions of the hour, and decimal fractions of the minutes, the above 
parses the full set of extended format time instants specified by ISO 
8601:2004.  Rejecting additional forms, such as YYYY-MM, is for the most part 
simply changing an "end as valid" to an "abort as invalid" at the appropriate 
step.

The above does not recognize datetimes with the extra whitespace allowed in the 
HTML 5 draft as that is not valid in ISO 8601:2004.
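
For concreteness, the numbered steps can be sketched in Python roughly as 
follows. This is an illustration under stated assumptions, not a reference 
implementation, and parse_instant and its helpers are hypothetical names. 
Three points are assumptions of the sketch rather than something the steps 
say: each separator is consumed once it has been tested; a zone designator is 
also accepted directly after the hour, minutes, or seconds (the HTML 4 forms 
need that); and a zone that stops after its hour or its minutes is treated as 
the end of the value. Range checking is still not done.

```python
DIGITS = "0123456789"
ENDS = " \t\n\f\r,"      # whitespace or COMMA ends a value
HYPHENS = "-\u2010"      # HYPHEN-MINUS, HYPHEN
SIGNS = "+-\u2212"       # PLUS, HYPHEN-MINUS, MINUS


def parse_instant(text):
    """Return a dict of parsed fields, or None if the text is invalid."""
    s = text.lstrip()                             # step 0
    pos = 0
    out = {"date": False, "time": False, "timezone": False,
           "month": 1, "day": 1, "hour": 0, "minute": 0,
           "second": 0, "millisecond": 0}

    def next_in(chars):
        return pos < len(s) and s[pos] in chars

    def at_end():
        return pos >= len(s) or s[pos] in ENDS

    def has_digits(n):
        chunk = s[pos:pos + n]
        return len(chunk) == n and all(c in DIGITS for c in chunk)

    def digits(n):
        nonlocal pos
        if not has_digits(n):
            raise ValueError("digits expected")
        pos += n
        return int(s[pos - n:pos])

    def expect(chars):
        nonlocal pos
        if not next_in(chars):
            raise ValueError("separator expected")
        pos += 1                                  # assumption: consume it

    def sign():
        nonlocal pos
        value = 1 if s[pos] == "+" else -1
        pos += 1
        return value

    def zone():                                   # steps 41-51
        nonlocal pos
        out["timezone"] = True
        out["zoneminute"] = 0
        if next_in("Z"):
            pos += 1
            out["zonehour"] = 0
        elif next_in(SIGNS):
            out["zonehour"] = sign() * digits(2)
            if not at_end():
                expect(":")
                out["zoneminute"] = digits(2)
        else:
            raise ValueError("zone designator expected")
        if not at_end():                          # assumption: value ends here
            raise ValueError("trailing characters")

    def parse():
        nonlocal pos
        if next_in(SIGNS):                        # steps 1-4: expanded year
            out["year"] = sign() * digits(5)
            out["date"] = True
        elif has_digits(4):                       # steps 5-6
            out["year"] = digits(4)
            out["date"] = True
        if out["date"]:
            for field in ("month", "day"):        # steps 7-14
                if at_end():
                    return
                expect(HYPHENS)
                out[field] = digits(2)
            if at_end():                          # step 15
                return
            expect("T")                           # step 16
        out["hour"] = digits(2)                   # steps 17-18
        out["time"] = True
        for field in ("minute", "second"):        # steps 19-26
            if at_end():
                return
            if next_in(SIGNS + "Z"):              # assumption: zone allowed here
                return zone()
            expect(":")
            out[field] = digits(2)
        if next_in("."):                          # steps 28-38: 1 to 3 digits
            pos += 1
            frac = ""
            while len(frac) < 3 and next_in(DIGITS):
                frac += s[pos]
                pos += 1
            if not frac:
                raise ValueError("fraction digit expected")
            out["millisecond"] = int(frac.ljust(3, "0"))
        if not at_end():                          # steps 39-40
            zone()

    try:
        parse()
    except ValueError:
        return None
    return out
```

For example, parse_instant("1985-04-12T10:15:30.5Z") yields year 1985, 
millisecond 500 and zonehour 0, while parse_instant("1985-4-12") returns None 
because the month is not two digits.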

[1] http://www.w3.org/TR/NOTE-datetime
