Re: [HTML5] 2.8 Character encodings

Anne van Kesteren Tue, 04 Aug 2009 02:13:35 -0700

On Tue, 04 Aug 2009 10:25:35 +0200, Dr. Olaf Hoffmann <[email protected]> wrote:

[snip] I think, up to now, it does not even have a version indication; therefore it
is not obvious to me how to indicate that a document is written in
'HTML5'.

This is by design. We're removing versioning from (X)HTML much like CSS does not have versioning. (To be clear, not everyone in the HTML WG agrees with this design choice.)

But as already mentioned, for an author of an 'ISO-8859-1' 'HTML5'
document, apart from the version indication, it is already a problem to
properly specify the encoding used.


No, you can just specify it. Just like you can in HTML4.
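
As a minimal sketch (Python; the helper name `latin1_html5_document` is hypothetical), the in-document declaration for an ISO-8859-1 HTML5 page is a single `<meta charset="ISO-8859-1">` element near the top of `head`. The HTML4-style `<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">` form is also accepted. The only other requirement is that the file's bytes really are in the declared encoding:

```python
from html import escape


def latin1_html5_document(title, body_text):
    """Build a minimal HTML5 page that declares ISO-8859-1 in the document
    itself and return it encoded in that same encoding (hypothetical helper)."""
    # Put the declaration early: HTML5 parsers only prescan the first
    # 1024 bytes of the document for a <meta> charset declaration.
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="ISO-8859-1">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # errors="strict" raises UnicodeEncodeError for characters ISO-8859-1
    # cannot represent, instead of silently writing broken bytes.
    return html.encode("iso-8859-1", errors="strict")
```

For example, `latin1_html5_document("Grüße", "Äpfel")` yields bytes in which "ü" is the single byte 0xFC, matching the declaration.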

This problem appears while a document is written and has to be solved before publication; therefore
published documents are not broken, because they are simply not
published while the problem remains.


I do not follow this.

Therefore, if I start to write some test documents and this problem is
not avoided and a version indication is possible, I think I will use
UTF-8 for those documents.


This seems like a good idea regardless.

Typically this means that they are
incompatible with others of my documents and scripts and will be placed
in another directory with an Apache .htaccess file indicating the
different encoding.

That is one solution. You could also always indicate the encoding in the document instead and instruct Apache to not include the charset parameter.
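
Why omitting the charset parameter matters: when the HTTP `Content-Type` header carries a charset, browsers use it in preference to a `<meta>` declaration in the file. The Python below is a deliberately simplified model of that precedence (header first, then a `<meta>` found in the first 1024 bytes, then a fallback). It is not the full HTML5 encoding-sniffing algorithm, which also handles byte order marks and other cases; the function name and the `windows-1252` fallback are assumptions for illustration, the latter standing in for a locale-dependent default.

```python
import re
from email.message import Message

# Matches <meta charset=...> as well as the HTML4-style
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def effective_encoding(content_type, body, default="windows-1252"):
    """Simplified model (hypothetical): the HTTP charset parameter wins,
    then a <meta> declaration in the first 1024 bytes, then `default`.
    BOM sniffing and the other steps of the real algorithm are omitted."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None
        if charset:
            return charset
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default
```

With `Content-Type: text/html; charset=UTF-8`, a file declaring ISO-8859-1 is still decoded as UTF-8; with a bare `text/html` header, the document's own declaration takes effect.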

I think Apache has an option for specific file name extensions too;
this could maybe be used for directories with mixed encodings.

That is an option too. You can also set headers on a per-file basis using the Files directive.
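
The extension-based approach can be modelled in the same simplified way. Below is a hypothetical Python helper, loosely analogous to Apache's `AddCharset` directive, that picks the charset parameter from an extra file-name extension such as `page.html.latin1`. The `.latin1`/`.utf8` extensions and the mapping are illustrative assumptions, not a convention. When no extension matches, no charset parameter is sent, so the in-document declaration applies.

```python
# Hypothetical mapping from an extra file-name extension to a charset.
CHARSET_BY_EXTENSION = {".latin1": "ISO-8859-1", ".utf8": "UTF-8"}


def content_type_for(filename, default_charset=None):
    """Return the Content-Type value for an HTML file, choosing the charset
    from any dot-separated suffix of the file name (order-independent)."""
    suffixes = ["." + part.lower() for part in filename.split(".")[1:]]
    charset = default_charset
    for suffix in suffixes:
        if suffix in CHARSET_BY_EXTENSION:
            charset = CHARSET_BY_EXTENSION[suffix]
    if charset is None:
        # No charset parameter: the document's own <meta> declaration applies.
        return "text/html"
    return f"text/html; charset={charset}"
```

Mixed-encoding files can then share one directory, e.g. `old.html.latin1` next to `new.html.utf8`.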

Surely I will not explain this to other authors if this question comes up, because it is too complex for many authors.

Agreed. Encoding is largely misunderstood. It makes more sense for editors to start defaulting to UTF-8 going forward and have everyone use that, in my opinion.

This does not cause broken documents; the construct is just more fragile,
one has to take more care where to put files and how to name them, and one
has to switch the encoding in the editor for different projects. This is
only more work and more sources of possible errors, not recommendable
for every author.

If you simply switch to UTF-8 for all future work, this will become less and less of a problem. And then you've also covered other scripts, should the need arise to use them.

Therefore maybe I will never create more than test documents for
'HTML5' just to avoid such complications.

Ok.

With the new microdata section, 'HTML5' seemed to get more
interesting for authors (well, the CURIEs are still missing, but there
seems to be a workaround with entity definitions within the otherwise
almost empty DOCTYPE), therefore it would have been interesting
to test this or to include it in tutorials for other authors, because
it already has a few more semantically relevant elements than
HTML4/XHTML1.x.

Since HTML5 is no longer SGML-based, entity definitions there will not work and are non-conforming. The reason we did this was that, other than the validator, no software processed text/html resources in this way, leading to a lot of author confusion because of the clear mismatch between the validator and other software.



--
Anne van Kesteren
http://annevankesteren.nl/

