Ok, let's forget about the HTML discussion and let's talk about XML:
 
In a message dated 11/5/2003 12:11:21 PM Pacific Standard Time, [EMAIL PROTECTED] writes:

One can however use it safely with XHTML, because XHTML documents are XML
documents which may specify explicitly another document schema that includes
this extra attribute (thanks to the modular model of XHTML). But you'll have
to provide your own XML schema...
Hmm... not quite the same. Be careful here. It depends on which MIME type you use in the Content-Type for your XHTML.
You need to read the following two documents carefully:
1. RFC 3023 - XML Media Types http://www.faqs.org/rfcs/rfc3023.html
2. XHTML Media Types http://www.w3.org/TR/xhtml-media-types/
 


Note that for XHTML, which must be a valid XML document, UTF-8 is the
default if nothing is specified.
Not true. According to XHTML Media Types http://www.w3.org/TR/xhtml-media-types/ , if you are using "application/xhtml+xml" or "application/xml" for your XHTML, then "UTF-8 is the default if nothing is specified". However, if you use "text/xml" as the Content-Type in the header, the default is different. Read the following text from RFC 3023 - XML Media Types http://www.faqs.org/rfcs/rfc3023.html :
 
[begin of quote]
3.6 Summary

   The following list applies to text/xml, text/xml-external-parsed-
   entity, and XML-based media types under the top-level type "text"
   that define the charset parameter according to this specification:

   o  Charset parameter is strongly recommended.

   o  If the charset parameter is not specified, the default is "us-
      ascii".  The default of "iso-8859-1" in HTTP is explicitly
      overridden.

   o  No error handling provisions.

   o  An encoding declaration, if present, is irrelevant, but when
      saving a received resource as a file, the correct encoding
      declaration SHOULD be inserted.
[end of quote]
 
Notice that it says not only that "us-ascii" is the default if there is no charset parameter in the HTTP Content-Type header. It ALSO says that an "encoding declaration" (that means <?xml encoding=""?>), "if present, is irrelevant". (Surprise :) )
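
To make the rule concrete, here is a rough Python sketch of how a receiver could pick the charset for an XML document received over HTTP. It is only my reading of the RFC 3023 summary quoted above, not a complete implementation: the function name effective_charset is made up for illustration, the input is assumed to be an XML media type already, and BOM detection (UTF-16/UTF-32) is left out.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type, body):
    """Return the charset a receiver should assume (hypothetical helper).

    content_type: value of the HTTP Content-Type header, assumed to be
                  an XML media type such as text/xml or application/xml.
    body:         the raw bytes of the XML document.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()   # lower-cased "type/subtype"
    charset = msg.get_param("charset")    # None when the parameter is absent

    # An explicit charset parameter wins for every XML media type.
    if isinstance(charset, str) and charset:
        return charset.lower()

    # text/xml family: default is us-ascii, the encoding declaration
    # inside the document is irrelevant.
    if media_type.startswith("text/"):
        return "us-ascii"

    # application/xml family: fall back to the document itself.
    # Simplification: no BOM sniffing, only the encoding declaration.
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

For example, effective_charset("text/xml", doc) gives "us-ascii" even when doc starts with <?xml version="1.0" encoding="utf-8"?>, while the same document sent as "application/xhtml+xml" gives "utf-8".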
 
But the XML declaration may be added on top
to specify the charset to use when parsing the XML document. In that case,
the XML declaration in the document takes precedence on the external HTTP
header, which itself takes precedence on the <meta http-equiv /> elements.
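That is not what RFC 3023 says. Actually, RFC 3023 says that for "text/xml" the encoding declaration in such an XML declaration has no effect when the document is received over HTTP; the charset parameter of the Content-Type header, or its "us-ascii" default, decides.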


So if you want full XML compliance and support for legacy browsers, you need
to:
The first thing that needs to be done: add charset=UTF-8 to the HTTP Content-Type header itself if you are using "text/xml". The other approach is to use a non-"text" MIME Content-Type.
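
As an illustration only, a minimal WSGI application in Python that sends the charset in the Content-Type header could look like the sketch below. The application name app and the sample document are made up; any server-side technology that lets you set the header works the same way.

```python
def app(environ, start_response):
    """Minimal WSGI app serving XML with an explicit charset parameter."""
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<greeting>caf\u00e9</greeting>\n"
    )
    body = document.encode("utf-8")
    headers = [
        # The charset parameter is what counts for text/xml.
        ("Content-Type", "text/xml; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server can serve app. Switching the media type to "application/xml" is the other approach mentioned above.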

    - use a leading <?xml ?> declaration with the explicit charset
pseudo-attribute.
Not a bad idea to do it anyway.


    - declare the <!DOCTYPE > with your own schema, and make this extended
schema accessible at the referenced SYSTEM url, and give it a specific
PUBLIC doctype name.

    - use a <meta http-equiv /> tag very soon in your <head> section, even
before any possibly internationalized string like the <title></title>
element (in fact it is recommended to put ALL <meta http-equiv /> elements
before the required <title></title> element and then only put the other
<meta name /> elements such as robots control tags, description and
keywords)

    - avoid all line breaks within <meta http-equiv /> elements (needed for
some web servers tuned for performance and that can parse lazily the HTML
document before generating HTTP headers), unless you can control the
generation of HTTP headers (with a external server control file like
.httpd.conf or similar features, or if you generate headers yourself within
a server-side script)
I have no clue why you would need this.


    - make sure you insert a space before all abbreviated elements
terminators "/>"

    - always specify explicitly the "iso-8859-1" document charset with the
above method, if this is the one you use, as the default charset differs
between HTML (which defaults to ISO-8859-1) and XHTML (which defaults to
UTF-8, per XML conformance, unless there's a leading BOM to specify UTF-16
or UTF-32)
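
Putting the checklist above together, a document could be generated roughly as in the following Python sketch. The function name build_page, the PUBLIC identifier, and the example.org DTD URL are placeholders invented for illustration; you would substitute your own extended schema. The sketch only produces the bytes; as discussed earlier, the HTTP Content-Type header still has to carry the matching charset.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(title, charset="UTF-8"):
    """Return an XHTML page as bytes, following the checklist above."""
    text = (
        # Leading XML declaration with explicit encoding.
        f'<?xml version="1.0" encoding="{charset}"?>\n'
        # Placeholder PUBLIC name and SYSTEM URL for a custom schema.
        '<!DOCTYPE html PUBLIC "-//EXAMPLE//DTD XHTML Extended 1.0//EN"\n'
        '  "http://www.example.org/dtd/xhtml-extended.dtd">\n'
        f'<html xmlns="{XHTML_NS}">\n'
        "<head>\n"
        # http-equiv meta on one line, before <title>, space before "/>".
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}" />\n'
        f"<title>{escape(title)}</title>\n"
        # Other meta elements come after the title.
        '<meta name="robots" content="index,follow" />\n'
        "</head>\n"
        "<body><p>Hello</p></body>\n"
        "</html>\n"
    )
    return text.encode(charset)


# Quick well-formedness check with the standard library parser.
page = build_page("Caf\u00e9 & bar")
root = ET.fromstring(page)
```

The well-formedness check does not validate against the DTD; the standard library parser does not fetch the external schema.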
 
==================================
Frank Yung-Fong Tang
System Architect, International Development, AOL Interactive Services
AIM:yungfongta mailto:[EMAIL PROTECTED] Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son, that whoever believes in him shall not perish but have eternal life."

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language linked from Frank Tang's Internationalization Secrets
Want to translate your English text to something Thailand users can understand ?
-> Try English-to-Thai machine translation at http://c3po.links.nectec.or.th/parsit/
