Hello Julián,

At the W3C we wrote some material to answer your questions.  Please see:

http://www.w3.org/International/tutorials/tutorial-char-enc/

and 

http://www.w3.org/International/geo/html-tech/tech-character.html (still early 
draft!)

Please take a look (and let me know if there is any way we can improve the 
material).

Cheers,
RI


============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Dejan Kozina
> Sent: 22 November 2004 01:44
> To: [EMAIL PROTECTED]
> Subject: Re: [WSG] choosing encoding, charset and using 
> special characters
> 
> 
> 
> Julián Landerreche wrote:
> 
> > 1) Question: Is there a way to use special characters directly in the
> > code?
> 
> Two ways, actually, both requiring the pages to be served as UTF-8.
> One is writing the document with an editor capable of saving text as
> UTF-8 (UniRed is the one I like -
> http://www.esperanto.mv.ru/UniRed/ENG/), so that anything you
> can key or paste in it will be stored correctly and rendered
> as expected, as long as you remember to put a
> <meta http-equiv="content-type" content="text/html; charset=utf-8">
> in your page's head. The other one is using a browser's form to
> input the text and send it to some sort of CMS. Provided the page
> with the form is UTF-8 too, all modern browsers will convert the
> whole submission to UTF-8 while uploading.
> 
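
To make the first approach concrete, here is a minimal Python sketch of saving a page as UTF-8 with the declaration in its head. The helper names (build_page, save_as_utf8) and the file name in the usage note are only illustrative, and the sketch does not escape markup characters in the text it is given.

```python
from pathlib import Path

# The declaration quoted above; it must match the encoding the file is saved in.
META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'


def build_page(title: str, body: str) -> str:
    """Assemble a tiny page whose head carries the UTF-8 declaration."""
    return (
        "<html><head>\n"
        f"{META}\n"
        f"<title>{title}</title>\n"
        "</head><body>\n"
        f"<p>{body}</p>\n"
        "</body></html>\n"
    )


def save_as_utf8(path: Path, markup: str) -> bytes:
    """Write the markup as UTF-8 and return the bytes that ended up on disk."""
    path.write_text(markup, encoding="utf-8")
    return path.read_bytes()
```

Calling save_as_utf8(Path("page.html"), build_page("Prueba", "Julián")) should leave a file in which the á is stored as the two bytes C3 A1, which is what the declaration promises the browser.
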
> > 2) I have seen a lot of webpages that directly use the special
> > characters and don't code them as HTML entities. These pages are
> > displayed correctly. Question: Is this a good or bad practice (to use
> > special characters in code, instead of entities)?
> 
> According to my experience, it is OK to do it using Unicode, 
> otherwise you're relying on unwarranted assumptions regarding 
> the native codepage of the reader's machine (example: if you 
> use an à in your source it will probably be displayed as such 
> on any Spanish and generally western language OS, but it will 
> become a c on most Central European PCs).


As long as you declare the encoding of your page, and that encoding contains 
the character you want to display, it is better to use characters rather than 
escapes.  Apart from anything else, it improves maintainability and reduces 
bandwidth.
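
Both points can be checked with a few lines of Python. This is only a sketch: the byte 0xE8 and the letter é are examples I picked for illustration, not characters taken from your pages.

```python
# One byte, two readings: without a declared encoding the browser has to guess.
raw = b"\xe8"
as_western = raw.decode("iso-8859-1")           # Western European code page
as_central_european = raw.decode("iso-8859-2")  # Central European code page

# Size on the wire: a literal character in UTF-8 versus an HTML escape.
literal_size = len("é".encode("utf-8"))
entity_size = len("&eacute;".encode("ascii"))
ncr_size = len("&#233;".encode("ascii"))

print(as_western, as_central_european)
print(literal_size, entity_size, ncr_size)
```

The same byte comes out as è under ISO-8859-1 and as č under ISO-8859-2, and the literal character is the shortest of the three forms.
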


> 
> > 3. In Google results, I found that those special characters aren't
> > always correctly displayed.
> 
> Google uses utf-8 for display, so your browser renders the 
> title as if it was encoded as such.
> 
> > Question:  Is there a way to force or override the encoding (not the
> > charset) directly from the page code?
> > I think that my textpattern managed pages should have ISO-8850-1 
> > encoding.


You presumably mean ISO-8859-1 (rather than 8850).  Note that the W3C now 
serves its pages using utf-8.  It makes life a lot easier when you have 
multilingual pages or a number of pages in multiple languages.
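
If it helps to see what happens when the declared and actual encodings disagree, the following Python sketch reproduces the two usual symptoms. The sample string is just an example.

```python
text = "Julián"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# UTF-8 bytes read as ISO-8859-1: each non-ASCII character turns into two.
utf8_read_as_latin1 = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the lone byte 0xE1 is not valid UTF-8,
# so a lenient decoder substitutes U+FFFD (the replacement character).
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

The second case is roughly what you would expect when ISO-8859-1 text ends up displayed on a UTF-8 page.
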

> 
> You can try using the numeric character references (written
> as &#nnn;, where nnn is the decimal value of the character) or
> the hexadecimal ones (written as &#xhhhh;, where hhhh is the
> hex value of the same). The complete list of references is at
> ftp://ftp.unicode.org/Public/MAPPINGS/.


Note that the numeric value MUST be a Unicode code point value, whatever the 
encoding you are using. There are easier ways of finding a Unicode code point.  
For example, you could try my UniView utility at 
http://www.w3.org/People/Ishida/utilities.html 
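
As a quick illustration of the difference between a byte value and a code point, here is a small Python sketch. The function names decimal_ncr and hex_ncr are my own, chosen for the example.

```python
import html


def decimal_ncr(ch: str) -> str:
    """Decimal numeric character reference built from the Unicode code point."""
    return f"&#{ord(ch)};"


def hex_ncr(ch: str) -> str:
    """Hexadecimal numeric character reference built from the Unicode code point."""
    return f"&#x{ord(ch):X};"


euro = "€"
# In windows-1252 the euro sign is stored as the single byte 0x80, but that
# byte value is NOT what goes into the reference; the code point U+20AC is.
byte_in_cp1252 = euro.encode("cp1252")

print(decimal_ncr(euro), hex_ncr(euro), byte_in_cp1252)
print(html.unescape(hex_ncr(euro)))
```

Whatever encoding the page is saved in, the reference for the euro sign stays &#8364; or &#x20AC;.
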



> 
> > 3. If I change to UTF-8...  which are the advantages / disadvantages?
> 
> The main advantages are correct rendering in all modern
> browsers and OSes, plus the possibility of hassle-free mixing
> of characters from any charset on a single page. Besides
> this, it is rapidly becoming the standard encoding for all
> sorts of documents, on the web or otherwise.


As alluded to above.  Significant advantages also arise when receiving form 
data from multilingual pages and storing it centrally.  You don't need to 
figure out which encoding was used, and convert.
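
A rough Python sketch of what that looks like on the receiving side, assuming every form page is served as UTF-8. The field names and the decode_form helper are made up for the example, and it keeps only the first value of each field.

```python
from urllib.parse import parse_qs, urlencode


def decode_form(body: str) -> dict[str, str]:
    """Decode a URL-encoded form body, assuming the form page was UTF-8."""
    fields = parse_qs(body, encoding="utf-8", errors="strict")
    # Keep only the first value of each field for this simple case.
    return {name: values[0] for name, values in fields.items()}


# Roughly what a browser sends from a UTF-8 page, whatever the language.
submitted = urlencode(
    {"name": "Julián", "city": "Trst", "note": "日本語"}, encoding="utf-8"
)
stored = decode_form(submitted)

print(submitted)
print(stored)
```

One decoding rule covers all three fields, so there is no per-page encoding to track before storing the data.
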

Hope that helps.
RI



> 
> There are disadvantages: Netscape 4.7 mostly doesn't recognize
> the characters (except for the first 128 that are part of
> ASCII) and MacOS 9 and below sometimes have a weird way of
> displaying them.
> 
> One final word about the document title: even if you place 
> the above meta before the title tag and tweak your server to 
> transmit the correct MIME type almost any browser around will 
> still use the OS's default 'window title' font for the title, 
> so it will be displayed as expected only if that font 
> contains the required glyphs (or shapes). It will display 
> correctly in Google listings, nevertheless.
> 
> 
> --
> Dejan Kozina Web Design Studio
> Dolina 346 (TS)
> I-34018 Trst/Trieste - Italy
> tel./fax: +39 040 228 436
> cell.: +39 348 7355 225
> http://www.kozina.com/
> e-mail: [EMAIL PROTECTED]
> 
> 
> 
> 
> 

******************************************************
The discussion list for  http://webstandardsgroup.org/

 See http://webstandardsgroup.org/mail/guidelines.cfm
 for some hints on posting to the list & getting help
******************************************************
