RE: [WSG] Encoding, charsets and entities...

2005-06-29 Thread Richard Ishida
Hi Roberto,

I think this may answer many of your questions:
http://www.w3.org/International/tutorials/tutorial-char-enc/

RI



Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Roberto Gorjão
> Sent: 15 June 2005 10:27
> To: wsg@webstandardsgroup.org
> Subject: [WSG] Encoding, charsets and entities...
> 
> Hi,
> 
> I’m trying to understand the pros and cons of different 
> charset encodings and I would like to know what your 
> experience tells you about this subject, notably:
> 
> * Unicode encoding (UTF-8) seems to be more efficient than ISO
>   charsets (iso-8859-1): it covers all languages in a single
>   encoding; it’s universal (or at least getting to be); it’s
>   compatible with ASCII; some even argue that it’s quicker… Are
>   there any drawbacks? Does the fact that Unicode characters may
>   have different sizes affect string calculations in JavaScript?
>   String lengths, character position retrieval and so on?
> * Where does the use of UTF leave us regarding entities? Some
>   say that we no longer have to worry about coding currency
>   symbols or accented letters… Is that true? (I never really paid
>   much attention to this matter and got used to seeing Dreamweaver
>   automatically code all the accented letters that I insert in the
>   design tab (that’s almost the only reason why I use the design
>   tab nowadays…), but I think I would definitely switch to much
>   cheaper software if even this functionality turns out to be
>   useless.) And what about quotation marks and less-than and
>   greater-than signs? They seem to validate all right when inserted
>   directly in the code without any kind of special entity coding.
> * Which is the best way to declare it? I’ve noticed that the
>   webstandardsgroup.org page declares it only in the XML “prolog”
>   and does not use any meta tag to do it, as for instance the
>   Unicode.org page does.
> 
> Thank you.
> 
> Roberto
> 
> **
> The discussion list for  http://webstandardsgroup.org/
> 
>  See http://webstandardsgroup.org/mail/guidelines.cfm
>  for some hints on posting to the list & getting help
> **
> 
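On the string-length question above, here is a minimal sketch, written in Python rather than JavaScript, of how the same text measures differently depending on what you count. The measure() helper is hypothetical, made up for this illustration. The point it tries to show: the number of bytes in UTF-8 varies per character, but JavaScript's String.length counts UTF-16 code units of the string in memory, so the page encoding by itself should not change it. Only characters outside the Basic Multilingual Plane count as two units.

```python
def measure(text: str) -> dict:
    """Hypothetical helper: count one string three different ways."""
    return {
        # Unicode code points (what Python's len() counts)
        "code_points": len(text),
        # bytes on the wire when the page is served as UTF-8
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units (2 bytes each); this is the count that
        # JavaScript's String.length reports
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


samples = [
    "abc",           # plain ASCII
    "Gorj\u00e3o",   # contains a-tilde (U+00E3)
    "\u20ac",        # euro sign
    "\U0001D11E",    # musical G clef, outside the BMP
]

for sample in samples:
    print(repr(sample), measure(sample))
```

So for accented letters and currency symbols the length and character positions seen by a script stay the same as under iso-8859-1; only the byte count of the file changes.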



Re: [WSG] Encoding, charsets and entities...

2005-06-16 Thread Anders Nawroth


Dejan Kozina wrote:

> The encoding declaration in the XML prolog is required only if you use 
> an encoding that's not utf-8 or utf-16. XHTML documents default to 
> utf-8 if not otherwise specified, while HTML (4.01) documents have no 
> default charset.


http://www.w3.org/TR/xhtml-media-types/

"Authors should also be careful about character encoding issues. A 
typical misunderstanding is that since an XHTML document is an XML 
document, the character encoding of an XHTML document should be treated 
as UTF-8 or UTF-16 in the absence of an explicit character encoding 
information. This is *NOT* the case when an XHTML document is served as 
'text/html'."


/Anders



Re: [WSG] Encoding, charsets and entities...

2005-06-15 Thread Dejan Kozina

Hi Roberto.
As long as you can input the characters directly, utf-8 is a big 
time-saver. It makes for more readable code to boot. Since the demise of 
NN4 it has been supported by every browser around.
If you use a web-based form to submit content and the page is declared 
as utf-8, you can copy and paste at will into the form and the browser 
will be happy to take care of the conversion. Beats writing pagefuls of 
&#xxx; any time.
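
To make the comparison with &#xxx; concrete, here is a small sketch in Python (my choice for illustration, not something from this thread). The helper names to_char_refs() and from_char_refs() are hypothetical. It shows the same text written once as raw UTF-8 and once as ASCII with numeric character references, and that markup characters such as < and & still need escaping whatever the encoding is.

```python
import html


def to_char_refs(text: str) -> bytes:
    """Hypothetical helper: ASCII bytes, with every non-ASCII character
    replaced by a decimal numeric character reference (&#NNN;)."""
    return text.encode("ascii", "xmlcharrefreplace")


def from_char_refs(data: bytes) -> str:
    """Hypothetical helper: turn character references back into characters."""
    return html.unescape(data.decode("ascii"))


text = "caf\u00e9 costs 3 \u20ac"    # e-acute and a euro sign

as_utf8 = text.encode("utf-8")       # characters typed directly, saved as UTF-8
as_refs = to_char_refs(text)         # what an entity-writing editor would emit

print(as_utf8)
print(as_refs)
print(from_char_refs(as_refs) == text)

# Characters that are part of the markup syntax are a separate matter:
# they are escaped in text content regardless of the charset.
print(html.escape('a < b & "c"'))
```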
The first place a browser should look for the encoding declaration is 
the HTTP headers sent before the document itself ('Content-Type: 
text/html (or whatever); charset=utf-8'). If you're using Apache you may 
add an 'AddDefaultCharset utf-8' line to your .htaccess.
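
As an illustration of reading that header, a minimal Python sketch follows. It uses the standard library's email.message.Message to parse the value; the function name charset_from_content_type() is hypothetical, and this is only a sketch of extracting the charset parameter, not a description of what any particular browser does.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Hypothetical helper: return the charset parameter of a
    Content-Type header value in lower case, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


for value in [
    "text/html;charset=utf-8",
    "text/html; charset=ISO-8859-1",
    "text/html",                      # no charset declared in the header
]:
    print(value, "->", charset_from_content_type(value))
```

When the function returns None the header says nothing about the encoding, which is the case where a declaration inside the document starts to matter.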
The encoding declaration in the XML prolog is required only if you use 
an encoding that's not utf-8 or utf-16. XHTML documents default to utf-8 
if not otherwise specified, while HTML (4.01) documents have no default 
charset.
You may want to declare the charset inside the document too (with a <meta http-equiv> tag), just in case somebody saves it to disk.


--
Dejan Kozina
Dolina 346 (TS) - I-34018 Italy
tel./fax: +39 040 228 436 - cell.: +39 348 7355 225
http://www.kozina.com/  - e-mail: [EMAIL PROTECTED]