RE: [WSG] Encoding, charsets and entities...
Hi Roberto,

I think this may answer many of your questions:
http://www.w3.org/International/tutorials/tutorial-char-enc/

RI

Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/
W3C Internationalization: http://www.w3.org/International/
Publication blog: http://people.w3.org/rishida/blog/

> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Roberto Gorjão
> Sent: 15 June 2005 10:27
> To: wsg@webstandardsgroup.org
> Subject: [WSG] Encoding, charsets and entities...
>
> Hi,
>
> I’m trying to understand the pros and cons of different charset
> encodings and I would like to know what your experience tells you
> about this subject, notably:
>
> * Unicode encoding (UTF-8) seems to be more efficient than ISO
>   charsets (iso-8859-1): it covers all languages in a single
>   encoding; it’s universal (or at least getting to be); it’s
>   compatible with ASCII; some even argue that it’s quicker… Are
>   there any drawbacks? Does the fact that Unicode characters may
>   have different sizes affect string calculations in JavaScript?
>   String lengths, character position retrieval and so on?
> * Where does the use of UTF leave us with regard to entities? Some
>   say that we don’t have to worry anymore about coding currency
>   symbols or accented letters… Is that true? (I never really paid
>   much attention to this matter and got used to seeing Dreamweaver
>   automatically encode all the accented letters that I insert in the
>   design tab (that’s almost the only reason why I use the design tab
>   nowadays…), but I think I would definitely switch to much cheaper
>   software if even this functionality turns out to be useless.) And
>   what about quotation marks and less-than and greater-than signs?
>   They seem to validate all right when inserted directly in the code
>   without any kind of special entity coding.
> * Which is the best way to declare it? I’ve noticed that the
>   webstandardsgroup.org page declares it only in the XML “prolog”
>   and does not use any meta tag to do it, as for instance the
>   Unicode.org page does.
>
> Thank you.
>
> Roberto

**
The discussion list for http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
**
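On Roberto's question about string lengths: JavaScript strings are sequences of UTF-16 code units regardless of the byte encoding the page was served in, so choosing UTF-8 over iso-8859-1 does not change what `length` reports for accented letters or currency symbols. The following is a minimal Python sketch of the three different "sizes" involved; the function names are made up for illustration, and the UTF-16 count is only an emulation of what a JavaScript engine would report.

```python
def code_points(text: str) -> int:
    """Number of Unicode code points (what Python's len() reports)."""
    return len(text)


def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units, the unit JavaScript's String.length counts."""
    # Each UTF-16 code unit is two bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when stored or sent as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "plain ASCII": "Gorjao",
        "accented letter": "Gorj\u00e3o",  # a with tilde, U+00E3
        "euro sign": "\u20ac",             # U+20AC
        "musical G clef": "\U0001d11e",    # U+1D11E, outside the Basic Multilingual Plane
    }
    for label, text in samples.items():
        print(
            f"{label:16} code points={code_points(text)} "
            f"UTF-16 units={utf16_code_units(text)} "
            f"UTF-8 bytes={utf8_bytes(text)}"
        )
```

The byte count varies with the encoding, but the script-visible length only differs from the number of characters for code points beyond U+FFFF, which take two UTF-16 code units each.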
Re: [WSG] Encoding, charsets and entities...
Dejan Kozina wrote:

> The encoding declaration in the XML prolog is required only if you
> use an encoding that's not utf-8 or utf-16. XHTML documents default
> to utf-8 if not otherwise specified, while HTML (4.01) documents have
> no default charset.

http://www.w3.org/TR/xhtml-media-types/

"Authors should also be careful about character encoding issues. A typical misunderstanding is that since an XHTML document is an XML document, the character encoding of an XHTML document should be treated as UTF-8 or UTF-16 in the absence of an explicit character encoding information. This is *NOT* the case when an XHTML document is served as 'text/html'."

/Anders
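The distinction Anders quotes can be sketched as a small decision rule keyed on the Content-Type header. This is a simplified illustration, not a description of what any particular browser does: the function name and the set of XML media types are assumptions, and the sketch ignores byte-order marks, the XML declaration and meta tags inside the document.

```python
from email.message import Message
from typing import Optional

# Media types treated as XML in this sketch (an assumption for illustration).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml"}


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset implied by a Content-Type header value, if any.

    - An explicit charset parameter always wins.
    - An XML media type without one falls back to the XML default, utf-8.
    - text/html without one gets no default from this function (None).
    """
    message = Message()
    message["Content-Type"] = content_type

    explicit = message.get_content_charset()  # lower-cased, or None if absent
    if explicit:
        return explicit
    if message.get_content_type() in XML_MEDIA_TYPES:
        return "utf-8"
    return None


if __name__ == "__main__":
    for header in (
        "text/html; charset=ISO-8859-1",
        "application/xhtml+xml",
        "text/html",
    ):
        print(f"{header!r} -> {charset_from_content_type(header)!r}")
```

The third case is the one the quoted note warns about: an XHTML document sent as text/html with no charset parameter cannot rely on the XML default.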
Re: [WSG] Encoding, charsets and entities...
Hi Roberto.

As long as you can input the characters directly, utf-8 is a big time-saver. It makes for more readable code to boot. Since the demise of NN4 it has been supported by all browsers around. If you use a web-based form to submit content and the page is declared as utf-8, you can copy and paste at will into the form and the browser will be happy to take care of the conversion. Beats writing pagefuls of &#xx; any time.

The first place a browser should look for the encoding declaration is the HTTP headers sent before the document itself ('Content-Type: text/html (or whatever);charset=utf-8'). If you're using Apache you may add an 'AddDefaultCharset utf-8' to your .htaccess.

The encoding declaration in the XML prolog is required only if you use an encoding that's not utf-8 or utf-16. XHTML documents default to utf-8 if not otherwise specified, while HTML (4.01) documents have no default charset.

You may want to declare the charset inside the document too (with a <meta http-equiv> tag), just in case somebody saves it to the disk.

-- 
Dejan Kozina
Dolina 346 (TS) - I-34018 Italy
tel./fax: +39 040 228 436 - cell.: +39 348 7355 225
http://www.kozina.com/ - e-mail: [EMAIL PROTECTED]
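On the entities question, Dejan's point about typing characters directly can be illustrated with a short Python sketch. It assumes two hypothetical helpers: one for a page served as utf-8, where only markup-significant characters are escaped, and one for a page restricted to ASCII, where every other character becomes a numeric character reference. Both rely only on the standard library.

```python
import html


def escape_for_utf8_page(text: str) -> str:
    """Escape only the markup-significant characters: &, <, > and quotes.

    Accented letters and currency symbols are left as they are, because a
    utf-8 page can carry them directly.
    """
    return html.escape(text, quote=True)


def escape_for_ascii_page(text: str) -> str:
    """Escape markup characters, then turn every non-ASCII character into a
    decimal numeric character reference such as &#227;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "Gorj\u00e3o < 5 \u20ac"  # a with tilde, less-than sign, euro sign
    # Print ASCII-safe representations so the output does not depend on the terminal.
    print(ascii(escape_for_utf8_page(sample)))
    print(escape_for_ascii_page(sample))
```

In both cases the less-than sign still has to be written as `&lt;` wherever it appears as text content; what utf-8 removes is the need for references to ordinary letters and symbols.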