[WSG] character sets

2004-11-30 Thread Jonathan T. Sage
Hello - 

  Lately, with all the discussion over UTF charsets, I've been
thinking, and I doubt that I'm alone, so I will post here in the hopes
of creating a nice, informative thread that we can all reference back
to in the future.

I administer a site that is written in English, will never be written
in anything other than English (for the foreseeable future), and
contains very few, if any, special characters (perhaps the very
infrequent accent).

My question is: what are the possible advantages or disadvantages of
serving these pages as UTF-8 instead of ISO-whatever?

I suppose I'm looking for a nice set of resources along the lines of
the many accessibility sites that answer the "why should I care?"
question up front.

Any info would be great, thanks!

~j

-- 
Jonathan T. Sage
Theatrical Lighting / Set Designer
Professional Web Design

[HTTP://www.JTSage.com]
[HTTP://design.JTSage.com]
[EMAIL PROTECTED]
**
The discussion list for http://webstandardsgroup.org/

 See http://webstandardsgroup.org/mail/guidelines.cfm
 for some hints on posting to the list & getting help
**



Re: [WSG] character sets

2004-11-30 Thread Kornel Lesinski
In UTF-8 files you can use extra characters in their natural form
instead of HTML entities - like &nbsp;, &shy;, &mdash; and &ndash;. You may
also use typographic quotes, ellipses, etc.
They take less space and are safer for server-side string manipulation.
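A minimal Python sketch of the size argument: it compares each entity (stored as ASCII) with the same character encoded directly as UTF-8. The `truncate` helper is hypothetical and slices by characters; the "safer" claim assumes character-aware string functions, since a byte-oriented function (such as PHP4's `substr`) can still split a multibyte UTF-8 sequence.

```python
# Each character written as an HTML entity versus stored directly in UTF-8.
ENTITIES = {
    "&nbsp;": "\u00a0",    # no-break space
    "&shy;": "\u00ad",     # soft hyphen
    "&mdash;": "\u2014",   # em dash
    "&ndash;": "\u2013",   # en dash
    "&hellip;": "\u2026",  # horizontal ellipsis
}


def size_report():
    """Return (entity, entity_bytes, utf8_bytes) for each character above."""
    return [
        (entity, len(entity.encode("ascii")), len(char.encode("utf-8")))
        for entity, char in ENTITIES.items()
    ]


def truncate(text, limit):
    """Hypothetical naive truncation by character count (Python str slicing)."""
    return text[:limit]


if __name__ == "__main__":
    for entity, entity_len, utf8_len in size_report():
        print(f"{entity:<9} entity: {entity_len} bytes, UTF-8: {utf8_len} bytes")
    # The entity version can be cut into a meaningless fragment ("Hello&md"),
    # while the literal character is kept or dropped as a whole.
    # ascii() keeps the output printable on any console encoding.
    print(ascii(truncate("Hello&mdash;world", 8)))
    print(ascii(truncate("Hello\u2014world", 8)))
```

For example, `&mdash;` is 7 bytes while the literal em dash is 3 bytes in UTF-8.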

You don't have to worry about copying and pasting from other sources (MS
Word produces quotes and dashes that, strictly speaking, are not part of
ISO-8859-1).
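A small Python sketch of why Word's punctuation clashes with ISO-8859-1: the curly quotes and dashes exist in Windows-1252 (bytes 0x80 to 0x9F) and in UTF-8, but not in ISO-8859-1, where those byte values are control characters. The sample string and helper names are illustrative assumptions.

```python
# Curly quotes, an en dash and a curly apostrophe, as Word's AutoFormat
# typically inserts them (illustrative sample text).
WORD_TEXT = "\u201cSmart\u201d quotes \u2013 Word\u2019s style"


def fits_charset(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def misread_as_latin1(text):
    """Encode as Windows-1252 (the usual Western Windows codepage), then
    decode as ISO-8859-1: quotes and dashes become C1 control characters."""
    return text.encode("cp1252").decode("iso-8859-1")
```

With UTF-8 the question disappears: `fits_charset(WORD_TEXT, "utf-8")` is always true, because UTF-8 can encode every Unicode character.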

Foreign names are preserved, even those with letters outside ISO-8859-1
(such as the ń in Lesiński).

There are problems, though. Many editors claim to support UTF-8 but
internally convert strings to the system codepage, so they may lose
characters that are not present in the current codepage.
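A minimal Python sketch of that data loss, assuming an editor that converts text to the system codepage and back (the function names are hypothetical). With a Western European Windows codepage (cp1252), the Polish ń becomes "?"; a Central European codepage (cp1250) happens to keep it, which is why the loss depends on the machine's settings.

```python
def roundtrip_through_codepage(text, codepage="cp1252"):
    """Simulate an editor that stores text in the system codepage internally.

    Characters the codepage cannot represent are replaced with '?', so they
    are lost when the file is later saved back as UTF-8.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


def lost_characters(text, codepage="cp1252"):
    """List the characters in text that the given codepage cannot hold."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost
```

For example, `roundtrip_through_codepage("Lesi\u0144ski")` returns `"Lesi?ski"`.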

As I've mentioned in another post, Notepad, ASP Web Matrix and probably
other Microsoft text editors insert an invisible BOM (byte order mark)
character at the start of the file to mark it as UTF-8. This character
prevents the DOCTYPE or XML prolog from being recognized, and makes output
buffering useless in PHP4, because the BOM is sent to the browser before
the script starts running.
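A minimal Python sketch for finding and removing that BOM, which is the three bytes EF BB BF at the very start of the file. The helper names are hypothetical; `codecs.BOM_UTF8` is the standard-library constant for those bytes.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, leaving everything else untouched."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

When reading such files in Python, the `"utf-8-sig"` codec also drops a leading BOM while decoding.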

If you rely heavily on UTF-8 characters (most notably the soft hyphen), you
need to check whether the browser can handle them: check the Accept-Charset
header, but serve UTF-8 to IE anyway, because it sends meaningless headers.
If a browser (or bot) can't handle UTF-8, you need to convert the output.
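A rough Python sketch of that negotiation, assuming a simplified reading of the HTTP/1.1 Accept-Charset rules (no header means any charset is acceptable; an explicit `utf-8` entry overrides `*`; q=0 means "not acceptable"). The helper names and the "MSIE" user-agent test are hypothetical. The fallback conversion keeps ISO-8859-1 and turns everything outside it into numeric character references.

```python
def _parse_accept_charset(header):
    """Map each charset name (lower-cased) in an Accept-Charset header to its q-value."""
    prefs = {}
    for item in header.split(","):
        parts = item.strip().split(";")
        name = parts[0].strip().lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[name] = q
    return prefs


def accepts_utf8(accept_charset, user_agent=""):
    """Hypothetical check: may this client be sent UTF-8?"""
    # Serve UTF-8 to IE regardless of what its Accept-Charset says.
    if "MSIE" in user_agent:
        return True
    # HTTP/1.1: no Accept-Charset header means any charset is acceptable.
    if not accept_charset:
        return True
    prefs = _parse_accept_charset(accept_charset)
    if "utf-8" in prefs:
        return prefs["utf-8"] > 0
    return prefs.get("*", 0) > 0


def encode_for_client(html, accept_charset, user_agent=""):
    """Return (body_bytes, charset) suited to the client."""
    if accepts_utf8(accept_charset, user_agent):
        return html.encode("utf-8"), "utf-8"
    # Fallback: ISO-8859-1, with other characters as references like &#8212;
    return html.encode("iso-8859-1", errors="xmlcharrefreplace"), "iso-8859-1"
```

The returned charset is what the server would then put in the `Content-Type` header, so the declared encoding always matches the bytes actually sent.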

--
regards, Kornel Lesiński


Re: [WSG] character sets

2004-11-30 Thread berry

Kornel Lesiński is completely right: that was my problem, a hidden character.
But using charsets sometimes seems a nightmare, and there is so much
information about them that in the end you ask yourself whether what you
understood is correct.

I tried to analyse charsets and came up with the following points. Maybe
you will be interested in commenting on them.

1- The server is the most important thing. If the server is set up correctly
for your language (that is, it sends the right charset in the HTTP
Content-Type header), you have no problems even if you forget to declare
the charset or the content language in the page.
Example:
http://www3.sympatico.ca/berryf/accent.htm

2- The client browser takes precedence over the server, even if you have
declared the right charset. For example, if you use UTF-8 or ISO-8859-1 and
the browser's encoding is set to Baltic or some other language, your
accented characters will disappear.

3- If the server is not set up for the right language, declaring the
correct charset and the correct content language in your HTML page will
give you excellent results without having to use numeric character
references such as &#233;.


4- Using ISO-8859-1 with literal characters such as é, à, û gives you an
SGML character error in the W3C validator; this error disappears when you
validate your page online.
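A simplified Python model of points 1 to 3, assuming the precedence order user override, then the HTTP Content-Type charset, then a `<meta>` declaration, then a default. HTML 4.01 does give the HTTP header priority over `<meta>`; the ISO-8859-1 default, the regex, and the helper names are assumptions, and real browsers also apply their own heuristics. `render` shows what happens when the wrong charset wins: UTF-8 bytes for "é" read as ISO-8859-1 come out as "Ã©".

```python
import re
from email.message import Message

# Rough pattern for <meta ... charset=xxx>; an assumption, not a full HTML parser.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def effective_charset(content_type=None, html=b"", user_override=None,
                      default="iso-8859-1"):
    """Simplified model of how a browser picks the charset for a page."""
    # Point 2: an encoding the user forces in the browser wins over everything.
    if user_override:
        return user_override.lower()
    # Point 1: the charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        header_charset = msg.get_content_charset()  # lower-cased, or None
        if header_charset:
            return header_charset
    # Point 3: a <meta> declaration inside the page itself.
    match = META_CHARSET.search(html)
    if match:
        return match.group(1).decode("ascii").lower()
    # Nothing declared anywhere: fall back to an assumed default.
    return default


def render(raw, charset):
    """Decode page bytes as the browser would, replacing undecodable bytes."""
    return raw.decode(charset, errors="replace")
```

For example, a page served as `text/html; charset=ISO-8859-1` but containing `<meta charset="utf-8">` is treated as ISO-8859-1 in this model, because the header wins.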

Finally, why all these pages and pages about charsets if charsets can be
explained in a few lines? There is too much information in our world, and
the W3C should think about reorganizing its site into two parts: (1) for
users and programmers, and (2) for advanced information on computer science.
Sometimes navigating the W3C site gives me headaches.

Regards

Berry