OT Re: [WSG] UTF-8

2005-04-19 Thread Jan Brasna
> PPS. This is a good test to see if the WSG mail system can
> handle UTF-8

AFAIK å is a Latin1 character (Scandinavian), so no need for UTF here.
--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com
**
The discussion list for  http://webstandardsgroup.org/
See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
**


RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Richard Ishida
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dean Jackson
> Sent: 19 April 2005 17:12
...

> > I try to avoid entities with exception for '
>
> You're right. If you're using UTF-8 you only need to encode
> the characters that are special in HTML/XHTML/XML (<, > and &).
> Using numeric entities (or even named entities) in a UTF-8
> file for characters that are outside the range of ASCII is
> usually a waste of space.
>
> The only time I use them is when I'm on a keyboard/system
> where I don't know how to enter the character, such as å.
> I'd type &aring; in this case.
>
> PS. Hopefully the W3C i18n guru Richard is listening and will
> tell everyone if I'm wrong.

Hi Dean. I'd hesitate to say anyone was right or wrong here, but I'm of the
same opinion, albeit with one small exception.  I think in UTF-8
NCRs/entities beyond the ASCII range can be useful for invisible characters
(such as LRM in Arabic/Hebrew) or ambiguous characters (such as non-breaking
space - which looks like an ordinary space).
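
The approach described above can be sketched in Python. This is an illustration only, not code from the thread: the helper name `escape_for_utf8_page` and the particular set of characters treated as invisible are assumptions made for the example. Ordinary non-ASCII letters are left alone, the markup characters are escaped, and invisible or ambiguous characters are written as numeric character references.

```python
import html

# Characters that are easy to lose or misread in source text.
# This particular set is an assumption chosen for the example.
INVISIBLE = {
    "\u00a0",  # NO-BREAK SPACE
    "\u200e",  # LEFT-TO-RIGHT MARK (LRM)
    "\u200f",  # RIGHT-TO-LEFT MARK (RLM)
}


def escape_for_utf8_page(text):
    """Escape <, > and &, and write invisible characters as hex NCRs."""
    escaped = html.escape(text, quote=False)
    return "".join(
        "&#x{:X};".format(ord(ch)) if ch in INVISIBLE else ch
        for ch in escaped
    )


# "bl\u00e5" is "blå"; the no-break space sits between "5" and "kr".
print(escape_for_utf8_page("bl\u00e5 < 5\u00a0kr"))
```

The å passes through untouched, while the no-break space becomes visible in the source as `&#xA0;`.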

Tee mentioned some issues with Chinese characters on IE Mac that I haven't
got to the bottom of yet, but I don't recall encountering any other problems
that could be solved by using escapes instead.

For a fuller version of my opinion see the slides starting at
http://www.w3.org/International/tutorials/tutorial-char-enc/en/all.html#Slide0440

RI




RE: OT Re: [WSG] UTF-8

2005-04-19 Thread Richard Ishida
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Jan Brasna
> Sent: 19 April 2005 17:29
> To: wsg@webstandardsgroup.org
> Subject: OT Re: [WSG] UTF-8
>
> > PPS. This is a good test to see if the WSG mail system can handle
> > UTF-8
>
> AFAIK å is Latin1 character (Scandinavian), so no need for UTF here.

Yes, but the bytes used in ISO 8859-1 (Latin1) or Windows code page and
those used for UTF-8 are different.  In Latin1 encoding å is a single byte:
E5; whereas UTF-8 represents this as two bytes: C3 A5.  So the fact that you
are seeing it indicates that the system recognised the Unicode encoding :-)
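
The byte values quoted above are easy to check. A minimal Python sketch follows; the helper name `hex_bytes` is made up for the example.

```python
def hex_bytes(data):
    """Format a bytes object as space-separated uppercase hex pairs."""
    return " ".join("{:02X}".format(b) for b in data)


char = "\u00e5"  # å, LATIN SMALL LETTER A WITH RING ABOVE

print(hex_bytes(char.encode("iso-8859-1")))  # one byte in Latin1
print(hex_bytes(char.encode("utf-8")))       # two bytes in UTF-8
```

The first line prints `E5` and the second prints `C3 A5`, matching the values given in the message.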

RI


PS: You may find my Unicode converter a useful play tool for this kind of
thing.  It's a bit rough and ready, but it's useful.
http://people.w3.org/rishida/scripts/uniview/conversion.en.html


Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/ 

W3C Internationalization:
http://www.w3.org/International/ 

Publication blog:
http://people.w3.org/rishida/blog/
 
 
 




Re: OT Re: [WSG] UTF-8

2005-04-19 Thread Jan Brasna
> Yes, but the bytes used in ISO 8859-1 (Latin1) or Windows code page and
> those used for UTF-8 are different.

Sure, however the mail came in Latin1 (see the headers), so I just want
to note that it won't show the difference.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com


Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Gene Falck
Hi Dean,

You wrote:

> > > ... Norwenglish lines of text into numeric entities
> > > (UTF-8) where needed.
> >
> > What characters need encoding into numeric entities when using UTF-8?
> > I try to avoid entities with exception for '

It is a small nuisance, of course. I do use them
when I type (US English qwertyuiop keyboard) as I
usually don't have a place to copy and paste. It
does work quite well, though, when I copy and
paste something I used entities or the numeric
codes for into Outlook at work. Mostly at work I
use a degree sign or a plus/minus sign but there
is a lot to cover for foreign place and personal
names that is not on my keyboard.

> You're right. If you're using UTF-8 you only need to encode
> the characters that are special in HTML/XHTML/XML (<, > and &).
> Using numeric entities (or even named entities) in a UTF-8 file
> for characters that are outside the range of ASCII is usually
> a waste of space.

Does anyone have a good quick reference as to which
characters are good on UTF-8? How about a faster or
easier way to type them in? I wasn't aware (until
this thread) that there was enough space for place
name and personal name non-English characters in the
UTF-8 standard.

Regards,
Gene Falck
[EMAIL PROTECTED]


Re: [WSG] UTF-8

2005-04-19 Thread Gunlaug Sørtun
Dean Jackson wrote:
> The only time I use them is when I'm on a keyboard/system where I
> don't know how to enter the character, such as å. I'd type &aring;
> in this case.
>
> PS. Hopefully the W3C i18n guru Richard is listening and will tell
> everyone if I'm wrong.

I'll second that...
Can someone actually _underwrite_ some real facts on this issue? Facts
that are global and cross-browser enough to be of any real use?
This is one of the few things I'd rather not leave for my visitors to
mess up. I can create a much larger mess at my end.
- Half of the Norwegian sites I visit in a day are full of
question-marks--until I actively change encoding, and change it again,
and again...
- Not uncommon problem on EN-US sites either btw, so something isn't
working too well. &#8212; seems to come out a lot more predictable than
most alternatives I see daily. Euro-signs are ok, but they don't look as
if they belong in most sentences they appear in.
- No problem to hit it right, but right isn't the same on all sites,
and the facts I have found on this issue are often discarded by the
next fact-sheet I find. If I'm confused, then so are many regular
web-surfers.
- Regular ASCII characters (up to 127) aren't the problem. It's the 128 - 255
range that messes it up at my end. So I prefer &aring; or &#229; instead
of å before my pages are released into the wild, as that'll make at
least the few extra characters we Norwegians need come out right
regardless of what encoding _my_ browsers are set to.
Above the basic 8bits ASCII we either need the right encoding-map or the
right multiple of it so it becomes Universal. A proper - universal -
converter would be nice...
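
For what it's worth, the conversion described above (everything beyond ASCII written as a numeric entity) can be sketched with Python's built-in `xmlcharrefreplace` error handler. The function name `to_ascii_with_ncrs` is invented for the example, and this is not what Tidy does internally, only an illustration of the idea.

```python
def to_ascii_with_ncrs(text):
    """Return ASCII-only text; every character above 127 becomes a decimal NCR."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# "blåbærsyltetøy" written with escapes so the source itself stays ASCII.
print(to_ascii_with_ncrs("bl\u00e5b\u00e6rsyltet\u00f8y"))
```

Because the result contains only ASCII bytes, it reads the same whether the browser assumes Latin-1, a Windows code page, or UTF-8, which is the predictability being asked for. The cost is larger and less readable source.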
---
Whether I've mixed up encoding-maps and entities in a way I shouldn't,
isn't as important as getting it right at my end. I think I understand
enough about language-maps to be able to stack them together and end up
with a universal one in the end. (I think that's already done btw)
regards
Georg
--
http://www.gunlaug.no


Re: [WSG] UTF-8

2005-04-19 Thread Jan Brasna
> - Half of the Norwegian sites I visit in a day are full of
> question-marks--until I actively change encoding, and change it again,
> and again...

Hmm, we here in CZ use Latin2 or CP1250, everyone uses proper charset
headers, so no problem with this.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com


Re: [WSG] UTF-8

2005-04-19 Thread Gunlaug Sørtun
Jan Brasna wrote:
> > - Half of the Norwegian sites I visit in a day are full of
> > question-marks--until I actively change encoding, and change it
> > again, and again...
>
> Hmm, we here in CZ use Latin2 or CP1250, everyone uses proper charset
> headers, so no problem with this.

You hit one of the usual problems right on the head. Proper charset
headers are often lacking.
(I have some confessions to make on that subject myself - think it's a
human bug that needs fixing :-) )
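
What a missing or wrong charset header looks like to the reader can be reproduced in a few lines of Python. This is only a sketch of the two common mismatches; the sample word is arbitrary.

```python
text = "bl\u00e5b\u00e6r"  # "blåbær"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# UTF-8 bytes read as Latin-1: each accented letter turns into two characters.
as_latin1 = utf8_bytes.decode("iso-8859-1")

# Latin-1 bytes read as UTF-8: the stray bytes are invalid and get replaced
# by U+FFFD, which many browsers show as a question mark.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

The second case is the likely source of the question marks mentioned earlier in the thread: the bytes are fine, but the browser guessed the wrong encoding because nothing told it the right one.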
regards
Georg
--
http://www.gunlaug.no


RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Richard Ishida
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Gene Falck
> Sent: 19 April 2005 18:49
...
> Does anyone have a good quick reference as to which
> characters are good on UTF-8? How about a faster or easier
> way to type them in?

FWIW you may find this useful for Latin characters:
http://people.w3.org/rishida/scripts/pickers/latin/

See http://people.w3.org/rishida/scripts/pickers/ for explanations and other
scripts.
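
On the question of whether UTF-8 has room for such characters: it can encode any Unicode character, using one to four bytes each. A small Python sketch follows; the sample characters and the helper name `utf8_length` are picked arbitrarily for illustration.

```python
import unicodedata


def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# a, o-with-stroke, n-with-acute, Cyrillic ZHE, and a CJK ideograph
for ch in "a\u00f8\u0144\u0416\u4e2d":
    print("U+{:04X} {}: {} byte(s)".format(
        ord(ch), unicodedata.name(ch), utf8_length(ch)))
```

ASCII letters stay at one byte, the accented Latin and Cyrillic letters here take two, and the CJK character takes three.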

RI




[WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-18 Thread Anders Nawroth

> HTMLTidy is the only useful piece of software I've found for web page
> development, and I use it to clean up my pages and get proper encoding
> of my Norwenglish lines of text into numeric entities (UTF-8) where
> needed.

What characters need encoding into numeric entities when using UTF-8?
I try to avoid entities with exception for '
/anders (Sweden)


Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-18 Thread Paul Menard
Just curious what Tidy parameters you are using. I have some European
(Polish, Czech, Russian) language sites I'm working on and would prefer
to convert the UTF-8 to some numeric equivalent for certain high-range
letters.

Paul
--- Anders Nawroth [EMAIL PROTECTED] wrote:

> > HTMLTidy is the only useful piece of software I've found for web page
> > development, and I use it to clean up my pages and get proper encoding
> > of my Norwenglish lines of text into numeric entities (UTF-8) where
> > needed.
>
> What characters need encoding into numeric entities when using UTF-8?
>
> I try to avoid entities with exception for '
>
> /anders (Sweden)



Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-18 Thread Kornel Lesinski
On Mon, 18 Apr 2005 18:10:44 +0100, Paul Menard [EMAIL PROTECTED]  
wrote:

> Just curious what tidy parameters you are using. I have some European
> (Polish, Czech, Russian) language sites I'm working on and would prefer
> to convert the UTF-8 to some numeric equal for certain high-range
> letters.

I couldn't get Tidy to properly transcode non-Latin1 encodings, so I use it
with the -raw option, which at least prevents it from ruining documents.
Conversion is as easy as copy & paste: get the text displayed properly, copy
it, and paste it into a UTF-capable editor.
--
regards, Kornel Lesiński


Re: [WSG] UTF-8

2005-04-18 Thread Jan Brasna
> I have [...] Czech, [...] sites I'm working on and would prefer to convert
> the UTF-8 to some numeric equal for certain high-range letters.

Well, I'd suggest you not do this, as nobody here would do it this
way :) However, it would make maintenance easier for a non-CZ/PL person.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com


Re: [WSG] UTF-8

2005-04-18 Thread Gunlaug Sørtun
Anders Nawroth wrote:
> What characters need encoding into numeric entities when using
> UTF-8?
>
> I try to avoid entities with exception for '

Look for some answers here:
http://www.joelonsoftware.com/articles/Unicode.html
...so I don't have to give incomplete answers about something I'm not an
expert on.
-
My own reasoning is this:
I observed that text on my pages became too dependent on browsers
encode-settings, or their ability to auto-detect. Introducing entities
made the end-result much more predictable, and I haven't encountered a
single problem so far (after 2 years). Maybe others have, but they
haven't told me.
I write all (Norwegian) 8bit characters as plain text, characters above
as numeric entities, and leave the rest to Tidy.
What I get is Latin-1 (ISO-8859-1) with a mixture of decimal and
character entities, which is equivalent to UTF-8 for my characters as
far as I know. Most ASCII-characters are left as they are, but æ,
ø, å are converted.
I don't really know what Tidy will do when fed 8bits code from other
language-maps, but the few times I've copied a character from a language
outside my own 8bit maps and left it to Tidy, it has rendered correctly
in my browsers. It looks like a mess if I don't convert it this way.
-
Someone asked what parameters I use...
My Tidy uses this configuration for converting to XML:
---
quote-marks: true
uppercase-tags: false
fix-backslash: false
literal-attributes: true
numeric-entities: true
output-xml: true
---
That's as much information as I can offer. Maybe someone can add some more.
regards
Georg
--
http://www.gunlaug.no


Re: [WSG] UTF-8 ,charset and Standard

2004-11-30 Thread Patrick H. Lauke
berry wrote:
> I understand that it is not the UTF-8 which gives the ability to render the
> accent on the screen but the language content. <meta
> http-equiv="Content-Language" content="fr">
> which tells the agent to render the accent using the UTF-8

Also don't forget that meta alone is not enough. You need to have your web
server sending out the correct encoding with its headers as well
(if you're using Apache and have .htaccess override support, have a look at
http://www.w3.org/International/questions/qa-htaccess-charset for instance).

> Then why does the validator give an error for each accent when I use UTF-8?
> It says that UTF-8 doesn't recognize this kind of character (French character)

Try it with the correct server headers and the validator should be happy.
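
To make "the correct server headers" concrete, here is a minimal Python WSGI sketch. It is an illustration under assumptions, not the poster's setup: the `app` function, the sample page text, and the `record` stub are all invented for the example. The point is that the `charset` in the `Content-Type` header must match the encoding of the bytes actually sent.

```python
PAGE = "<p>caf\u00e9, \u00e0 bient\u00f4t</p>"  # sample French text


def app(environ, start_response):
    """Minimal WSGI app that declares its encoding in the HTTP header."""
    body = PAGE.encode("utf-8")  # bytes on the wire must match the header
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stub to inspect what a client would receive.
sent = {}


def record(status, headers):
    sent["status"] = status
    sent["headers"] = dict(headers)


chunks = app({}, record)
print(sent["headers"]["Content-Type"])
```

The same function could be served locally with `wsgiref.simple_server.make_server` from the standard library. On Apache, the .htaccess approach in the linked W3C article achieves the same result without any code.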
--
Patrick H. Lauke
_
re·dux (adj.): brought back; returned. used postpositively
[latin : re-, re- + dux, leader; see duke.]
www.splintered.co.uk | www.photographia.co.uk
http://redux.deviantart.com


Re: [WSG] UTF-8 ,charset and Standard

2004-11-30 Thread Kornel Lesinski
UTF-8, a flavour of Unicode, is a universal character set. You don't
define any codepage/language for it. You just simply use whatever
characters you like.

  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This creates a Content-Type header as an HTTP equivalent. The content is
text/html with charset UTF-8.

It would be even better to send a real HTTP header with the charset. In PHP it's:

  <?php header("Content-Type: text/html; charset=UTF-8"); ?>

Note: Don't use Notepad or other Microsoft tools for UTF-8, because they
tend to add an invisible BOM (byte order mark) character at the beginning of
every file. This helps them distinguish UTF-8 from other files, but confuses
many browsers.
I use freeware Notepad2 for UTF-8.
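
If a file has already picked up a BOM, it can be detected and removed afterwards. A small Python sketch follows; the helper name `strip_utf8_bom` is made up for the example.

```python
import codecs


def strip_utf8_bom(data):
    """Remove a leading UTF-8 BOM (bytes EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "<p>\u00e9</p>".encode("utf-8")

print(strip_utf8_bom(with_bom) == "<p>\u00e9</p>".encode("utf-8"))

# When decoding to text, the "utf-8-sig" codec drops the BOM on its own.
print(with_bom.decode("utf-8-sig"))
```

Both approaches leave files without a BOM unchanged, so they are safe to run over a whole directory of pages.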
--
regards, Kornel Lesiński
osiolki.net


[WSG] UTF-8 ,charset and Standard

2004-11-29 Thread berry
Can someone tell me why we use a charset if we still have to write codes
like &#233; in our pages for accented characters? I understand that the
charset lets the browser display the page correctly depending on its
language settings, but it doesn't let the server deliver the page the right
way. Sometimes it seems that computer science is still in the Stone Age.
It upsets me that every time I add text I have to reformat it. I understand
that we can configure the server to serve the text the right way, but we
don't always have that option.

What can we do to keep accents in our HTML pages? And if I am wrong, can
someone tell me why I cannot see the accents on my page when I use the
UTF-8 charset?

Thanks in advance

Berry









Re: [WSG] UTF-8 ,charset and Standard

2004-11-29 Thread XStandard
Hi Berry,

Here is an example of a UTF-8 page with non-escaped French characters:

http://xstandard.com/page.asp?p=18BF64A8-DF0A-473E-8402-50E9E917E0C1

Are you able to see them in your browser?

Regards,
-Vlad
http://xstandard.com
Standards-compliant XHTML WYSIWYG editor




Re: [WSG] UTF-8 ,charset and Standard

2004-11-29 Thread berry
Yes, I am able to see them in my browser. Maybe the server is set up to render
the accents; if not, how come I am not able to see the same thing with my page?

I would be surprised if we had to use XHTML to get accents.

My page uses HTML 4.01 Strict.

Thanks in advance

Berry




Re: [WSG] UTF-8 ,charset and Standard

2004-11-29 Thread berry
Finally I have the answer!

I understand that it is not UTF-8 which gives the ability to render the
accents on the screen but the content language:
<meta http-equiv="Content-Language" content="fr">
which tells the agent to render the accents using UTF-8.

Then why does the validator give an error for each accent when I use UTF-8?
It says that UTF-8 doesn't recognize this kind of character (French characters).


Thank you in Advance

Berry








