RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Richard Ishida
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Gene Falck
> Sent: 19 April 2005 18:49
...
> Does anyone have a good quick reference as to which 
> characters are "good" on UTF-8? How about a faster or easier 
> way to type them in? 

FWIW you may find this useful for Latin characters:
http://people.w3.org/rishida/scripts/pickers/latin/

See http://people.w3.org/rishida/scripts/pickers/ for explanations and other
scripts.

RI

**
The discussion list for  http://webstandardsgroup.org/

 See http://webstandardsgroup.org/mail/guidelines.cfm
 for some hints on posting to the list & getting help
**



Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Gene Falck
Hi Dean,

You wrote:

> ... Norwenglish lines of text into numeric entities
> (UTF-8) where needed.
>
> What characters need encoding into numeric entities when using UTF-8?
> I try to avoid entities with exception for & < > " '

It is a small nuisance, of course. I do use them
when I type (US English qwertyuiop keyboard) as I
usually don't have a place to copy and paste. It
does work quite well, though, when I copy and
paste something I used entities or the numeric
codes for into Outlook at work. Mostly at work I
use a degree sign or a plus/minus sign but there
is a lot to cover for foreign place and personal
names that is not on my keyboard.

> You're right. If you're using UTF-8 you only need to encode
> the characters that are special in HTML/XHTML/XML (&, < and >).
> Using numeric entities (or even named entities) in a UTF-8 file
> for characters that are outside the range of ASCII is usually
> a waste of space.

Does anyone have a good quick reference as to which
characters are "good" on UTF-8? How about a faster or
easier way to type them in? I wasn't aware (until
this thread) that there was enough space for place
name and personal name non-English characters in the
UTF-8 standard.

Regards,
Gene Falck
[EMAIL PROTECTED]
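
On the question of whether UTF-8 has room for such characters: UTF-8 can encode every Unicode code point, using between one and four bytes each. The following is only a minimal Python sketch to illustrate that; the function name `utf8_report` and the sample characters are made up for this example and do not come from the thread.

```python
import unicodedata

def utf8_report(text):
    """Return (character, code point, Unicode name, UTF-8 byte count) per character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")  # UTF-8 can encode any Unicode code point
        name = unicodedata.name(ch, "UNKNOWN")
        rows.append((ch, f"U+{ord(ch):04X}", name, len(encoded)))
    return rows

# Sample characters chosen for illustration: a degree sign, a plus/minus
# sign, and a few letters that are not on a US English keyboard.
for row in utf8_report("°±åńЯ中"):
    print(row)
```

Characters in the ASCII range take one byte; the accented Latin and Cyrillic letters in the sample take two, and the Chinese character takes three.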


RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Richard Ishida
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Dean Jackson
> Sent: 19 April 2005 17:12
...

> > I try to avoid entities with exception for & < > " '
> 
> You're right. If you're using UTF-8 you only need to encode 
> the characters that are special in HTML/XHTML/XML (&, < and >).
> Using numeric entities (or even named entities) in a UTF-8 
> file for characters that are outside the range of ASCII is 
> usually a waste of space.
> 
> The only time I use them is when I'm on a keyboard/system 
> where I don't know how to enter the character, such as "å". 
> I'd type &#229; in this case.
> 
> PS. Hopefully the W3C i18n guru Richard is listening and will 
> tell everyone if I'm wrong.

Hi Dean. I'd hesitate to say anyone was right or wrong here, but I'm of the
same opinion, albeit with one small exception.  I think in UTF-8
NCRs/entities beyond the ASCII range can be useful for invisible characters
(such as LRM in Arabic/Hebrew) or ambiguous characters (such as non-breaking
space - which looks like an ordinary space).

Tee mentioned some issues with Chinese characters on IE Mac that I haven't
got to the bottom of yet, but I don't recall encountering any other problems
that could be solved by using escapes instead.

For a fuller version of my opinion see the slides starting at
http://www.w3.org/International/tutorials/tutorial-char-enc/en/all.html#Slide0440

RI
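
The exception described above (escaping only invisible or ambiguous characters and leaving everything else as plain UTF-8) could be sketched in Python roughly as follows. The character set `ESCAPE_AS_NCR` and the function name `escape_invisible` are assumptions made for this illustration, not a list recommended in the thread.

```python
# Hypothetical selection of characters that are invisible, or that look
# like an ordinary space, when viewed in a text editor.
ESCAPE_AS_NCR = {
    "\u00A0",  # NO-BREAK SPACE (looks like an ordinary space)
    "\u200E",  # LEFT-TO-RIGHT MARK (invisible)
    "\u200F",  # RIGHT-TO-LEFT MARK (invisible)
}

def escape_invisible(text):
    """Replace only the listed characters with hexadecimal NCRs."""
    return "".join(
        f"&#x{ord(ch):X};" if ch in ESCAPE_AS_NCR else ch
        for ch in text
    )

print(escape_invisible("10\u00A0km til Tromsø"))
```

Visible non-ASCII letters such as "ø" pass through untouched, so the source stays readable while the hard-to-see characters become explicit.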




Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-19 Thread Dean Jackson
On 19 Apr 2005, at 02:36, Anders Nawroth wrote:

> > HTMLTidy is the only useful piece of software I've found for web page
> > development, and I use it to clean up my pages and get proper encoding
> > of my Norwenglish lines of text into numeric entities (UTF-8) where
> > needed.
>
> What characters need encoding into numeric entities when using UTF-8?
>
> I try to avoid entities with exception for & < > " '

You're right. If you're using UTF-8 you only need to encode
the characters that are special in HTML/XHTML/XML (&, < and >).
Using numeric entities (or even named entities) in a UTF-8 file
for characters that are outside the range of ASCII is usually
a waste of space.

The only time I use them is when I'm on a keyboard/system where
I don't know how to enter the character, such as "å". I'd type
&#229; in this case.

PS. Hopefully the W3C i18n guru Richard is listening and will
tell everyone if I'm wrong.

PPS. This is a good test to see if the WSG mail system can
handle UTF-8 (assuming Apple Mail encodes the message using it).

Dean
--
dean jackson
world wide web consortium (w3c) - http://www.w3.org/
mailto:[EMAIL PROTECTED]
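
As a rough illustration of the point above (in a UTF-8 document only the markup-special characters need escaping), here is a minimal Python sketch using the standard library's `html.escape`. The wrapper name `escape_markup_only` and the sample string are made up for this example.

```python
from html import escape

def escape_markup_only(text):
    """Escape &, < and > for element content; leave non-ASCII characters as-is."""
    # quote=False leaves " and ' alone, which is enough for element content.
    # For attribute values, call escape() with its default quote=True instead.
    return escape(text, quote=False)

print(escape_markup_only("Blåbær & <fløte>"))
```

The accented letters are written out as ordinary UTF-8 text; only the ampersand and angle brackets are turned into references.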


Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-18 Thread Kornel Lesinski
On Mon, 18 Apr 2005 18:10:44 +0100, Paul Menard <[EMAIL PROTECTED]> wrote:

> Just curious what tidy parameters you are using. I have some European
> (Polish, Czech, Russian) language sites I'm working on and would prefer
> to convert the UTF-8 to some numeric equivalent for certain high-range
> letters.

I couldn't get Tidy to properly transcode non-Latin-1 encodings, so I use it
with the -raw option, which at least prevents it from ruining documents.

Conversion is as easy as copy and paste: get the text displayed properly,
copy it, and paste it into a UTF-8-capable editor.
--
regards, Kornel Lesiński
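
For anyone who does want the numeric form asked about above, one possible approach outside Tidy is Python's built-in `xmlcharrefreplace` error handler, which replaces every non-ASCII character with a decimal numeric character reference. This is only a sketch of an alternative, not what anyone in the thread used; the function name `to_ncrs` is hypothetical.

```python
def to_ncrs(text):
    """Replace every non-ASCII character with a decimal NCR such as &#321;."""
    # ASCII characters, including &, < and >, are left unchanged, so text
    # containing markup-special characters still needs separate escaping.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_ncrs("Łódź"))  # Polish place name, chosen for illustration
```

The result is pure ASCII, which survives systems that mishandle UTF-8, at the cost of larger and less readable source.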


Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)

2005-04-18 Thread Paul Menard
Just curious what tidy parameters you are using. I have some European
(Polish, Czech, Russian) language sites I'm working on and would prefer
to convert the UTF-8 to some numeric equivalent for certain high-range
letters.

Paul
--- Anders Nawroth <[EMAIL PROTECTED]> wrote:
> 
> > HTMLTidy is the only useful piece of software I've found for web page
> > development, and I use it to clean up my pages and get proper encoding
> > of my Norwenglish lines of text into numeric entities (UTF-8) where 
> > needed.
> 
> What characters need encoding into numeric entities when using UTF-8?
> 
> I try to avoid entities with exception for & < > " '
> 
> /anders (Sweden)