RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Gene Falck
> Sent: 19 April 2005 18:49
...
> Does anyone have a good quick reference as to which
> characters are "good" on UTF-8? How about a faster or easier
> way to type them in?

FWIW you may find this useful for Latin characters:
http://people.w3.org/rishida/scripts/pickers/latin/

See http://people.w3.org/rishida/scripts/pickers/ for explanations and other scripts.

RI

**
The discussion list for http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
**
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
Hi Dean,

You wrote:

...
> Norwenglish lines of text into numeric entities (UTF-8) where
> needed.
>
> What characters need encoding into numeric entities when using UTF-8?
> I try to avoid entities with exception for & < > " '

It is a small nuisance, of course. I do use them when I type (US English qwertyuiop keyboard) as I usually don't have a place to copy and paste from. It does work quite well, though, when I copy something I wrote with entities or numeric codes and paste it into Outlook at work. Mostly at work I use a degree sign or a plus/minus sign, but there is a lot to cover for foreign place and personal names that is not on my keyboard.

> You're right. If you're using UTF-8 you only need to encode
> the characters that are special in HTML/XHTML/XML (&, < and >).
> Using numeric entities (or even named entities) in a UTF-8
> file for characters that are outside the range of ASCII is
> usually a waste of space.

Does anyone have a good quick reference as to which characters are "good" on UTF-8? How about a faster or easier way to type them in?

I wasn't aware (until this thread) that there was enough space for place name and personal name non-English characters in the UTF-8 standard.

Regards,

Gene Falck
[EMAIL PROTECTED]
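On the question of whether UTF-8 has room for such characters: UTF-8 can encode every Unicode code point, so there is no separate list of "good" characters. A minimal Python sketch (the helper name `utf8_bytes` is made up for illustration) that prints the UTF-8 bytes for a degree sign, a plus/minus sign, and a few letters from European and Chinese scripts:

```python
def utf8_bytes(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))


# ASCII stays one byte; other characters take two or more bytes each.
for ch in ["A", "°", "±", "å", "Ł", "Ж", "中"]:
    print(f"U+{ord(ch):04X} {ch} -> {utf8_bytes(ch)}")
```

Whether a given character then displays correctly depends on the fonts available to the reader, which is a separate matter from the encoding.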
RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Dean Jackson
> Sent: 19 April 2005 17:12
...
> > I try to avoid entities with exception for & < > " '
>
> You're right. If you're using UTF-8 you only need to encode
> the characters that are special in HTML/XHTML/XML (&, < and >).
> Using numeric entities (or even named entities) in a UTF-8
> file for characters that are outside the range of ASCII is
> usually a waste of space.
>
> The only time I use them is when I'm on a keyboard/system
> where I don't know how to enter the character, such as "å".
> I'd type å in this case.
>
> PS. Hopefully the W3C i18n guru Richard is listening and will
> tell everyone if I'm wrong.

Hi Dean.

I'd hesitate to say anyone was right or wrong here, but I'm of the same opinion, albeit with one small exception. I think in UTF-8 NCRs/entities beyond the ASCII range can be useful for invisible characters (such as LRM in Arabic/Hebrew) or ambiguous characters (such as non-breaking space - which looks like an ordinary space).

Tee mentioned some issues with Chinese characters on IE Mac that I haven't got to the bottom of yet, but I don't recall encountering any other problems that could be solved by using escapes instead.

For a fuller version of my opinion see the slides starting at
http://www.w3.org/International/tutorials/tutorial-char-enc/en/all.html#Slide0440

RI
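The exception described above could be sketched in Python roughly as follows. This is only an illustration: the function name `escape_invisible` and the choice of Unicode categories (Cf for format controls such as LRM, Zs for space separators such as the non-breaking space) are assumptions made here, not something specified in the thread.

```python
import unicodedata

# Assumed definition of "hard to see": format controls and space separators.
HARD_TO_SEE = {"Cf", "Zs"}


def escape_invisible(text: str) -> str:
    """Turn invisible or ambiguous characters into hex numeric character
    references, leaving ordinary spaces and all visible text untouched."""
    out = []
    for ch in text:
        if ch != " " and unicodedata.category(ch) in HARD_TO_SEE:
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)


# U+00A0 is the non-breaking space, U+200E is the left-to-right mark.
print(escape_invisible("10\u00a0km \u200e\u05e9\u05dc\u05d5\u05dd"))
```

Visible non-ASCII text (the Hebrew word in the sample) passes through as raw UTF-8, so only the characters an author could not see in the source get escaped.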
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
On 19 Apr 2005, at 02:36, Anders Nawroth wrote:

>> HTMLTidy is the only useful piece of software I've found for web page
>> development, and I use it to clean up my pages and get proper encoding
>> of my Norwenglish lines of text into numeric entities (UTF-8) where
>> needed.
>
> What characters need encoding into numeric entities when using UTF-8?
> I try to avoid entities with exception for & < > " '

You're right. If you're using UTF-8 you only need to encode the characters that are special in HTML/XHTML/XML (&, < and >). Using numeric entities (or even named entities) in a UTF-8 file for characters that are outside the range of ASCII is usually a waste of space.

The only time I use them is when I'm on a keyboard/system where I don't know how to enter the character, such as "å". I'd type å in this case.

PS. Hopefully the W3C i18n guru Richard is listening and will tell everyone if I'm wrong.

PPS. This is a good test to see if the WSG mail system can handle UTF-8 (assuming Apple Mail encodes the message using it).

Dean

--
dean jackson
world wide web consortium (w3c) - http://www.w3.org/
mailto:[EMAIL PROTECTED]
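To put a number on the "waste of space" point, here is a small Python sketch. The helper names `escape_markup` and `byte_cost` are hypothetical; `html.escape` is from the standard library. It escapes only the markup-special characters and compares the size of a character as raw UTF-8 with the size of its decimal numeric reference.

```python
import html


def escape_markup(text: str) -> str:
    """Escape only &, < and >, leaving every other character as-is."""
    # quote=False leaves " and ' alone; use quote=True for attribute values.
    return html.escape(text, quote=False)


def byte_cost(ch: str) -> tuple:
    """Return (bytes as raw UTF-8, bytes as a decimal numeric reference)."""
    raw = len(ch.encode("utf-8"))
    ncr = len(f"&#{ord(ch)};")
    return raw, ncr


print(escape_markup("Blåbær & fløte < 5 kr"))
print(byte_cost("å"))
```

For "å" the raw form takes two bytes while the reference `&#229;` takes six, which is where the overhead comes from in text with many non-ASCII letters.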
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
On Mon, 18 Apr 2005 18:10:44 +0100, Paul Menard <[EMAIL PROTECTED]> wrote:

> Just curious what tidy parameters you are using. I have some European
> (Polish, Czech, Russian) language sites I'm working on and would prefer
> to convert the UTF-8 to some numeric equal for certain high-range
> letters.

I couldn't get Tidy to properly transcode non-latin1 encodings, so I use it with the -raw option, which at least prevents it from ruining documents.

Conversion is as easy as copy&paste: get the text displayed properly, copy it, and paste it into a UTF-capable editor.

--
regards, Kornel Lesiński
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
Just curious what tidy parameters you are using. I have some European (Polish, Czech, Russian) language sites I'm working on and would prefer to convert the UTF-8 to some numeric equivalent for certain high-range letters.

Paul

--- Anders Nawroth <[EMAIL PROTECTED]> wrote:
>
> > HTMLTidy is the only useful piece of software I've found for web page
> > development, and I use it to clean up my pages and get proper encoding
> > of my Norwenglish lines of text into numeric entities (UTF-8) where
> > needed.
>
> What characters need encoding into numeric entities when using UTF-8?
>
> I try to avoid entities with exception for & < > " '
>
> /anders (Sweden)
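For converting UTF-8 text to numeric references without Tidy, one option is Python's standard `xmlcharrefreplace` error handler. The sketch below is not what anyone in the thread used, and the function name `to_numeric_references` is made up for illustration. It replaces every non-ASCII character with a decimal reference and leaves ASCII (including existing markup) untouched, so it should be applied to text whose &, < and > are already escaped.

```python
import html


def to_numeric_references(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


converted = to_numeric_references("Łódź")
print(converted)

# html.unescape reverses the conversion, which makes a handy round-trip check.
print(html.unescape(converted) == "Łódź")
```

The output is pure ASCII, so it survives tools and mail systems that mishandle UTF-8, at the cost of larger files.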