OT Re: [WSG] UTF-8
> PPS. This is a good test to see if the WSG mail system can handle UTF-8

AFAIK å is a Latin1 character (Scandinavian), so no need for UTF here.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com

** The discussion list for http://webstandardsgroup.org/ See http://webstandardsgroup.org/mail/guidelines.cfm for some hints on posting to the list & getting help **
RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dean Jackson
> Sent: 19 April 2005 17:12
> ...
>> I try to avoid entities with exception for '
>
> You're right. If you're using UTF-8 you only need to encode the characters that are special in HTML/XHTML/XML (<, > and &). Using numeric entities (or even named entities) in a UTF-8 file for characters that are outside the range of ASCII is usually a waste of space. The only time I use them is when I'm on a keyboard/system where I don't know how to enter the character, such as å. I'd type &aring; in this case.
>
> PS. Hopefully the W3C i18n guru Richard is listening and will tell everyone if I'm wrong.

Hi Dean. I'd hesitate to say anyone was right or wrong here, but I'm of the same opinion, albeit with one small exception. I think in UTF-8 NCRs/entities beyond the ASCII range can be useful for invisible characters (such as LRM in Arabic/Hebrew) or ambiguous characters (such as non-breaking space - which looks like an ordinary space).

Tee mentioned some issues with Chinese characters on IE Mac that I haven't got to the bottom of yet, but I don't recall encountering any other problems that could be solved by using escapes instead.

For a fuller version of my opinion see the slides starting at http://www.w3.org/International/tutorials/tutorial-char-enc/en/all.html#Slide0440

RI
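The rule Dean and Richard agree on can be sketched in Python. This is only an illustration, not anything posted to the list: the function names and the default set of "invisible" characters (LRM, RLM, non-breaking space) are assumptions made for the example.

```python
import html

def escape_for_utf8_page(text):
    """Escape only &, < and >; every other character stays as literal UTF-8 text."""
    return html.escape(text, quote=False)

def escape_invisibles(text, invisibles=("\u200e", "\u200f", "\u00a0")):
    """Write selected invisible or ambiguous characters as hex NCRs.

    The default tuple (LRM, RLM, NBSP) is an assumption for this sketch.
    """
    return "".join(
        "&#x{:X};".format(ord(ch)) if ch in invisibles else ch
        for ch in text
    )

if __name__ == "__main__":
    print(escape_for_utf8_page("bl\u00e5b\u00e6r & <em>"))
    print(escape_invisibles("a\u00a0b\u200e"))
```

The å and æ pass through untouched, while the non-breaking space and LRM become visible in the source as &#xA0; and &#x200E;.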
RE: OT Re: [WSG] UTF-8
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jan Brasna
> Sent: 19 April 2005 17:29
> To: wsg@webstandardsgroup.org
> Subject: OT Re: [WSG] UTF-8
>
>> PPS. This is a good test to see if the WSG mail system can handle UTF-8
>
> AFAIK å is Latin1 character (Scandinavian), so no need for UTF here.

Yes, but the bytes used in ISO 8859-1 (Latin1) or a Windows code page and those used for UTF-8 are different. In Latin1 encoding å is a single byte: E5; whereas UTF-8 represents this as two bytes: C3 A5. So the fact that you are seeing it indicates that the system recognised the Unicode encoding :-)

RI

PS: You may find my Unicode converter a useful play tool for this kind of thing. It's a bit rough and ready, but it's useful. http://people.w3.org/rishida/scripts/uniview/conversion.en.html

Richard Ishida
W3C
contact info: http://www.w3.org/People/Ishida/
W3C Internationalization: http://www.w3.org/International/
Publication blog: http://people.w3.org/rishida/blog/
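Richard's byte values are easy to check. A minimal Python sketch follows; the helper name hex_bytes is made up for this note.

```python
def hex_bytes(data):
    """Format a bytes object as space-separated uppercase hex pairs."""
    return " ".join("{:02X}".format(b) for b in data)

A_RING = "\u00e5"  # the letter å as a single precomposed code point

if __name__ == "__main__":
    print("ISO 8859-1:", hex_bytes(A_RING.encode("iso-8859-1")))
    print("UTF-8:     ", hex_bytes(A_RING.encode("utf-8")))
```

The same character yields one byte in Latin1 and two in UTF-8, which is why a reader has to know which encoding the sender used.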
Re: OT Re: [WSG] UTF-8
> Yes, but the bytes used in ISO 8859-1 (Latin1) or Windows code page and those used for UTF-8 are different.

Sure, however the mail came in Latin1 (see the headers), so I just want to comment that it won't show the difference.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
Hi Dean,

You wrote:

>>> ... Norwenglish lines of text into numeric entities (UTF-8) where needed.
>>
>> What characters needs encoding into numeric entities when using UTF-8? I try to avoid entities with exception for '

It is a small nuisance, of course. I do use them when I type (US English qwertyuiop keyboard) as I usually don't have a place to copy and paste. It does work quite well, though, when I copy and paste something I used entities or the numeric codes for into Outlook at work. Mostly at work I use a degree sign or a plus/minus sign but there is a lot to cover for foreign place and personal names that is not on my keyboard.

> You're right. If you're using UTF-8 you only need to encode the characters that are special in HTML/XHTML/XML (<, > and &). Using numeric entities (or even named entities) in a UTF-8 file for characters that are outside the range of ASCII is usually a waste of space.

Does anyone have a good quick reference as to which characters are good on UTF-8? How about a faster or easier way to type them in? I wasn't aware (until this thread) that there was enough space for place name and personal name non-English characters in the UTF-8 standard.

Regards,
Gene Falck
[EMAIL PROTECTED]
Re: [WSG] UTF-8
Dean Jackson wrote:

> The only time I use them is when I'm on a keyboard/system where I don't know how to enter the character, such as å. I'd type &aring; in this case.
>
> PS. Hopefully the W3C i18n guru Richard is listening and will tell everyone if I'm wrong.

I'll second that... Can someone actually _underwrite_ some real facts on this issue? Facts that are global and cross-browser enough to be of any real use? This is one of the few things I'd rather not leave for my visitors to mess up. I can create a much larger mess at my end.

- Half of the Norwegian sites I visit in a day are full of question-marks--until I actively change encoding, and change it again, and again...

- Not an uncommon problem on EN-US sites either btw, so something isn't working too well. &#8212; seems to come out a lot more predictably than most alternatives I see daily. Euro-signs are ok, but they don't look as if they belong in most sentences they appear in.

- No problem to hit it right, but right isn't the same on all sites, and the facts I have found on this issue are often discarded by the next fact-sheet I find. If I'm confused, then so are many regular web-surfers.

- Regular ASCII-characters (0 - 127) aren't the problem. It's the 128 - 255 range that messes it up at my end. So I prefer &aring; or &#229; instead of å before my pages are released into the wild, as that'll make at least the few characters we Norwegians need as extras come out right regardless of what encoding _my_ browsers are set to. Above the basic 8bits ASCII we either need the right encoding-map or the right multiple of it so it becomes Universal. A proper - universal - converter would be nice...

---

Whether I've mixed up encoding-maps and entities in a way I shouldn't isn't as important as getting it right at my end. I think I understand enough about language-maps to be able to stack them together and end up with a universal one in the end. (I think that's already done btw)

regards
Georg
--
http://www.gunlaug.no
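For readers who want Georg's approach (pure ASCII output with numeric references for everything else), here is a minimal Python sketch. It is not the tool Georg uses; the function name is invented for this note, and it relies on Python's built-in "xmlcharrefreplace" error handler.

```python
import html

def to_ascii_with_ncrs(text):
    """Return text as pure ASCII, replacing each non-ASCII character
    with a decimal numeric character reference such as &#229;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

if __name__ == "__main__":
    word = "bl\u00e5b\u00e6rsyltet\u00f8y"  # blåbærsyltetøy
    escaped = to_ascii_with_ncrs(word)
    print(escaped)
    # html.unescape reverses the references, so no information is lost
    print(html.unescape(escaped) == word)
```

The result no longer depends on the encoding a browser guesses, at the cost of a less readable source and a few extra bytes per character.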
Re: [WSG] UTF-8
> - Half of the Norwegian sites I visit in a day are full of question-marks--until I actively change encoding, and change it again, and again...

Hmm, we here in CZ use Latin2 or CP1250, everyone uses proper charset headers, so no problem with this.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com
Re: [WSG] UTF-8
Jan Brasna wrote:

>> - Half of the Norwegian sites I visit in a day are full of question-marks--until I actively change encoding, and change it again, and again...
>
> Hmm, we here in CZ use Latin2 or CP1250, everyone uses proper charset headers, so no problem with this.

You hit one of the usual problems right on the head. Proper charset headers are often lacking. (I have some confessions to make on that subject myself - think it's a human bug that needs fixing :-) )

regards
Georg
--
http://www.gunlaug.no
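What a missing or wrong charset header does to the Norwegian letters can be reproduced in a few lines of Python. This is a sketch of the general failure mode, not a reconstruction of any particular site; the variable names are invented here.

```python
NORWEGIAN = "\u00e6\u00f8\u00e5"  # æøå

# Case 1: the page is UTF-8, but the browser assumes ISO 8859-1.
# Each two-byte sequence is shown as two unrelated Latin1 characters.
utf8_read_as_latin1 = NORWEGIAN.encode("utf-8").decode("iso-8859-1")

# Case 2: the page is ISO 8859-1, but the browser assumes UTF-8.
# The lone high bytes are invalid UTF-8, so the decoder substitutes
# U+FFFD, which browsers typically draw as a question mark.
latin1_read_as_utf8 = NORWEGIAN.encode("iso-8859-1").decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(utf8_read_as_latin1)
    print(latin1_read_as_utf8)
```

Case 2 matches the question marks Georg describes; declaring the charset removes the guesswork in both directions.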
RE: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gene Falck
> Sent: 19 April 2005 18:49
> ...
> Does anyone have a good quick reference as to which characters are good on UTF-8? How about a faster or easier way to type them in?

FWIW you may find this useful for Latin characters: http://people.w3.org/rishida/scripts/pickers/latin/

See http://people.w3.org/rishida/scripts/pickers/ for explanations and other scripts.

RI
[WSG] UTF-8 (was: Quirks mode vs Standards mode)
> HTMLTidy is the only useful piece of software I've found for web page development, and I use it to clean up my pages and get proper encoding of my Norwenglish lines of text into numeric entities (UTF-8) where needed.

What characters need encoding into numeric entities when using UTF-8? I try to avoid entities with exception for '

/anders (Sweden)
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
Just curious what tidy parameters you are using. I have some European (Polish, Czech, Russian) language sites I'm working on and would prefer to convert the UTF-8 to some numeric equivalent for certain high-range letters.

Paul

--- Anders Nawroth [EMAIL PROTECTED] wrote:

>> HTMLTidy is the only useful piece of software I've found for web page development, and I use it to clean up my pages and get proper encoding of my Norwenglish lines of text into numeric entities (UTF-8) where needed.
>
> What characters needs encoding into numeric entities when using UTF-8? I try to avoid entities with exception for '
>
> /anders (Sweden)
Re: [WSG] UTF-8 (was: Quirks mode vs Standards mode)
On Mon, 18 Apr 2005 18:10:44 +0100, Paul Menard [EMAIL PROTECTED] wrote:

> Just curious what tidy parameters you are using. I have some European (Polish, Czech, Russian) language sites I'm working on and would prefer to convert the UTF-8 to some numeric equal for certain high-range letters.

I couldn't get Tidy to properly transcode non-Latin1 encodings, so I use it with the -raw option, which at least prevents it from ruining documents.

Conversion is as easy as copy and paste - get the text displayed properly, copy it and paste it into a UTF-capable editor.

--
regards, Kornel Lesiński
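Besides Kornel's copy-and-paste route, transcoding can be scripted. A minimal Python sketch, assuming the source encoding is already known (ISO 8859-2 and CP1250 are used only because the thread mentions them for Czech and Polish); the function name is hypothetical.

```python
def transcode(data, source_encoding, target_encoding="utf-8"):
    """Decode bytes from a known legacy encoding and re-encode them.

    The source encoding must be known in advance; nothing here guesses it.
    """
    return data.decode(source_encoding).encode(target_encoding)

if __name__ == "__main__":
    polish = "\u017c\u00f3\u0142\u0107"  # żółć
    legacy_bytes = polish.encode("iso-8859-2")
    utf8_bytes = transcode(legacy_bytes, "iso-8859-2")
    print(len(legacy_bytes), "bytes in Latin2,", len(utf8_bytes), "bytes in UTF-8")
    print(utf8_bytes.decode("utf-8") == polish)
```

A wrong source encoding does not always raise an error; it can silently produce the wrong letters, so the output is worth checking by eye.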
Re: [WSG] UTF-8
> I have [...] Czech, [...] sites I'm working on and would prefer to convert the UTF-8 to some numeric equal for certain high-range letters.

Well, I'd suggest you not do this, as nobody here would do it this way :) However, it would make maintenance easier for a non-CZ/PL person.

--
Jan Brasna aka JohnyB :: www.alphanumeric.cz | www.janbrasna.com
Re: [WSG] UTF-8
Anders Nawroth wrote:

> What characters needs encoding into numeric entities when using UTF-8? I try to avoid entities with exception for '

Look for some answers here: http://www.joelonsoftware.com/articles/Unicode.html ...so I don't have to give incomplete answers about something I'm not an expert on.

- My own reasoning is this: I observed that text on my pages became too dependent on browsers' encoding settings, or their ability to auto-detect. Introducing entities made the end-result much more predictable, and I haven't encountered a single problem so far (after 2 years). Maybe others have, but they haven't told me.

I write all (Norwegian) 8bit characters as plain text, characters above as numeric entities, and leave the rest to Tidy. What I get is Latin-1 (ISO-8859-1) with a mixture of decimal and character entities, which is equivalent to UTF-8 for my characters as far as I know. Most ASCII-characters are left as they are, but æ, ø, å are converted.

I don't really know what Tidy will do when fed 8bit code from other language-maps, but the few times I've copied a character from a language outside my own 8bit maps and left it to Tidy, it has rendered correctly in my browsers. It looks like a mess if I don't convert it this way.

- Someone asked what parameters I use... My Tidy has this script for converting to XML:

---
quote-marks: true
uppercase-tags: false
fix-backslash: false
literal-attributes: true
numeric-entities: true
output-xml: true
---

That's as much information as I can offer. Maybe someone can add some more.

regards
Georg
--
http://www.gunlaug.no
Re: [WSG] UTF-8 ,charset and Standard
berry wrote:

> I understand that it is not the UTF-8 which gives the ability to render the accent on the screen but the language content.
>
> <meta http-equiv="Content-Language" content="fr">
>
> which tells the agent to render the accent using the UTF-8

Also, don't forget that meta alone is not enough. You need to have your web server sending out the correct encoding with its headers as well (if you're using Apache and have .htaccess override support, have a look at http://www.w3.org/International/questions/qa-htaccess-charset for instance).

> Then why does the validator give an error for each accent when I use UTF-8? It says that UTF-8 doesn't recognize this kind of character (French character)

Try it with the correct server headers and the validator should be happy.

--
Patrick H. Lauke
_
re·dux (adj.): brought back; returned. used postpositively [latin : re-, re- + dux, leader; see duke.]
www.splintered.co.uk | www.photographia.co.uk
http://redux.deviantart.com
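Why the server header matters: under HTML 4.01 a charset given in the HTTP Content-Type header takes priority over one declared in a meta element. The Python sketch below models only that precedence and is a simplification, not how any particular browser or the validator is implemented; both function names are invented for this note.

```python
from email.message import Message

def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # returned lower-cased, None if absent

def effective_charset(http_content_type, meta_charset):
    """Simplified precedence: HTTP header first, then the meta declaration."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    return meta_charset.lower() if meta_charset else None

if __name__ == "__main__":
    # The server announces Latin1 while the page's meta says UTF-8: the header wins.
    print(effective_charset("text/html; charset=ISO-8859-1", "UTF-8"))
    # The server sends no charset: the meta declaration is used.
    print(effective_charset("text/html", "UTF-8"))
```

In the first case a UTF-8 page would be read as Latin1 despite its meta element, which is one way to end up with the symptoms Berry describes.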
Re: [WSG] UTF-8 ,charset and Standard
UTF-8, a flavour of Unicode, is a universal character encoding. You don't define any codepage/language for it. You simply use whatever characters you like.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This creates the HTTP equivalent of a Content-Type header. The content is text/html with charset UTF-8.

It would be even better to send a real HTTP header with the charset. In PHP it's:

<?php header("Content-Type: text/html; charset=UTF-8"); ?>

Note: Don't use Notepad or other Microsoft tools for UTF-8, because they tend to add an invisible BOM marker character at the beginning of every file. This helps them distinguish UTF-8 from other files, but confuses many browsers. I use the freeware Notepad2 for UTF-8.

--
regards, Kornel Lesiński
osiolki.net
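The BOM Kornel mentions is the three bytes EF BB BF at the start of the file. A small Python sketch for spotting and removing it; the function names are made up for this note, while codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the standard library.

```python
import codecs

def has_utf8_bom(data):
    """True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

if __name__ == "__main__":
    saved_with_bom = codecs.BOM_UTF8 + "\u00e9t\u00e9".encode("utf-8")  # "été"
    print(has_utf8_bom(saved_with_bom))
    # Plain "utf-8" keeps the BOM as an invisible U+FEFF character;
    # "utf-8-sig" drops it while decoding.
    print(repr(saved_with_bom.decode("utf-8")))
    print(repr(saved_with_bom.decode("utf-8-sig")))
```

In a PHP file the BOM also counts as output, so it is sent before any header() call runs.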
[WSG] UTF-8 ,charset and Standard
Can someone tell me why we use a charset if we have to write this kind of code, &#233;, in our pages for accents? I understand that the charset gives the browser, depending on its language, the opportunity to display the page correctly, but it doesn't give the server the opportunity to display the page the right way.

Sometimes it seems that computer science is still in the stone age. It upsets me that each time I have to enter some text I have to format it. I understand that we can give commands to the server to display the text the right way, but we don't always have this possibility.

What can we do to keep our accents in our HTML pages? And if I am wrong, can someone tell me why I cannot see my accents on my page when I use the UTF-8 charset?

Thanks in advance
Berry
Re: [WSG] UTF-8 ,charset and Standard
Hi Berry,

Here is an example of a UTF-8 page with non-escaped French characters: http://xstandard.com/page.asp?p=18BF64A8-DF0A-473E-8402-50E9E917E0C1

Are you able to see them in your browser?

Regards,
-Vlad
http://xstandard.com
Standards-compliant XHTML WYSIWYG editor

----- Original Message -----
From: berry [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, November 29, 2004 11:57 AM
Subject: [WSG] UTF-8 ,charset and Standard

> Can someone tell me why we use a charset if we have to write this kind of code, &#233;, in our pages for accents? I understand that the charset gives the browser, depending on its language, the opportunity to display the page correctly, but it doesn't give the server the opportunity to display the page the right way.
>
> Sometimes it seems that computer science is still in the stone age. It upsets me that each time I have to enter some text I have to format it. I understand that we can give commands to the server to display the text the right way, but we don't always have this possibility.
>
> What can we do to keep our accents in our HTML pages? And if I am wrong, can someone tell me why I cannot see my accents on my page when I use the UTF-8 charset?
>
> Thanks in advance
> Berry
Re: [WSG] UTF-8 ,charset and Standard
Yes, I am able to see it in my browser. Maybe the server is set to render the accents; if not, how come I am not able to see the same thing with my page? I would be surprised if we had to use XHTML to have accents. My page uses HTML 4.01 Strict.

Thanks in advance
Berry

> Hi Berry,
>
> Here is an example of a UTF-8 page with non-escaped French characters: http://xstandard.com/page.asp?p=18BF64A8-DF0A-473E-8402-50E9E917E0C1
>
> Are you able to see them in your browser?
>
> Regards,
> -Vlad
> http://xstandard.com
> Standards-compliant XHTML WYSIWYG editor
Re: [WSG] UTF-8 ,charset and Standard
Finally I have the answer! I understand that it is not the UTF-8 which gives the ability to render the accent on the screen but the language content.

<meta http-equiv="Content-Language" content="fr">

which tells the agent to render the accent using the UTF-8.

Then why does the validator give an error for each accent when I use UTF-8? It says that UTF-8 doesn't recognize this kind of character (French character).

Thank you in advance
Berry