Re: [WSG] problem with utf-8 page encoding

2005-06-09 Thread Anders Nawroth


Vaska.WSG wrote:

> The Chinese websites I have looked up have latin1 style urls...no sign
> of Chinese text anywhere in there.


Look at:
http://zh.wikipedia.org/

Works in FF1 & IE6, and the URLs look really nice in Opera 8 (and
sometimes in IE too).

I have no other browsers here right now.

It seems that MediaWiki
http://www.mediawiki.org/
(used by Wikipedia) supports this kind of URL.

I work with a homegrown PHP-based CMS, and I have now decided to go with
non-latin1 URLs; I'm implementing this right now.
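
A minimal sketch of what this involves, in Python rather than PHP and purely for illustration (the function names are made up here and do not come from any CMS): a non-latin1 path normally travels over HTTP as the percent-encoded UTF-8 bytes of its characters, and the server decodes it back before looking the page up.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character in a path."""
    # "/" is kept as-is so the folder structure of the path survives.
    return quote(path, safe="/")


def uri_path_to_iri_path(path: str) -> str:
    """Decode a percent-encoded path back to readable text, assuming UTF-8."""
    return unquote(path, encoding="utf-8", errors="strict")


encoded = iri_path_to_uri_path("/wiki/日本語")
print(encoded)                        # /wiki/%E6%97%A5%E6%9C%AC%E8%AA%9E
print(uri_path_to_iri_path(encoded))  # /wiki/日本語
```

Whether the browser shows the readable form or the percent-encoded form in the address bar is up to the browser, which would explain why the same link looks nicer in some browsers than in others.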


/Anders
**
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
**



Re: [WSG] problem with utf-8 page encoding

2005-06-07 Thread Anders Nawroth


tee:


> These are domains but the one Anders provided does have a path in Japanese
> characters, and it works in FF.
> http://www.w3.org/International/tests/sec-iri-3
 


I looked at ja.wikipedia.org and they use this practice.
What doesn't always work well is links from pages with charsets
other than UTF-8.

IE and Opera (I only have O8) handle this correctly, but FF does not.
Otherwise this is more of a server-side issue: handling paths
correctly.


I'm currently developing a site in Japanese, so I have to decide which
way to go. I thought I'd go with a-z0-9 in the paths, but now I'm not so
sure, as Wikipedia apparently thinks Japanese characters in the path are
stable enough!


/Anders



Re: [WSG] problem with utf-8 page encoding

2005-06-07 Thread Vaska . WSG
I see that it is possible, but how many folks use it?  Like an individual or a small business...is it common enough  (yet)?

The links below aren't working in Safari...

I've been researching how a lot of open source CMSs and blog tools deal with this issue and they don't.  Most of them either have some kind of conversion map (that is completely inadequate for the task) or they create urls that have little to do with the page title.  For instance...

http://www.site.com/1-eaeraceceac.php // = the eaeraceceac is a botched character conversion from Chinese

or

http://www.site.com/page0001.php  // = no reference to the page title whatsoever...

The Chinese websites I have looked up have latin1 style urls...no sign of Chinese text anywhere in there.

Aside from requiring a Chinese speaker to enter a Latin page name for an article/entry/page, I can't see any possible way to create urls (clean urls) using Chinese (non-Latin) characters.

Ideas?
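
One possible approach, sketched under assumptions (Python is used only for illustration, and `make_slug` and `slug_to_href` are hypothetical helpers, not taken from any existing CMS or blog tool): keep the non-Latin letters of the page title, collapse everything else into hyphens, and percent-encode the UTF-8 bytes only when the link is written out.

```python
import re
import unicodedata
from urllib.parse import quote


def make_slug(title: str) -> str:
    """Hypothetical helper: build a URL slug that keeps non-Latin letters."""
    # Normalise first so visually identical titles give the same slug.
    normalized = unicodedata.normalize("NFC", title).lower()
    # For str patterns, \w matches Unicode letters and digits (and "_"),
    # so Chinese characters are kept while spaces and punctuation become "-".
    return re.sub(r"[^\w]+", "-", normalized).strip("-")


def slug_to_href(slug: str) -> str:
    """Percent-encode the UTF-8 bytes so the written link is plain ASCII."""
    return "/" + quote(slug, safe="") + ".php"


slug = make_slug("中文 标题!")
print(slug)                # 中文-标题
print(slug_to_href(slug))  # /%E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98.php
```

This avoids a conversion map entirely: the title is never transliterated, so nothing is lost, at the cost of a long percent-encoded form in browsers that do not display the decoded path.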

thanks...v



On Jun 7, 2005, at 1:11 AM, tee wrote:

Hi Vaska, as the W3C links Anders provided show, it can. However, I would be
very skeptical about using it, as browsers are obviously not advanced enough
to handle it, but then it may be a server issue too. Sorry, I am too ignorant
on this matter to tell you anything more.

I did a test in Safari, FF, IE and Opera by entering domains in Chinese; only
FF picks up the address. I wonder how it works in PC browsers.
You may like to try:
Simplified Chinese sites:
A famous Chinese search engine, baidu.com = 百度
Or this, 163.com = 易网
Ebay China, ebay.com.cn = 易趣

Traditional sites:
tw.yahoo.com = 奇摩
yam.com = 蕃薯藤

These are domains but the one Anders provided does have a path in Japanese
characters, and it works in FF.
http://www.w3.org/International/tests/sec-iri-3


tee

From: Vaska.WSG [EMAIL PROTECTED]>
Reply-To: wsg@webstandardsgroup.org
Date: Mon, 6 Jun 2005 21:32:08 +0200
To: wsg@webstandardsgroup.org
Subject: Re: [WSG] problem with utf-8 page encoding

tee, or really any Chinese person on this list,

one thing that I've been curious about is how do you deal with creating
urls.  this could sound extremely naive and i'm sorry for that.  it's
my understanding that use of latin1 characters only is allowed to make
a url...or create folders etc...

http://www.this-is-latin1-text.com/and-this-is-a-folder/and-this-is-a-filename.php

this wouldn't be possible...

http://www.~{6(;[EMAIL PROTECTED]~{Q!Pc~}.com/~{6(;[EMAIL PROTECTED]/~{6(;[EMAIL PROTECTED].php

i've been having to find a way to deal with this issue and so far i've
only come up with workarounds that just don't seem very user-friendly.

i was looking at conversion maps but it became a completely crazy
exercise...

v






Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread Vaska . WSG
> You really need to give us the URL of the page that this occurs on so others can test it.

It's on my dev server.  If you really want to see a test page (where the images aren't working and a lot of the CSS is totally in outer space right now) it's here.

http://www.vaska.com/wsg/08test.php

> We also need to know os/ver and browser/ver it occurs on to emulate it.

OSX 10.3.9...Safari, Firefox, Mozilla, IE...

> Opening it as a local file is not a good test

Apparently so, because the test, at least for me, works when it's on my server.  I should have thought of this one. There are a few validation errors right now as well...

Thanks for the advice Peter.  I think I'm on the right track...just trying to get over some of the finer details so I can move forward (I'm completely rebuilding something that I've been using with clients for years en route to making it an open source thing).

v


Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread Rimantas Liubertas
On 6/5/05, Vaska. WSG [EMAIL PROTECTED] wrote:
> I'm not sure what the deal is, but when I bring up a page in my system
> it doesn't encode properly at first.  I have to go to the browser options
> and change it to utf-8.  The funny thing is that utf-8 is my default as
> set in all my browsers.
...
> I don't have any output buffering or anything of the kind going on
> here.  Is there something on the surface here that I'm missing?


My guess would be that you use Apache, which has AddDefaultCharset in
its httpd.conf file
uncommented and set to, say, ISO-8859-1.
You can check what headers your server sends with Firefox's
LiveHTTPHeaders extension
or using online tools like this: http://www.seoconsultants.com/tools/headers.asp
HTTP headers have higher priority than META.
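
To make the priority rule concrete, here is a small sketch in Python (for illustration only; `charset_of` and `effective_charset` are hypothetical names, and the two strings stand for the HTTP Content-Type header and the content attribute of the META tag). It only models the header-over-META rule described above, not everything a browser does.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset in lower case, or None when no charset is given.
    return msg.get_content_charset()


def effective_charset(http_header: str, meta_content: str) -> Optional[str]:
    """The HTTP header wins; META is only consulted when the header is silent."""
    return charset_of(http_header) or charset_of(meta_content)


# Server adds ISO-8859-1 by default, page says utf-8: the header wins.
print(effective_charset("text/html; charset=ISO-8859-1",
                        "text/html; charset=utf-8"))  # iso-8859-1

# Server sends no charset: the META declaration is used.
print(effective_charset("text/html",
                        "text/html; charset=utf-8"))  # utf-8
```

If the header does turn out to be the culprit, changing the directive to `AddDefaultCharset UTF-8`, or turning it off with `AddDefaultCharset Off`, should let the utf-8 declaration take effect.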

Regards,
Rimantas 
-- 
http://rimantas.com/



Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread tee
> http://www.vaska.com/wsg/08test.php
>
> We also need to know os/ver and browser/ver it occurs on to emulate it.
>
> OSX 10.3.9...Safari, Firefox, Mozilla, IE...
 
Hi Vaska, your Chinese text is showing up fine on Mac, including IE 5.2.
In my personal experience, Unicode Simplified Chinese is less problematic in
Mac IE than Traditional.

A comment about your Chinese text: there shouldn't be a space between each
character - punctuation separates words and sentences, not
spaces. As a Chinese speaker who is obsessed with her language, it always bugs
me to see the misuse of Chinese on some English-centric sites. I reckon the
same goes for English speakers; quite a torture for you guys to read my English ;)

Regards,
tee
 




Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread Vaska . WSG
tee, or really any Chinese person on this list,

one thing that I've been curious about is how do you deal with creating urls.  this could sound extremely naive and i'm sorry for that.  it's my understanding that use of latin1 characters only is allowed to make a url...or create folders etc...

http://www.this-is-latin1-text.com/and-this-is-a-folder/and-this-is-a-filename.php

this wouldn't be possible...

http://www.~{6(;[EMAIL PROTECTED]~{Q!Pc~}.com/~{6(;[EMAIL PROTECTED]~{Q!Pc~}/~{6(;[EMAIL PROTECTED]~{Q!Pc~}.php

i've been having to find a way to deal with this issue and so far i've only come up with workarounds that just don't seem very user-friendly.

i was looking at conversion maps but it became a completely crazy exercise...

v

Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread Anders Nawroth



> this wouldn't be possible...
>
> http://www..com//.php


According to W3C it should be possible, look at:
http://www.w3.org/International/tests/#iri

IE needs a plugin to enable IDN (Internationalized Domain Names):
http://www.idnnow.com/
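
For the domain part, a rough illustration in Python (used here only as an example language; `to_ascii` and `to_unicode` are hypothetical helper names, and Python's built-in "idna" codec follows the 2003 IDNA rules): an IDN-capable browser converts each non-ASCII label into an ASCII "xn--" form before the DNS lookup, which is the step the plugin adds to IE.

```python
def to_ascii(hostname: str) -> str:
    """Convert an internationalized hostname to its ASCII (xn--) form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII (xn--) hostname back to its readable form."""
    return hostname.encode("ascii").decode("idna")


print(to_ascii("bücher.example"))  # xn--bcher-kva.example

# The same conversion applies to Chinese or Japanese labels.
ascii_name = to_ascii("日本語.example")
print(ascii_name)
print(to_unicode(ascii_name))
```

Note that this only covers the host name; the path after the domain is handled differently, by percent-encoding its UTF-8 bytes.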

/Anders



Re: [WSG] problem with utf-8 page encoding

2005-06-06 Thread Anders Nawroth

More in-depth information here:
http://www.w3.org/International/articles/idn-and-iri/

Anders Nawroth wrote:

> > this wouldn't be possible...
> >
> > http://www..com//.php
>
> According to W3C it should be possible, look at:
> http://www.w3.org/International/tests/#iri
>
> IE needs a plugin to enable IDN (Internationalized Domain Names):
> http://www.idnnow.com/





[WSG] problem with utf-8 page encoding

2005-06-05 Thread Vaska . WSG
I'm not sure what the deal is, but when I bring up a page in my system it doesn't encode properly at first.  I have to go to the browser options and change it to utf-8.  The funny thing is that utf-8 is my default as set in all my browsers.

This is my header...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang='en' lang='en'>
<head>
<title>Page title</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

etc...etc...

And in my document I am specifying the correct language as needed...yes, I know it's a table but this is tabular data...I'm just trying to show the pertinent parts here...

<td width='60%' class='cell-doc' xml:lang='zh' lang='zh'>~{6(~} ~{;[EMAIL PROTECTED] ~{4N~}~{Q!Pc~}</td>

I don't have any output buffering or anything of the kind going on here.  Is there something on the surface here that I'm missing?

Thanks, v

RE: [WSG] problem with utf-8 page encoding

2005-06-05 Thread Peter Firminger



Hi Vaska,

You really need to give us the URL of the page that this occurs on so
others can test it.

We also need to know os/ver and browser/ver it occurs on to emulate it.

Opening it as a local file is not a good test (unless the page is destined
for a CD-ROM or kiosk). These things can get complicated by the charset the
server sends in the response header as well as the font specified in the CSS
and what fonts are installed on your machine etc. Thus, the code snippet
below really doesn't tell us much.

Please try a version of your page in valid HTML 4.01 Transitional and see if
the behaviour is the same. As you're not using an XML prologue (like
<?xml version="1.0" encoding="utf-8"?>) in your XHTML page I wonder if the
behaviour is an XML parser thing. May be way off with that as well. Just a
thought.

Unicode isn't a simple fix-all solution. It makes it easy for simple things
like European keyboard inputs (French, German, Spanish etc.) but once you get
to the non-Latin charsets it gets difficult. I don't believe (though I haven't
read the docs for a while now) that all the characters required for a
universal solution are included in UTF-8. From (distant) memory you have to go
to something like UTF-16 or UTF-32 to get anywhere near the number of
characters required for all languages, and I don't know that browser support
is very good with those, and I don't think they were even intended for web use.
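
For what it's worth, a quick check suggests that recollection is too pessimistic: UTF-8, UTF-16 and UTF-32 are three encodings of the same set of Unicode code points and differ only in how many bytes each character takes. The sketch below (Python, for illustration only; the sample characters and the `encoded_lengths` helper are arbitrary choices, not anything from the page under discussion) round-trips a few characters through all three.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

# Arbitrary sample characters from different scripts.
SAMPLES = {
    "Latin": "é",
    "Chinese": "定",
    "Arabic": "ع",
    "Korean": "한",
    "beyond the 16-bit range": "\U00020000",
}


def encoded_lengths(ch: str) -> dict:
    """Number of bytes one character takes in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for script, ch in SAMPLES.items():
    # Every sample survives a round trip through every encoding.
    assert all(ch.encode(enc).decode(enc) == ch for enc in ENCODINGS)
    print(script, encoded_lengths(ch))
```

The byte counts differ (a Chinese character takes three bytes in UTF-8 and two in UTF-16, for example), but nothing is missing from UTF-8.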

Don't believe me though, as I am certainly not an expert in the field and I am
very rusty in my recollection; go and read the specs for yourself in your own
context. There are myriad resources on this subject online, some of them
listed in http://webstandardsgroup.org/go/resourcecat18.cfm

Bottom line is that if you're doing Arabic or Chinese or Korean etc.
characters, you may still need to be conservative and do it in a basic way
like http://www.gt.nsw.gov.au/information/chinese.htm using something
specific like <meta http-equiv="content-type" content="text/html;
charset=big5" lang="zh"> and maybe (I have been led to believe) suggest a
decent default font family for this charset.

No need to apologise about using tables. It really bugs me that "tables" have
such a bad name round here that people feel they have to apologise even when
using them correctly. Yes, you'll probably be ridiculed if you use them for
page layout but don't feel even the slightest bit bad about using them for
their intended purpose.

Peter


> This is my header...
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang='en' lang='en'>
> <head>
> <title>Page title</title>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>
> <td width='60%' class='cell-doc' xml:lang='zh' lang='zh'>~{6(~} ~{;[EMAIL PROTECTED] ~{4N~}~{Q!Pc~}</td>