On Sun, 22 Apr 2012 02:23:12 +0200, Yuan Chao <[email protected]> wrote:
On Sun, Apr 22, 2012 at 1:07 AM, Philip Jägenstedt <[email protected]>
wrote:
Unlike ISO-2022-JP, which has a very clearly defined set of states, Big5
has no error handling defined at all. (Recall that Kenny asked about
this on this ML about a year ago.) A visible character is much more
useful than a fullwidth space, which just hides things away.
<http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#big5> defines
the error handling. However, it can probably be improved, see
<https://www.w3.org/Bugs/Public/show_bug.cgi?id=16771>.
I wonder where this definition comes from?
Anne specified something similar to other multi-byte encodings, I think.
One main goal is to not consume following ASCII characters after an error,
but as you can see the current solution can sometimes instead break a
Chinese character following an error.
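To make the trade-off concrete, here is a minimal sketch of three possible recovery policies after an unmapped pair. It is not the algorithm from the spec draft or from any browser: the two-entry table, the policy names and the function name are all made up for illustration, and every byte >= 0x80 is treated as a lead byte to keep things short.

```python
REPLACEMENT = "\ufffd"

# Toy mapping for illustration only; a real Big5 index has thousands of entries.
TOY_TABLE = {(0xA4, 0xA4): "中", (0xA4, 0xE5): "文"}


def decode(data: bytes, policy: str) -> str:
    """Decode with a hypothetical recovery policy: 'pair', 'lead' or 'ascii'."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # ASCII bytes pass through unchanged.
            out.append(chr(byte))
            i += 1
            continue
        if i + 1 == len(data):
            # Lead byte with nothing after it.
            out.append(REPLACEMENT)
            i += 1
            continue
        trail = data[i + 1]
        char = TOY_TABLE.get((byte, trail))
        if char is not None:
            out.append(char)
            i += 2
            continue
        # Unmapped pair: emit U+FFFD, then decide how many bytes to skip.
        out.append(REPLACEMENT)
        if policy == "pair":
            i += 2  # always swallow both bytes
        elif policy == "lead":
            i += 1  # always reprocess the second byte
        elif policy == "ascii":
            i += 1 if trail < 0x80 else 2  # reprocess only an ASCII second byte
        else:
            raise ValueError(f"unknown policy: {policy}")
    return "".join(out)


if __name__ == "__main__":
    for policy in ("pair", "lead", "ascii"):
        print(policy, repr(decode(b"\x81<b>", policy)),
              repr(decode(b"\x81\xa4\xa4\xe5", policy)))
```

With these inputs, "pair" swallows the "<" after the bad lead byte, while "lead" keeps the "<" but splits the following 文 (A4 E5) by pairing A4 with the previous trail byte. The "ascii" variant is one possible compromise that avoids both problems in these two cases.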
http://lists.w3.org/Archives/Public/public-html-ig-zh/2011Aug/0052.html
I didn't see any reply to Kenny's request since.
I didn't see that at the time, but it seems like the new spec should
address this. I suggest discussing the error handling of Big5 in the bug I
filed; input from someone with more experience would be helpful.
People who have been using Big5 since the DOS era should be used to
garbled characters caused by conflicts with (ext.) ASCII control
codes and tables. This is the "feature" of Big5. hahaha... Also a good
"error message".
How U+FFFD is rendered appears to be a font issue; I presume you don't
mean that random incorrect characters are preferable.
The current solution seems to treat all PUA code points as errors. I'd
rather it didn't.
The mapping in the spec doesn't use any PUA code points, are you
suggesting that it should?
On Wed, 18 Apr 2012 22:05:22 +0200, Kang-Hao (Kenny) Lu
To offer some direction for the archaeology: some of these look like they
are encoded in big5-2003[1]... 囧
6. http://domestic.mytour.com.tw/list.asp?id=721
hkscs: 不捨結束此行精采假期、踏上歸途<U+FFFD �>視情況休息<br>18:30~
uao: 不捨結束此行精采假期、踏上歸途<U+8FF3 迳>視情況休息<br>18:30~
In big5-2003, 84B3 is U+F0E0 (PUA); on Windows it looks like U+2192 (→
RIGHTWARDS ARROW), but the two glyphs are not the same.
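For anyone who wants to check what kind of code point a given decoder output is, here is a small sketch using Python's standard unicodedata module. The code points are the ones quoted in this thread; the labels and the describe() helper are mine and purely illustrative.

```python
import unicodedata

# Code points quoted in this thread; the labels are only descriptive.
CANDIDATES = {
    "hkscs result": "\ufffd",
    "uao result": "\u8ff3",
    "big5-2003 mapping of 84B3": "\uf0e0",
    "arrow as seen on Windows": "\u2192",
}


def describe(ch: str) -> str:
    """Return the code point, its general category and a readable name."""
    category = unicodedata.category(ch)
    if category == "Co":
        # Private use code points have no character name.
        kind = "private use"
    else:
        kind = unicodedata.name(ch, "unnamed")
    return f"U+{ord(ch):04X} {category} {kind}"


if __name__ == "__main__":
    for label, ch in CANDIDATES.items():
        print(f"{label}: {describe(ch)}")
```

A "Co" category means private use, so what the reader sees for U+F0E0 depends entirely on the installed font, which fits the observation that the glyphs differ.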
That's possible, but <U+3001 IDEOGRAPHIC COMMA 、> or <U+FF0C FULLWIDTH
COMMA ,> seems better.
I would lean towards "→" here. (For your information, we don't use commas
as parentheses.)
It's mostly <http://www.wintan.com.tw/service_06_08.htm> that made me
Oh. For this example, it's even more obvious that "→" makes sense.
It tells you to look in the menu bar for the [證券帳務] menu item and
*then* click on the [庫存查詢] sub-menu. A "、" makes no sense at all!
In <http://lists.w3.org/Archives/Public/public-html-ig-zh/2012Apr/0044.html>
you said that "、" was very likely, but if you're sure it should be "→"
then it looks like all 84B3 might be the same, which seems a lot saner.
That was before Kenny's "interpretation". Don't you agree "→" makes more
sense here? As I said, I'm neutral and will support whatever works best.
Yes, you are probably right; I didn't actually read the content when
guessing.
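I don't know what to say anymore. It feels like adding some of the
big5-2003 material back into Big5-UAO would solve a large part of this.
Also, none of the characters above are Japanese kanji, so this doesn't
affect my requirements for Big5-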
I tend to agree with Kenny's view here.
One of you will have to explain exactly what should be done, how should
Firefox's mappings be modified to make better sense?
I think you understand how the community does things. We can try to bring
this up and call for people's help.
Do you know of other places than this list where it would be helpful to
ask about these issues?
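UAO
:p. Does anyone know whether this part of the encoding mapping is still
within the range where surgery is possible, or not?
Going by the above, using Big5-2003 is not quite perfect. MozTW's mapping doesn't seem to be entirely reliable, so I don't know what Big5-UAO should be defined against.
After all, the scope of the problem is a few characters on 0.043% of Taiwanese web pages. Among modern browsers only Firefox can display them, and their mapping causes other problems as well...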
Unfortunately it causes some problems for non-native Chinese readers. :)
Certainly it's a problem for all readers of Chinese that random
characters show up where they don't belong?
Emm... So you think the current Firefox solution is not perfect and
the needs in Taiwan are negligible, so it's better to use big5-hkscs to
replace big5 (which seems to be CP950?)? I'm an experimental high energy
physicist. The best way to resolve a debate and validate a theory is
to do an experiment and measure it. :) Maybe you can just implement it in
Opera and run a survey to see how both HK and Taiwan users like it?
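A much smaller experiment that anyone can run locally is to decode the same bytes with several Big5 variants and see where they disagree. The sketch below uses the three Big5-family codecs that ship with Python ("big5", "big5hkscs", "cp950"). As far as I know there is no Big5-UAO codec in the standard library, so UAO is not covered, and these codecs are not necessarily identical to any browser's tables. The function names are just for illustration.

```python
# Big5-family codecs available in the Python standard library.
VARIANTS = ("big5", "big5hkscs", "cp950")


def compare_variants(data: bytes) -> dict:
    """Decode the same bytes with each variant; undecodable bytes become U+FFFD."""
    return {name: data.decode(name, errors="replace") for name in VARIANTS}


def variants_disagree(data: bytes) -> bool:
    """True if at least two variants produce different text for these bytes."""
    return len(set(compare_variants(data).values())) > 1


if __name__ == "__main__":
    # The byte pair 84 B3 discussed in this thread.
    for name, text in compare_variants(b"\x84\xb3").items():
        print(name, [f"U+{ord(ch):04X}" for ch in text])
```

Running this over the bytes of an affected page would at least show which byte pairs are the contentious ones before asking users in HK and Taiwan what they expect to see.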
It will definitely be an improvement for Opera since HKSCS will start
working and UAO has never worked, but if there's something even better we
could do I'd really prefer that. A better test would be to see the
reactions if Firefox changed, but that's not an experiment I can run :)
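In that case, I think there is still hope in trying to contact the affected sites. In any case, that is the only way for users in Hong Kong and internationally to be able to see these characters too.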
Still, as mentioned, HK users overwrite "big5-hkscs" as "big5". It's
their government's choice to create the inconvenience to "encourage"
people to move to Unicode.
http://my.opera.com/community/forums/topic.dml?id=191245
It took quite a long time for Yahoo! Taiwan to move to Unicode. Pushing
big5-hkscs to replace big5 at the W3C would have a profound effect. I only
ask that my current usage not be broken. That said, I'd be happy to help
bring the major variants of big5 to the W3C. (There is very little info here:
http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#big5)
Which variants do you think should be specified and what should trigger
them? Am I correct to assume that Firefox is the only current browser that
*doesn't* break your current usage?
--
Philip Jägenstedt
Core Developer
Opera Software