On Sat, 21 Apr 2012 22:26:13 +0200, Ambrose LI <[email protected]>
wrote:
One comment first: 亂碼 are not “random characters”; they are most often
the symptom of an encoding or decoding failure, so while I have not
tried to verify Kenny’s results, I am in complete agreement that how
he attacked the problem is the correct way. (I have had to do this
on several occasions, and the way I did it was no different from how
Kenny has done it.)
You are of course right; perhaps I've used 亂碼 too broadly. I've used it
to mean something like "unrecoverable misencoding or typo". The comment
about "random characters" was meant from the user's point of view. Of
course the "迳" in <http://www.wintan.com.tw/service_06_08.htm> is not the
result of any random process, just an encoding mismatch.
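
To illustrate why such output is deterministic rather than random, here is
a minimal Python sketch. It assumes the wrong decoder is ISO-8859-1, chosen
only because it accepts every byte; it is not a claim about the mismatch on
the page above.

```python
# Illustrative only: ISO-8859-1 as the wrong decoder is an assumption,
# not the actual mismatch on any particular site.
original = "亂碼"

raw = original.encode("big5")      # bytes as a Big5 page would serve them
garbled = raw.decode("latin-1")    # wrong decoder: each byte becomes one character

# The same bytes always produce the same garbage, so the mistake can be
# reversed: re-encode with the wrong codec, then decode with the right one.
recovered = garbled.encode("latin-1").decode("big5")

print(ascii(garbled))              # escaped form of the garbled text
print(recovered == original)
```

Recovery only works this cleanly when the wrong decoder loses no bytes; a
decoder that rejects or replaces bytes would make the damage permanent.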
2012/4/21 Philip Jägenstedt <[email protected]>:
[...]
What should the Big5 mapping be? If it is like the conservative Big5 that
Opera currently supports, that really won't help Taiwan sites and users at
all. What Firefox does is also not that great, so it would have to be a new
mapping that no browser has ever supported so far.
Personally speaking, I’d say that Big5 has always been a mess and it
is still a mess, and the only sane way to solve this problem is to
expose the underlying variants of Big5 in the encoding selection menu.
Even if some sort of statistical AI technique were used, there will
still be occasions where what the machine chooses will be wrong. Just
let the user choose if something doesn’t work.
Right, that will likely be required if we end up with more than one Big5
variant.
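
As a rough sketch of what "let the user choose" could look like, the Python
below decodes the same bytes under several Big5-family codecs and reports
whether they disagree. The names BIG5_VARIANTS, decode_candidates and
needs_user_choice are hypothetical, and the codecs are the ones that ship
with CPython, not the mappings any browser uses.

```python
from typing import Dict, Iterable

# Hypothetical menu of variants: the Big5-family codecs bundled with CPython.
BIG5_VARIANTS = ("big5", "cp950", "big5hkscs")


def decode_candidates(raw: bytes,
                      encodings: Iterable[str] = BIG5_VARIANTS) -> Dict[str, str]:
    """Decode raw under each encoding; undecodable bytes become U+FFFD."""
    return {name: raw.decode(name, errors="replace") for name in encodings}


def needs_user_choice(candidates: Dict[str, str]) -> bool:
    """True when at least two variants decode the bytes differently."""
    return len(set(candidates.values())) > 1


sample = "亂碼".encode("big5")
candidates = decode_candidates(sample)
print(sorted(candidates))                 # the variants that were tried
print(needs_user_choice(candidates))      # whether a menu would matter here
```

When the variants agree there is nothing to ask; only when they differ
would the menu entries need to be shown.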
--
Philip Jägenstedt
Core Developer
Opera Software