I say live with it.

This happens in Japanese as well, and it gets even worse when searching in romazi (European letters), because there are so many different ways of spelling things, and all the Chinese loanwords mean and sound exactly the same.

But when the whole point of the system is to search for the meaning of a text and not the exact spelling, we have to live with getting a few irrelevant results.

-Dave Oftedal

Edward H Trager wrote:

On Thu, 13 Feb 2003, Rick Cameron wrote:


The Win32 API includes a function that can do this folding, on Windows
NT/2000/XP: LCMapString, with the option LCMAP_SIMPLIFIED_CHINESE or
LCMAP_TRADITIONAL_CHINESE.

I know little about Chinese, but I have the impression that it is much more
common for several traditional characters to correspond to one simplified
character than vice versa. If that's true, it seems to me that it would make
most sense to fold to simplified.

- rick

Hmmm ... Suppose I'm searching for some relatively obscure traditional
character that occurs mostly in Wen Yen (u+6587 u+8A00 : Classical
Chinese) and has a very specific meaning in Classical Chinese. This
character gets "folded" or "mapped" to a fairly common character in modern
bai hua (u+767D u+8BDD) Chinese, and then the search proceeds. The result
set contains hundreds or thousands of irrelevant results related to the
modern meaning, and I still have to sift through them looking for the
needles in the haystack. I'll try to provide a concrete example once I
think of one ... it's been a long time since I studied Classical Chinese.


This "folding" is much easier than implementing a full-fledged
simplified<->traditional conversion (which needs to be context sensitive and
dictionary-driven), because the result is just in a temporary buffer used
for comparison, and no one is going to see it.
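
A minimal sketch of that kind of comparison-only folding, in Python rather
than through LCMapString. The table and the names FOLD_TABLE, fold and
folded_find are hypothetical and hand-picked for illustration; a real system
would need a complete traditional-to-simplified mapping. Both the query and
the text are folded into temporary strings, and only those are compared:

```python
# Hypothetical, hand-picked table mapping traditional characters to their
# simplified counterparts. It is nowhere near complete; it only illustrates
# the idea of folding for comparison.
FOLD_TABLE = str.maketrans({
    "發": "发",  # "to emit, to send out"
    "髮": "发",  # "hair" (folds to the same simplified character)
    "頭": "头",  # "head"
    "語": "语",  # "language"
})


def fold(text):
    """Return a folded copy of text; the original is left untouched."""
    return text.translate(FOLD_TABLE)


def folded_find(needle, haystack):
    """Find needle in haystack, comparing folded copies of both.

    Every table entry maps one character to one character, so an index
    into the folded copy is also valid for the original haystack.
    Returns -1 when there is no match.
    """
    return fold(haystack).find(fold(needle))


if __name__ == "__main__":
    # A traditional query matches simplified text.
    print(folded_find("頭髮", "我的头发"))
    # The many-to-one mapping also produces a hit for the "wrong" character.
    print(folded_find("發", "头发"))
```

The second call shows the cost raised earlier in the thread: because two
traditional characters fold to one simplified character, a search for one
of them also matches text that meant the other.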

- Marco




--
New Norwegian (Nynorsk) is essentially the speech of Norwegian peasants
as mutilated by a schoolteacher with a poor understanding of Icelandic.
--Halldór Laxness, via B. Philip Jonsson

Swedish, Norwegian and Danish are actually the same language. It's just
that the Norwegians can't spell it, and the Danes can't pronounce it.
--Chlewey


