Theodore!
Thanks for taking the time to explain. I followed the link and
googled some more info - I now see the problem.
I am sure that this will become relevant for my current project, so
if you don't mind I will contact you later, when I design my search
algorithm... basically it should be a good idea to store normalized
data in my SQLite database to find every relevant hit. Or does SQLite
normalize by default?
Best,
Marcel
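On the SQLite question: as far as I know, SQLite's default BINARY collation compares the stored bytes and does not apply Unicode normalization, so normalizing before storing and before searching is the safe route. A minimal sketch using Python's standard sqlite3 and unicodedata modules; the table name, column names and the nfc helper are made up for illustration:

```python
import sqlite3
import unicodedata

def nfc(text):
    # Hypothetical helper: bring text into one canonical form (NFC).
    return unicodedata.normalize("NFC", text)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (raw TEXT, norm TEXT)")

# The stored word uses "e" + COMBINING ACUTE ACCENT (decomposed form).
word = "cafe\u0301"
conn.execute("INSERT INTO words VALUES (?, ?)", (word, nfc(word)))

# The search term uses the single precomposed code point U+00E9.
query = "caf\u00e9"

raw_hits = conn.execute(
    "SELECT COUNT(*) FROM words WHERE raw = ?", (query,)
).fetchone()[0]
norm_hits = conn.execute(
    "SELECT COUNT(*) FROM words WHERE norm = ?", (nfc(query),)
).fetchone()[0]

print(raw_hits, norm_hits)
```

The raw column misses the hit because the two spellings are different byte sequences, while the normalized column finds it.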
On 16.06.2006, at 19:21, Theodore H. Smith wrote:
From: Marcel <[EMAIL PROTECTED]>
Date: Fri, 16 Jun 2006 17:20:33 +0200
Does this mean that there are several codes for the same
character - like an "é" in Unicode?
Yes, there are several "codes". I think it's called a serialisation
of code points, but I'm also a bit loose on the Unicode terminology.
Basically, Unicode has some history behind it. It tried to be all
things to all people, and thus we often have two codes for the same
accented letter. Some people wanted their favourite letters to
remain a single code point, even though the more modern way that
Unicode.org suggests is to store the letter and the accent as
separate code points. The accent is called a "combining character".
So Unicode added both variants, the letter plus combining accent and
one code point that equals both, to try to please both kinds of users.
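To make the two variants concrete, here is a small sketch using Python's standard unicodedata module (Python is only chosen for illustration, and the variable names are made up). It shows that the precomposed "é" and "e" plus a combining accent compare as different strings until both are normalized to the same form:

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# The two strings look identical on screen but are not equal.
equal_raw = precomposed == combining

# NFC composes to the single code point, NFD decomposes to letter + accent.
nfc_of_combining = unicodedata.normalize("NFC", combining)
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)

print(equal_raw)
print(nfc_of_combining == precomposed)
print(nfd_of_precomposed == combining)
```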
Even if they hadn't done this, we'd still need normalisation
code. What if you have a letter with two accents?
Well, the result can look like the same letter whichever accent is
stored first, as long as one goes above and the other below.
For example: A + above-accent + below-accent, looks the same to the
user as: A + below-accent + above-accent.
So, Unicode has specified a canonical order that combining
characters get sorted into. My UnicodeStuff module has a
ReorderCombiners method now :)
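The reordering can be seen with Python's standard unicodedata module. This is only a sketch of the idea and not the ReorderCombiners method itself, which I have not seen. The accents picked here (acute above, dot below) are my own choice of example:

```python
import unicodedata

ACUTE_ABOVE = "\u0301"   # COMBINING ACUTE ACCENT
DOT_BELOW = "\u0323"     # COMBINING DOT BELOW

above_first = "A" + ACUTE_ABOVE + DOT_BELOW
below_first = "A" + DOT_BELOW + ACUTE_ABOVE

# Each combining character has a combining class; normalization sorts
# adjacent combining characters by that class, lowest first.
class_above = unicodedata.combining(ACUTE_ABOVE)
class_below = unicodedata.combining(DOT_BELOW)

nfd_a = unicodedata.normalize("NFD", above_first)
nfd_b = unicodedata.normalize("NFD", below_first)

print(above_first == below_first)   # different before normalization
print(nfd_a == nfd_b)               # same after normalization
print(class_above, class_below)
```

After normalization both strings store the below-accent first, because its combining class is the lower of the two.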
But not for UTF-8 encoding, right?
The encoding has got nothing to do with this. You'll get exactly
the same problem, no more and no less, whether you use UTF-8,
UTF-16 or UTF-32.
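A quick way to check this, sketched in Python with only the standard library (the variable names are made up): encode both spellings of "é" in each encoding and compare the bytes, once raw and once after normalizing to NFC.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point
combining = "e\u0301"    # e + COMBINING ACUTE ACCENT

raw_equal = {}
normalized_equal = {}
for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    # Without normalization the byte sequences differ in every encoding.
    raw_equal[enc] = precomposed.encode(enc) == combining.encode(enc)
    # After normalization they match in every encoding.
    a = unicodedata.normalize("NFC", precomposed).encode(enc)
    b = unicodedata.normalize("NFC", combining).encode(enc)
    normalized_equal[enc] = a == b

print(raw_equal)
print(normalized_equal)
```

The mismatch lives at the code point level, so switching the encoding neither causes nor cures it.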
This FAQ http://www.unicode.org/faq/normalization.html tries to
explain a bit more, but unfortunately it's all written in
gobbledygook, so I don't think it'll help ;)
--
http://elfdata.com/plugin/
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>
Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>