On Thu, 8 Apr 2004, Tom Lane wrote:

> No, the ordering *will* be the same as it was before, because strcoll()
> is still functioning the same.  You'd get the same answer from a sort
> operation since it depends on the same operators.
> 
> It interprets them according to LC_CTYPE, which does not change.

I'm afraid I don't understand you yet, and would like to have
it explained in more detail if possible. I feel a bit stupid for not
understanding what you are stating, but I'm sure there are more than just
me who feel like that :-)

Maybe we can look at an example. Let us take some UTF-8 strings, correctly
ordered in Swedish:

  Åke
  Ära

Now, since these are UTF-8, they are encoded as

  c3 85 6b 65        (Åke)
  c3 84 72 61        (Ära)

and that is the order they have in the index.
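
The byte values above can be checked with a few lines of Python 3. This is
only a sketch for verifying the example; the variable names are made up and
have nothing to do with the backend code.

```python
# Sketch: print the UTF-8 bytes of the two example strings.
# "\u00c5" is A-ring (Å) and "\u00c4" is A-diaeresis (Ä); escapes are used
# so the result does not depend on how this text itself is encoded.
words = ["\u00c5ke", "\u00c4ra"]

# Encode each string to UTF-8 and format the bytes as space-separated hex.
encoded_hex = [w.encode("utf-8").hex(" ") for w in words]

for word, hex_bytes in zip(words, encoded_hex):
    print(hex_bytes, " (" + word + ")")
```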

Now, this index is copied into a new database where
the encoding is Latin1. In the above table we now want to
look up the string that in Latin1 is represented as

   c3 84 72 61

So we look in the index and see that the first entry in the index is
not the same string. But now, when we compare these strings as Latin1
strings, it is no longer the case that c3 84 72 61 > c3 85 6b 65. As Latin1
strings we compare character by character: c3 = c3, and then 84 < 85 (in
Latin1, 84 and 85 are control characters). So we will not find this string
in the index, since we think it should have come before the first entry.

We might even insert a new copy of this string in another
position in the index.
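
The failed lookup can be simulated with a sorted list and a binary search.
This is a rough sketch, assuming (as the example above does) that the
comparison in the Latin1 database ends up ordering the strings byte by byte;
`index`, `key` and `lookup` are hypothetical names, and a real btree is of
course more involved than a Python list.

```python
import bisect

# Index entries in the order produced by the Swedish UTF-8 collation:
# Åke (c3 85 6b 65) comes first, then Ära (c3 84 72 61).
index = [b"\xc3\x85ke", b"\xc3\x84ra"]

# The string we want to look up, as it is stored on disk.
key = b"\xc3\x84ra"

def lookup(entries, k):
    """Binary search that assumes entries are sorted bytewise."""
    pos = bisect.bisect_left(entries, k)
    found = pos < len(entries) and entries[pos] == k
    return pos, found

pos, found = lookup(index, key)
print("search position:", pos, "found:", found)
print("key really is in the index:", key in index)
```

Under that assumption the search stops at position 0, in front of the first
entry, and reports the key as missing even though it is stored at position 1.
An insert at that position would leave two copies of the same string in the
index.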

So, my questions are:

a) What have we gained by copying this table into the Latin1 database?
   It looks broken to me. As far as I understand, we have to rebuild
   the index to get something that works at least a little.

b) Maybe one should not just reindex but also re-encode. In some cases that
   works and produces a good result, for example from Latin1 to UTF-8.

c) If we are going to reindex anyway, then why not do that and solve the
   per-database locale as well? This point is independent of a) and b);
   I still want to understand the first two points even if we don't
   talk about per-database locale.
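
To illustrate point b), here is a small Python 3 sketch of re-encoding in
both directions. It only shows the conversion of the stored bytes, not a
reindex; the names are made up for the illustration.

```python
# Latin1 -> UTF-8: every Latin1 byte maps to a Unicode character,
# so this direction always succeeds.
latin1_rows = [b"\xc5ke", b"\xc4ra"]  # Åke and Ära as Latin1 bytes
utf8_rows = [row.decode("latin-1").encode("utf-8") for row in latin1_rows]
for row in utf8_rows:
    print(row.hex(" "))

# UTF-8 -> Latin1: this fails for characters outside Latin1,
# for example the euro sign (U+20AC).
try:
    "\u20ac".encode("latin-1")
    reverse_ok = True
except UnicodeEncodeError:
    reverse_ok = False
print("euro sign fits in Latin1:", reverse_ok)
```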


-- 
/Dennis Björklund

