Jim Allan wrote:

> Jonathan Kaye wrote:
>> Hi all,
>> I don't know whether this is a bug, a feature or an unforeseen property of
>> OO sorting. Here's the problem. I am trying to do a sort based strictly
>> on the ascii codes of symbols in a particular column. The problem exists
>> however I define the category of the column be it number, text, all, etc.
>> Here's the problem: the ascii code of the symbol Õ (hex d5) is above the
>> symbol Ô (hex d4). If I do an ascending sort with these two symbols then
>> the Ô's (d4) appear before the Õ's (d5) which is what I want. What's
>> really evil is that if I embed them in strings like this: òËÕÔ225! and
>> this òËÔÔ24!u, the first string is sorted *before* the second string
>> although the hex ascii codes of the first three characters are [f2 cb d5]
>> and [f2 cb d4], respectively. The following number 22 vs. 24 seems to
>> determine the sort order. The codes d4 and d5 seem to be treated
>> identically in this context. How can I convince OO to sort strictly on
>> ascii codes (I have ticked the case sensitive box in the sort options).
> 
> I have tried setting all language information to [none] and still cannot
> get Calc to sort by Unicode order. That is, for example, ß sorts as
> though it were ss, ¶ sorts before alphabetic characters, and so forth.
> The symbols À, Á, Â, Ã, Ä, Å, Æ all collate between A and B.
> 
> So I suppose that Calc uses a Unicode sort order table regardless of
> language and then modifies the sort only according to whatever
> particular language one uses, but never follows strict Unicode order.
> 
> Therefore you cannot rely on a sort to use the particular value of a
> character in a particular character set. Since normally one doesn't want
> pure character set order, this is fine.
> 
> Of course this doesn't explain why Ô and Õ sort differently depending on
> whether they are at the beginning or middle of a string. I think that is
> a bug.
> 
> If you really do need a binary character sort, what you might do is
> create a user function built on the CODE() function (the inverse of
> CHAR()) which will translate any character string into a string of
> digits, produce them in another column, and then sort on that column.
> 
> Jallan
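
The digit-string idea above can be sketched outside Calc. The following Python is only an illustration, not a Calc user function: `codepoint_key` is a hypothetical helper name, and the sketch assumes that padding every code point to a fixed width is enough for a plain text sort to reproduce code-point order.

```python
def codepoint_key(text: str) -> str:
    """Encode each character as a 7-digit decimal code point.

    Fixed-width digits mean an ordinary string comparison of the keys
    gives the same order as comparing the raw code points.
    (7 digits covers the largest Unicode code point, 1114111.)
    """
    return " ".join(f"{ord(ch):07d}" for ch in text)


# The two strings from the original report, written with escapes so the
# code points are unambiguous: f2 cb d5 d4 ... and f2 cb d4 d4 ...
first = "\u00f2\u00cb\u00d5\u00d4225!"
second = "\u00f2\u00cb\u00d4\u00d424!u"

# Sorting on the helper key puts the d4 string ahead of the d5 string.
ordered = sorted([first, second], key=codepoint_key)
```

In a spreadsheet the same key would sit in a helper column, and the sort would run on that column instead of the original text.
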
Thanks Jim. Let me say that I have a workaround. The bug (so it seems to be)
involves accented characters only when followed by numbers. So all the
symbols you mention sort perfectly as long as they are not followed by
numbers. I'm sure the person who coded this sort routine would recognise
the problem immediately because it's so weird. Anyway, the workaround then
becomes obvious: the numbers in question are 1-4 (these are tone markers). I
map them onto non-numbers in a separate field (which is a separate field of
a separate field) and do my sort on that newly created field. Since you've
replicated the problem, I guess we can call this a genuine bug and I guess
I'll have to go through the complicated *sigh* procedure of reporting it. I
should say that the sort does seem to work on the ascii codes so that
higher codes come later in the collating sequence. It's just that the
difference between these codes is neutralised in the presence of numbers (I
don't think it's a question of beginning or middle or end of string here).
It should be trivial to fix if only some dev is reading this.
Cheers,
Jonathan
-- 
Registered Linux user #445917 at http://counter.li.org/
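
The workaround described in the message, mapping the tone digits 1-4 onto non-digits in a helper field and sorting on that field, might look roughly like this. The replacement letters a-d and the name `tone_safe` are assumptions made for illustration; the message does not say which symbols were actually used.

```python
# Hypothetical mapping: tone digits 1-4 become letters a-d.
# Any non-digit symbols would do, as long as they keep the same relative order.
TONE_MAP = str.maketrans("1234", "abcd")


def tone_safe(text: str) -> str:
    """Return a copy of text with tone digits 1-4 replaced by non-digits."""
    return text.translate(TONE_MAP)


# Build the helper field, then sort the original rows by it.
rows = ["ma3", "ma1", "ma4", "ma2"]
ordered = sorted(rows, key=tone_safe)
```

Digits outside 1-4 are left untouched, so this only helps when the tone markers are the only digits that follow accented characters.
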
