At 10:56pm -0800 99-12-23, Palm Developers Forum List wrote:
>Date: 23 Dec 1999 18:18:59 -0800
>From: "Sudipta Ghose" <[EMAIL PROTECTED]>
>Subject: StrCompare behaves strangely
>
>Hi,
>
>There was a similar query posted by Aaron Ardiri on Nov. 30, but I couldn't
>find any reply in the archives. So, I am posting this. Can the guys from
>Palm give it a go?

Here is Aaron's original email:

>Date: 30 Nov 1999 04:41:36 -0800
>From: Aaron Ardiri <[EMAIL PROTECTED]>
>Subject: StrCompare() - bug?
>
>hi!
>
>  i was playing around with the StrCompare() function the other
>  day and i noticed something *strange* about its behavior :>
>
>    Int result = StrCompare("Language", "L<a+accent>nguage");
>
>  notice that the second word is "L<a+accent>nguage" (not a real word,
>  but good enough for comparison) :>> which contains an accented
>  character "a".
>
>  when executing.. the result is NEGATIVE, when in fact it should
>  be POSITIVE! :> i looked at the OS 3.0 source code and noticed
>  that it is fine, however - i had to rewrite the function to
>  compare the "unsigned char" value of the character being used
>  in the string.
>
>  the accented "a" has a character code greater than 127, and hence
>  has a -ve value when treated as a signed char within these types of
>  routines. everything of
>  course works fine for normal ASCII characters (non accented)
>
>  has this been fixed in later versions of the OS?

Here was my reply:

>Let me make sure I understand this. The result from 
>StrCompare(string1, string2) will be negative if string1 sorts 
>before (is "less than") string2.
>
>Now, based on the standard Palm sorting rules, unaccented characters 
>sort before accented characters. So "Language" would sort before 
>"L<a+accent>nguage".
>
>Which means that the routine is correctly returning a negative value.
>
>So why do you think the result should be positive? And what did you 
>see when looking through the OS 3.0 source code that made you think 
>there was a bug?

I never received a response, so I assume that there wasn't actually a 
bug in the 3.0 source code.

Continuing with your email:

>It seems Palm OS string comparisons are not always based on ANSI codes.
>e.g.,
>
>StrCompare(" ", "-") or StrCaselessCompare(" ", "-") returns a value > 0.
>The ANSI code for [space] is 32 and the ANSI code for - is 45. So I expected
>that the above two calls would return a value < 0. That's what happens if I
>use strcmp or _stricmp in a VC++ 6.0 program.

Results from StrCompare and StrCaselessCompare will _not_ be the same 
as strcmp, since, as you note, strcmp does a blind comparison based on 
character byte values, while the Palm OS routines attempt to order 
characters in a manner that is relatively correct (generally good 
enough) for the various US/European locales. I'm not a big fan of 
this one-size-fits-all approach to sorting, but it appears to have 
worked thus far.

>Can anybody from Palm tell me how StrCompare and StrCaselessCompare work in
>Palm OS, and why? I am sorting my data on the PC (it's faster). But when I
>send the data to the Palm device, binary search fails as the data is not
>sorted according to the device's sorting rules.

Pre-sorting data on a PC or Mac is tricky, since you'd want to sort 
exactly the same as the device, and sorting on the device has changed 
(ever so slightly) in the past and might change in the future.

Also, before discussing the exact implementation details, note that 
(a) it's very different for Japanese, and (b) the manner in which 
sorting is handled will probably be changing in the future, so this 
information should be treated the same as private data 
structures...subject to change.

>I have confirmed this by checking the array returned by
>GetCharCaselessValue. For [space] it stores 41 and for - it stores 37.
>
>The documentation for GetCharCaselessValue is interesting:
>
>"BytePtr GetCharCaselessValue (void)
>Parameters None.
>Result Returns a pointer to the sort array.
>The compiler pads each byte out to a word so each index position
>contains two characters.
>Note: array[x].high = sort value for character 2x+1."
>
>If it's returning a byte pointer, why do we have to worry about word
>alignment? We can always access any byte we want. The StrCmpMatches function
>in the Addressbook example does exactly that.

The documentation is wrong. GetCharCaselessValue returns a pointer to 
an array of 256 bytes. Note that this routine has been deprecated 
since Palm OS 3.1, mostly because it doesn't work for multi-byte 
character encodings such as Shift-JIS (Japanese).

>"The GetCharCaselessValue conversion table converts each
>character into a numeric value that is caseless and sorted according
>to Microsoft Windows sorting rules:
>o Punctuation characters have the lowest values,
>o followed by numbers,
>o followed by alpha characters.
>All forms of each alpha character have equivalent values, so
>that e = E = e-grave = e-circumflex, etc.
>This conversion table is used by all the Palm OS sorting and
>comparison routines to yield caseless searches and caseless sorts in
>the almost same order as Windows-based programs, except that
>    ^^^^^^
>Palm OS routines produce the same sorting for all locales."
>
>What does this "almost" mean?

Well, one difference is that on Windows, different locales (which 
still use the CP1252 character encoding) will have different sorting 
rules, while with the Palm OS currently all locales have the same 
sort order. Note again that this will probably be changing in the 
future.

>The documentation for GetCharSortValue says:
>
>"BytePtr GetCharSortValue (void)
>Parameters None.
>Result Returns a pointer to the attributes array. This is an array of 256
>Word values, one for each possible character code.
>The compiler pads each byte out to a word so each index position
>contains two characters.
>NOTE: array[x].low = sort value for character 2x."
>
>
>I am li'l lost here. If it is returning an array of 256 word values, why does
>the compiler have to pad? If we put 2 chars in a word we get 512 characters.
>Does Palm OS have that many characters? What am I missing here?

Again the documentation is wrong. GetCharSortValue returns a pointer 
to an array of 256 bytes, similar to GetCharCaselessValue. And again, 
this routine is now deprecated.

-- Ken

Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200 (direct) +1 408-261-7550 (main)
