Re: Sort, Find, RawKeyDown ... / diacritical problems.

Emmanuel Companys Wed, 05 Mar 2003 01:57:13 -0800

On Monday, 3 March 2003, at 18:45 Europe/Paris, Pierre wrote:

From: Pierre <pierre.bernaert>
Date: Sat 1 March 2003 09:20:33 Europe/Paris
To: Liste Révolution <[EMAIL PROTECTED]>
Subject: Sort, Find, RawKeyDown ... / diacritical problems.

Hi everyone,

I'm French and I'm in the process of porting major (for me) applications written in HyperCard to RR 1.1.

The main tools of some of them rely mainly on a "HyperText" approach.

I found, using RR, that the SORT and FIND commands don't work with words containing diacritics (accented letters).

I sent a mail to Kevin Miller; his reply is below.

----------------------------------------------------------------------------------------------------------------------------
My mail and the Answer from Kevin

From: Kevin Miller <[EMAIL PROTECTED]>
Date: Thu 20 Feb 2003 20:39:03 Europe/Paris
To: <[EMAIL PROTECTED]>
Subject: Re: Find, Sort and more generally speaking "Diacritical" ...

On 19/2/03 8:25 am, Pierre <[EMAIL PROTECTED]> wrote:

As you surely know, in RR 1.1 diacritics don't work as far as:

• At least the FIND and SORT commands are concerned.
• I believe "RawKeyDown" is concerned too.

In both cases you get wrong answers (this worked fine in HC
using "International" for sorting).

I'm not familiar with "Unicode"; as far as I know it will be implemented in
Version 2.

Will "Unicode" solve this major problem for "Hyper Text" applications?

Thank you for telling me what the plans are on this topic.

You can script around these issues relatively easily. Unicode supports
entering and displaying international text. We don't plan to make any further
changes to these functions for 2.0, but could consider revisiting this (at
the scripting level at least) for 2.1. In the meantime, try asking on the
use-revolution mailing list; someone there is bound to be able to help write
a script to overcome these issues.

Kind regards,

Kevin
----------------------------------------------------------------------------------------------------------------------------------------------

For my part, I don't see how to deal with this, so my question to the members of this list is:

Does anyone have an idea how to solve this or, better, has anyone made it work?

I encountered the same problem a year ago when I was working on my program Polylexis, which you may download from my iDisk.

HyperCard had "sort ascii", which sorted according to the ASCII value of each character, and "sort international", which ignored both the upper/lower case difference and any diacritics. This was a stopgap ("pis-aller"), more or less acceptable for French. José Ileras, an RR user from Barcelona, told me that he had corresponded with the R-R staff about the sorting problem and that it should be taken care of in version 1.1, but I didn't find that it was really implemented.
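To make the "ignore case and diacritics" behaviour concrete, here is a minimal sketch in Python (not Revolution script, and not HyperCard's actual algorithm). The function name `international_key` and the sample words are made up for illustration; the approach assumes that decomposing each letter and dropping the combining marks is an acceptable approximation.

```python
import unicodedata

def international_key(text):
    """Build a sort key that ignores case and diacritics (an approximation)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() removes the upper/lower case distinction.
    return stripped.casefold()

# Hypothetical sample data, not taken from any real stack.
words = ["Zèbre", "école", "Eau", "ça", "cabane"]
sorted_words = sorted(words, key=international_key)
print(sorted_words)
```

Letters that have no decomposition, such as "ø", pass through this key unchanged, so their position is still decided by their raw character value.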

Besides, the problem is not solved by just ignoring diacritics:

FIRST: Where will special characters be sorted, such as the German "SZ" (ß), "bar-o" (ø), the "edh", "thorn", "bar-D", the "medium dot" (·), etc.?

SECOND: Diacritized letters have a special sorting behavior depending on the language:
a) ñ is sorted as a separate letter in Spanish dictionaries, between N and O; and so are ç and many diacritized letters in several languages. Ignoring the diacritic then totally changes the alphabetical presentation.
b) Even when the diacritized letter is not considered a separate letter, the typographic rules of the language may assign it a special place: for instance, in French "Macon" may come before "Maçon", "Lez" before "Lés", "Prés" before "Près" and so on. By simply ignoring the diacritics, such words compare as equal and end up in a random order.
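Point b) amounts to a two-level comparison: first the letters without accents, then the accents as a tie-breaker. The Python sketch below illustrates the idea only. The `ACCENT_RANK` table and the name `two_level_key` are hypothetical, and the ranking was chosen merely to reproduce the examples above; it is not the official collation rule of French or any other language.

```python
import unicodedata

# Hypothetical ranking of combining marks (an assumption for illustration):
# acute, grave, circumflex, diaeresis, cedilla. Unaccented letters rank 0.
ACCENT_RANK = {"\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4, "\u0327": 5}

def two_level_key(word):
    """Return (base letters, accent ranks) so accents only break ties."""
    base = []
    accents = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Record the mark's rank on the letter it follows.
            if accents:
                accents[-1] = ACCENT_RANK.get(ch, 9)
        else:
            base.append(ch.casefold())
            accents.append(0)
    return ("".join(base), tuple(accents))

# Sample words taken from the examples in the text.
sample = ["Près", "Maçon", "Prés", "Macon"]
ordered = sorted(sample, key=two_level_key)
print(ordered)
```

With this key the order among words that differ only by accents becomes deterministic instead of depending on the input order.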

THIRD: Digraphs may have a special behavior too:
a) Ligatures such as æ or œ will be equated to their upper case counterparts by "international sorting"; but where will they be listed? After the Z? As a separate letter (between A and B, or O and P)? As if they were normal digraphs (ae, oe)? Equated to "ä" or "ö" (and "ø")?

Kevin Miller says:

"We don't plan to make any further changes to these functions for 2.0, but could consider revisiting this (at the scripting level at least) for 2.1."
I hope many R-R users will encourage him to do so (although I don't understand what "at the scripting level at least" means). The use of Unicode by itself will not solve all the sorting problems and, besides, it still has many inconveniences, mostly practical and some technical. They will certainly be overcome, but to use this two-byte system now, for characters that have been sorted correctly on any computer for 20 years, is simply illogical.

The workaround I used in Polylexis is less elegant than Jan Schenkel's scripts (http://lists.runrev.com/pipermail/use-revolution/2002-September/008173.html), and I don't know which one is faster. But mine takes care of all the points above, while Jan's simply causes diacritics and case to be ignored.

My script puts the field to be sorted into a local variable, then reads the first word of each line of that variable and puts its "fake form" into a second item. This "fake form" is obtained by replacing each diacritized letter with a digraph (or trigraph) that ensures the correct position in the list: for instance, for Spanish it replaces "ñ" with "nzz", "ch" with "cz" and "ll" with "lzz". Then I just have to sort by item 2. Afterwards item 2 is deleted, and the contents of the local variable replace those of the original field. I wrote a different "make fake" function for each of the 9 languages used in my program.
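The "fake form" technique can be sketched as follows. This is a Python illustration, not the Polylexis script: the function names `make_fake_spanish` and `sort_lines_by_fake_form` and the sample field are assumptions. Only the three Spanish substitutions come from the description above, so accented vowels are left untouched here.

```python
# Substitutions quoted in the text for Spanish; everything else is illustrative.
SPANISH_RULES = [("ñ", "nzz"), ("ch", "cz"), ("ll", "lzz")]

def make_fake_spanish(word):
    """Replace special letters and digraphs so plain sorting places them correctly."""
    fake = word.lower()
    for original, substitute in SPANISH_RULES:
        fake = fake.replace(original, substitute)
    return fake

def sort_lines_by_fake_form(text, make_fake):
    """Sort lines by the fake form of their first word, then discard the fake form."""
    decorated = []
    for line in text.splitlines():
        words = line.split()
        first_word = words[0] if words else ""
        # Plays the role of "item 2": a temporary key stored next to the line.
        decorated.append((make_fake(first_word), line))
    decorated.sort(key=lambda pair: pair[0])
    # Drop the temporary key and rebuild the field contents.
    return "\n".join(line for _, line in decorated)

# Hypothetical field contents.
field = "nube\nllama\nchico\nluz\nñandú\ncosa\ndedo\nnota"
result = sort_lines_by_fake_form(field, make_fake_spanish)
print(result)
```

Passing the "make fake" function as a parameter mirrors the idea of one such function per language: the sorting routine stays the same and only the substitution table changes.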

Of course, this is slow on old computers. R-R should have at least: a) a "sort international" option, and b) externals or plug-ins for at least the most widely used one-byte-compliant languages. It would be nice if we had a "sort system selected language" option using the sorting rules selected in the operating system, but I don't know how difficult this would be in a cross-platform program...

Manuel
