Re: Changing UCA primar[l]y weights (bad idea)

2004-07-12 Thread Patrick Andries
Alain LaBonté wrote: It would be much better to make sorting, matching and searching consistent with tailored tables of either the UCA or ISO/IEC 14651. Unfortunately that is not what happens in most products, except in some good search engines (Google, Altavista and the like, which are

Re: Importance of diacritics (was: Looking for transcription ...)

2004-07-12 Thread busmanus
busmanus wrote: Mike Ayers wrote: Interesting case, and one reason why diacritic stripping, although brutal, may be desirable - it doesn't pretend to be accurate. An even funnier example than Trcsik's name would be Benk /bnko/ and Benk /bnk/, two famous musicians of Hungary.

MacOS character sets

2004-07-12 Thread Tay, William
Hi, I'd like to understand what character encoding an application that runs on MacOS uses. Just as Windows applications generally use code pages and UNIX applications use ISO-8859-X character set, what about MacOS applications? Is there any website that shows the encoding of characters of the

Problems Reading Saved Files With Unicode Names

2004-07-12 Thread Nicholas Dinh
Hi, I think I have a problem that is related to Unicode translation. I have some files with filenames saved in Unicode with special characters. This is fine as I can open them. The problem began when I had to reconfigure my computer system and backed up all my files. To do this, I backed it up

RE: Problems Reading Saved Files With Unicode Names

2004-07-12 Thread Geoff Back
Nicholas, 1. Use CD ripping software to create an ISO image of the CDROM. Roxio's Easy CD Creator, Nero, etc - all can do this. Either: 2a. Use a quality ISO image editor tool to rename the files. You'll have to do a bit of research to find one but they do exist - it's just been a while

Re: Changing UCA primary weights (bad idea)

2004-07-12 Thread Mark Davis
John [Cowan]'s list is not a few characters. Let's take Latin, for starters. There are 1870 entries in the UCA for Latin. If you subtract from John's list the ones that are already interleaved -- as I did in my email -- then you get 78 values, or about 4%. I'll repeat that list again below,

User Expectations for collation (was Re: Looking for transcription or transliteration standards latin-arabic)

2004-07-12 Thread Mark Davis
These provide good examples. It would be interesting to see, of the people on the [EMAIL PROTECTED] list, how many non-Poles would expect to find the following orders: Ab b Ac Eb b Ec Ob b Oc Ce e Cy Ne e Ny Sa a Sy Za a Zy Za a Zy and either (a) or (b): a) La a Ly//

Re: Changing UCA primary weights (bad idea)

2004-07-12 Thread Mark Davis
I am positive that all of my tailorings for Sybase will be *affected*, for example. I don't think they will be *substantially* affected, in the sense of any complete redefinition of how the tailoring itself is defined.

Re: Problems Reading Saved Files With Unicode Names

2004-07-12 Thread Dominikus Scherkl (MGW)
Because the filename was automatically converted with ? characters, it is considered an invalid file name and I can't open this. How about opening them on a system without this problem (like Un*x)? But be sure to refer to the file with quotes. something like cp CDROM/'my??file'

Re: MacOS character sets

2004-07-12 Thread Doug Ewell
William Tay wrote: I'd like to understand what character encoding an application that runs on MacOS uses. Just as Windows applications generally use code pages and UNIX applications use ISO-8859-X character set, what about MacOS applications? Is there any website that shows the encoding of

Re: Looking for transcription or transliteration standards latin-arabic

2004-07-12 Thread Asmus Freytag
At 01:02 AM 7/10/2004, Marcin 'Qrczak' Kowalczyk wrote: But there are cases when I would prefer to fold Polish diacritics in searches. It's basically every case when you are not sure that all stored data is using diacritics, Or when you are unsure how it is spelled, for example, looking up a
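
The kind of diacritic folding discussed here can be sketched in a few lines. This is only an illustration, not how any product mentioned in the thread does it: the function names `fold_diacritics` and `folded_match` are made up for the example, and the approach (decompose to NFD, then drop combining marks) is an assumption. Note that it leaves letters such as Ł untouched, because they have no canonical decomposition.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop nonspacing combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def folded_match(query: str, candidate: str) -> bool:
    """Compare two strings, ignoring case and decomposable diacritics."""
    return fold_diacritics(query).casefold() == fold_diacritics(candidate).casefold()


if __name__ == "__main__":
    # A query typed without diacritics still finds the stored spelling.
    print(folded_match("krakow", "Kraków"))
    # Ł has no canonical decomposition, so NFD-based stripping keeps it.
    print(fold_diacritics("Łódź"))
```

A search that should also treat Ł as L would need an explicit mapping on top of this, which is one reason folding tables are usually language-specific.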

Re: User Expectations for collation (was Re: Looking for transcription or transliteration standards latin-arabic)

2004-07-12 Thread Asmus Freytag
I missed Mark's change in subject - so I replied to Marcin's message right now under the old subject line: - Original Message - From: Marcin 'Qrczak' Kowalczyk [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, July 10, 2004 01:02 Subject: Re: Looking for transcription or

RE: User Expectations for collation (was Re: Looking for transcription or transliteration standards latin-arabic)

2004-07-12 Thread Mike Ayers
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Mark Davis Sent: Monday, July 12, 2004 9:21 AM These provide good examples. It would be interesting to

Re: MacOS character sets

2004-07-12 Thread Deborah Goldsmith
The native character set for Mac OS X is Unicode. Earlier versions of Mac OS used Apple-proprietary character sets, and some applications still use those character sets on Mac OS X, though their use is deprecated. The mappings for Apple's old character sets are available at:

RE: Problems Reading Saved Files With Unicode Names

2004-07-12 Thread Mike Ayers
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Dominikus Scherkl (MGW) Sent: Monday, July 12, 2004 9:51 AM Because the filename was automatically converted with ? characters, it is considered an invalid file

Re: Importance of diacritics

2004-07-12 Thread Doug Ewell
António Martins-Tuválkin antonio at tuvalkin dot web dot pt wrote: Soviet official (Qruxv, pron. Hrueshawf, a.s.a. Krushchov etc.) used his shoe in a quite unexpected manner this morning Interesting: António's message was encoded in CP1251, but his usual CP1252 signature was still inserted at

Re: Changing UCA primarly weights (bad idea)

2004-07-12 Thread Markus Scherer
Jony Rosenne wrote: For example, an Israeli-oriented tailoring would cause Hebrew to sort first, Arabic, Latin and Cyrillic to follow in whatever order is desired by the user, and other scripts would follow in the default ordering. I am not sure that the current default makes this task possible or

Re: Changing UCA primary weights (bad idea)

2004-07-12 Thread Markus Scherer
Mark Davis wrote: So the question is whether Sybase tailorings, such as German, will be affected positively or negatively, and to what degree. If a German customer is accessing a database full of European names, and expects to find with E, and with A and with Z and with L, then he will be

Locale Data cut-off

2004-07-12 Thread Mark Davis
The next version of Common Locale Data Repository (v1.2) will be coming out in mid-October. To manage the work load more effectively in this release, bug reports or requests for feature enhancements (RFEs) after the start of September will not be considered for the v1.2 release, except insofar as

Re: Importance of diacritics

2004-07-12 Thread busmanus
Anto'nio Martins-Tuva'lkin wrote: On 2004.07.12, 15:36, busmanus [EMAIL PROTECTED] wrote: O, yes, and rough transcriptions in brackets do no harm (e.g. at the first occurrence in the given text), at least if such are available. This would be (very roughly) something like Benk (pron. Benkoh) and

Re: Changing UCA primary weights (bad idea)

2004-07-12 Thread Mark Davis
Checking the DIN 5007, it indeed says that letters with diacritics are sorted with the same primary weight (Section 5.1.1.3) and explicitly lists in 6.2.3.1.1 overstrikes as being diacritics, and gives as an example of that. Mark - Original Message - From: Markus Scherer [EMAIL
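
As a rough illustration of "same primary weight" for letters with diacritics, including overstrikes, here is a toy two-level sort key. It is not an implementation of DIN 5007 or of the UCA: the names `OVERSTRIKE_BASE`, `primary_key` and `sort_key` are hypothetical, the overstrike table is a small hand-picked assumption, and the tie-break on the original string is a simplification of real secondary weights.

```python
import unicodedata

# Hypothetical, hand-picked table: overstruck letters have no canonical
# decomposition, so they must be mapped to a base letter explicitly.
OVERSTRIKE_BASE = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L", "đ": "d", "Đ": "D"}


def primary_key(text: str) -> str:
    """Base letters only: strip combining marks and map overstrikes."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(
        OVERSTRIKE_BASE.get(ch, ch)
        for ch in decomposed
        if unicodedata.category(ch) != "Mn"
    )
    return base.casefold()


def sort_key(text: str) -> tuple:
    """Primary level first; the original string only breaks ties."""
    return (primary_key(text), text)


if __name__ == "__main__":
    names = ["Zola", "Øre", "Ost", "Oper"]
    # With the toy key, "Øre" sorts among the O entries rather than after Z.
    print(sorted(names, key=sort_key))
```

Sorting the same list by raw code points would instead place "Øre" after "Zola", which is the behaviour a user expecting base-letter ordering would find surprising.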

RE: Changing UCA primar[l]y weights (bad idea)

2004-07-12 Thread Alain LaBonté
Resent with a non-renegade email address... (^8= At 14:10 2004-07-09, Jony Rosenne wrote: I think the problem is with the concept of default in this case. The default should be the basis for a specific tailoring, and as a last resort for scripts and letters that do not have specific weights,