In a message dated 2001-09-07 17:19:49 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
> You are quite correct; that is why Unicode supports differing collation
> strengths. Sometimes you only care about the actual letters without
> diacritics. But even then letters are locale sensitive. F
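To make "differing collation strengths" concrete, here is a rough Python sketch. It is not the Unicode Collation Algorithm and does not use ICU. It assumes, for illustration only, that decomposing to NFD and dropping combining marks approximates the primary (base-letter) level, and that casefolding approximates ignoring case. The function name `equal_at_strength` is hypothetical.

```python
import unicodedata


def _strip_marks(s: str) -> str:
    # Decompose (NFD) so that "é" becomes "e" + U+0301 COMBINING ACUTE ACCENT,
    # then drop every combining mark, leaving only the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_at_strength(a: str, b: str, strength: int) -> bool:
    """Crude stand-in for collation strengths (NOT the full UCA):
    1 = base letters only, 2 = also diacritics, 3 = also case."""
    if strength == 1:
        # Primary: ignore both diacritics and case.
        return _strip_marks(a).casefold() == _strip_marks(b).casefold()
    if strength == 2:
        # Secondary: keep diacritics (compare decomposed forms), ignore case.
        return (unicodedata.normalize("NFD", a).casefold()
                == unicodedata.normalize("NFD", b).casefold())
    if strength == 3:
        # Tertiary: diacritics and case both count.
        return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
    raise ValueError("strength must be 1, 2 or 3")


if __name__ == "__main__":
    print(equal_at_strength("résumé", "resume", 1))
    print(equal_at_strength("résumé", "resume", 2))
    print(equal_at_strength("Résumé", "résumé", 2))
```

At strength 1 "résumé" and "resume" compare equal; at strength 2 the accents make them differ, while "Résumé" and "résumé" still match until case is considered at strength 3.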
As a percentage of words in English, it is quite small, but there are still
plenty of homographs, such as:
BASS
BOW(S)
BUFFET
COAX
CLOSE
COMPOUND(S)
CONVERSE
DESERT
DIVERS
DOES
DOVE
ENTRANCE(S)
EXCISE
HARE
INTIMATE
INVALID
LAME
LEAD
LUGER(S)
MANES
MARE(S)
MINUTE
OBJECT(S)
PATENT
POLISH
PRESENT
PR
I disagree. What you want is a merged database field. See
http://www.macchiato.com/slides/icu_collation.ppt
Mark
—
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο
πάντα - Όμήρου Μαργίτῃ
[He knew many things, but knew them all badly. - Homer, Margites]
[http://www.macchiato.com]
- Original Message -
From: "Asmus Freytag" <[EMAIL PROTECTED]>
To: "David Galla
Asmus,
You are quite correct; that is why Unicode supports differing collation
strengths. Sometimes you only care about the actual letters without
diacritics. But even then letters are locale sensitive. For example, the
Danish alphabet starts with A and ends with A ring above (Å). A Dane
woul
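A small Python sketch of the Danish point: a hypothetical, simplified tailoring table that puts Æ, Ø and Å after Z, next to an English-style key that simply ignores the ring on Å. Real Danish collation has more rules than this; the table and the names `danish_key` and `english_key` are assumptions for illustration.

```python
import unicodedata

# Hypothetical, simplified tailoring: the Danish alphabet runs A-Z, then Æ, Ø, Å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
_RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(word):
    # NFC first so a decomposed "A" + U+030A still maps to the single letter "å".
    folded = unicodedata.normalize("NFC", word).casefold()
    # Characters outside the table sort after all letters (crude fallback).
    return [_RANK.get(ch, len(DANISH_ALPHABET)) for ch in folded]


def english_key(word):
    # English-style primary key: strip diacritics, so "Å" sorts as plain "A".
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    towns = ["Århus", "Odense", "Vejle", "Ballerup"]
    print(sorted(towns, key=english_key))  # Århus sorts first, as if it were "Arhus"
    print(sorted(towns, key=danish_key))   # Århus sorts last, after V
```

The same list of words gives two different, equally "correct" orders; which one is right depends on the reader's locale.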
Folks,
While many of us are focused on next week's Unicode conference in San Jose,
we are also very near the deadline for proposals for the next conference in
January, so please send in your good ideas.
Thank you,
Lisa
>>> Last Call for Papers! <<<
At 11:50 AM 9/7/01 -0500, Ayers, Mike wrote:
>Words with the
>same spelling and different pronunciation are uncommon but exist in English,
>the classic example being "read" and its own past tense.
Actually, this is a bit more common than you think, since the pronunciation
of vowels in English de
At 01:06 PM 9/7/01 -0400, David Gallardo wrote:
>As a practical matter, you need to take the diacritics into account when
>sorting, even in English where they (may or may not) have linguistic
>significance, otherwise you'll get nondeterministic behaviour. In other
>words, résumé and resume shoul
From: "David Gallardo" <[EMAIL PROTECTED]>
> As a practical matter, you need to take the diacritics into account when
> sorting, even in English where they (may or may not) have linguistic
> significance, otherwise you'll get nondeterministic behaviour. In other
> words, résumé and resume should
>>> Last Call for Papers! <<<
Twentieth International Unicode Conference (IUC20)
Unicode and the Web: The Global Connection
http://www.unicode.org/iuc/iuc20
January 28 - February 1, 2002
> From: David Gallardo [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 07, 2001 10:07 AM
> As a practical matter, you need to take the diacritics into account when
> sorting, even in English where they (may or may not) have linguistic
> significance, otherwise you'll get nondeterministi
> There is also no word pair separated only by the I/J
> distinction (in English), right?
iamb - as in iambic pentameter
jamb - as in a door jamb
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic behaviour. In other
words, résumé and resume should fall together, but always in the same order.
Someon
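One way to get "fall together, but always in the same order" is a multi-level sort key: compare base letters first, and use the diacritics (and then case) only to break ties. The Python sketch below is a rough approximation of that idea, not the UCA: its tie-break levels just compare NFD code points, so the accent and case ordering it produces is arbitrary but stable. `sort_key` is a hypothetical name.

```python
import unicodedata


def _base_letters(word):
    # Level 1 helper: decompose, then drop combining marks ("résumé" -> "resume").
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(word):
    """Hypothetical multi-level key: words that differ only in accents or case
    land next to each other, and ties are always broken the same way."""
    nfd = unicodedata.normalize("NFD", word)
    return (
        _base_letters(word).casefold(),  # level 1: base letters, no accents or case
        nfd.casefold(),                  # level 2: accents break ties (code point order)
        nfd,                             # level 3: case breaks ties ("R" < "r" here)
    )


if __name__ == "__main__":
    words = ["résumé", "rest", "Resume", "resume", "Résumé"]
    print(sorted(words, key=sort_key))
```

Whatever order the input arrives in, this yields rest, Resume, resume, Résumé, résumé. The unaccented forms come first only because U+0301 has a higher code point than any ASCII letter; a real collator would use proper secondary weights instead.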
> From: J M Sykes [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 07, 2001 07:50 AM
> The classic example is 'resume' and 'résumé'. These are, by
> now, two quite distinct words, and the fact that there is no
> 'established' order is shown
I spell both "resume" and have never
じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of girl for you,
I could never be your woman
- White Town
>
>Who'd be a lexicographer?
私？ (Me?)
>
>Mike.
>
>**
There is also no word pair separated only by the I/J distinction (in English), right?
じゅういっちゃん (Juuitchan)
Well, I guess what you say is true,
I could never be the right kind of girl for you,
I could never be your woman
- White Town
I know of no word pair in a
>
> I believe that there is an established sort order in English, which
> is to sort without regard to diacritics, or else we'd never find the
> words!
> In English (American English more than British English), diacritics are
> considered optional, and it is common to see "naïve" written "naive", "S
On Thu, 6 Sep 2001, Ayers, Mike wrote:
>
> > From: David Starner [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, September 06, 2001 01:40 PM
>
> > On Thu, Sep 06, 2001 at 04:03:07PM +0200, Thierry Sourbier wrote:
> > > The only little thing to know about French and diacritical
> > > mark is that
>I would say it is a variant of "o"; we just call it... "o with a circumflex
>accent" ("o avec un accent circonflexe"). The difference between "o" and "ô"
>is normally audible (for a French speaker). The relationship is the same
>as with any other letter that sometimes has accents (e.g. "a" an