I think the key phrase is "user-perceived". And you don't need to involve 
complex scripts either.

For instance, as an English-speaking person, I would perceive the "æ" in 
"encyclopædia" as being two characters (albeit shoved together somewhat). The 
argument for this is that the word can equally well be rendered as 
"encyclopaedia".

A Danish or Norwegian speaker, on the other hand, would perceive "æ" (as in 
"ære" or "æsj!") as being a single indivisible character.
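To make the contrast concrete, here is a minimal Python sketch (my own illustration, not something from the standard) showing that at the encoding level "æ" is one code point with no decomposition, whichever way a reader perceives it. The variable names are just for illustration.

```python
import unicodedata

ae = "\u00e6"  # the letter "æ" discussed above

# One encoded character, regardless of the reader's language.
code_point_count = len(ae)

# Its formal Unicode character name.
char_name = unicodedata.name(ae)

# Normalization does not split it into "a" + "e": U+00E6 has no
# canonical or compatibility decomposition.
unchanged_by_nfkd = unicodedata.normalize("NFKD", ae) == ae

print(code_point_count, char_name, unchanged_by_nfkd)
```

So the two-characters-versus-one perception lives entirely with the reader; nothing in the encoded text distinguishes the English case from the Danish or Norwegian one.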

Mark Dalley

-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Janusz S. Bien
Sent: 19 September 2016 07:40
To: Christoph Päper
Cc: unicode Unicode Discussion
Subject: graphemes (was: "textels")

On Sun, Sep 18 2016 at 21:40 CEST, christoph.pae...@crissov.de writes:
> Janusz S. Bien <jsb...@mimuw.edu.pl>:
>> 
>> From the Unicode glossary:
>> 
>>> Grapheme. (1) A minimally distinctive unit of writing in the context of a 
>>> particular writing system.[...] (2) What a user thinks of as a character.
>> 
>>> User-Perceived Character. What everyone thinks of as a character in their 
>>> script.
>> 
>> […] the definitions are language/locale dependent.
>
> A writing system is (usually) language-dependent, a script is not, 
> although some scripts have been used exclusively (or prominently) in a 
> single writing system with a single language.

It depends, of course, on what exactly you mean by script, and which meaning of 
the term is intended in the definition of User-Perceived Character. But "a user" is 
definitely language/locale dependent :-)

> So definition (1) of ‘grapheme’ would be appropriate for linguistics,
> (2) maybe for typography and computer science, but it’s extremely 
> vague.

I think that 'grapheme' (2) in the present wording is simply incorrect. I 
suspect it is not used in the standard at all.

Searching the Unicode site I found only one use of 'grapheme' alone:

http://www.unicode.org/L2/L2000/00274-N2236-grapheme-joiner.htm

        Graphemes are sequences of one or more encoded characters that
        correspond to what users think of as characters.
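As a rough illustration of "sequences of one or more encoded characters", the following Python sketch groups each base character with the combining marks that follow it. This is a deliberate simplification and an assumption on my part: it is not the segmentation algorithm the standard specifies, and the helper name simple_clusters is hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Group each character with any combining marks that follow it.

    Simplified sketch only: it handles nonzero combining classes and
    ignores everything else real segmentation rules must cover.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the preceding cluster.
            clusters[-1] += ch
        else:
            # Any other character starts a new cluster.
            clusters.append(ch)
    return clusters

# "résumé" written with explicit combining acute accents (U+0301).
word = "re\u0301sume\u0301"
clusters = simple_clusters(word)

print(len(word))      # number of encoded characters
print(len(clusters))  # number of clusters a reader would likely count
```

Here the string holds eight encoded characters but only six clusters, which is closer to what a reader would call its characters. Note that no rule of this kind can settle the "æ" question, since that depends on the reader's language.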

I guess the intention of 'grapheme' (2) was to describe it without any 
reference to computer encoding, which is definitely an extremely difficult task.

Best regards

Janusz


-- 
                           ,   
Prof. dr hab. Janusz S. Bien -  Uniwersytet Warszawski (Katedra Lingwistyki 
Formalnej) Prof. Janusz S. Bien - University of Warsaw (Formal Linguistics 
Department) jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, 
http://fleksem.klf.uw.edu.pl/~jsbien/


