Re: Mixed-Script confusables in prog.languages

Martin J. Dürst Mon, 05 Dec 2016 03:36:07 -0800


On 2016/12/05 17:31, Reini Urban wrote:

ψ_S contains Greek U+03C8, Common and Latin. Since Latin and Common are
always allowed, the only new script is Greek. The first non-default
script is automatically and silently allowed; only a mix with another
non-default script, such as Cyrillic, would raise an error or need an
explicit declaration.

So ψ_S alone is fine, if everything else is Greek.
But mixing with the Cyrillic version would lead to an error.

Allowing mixing of Greek and Latin (or Cyrillic and Latin) would be a
big problem. As an example, it would allow A_Α (the second letter is a
Greek one).
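
To make the rule under discussion concrete, here is a minimal Python sketch of it, not the actual implementation. Everything in it is an assumption for illustration: the names guess_script, non_default_scripts and check_scope are hypothetical, and because Python's standard library does not expose the Unicode Script property, the script is only guessed from the character name. A real checker would read Scripts.txt from the UCD.

```python
import unicodedata

# Assumption: the script is guessed from the character name, because the
# standard library has no Script property lookup. Rough stand-in only.
KNOWN_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC", "HANGUL", "HIRAGANA",
                 "MALAYALAM", "ETHIOPIC")


def guess_script(ch):
    """Return a guessed script name, or "COMMON" if none is recognized."""
    words = unicodedata.name(ch, "").split()
    for script in KNOWN_SCRIPTS:
        if script in words:
            return script
    return "COMMON"


def non_default_scripts(identifier):
    """Scripts used in the identifier, ignoring Latin and Common."""
    return {guess_script(ch) for ch in identifier} - {"LATIN", "COMMON"}


def check_scope(identifiers):
    """Apply the rule as described: the first non-default script is
    silently allowed, any further non-default script raises an error.
    Returns the implicitly allowed script, or None if there is none."""
    allowed = None
    for ident in identifiers:
        for script in sorted(non_default_scripts(ident)):
            if allowed is None:
                allowed = script
            elif script != allowed:
                raise ValueError(f"{ident!r} mixes {script} with {allowed}")
    return allowed


# U+03C8 is Greek psi, U+043A is Cyrillic ka, U+0391 is Greek Alpha.
check_scope(["\u03c8_S"])              # accepted, Greek becomes the allowed script
check_scope(["A_\u0391"])              # also accepted: Latin A next to Greek Alpha
try:
    check_scope(["\u03c8_S", "\u043a"])  # Greek, then Cyrillic
except ValueError as err:
    print("rejected:", err)
```

Under this sketch the Latin/Greek lookalike pair in A_Α is accepted, because Latin is treated as always allowed. That is exactly the case that worries me.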

Amharic is not defined as a UCD script property. Its alphabet is called
Ge’ez, which we call Ethiopic in the UCD. But that’s all I know. I’m not
a domain expert. Does Ethiopic use other Semitic scripts in its
alphabet or is it complete?

It's complete. I have never heard that it would need Arabic or Hebrew or
some such.

How about the many Indian scripts? Do they mix?
Being an Indian movie expert tells me that Indian languages usually
don’t mix. They make Tamil and Bengali versions of Hindi movies, and
usually fall back to English to get common points across the barrier.
But their scripts? No idea.

I don't think they mix two different scripts in the same word. Would be
very confusing.

In the examples in perl, which partially came from parrot, there’s a
wild, eclectic mix of various scripts that make no sense at all. So I
don’t know if I can trust those tests, that they make sense and are
readable at all. My guess is that the authors just liked code golfing
and picked random Unicode characters. It’s from perl after all.

Such as this perl test t/mro/isa_c3_utf8.t

use utf8 qw( Hangul Cyrillic Ethiopic Canadian_Aboriginal Malayalam Hiragana );

...
package 캎oẃ;
package urḲḵｋ;
@urḲḵｋ::ISA = 'kഌoんḰ';
package к;
@urḲḵｋ::ISA = ('kഌoんḰ', '캎oẃ');
package ṭ화ckэ;
...

These identifiers are unreadable, because I don’t assume that anybody
will be able to understand Hangul, Cyrillic, Ethiopic,
Canadian_Aboriginal, Malayalam and Hiragana at once.
I understand a bit of Hangul, Cyrillic and Hiragana, but the mix sounds
highly illegal to me.

The mixes aren't illegal, in that they are not against any law. But they
are completely unintelligible garbage anyway.


Regards,   Martin.
