Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

Asmus Freytag via Unicode Mon, 28 May 2018 21:45:52 -0700

One of the general principles is that combining marks inherit the property of their base character.


Normally, "inherited" should be the only property value for combining marks.

There have been some deviations from this over the years, for various reasons, and there are some properties (such as general category) where it is necessary to recognize the character as combining, but the general principle still holds.

Therefore, if you are trying to see whether a string is alphabetic, combining marks should be "transparent" to such an algorithm.
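A minimal Java sketch of that principle, assuming that general category Mn, Mc or Me is what identifies a code point as a combining mark. The class and method names (`TransparentMarks`, `isAlphabeticString`) are hypothetical, not part of any library. Only base characters are tested with `Character.isAlphabetic`; each combining mark is accepted as long as it follows an (alphabetic) base, so it effectively inherits the base's property.

```java
// Sketch only: combining marks are "transparent" when deciding whether a
// string is alphabetic. Names here are hypothetical, for illustration.
public final class TransparentMarks {

    // True for general categories Mn, Mc and Me (combining marks).
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Every base character must be Alphabetic; every combining mark must
    // follow a base (non-alphabetic bases already cause an early false).
    static boolean isAlphabeticString(String s) {
        boolean sawBase = false;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (isCombiningMark(cp)) {
                // A mark with no preceding base has nothing to inherit from.
                if (!sawBase) {
                    return false;
                }
            } else if (Character.isAlphabetic(cp)) {
                sawBase = true;
            } else {
                return false;
            }
            i += Character.charCount(cp);
        }
        return sawBase;
    }

    public static void main(String[] args) {
        System.out.println(isAlphabeticString("\u0B95\u0BCD"));  // true: pulli follows KA
        System.out.println(isAlphabeticString("\u0BCD"));        // false: no base character
        System.out.println(isAlphabeticString("\u0B95\u0BCD1")); // false: '1' is not Alphabetic
    }
}
```

Rejecting a mark with no preceding base is one design choice; code that segments text into grapheme clusters may treat such defective combining sequences differently.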


A./


On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote:

Hello Sundar,

On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
Hi,

In languages like Ruby or Java
(https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
functions to check if a character is alphabetic do that by looking for
the 'Alphabetic' property (defined true if it's in one of the L
categories, or Nl, or has 'Other_Alphabetic' property). When parsing
Tamil text, this works out well for independent vowels and consonants
(which are in Lo), and for most dependent signs (which are in Mc or Mn
but have the 'Other_Alphabetic' property), but the very common pulli (VIRAMA)
is neither in Lo nor has 'Other_Alphabetic', and so leads to
concluding any string containing it to be non-alphabetic.
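To reproduce the behavior described above, a short Java sketch that applies `Character.isAlphabetic` to individual Tamil code points and to whole strings; the class name `TamilAlphabeticCheck` and the helper `allAlphabetic` are hypothetical. With the Unicode property data discussed in this thread, KA (U+0B95, Lo) and VOWEL SIGN AA (U+0BBE, Mc with Other_Alphabetic) pass, while TAMIL SIGN VIRAMA (U+0BCD, Mn without Other_Alphabetic) does not, so a syllable such as க் fails the naive per-code-point check.

```java
// Sketch only: a naive "is this string alphabetic?" check built directly
// on the Unicode Alphabetic property, as exposed by Character.isAlphabetic.
public final class TamilAlphabeticCheck {

    // True if the string is non-empty and every code point is Alphabetic.
    static boolean allAlphabetic(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isAlphabetic);
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0B95, 0x0BBE, 0x0BCD};
        String[] labels = {
            "TAMIL LETTER KA (Lo)",
            "TAMIL VOWEL SIGN AA (Mc)",
            "TAMIL SIGN VIRAMA (Mn)"
        };
        // Print the Alphabetic property of each individual code point.
        for (int k = 0; k < codePoints.length; k++) {
            System.out.printf("U+%04X %-26s isAlphabetic=%b%n",
                    codePoints[k], labels[k], Character.isAlphabetic(codePoints[k]));
        }
        System.out.println(allAlphabetic("\u0B95\u0BBE")); // KA + AA: true
        System.out.println(allAlphabetic("\u0B95\u0BCD")); // KA + pulli: false
    }
}
```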

This doesn't make sense to me, since the Virama “◌்” is as much of an
alphabetic character as any of the "Dependent Vowel" characters which
have been given the 'Other_Alphabetic' property. Is there a rationale
behind this difference, or is it an oversight to be corrected?

I suggest submitting an error report via https://www.unicode.org/reporting.html. I haven't studied the issue in detail (sorry, just no time this week), but it sounds reasonable to give the VIRAMA the 'Other_Alphabetic' property.
I'd recommend mentioning examples other than Tamil in your report (assuming they exist).
BTW, what's the method you are using in Ruby? If there's a problem in Ruby (which I don't think; it's just using Unicode data), then please make a bug report at https://bugs.ruby-lang.org/projects/ruby-trunk, I should be able to follow up on that.
Regards,
Martin.
