Re: Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

Ken Whistler via Unicode Mon, 28 May 2018 22:04:43 -0700



On 5/28/2018 9:44 PM, Asmus Freytag via Unicode wrote:

One of the general principles is that combining marks inherit the property of their base character.
Normally, "inherited" should be the only property value for combining marks.
There have been some deviations from this over the years, for various reasons, and there are some properties (such as general category) where it is necessary to recognize the character as combining, but the general principle still holds.
Therefore, if you are trying to see whether a string is alphabetic, combining marks should be "transparent" to such an algorithm.

Generally, good advice. But there are clear exceptions. For example, the enclosing combining marks for symbols are intended (basically) to make symbols of a sort. And many combining marks have explicit script assignments, so they cannot simply willy-nilly inherit the script of a base letter if they are misapplied, for example.

This is why I recommend simply adding the Diacritic property into the mix for testing a string. That is a closer approximation to the kind of naive "Is this string alphabetic?" question that SunaraRaman was asking about -- it picks up the correct subset of combining marks to union with the set of actual isAlphabetic characters, to produce more expected results. (Including, of course, the correct classification of all the viramas, stackers, and killers, as well as picking up all the nuktas.)
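
A minimal sketch of that test in Python, assuming the Alphabetic and Diacritic sets have already been loaded as sets of code points. The function name and the tiny hand-picked sets are hypothetical, for illustration only; real code would fill them from the UCD data files.

```python
def looks_alphabetic(text, alphabetic, diacritic):
    """Return True if every code point in text is Alphabetic or Diacritic."""
    allowed = alphabetic | diacritic
    # An empty string is treated as not alphabetic (an assumption of this sketch).
    return bool(text) and all(ord(ch) in allowed for ch in text)


# Hand-picked illustrative sets, not the full UCD data.
ALPHABETIC = {0x0B95}  # TAMIL LETTER KA
DIACRITIC = {0x0BCD}   # TAMIL SIGN VIRAMA (pulli)

word = "\u0B95\u0BCD"  # KA followed by pulli

# Alphabetic alone: the pulli is not in the set.
print(looks_alphabetic(word, ALPHABETIC, set()))
# Alphabetic union Diacritic: the pulli is now accepted.
print(looks_alphabetic(word, ALPHABETIC, DIACRITIC))
```

With these sets the first call rejects the string because of the pulli, and the second accepts it.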

Folks, please examine the sets of characters for Diacritic and for Extender in:


http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt

to see what I'm talking about. The stuff you are looking for is already there.
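
For anyone who wants to inspect those sets programmatically, here is a rough sketch of a parser for the `code ; Property # comment` line format used by PropList.txt. The function name is hypothetical, and the sample lines are hand-written in the file's format for illustration rather than copied from it; the real file should be fetched from the URL above.

```python
import re
from collections import defaultdict

# One code point or a range "XXXX..YYYY", then ";", then the property name.
_LINE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_ucd_properties(text):
    """Map each property name to the set of code points listed for it."""
    props = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = _LINE.match(line)
        if not match:
            continue  # blank, comment-only, or unrecognized line
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        props[match.group(3)].update(range(start, end + 1))
    return props


# Illustrative sample in the PropList.txt line format (not the full data).
SAMPLE = """\
# sample lines
0300..034E    ; Diacritic # Mn  [79] COMBINING GRAVE ACCENT..COMBINING UPWARDS ARROW BELOW
093C          ; Diacritic # Mn       DEVANAGARI SIGN NUKTA
0BCD          ; Diacritic # Mn       TAMIL SIGN VIRAMA
0640          ; Extender # Lm       ARABIC TATWEEL
"""

props = parse_ucd_properties(SAMPLE)
print(0x0BCD in props["Diacritic"])
print(sorted(props))
```

The resulting `props["Diacritic"]` set can then be unioned with an Alphabetic set (which lives in DerivedCoreProperties.txt, in the same line format) for the string test described earlier.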


--Ken

P.S. And please don't start an argument about the fact that a "virama" isn't really a "diacritic". We know that, too. ;-)
