On 03/29/2015 03:41 AM, Andrew West wrote:
On 28 March 2015 at 20:05, Karl Williamson <[email protected]> wrote:

Existing software that looks at the numeric values of characters is written
expecting that rational numbers will have been reduced to their lowest form.

That seems to be a rather rash statement. I have software (BabelPad)
which parses the numeric values of characters for numeric sorting
purposes, and it parses "6/12" for MEROITIC CURSIVE FRACTION SIX
TWELFTHS as 0.5. Personally I find it hard to imagine how you could
write software that accepts "6/12" as input and is unable to come up
with the answer of a half.

The statement is not rash, as it is simply a statement of objective fact. I am the maintainer of software that fails with beta 8.0 due to this change. And it has nothing to do with not being able to do arithmetic division; your assumption was wrong.

The software essentially creates a database of Unicode properties for regular expression pattern matching, so that someone can say

  /\p{Numeric_Value=0.5}/

and quickly determine if the matched string contains a code point with that characteristic. Because the database is copied as-is to many different computers with different word sizes and different floating point implementations, it can't do the division ahead of time because of the inherent fuzziness of floating point numbers. It solves this the same way Unicode has, by leaving rational numbers in their original precisely specified format. Thus it creates a table for the property-value combination of Numeric_Value and 1/2, taking the UCD value as-is.
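To illustrate the floating point concern, here is a minimal sketch in Python (not the software in question; `to_single` is a hypothetical helper invented for this example). It shows that a quotient such as 1/3 depends on the width of the float it is stored in, whereas the rational form compares exactly.

```python
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round a double-precision float to single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

third_double = 1 / 3
third_single = to_single(third_double)

# The same quotient differs depending on the float width it was stored in.
widths_agree = third_single == third_double

# Kept as a rational, the value is exact and independent of the platform.
rationals_agree = Fraction("1/3") == Fraction(1, 3)

# 0.5 is exactly representable in binary, so it happens to survive the round trip.
half_survives = to_single(0.5) == 0.5
```

A value like 0.5 survives by luck of binary representation; 1/3 does not, which is why a table keyed on the precomputed quotient cannot be copied safely between machines.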

Prior to beta 8.0, the UCD came with all fractions already reduced. It would not occur to someone with a mainly mathematical or computer science background that the input data would come otherwise, since the mathematical convention is to specify fractions in lowest terms, even though Unicode does not promise this. So of course there is no code to handle the new case. The code thus creates a second table, for the property-value combination of Numeric_Value and 6/12, which causes problems.

It's a small matter to add code to reduce the UCD-specified rational numbers, but it's one more complication to deal with on top of the many that the UCD already presents. If there is no good reason for the data for these new characters to be specified contrary to mathematical convention, then the data should be changed, rather than code having to work around it.
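As a rough sketch of the kind of reduction meant here, again in Python and with hypothetical names (`build_tables`, `reduce_fractions`, and the three-entry sample are made up for illustration, and are not the real code or the full UCD):

```python
from fractions import Fraction

# Illustrative sample only: character names mapped to Numeric_Value strings.
sample = {
    "VULGAR FRACTION ONE HALF": "1/2",
    "MEROITIC CURSIVE FRACTION SIX TWELFTHS": "6/12",
    "VULGAR FRACTION ZERO THIRDS": "0",
}

def build_tables(values, reduce_fractions=False):
    """Group character names into one table per Numeric_Value key."""
    tables = {}
    for name, raw in values.items():
        # Fraction() normalizes to lowest terms; str() gives "1/2" or "0".
        key = str(Fraction(raw)) if reduce_fractions else raw
        tables.setdefault(key, []).append(name)
    return tables

as_is = build_tables(sample)                            # keys: "1/2", "6/12", "0"
reduced = build_tables(sample, reduce_fractions=True)   # keys: "1/2", "0"
```

Taking the values as-is yields separate tables for 1/2 and 6/12; reducing first puts both characters in the single 1/2 table.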

I would say that fractions should not be reduced to their lowest form
in the Unicode data as some people may need to order fractions by
numerator or denominator, and reducing to lowest form could break the
expectations of some software.  Having said that, I note that the
numeric value of one character has been reduced in the Unicode data:
U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of "0"
rather than "0/3".

So there is some precedent for reducing.



Andrew
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode

