Re: Meroitic cursive fractions numerical values
On Sunday, March 29, 2015, Andrew West <andrewcw...@gmail.com> wrote:
> Having said that, I note that the numeric value of one character has been reduced in the Unicode data: U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0 rather than 0/3.

Could that be because it's intended less as an actual fraction than as a shorthand for "0 out of 3" (for outs and strikes in baseball)?

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
Re: Meroitic cursive fractions numerical values
On 03/31/2015 11:30 AM, Doug Ewell wrote:
> Karl Williamson <public at khwilliamson dot com> wrote:
>
>> It's a small matter to add code to reduce the UCD-specified rational numbers, but it's just one more complication to have to deal with along with the many that the UCD already presents, and if there is not a good reason the data for these new characters is specified contrary to mathematical convention, then the data should be changed instead of having to code around it.
>
> UAX #44, Section 5.9.1 says:
>
> | For all numeric properties, and for properties such as
> | Unicode_Radical_Stroke which are constructed from combinations of
> | numeric values, use loose matching rule UAX44-LM1 when comparing
> | property values.
> |
> | UAX44-LM1. Apply numeric equivalences.
> | • 01.00 is equivalent to 1.
> | • 1.67 in the UCD is a repeating fraction, and equivalent to
> |   10/6 or 5/3.
>
> This strongly suggests that the implementation should be changed, not to match the data, but to match the specification.

Ok. I've made the change.

Is it a problem that DerivedNumericValues.txt doesn't match UnicodeData.txt in this regard? (That is, the derived file comes with irreducible rationals.)

> --
> Doug Ewell | http://ewellic.org | Thornton, CO
RE: Meroitic cursive fractions numerical values
Karl Williamson <public at khwilliamson dot com> wrote:

> Is it a problem that DerivedNumericValues.txt doesn't match UnicodeData.txt in this regard? (That is, the derived file comes with irreducible rationals)

Not if you follow the loose matching rule UAX44-LM1 (see earlier message), which says that mathematically equivalent (or nearly so) numbers should be treated as equivalent.

UnicodeData-8.0.0d8.txt:
109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;

DerivedNumericValues-8.0.0d10.txt:
109FB ; 0.5 ; ; 1/2 # No MEROITIC CURSIVE FRACTION SIX TWELFTHS

--
Doug Ewell | http://ewellic.org | Thornton, CO
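A minimal sketch of the comparison Doug describes, in Python with fractions.Fraction (an assumption; any exact rational type would do): it reads the Numeric_Value field of a UnicodeData.txt line and the rational field of a DerivedNumericValues.txt line, and compares them as numbers rather than as strings. The helper names are hypothetical, and the UnicodeData.txt sample writes out all of its empty fields.

```python
from fractions import Fraction
from typing import Optional


def numeric_from_unicodedata(line: str) -> Optional[Fraction]:
    """Numeric_Value (field 8) of a UnicodeData.txt line, or None if empty."""
    value = line.split(";")[8]
    # Fraction parses integers ("7"), fractions ("6/12") and decimals alike.
    return Fraction(value) if value else None


def numeric_from_derived(line: str) -> Fraction:
    """Rational form (field 3) of a DerivedNumericValues.txt line."""
    data = line.split("#", 1)[0]  # drop the trailing comment
    fields = [field.strip() for field in data.split(";")]
    return Fraction(fields[3])


ucd_line = "109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;"
derived_line = "109FB ; 0.5 ; ; 1/2 # No MEROITIC CURSIVE FRACTION SIX TWELFTHS"

ucd_value = numeric_from_unicodedata(ucd_line)
derived_value = numeric_from_derived(derived_line)
print(ucd_value, derived_value, ucd_value == derived_value)  # 1/2 1/2 True
```

Compared as strings, "6/12" and "1/2" differ; compared as exact rationals, the two files agree.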
Re: Meroitic cursive fractions numerical values
Karl Williamson <public at khwilliamson dot com> wrote:

> It's a small matter to add code to reduce the UCD-specified rational numbers, but it's just one more complication to have to deal with along with the many that the UCD already presents, and if there is not a good reason the data for these new characters is specified contrary to mathematical convention, then the data should be changed instead of having to code around it.

UAX #44, Section 5.9.1 says:

| For all numeric properties, and for properties such as
| Unicode_Radical_Stroke which are constructed from combinations of
| numeric values, use loose matching rule UAX44-LM1 when comparing
| property values.
|
| UAX44-LM1. Apply numeric equivalences.
| • 01.00 is equivalent to 1.
| • 1.67 in the UCD is a repeating fraction, and equivalent to
|   10/6 or 5/3.

This strongly suggests that the implementation should be changed, not to match the data, but to match the specification.

--
Doug Ewell | http://ewellic.org | Thornton, CO
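UAX44-LM1 is stated through examples, so an implementation has to choose a concrete policy. A minimal sketch in Python, assuming fractions.Fraction for exact values and one possible reading of the rule: two values match if they are exactly equal as rationals, or if a decimal written with n places (such as 1.67) equals the other value rounded to n places. The rounding policy and the name loose_numeric_equal are assumptions for illustration, not wording from UAX #44.

```python
from fractions import Fraction
from typing import Optional


def _decimal_places(text: str) -> Optional[int]:
    """Digits after the decimal point, or None if the value is not a decimal."""
    text = text.strip()
    if "." not in text:
        return None
    return len(text.split(".", 1)[1])


def loose_numeric_equal(a: str, b: str) -> bool:
    """Compare two numeric property values in the spirit of UAX44-LM1."""
    x, y = Fraction(a.strip()), Fraction(b.strip())
    if x == y:
        return True  # exact: "01.00" vs "1", "10/6" vs "5/3", "6/12" vs "1/2"
    places = [p for p in (_decimal_places(a), _decimal_places(b)) if p is not None]
    if not places:
        return False  # integers and fractions only match exactly
    digits = min(places)  # compare at the precision of the coarser decimal
    return round(x, digits) == round(y, digits)


pairs = [("01.00", "1"), ("1.67", "5/3"), ("1.67", "10/6"), ("6/12", "1/2"), ("1.67", "1.66")]
for a, b in pairs:
    print(a, b, loose_numeric_equal(a, b))
# True, True, True, True, False
```

Python's round() on a Fraction rounds exact halves to even, so an implementation that cares about ties may want a different rounding rule.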
Re: Meroitic cursive fractions numerical values
On 03/29/2015 03:41 AM, Andrew West wrote:
> On 28 March 2015 at 20:05, Karl Williamson <pub...@khwilliamson.com> wrote:
>>
>> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad) which parses the numeric values of characters for numeric sorting purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX TWELFTHS as 0.5. Personally I find it hard to imagine how you could write software that accepts 6/12 as input and is unable to come up with the answer of a half.

The statement is not rash, as it is simply a statement of objective fact. I am the maintainer of software that fails with the 8.0 beta due to this change. And it has nothing to do with not being able to do arithmetic division; your assumption was wrong.

The software essentially creates a database of Unicode properties for regular expression pattern matching, so that someone can say /\p{Numeric_Value=0.5}/ and quickly determine whether the matched string contains a code point with that characteristic. Because the database is copied as-is to many different computers with different word sizes and different floating-point implementations, it can't do the division ahead of time, because of the inherent fuzziness of floating-point numbers. It solves this the same way Unicode has, by leaving rational numbers in their original, precisely specified format. Thus it creates a table for the property-value combination of Numeric_Value and 1/2, taking the UCD value as-is.

Prior to the 8.0 beta, the UCD came with all fractions already reduced. It would not occur to someone with a mainly mathematical or computer science background that the input data would come otherwise, as the mathematical convention is to specify fractions in irreducible terms (even though this isn't promised by Unicode), so of course there is no code to handle the new case. The code thus creates a second table for the property-value combination of Numeric_Value and 6/12, which causes problems.

It's a small matter to add code to reduce the UCD-specified rational numbers, but it's just one more complication to have to deal with along with the many that the UCD already presents, and if there is not a good reason the data for these new characters is specified contrary to mathematical convention, then the data should be changed instead of having to code around it.

> I would say that fractions should not be reduced to their lowest form in the Unicode data, as some people may need to order fractions by numerator or denominator, and reducing to lowest form could break the expectations of some software.
>
> Having said that, I note that the numeric value of one character has been reduced in the Unicode data: U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0 rather than 0/3.

So there is some precedent for reducing.

> Andrew
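The fix Karl describes amounts to canonicalizing each value before using it as a lookup key. A minimal sketch in Python, assuming a hypothetical in-memory index that maps a Numeric_Value string to the set of code points having it (the actual software is not shown in this thread): reducing with fractions.Fraction keeps every key an exact string, so no floating-point division is ever stored, yet 6/12 and 1/2 land in the same table.

```python
from collections import defaultdict
from fractions import Fraction


def canonical_key(value: str) -> str:
    """Reduce a Numeric_Value string to a canonical exact form.

    '6/12' -> '1/2', '0/3' -> '0', and a user query such as '0.5' -> '1/2'.
    The key stays a string, so no floating-point division is ever stored.
    """
    return str(Fraction(value))


def build_index(entries):
    """Map each canonical Numeric_Value to the set of code points having it."""
    index = defaultdict(set)
    for code_point, value in entries:
        index[canonical_key(value)].add(code_point)
    return index


# Values as given in the 8.0 beta data discussed in this thread.
entries = [(0x109BD, "1/2"), (0x109FB, "6/12"), (0x2189, "0")]
index = build_index(entries)

# A \p{Numeric_Value=0.5} query resolves to the same single table.
matches = index.get(canonical_key("0.5"), set())
print(sorted(f"U+{cp:04X}" for cp in matches))  # ['U+109BD', 'U+109FB']
```

Exact decimal queries such as 0.5 canonicalize cleanly; a rounded decimal such as 1.67 would become 167/100, so it would still need the loose matching of UAX44-LM1 to find 5/3.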
Re: Meroitic cursive fractions numerical values
On 28 March 2015 at 20:05, Karl Williamson <pub...@khwilliamson.com> wrote:
> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.

That seems to be a rather rash statement. I have software (BabelPad) which parses the numeric values of characters for numeric sorting purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX TWELFTHS as 0.5. Personally I find it hard to imagine how you could write software that accepts 6/12 as input and is unable to come up with the answer of a half.

I would say that fractions should not be reduced to their lowest form in the Unicode data, as some people may need to order fractions by numerator or denominator, and reducing to lowest form could break the expectations of some software.

Having said that, I note that the numeric value of one character has been reduced in the Unicode data: U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0 rather than 0/3.

Andrew
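Andrew mentions two uses: numeric sorting, and ordering by numerator or denominator. A short sketch of both in Python, assuming fractions.Fraction and a hand-written three-character sample (the tuple layout and the written_parts helper are hypothetical). Numeric sorting gives the same result whether or not the data is reduced, but ordering by the written denominator only works if the original numerator and denominator are kept. Because U+2189 is recorded with the value 0, its written denominator is 1, not 3.

```python
from fractions import Fraction
from typing import Tuple

# (code point, Numeric_Value exactly as written in the data file)
chars = [
    (0x109FB, "6/12"),  # MEROITIC CURSIVE FRACTION SIX TWELFTHS (8.0 beta)
    (0x109BD, "1/2"),   # MEROITIC CURSIVE FRACTION ONE HALF
    (0x2189, "0"),      # VULGAR FRACTION ZERO THIRDS, already reduced from 0/3
]


def written_parts(value: str) -> Tuple[int, int]:
    """Numerator and denominator as written; an integer has denominator 1."""
    num, _, den = value.partition("/")
    return int(num), int(den or 1)


# Numeric order: exact rationals, so 6/12 and 1/2 compare as equal
# (sorted() is stable, so equal values keep their input order).
by_value = sorted(chars, key=lambda c: Fraction(c[1]))

# Order by written denominator: only meaningful if the data is left unreduced.
by_denominator = sorted(chars, key=lambda c: written_parts(c[1])[1])

print([f"U+{cp:04X}" for cp, _ in by_value])        # ['U+2189', 'U+109FB', 'U+109BD']
print([f"U+{cp:04X}" for cp, _ in by_denominator])  # ['U+2189', 'U+109BD', 'U+109FB']
```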
Re: Meroitic cursive fractions numerical values
How would you note the numeric value property of the mathematical pi symbol if you use a decimal notation like 0.5, assuming that it should be written as a single decimal value without using any operator? You can't, because there's an infinite number of decimals, unless you explicitly say that the numeric property is limited to the precision of an IEEE 64-bit double floating-point value (or the 80-bit long double supported natively by x86 processors).

So you have to imagine that the numeric value property is effectively a mathematical expression using some conventional set of mathematical symbols (in which case the numeric value property of the pi symbol should be the symbol itself). In that case, writing 6/12 or 1/2 is fully equivalent mathematically, as this property is a mathematical expression.

Now that property should have a defined syntax. The problem is that for complex expressions there are several mathematical notations, the most common in plain text being TeX (except that TeX notes not just the expression itself but also its presentation and layout).

Could Unicode define a basic plain-text syntax for a subset of mathematical expressions, useful for parsing the numeric value field? It would of course contain the syntax for numbers: all decimal digits from the various scripts, but ignoring localized conventions for decimal separators (reduced to just the ASCII dot) and for grouping separators (reduced to none), restricting unnecessary whitespace in that field, and limiting unnecessary leading zeroes or trailing zeroes in decimal parts. It would contain the subset of symbolic constants encoded in Unicode as symbolic constants (such as pi, e, i), but not any symbolic constant directly expressible with others. It could potentially contain superscript digits used for exponents.

And of course it would contain the common set of arithmetic operators (+, the ASCII HYPHEN-MINUS or the mathematical MINUS SIGN, × or the ASCII ASTERISK, / or ÷, ^ for exponentiation, and parentheses) and algebraic operators (such as √). It would not include special operators (such as ±) that can't be evaluated to a single number in a one-dimensional numeric field (so do we limit ourselves to the field of complex numbers?). Further extensions could include some common functions, such as the core trigonometric and hyperbolic functions (sine, cosine, tangent, cotangent) and their inverses, and logarithms.

That syntax would not specify whether expressions such as 0/0 can actually be evaluated (it's up to implementations to check this according to their own numeric domain), as the syntax does not specify the numeric domain (field or ring?) in which it will be evaluated (for example, 1/0 is valid in some rings where all members, including zero, are invertible). It would not assume that -1 is necessarily different from +1 (they are equivalent in Z/2Z, which contains just two members, 0 and 1, and where 2 or 4 also equal 0), nor assume the precision of numbers (1/100 could equal 0 in an integer domain).

This could be the basis for defining a basic set of expressions that many programming languages could support in their syntax, using whatever precision they want or can support (even if their native syntax uses other, similar notations with simple substitution rules).

For this reason, it seems more natural to avoid reducing fractions in the numeric value property and to keep them in their natural form: 6/12 NOT reduced to 1/2, and 0/3 NOT reduced to 0 (because this may incorrectly assume a subset of a linear numeric field). Let the implementation define its own numeric domain and whether these expressions can be evaluated in that domain: the parser will be the same; only the evaluator will differ, as it depends entirely on the numeric domain.
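The idea that "the parser will be the same, only the evaluator will be different" can be sketched in Python. This is an illustration under stated assumptions, not a proposed syntax: it borrows Python's own ast module as a stand-in parser and accepts only integer literals, unary minus, + - * / and parentheses; the domain classes Rationals, Floats and IntegersModP are hypothetical names. One parsed tree for 6/12 is evaluated in three domains.

```python
import ast
from fractions import Fraction


def parse(text: str) -> ast.expr:
    """Parse once; Python's expression parser stands in for the proposed syntax."""
    return ast.parse(text, mode="eval").body


def evaluate(node: ast.expr, domain):
    """Evaluate the same tree in whatever numeric domain is supplied."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return domain.from_int(node.value)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return domain.neg(evaluate(node.operand, domain))
    if isinstance(node, ast.BinOp):
        ops = {ast.Add: domain.add, ast.Sub: domain.sub,
               ast.Mult: domain.mul, ast.Div: domain.div}
        if type(node.op) in ops:
            left = evaluate(node.left, domain)
            right = evaluate(node.right, domain)
            return ops[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


class Rationals:
    """Exact rational arithmetic (the field Q)."""
    def from_int(self, n):
        return Fraction(n)

    def neg(self, a):
        return -a

    def add(self, a, b):
        return a + b

    def sub(self, a, b):
        return a - b

    def mul(self, a, b):
        return a * b

    def div(self, a, b):
        return a / b  # raises ZeroDivisionError for x/0


class Floats(Rationals):
    """IEEE 754 doubles (Python float); reuses the arithmetic, only literals change."""
    def from_int(self, n):
        return float(n)


class IntegersModP:
    """The finite field Z/pZ for a prime p; division multiplies by an inverse."""
    def __init__(self, p):
        self.p = p

    def from_int(self, n):
        return n % self.p

    def neg(self, a):
        return -a % self.p

    def add(self, a, b):
        return (a + b) % self.p

    def sub(self, a, b):
        return (a - b) % self.p

    def mul(self, a, b):
        return a * b % self.p

    def div(self, a, b):
        return a * pow(b, -1, self.p) % self.p  # ValueError if b is 0 mod p


six_twelfths, one_half = parse("6/12"), parse("1/2")
for domain in (Rationals(), Floats(), IntegersModP(7)):
    print(type(domain).__name__, evaluate(six_twelfths, domain), evaluate(one_half, domain))
# Rationals 1/2 1/2
# Floats 0.5 0.5
# IntegersModP 4 4
```

In Z/7Z both 6/12 and 1/2 evaluate to 4, so they stay equivalent; in Z/2Z both divisions raise ValueError because 12 and 2 are congruent to 0 and have no inverse, which is the kind of domain-dependent evaluability described above. (pow with a negative exponent and a modulus requires Python 3.8 or later.)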
2015-03-29 11:41 GMT+02:00 Andrew West <andrewcw...@gmail.com>:
> On 28 March 2015 at 20:05, Karl Williamson <pub...@khwilliamson.com> wrote:
>> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad) which parses the numeric values of characters for numeric sorting purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX TWELFTHS as 0.5. Personally I find it hard to imagine how you could write software that accepts 6/12 as input and is unable to come up with the answer of a half.
>
> I would say that fractions should not be reduced to their lowest form in the Unicode data, as some people may need to order fractions by numerator or denominator, and reducing to lowest form could break the expectations of some software.
>
> Having said that, I note that the numeric value of one character has been reduced in the Unicode data: U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0 rather than 0/3.
>
> Andrew
Re: Meroitic cursive fractions numerical values
Unless there is a value in documenting the value of the numerator and denominator, in which case this should be prominently explained in the documentation.

Or is that written down somewhere already?

A./

On 3/28/2015 1:05 PM, Karl Williamson wrote:
> In the 8.0 Beta files, some numerical values are not reduced to their lowest forms. Is there a compelling reason that
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
>
> is not written as
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;1/2;N;;;;;
>
> given that there is also a
>
> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;
>
> Aren't the numeric values of U+109FB and U+109BD the same?
>
> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.
Re: Meroitic cursive fractions numerical values
On 03/28/2015 02:25 PM, Asmus Freytag (t) wrote:
> Unless there is a value in documenting the value of the numerator and denominator, in which case this should be prominently explained in the documentation.

It seems to me that the character name provides sufficient documentation of the numerator and denominator.

> Or is that written down somewhere already?
>
> A./
>
> On 3/28/2015 1:05 PM, Karl Williamson wrote:
>> In the 8.0 Beta files, some numerical values are not reduced to their lowest forms. Is there a compelling reason that
>>
>> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
>>
>> is not written as
>>
>> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;1/2;N;;;;;
>>
>> given that there is also a
>>
>> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;
>>
>> Aren't the numeric values of U+109FB and U+109BD the same?
>>
>> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.
Re: Meroitic cursive fractions numerical values
On 3/28/2015 1:05 PM, Karl Williamson wrote:
> In the 8.0 Beta files, some numerical values are not reduced to their lowest forms. Is there a compelling reason that
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
>
> is not written as
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;1/2;N;;;;;

Well, obviously you might not consider it a compelling reason, but the numeric values were written that way in the original proposal (L2/12-206, June 6, 2012). Nobody said anything about rational numbers expressed as fractions being required to be in lowest form, and the entries were just carried forward into the drafts of Unicode 8.0 UnicodeData.txt for beta review.

> given that there is also a
>
> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;
>
> Aren't the numeric values of U+109FB and U+109BD the same?

Of course.

> Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.

Well, not all existing software, obviously, as the tools used to generate the derived data files didn't complain, and produced the correct results for these Meroitic fractions:

http://www.unicode.org/Public/8.0.0/ucd/extracted/DerivedNumericValues-8.0.0d10.txt

And there is nothing in the documentation of the Numeric_Value property (see UAX #44) that currently *requires* only an irreducible fraction (or an integer) in the field. (See also DerivedNumericValues.txt, which is silent on this.)

You can always provide beta feedback requesting that the relevant fractions be changed to their lowest forms, for review by the May UTC meeting. Personally, I wouldn't object to a change like that, as I don't see any particular didactic value in expressing the fractional values with precisely the same numerator and denominator as the character form implies, if it isn't mathematically necessary.

On the other hand, I would be loath to make this a mandatory *requirement* of the Numeric_Value field, as that would add yet another baroque invariant on the UCD data, and would imply yet more elaborate testing to verify for each release that a new invariant we imposed on ourselves was not somehow violated in the new data for the UCD. The set of invariants currently maintained is already bordering on impossible for any one participant in the data maintenance to understand.

The other drawback of piling on invariants is that the UTC has been bitten by them in the past when something new comes up that wasn't anticipated. This particular requirement might be innocuous and safe -- but why tempt the fates?

--Ken
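To give a sense of what one more invariant to test each release would involve, here is a minimal sketch of such a check in Python, assuming the semicolon-separated UnicodeData.txt layout with Numeric_Value in field 8. The function name is hypothetical, and nothing in UAX #44 requires this check; it only flags fractions that are not in lowest terms.

```python
from math import gcd
from typing import Iterable, List, Tuple


def unreduced_numeric_values(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (code point, value) for each Numeric_Value fraction not in lowest terms.

    Hypothetical release check. Assumes the UnicodeData.txt layout, where
    field 8 holds the Numeric_Value (empty, an integer, or a fraction).
    """
    problems = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.rstrip("\n").split(";")
        value = fields[8]
        if "/" in value:
            num, den = (int(part) for part in value.split("/"))
            if gcd(num, den) != 1:
                problems.append((fields[0], value))
    return problems


# The two Meroitic lines from the 8.0 beta data discussed in this thread.
sample = [
    "109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;",
    "109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;",
]
print(unreduced_numeric_values(sample))  # [('109FB', '6/12')]
```

Each such check is cheap on its own; the concern raised above is the accumulated cost of maintaining, and understanding, many of them across releases.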
Meroitic cursive fractions numerical values
In the 8.0 Beta files, some numerical values are not reduced to their lowest forms. Is there a compelling reason that

109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;

is not written as

109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;1/2;N;;;;;

given that there is also a

109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;

Aren't the numeric values of U+109FB and U+109BD the same?

Existing software that looks at the numeric values of characters is written expecting that rational numbers will have been reduced to their lowest form.