Re: Meroitic cursive fractions numerical values

2015-04-02 Thread Garth Wallace
On Sunday, March 29, 2015, Andrew West andrewcw...@gmail.com wrote:


> Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0
> rather than 0/3.


Could that be because it's intended less as an actual fraction than as a
shorthand for 0 out of 3 (for outs and strikes in baseball)?
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: Meroitic cursive fractions numerical values

2015-04-01 Thread Karl Williamson

On 03/31/2015 11:30 AM, Doug Ewell wrote:

> Karl Williamson public at khwilliamson dot com wrote:
>
>> It's a small matter to add code to reduce the UCD-specified rational
>> numbers, but it's just one more complication to have to deal with
>> along with the many that the UCD already presents, and if there is not
>> a good reason the data for these new characters is specified contrary
>> to mathematical convention, then the data should be changed instead of
>> having to code around it.
>
> UAX #44, Section 5.9.1 says:
>
> | For all numeric properties, and for properties such as
> | Unicode_Radical_Stroke which are constructed from combinations of
> | numeric values, use loose matching rule UAX44-LM1 when comparing
> | property values.
> |
> | UAX44-LM1. Apply numeric equivalences.
> | • 01.00 is equivalent to 1.
> | • 1.67 in the UCD is a repeating fraction, and equivalent to
> |   10/6 or 5/3.
>
> This strongly suggests that the implementation should be changed, not to
> match the data, but to match the specification.


Ok.  I've made the change.

Is it a problem that DerivedNumericValues.txt doesn't match 
UnicodeData.txt in this regard?  (That is, the derived file comes with 
irreducible rationals)







RE: Meroitic cursive fractions numerical values

2015-04-01 Thread Doug Ewell
Karl Williamson public at khwilliamson dot com wrote:

> Is it a problem that DerivedNumericValues.txt doesn't match
> UnicodeData.txt in this regard? (That is, the derived file comes with
> irreducible rationals)

Not if you follow the loose matching rule UAX44-LM1 (see earlier
message) which says that mathematically equivalent (or nearly so)
numbers should be treated as equivalent.

UnicodeData-8.0.0d8.txt:
109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R6/12;N;

DerivedNumericValues-8.0.0d10.txt:
109FB  ; 0.5 ; ; 1/2 # No MEROITIC CURSIVE FRACTION SIX TWELFTHS
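
As a rough illustration of the comparison above, here is a minimal Python sketch that reads the Numeric_Value from each file format and compares the two as exact rationals. It assumes the full UnicodeData.txt layout, in which empty fields are kept and Numeric_Value is the ninth semicolon-separated field; the helper names are hypothetical and are not taken from any tool mentioned in this thread.

```python
from fractions import Fraction

def numeric_value_from_unicode_data(line):
    """Return Numeric_Value from a UnicodeData.txt line (field index 8)."""
    fields = line.split(";")
    return Fraction(fields[8])

def numeric_value_from_derived(line):
    """Return the rational from a DerivedNumericValues.txt line (4th field)."""
    data = line.split("#", 1)[0]              # drop the trailing comment
    fields = [f.strip() for f in data.split(";")]
    return Fraction(fields[3])

# Assumed full-width form of the UnicodeData.txt entry (empty fields kept).
unicode_data_line = "109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;"
derived_line = "109FB  ; 0.5 ; ; 1/2 # No MEROITIC CURSIVE FRACTION SIX TWELFTHS"

# Fraction reduces on construction, so 6/12 and 1/2 compare equal.
same = (numeric_value_from_unicode_data(unicode_data_line)
        == numeric_value_from_derived(derived_line))
print(same)
```

Because both values are normalized to the same exact rational before comparison, the differing spellings in the two files do not matter.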

--
Doug Ewell | http://ewellic.org | Thornton, CO 



Re: Meroitic cursive fractions numerical values

2015-03-31 Thread Doug Ewell
Karl Williamson public at khwilliamson dot com wrote:

> It's a small matter to add code to reduce the UCD-specified rational
> numbers, but it's just one more complication to have to deal with
> along with the many that the UCD already presents, and if there is not
> a good reason the data for these new characters is specified contrary
> to mathematical convention, then the data should be changed instead of
> having to code around it.

UAX #44, Section 5.9.1 says:

| For all numeric properties, and for properties such as
| Unicode_Radical_Stroke which are constructed from combinations of
| numeric values, use loose matching rule UAX44-LM1 when comparing
| property values.
|
| UAX44-LM1. Apply numeric equivalences.
| • 01.00 is equivalent to 1.
| • 1.67 in the UCD is a repeating fraction, and equivalent to
|   10/6 or 5/3.

This strongly suggests that the implementation should be changed, not to
match the data, but to match the specification.
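
One possible way to apply the exact part of UAX44-LM1 is sketched below in Python: every value is turned into an exact rational before comparison. The function names are hypothetical. The sketch deliberately does not handle rounded decimals such as 1.67, which would need some tolerance rule that the quoted text does not spell out.

```python
from fractions import Fraction

def numeric_key(value):
    """Normalize a numeric string ('01.00', '1', '10/6') to an exact rational."""
    return Fraction(value.strip())

def loosely_equal(a, b):
    """Compare two numeric property values by mathematical value, not spelling."""
    return numeric_key(a) == numeric_key(b)

print(loosely_equal("01.00", "1"))    # same value, different spelling
print(loosely_equal("10/6", "5/3"))   # unreduced versus reduced
print(loosely_equal("6/12", "1/2"))   # the Meroitic case from this thread
```

With a key of this kind, an implementation never sees the difference between 6/12 and 1/2, whichever form the data files happen to use.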

--
Doug Ewell | http://ewellic.org | Thornton, CO 




Re: Meroitic cursive fractions numerical values

2015-03-30 Thread Karl Williamson

On 03/29/2015 03:41 AM, Andrew West wrote:

> On 28 March 2015 at 20:05, Karl Williamson pub...@khwilliamson.com wrote:
>
>> Existing software that looks at the numeric values of characters is written
>> expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad)
> which parses the numeric values of characters for numeric sorting
> purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX
> TWELFTHS as 0.5. Personally I find it hard to imagine how you could
> write software that accepts 6/12 as input and is unable to come up
> with the answer of a half.


The statement is not rash, as it is simply a statement of objective 
fact.  I am the maintainer of software that fails with beta 8.0 due to 
this change.  And it has nothing to do with not being able to do 
arithmetic division; your assumption was wrong.


The software essentially creates a database of Unicode properties for
regular expression pattern matching, so that someone can say


  /\p{Numeric_Value=0.5}/

and quickly determine if the matched string contains a code point with 
that characteristic.  Because the database is copied as-is to many 
different computers with different word sizes and different floating 
point implementations, it can't do the division ahead of time because of 
the inherent fuzziness of floating point numbers.  It solves this the 
same way Unicode has, by leaving rational numbers in their original 
precisely specified format.  Thus it creates a table for the 
property-value combination of Numeric_Value and 1/2, taking the UCD 
value as-is.


Prior to beta 8, the UCD came with all fractions already reduced.  It 
would not occur to someone with a mainly mathematical or computer 
science background that the input data would come otherwise, as the 
mathematical convention is to specify in irreducible terms, even though 
this isn't promised by Unicode, so of course there is no code to handle 
the new case.  The code thus creates a second table for the 
property-value combination of Numeric_Value and 6/12, which causes problems.
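
The failure mode described here can be reproduced with a small Python sketch. This is not the software discussed in this message; the table layout, the `build_tables` helper and the sample code points are assumptions made purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Sample Numeric_Value strings as they might be read from the UCD.
RAW_VALUES = {
    0x00BD: "1/2",    # VULGAR FRACTION ONE HALF
    0x109BD: "1/2",   # MEROITIC CURSIVE FRACTION ONE HALF
    0x109FB: "6/12",  # MEROITIC CURSIVE FRACTION SIX TWELFTHS
}

def build_tables(values, reduce_first):
    """Group code points by Numeric_Value, keyed by the rational's string form."""
    tables = defaultdict(set)
    for code_point, raw in values.items():
        # str(Fraction("6/12")) is "1/2": exact, with no floating point involved.
        key = str(Fraction(raw)) if reduce_first else raw
        tables[key].add(code_point)
    return dict(tables)

as_is = build_tables(RAW_VALUES, reduce_first=False)
reduced = build_tables(RAW_VALUES, reduce_first=True)
print(sorted(as_is))    # two separate tables for the same value
print(sorted(reduced))  # one table once the keys are reduced
```

Reducing the key keeps the values exact, so the table stays portable across machines, while U+109FB lands in the same table as the other halves.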


It's a small matter to add code to reduce the UCD-specified rational 
numbers, but it's just one more complication to have to deal with along 
with the many that the UCD already presents, and if there is not a good 
reason the data for these new characters is specified contrary to 
mathematical convention, then the data should be changed instead of 
having to code around it.


> I would say that fractions should not be reduced to their lowest form
> in the Unicode data as some people may need to order fractions by
> numerator or denominator, and reducing to lowest form could break the
> expectations of some software.  Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0
> rather than 0/3.


So there is some precedent for reducing.









Re: Meroitic cursive fractions numerical values

2015-03-29 Thread Andrew West
On 28 March 2015 at 20:05, Karl Williamson pub...@khwilliamson.com wrote:

> Existing software that looks at the numeric values of characters is written
> expecting that rational numbers will have been reduced to their lowest form.

That seems to be a rather rash statement. I have software (BabelPad)
which parses the numeric values of characters for numeric sorting
purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX
TWELFTHS as 0.5. Personally I find it hard to imagine how you could
write software that accepts 6/12 as input and is unable to come up
with the answer of a half.

I would say that fractions should not be reduced to their lowest form
in the Unicode data as some people may need to order fractions by
numerator or denominator, and reducing to lowest form could break the
expectations of some software.  Having said that, I note that the
numeric value of one character has been reduced in the Unicode data:
U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0
rather than 0/3.
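
To make the two kinds of ordering concrete, here is a small Python sketch. It is not BabelPad's code; the sample strings and the `denominator_then_numerator` helper are illustrative assumptions only.

```python
from fractions import Fraction

# Illustrative Numeric_Value strings, left unreduced.
raw_values = ["6/12", "1/3", "1/2", "10/12"]

def denominator_then_numerator(value):
    """Sort key that preserves the written form: denominator first."""
    numerator, denominator = value.split("/")
    return int(denominator), int(numerator)

# Ordering by mathematical value: 6/12 and 1/2 tie, input order is kept.
by_value = sorted(raw_values, key=Fraction)

# Ordering by the written denominator, which is only possible if the
# data keeps 6/12 rather than 1/2.
by_denominator = sorted(raw_values, key=denominator_then_numerator)

print(by_value)
print(by_denominator)
```

The second ordering groups the twelfths together, which is the kind of use that would be lost if the data stored only reduced forms.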

Andrew


Re: Meroitic cursive fractions numerical values

2015-03-29 Thread Philippe Verdy
How would you note the numeric value property of the mathematical pi
symbol, if you use 0.5, assuming that it should be written as a single
decimal value without using any operator?
You can't, because there's an infinite number of decimals, unless you
explicitly say that the numeric property is limited to the precision of an
IEEE 64-bit double floating-point value (or the 80-bit long double
supported natively by x86 processors).

So you have to imagine that the numeric value property is effectively a
mathematical expression using some conventional set of mathematical symbols
(in which case the numeric value property of the pi symbol should be the
symbol itself). In that case, writing 6/12 or 1/2 is fully equivalent,
mathematically, as this property is a mathematical expression.

Now that property should have a syntax defined. The problem is that for
complex expressions there are several mathematical notations, the most
common one in plain text being TeX (except that it notes not just
the expression itself but also its presentation and layout).

Could Unicode define a basic plaintext syntax for a subset of mathematical
expressions that would be useful for parsing the numeric value field? It
would of course contain the syntax for numbers (using all decimal digits
from various scripts, but ignoring the localized conventions for decimal
separators, reduced to just the ASCII dot, and for grouping separators,
reduced to none; restricting unnecessary whitespace in that field; and
dropping unnecessary leading zeroes, or trailing zeroes in decimal parts).
It would contain the subset of symbolic constants encoded in Unicode
(such as pi, e, i). It would not
contain any symbolic constant directly expressible with others. It could
potentially contain superscript digits used for exponents.

And of course it would contain the common set of arithmetic operators (+,
the ASCII HYPHEN-MINUS or mathematical MINUS, × or the ASCII ASTERISK, /
or ÷, ^ for noting exponentiation, and parentheses), or algebraic operators
(such as √). It would not include special operators (such as ±) that can't
be evaluated to a single number in a one-dimensional numeric field (so
do we limit ourselves to the field of complex numbers?). Further extensions
would include some common functions such as the core trigonometric and
hyperbolic functions (sine, cosine, tangent, cotangent) and their inverses,
and logarithms.

That syntax would not specify whether those expressions can actually be
evaluated, such as 0/0 (it's up to implementations to check this according
to their own numeric domain), as the syntax does not specify the numeric
domain (field or ring?) in which it will be evaluated (for example 1/0 is
valid in some rings where all member numbers are invertible, including
zero), and it will not assume that -1 is necessarily different from +1
(they are equivalent in Z/2Z, which just contains two members, 0 and 1, and
where 2 and 4 are also equal to 0) or the precision of numbers (1/100
could be equal to 0 in an integer domain).

This could be the basis for defining a basic set of expressions that many
programming languages could support in their syntax, using the precision
they want or can support (even if their native syntax uses other similar
notations with simple substitution rules).

For this reason, it seems more natural to avoid reducing fractions in the
numeric property value, and keep them in their natural form: 6/12 NOT
reduced to 1/2, and 0/3 NOT reduced to 0 (because this may
incorrectly assume a subset of a linear numeric field). Let the
implementation define its own numeric domain and whether these expressions
can be evaluated in that domain: the parser will be the same, only the
evaluator will be different as it completely depends on the numeric domain.
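
The split between one parser and several evaluators can be sketched in Python as follows. This covers only the trivial `a/b` form rather than the fuller expression syntax proposed above, and all function names are hypothetical.

```python
from fractions import Fraction

def parse(expr):
    """Split 'a/b' (or a bare integer 'a') into (numerator, denominator)."""
    numerator, _, denominator = expr.partition("/")
    return int(numerator), (int(denominator) if denominator else 1)

def eval_rational(expr):
    """Evaluate in the rationals: exact, reduced automatically."""
    n, d = parse(expr)
    return Fraction(n, d)

def eval_integer(expr):
    """Evaluate in an integer domain using floor division."""
    n, d = parse(expr)
    return n // d

def eval_float(expr):
    """Evaluate as a binary floating-point number."""
    n, d = parse(expr)
    return n / d

print(eval_rational("6/12"), eval_float("6/12"), eval_integer("1/100"))

# Whether 0/0 can be evaluated is decided by the evaluator, not the parser.
try:
    eval_rational("0/0")
except ZeroDivisionError:
    print("0/0 parses, but has no value in the rationals")
```

The parser accepts `0/0` and `6/12` alike; each evaluator then decides what, if anything, the expression means in its own domain.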


2015-03-29 11:41 GMT+02:00 Andrew West andrewcw...@gmail.com:

> On 28 March 2015 at 20:05, Karl Williamson pub...@khwilliamson.com
> wrote:
>
>> Existing software that looks at the numeric values of characters is
>> written expecting that rational numbers will have been reduced to
>> their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad)
> which parses the numeric values of characters for numeric sorting
> purposes, and it parses 6/12 for MEROITIC CURSIVE FRACTION SIX
> TWELFTHS as 0.5. Personally I find it hard to imagine how you could
> write software that accepts 6/12 as input and is unable to come up
> with the answer of a half.
>
> I would say that fractions should not be reduced to their lowest form
> in the Unicode data as some people may need to order fractions by
> numerator or denominator, and reducing to lowest form could break the
> expectations of some software.  Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of 0
> rather than 0/3.
>
> Andrew


Re: Meroitic cursive fractions numerical values

2015-03-28 Thread Asmus Freytag (t)
Unless there is a value in documenting the value of the numerator and 
denominator, in which case this should be prominently explained in the 
documentation. Or is that written down somewhere already?


A./



On 3/28/2015 1:05 PM, Karl Williamson wrote:
> In the 8.0 Beta files, some numerical values are not reduced to their
> lowest forms.  Is there a compelling reason that
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R6/12;N;
>
> is not written as
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R1/2;N;
>
> given that there is also a
>
> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R1/2;N;
>
> Aren't the numeric values of U+109FB and U+109BD the same?
>
> Existing software that looks at the numeric values of characters is
> written expecting that rational numbers will have been reduced to
> their lowest form.



Re: Meroitic cursive fractions numerical values

2015-03-28 Thread Karl Williamson

On 03/28/2015 02:25 PM, Asmus Freytag (t) wrote:

> Unless there is a value in documenting the value of the numerator and
> denominator, in which case this should be prominently explained in the
> documentation.


It seems to me that the character name provides sufficient documentation
of the numerator and denominator.



> Or is that written down somewhere already?
>
> A./
>
> On 3/28/2015 1:05 PM, Karl Williamson wrote:
>
>> In the 8.0 Beta files, some numerical values are not reduced to their
>> lowest forms.  Is there a compelling reason that
>>
>> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R6/12;N;
>>
>> is not written as
>>
>> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R1/2;N;
>>
>> given that there is also a
>>
>> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R1/2;N;
>>
>> Aren't the numeric values of U+109FB and U+109BD the same?
>>
>> Existing software that looks at the numeric values of characters is
>> written expecting that rational numbers will have been reduced to
>> their lowest form.


Re: Meroitic cursive fractions numerical values

2015-03-28 Thread Ken Whistler

On 3/28/2015 1:05 PM, Karl Williamson wrote:
> In the 8.0 Beta files, some numerical values are not reduced to their
> lowest forms.  Is there a compelling reason that
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R6/12;N;
>
> is not written as
>
> 109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R1/2;N;


Well, obviously you might not consider it a compelling reason, but the
numeric values were written that way in the original proposal (L2/12-206,
June 6, 2012). Nobody said anything about rational numbers
expressed as fractions being required to be lowest form,
and the entries were just carried forward into the drafts of Unicode 8.0
UnicodeData.txt for beta review.



> given that there is also a
>
> 109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R1/2;N;
>
> Aren't the numeric values of U+109FB and U+109BD the same?


Of course.



> Existing software that looks at the numeric values of characters is
> written expecting that rational numbers will have been reduced to
> their lowest form.


Well, not all existing software, obviously, as the tools used to generate
the derived data files didn't complain, and produced the correct results
for these Meroitic fractions:

http://www.unicode.org/Public/8.0.0/ucd/extracted/DerivedNumericValues-8.0.0d10.txt

And there is nothing in the documentation of the Numeric_Value property
(see UAX #44) that currently *requires* only an irreducible fraction (or
an integer) in the field. (See also DerivedNumericValues.txt, which is
silent on this.)

You can always provide beta feedback requesting that the relevant fractions
be changed to their lowest forms, for review by the May UTC meeting.

Personally, I wouldn't object to a change like that, as I don't see any
particular didactic value to expressing the fractional values with
precisely the same numerator and denominator as the character form
implies, if it isn't mathematically necessary.

On the other hand, I would be loath to make this a mandatory *requirement*
of the Numeric_Value field, as that would then add yet another baroque
invariant on the UCD data, and would imply yet more elaborate testing to
verify for each release that a new invariant we imposed on ourselves was
not somehow violated in the new data for the UCD. The set of invariants
currently maintained is already bordering on impossible for any one
participant in the data maintenance to understand.
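
For a sense of what such a per-release test might look like, here is a hedged Python sketch that flags Numeric_Value fractions not in lowest terms. It is not one of the UTC's actual tools; the `non_reduced_fractions` name is hypothetical, and it assumes the full UnicodeData.txt layout with Numeric_Value as the ninth field.

```python
from math import gcd

def non_reduced_fractions(lines):
    """Return (code point, value) pairs whose fraction is not in lowest terms."""
    offenders = []
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 9 or "/" not in fields[8]:
            continue  # no Numeric_Value, or an integer value
        numerator, denominator = (int(part) for part in fields[8].split("/"))
        if gcd(numerator, denominator) != 1:
            offenders.append((fields[0], fields[8]))
    return offenders

# Assumed full-width forms of the two entries discussed in this thread.
sample = [
    "109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R;;;;1/2;N;;;;;",
    "109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;",
]
print(non_reduced_fractions(sample))
```

The check itself is short; the cost described above lies in maintaining and explaining one more rule across every release.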

The other drawback of piling on invariants is that the UTC has been
bitten by them in the past when something new comes up that wasn't
anticipated. This particular requirement might be innocuous and safe -- but
why tempt the fates?

--Ken



Meroitic cursive fractions numerical values

2015-03-28 Thread Karl Williamson
In the 8.0 Beta files, some numerical values are not reduced to their 
lowest forms.  Is there a compelling reason that


109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R6/12;N;

is not written as

109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R1/2;N;

given that there is also a

109BD;MEROITIC CURSIVE FRACTION ONE HALF;No;0;R1/2;N;

Aren't the numeric values of U+109FB and U+109BD the same?

Existing software that looks at the numeric values of characters is 
written expecting that rational numbers will have been reduced to their 
lowest form.
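
One quick way to check the question above is Python's standard `unicodedata` module, which reports Numeric_Value as a float. This sketch assumes a Python build whose bundled Unicode database is version 8.0 or later, so that the Meroitic cursive fractions are present.

```python
import unicodedata

# Numeric_Value of the two characters quoted above, as floats.
six_twelfths = unicodedata.numeric("\U000109FB")
one_half = unicodedata.numeric("\U000109BD")

print(unicodedata.unidata_version)
print(six_twelfths, one_half, six_twelfths == one_half)
```

Both calls return the same float, which matches the expectation that the two characters share a numeric value even though the data file spells it differently.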
