[sympy] Re: Arbitrary Precision Problem Persists

casevh Sat, 09 Apr 2016 23:05:14 -0700


On Saturday, April 9, 2016 at 3:36:10 PM UTC-7, Amy Valhausen wrote:
>
> Hi Case,
>
> thanks for your reply - in the thread cited ; 
> https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
>
> The impression I get (please keep in mind Im a newbie), was that most
> of the gentleman came to the conclusion that floats of any kind used for 
> this problem would result in data loss and inaccurate results, Oscar seemed 
> to think that numeric values would have to be stored as strings to prevent 
> this.  The issue of using 900 plus digits was also covered so I understand 
> that part.
>
 
Floating point arithmetic has several quirks that come into play when 
solving this type of problem. I will skip some of the really obscure 
details but I hope this helpful.

Floating-point representation

Most computer arithmetic is performed using binary, or radix-2, arithmetic. 
Most human arithmetic is performed using decimal, or radix-10, arithmetic. 
When you work with the value 1.4142, you are specifying a decimal value. 
When a decimal value is converted to a binary floating point number, most 
values cannot be stored exactly. It the same as trying to store 1/3 in a 
calculator - the value that is stored is close but not exact. There are a 
few implications:

1) The use of the Decimal type can sometimes help since it can store the 
value exactly. Decimal("1.4142") is stored exactly.

2) Notice that I used a string to initialize the Decimal instance. It is a 
common error to see Decimal(1.4142). The Python interpreter first convert 
the characters 1.4142 into a binary, not-quite-exact, approximation, then 
Decimal() converts this approximation. The conversion is exact, but it is 
the exact version of a slightly wrong value.

>>> Decimal("1.4142")
Decimal('1.4142')
>>> Decimal(1.4142)
Decimal('1.4141999999999999015898310972261242568492889404296875')

When working with arbitrary-precision, it is important to delay conversion 
to a floating-point format as long as possible. You don't want to change 
from one radix to another (i.e. from binary to decimal format) or increase 
the precision. Each conversion will (almost always) create a very small 
error.

Floating-point values

Floating-point numbers are are stored in a condensed format. Instead of 
storing all the possible digits (for decimal format) or bits (for binary 
format), only the first few digits/bits are stored. The remaining 
digits/bits are just replaced by 0 and a count is kept of the number of 
zeros. If the count of zeros is too large and positive, the number is 
eventually replaced by Infinity. If the count is large and negative, the 
zeros are effectively placed before the digits. If the count becomes too 
negative, the number is eventually replaced by 0.

The most common floating point format uses 64 bits. The precision (the 
number of leading bits that are kept) is 53. This is approximately 15 
decimal digits. The exponent (the count of the zeros) is slightly more than 
1000. This is approximately 308 decimal digits. This format in a standard 
known as IEEE-754. There are other formats that use smaller limits (i.e. 
float32) or larger limits (i.e. float128)

Calculating ((n ** 6000) % 400) encounters both the limits of floating 
point values. The value of 1.4142**6000 is 
approximately 1.1614417884357118e+903. The value of the exponent exceeds 
the limits of a float64 so a value of Infinity (or OverflowError) is 
returned by Python.

The other problem is that mod cares about the least significant bits but 
the condensed format only keeps the most significant bits and the least 
significant bits are replaced by 0.

To get accurate results using floating point arithmetic, we need to use a 
format that has a sufficiently large exponent range to avoid the 
Infinity/OverflowError and sufficient precision to not replace the least 
significant bits with zero. A precision greater than 903 decimal digits (or 
903*math.log2(10) = 3000 bits) is the needed so ensure that some of the 
least significant information is still present.

There are several common data types that can utilize arbitrary precision 
and large exponents ranges.

1) The decimal.Decimal type included with Python.
2) The mpmath.mpf type that is part of the mpmath module and is used by 
sympy.
3) The gmpy2.mpfr type that is part of the gmpy2 module. 

Here is an example using gmpy2. The result is rounded to 53 bits.

>>> import gmpy2
>>> gmpy2.get_context().precision=3100
>>> a=(gmpy2.mpfr("1.4142")**6000) % 400
>>> gmpy2.round2(a, 53)
mpfr('271.04818100863093')

In this example, the value 1.4142 was kept as a string until it was 
converted directly to extended precision type. Then the result was rounded 
to 53 bits.

Another way to calculate (1.4142**6000) % 400 is to delay the conversion to 
a floating point until after the calculation is done. This done by using 
rational, or fractional, arithmetic. The value 1.4142 is exactly 7071/5000. 
The fraction 7071/5000 is raised to the 6000 power exactly and then % 400 
calculates the remainder exactly. Since the numerator and denominator have 
unlimited size, the values can be computed exactly. The final remainder is 
then converted to a float.

>>> float((Fraction("1.4142")**6000) % 400)
271.04818100863093

The two values agree.

So why don't we just use rational arithmetic? As the numerator and 
denominator become larger, the calculations slow down. Floating point 
arithmetic has consistent performance. 

casevh 

>
> As I mentioned in the cited link above, I want to be able to use any 
> rational number between 1.0x to 2 as the base, I picked a truncated value 
> of sqrt(2) as an example only.
>
> The value you mentioned of 176 for ;
>
> ( 1.414213562^6000) % 400)
>
> Was one of several values reported by folks working on this issue at the 
> link ;
> https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
>
> But they determined that it, along with other several values like 32, 
> etc....
> were wrong... some seemed to think that the value of 271 was closer,
> but there were so many different approaches used, functions, data types
> used that all gave different answers it left me a bit perplexed in the end
> to grasp which of the methods was most accurate.  Towards the end of 
> the thread https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
> Oscar and Aaron seem to have worked out what I think is a better overall 
> approach but I cant understand or put together what they mean enough to 
> write it as code and experiment with....
>
>
> On Saturday, April 9, 2016 at 3:04:16 PM UTC-6, casevh wrote:
>>
>>
>> On Saturday, April 9, 2016 at 11:56:01 AM UTC-7, Amy Valhausen wrote:
>>>
>>> In the thread ; 
>>> https://groups.google.com/forum/#!topic/sympy/eUfW6C_nHdI
>>>
>>> I was seeking feedback and help with the problem of type ;
>>>
>>> ( 1.414213562^6000) % 400)
>>>
>>> After reviewing all the excellent collaboration at the above link, Im 
>>> feeling
>>> a little lost and overwhelmed.  So many good suggestions were made and
>>> also a lot of bugs, and bottlenecks discovered - Im not sure which 
>>> approach
>>> is best for what I am hoping to solve?
>>>
>>> Ive been reviewing the problems with the different approaches, including 
>>> precision loss
>>> with use of floats, some of the suggested functions, etc.
>>>
>>> Oscar mentioned the solution of using a string value and storing it as a 
>>> possibility?
>>>
>>> My hope is to arrive at a method that returns lossless precision, if I 
>>> dont have to sacrifice speed then
>>> thats great but if preserving accuracy comes at a cost of speed thats ok 
>>> with me.
>>>
>>> I'd like also to get the highest precision available, if possible past 
>>> the 15 digit limit, the larger the better.
>>>
>>> What do you guys think is the best code solution for ;
>>> given any rational  between 1.0x and 2, for example "1.414213562"
>>> being raised to a power as high as 6000
>>> with modulus returned of 400
>>>
>>> ( 1.414213562^6000) % 400)    ?
>>>
>>> So that the greatest and most accurate precision is kept?
>>>
>>> I have been testing out some of the code ideas such as ;
>>>
>>> from sympy import mpmath
>>> mpmath.mp.dps = 821
>>> x = mpmath.mpf('1.414213562') ** 6000
>>>
>>> Thanks Oscar, Aaron and everyone!  I have a few questions to ask, but 
>>> first wanted to say this entire thread has been so meaningful as a learning 
>>> process for me!
>>>
>>> I am an intermediate VB programmer and newbie to python and this single 
>>> thread you have all been so gracefully collaborating together on has taught 
>>> me more and helped me to understand the issues surrounding this than any 
>>> other source online I have researched and I am saving this entire thread 
>>> for future use and review!
>>>
>>> I have been trying to keep pace with all the excellent insights and 
>>> suggestions as well as potential bottlenecks.
>>>
>>>  
>> Using Python's included Fraction type, you can calculate ((n**6000) % 
>> 400) exactly and then convert the exact result into a float.
>>
>> >>> from fractions import Fraction
>> >>> float((Fraction("1.414213562")**6000) % 400)
>> 314.3479803094588
>>
>> The Fraction type isn't very fast but the calculation is exact.
>>
>> However, if your value for n changes a small amount, your result will 
>> change by a large amount.
>>
>> >>> float((Fraction("1.414213562")**6000) % 400)
>> 314.3479803094588
>> >>> float((Fraction("1.4142135623")**6000) % 400)
>> 82.204377195306
>> >>> float((Fraction("1.41421356237")**6000) % 400)
>> 164.74604114158947
>> >>> float((Fraction("1.414213562373")**6000) % 400)
>> 394.37793561523273
>>
>> Your value of n looks suspiciously like an approximation to sqrt(2). If 
>> that's correct, the result would be:
>>
>> >>> pow(2,3000,400)
>> 176
>>
>> Since we are using an approximation, the least significant digits (the 
>> ones retained by the mod operation) are incorrect. How many digits in the 
>> approximation of sqrt(2) are needed to produce a stable result?
>>
>> (Note: I'm the maintainer of gmpy2, a module for very fast arbitrary 
>> precision arithmetic. If it is present on your system, it will be 
>> automatically used by mpmath. I'll use gmpy2 directly since I'm more 
>> familiar with it.)
>>
>> >>> import gmpy2
>> >>> gmpy2.get_context().precision=10000
>> >>> s=str(gmpy2.sqrt(2))
>> >>> float((gmpy2.mpq(s[:900])**6000) % 400)
>> 397.6970528425263
>> >>> float((gmpy2.mpq(s[:901])**6000) % 400)
>> 317.77259005351806
>> >>> float((gmpy2.mpq(s[:902])**6000) % 400)
>> 381.7876974957164
>> >>> float((gmpy2.mpq(s[:903])**6000) % 400)
>> 381.7876974957164
>> >>> float((gmpy2.mpq(s[:904])**6000) % 400)
>> 370.4278485701384
>> >>> float((gmpy2.mpq(s[:905])**6000) % 400)
>> 48.15587878502279
>> >>> float((gmpy2.mpq(s[:906])**6000) % 400)
>> 345.64468558337177
>> >>> float((gmpy2.mpq(s[:907])**6000) % 400)
>> 154.42196588552062
>> >>> float((gmpy2.mpq(s[:908])**6000) % 400)
>> 175.2996939157355
>> >>> float((gmpy2.mpq(s[:909])**6000) % 400)
>> 175.82163711649088
>> >>> float((gmpy2.mpq(s[:910])**6000) % 400)
>> 175.9782200767175
>> >>> 
>>
>> You need over 900 accurate digits in the approximation of the sqrt(2) to 
>> get the expected answer.
>>
>> casevh
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/0c6b1d23-49cb-499c-ae22-5c82bffa9df2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[sympy] Re: Arbitrary Precision Problem Persists

Reply via email to