| From: Josiah Carlson <[EMAIL PROTECTED]>
| "Alan Gauld" <[EMAIL PROTECTED]> wrote:
|| However I do dislike the name nice() - there is already a nice() in
|| the 
|| os module with a fairly well understood function. 

Perhaps trim(), nearly(), about(), or defer_the_pain_of() :-) I put off thinking 
about names until after writing this; the reason for that last option may become 
apparent from the rest of this post.

|| But I'm sure some
|| time with a thesaurus can overcome that single mild objection. :-)
| 
| Presumably it would be located somewhere like the math module.

I would like to see it as accessible as round(), int(), float(), and repr(). I 
really think a round-from-the-left is a useful tool to have. It's obviously very 
easy to build your own if you know which tools to use, but not everyone reads 
python-dev or similar lists, so having it handy would be nice.
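
For concreteness, here is a minimal sketch of the kind of function I have in 
mind. (nice and leadingDigits are just this thread's working names, not an 
existing API; the default mimics float(str(x)), and leadingDigits rounds to a 
count of leading digits via '%.*e'.)

###
def nice(x, leadingDigits=None):
    """Round x from the left: to str()'s precision by default, or to
    the given number of leading (significant) digits."""
    if leadingDigits is None:
        # default: whatever str() would display
        return float(str(x))
    # round to leadingDigits significant digits, not to a fixed
    # number of decimal places the way round() does
    return float('%.*e' % (leadingDigits - 1, x))
###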

| From: Greg Ewing <[EMAIL PROTECTED]>
| Smith wrote:
| 
||     When teaching some programming to total newbies, a common
||     frustration is how to explain why a==b is False when a and b are
||     floats computed by different routes which ``should'' give the
||     same results (if arithmetic had infinite precision).
| 
| This is just a special case of the problems inherent
| in the use of floating point. As with all of these,
| papering over this particular one isn't going to help
| in the long run -- another one will pop up in due
| course.
| 
| Seems to me it's better to educate said newbies not
| to use algorithms that require comparing floats for
| equality at all. 

I think that a helper function like nice() is a middle-ground solution to the 
problem: it falls short of using only decimal or rational values for numbers, 
but it does better than requiring a test of the error between floating-point 
values that should be equal but aren't because they were computed by different 
routes. Just as with the argument for making true division the default 
behavior, it seems a little unfriendly to expect the more casual user to worry 
that 3*0.1 is not the same as 3/10.0. I know--they really are different, and 
one should (eventually) understand why--but does anyone really want the warts 
of floating-point representation popping up in their work if they could be 
avoided, or at least easily circumvented?
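
The canonical example, from a stock CPython 2.x interactive session:

###
>>> 3*0.1 == 3/10.0
False
>>> 3*0.1
0.30000000000000004
###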

I know you know why the following numbers show up as not equal, but this is an 
example of the pain in a reasonably simple exercise: computing the bin 
boundaries for a histogram whose bins have a width of 0.1:

###
>>> for i in range(20):
...     # report cases where raw == and nice()-mediated == disagree
...     if (i*.1 == i/10.) != (nice(i*.1) == nice(i/10.)):
...         print i, repr(i*.1), repr(i/10.), i*.1, i/10.
... 
3 0.30000000000000004 0.29999999999999999 0.3 0.3
6 0.60000000000000009 0.59999999999999998 0.6 0.6
7 0.70000000000000007 0.69999999999999996 0.7 0.7
12 1.2000000000000002 1.2 1.2 1.2
14 1.4000000000000001 1.3999999999999999 1.4 1.4
17 1.7000000000000002 1.7 1.7 1.7
19 1.9000000000000001 1.8999999999999999 1.9 1.9
###

In other words, even for garden-variety numbers that aren't full of garbage 
digits from prior fp computation, boundaries computed as 0.1*i will not agree 
with numbers as simple as 0.7 and 1.4.

Would anyone (and I truly don't know the answer) really mind if all floating-
point values were filtered through whatever lies behind str()'s handling of 
floats before a comparison was made? I'm not saying that strings would be 
compared, but that float(str(x)) would be compared to float(str(y)) whenever x 
is compared to y, as in x<=y. If that could be done, wouldn't a lot of grief 
just go away, without requiring the decimal or rational types for many users?
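
To see what that buys, the unequal pair above compares equal after filtering 
(str() rounds a float to 12 significant digits in current CPython, which is 
what washes the difference out):

###
>>> float(str(3*0.1)) == float(str(3/10.0))
True
###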

I understand that the above really is just a patch over the problem, but I 
wonder whether it moves the problem far enough away that most users would never 
have to worry about it. Here, for example, are the first values at which a 
running sum stops matching the straight multiple of the step size:

###
>>> def go(x, n=1000):
...     # step by x until the running sum s and the multiple i*x
...     # disagree after filtering through nice()
...     s = 0; i = 0
...     while s < n:
...         i += 1; s += x
...         if nice(s) != nice(i*x):
...             # report i, then s and i*x as printed (str) and exactly (repr)
...             return '(%s %s %s %s %s)' % (i, s, i*x, repr(s), repr(i*x))
... 
>>> for i in range(1, 100):
...     print i, go(i/1000.)
...     print
...  
1 (60372 60.3719999999 60.372 60.371999999949999 60.372)

2 (49645 99.2899999999 99.29 99.289999999949998 99.290000000000006)
###

The soonest the breakdown occurs in the range given above is at the 22496th 
multiple of 0.041. By the time someone needs to iterate that many times, they 
will be ready for the more sophisticated option of nice()--the one that makes 
it more versatile and less of a patch--namely rounding the answers to a given 
number of leading digits rather than to a given decimal precision the way 
round() does. nice() gives a simple way to think about comparing floats: you 
just have to ask yourself at what "part per X" you no longer care whether the 
numbers differ. E.g., for approximately 1 part in 100, use nice(x,2) and 
nice(y,2) to make the comparison between x and y. Replacing nice(*) with 
nice(*,6) in the go() defined above produces no discrepancy between the values 
computed the two different ways. Since the costs of str() and '%.*e' are nearly 
the same, perhaps leadingDigits=9 would make a good default, and the 
float(str()) option could be eliminated from nice() altogether. Isn't nice() 
sort of a poor man's decimal type without all the extra baggage?
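
With the sketch above, for example, the "part per X" reading is direct--
nice(*,2) keeps roughly 1 part in 100:

###
>>> nice(12.345, 2), nice(12.678, 2)
(12.0, 13.0)
>>> nice(0.10049, 2) == nice(0.1, 2)
True
###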

| In my opinion, if you ever find
| yourself trying to do this, you're not thinking about
| the problem correctly, and your algorithm is simply
| wrong, even if you had infinitely precise floats.
|

As for real-world examples of when this would be nice, I will have to rely on 
others to justify it more heavily. Some quick examples that come to mind:

* Creating histograms of physical measurements that carry limited significant 
digits (i.e., not lots of digits from computation).
* Collecting sets of points within a certain range of a given value (e.g., all 
points within 10% of that value).
* Stopping iterations when a computed error has fallen below a certain 
threshold. (For this, getting the stopping condition exactly "right" is not so 
critical, since doing one more iteration usually isn't a problem if an error 
happens to be a tiny bit larger than the required tolerance. However, the 
leadingDigits option on nice() lets one get even this stopping condition right 
to a limited precision, with something like:

###
tol = 1e-5
while True:
    # ...do something here and compute err...
    if nice(err, 3) <= nice(tol, 3):
        break
###

By specifying a leadingDigits value of 3, the user is saying it's fine to quit 
as soon as err and tol agree in their first three leading digits--that is, even 
if err is still up to about half a percent above tol. Since there is no 
additional cost in specifying more digits, a value of 9 could be used just as 
well.)

| Ismael at tutor wrote:
| How about overloading Float comparison? 

I'm not so adept at such things--how easy is it to do for all comparisons in a 
script? In an interactive session? For the latter, if it were easy, perhaps it 
could be part of a "newbie" mode that could be loaded. I think that some (one 
person above has said as much) would rather not have the issue pushed away; 
they would prefer to leave things as they are and learn to work around the 
problem rather than be handed a hand-holding device that will eventually let 
them down anyway. I'm wondering whether just having to use the function to make 
a comparison is like putting on your helmet before you cycle--a reminder that 
there may be hazards ahead, proceed with caution. If so, then overloading the 
float comparison would be better left undone, requiring instead the "buckling" 
of the floats within nice().
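
For scripts, the closest thing I can picture is a float subclass that wraps 
values explicitly. A sketch (NiceFloat is a hypothetical name), which also 
illustrates why it doesn't help with bare literals:

###
class NiceFloat(float):
    # equality filtered through str(), as discussed above; only values
    # explicitly wrapped in NiceFloat get this behavior, so bare float
    # literals elsewhere in a script or session are untouched
    def __eq__(self, other):
        return str(self) == str(float(other))
    def __ne__(self, other):
        return not self.__eq__(other)

# NiceFloat(3*0.1) == 3/10.0  -->  True
###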

| 
| If I have understood correctly, float to float comparison must be done
| comparing relative errors, so that when dealing with small but rightly
| represented numbers it won't tell "True" just because they're
| "close". I 
| think your/their solution only covers the case when dealing with "big"
| numbers.

Think of two small numbers that you think might fail the nice() test, then use 
the leadingDigits option (set to something like 6) and see whether the problem 
doesn't disappear. If I understand you correctly, this is such a case: x and y 
defined below are truly close; nice()'s default comparison says they are 
different, but nice(*,6) says they are the same--the same to the first 6 digits 
of the exponential representation:

###
>>> x=1.234567e-7
>>> y=1.234568e-7
>>> nice(x)==nice(y)
False
>>> nice(x,6)==nice(y,6)
True
###

| Chuck Allison wrote on edu-sig:
| There is a reliable way to compute the exact number of floating-point
| "intervals" (one less than the number of FP numbers) between any two
| FP numbers. It is a long-ago solved problem. I have attached a C++
| version. You can't define closeness by a "distance" in a FP system -
| you should use this measure instead (called "ulps" - units in the
| last place). The distance between large FP numbers may always be
| greater than the tolerance you prescribe. The spacing between
| adjacent FP numbers at the top of the scale for IEEE double precision
| numbers is 2^(972) (approx. 10^(293))! I doubt you're going to make
| your tolerance this big. I don't believe newbies can grasp this, but
| they can be taught to get a "feel" for floating-point number systems.
| You can't write reliable FP code without this understanding. See
| http://uvsc.freshsources.com/decimals.pdf.            

A very readable 13-page introduction to some floating-point issues. Thanks for 
the reference. The author concludes with:

"Computer science students don't need to be numerical analysts, but they may be 
called upon to write mathematical software. Indeed, scientists and engineers 
use tools like Matlab and Mathematica, but who implements these systems? It 
takes the expertise that only CS graduates have to write such sophisticated 
software. Without knowledge of the intricacies of floating-point computation, 
they will make a mess of things. In this paper I have surveyed the basics that 
every CS graduate should have mastered before they can be trusted in a 
workplace that does any kind of computing with real numbers."
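
Incidentally, the ulps measure Chuck describes is easy to sketch in Python as 
well. This assumes IEEE-754 doubles and uses the struct module to reinterpret 
the bits (my own sketch, not the attached C++ code):

###
import struct

def _ordered(f):
    # reinterpret the double's bits as a signed 64-bit integer, then
    # remap negatives so integer order matches float order
    n = struct.unpack('<q', struct.pack('<d', f))[0]
    if n < 0:
        n = -2**63 - n
    return n

def ulps_between(x, y):
    # number of representable doubles stepped through going from x to y
    return abs(_ordered(x) - _ordered(y))

# ulps_between(3*0.1, 3/10.0)  -->  1
###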

So perhaps this brings us back to the original comment that "fp issues are a 
learning opportunity." They are. The question I have is: how soon do newbies 
need to run into them? Is decreasing the likelihood that they will see the 
problem (without eliminating it) a good thing for the Python community or not?

/c