Re: the handiness of undef becoming NaN (when you want that)
Aaron Sherman wrote:

> Let's take this code as an example:
>
>     while (<>) {
>         $count++;
>         $total += substr($_, 22, 2);
>     }
>     printf "Average: %.2f\n", $total/$count;
>
> Right now, if my expected numeric column has garbage in it on the
> 400,000th line, I treat it as zero and go on, getting a meaningful
> result.

Indeed, you might consider ignoring garbage as producing a meaningful
result, and in the application you envision, that could be extremely
useful. However, in other applications, the fact that there was garbage
on the 400,000th line could be critical to detecting a serious flaw in
the results.

I note that your ignored garbage isn't completely ignored: you still
count the line, thus adjusting your average downward somewhat. Of
course, if there are millions of non-garbage lines, the difference will
be small, and perhaps, for your application, irrelevant. However, if
everything from the 100,000th line through the 600,000th line is
garbage, and there are only 700,000 lines, the garbage could put quite
a bias on the results, and you'd never notice by looking at the first
few and last few pages of the report.

> If that garbage translates to NaN, then I'm going to get Average: NaN
> as my result? That's just freaky!

Garbage in, garbage out. However, in the case of NaN, at least you can
tell that the output is, indeed, garbage. Silent conversion to zero can
bias results, and it might go undetected.

> More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
> continue to work. NaN is a nice feature, but I don't think that it
> should be EASY to invoke it.

Indeed, NaN is a nice feature; I hope I've shown that for your example
there is a counterexample where it would be helpful to avoid silent
conversion of garbage to zero.
I think both sets of semantics are useful; I'd personally consider your
example a bug, and would rather see code like

    while (<>) {
        my $temp = substr($_, 22, 2);
        if (is_numeric($temp)) {
            $count++;
            $total += $temp;
        }
        else {
            $badlines++;
        }
    }
    printf "Average: %.2f\n", $total/$count;
    print "goodlines: $count  badlines: $badlines\n";

for some definition of is_numeric, possibly checking that the input
number falls in a range reasonable for the particular application, as
well as that it looks like a number. Yes, it takes a few extra lines to
code, but it adds a significant amount of surety to the usefulness of
the results.

Clearly my code could be written with or without the existence of the
NaN feature. The existence and use of the feature of string garbage
converting to NaN allows your code to be used more safely, and when the
result is NaN, you realize the need to convert your code to my code to
determine the validity of your results.

-- 
Glenn
=====
Due to the current economic situation, the light at the end of
the tunnel will be turned off until further notice.
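The tradeoff the two posts describe can be made concrete. The sketch
below is in Python purely for illustration (the Perl 6 semantics under
discussion didn't exist yet); the names `average`, `lines`, and
`garbage_as` are mine, not anything from the thread. It runs Aaron's
fixed-column average twice: once with garbage silently treated as zero,
once with garbage mapped to NaN, which poisons the final result so the
problem can't go unnoticed.

```python
import math

def average(lines, garbage_as):
    # Average the 2-character numeric column at offset 22, mapping
    # unparsable fields to `garbage_as` (0.0 or NaN).
    total = count = 0.0
    for line in lines:
        try:
            value = float(line[22:24])
        except ValueError:
            value = garbage_as
        count += 1
        total += value
    return total / count

lines = [" " * 22 + "10", " " * 22 + "30", " " * 22 + "xx"]
print(average(lines, 0.0))                        # silently biased mean
print(math.isnan(average(lines, float("nan"))))   # True: garbage detected
```

With zero-conversion the garbage line quietly drags the mean down to
13.33; with NaN-conversion the whole result is NaN, which is exactly
Glenn's point about surety.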
Re: the handiness of undef becoming NaN (when you want that)
On Mon, 22 Oct 2001 12:18:16 -0400, Aaron Sherman [EMAIL PROTECTED] wrote:

> > $z[0] = 50; $z[2] = 20;
> > @x = @y[@z];
> >
> > In your code, should @x contain (@y[50,0,20]) or (@y[50,20]) or
> > (@y[50,undef,20]) ?
>
> @y[50,undef,20], which in Perl5 is @y[50,0,20].

An arbitrary and perhaps confusing decision.

> If there are other means, I'm not thinking of them right now. Perl's
> conversion of undefined values and strings to 0 is VERY USEFUL. I'd
> really like to avoid breaking it. Yes, //, makes it easier to get over
> the undef thing, but only a little.

I'm always getting warnings when I do stuff like that in my code.

> Let's take this code as an example:
>
>     while (<>) {
>         $count++;
>         $total += substr($_, 22, 2);
>     }
>     printf "Average: %.2f\n", $total/$count;

An interesting example. The answer I have in mind:

    if (m/^.{21}(\d\d)/) {
        $total += $1;
    }

> More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
> continue to work.

What is void plus one?

I think a pragma for this would be ideal.

Sam.
Re: the handiness of undef becoming NaN (when you want that)
Aaron Sherman wrote:

> I see your point, but going from "you have to error-check to be sure
> that the average you get is valid" to "you get NaN and like it" is a
> bit steep.

"You get NaN and like it" only happens when you put garbage in... and
get garbage out. Yes, NaN is garbage. But when it doesn't happen, you
have reasonable confidence that the input is not garbage even _without_
writing the error-checking code. So I see the tradeoff as the following:

Automatic NaN conversion for garbage in allows me to avoid writing the
error-checking code for cases where good data is coming in, yet have
confidence, when I get non-NaN output, that the data was good (to the
extent that numbers appeared where expected in strings).

vs.

Automatic conversion of garbage to zero allows me to get good results
for cases where good data is coming in, but for garbage in I get what
you call "meaningful results" without any clue as to whether those
meaningful results resulted from .005% garbage, or 99.995% garbage, or
any other % garbage.

So laziness and automatic NaN conversion allow me to write the simple
solution, and if all my assumptions about the data are correct, I get
good results. But if they aren't, I get NaN, and know I need to write a
more careful program, with more input validity checking. Laziness and
zero conversion allow me to produce "meaningful results" with no clue
as to just how much meaning they actually have. Unless, of course, I
happen to divide by one of those garbage values, and get a meaningful
divide-by-zero error.

> How do you produce meaningful results in that case? For Dallas you
> get 8, and for Dr. Who you get NaN!

For the present technique of conversion to zero, is the result for
Dallas or Dr. Who a better result? Is a garbage 0 or a garbage 8 more
meaningful to you?

> Is more error checking good? Yes. Is screwing the user who doesn't
> error check the right answer? Not in my Perl programming experience.

We can agree more error checking is good.
It is not clear to me that producing "meaningful results" from garbage
in should be considered screwing the user. I'd call it alerting the
user that he has some garbage in, and that he needs to enhance his code
(or data source) to deal with it to produce what I would call
meaningful results -- some statistics on % data valid or some such
thing.

> > Yes, it takes a few extra lines to code, but adds a significant
> > amount of surety to the usefulness of the results.
>
> But converting to NaN does *not*.

I suspect you are reacting here to a probable misreading of my
statement above. Note, however, that I did not say that converting to
NaN adds usefulness to the results. I said it adds surety to the
usefulness of the results. When you get NaN, you are very, very sure
that something went wrong, as NaN is seldom, if ever, useful as a
result. So converting to NaN does add a significant amount of _surety_
to the usefulness of the results: you know then that the results are
not useful, because you had garbage in.

> Great, sounds like pragma territory. I could live with a:
>
>     use string_to_nan;
>
> I have a hard time living with:
>
>     use string_to_zero;
>
> and a NaN default.

If I had to choose among those, I'd personally prefer to live with the
second. However, there are other options, some of which have been
mentioned in related threads.

The option I like is to have Perl 6 provide a selection of different
string methods for extracting numeric values, for a variety of types of
numeric values. For example, one method might accept only integers,
another might allow complex numbers, another might allow metric
suffixes for scaling, another might allow floating point numbers,
another might allow decimal numbers without exponents, and another
might allow full numeric expression evaluation (numeric constants and
operators only). With such an option, you could choose the type of
conversion based on the type of data you expect.
I could see each of those conversion methods taking an optional
parameter defining what to return if the conversion fails; typical
values for the parameter might be undef, 0, or NaN. I'd recommend that
omitting the parameter cause the conversion to return undef: easily
detectable by code that cares, converted to zero in the cases where
that really might be meaningful, and compatible with most cases of not
supplying a parameter to a method.

-- 
Glenn
=====
Due to the current economic situation, the light at the end of
the tunnel will be turned off until further notice.
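Glenn's proposal -- a family of strict extraction methods, each taking
an optional "what to return on failure" parameter -- can be sketched in
a few lines. Python is used here only for illustration, and the names
`to_int` and `on_fail` are invented for this sketch, not part of any
proposed Perl 6 API.

```python
import math
import re

def to_int(s, on_fail=None):
    # Strict integer extraction: anything that is not a plain
    # (optionally signed) integer yields `on_fail` instead.
    return int(s) if re.fullmatch(r"[+-]?\d+", s.strip()) else on_fail

print(to_int("42"))                  # 42
print(to_int("4x", 0))               # 0    -- caller opts in to silent zero
print(to_int("4x", float("nan")))    # nan  -- caller opts in to poisoning
print(to_int("4x"))                  # None -- the suggested undef default
```

The point of the design is that the caller, not the language, picks the
failure value, so neither zero-conversion nor NaN-conversion has to be
a global default.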
RE: the handiness of undef becoming NaN (when you want that)
> > More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
> > continue to work.
>
> What is void plus one?

Can't we utilize the lazy-arrays stuff to make all this work? Out of
the box, all entries could default to NaN; but it's easy to write

    @a ^= 0;

to change this default. I'm sure this could be extended to work with
hashes. I'm not sure what the correct syntax is, though.

Dave.
Re: the handiness of undef becoming NaN (when you want that)
Aaron Sherman [EMAIL PROTECTED] writes:

> On Mon, Oct 22, 2001 at 04:27:24PM +0100, Sam Vilain wrote:
> > On Fri, 19 Oct 2001 09:27:50 -0400, Aaron Sherman [EMAIL PROTECTED] wrote:
> > > I am implementing a textbook algorithm in Perl (the textbook has
> > > it written in C++) and have realized that if undef were to
> > > numericize to NaN instead of 0, there are a lot of
> > > uninitialization errors that would get caught. "use strict
> > > 'vars';" does not recognize use of new array indices as
> > > uninitialized variables.
> >
> > Yes, but do you really want to break:
> >
> >     $z[0] = 50; $z[2] = 20;
> >     @x = @y[@z];
> >
> > In your code, should @x contain (@y[50,0,20]) or (@y[50,20]) or
> > (@y[50,undef,20]) ?
>
> @y[50,undef,20], which in Perl5 is @y[50,0,20].
>
> I have a great many fears around NaN. I think I should only be able
> to get a NaN by:
>
>   * Directly invoking it (e.g. $x = NaN)
>   * Performing a mathematical operation whose result would otherwise
>     be an exception (e.g. $x = 1/0)
>
> If there are other means, I'm not thinking of them right now. Perl's
> conversion of undefined values and strings to 0 is VERY USEFUL. I'd
> really like to avoid breaking it. Yes, //, makes it easier to get
> over the undef thing, but only a little.
>
> Let's take this code as an example:
>
>     while (<>) {
>         $count++;
>         $total += substr($_, 22, 2);
>     }
>     printf "Average: %.2f\n", $total/$count;
>
> Right now, if my expected numeric column has garbage in it on the
> 400,000th line, I treat it as zero and go on, getting a meaningful
> result. If that garbage translates to NaN, then I'm going to get
> Average: NaN as my result? That's just freaky!

Yeah, but it's correct. If you extract something and get garbage, then
you're going to screw your average up. Admittedly, in 400,000 lines,
it's unlikely to shift the average by much, but it will shift it.

Of course, this is assuming there's no difference between:

    $total += substr($_, 22, 2);    # implicit numification
    $total += +substr($_, 22, 2);   # explicit numification

which might be controllable via pragma.
However, it seems to me that having both explicit and implicit
numification of garbage go to NaN is a *good* thing, because it will
force you to think about what you're doing with code like this. Maybe
you want garbage to increment the count and numify as zero, so you'd
do:

    while (<>) {
        my $val = +substr($_, 22, 2);
        $count++;
        if ($val ne 'NaN') {    # Ugly, gets 'round the IEEE thing
            $total += $val;
        }
    }

Or, if you just want to skip that line:

    while (<>) {
        my $val = +substr($_, 22, 2);
        next if $val eq 'NaN';
        $count++;
        $total += $val;
    }

> More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
> continue to work.

And it will, since it has nothing whatsoever to do with string
numification.

    %x{non_existent}++;    # Doesn't do numification. Autovivifies an
                           # entry in %x, with value undef, which
                           # numifies to 0.

    %x{string} = 'string';
    %x{string}++;          # magic string increment: now 'strinh'

> NaN is a nice feature, but I don't think that it should be EASY to
> invoke it.

Disagree.
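The "ugly" string comparison above works around a genuine IEEE 754
property: a NaN compares unequal to every value, including itself,
which is why languages with native NaN support provide an explicit
isnan predicate. A short Python demonstration of the behavior (shown
here only because Python exposes IEEE semantics directly):

```python
import math

x = float("nan")
print(x == x)          # False -- NaN is unequal even to itself
print(x != x)          # True  -- hence the classic self-comparison test
print(math.isnan(x))   # True  -- the explicit, readable predicate
```

The self-comparison trick (`x != x`) is the portable fallback when no
isnan predicate exists; string-comparing against 'NaN' additionally
depends on the exact stringification, which is why it reads as ugly.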
Default values, was RE: the handiness of undef becoming NaN (when you want that)
Aaron Sherman wrote:

> On Mon, Oct 22, 2001 at 11:30:01AM -0700, David Whipp wrote:
> > > More, someone has mentioned the %x{$_}++ feature, which IMHO,
> > > MUST continue to work.
> > >
> > > What is void plus one?
> >
> > Can't we utilize the lazy-arrays stuff to make all this work? Out
> > of the box, all entries could default to NaN; but it's easy to
> > write "@a ^= 0;" to change this default. I'm sure this could be
> > extended to work with hashes. I'm not sure what the correct syntax
> > is, though.
>
> Nope.
>
>     my @a;
>     @a ^= 0;
>     print @a;
>
> Are you saying that this should print an infinite number of zeros?

Quoting Larry on this subject (Apocalypse 3, bottom of page 3):

    I can think of other cans of worms this opens, and I'm quite
    certain I'm too stupid to think of them all. Nevertheless, my gut
    feeling is that we can make things work more like people expect
    rather than less.

And I was always a little bit jealous that REXX could have arrays with
default values. :-)

Dave.
RE: Default values, was RE: the handiness of undef becoming NaN (when you want that)
Aaron Sherman wrote:

> Larry's hubris notwithstanding, I'd like to suggest that "more", in
> this case, means "no, it prints nothing". This *must* be true, as you
> don't want:
>
>     @a ^+ @b
>
> to always return an infinite list. You want it to produce a list with
> (as A3 suggested) length max(@a.length, @b.length).

OK, now that we've got this resolved, I'd like to return the focus to
the original point.

    @x ^= 0;
    @x[5]++;

does not have problems with NaNs, and does not generate a warning with
-w. My suggestion was to extend this to work with hashes, too:
something like

    %x ^= $^_ = 0;

or

    values %x ^= 0;

Dave.
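Dave's hash-default idea -- keeping `%x{$_}++` warning-free by letting
missing entries spring into existence with a chosen default -- is the
same facility other languages already offer. A Python sketch using
`collections.defaultdict`, purely as a cross-language illustration of
the semantics being requested:

```python
from collections import defaultdict

x = defaultdict(int)          # missing keys spring into existence as 0
for word in ["a", "b", "a"]:
    x[word] += 1              # the %x{$_}++ idiom keeps working silently
print(dict(x))                # {'a': 2, 'b': 1}
```

The design question in the thread is exactly which default a missing
entry should carry (0, undef, or NaN) and how the programmer overrides
it per-container, which is what `@a ^= 0;` was reaching for.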
Re: De NaN-ibus
> Hugo suggested:
> : Have I missed anything?
>
> Code, and docs, for ieee.pm. Other than that, it looks good to me. :)

Ah, but that's a SMoP, left as an exercise for the reader. ;-)

Damian