Aaron Sherman wrote:
> Let's take this code as an example:
>
>   while(<>) {
>     $count++;
>     $total += substr($_,22,2);
>   }
>   printf "Average: %.2f\n", $total/$count;
>
> Right now, if my expected numeric column has garbage in it on the
> 400,000th line, I treat it as zero and go on, getting a meaningful
> result.
Indeed, you might consider "ignoring garbage" as producing a "meaningful result", and in the application you envision, that could be extremely useful. In other applications, however, the fact that there was garbage on the 400,000th line could be the critical clue to a serious flaw in the results.

I note that your "ignored garbage" isn't completely ignored: you still count the line. Each garbage line adds one to $count but zero to $total, which drags your average downward somewhat. Of course, if there are millions of non-garbage lines, the difference will be small and perhaps, for your application, irrelevant. But if everything from the 100,000th line through the 600,000th line is garbage, and there are only 700,000 lines, the garbage could heavily bias the results, and you'd never notice by looking at the first few and last few pages of the report.

> If that garbage translates to NaN, then I'm going to get
> "Average: NaN" as my result? That's just freaky!

Garbage in, garbage out. With NaN, however, at least you can tell that the output is, indeed, garbage. Silent conversion to zero can bias the results, and the bias might go undetected.

> More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST continue
> to work.
>
> NaN is a nice feature, but I don't think that it should be an EASY
> to invoke it.

Indeed, NaN is a nice feature; I hope I've shown that, for your example, there is a counterexample in which it would help to avoid silently converting garbage to zero. I think both sets of semantics are useful. Personally, I'd consider your example a bug, and would rather see code like:

    my ($count, $total, $badlines) = (0, 0, 0);
    while (<>) {
        my $temp = substr($_, 22, 2);
        if (is_numeric($temp)) {
            $count++;
            $total += $temp;
        } else {
            $badlines++;
        }
    }
    # Guard against dividing by zero when every line is garbage.
    printf "Average: %.2f\n", $total/$count if $count;
    print "goodlines: $count badlines: $badlines\n";

for some definition of "is_numeric", possibly one that checks not only that the input looks like a number but also that it falls within a reasonable range for the particular application.
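Perl has no built-in "is_numeric", so the name is a placeholder. Here is a minimal sketch of one possible definition, assuming the core Scalar::Util module's looks_like_number for the "looks like a number" test and an arbitrary 0..99 range (hypothetical bounds chosen for a two-character column) for the reasonableness check:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical bounds: an assumption for a two-character numeric column.
# Adjust them to whatever is "reasonable" for your own data.
use constant { MIN_OK => 0, MAX_OK => 99 };

# Hypothetical helper: true only if $s looks like a number AND lies in range.
sub is_numeric {
    my ($s) = @_;
    return 0 unless defined $s;
    # Rejects garbage such as "ab" or the empty string. On modern perls it
    # may also accept strings like "NaN" or "Inf"; the range check below
    # screens those out, since NaN fails every comparison and Inf > MAX_OK.
    return 0 unless looks_like_number($s);
    return ($s >= MIN_OK && $s <= MAX_OK) ? 1 : 0;
}

# Quick check on a few sample fields (illustrative values, not real data).
for my $field ('42', '07', 'ab', '', '-5', '150') {
    printf "%-6s %s\n", "'$field'", is_numeric($field) ? 'ok' : 'bad';
}
```

Dropped into the loop above, a line whose field fails either test is counted in $badlines instead of silently contributing zero to $total.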
Yes, it takes a few extra lines of code, but it adds significant assurance that the results are useful. Clearly, my code could be written with or without the NaN feature. But having garbage strings convert to NaN lets your code be used more safely: when the result is NaN, you realize that you need to convert your code into something like mine to determine whether your results are valid.

--
Glenn
=====
Due to the current economic situation, the light at the end of the tunnel
will be turned off until further notice.