Aaron Sherman wrote:

> Let's take this code as an example:
>
>         while(<>) {
>                 $count++;
>                 $total += substr($_,22,2);
>         }
>         printf "Average: %.2f\n", $total/$count;
>
> Right now, if my expected numeric column has garbage in it on the
> 400,000th line, I treat it as zero and go on, getting a meaningful
> result.

Indeed, you might consider "ignoring garbage" as producing a "meaningful
result", and in the application you envision, that could be extremely useful.

However, in other applications, the fact that there was garbage on the
400,000th line could be critical to determining a serious flaw in the results.

I note that your "ignored garbage" isn't completely ignored: you still count
the line, thus pulling your average toward zero somewhat.  Of course if there
are millions of non-garbage lines, the difference will be small, and perhaps,
for your application, irrelevant.

However, if everything from the 100,000th line through the 600,000th line is
garbage, and there are only 700,000 lines, the garbage could bias the results
considerably, and you'd never notice by looking at the first few and last few
pages of the report.
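
To put rough numbers on that scenario, here is a minimal sketch.  The value 50
on every good line is purely an assumption for illustration; only the line
counts come from the scenario above.

```perl
use strict;
use warnings;

# Assumed for illustration: every good line carries the value 50.
my $good_value = 50;
my $lines      = 700_000;
my ($count, $total, $good) = (0, 0, 0);

for my $n (1 .. $lines) {
    # Lines 100,000 through 600,000 are garbage in this scenario.
    my $is_garbage = ($n >= 100_000 && $n <= 600_000);
    $count++;
    # Garbage silently contributes zero but is still counted.
    $total += $is_garbage ? 0 : $good_value;
    $good++ unless $is_garbage;
}

printf "Silent-zero average: %.2f\n", $total / $count;   # 14.29
printf "Good-lines average:  %.2f\n", $total / $good;    # 50.00
```

With these assumed values the silent-zero average is less than a third of the
average over the good lines, and nothing in the output hints at why.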

> If that garbage translates to NaN, then I'm going to get
> "Average: NaN" as my result? That's just freaky!

Garbage in, garbage out.  However, in the case of NaN, at least you can tell
that the output is, indeed, garbage.  Silent conversion to zero can bias
results, and it might go undetected.
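
As a rough sketch of how a single NaN marks the whole result, the following
emulates the proposed conversion in today's Perl.  The helper to_number_or_nan
and the sample fields are made up for illustration, and the code assumes a
platform with IEEE floating point.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Build a NaN from IEEE arithmetic: infinity minus infinity.
my $inf = 9**9**9;
my $nan = $inf - $inf;

# Hypothetical helper: numeric strings convert normally,
# anything else becomes NaN instead of zero.
sub to_number_or_nan {
    my ($text) = @_;
    return looks_like_number($text) ? $text + 0 : $nan;
}

my ($count, $total) = (0, 0);
for my $field ('12', '07', 'xx', '31') {   # made-up sample fields
    $count++;
    $total += to_number_or_nan($field);
}

my $average = $total / $count;
# NaN is the only value that is not equal to itself.
print "Average is NaN: the input contained garbage\n"
    if $average != $average;
```

One bad field out of four is enough to make the average NaN, so the garbage
cannot slip through unnoticed.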

> More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST continue
> to work.
>
> NaN is a nice feature, but I don't think that it should be an EASY
> to invoke it.

Indeed, NaN is a nice feature; I hope I've shown that for your example there is
a counterexample where it would be helpful to avoid silent conversions of
garbage to zero.

I think both sets of semantics are useful; I'd personally consider your example
a bug, and would rather see code like

my ($count, $total, $badlines) = (0, 0, 0);
while (<>)
{ my $temp = substr($_,22,2);
  if ( is_numeric ( $temp ))
  { $count ++;
    $total += $temp;
  } else
  { $badlines ++;
  }
}
printf "Average: %.2f\n", $total/$count if $count;
print "goodlines: $count  badlines: $badlines\n";

for some definition of "is_numeric", possibly checking that the input number
falls in a reasonable range for the particular application, as well as that it
looks like a number.
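
One possible definition, offered only as a sketch: it relies on
looks_like_number from the core Scalar::Util module, and the bounds 0 and 99
are an assumption based on the column being two characters wide.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Assumed bounds: a two-character column can hold 0 through 99.
my ($MIN, $MAX) = (0, 99);

# Returns 1 if the text looks like a number and is within range, else 0.
sub is_numeric {
    my ($text) = @_;
    return 0 unless defined $text && looks_like_number($text);
    return ($text >= $MIN && $text <= $MAX) ? 1 : 0;
}
```

Note that looks_like_number also accepts strings such as "Inf" and "NaN", so
the range check does real work here beyond application-specific limits.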

Yes, it takes a few extra lines to code, but it adds significant confidence
that the results are actually useful.

Clearly my code could be written with or without the existence of the NaN
feature.  But if string garbage converts to NaN, your code can be used more
safely: when the result is NaN, you realize the need to convert your code to
my code to determine the validity of your results.

--
Glenn
=====
Due to the current economic situation, the light at the
end of the tunnel will be turned off until further notice.

