Re: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread Glenn Linderman

Aaron Sherman wrote:

 Let's take this code as an example:

 while(<>) {
     $count++;
     $total += substr($_,22,2);
 }
 printf "Average: %.2f\n", $total/$count;

 Right now, if my expected numeric column has garbage in it on the
 400,000th line, I treat it as zero and go on, getting a meaningful
 result.

Indeed, you might consider ignoring garbage as producing a meaningful
result, and in the application you envision, that could be extremely useful.

However, in other applications, the fact that there was garbage on the
400,000th line could be critical to determining a serious flaw in the results.

I note that your ignored garbage isn't completely ignored: you still count
the line, thus adjusting your average downward somewhat.  Of course, if there
are millions of non-garbage lines, the difference will be small, and perhaps,
for your application, irrelevant.

However, if, starting from the 100,000th line through the 600,000th line, all
is garbage, and there are only 700,000 lines, the garbage could have quite a
bias to the results, and you'd never notice by looking at the first few and
last few pages of the report.
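The size of that bias can be checked with a short Perl 5 sketch. It assumes, purely for illustration, that every good line carries the value 50 and every garbage line carries "??"; neither value comes from the original example.

```perl
use strict;
use warnings;

my ($count, $total)     = (0, 0);   # naive: garbage is counted as zero
my ($good, $good_total) = (0, 0);   # validated: garbage is skipped

for my $line (1 .. 700_000) {
    # Assumed data: lines 100,000 through 600,000 are garbage.
    my $field = ($line >= 100_000 && $line <= 600_000) ? '??' : '50';

    $count++;
    {
        no warnings 'numeric';   # silence "isn't numeric" for this demo
        $total += $field;        # '??' silently numifies to 0
    }

    if ($field =~ /\A\d+\z/) {   # count only fields made of digits
        $good++;
        $good_total += $field;
    }
}

printf "naive average:     %.2f\n", $total / $count;
printf "validated average: %.2f\n", $good_total / $good;
```

Under these assumptions the naive loop reports 14.29 where the good data averages 50.00, and nothing in the output hints at the difference.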

 If that garbage translates to NaN, then I'm going to get
 "Average: NaN" as my result? That's just freaky!

Garbage in, garbage out.  However, in the case of NaN, at least you can tell
that the output is, indeed, garbage.  Silent conversion to zero can bias
results, and it might go undetected.

 More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST continue
 to work.

 NaN is a nice feature, but I don't think that it should be EASY
 to invoke it.

Indeed, NaN is a nice feature; I hope I've shown that for your example there is
a counterexample where it would be helpful to avoid silent conversions of
garbage to zero.

I think both sets of semantics are useful; I'd personally consider your example
a bug, and would rather see code like

while (<>)
{ my $temp = substr($_,22,2);
  if ( is_numeric ( $temp ))
  { $count ++;
    $total += $temp;
  } else
  { $badlines ++;
  }
}
printf "Average: %.2f\n", $total/$count;
printf "goodlines: $count  badlines: $badlines\n";

for some definition of is_numeric, possibly checking for the reasonableness
of the range of the input number for the particular application, as well as it
looking like a number.
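One possible definition, as a minimal Perl 5 sketch: the function body, the digits-only rule, and the default range of 0 to 99 (chosen to suit a two-character column) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical is_numeric: accepts only unsigned decimal integers
# (optionally space-padded) that fall inside an assumed valid range.
sub is_numeric {
    my ($text, $min, $max) = @_;
    $min = 0  unless defined $min;   # assumed lower bound
    $max = 99 unless defined $max;   # assumed upper bound for a 2-char column
    return 0 unless defined $text;
    return 0 unless $text =~ /\A\s*(\d+)\s*\z/;   # must look like a number
    return ($1 >= $min && $1 <= $max) ? 1 : 0;    # must be in range
}

for my $field ('42', ' 7', '4x', '') {
    printf "%-4s %s\n", "'$field'", is_numeric($field) ? 'numeric' : 'garbage';
}
```

An application that expects signed or fractional values would need a looser pattern and different bounds.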

Yes, it takes a few extra lines to code, but adds a significant amount of
surety to the usefulness of the results.

Clearly my code could be written with or without the existence of the NaN
feature.  The existence of and use of the feature of string garbage converting
to NaN allows your code to be used more safely, and when the result is NaN, you
realize the need to convert your code to my code to determine the validity of
your results.

--
Glenn
=
Due to the current economic situation, the light at the
end of the tunnel will be turned off until further notice.





Re: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread Sam Vilain

On Mon, 22 Oct 2001 12:18:16 -0400
Aaron Sherman wrote:

 $z[0] = 50;
 $z[2] = 20;
 @x = @y[@z];
  In your code, should @x contain (@y[50,0,20]) or (@y[50,20]) or
  (@y[50,undef,20]) ?
 @y[50,undef,20], which in Perl5 is @y[50,0,20].

An arbitrary and perhaps confusing decision.

 If there are other means, I'm not thinking of them right now.
 Perl's conversion of undefined values and strings to 0 is VERY
 USEFUL. I'd really like to avoid breaking it. Yes, // makes it
 easier to get over the undef thing, but only a little.

I'm always getting warnings when I do stuff like that in my code.

 Let's take this code as an example:
   while(<>) {
       $count++;
       $total += substr($_,22,2);
   }
   printf "Average: %.2f\n", $total/$count;

An interesting example.  The answer I have in mind:

if (m/^.{22}(\d\d)/) {   # skip 22 characters, matching substr($_,22,2)
    $total += $1;
}

 More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
 continue to work.

What is void plus one?

I think a pragma for this would be ideal.

Sam.



Re: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread Glenn Linderman

Aaron Sherman wrote:

 I see your point, but going from: "you have to error-check to be
 sure that the average you get is valid" to "you get NaN and like it"
 is a bit steep.

"You get NaN and like it" only happens when you put garbage in... and get garbage
out.

Yes, NaN is garbage.  But when it doesn't happen, you would have reasonable
confidence that the input is not garbage even _without_ writing the error checking
code.

So I see the tradeoff as being the following:

Automatic NaN conversions for garbage in allows me to avoid writing the error
checking code for cases where good data is coming in, yet have confidence when I get
non-NaN output that the data was good (to the extent that numbers appeared where
expected in strings).

vs.

Automatic conversion of garbage to zero allows me to get good results for cases
where good data is coming in, but for garbage in I get what you call "meaningful
results" without any clue as to whether the "meaningful results" resulted from .005%
garbage, or 99.995% garbage, or any other % garbage.

So laziness and automatic NaN conversion allows me to write the simple solution, and
if all my assumptions about the data are correct, I get good results.  But if they
aren't, I get NaN, and know I need to write a more careful program, with more input
validity checking.

Laziness and zero conversion allows me to produce "meaningful results" with no clue
as to just how much meaning they actually have.

Unless, of course, I happen to divide by one of those garbage ins, and get a
"meaningful" divide by zero error.  How do you produce "meaningful results" in that
case?

 For Dallas you get 8, and for Dr. Who you get NaN!

For the present technique of conversion to zero, is the result for Dallas or Dr.
Who a better result?  Is a garbage 0 or a garbage 8 more meaningful to you?

 Is more error checking good? Yes. Is screwing the user who doesn't
 error check the right answer? Not in my Perl programming experience

We can agree more error checking is good.

It is not clear to me that producing NaN results from garbage in should be
considered screwing the user.  I'd call it alerting the user that he has some
garbage in, and he needs to enhance his code (or data source) to deal with it to
produce what I would call meaningful results -- some statistics on % data valid or
some such thing.

  Yes, it takes a few extra lines to code, but adds a significant amount of
  surety to the usefulness of the results.

 But converting to NaN does *not*.

I suspect you are reacting here to a probable misreading of my statement above.
Note, however, that I did not say that converting to NaN adds usefulness to the
results.  I said it adds surety to the usefulness of the results.  When you get
NaN, you are very, very sure that something went wrong, as NaN is seldom, if ever,
useful as a result.

So converting to NaN does add a significant amount of _surety_ to the usefulness of
the results.  You know then, that the results are not useful, because you had garbage
in.

 Great, sounds like pragma territory. I could live with a:

 use string_to_nan;

 I have a hard time living with:

 use string_to_zero;

 and a NaN default.

If I had to choose among those, I'd personally prefer to live with the second.
However, there are other options, some of which have been mentioned in related
threads.

The option I like is to have Perl 6 provide a selection of different string methods
for extracting the numeric values, for a variety of types of numeric values.  For
example, one method might accept only integers, another method might allow complex
numbers, another method might allow metric suffixes for scaling, another method might
allow floating point numbers, another might allow decimal numbers without exponents,
another might allow full numeric expression evaluation (numeric constants and
operators only).  With such an option, you could choose the type of conversion based
on the type of data you expect.  I could see each of those conversion functions
taking an optional parameter defining what to return if the conversion fails...
typical values for the parameter might be undef, 0, or NaN... and I'd recommend that
omitting the parameter would cause the conversion to return undef: easily
detectable by code that cares, convertible to zero for some cases of "meaningful"
results (when they really might be), and compatible with most cases of not supplying
a parameter to a method.
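As a rough Perl 5 sketch of that idea, two such converters might look like the following. The names to_integer and to_decimal, their patterns, and the argument order are hypothetical; nothing here is an actual or proposed Perl 6 interface.

```perl
use strict;
use warnings;

# Hypothetical converter: accepts only (optionally signed) decimal integers.
# The second argument is what to return when the string does not qualify;
# leaving it out yields undef.
sub to_integer {
    my ($text, $on_failure) = @_;
    if (defined $text && $text =~ /\A\s*([-+]?\d+)\s*\z/) {
        return $1 + 0;
    }
    return $on_failure;
}

# Hypothetical converter: decimal numbers without exponents.
sub to_decimal {
    my ($text, $on_failure) = @_;
    if (defined $text && $text =~ /\A\s*([-+]?(?:\d+(?:\.\d+)?|\.\d+))\s*\z/) {
        return $1 + 0;
    }
    return $on_failure;
}

my $strict  = to_integer('Dr. Who');      # undef: the caller can detect failure
my $lenient = to_integer('Dr. Who', 0);   # 0: zero conversion, but only on request
my $price   = to_decimal('12.50');        # numeric 12.5

print defined $strict ? "strict: $strict\n" : "strict: conversion failed\n";
print "lenient: $lenient\n";
print "price: $price\n";
```

The caller picks the failure value at each call site, so code that wants the old zero behaviour asks for it explicitly and code that cares can test for undef.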

--
Glenn
=
Due to the current economic situation, the light at the
end of the tunnel will be turned off until further notice.





RE: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread David Whipp

  More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
  continue to work.
 
 What is void plus one?

Can't we utilize the lazy arrays stuff to make all this work?
Out of the box, all entries could default to NaN. But it's easy
to write

@a ^= 0;

to change this default. I'm sure this could be extended to work
with hashes. I'm not sure what the correct syntax is, though.


Dave.



Re: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread Piers Cawley

Aaron Sherman writes:

 On Mon, Oct 22, 2001 at 04:27:24PM +0100, Sam Vilain wrote:
 On Fri, 19 Oct 2001 09:27:50 -0400
 Aaron Sherman wrote:
 
   I am implementing a textbook algo in Perl (the textbook has
   it written in C++) and have realized that if undef was to
   numericize to NaN instead of 0, there are a lot of uninitialization
   errors that would get caught.
   "use strict 'vars';"
   does not recognize use of new array indices as uninitialized variables.
  Yes, but do you really want to break:
 $z[0] = 50;
 $z[2] = 20;
 @x = @y[@z];
 
 In your code, should @x contain (@y[50,0,20]) or (@y[50,20]) or
 (@y[50,undef,20]) ?
 
 @y[50,undef,20], which in Perl5 is @y[50,0,20].
 
 I have a great many fears around NaN. I think I should only be able to
 get a NaN by:
 
   Directly invoking it (e.g. $x = NaN)
   Performing a mathematical operation whose result would
   otherwise be an exception (e.g. $x = 1/0)
 
 If there are other means, I'm not thinking of them right now.
 Perl's conversion of undefined values and strings to 0 is VERY
 USEFUL. I'd really like to avoid breaking it. Yes, // makes it
 easier to get over the undef thing, but only a little.
 
 Let's take this code as an example:
 
   while(<>) {
       $count++;
       $total += substr($_,22,2);
   }
   printf "Average: %.2f\n", $total/$count;
 
 Right now, if my expected numeric column has garbage in it on the
 400,000th line, I treat it as zero and go on, getting a meaningful
 result. If that garbage translates to NaN, then I'm going to get
 "Average: NaN" as my result? That's just freaky!

Yeah, but it's correct. If you extract something and get garbage then
you're going to screw your average up. Admittedly, in 400,000 lines,
it's unlikely to shift the average by much, but it will shift it. 

Of course, this is assuming there's no difference between:

$total += substr($_, 22, 2); # implicit numification

$total += +substr($_, 22, 2); # explicit numification

Which might be controllable via pragma.

However, it seems to me that having both explicit and implicit
numification of garbage go to NaN is a *good* thing because it will
force you to think about what you're doing with code like this. Maybe
you want garbage to increment the count and numify as zero, so you'd
do:

while (<>) {
    my $val = +substr($_, 22, 2);
    $count++;
    if ($val ne 'NaN') { # Ugly, gets 'round the IEEE thing
        $total += $val;
    }
}

Or, if you just want to skip that line:

while (<>) {
    my $val = +substr($_, 22, 2);
    next if $val eq 'NaN';
    $count++;
    $total += $val;
}

 More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
 continue to work.

And it will, since it has nothing whatsoever to do with string
numification. 

%x{non_existent}++ # Doesn't do numification. Autovivifies an entry in
                   # %x, with value 'undef', which numifies to 0.

%x{string} = 'string';
%x{string}++;
%x{string} eq 'strinh'; # magical string increment

 NaN is a nice feature, but I don't think that it should be EASY
 to invoke it.

Disagree.








Default values, was RE: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread David Whipp

Aaron Sherman wrote
 On Mon, Oct 22, 2001 at 11:30:01AM -0700, David Whipp wrote:
    More, someone has mentioned the %x{$_}++ feature, which IMHO, MUST
    continue to work.
   
   What is void plus one?
  
  Can't we utilize the lazy arrays stuff to make all this work?
  Out of the box, all entries could default to NaN. But it's easy
  to write
  
  @a ^= 0;
  
  to change this default. I'm sure this could be extended to work
  with hashes. I'm not sure what the correct syntax is, though.
 
 Nope.
 
   my @a;
   @a ^= 0;
   print @a
 
 Are you saying that this should print an infinite number of zeros?
 

Quoting Larry on this subject (Apocalypse 3, bottom of page 3): "I
can think of other cans of worms this opens, and I'm quite certain
I'm too stupid to think of them all. Nevertheless, my gut feeling
is that we can make things work more like people expect rather
than less. And I was always a little bit jealous that REXX could
have arrays with default values. :-)"


Dave.




RE: Default values, was RE: the handiness of undef becoming NaN (when you want that)

2001-10-22 Thread David Whipp

  Aaron Sherman wrote
 Larry's hubris notwithstanding, I'd like to suggest that "more", in this
 case, means "no, it prints nothing".
 
 This *must* be true, as you don't want:
 
   @a ^+ @b
 
 to always return an infinite list. You want it to produce a list with
 (as a3 suggested) length max(@a.length,@b.length)

OK, now we've got this resolved, I'd like to return the focus
back to the original point.

@x ^= 0;
@x[5]++;

does not have problems with NaNs; and does not generate a warning
with -w.


My suggestion was to extend this to work with hashes, too.

something like 

%x ^= $^_ = 0;
or
values %x ^= 0;


Dave.



Re: De NaN-ibus

2001-10-22 Thread Damian Conway


Hugo suggested:

:Have I missed anything?

Code, and docs, for ieee.pm. Other than that, it looks good to me. :)

Ah, but that's a SMoP, left as an exercise to the reader.

;-)

Damian