Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-22 Thread Tim Hochberg
Sebastian Haase wrote:
 On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
   
 David M. Cooke wrote:
 
 On Thu, 21 Sep 2006 11:34:42 -0700

 Tim Hochberg [EMAIL PROTECTED] wrote:
   
 Tim Hochberg wrote:
 
 Robert Kern wrote:
   
 David M. Cooke wrote:
 
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
   
 Let me offer a third path: the algorithms used for .mean() and
 .var() are substandard. There are much better incremental algorithms
 that entirely avoid the need to accumulate such large (and therefore
 precision-losing) intermediate values. The algorithms look like the
 following for 1D arrays in Python:

 def mean(a):
     m = a[0]
     for i in range(1, len(a)):
         m += (a[i] - m) / (i + 1)
     return m
 
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).
   
 With one exception, every test that I've thrown at it shows that it's
 better for float32. That exception is uniformly spaced arrays, like
 linspace().

   You do avoid
   accumulating large sums, but then doing the division a[i]/len(a)
   and adding that will do the same.

 Okay, this is true.

 
 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:

 def mean(a):
     s = e = a.dtype.type(0)
     for i in range(0, len(a)):
         temp = s
         y = a[i] + e
         s = temp + y
         e = (temp - s) + y
     return s / len(a)

   
 def var(a):
     m = a[0]
     t = a.dtype.type(0)
     for i in range(1, len(a)):
         q = a[i] - m
         r = q / (i+1)
         m += r
         t += i * q * r
     t /= len(a)
     return t

 Alternatively, from Knuth:

 def var_knuth(a):
     m = a.dtype.type(0)
     variance = a.dtype.type(0)
     for i in range(len(a)):
         delta = a[i] - m
         m += delta / (i+1)
         variance += delta * (a[i] - m)
     variance /= len(a)
     return variance
 
 I'm going to go ahead and attach a module containing the versions of
 mean, var, etc that I've been playing with in case someone wants to mess
 with them. Some were stolen from traffic on this list, for others I
 grabbed the algorithms from wikipedia or equivalent.
 
 I looked into this a bit more. I checked float32 (single precision) and
 float64 (double precision), using long doubles (float96) for the exact
 results. This is based on your code. Results are compared using
 abs(exact_stat - computed_stat) / max(abs(values)), with 1 values in
 the range of [-100, 900]

 First, the mean. In float32, the Kahan summation in single precision is
 better by about 2 orders of magnitude than simple summation. However,
 accumulating the sum in double precision is better by about 9 orders of
 magnitude than simple summation (7 orders more than Kahan).

 In float64, Kahan summation is the way to go, by 2 orders of magnitude.

 For the variance, in float32, Knuth's method is *no better* than the
 two-pass method. Tim's code does an implicit conversion of intermediate
 results to float64, which is why he saw a much better result.
   
 Doh! And I fixed that same problem in the mean implementation earlier
 too. I was astounded by how good knuth was doing, but not astounded
 enough apparently.

 Does it seem weird to anyone else that in:
 numpy_scalar op python_scalar
 the precision ends up being controlled by the python scalar? I would
 expect the numpy_scalar to control the resulting precision just like
 numpy arrays do in similar circumstances. Perhaps the egg on my face is
 just clouding my vision though.

 
 The two-pass method using
 Kahan summation (again, in single precision), is better by about 2 orders
 of magnitude. There is practically no difference when using a
 double-precision accumulator amongst the techniques: they're all about 9
 orders of magnitude better than single-precision two-pass.

 In float64, Kahan summation is again better than the rest, by about 2
 orders of magnitude.

 I've put my adaptation of Tim's code, and box-and-whisker plots of the
 results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/

 Conclusions:

 - If you're going to calculate everything in single precision, use Kahan
 summation. Using it in double-precision also helps.
 - If you can use a double-precision accumulator, it's much better than
 any of the techniques in single-precision only.

 - for speed+precision in the variance, either use Kahan summation in
 single precision with the two-pass method, or use double precision with
 simple summation with the two-pass method. Knuth buys you nothing, except
 slower code :-)
   
 The two pass methods are definitely more accurate. I won't be convinced
 on the speed front till I see comparable C implementations slug it out.
 That may well mean never in practice. However, I expect that somewhere
 around 10,000 items, the cache will overflow and memory bandwidth will
 become the bottleneck. At that point the extra operations of Knuth won't
 matter as much as making two passes through the array and Knuth will win
 on speed.

Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-22 Thread Christopher Barker


Tim Hochberg wrote:
 It would probably be nice to expose the 
 Kahan sum and maybe even the raw_kahan_sum somewhere.

What about using it for .sum() by default? What is the speed hit anyway? 
In any case, having it available would be nice.

 I'm on the fence on using the array dtype for the accumulator dtype 
 versus always using at least double precision for the accumulator. The 
 former is easier to explain and is probably faster, but the latter is a 
 lot more accuracy for basically free.

I don't think the difficulty of explanation is a big issue at all -- I'm 
having a really hard time imagining someone getting confused and/or 
disappointed that their single precision calculation didn't overflow or 
was more accurate than expected. Anyone who did would know enough to 
understand the explanation.

In general, users expect things to just work. They only dig into the 
details when something goes wrong, like the mean of a bunch of positive 
integers coming out as negative (wasn't that the OP's example?). The 
fewer such instances we have, the fewer questions we have.

 speeds shake out I suppose. If the speed of using float64 is comparable 
 to that of using float32, we might as well.

Only testing will tell, but my experience is that with double precision 
FPUs, doubles are just as fast as floats unless you're dealing with 
enough memory to make a difference in caching. In this case, only the 
accumulator is double, so that wouldn't be an issue. I suppose the float 
to double conversion could take some time though...
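
If anyone wants numbers, a rough timing sketch along these lines would answer 
the float32-vs-float64 accumulator question on a given machine (the array size 
and repeat count below are arbitrary choices, not measurements from this thread):

import numpy as np
import timeit

a = np.random.random(1000000).astype(np.float32)

t32 = timeit.timeit(lambda: a.sum(), number=100)                   # accumulate in float32
t64 = timeit.timeit(lambda: a.sum(dtype=np.float64), number=100)   # accumulate in float64
print("float32 accumulator: %.4f s   float64 accumulator: %.4f s" % (t32, t64))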

 One thing I'm not on the 
 fence about is the return type: it should always match the input type, 
 or dtype if that is specified.

+1

 Since numpy-scalars are 
 generally the results of indexing operations, not literals, I think that 
 they should behave like arrays for purposes of determining the resulting 
 precision, not like python-scalars.

+1

 Of course the accuracy is pretty bad at single precision, so 
 the possible, theoretical speed advantage at large sizes probably 
 doesn't matter.

Good point: the larger the array, the more accuracy matters.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/ORR/HAZMAT (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-22 Thread Charles R Harris
On 9/22/06, Tim Hochberg [EMAIL PROTECTED] wrote:
 [... deeply nested quoting snipped; the full text appears in Tim Hochberg's
 and David M. Cooke's messages elsewhere in this thread ...]

 The two pass methods are definitely more accurate. I won't be convinced
 on the speed front till I see comparable C implementations slug it out.
 That may well mean never in practice. However, I expect that somewhere
 around 10,000 items, the cache will overflow and memory bandwidth will
 become the bottleneck. At that point the extra operations of Knuth won't
 matter as much as making two passes through the array and Knuth will win
 on speed. Of course the accuracy is pretty bad at single precision, so
 the possible, theoretical speed advantage at large sizes probably
 doesn't matter. -tim

 After 1.0 is out, we should look at doing one of 

Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-21 Thread David M. Cooke
On Thu, 21 Sep 2006 11:34:42 -0700
Tim Hochberg [EMAIL PROTECTED] wrote:

 Tim Hochberg wrote:
  Robert Kern wrote:

  David M. Cooke wrote:

  
  On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
  

  Let me offer a third path: the algorithms used for .mean() and .var()
  are substandard. There are much better incremental algorithms that
  entirely avoid the need to accumulate such large (and therefore
  precision-losing) intermediate values. The algorithms look like the
  following for 1D arrays in Python:
 
  def mean(a):
      m = a[0]
      for i in range(1, len(a)):
          m += (a[i] - m) / (i + 1)
      return m

  
  This isn't really going to be any better than using a simple sum.
  It'll also be slower (a division per iteration).
  

  With one exception, every test that I've thrown at it shows that it's
  better for float32. That exception is uniformly spaced arrays, like
  linspace().
 
You do avoid
accumulating large sums, but then doing the division a[i]/len(a) and
adding that will do the same.
 
  Okay, this is true.
 

  
  Now, if you want to avoid losing precision, you want to use a better
  summation technique, like compensated (or Kahan) summation:
 
  def mean(a):
      s = e = a.dtype.type(0)
      for i in range(0, len(a)):
          temp = s
          y = a[i] + e
          s = temp + y
          e = (temp - s) + y
      return s / len(a)
  
  def var(a):
      m = a[0]
      t = a.dtype.type(0)
      for i in range(1, len(a)):
          q = a[i] - m
          r = q / (i+1)
          m += r
          t += i * q * r
      t /= len(a)
      return t
 
  Alternatively, from Knuth:
 
  def var_knuth(a):
      m = a.dtype.type(0)
      variance = a.dtype.type(0)
      for i in range(len(a)):
          delta = a[i] - m
          m += delta / (i+1)
          variance += delta * (a[i] - m)
      variance /= len(a)
      return variance
 
 I'm going to go ahead and attach a module containing the versions of 
 mean, var, etc that I've been playing with in case someone wants to mess 
 with them. Some were stolen from traffic on this list, for others I 
 grabbed the algorithms from wikipedia or equivalent.

I looked into this a bit more. I checked float32 (single precision) and
float64 (double precision), using long doubles (float96) for the exact
results. This is based on your code. Results are compared using
abs(exact_stat - computed_stat) / max(abs(values)), with 1 values in the
range of [-100, 900]
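
For anyone who wants to reproduce this kind of comparison, here is a minimal 
sketch of the setup (a hypothetical harness, not the code at the URL below; 
np.longdouble stands in for the float96 reference, and the sample size is an 
arbitrary choice):

import numpy as np

def rel_error(exact, computed, values):
    # the metric described above: abs(exact_stat - computed_stat) / max(abs(values))
    return abs(exact - computed) / np.abs(values).max()

values = np.random.uniform(-100, 900, 10000).astype(np.float32)
exact = values.astype(np.longdouble).mean()   # long-double reference

print(rel_error(exact, values.mean(), values))                  # accumulate in float32
print(rel_error(exact, values.mean(dtype=np.float64), values))  # accumulate in float64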

First, the mean. In float32, the Kahan summation in single precision is
better by about 2 orders of magnitude than simple summation. However,
accumulating the sum in double precision is better by about 9 orders of
magnitude than simple summation (7 orders more than Kahan).

In float64, Kahan summation is the way to go, by 2 orders of magnitude.

For the variance, in float32, Knuth's method is *no better* than the two-pass
method. Tim's code does an implicit conversion of intermediate results to
float64, which is why he saw a much better result. The two-pass method using
Kahan summation (again, in single precision), is better by about 2 orders of
magnitude. There is practically no difference when using a double-precision
accumulator amongst the techniques: they're all about 9 orders of magnitude
better than single-precision two-pass.

In float64, Kahan summation is again better than the rest, by about 2 orders
of magnitude.

I've put my adaptation of Tim's code, and box-and-whisker plots of the
results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/

Conclusions:

- If you're going to calculate everything in single precision, use Kahan
summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of
the techniques in single-precision only.

- for speed+precision in the variance, either use Kahan summation in single
precision with the two-pass method, or use double precision with simple
summation with the two-pass method. Knuth buys you nothing, except slower
code :-)

After 1.0 is out, we should look at doing one of the above.
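
For concreteness, a two-pass variance using Kahan summation of the squared 
residuals might look roughly like the sketch below. It follows the same 
compensated-summation formulation as the mean() above; it is not necessarily 
the exact code posted at the URL.

def kahan_sum(a):
    # compensated (Kahan) summation, same formulation as the mean() above
    s = e = a.dtype.type(0)
    for x in a:
        temp = s
        y = x + e
        s = temp + y
        e = (temp - s) + y   # low-order bits lost in this step, fed back next step
    return s

def var_two_pass_kahan(a):
    n = a.dtype.type(len(a))
    m = kahan_sum(a) / n                 # first pass: compensated mean
    return kahan_sum((a - m) ** 2) / n   # second pass: compensated sum of squared residuals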

-- 
||\/|
/--\
|David M. Cooke  http://arbutus.physics.mcmaster.ca/dmc/
|[EMAIL PROTECTED]



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-21 Thread Travis Oliphant
David M. Cooke wrote:


Conclusions:

- If you're going to calculate everything in single precision, use Kahan
summation. Using it in double-precision also helps.
- If you can use a double-precision accumulator, it's much better than any of
the techniques in single-precision only.

- for speed+precision in the variance, either use Kahan summation in single
precision with the two-pass method, or use double precision with simple
summation with the two-pass method. Knuth buys you nothing, except slower
code :-)

After 1.0 is out, we should look at doing one of the above.
  


+1





Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-21 Thread Tim Hochberg
David M. Cooke wrote:
 On Thu, 21 Sep 2006 11:34:42 -0700
 Tim Hochberg [EMAIL PROTECTED] wrote:

   
 Tim Hochberg wrote:
 
 Robert Kern wrote:
   
   
 David M. Cooke wrote:
   
 
 
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 
   
   
 Let me offer a third path: the algorithms used for .mean() and .var()
 are substandard. There are much better incremental algorithms that
 entirely avoid the need to accumulate such large (and therefore
 precision-losing) intermediate values. The algorithms look like the
 following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m
   
 
 
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).
 
   
   
 With one exception, every test that I've thrown at it shows that it's
 better for float32. That exception is uniformly spaced arrays, like
 linspace().

   You do avoid
   accumulating large sums, but then doing the division a[i]/len(a) and
   adding that will do the same.

 Okay, this is true.

   
 
 
 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:

 def mean(a):
 s = e = a.dtype.type(0)
 for i in range(0, len(a)):
 temp = s
 y = a[i] + e
 s = temp + y
 e = (temp - s) + y
 return s / len(a)
   
 
 
 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance
 
 I'm going to go ahead and attach a module containing the versions of 
 mean, var, etc that I've been playing with in case someone wants to mess 
 with them. Some were stolen from traffic on this list, for others I 
 grabbed the algorithms from wikipedia or equivalent.
 

 I looked into this a bit more. I checked float32 (single precision) and
 float64 (double precision), using long doubles (float96) for the exact
 results. This is based on your code. Results are compared using
 abs(exact_stat - computed_stat) / max(abs(values)), with 1 values in the
 range of [-100, 900]

 First, the mean. In float32, the Kahan summation in single precision is
 better by about 2 orders of magnitude than simple summation. However,
 accumulating the sum in double precision is better by about 9 orders of
 magnitude than simple summation (7 orders more than Kahan).

 In float64, Kahan summation is the way to go, by 2 orders of magnitude.

 For the variance, in float32, Knuth's method is *no better* than the two-pass
 method. Tim's code does an implicit conversion of intermediate results to
 float64, which is why he saw a much better result. The two-pass method using
 Kahan summation (again, in single precision), is better by about 2 orders of
 magnitude. There is practically no difference when using a double-precision
 accumulator amongst the techniques: they're all about 9 orders of magnitude
 better than single-precision two-pass.

 In float64, Kahan summation is again better than the rest, by about 2 orders
 of magnitude.

 I've put my adaptation of Tim's code, and box-and-whisker plots of the
 results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/

 Conclusions:

 - If you're going to calculate everything in single precision, use Kahan
 summation. Using it in double-precision also helps.
 - If you can use a double-precision accumulator, it's much better than any of
 the techniques in single-precision only.

 - for speed+precision in the variance, either use Kahan summation in single
 precision with the two-pass method, or use double precision with simple
 summation with the two-pass method. Knuth buys you nothing, except slower
 code :-)

 After 1.0 is out, we should look at doing one of the above.
   
One more little tidbit; it appears possible to fix up Knuth's 
algorithm so that it's comparable in accuracy to the two pass Kahan 
version by doing Kahan summation while accumulating the variance. 
Testing on this was far from thorough, but in the tests I did it nearly 
always produced identical results to kahan. Of course this is even 
slower than the original Knuth version, but it's interesting anyway:


# This is probably messier than it needs to be
# but I'm out of time for today...
def var_knuth2(values, dtype=default_prec):
    """var(values) - variance of values computed using
    Knuth's one pass algorithm"""
    m = variance = mc = vc = dtype(0)
    for i, x in enumerate(values):
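
The attached implementation is cut off at this point. Purely as a sketch of 
the idea described above (Kahan-compensated accumulators inside Knuth's 
one-pass update, not Tim's actual code), the loop body might look something 
like this, with mc and vc as the compensation terms for the mean and variance 
accumulators:

def var_knuth2_sketch(values, dtype):
    # dtype is the accumulator type, e.g. numpy.float32
    m = variance = mc = vc = dtype(0)
    for i, x in enumerate(values):
        delta = dtype(x) - m
        # compensated update of the running mean:  m += delta / (i + 1)
        y = delta / dtype(i + 1) + mc
        t = m + y
        mc = (m - t) + y
        m = t
        # compensated update of the variance accumulator:
        #   variance += delta * (x - m)
        y = delta * (dtype(x) - m) + vc
        t = variance + y
        vc = (variance - t) + y
        variance = t
    return variance / dtype(len(values))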
 

Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-21 Thread Sebastian Haase
On Thursday 21 September 2006 15:28, Tim Hochberg wrote:
 David M. Cooke wrote:
  On Thu, 21 Sep 2006 11:34:42 -0700
 
  Tim Hochberg [EMAIL PROTECTED] wrote:
  Tim Hochberg wrote:
  Robert Kern wrote:
  David M. Cooke wrote:
  On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
  Let me offer a third path: the algorithms used for .mean() and
  .var() are substandard. There are much better incremental algorithms
  that entirely avoid the need to accumulate such large (and therefore
  precision-losing) intermediate values. The algorithms look like the
  following for 1D arrays in Python:
 
  def mean(a):
   m = a[0]
   for i in range(1, len(a)):
   m += (a[i] - m) / (i + 1)
   return m
 
  This isn't really going to be any better than using a simple sum.
  It'll also be slower (a division per iteration).
 
  With one exception, every test that I've thrown at it shows that it's
  better for float32. That exception is uniformly spaced arrays, like
  linspace().
 
You do avoid
accumulating large sums, but then doing the division a[i]/len(a)
and adding that will do the same.
 
  Okay, this is true.
 
  Now, if you want to avoid losing precision, you want to use a better
  summation technique, like compensated (or Kahan) summation:
 
  def mean(a):
  s = e = a.dtype.type(0)
  for i in range(0, len(a)):
  temp = s
  y = a[i] + e
  s = temp + y
  e = (temp - s) + y
  return s / len(a)
 
  def var(a):
   m = a[0]
   t = a.dtype.type(0)
   for i in range(1, len(a)):
   q = a[i] - m
   r = q / (i+1)
   m += r
   t += i * q * r
   t /= len(a)
   return t
 
  Alternatively, from Knuth:
 
  def var_knuth(a):
   m = a.dtype.type(0)
   variance = a.dtype.type(0)
   for i in range(len(a)):
   delta = a[i] - m
   m += delta / (i+1)
   variance += delta * (a[i] - m)
   variance /= len(a)
   return variance
 
  I'm going to go ahead and attach a module containing the versions of
  mean, var, etc that I've been playing with in case someone wants to mess
  with them. Some were stolen from traffic on this list, for others I
  grabbed the algorithms from wikipedia or equivalent.
 
  I looked into this a bit more. I checked float32 (single precision) and
  float64 (double precision), using long doubles (float96) for the exact
  results. This is based on your code. Results are compared using
  abs(exact_stat - computed_stat) / max(abs(values)), with 1 values in
  the range of [-100, 900]
 
  First, the mean. In float32, the Kahan summation in single precision is
  better by about 2 orders of magnitude than simple summation. However,
  accumulating the sum in double precision is better by about 9 orders of
  magnitude than simple summation (7 orders more than Kahan).
 
  In float64, Kahan summation is the way to go, by 2 orders of magnitude.
 
  For the variance, in float32, Knuth's method is *no better* than the
  two-pass method. Tim's code does an implicit conversion of intermediate
  results to float64, which is why he saw a much better result.

 Doh! And I fixed that same problem in the mean implementation earlier
 too. I was astounded by how good knuth was doing, but not astounded
 enough apparently.

 Does it seem weird to anyone else that in:
 numpy_scalar op python_scalar
 the precision ends up being controlled by the python scalar? I would
 expect the numpy_scalar to control the resulting precision just like
 numpy arrays do in similar circumstances. Perhaps the egg on my face is
 just clouding my vision though.

  The two-pass method using
  Kahan summation (again, in single precision), is better by about 2 orders
  of magnitude. There is practically no difference when using a
  double-precision accumulator amongst the techniques: they're all about 9
  orders of magnitude better than single-precision two-pass.
 
  In float64, Kahan summation is again better than the rest, by about 2
  orders of magnitude.
 
  I've put my adaptation of Tim's code, and box-and-whisker plots of the
  results, at http://arbutus.mcmaster.ca/dmc/numpy/variance/
 
  Conclusions:
 
  - If you're going to calculate everything in single precision, use Kahan
  summation. Using it in double-precision also helps.
  - If you can use a double-precision accumulator, it's much better than
  any of the techniques in single-precision only.
 
  - for speed+precision in the variance, either use Kahan summation in
  single precision with the two-pass method, or use double precision with
  simple summation with the two-pass method. Knuth buys you nothing, except
  slower code :-)

 The two pass methods are definitely more accurate. I won't be convinced
 on the speed front till I see comparable C implementations slug it out.
 That may well mean never in practice. However, I expect that somewhere
 around 10,000 items, the cache will overflow and memory bandwidth will
 become the bottleneck.

Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Robert Kern
Sebastian Haase wrote:
 Robert Kern wrote:
 Sebastian Haase wrote:
 I know that having too much knowledge of the details often makes one 
 forget what the newcomers will do and expect.
 Please be more careful with such accusations. Repeated frequently, they can 
 become quite insulting.

 I did not mean to insult anyone - what I meant was that I'm for numpy 
 becoming an easy platform to use. I have spent and enjoyed part of the last 
 four years developing and evangelizing Python as an alternative to a 
 Matlab and C/Fortran based image analysis environment. I often find 
 myself arguing for good support of the single precision data format. So 
 I find it actually somewhat ironic to see myself arguing now for wanting 
 float64 over float32 ;-)

No one is doubting that you want numpy to be easy to use. Please don't assume 
that the rest of us want otherwise. However, the fact that you *want* numpy to 
be easy to use does not mean that your suggestions *will* make numpy easy to 
use.

We haven't forgotten what newcomers will do; to the contrary, we are quite 
aware 
that new users need consistent behavior in order to learn how to use a system. 
Adding another special case in how dtypes implicitly convert to one another 
will 
impede new users being able to understand the whole system. See A. M. 
Archibald's question in the thread ufunc.reduce and conversion for an 
example. 
In our judgement this is a worse outcome than notational convenience for 
float32 
users, who already need to be aware of the effects of their precision choice. 
Each of us can come to different conclusions in good faith without one of us 
forgetting the new user experience.

Let me offer a third path: the algorithms used for .mean() and .var() are 
substandard. There are much better incremental algorithms that entirely avoid 
the need to accumulate such large (and therefore precision-losing) intermediate 
values. The algorithms look like the following for 1D arrays in Python:

def mean(a):
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / (i + 1)
    return m

def var(a):
    m = a[0]
    t = a.dtype.type(0)
    for i in range(1, len(a)):
        q = a[i] - m
        r = q / (i+1)
        m += r
        t += i * q * r
    t /= len(a)
    return t

Alternatively, from Knuth:

def var_knuth(a):
    m = a.dtype.type(0)
    variance = a.dtype.type(0)
    for i in range(len(a)):
        delta = a[i] - m
        m += delta / (i+1)
        variance += delta * (a[i] - m)
    variance /= len(a)
    return variance

If you will code up implementations of these for ndarray.mean() and 
ndarray.var(), I will check them in and then float32 arrays will avoid most of 
the catastrophes that the current implementations run into.

 We are only talking 
 about people that will a) work with single-precision data (e.g. large 
 scale-image analysis) and who b) will tend to just use the default 
 (dtype)  --- How else can I say this: these people will just assume that 
 arr.mean() *is* the mean of arr.
 I don't understand what you mean, here. arr.mean() is almost never *the* 
 mean of 
 arr. Double precision can't fix that.

 This was not supposed to be a scientific statement -- I'm (again) 
 thinking of our students that not always appreciate the full complexity 
 of computational numerics and data types and such.

They need to appreciate the complexity of computational numerics if they are 
going to do numerical computation. Double precision does not make it any 
simpler.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Sebastian Haase
Robert Kern wrote:
  This was not supposed to be a scientific statement -- I'm (again)
  thinking of our students that not always appreciate the full
  complexity
  of computational numerics and data types and such.
 
  They need to appreciate the complexity of computational numerics if
  they are going to do numerical computation. Double precision does not
  make it any simpler.

This is where we differ.

  We haven't forgotten what newcomers will do; to the contrary, we are
  quite aware
  that new users need consistent behavior in order to learn how to use a
  system.
  Adding another special case in how dtypes implicitly convert to one
  another will
  impede new users being able to understand the whole system.

All I'm proposing could be summarized in:
mean(), sum(), var() ... produce output of dtype float64 (except for 
input float96 which produces float96)

A comment on this is also that for these operations the input 
type/precision is almost unrelated to the resulting output precision 
-- the integer case already makes that clear.  (This is different for, e.g., 
min() or max().)
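
For what it's worth, the accumulator precision can already be requested per 
call via the dtype argument; the proposal is about what the default should 
be. A minimal illustration (array contents are arbitrary):

import numpy as np

a = np.random.random(100000).astype(np.float32)

a.mean()                   # accumulates in the array dtype (float32) -- the current default
a.mean(dtype=np.float64)   # accumulates in double precision, as proposed for the default
a.var(dtype=np.float64)    # same option exists for var() and sum()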

The proposed alternative implementations seem to require one or more 
multiplications (or divisions) per value -- this might be noticeably 
slower ...

Regards,
Sebastian




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Tim Hochberg
Robert Kern wrote:
 Sebastian Haase wrote:
   
 Robert Kern wrote:
 
 Sebastian Haase wrote:
   
 I know that having too much knowledge of the details often makes one 
 forget what the newcomers will do and expect.
 
 Please be more careful with such accusations. Repeated frequently, they can 
 become quite insulting.

   
 I did not mean to insult anyone - what I meant was, that I'm for numpy 
 becoming an easy platform to use. I have spend and enjoyed part the last 
 four years developing and evangelizing Python as an alternative to 
 Matlab and C/Fortran based image analysis environment. I often find 
 myself arguing for good support of the single precision data format. So 
 I find it actually somewhat ironic to see myself arguing now for wanting 
 float64 over float32 ;-)
 

 No one is doubting that you want numpy to be easy to use. Please don't doubt 
 that the rest of us want otherwise. However, the fact that you *want* numpy 
 to 
 be easy to use does not mean that your suggestions *will* make numpy easy to 
 use.

 We haven't forgotten what newcomers will do; to the contrary, we are quite 
 aware 
 that new users need consistent behavior in order to learn how to use a 
 system. 
 Adding another special case in how dtypes implicitly convert to one another 
 will 
 impede new users being able to understand the whole system. See A. M. 
 Archibald's question in the thread ufunc.reduce and conversion for an 
 example. 
 In our judgement this is a worse outcome than notational convenience for 
 float32 
 users, who already need to be aware of the effects of their precision choice. 
 Each of us can come to different conclusions in good faith without one of us 
 forgetting the new user experience.

 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m

 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance

 If you will code up implementations of these for ndarray.mean() and 
 ndarray.var(), I will check them in and then float32 arrays will avoid most 
 of 
 the catastrophes that the current implementations run into.
   
+1


   
 We are only talking 
 about people that will a) work with single-precision data (e.g. large 
 scale-image analysis) and who b) will tend to just use the default 
 (dtype)  --- How else can I say this: these people will just assume that 
 arr.mean() *is* the mean of arr.
 
 I don't understand what you mean, here. arr.mean() is almost never *the* 
 mean of 
 arr. Double precision can't fix that.

   
 This was not supposed to be a scientific statement -- I'm (again) 
 thinking of our students that not always appreciate the full complexity 
 of computational numerics and data types and such.
 

 They need to appreciate the complexity of computational numerics if they are 
 going to do numerical computation. Double precision does not make it any 
 simpler.

   





Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread David M. Cooke
On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:
 
 def mean(a):
     m = a[0]
     for i in range(1, len(a)):
         m += (a[i] - m) / (i + 1)
     return m

This isn't really going to be any better than using a simple sum.
It'll also be slower (a division per iteration). You do avoid
accumulating large sums, but then doing the division a[i]/len(a) and
adding that will do the same.

Now, if you want to avoid losing precision, you want to use a better
summation technique, like compensated (or Kahan) summation:

def mean(a):
    s = e = a.dtype.type(0)
    for i in range(0, len(a)):
        temp = s
        y = a[i] + e
        s = temp + y
        e = (temp - s) + y
    return s / len(a)

Some numerical experiments in Maple using 5-digit precision show that
your mean is maybe a bit better in some cases, but can also be much
worse, than sum(a)/len(a), but both are quite poor in comparison to the
Kahan summation.

(We could probably use a fast implementation of Kahan summation in
addition to a.sum())

 def var(a):
     m = a[0]
     t = a.dtype.type(0)
     for i in range(1, len(a)):
         q = a[i] - m
         r = q / (i+1)
         m += r
         t += i * q * r
     t /= len(a)
     return t
 
 Alternatively, from Knuth:
 
 def var_knuth(a):
     m = a.dtype.type(0)
     variance = a.dtype.type(0)
     for i in range(len(a)):
         delta = a[i] - m
         m += delta / (i+1)
         variance += delta * (a[i] - m)
     variance /= len(a)
     return variance

These formulas are good when you can only do one pass over the data
(like in a calculator where you don't store all the data points), but
are slightly worse than doing two passes. Kahan summation would probably
also be good here too.

-- 
||\/|
/--\
|David M. Cooke  http://arbutus.physics.mcmaster.ca/dmc/
|[EMAIL PROTECTED]



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Christopher Barker
Sebastian Haase wrote:
 The best I can hope for is a sound default for most (practical) cases...
 I still think that 80bit vs. 128bit vs 96bit is rather academic for most 
 people ...  most people seem to only use float64 and then there are some 
 that use float32 (like us) ...

I fully agree with Sebastian here. As Travis pointed out, all we are 
talking about is the default. The default should follow the principle 
of least surprise for the less-knowledgeable users. Personally, I try to 
always use doubles, unless I have a real reason not to. The recent 
change of default types for zeros et al. will help.

Clearly, there is a problem with saying that the default accumulator for *all* 
types is double, as you wouldn't want to downcast float128s, even if 
they are obscure. However, is it that hard to say that the default 
accumulator will have *at least* the precision of double?

Robert Kern wrote:
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values.

This, of course, is an even better solution, unless there are 
substantial performance impacts.

-Chris







-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/ORR/HAZMAT (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Charles R Harris
On 9/20/06, David M. Cooke [EMAIL PROTECTED] wrote:
 [... quoted text snipped; see David M. Cooke's message elsewhere in this
 thread for the full text, including the compensated (Kahan) mean() ...]

A variant of this can also be used to generate the sin/cos for fourier
transforms using repeated complex multiplication. It works amazingly well.
But I suspect accumulating in higher precision would be just as good and
faster if the hardware supports it.

Divide and conquer, like an fft where only the constant coefficient is
computed, is also a good bet but requires lg(n) passes over the data and
extra storage.

 [... further quoted text snipped, ending with: These formulas are good when
 you can only do one pass over the data (like in a calculator where you don't
 store all the data points), but are slightly worse than doing two passes.
 Kahan summation would probably also be good here too. ...]

I think we should leave things as they are. Choosing the right precision for
accumulating sums is absolutely fundamental; I always think about it when
programming such loops because it is always a potential gotcha. Making the
defaults higher precision just kicks the can down the road where the unwary
are still likely to trip over it at some point.

Perhaps these special tricks for special cases could be included in scipy
somewhere.

Chuck
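
For reference, here is a minimal sketch of the divide-and-conquer (pairwise) 
summation idea Chuck alludes to. This recursive form avoids the lg(n) separate 
passes he describes but combines partial sums of similar magnitude in the same 
spirit; the block size is an arbitrary choice:

def pairwise_sum(a, block=128):
    # recursively split the array and sum the halves; small blocks are summed directly
    n = len(a)
    if n == 0:
        return 0.0
    if n <= block:
        s = a[0]
        for x in a[1:]:
            s = s + x
        return s
    mid = n // 2
    return pairwise_sum(a[:mid], block) + pairwise_sum(a[mid:], block)

The rounding error of this scheme grows roughly like log(n) rather than like n 
for a straight left-to-right sum.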


Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Robert Kern
David M. Cooke wrote:
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely 
 avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m
 
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).

With one exception, every test that I've thrown at it shows that it's better 
for 
float32. That exception is uniformly spaced arrays, like linspace().

  You do avoid
  accumulating large sums, but then doing the division a[i]/len(a) and
  adding that will do the same.

Okay, this is true.

 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:
 
 def mean(a):
 s = e = a.dtype.type(0)
 for i in range(0, len(a)):
 temp = s
 y = a[i] + e
 s = temp + y
 e = (temp - s) + y
 return s / len(a)
 
 Some numerical experiments in Maple using 5-digit precision show that
 your mean is maybe a bit better in some cases, but can also be much
 worse, than sum(a)/len(a), but both are quite poor in comparision to the
 Kahan summation.
 
 (We could probably use a fast implementation of Kahan summation in
 addition to a.sum())

+1

 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance
 
 These formulas are good when you can only do one pass over the data
 (like in a calculator where you don't store all the data points), but
 are slightly worse than doing two passes. Kahan summation would probably
 also be good here too.

Again, my tests show otherwise for float32. I'll condense my ipython log into a 
module for everyone's perusal. It's possible that the Kahan summation of the 
squared residuals will work better than the current two-pass algorithm and the 
implementations I give above.
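
A rough way to reproduce this kind of float32 comparison (a sketch only; the 
size and range are illustrative, and simple_sum emulates a plain left-to-right 
float32 accumulation rather than whatever a.sum() does internally):

import numpy as np

def incremental_mean(a):
    # the incremental mean() quoted above
    m = a[0]
    for i in range(1, len(a)):
        m += (a[i] - m) / (i + 1)
    return m

def simple_sum(a):
    # plain left-to-right accumulation in the array's own dtype
    s = a.dtype.type(0)
    for x in a:
        s = s + x
    return s

data = np.random.uniform(-100, 900, 10000).astype(np.float32)
exact = data.astype(np.float64).mean()

print(abs(exact - simple_sum(data) / np.float32(len(data))))   # naive float32 sum
print(abs(exact - incremental_mean(data)))                     # incremental update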

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Tim Hochberg
Robert Kern wrote:
 David M. Cooke wrote:
   
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely 
 avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m
   
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).
 

 With one exception, every test that I've thrown at it shows that it's better 
 for 
 float32. That exception is uniformly spaced arrays, like linspace().
   
Here's a version of mean based on the Kahan sum, including the 
compensation term, that seems to be consistently much better than the builtin 
mean. It seems to be typically 4 orders of magnitude closer to the 
correct answer than the builtin mean and 2 orders of magnitude better 
than just naively using the Kahan sum. I'm using the mean performed with 
dtype=float64 as the correct value.


def rawKahanSum(values):  # Stolen from mathworld
    if not len(values):
        return 0.0, 0.0
    total = values[0]
    c = type(total)(0.0)     # A running compensation for lost low-order bits.
    for x in values[1:]:
        y = x - c            # So far, so good: c is zero.
        t = total + y        # Alas, total is big, y small, so low-order digits of y are lost.
        c = (t - total) - y  # (t - total) recovers the high-order part of y;
                             # subtracting y recovers -(low part of y)
        total = t            # Algebraically, c should always be zero.
                             # Beware eagerly optimising compilers!
                             # Next time around, the lost low part will be added
                             # to y in a fresh attempt.
    return total, c
   
def kahanSum(values):
    total, c = rawKahanSum(values)
    return total

def mean(values):
    total, c = rawKahanSum(values)
    n = float(len(values))
    return total / n - c / n
   


for i in range(100):
    data = random.random([1]).astype(float32)
    baseline = data.mean(dtype=float64)
    delta_builtin_mean = baseline - data.mean()
    delta_compensated_mean = baseline - mean(data)
    delta_kahan_mean = baseline - (kahanSum(data) / len(data))
    if not abs(delta_builtin_mean) >= abs(delta_kahan_mean) >= abs(delta_compensated_mean):
        print '*',   # flag cases that violate the expected ordering
    print delta_builtin_mean, delta_kahan_mean, delta_compensated_mean

-tim
   You do avoid
   accumulating large sums, but then doing the division a[i]/len(a) and
   adding that will do the same.

 Okay, this is true.

   
 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:

 def mean(a):
 s = e = a.dtype.type(0)
 for i in range(0, len(a)):
 temp = s
 y = a[i] + e
 s = temp + y
 e = (temp - s) + y
 return s / len(a)

 Some numerical experiments in Maple using 5-digit precision show that
 your mean is maybe a bit better in some cases, but can also be much
 worse, than sum(a)/len(a), but both are quite poor in comparision to the
 Kahan summation.

 (We could probably use a fast implementation of Kahan summation in
 addition to a.sum())
 

 +1

   
 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance
   
 These formulas are good when you can only do one pass over the data
 (like in a calculator where you don't store all the data points), but
 are slightly worse than doing two passes. Kahan summation would probably
 also be good here too.
 

 Again, my tests show otherwise for float32. I'll condense my ipython log into 
 a 
 module for everyone's perusal. It's possible that the Kahan summation of the 
 squared residuals will work better than the current two-pass algorithm and 
 the 
 implementations I give above.

   





Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Tim Hochberg
[Sorry, this version should have less munged formatting since I clipped 
the comments. Oh, and the Kahan sum algorithm was grabbed from 
wikipedia, not mathworld]

Tim Hochberg wrote:
 Robert Kern wrote:
   
 David M. Cooke wrote:
   
 
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 
   
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely 
 avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m
   
 
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).
 
   
 With one exception, every test that I've thrown at it shows that it's better 
 for 
 float32. That exception is uniformly spaced arrays, like linspace().
   
 
 Here's a version of mean based on the Kahan sum, including the 
 compensation term, that seems to be consistently much better than the builtin 
 mean. It seems to be typically 4 orders of magnitude closer to the 
 correct answer than the builtin mean and 2 orders of magnitude better 
 than just naively using the Kahan sum. I'm using the mean performed with 
 dtype=float64 as the correct value.


   



def rawKahanSum(values):
    if not len(values):
        return 0.0, 0.0
    total = values[0]
    c = type(total)(0.0)
    for x in values[1:]:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total, c
   
def kahanSum(values):
    total, c = rawKahanSum(values)
    return total

def mean(values):
    total, c = rawKahanSum(values)
    n = float(len(values))
    return total / n - c / n
   


for i in range(100):
    data = random.random([1]).astype(float32)
    baseline = data.mean(dtype=float64)
    delta_builtin_mean = baseline - data.mean()
    delta_compensated_mean = baseline - mean(data)
    delta_kahan_mean = baseline - (kahanSum(data) / len(data))
    if not abs(delta_builtin_mean) >= abs(delta_kahan_mean) >= abs(delta_compensated_mean):
        print '*',   # flag cases that violate the expected ordering
    print delta_builtin_mean, delta_kahan_mean, delta_compensated_mean


   


 -tim
   
   You do avoid
   accumulating large sums, but then doing the division a[i]/len(a) and
   adding that will do the same.

 Okay, this is true.

   
 
 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:

 def mean(a):
 s = e = a.dtype.type(0)
 for i in range(0, len(a)):
 temp = s
 y = a[i] + e
 s = temp + y
 e = (temp - s) + y
 return s / len(a)

 Some numerical experiments in Maple using 5-digit precision show that
 your mean is maybe a bit better in some cases, but can also be much
 worse, than sum(a)/len(a), but both are quite poor in comparision to the
 Kahan summation.

 (We could probably use a fast implementation of Kahan summation in
 addition to a.sum())
 
   
 +1

   
 
 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance
   
 
 These formulas are good when you can only do one pass over the data
 (like in a calculator where you don't store all the data points), but
 are slightly worse than doing two passes. Kahan summation would probably
 also be good here too.
 
   
 Again, my tests show otherwise for float32. I'll condense my ipython log 
 into a 
 module for everyone's perusal. It's possible that the Kahan summation of the 
 squared residuals will work better than the current two-pass algorithm and 
 the 
 implementations I give above.

   
 







Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-20 Thread Tim Hochberg
Robert Kern wrote:
 David M. Cooke wrote:
   
 On Wed, Sep 20, 2006 at 03:01:18AM -0500, Robert Kern wrote:
 
 Let me offer a third path: the algorithms used for .mean() and .var() are 
 substandard. There are much better incremental algorithms that entirely 
 avoid 
 the need to accumulate such large (and therefore precision-losing) 
 intermediate 
 values. The algorithms look like the following for 1D arrays in Python:

 def mean(a):
  m = a[0]
  for i in range(1, len(a)):
  m += (a[i] - m) / (i + 1)
  return m
   
 This isn't really going to be any better than using a simple sum.
 It'll also be slower (a division per iteration).
 

 With one exception, every test that I've thrown at it shows that it's better 
 for 
 float32. That exception is uniformly spaced arrays, like linspace().

   You do avoid
   accumulating large sums, but then doing the division a[i]/len(a) and
   adding that will do the same.

 Okay, this is true.

   
 Now, if you want to avoid losing precision, you want to use a better
 summation technique, like compensated (or Kahan) summation:

 def mean(a):
 s = e = a.dtype.type(0)
 for i in range(0, len(a)):
 temp = s
 y = a[i] + e
 s = temp + y
 e = (temp - s) + y
 return s / len(a)

 Some numerical experiments in Maple using 5-digit precision show that
 your mean is maybe a bit better in some cases, but can also be much
 worse, than sum(a)/len(a), but both are quite poor in comparison to the
 Kahan summation.

 (We could probably use a fast implementation of Kahan summation in
 addition to a.sum())
 

 +1

   
 def var(a):
  m = a[0]
  t = a.dtype.type(0)
  for i in range(1, len(a)):
  q = a[i] - m
  r = q / (i+1)
  m += r
  t += i * q * r
  t /= len(a)
  return t

 Alternatively, from Knuth:

 def var_knuth(a):
  m = a.dtype.type(0)
  variance = a.dtype.type(0)
  for i in range(len(a)):
  delta = a[i] - m
  m += delta / (i+1)
  variance += delta * (a[i] - m)
  variance /= len(a)
  return variance
   
 These formulas are good when you can only do one pass over the data
 (like in a calculator where you don't store all the data points), but
 are slightly worse than doing two passes. Kahan summation would probably
 also be good here too.
 
On the flip side, it doesn't take a very large array (somewhere in the 
vicinity of 10,000 values in my experience) before memory bandwidth 
becomes a limiting factor. In that region a two pass algorithm could 
well be twice as slow as a single pass algorithm even if the former is 
somewhat simpler in terms of numeric operations.

-tim
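
For reference, here is a minimal sketch (an illustration, not code from the 
thread) of the kind of two-pass variance the comparison has in mind; the point 
is only that the data is traversed twice, once for the mean and once for the 
residuals, which is why memory bandwidth starts to matter:

from numpy import float32, random

def var_two_pass(a):
    m = a.mean()                    # pass 1: read the whole array to get the mean
    return ((a - m) ** 2).mean()    # pass 2: read it again for the squared residuals

data = random.random(100000).astype(float32)
print(var_two_pass(data))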



 Again, my tests show otherwise for float32. I'll condense my ipython log into 
 a 
 module for everyone's perusal. It's possible that the Kahan summation of the 
 squared residuals will work better than the current two-pass algorithm and 
 the 
 implementations I give above.
   










Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Travis Oliphant
Sebastian Haase wrote:

Hello all,
I just had someone from my lab coming to my desk saying:
My god - SciPy is really stupid 
An array with only positive numbers claims to have a negative mean !! ?

  

I was asking about this before ... the reason was of course that her array was 
of dtype int32 and had many large values, enough to cause an overflow (wrap 
around).
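
A minimal sketch of that wrap-around (an illustration, not code from the 
original mail), assuming the reduction keeps the input's int32 dtype, which is 
what produced the negative result:

import numpy

a = numpy.array([2**30, 2**30, 2**30], dtype=numpy.int32)   # all positive
print(a.sum(dtype=numpy.int32))              # -1073741824: the int32 accumulator wrapped around
print(a.sum(dtype=numpy.float64) / len(a))   # 1073741824.0, the intended mean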

Now that the default for axis is None (for all functions having an axis 
argument),
can we please change dtype to default to float64 !?
  


The default is float64 now (as long as you are not using 
numpy.oldnumeric). 

I suppose more appropriately, we could reduce over float for integer 
data-types when calculating the mean as well (since a floating point is 
returned anyway).


-Travis




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Sebastian Haase
On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
 Sebastian Haase wrote:
snip
 can we please change dtype to default to float64 !?

 The default is float64 now (as long as you are not using
 numpy.oldnumeric).

 I suppose more appropriately, we could reduce over float for integer
 data-types when calculating the mean as well (since a floating point is
 returned anyway).


Is now mean() always reducing over float64 ? 
The svn note Log:
Fix mean, std, and var methods so that they reduce over double data-type with 
integer inputs.

makes it sound that a float32 input stays float32 ?

For mean calculation this might introduce large errors - I usually would 
require double-precision for *any*  input type ...

(don't know how to say this for complex types !? Are here real and imag 
treated separately / independently ?)

Thanks,
Sebastian Haase



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Travis Oliphant
Sebastian Haase wrote:

On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
  

Sebastian Haase wrote:


snip
  

can we please change dtype to default to float64 !?
  

The default is float64 now (as long as you are not using
numpy.oldnumeric).

I suppose more appropriately, we could reduce over float for integer
data-types when calculating the mean as well (since a floating point is
returned anyway).




Is now mean() always reducing over float64 ? 
The svn note Log:
Fix mean, std, and var methods so that they reduce over double data-type with 
integer inputs.

 makes it sound that a float32 input stays float32 ?
  

Yes, that is true.  Only integer inputs are changed because you are 
going to get a floating point output anyway.

For mean calculation this might introduce large errors - I usually would 
require double-precision for *any*  input type ...
  

Of course.  The system is not fool-proof.  I hesitate to arbitrarily 
change this.  The advantage of using single-precision calculation is 
that it is faster.  We do rely on the user who expressly requests these 
things to be aware of the difficulties.
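
A short sketch of the behaviour being described (an illustration, not code from 
the thread): with float32 input the reduction stays in float32 unless a wider 
accumulator is requested explicitly; how large the resulting gap is depends on 
the summation strategy of the NumPy build being used.

import numpy

data = numpy.random.random(10**6).astype(numpy.float32)
m32 = data.mean()                       # default: the reduction stays in float32
m64 = data.mean(dtype=numpy.float64)    # explicitly request a float64 accumulator
print(abs(m64 - m32))                   # size of the float32 drift, if any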

(don't know how to say this for complex types !? Are here real and imag 
treated separately / independently ?)
  


There is a complex add performed at a low-level as two separate adds.  
The addition is performed in the precision requested.
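
A tiny sketch of the same point for complex input (an illustration, not code 
from the thread): the result keeps the input precision unless a wider complex 
accumulator is requested.

import numpy

z = (numpy.random.random(10**5) + 1j * numpy.random.random(10**5)).astype(numpy.complex64)
print(z.mean().dtype)                        # complex64: both components accumulated in single precision
print(z.mean(dtype=numpy.complex128).dtype)  # complex128 accumulator on request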

-Travis




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Robert Kern
Sebastian Haase wrote:
 (don't know how to say this for complex types !? Are here real and imag 
 treated separately / independently ?)

Yes. For mean(), there's really no alternative. Scalar variance is not a 
well-defined concept for complex numbers, but treating the real and imaginary 
parts separately is a sensible and (partially) informative thing to do. Simply 
applying the formula for estimating variance for real numbers to complex 
numbers 
(i.e. change x to z) is a meaningless operation.
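
A quick sketch of the distinction (an illustration, not Robert's code): the 
per-component variances are ordinary real numbers, whereas substituting z into 
the real-valued formula just yields some complex number.

import numpy

z = numpy.array([1+2j, 3+0.5j, -2+1j, 0.5-4j])

print(z.real.var())                      # variance of the real parts
print(z.imag.var())                      # variance of the imaginary parts
print((abs(z - z.mean()) ** 2).mean())   # E|z - mean|^2, the sum of the two variances above
print(((z - z.mean()) ** 2).mean())      # the real-data formula with x -> z: a complex number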

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Charles R Harris
On 9/19/06, Travis Oliphant [EMAIL PROTECTED] wrote:
snip

Speed depends on the architecture. Float is a trifle slower than double on my 
Athlon64, but faster on PPC750. I don't know about other machines. I think 
there is a good argument for accumulating in double and converting to float 
for output if required.

Chuck


Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Sebastian Haase
On Tuesday 19 September 2006 17:17, Travis Oliphant wrote:
 Sebastian Haase wrote:
 On Tuesday 19 September 2006 15:48, Travis Oliphant wrote:
 Sebastian Haase wrote:
 
 snip
 
 can we please change dtype to default to float64 !?
 
 The default is float64 now (as long as you are not using
 numpy.oldnumeric).
 
 I suppose more appropriately, we could reduce over float for integer
 data-types when calculating the mean as well (since a floating point is
 returned anyway).
 
 Is now mean() always reducing over float64 ?
 The svn note Log:
 Fix mean, std, and var methods so that they reduce over double data-type
  with integer inputs.
 
 makes it sound that a float32 input stays float32 ?

 Yes, that is true.  Only integer inputs are changed because you are
 going to get a floating point output anyway.

 For mean calculation this might introduce large errors - I usually would
 require double-precision for *any*  input type ...

 Of course.  The system is not fool-proof.  I hesitate to arbitrarily
 change this.  The advantage of using single-precision calculation is
 that it is faster.  We do rely on the user who expressly requests these
 things to be aware of the difficulties.

I still would argue that getting a good (smaller rounding errors) answer 
should be the default -- if speed is wanted, then *that* could be still 
specified by explicitly using dtype=float32  (which would also be a possible 
choice for int32 input) . 
In image processing we always want means to be calculated in float64 even 
though input data is always float32 (if not uint16).

Also it is simpler to say "float64 is the default" (full stop) - instead of 
"float64 is the default unless you have float32". 

Thanks,
Sebastian Haase



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Travis Oliphant
Charles R Harris wrote:

 Speed depends on the architecture. Float is a trifle slower than double 
 on my Athlon64, but faster on PPC750. I don't know about other 
 machines. I think there is a good argument for accumulating in double 
 and converting to float for output if required.

Yes there is.  It's just not what NumPy ever does, so it would be an 
exception in this case and would need a more convincing argument in my 
opinion.  You can always specify the accumulation type yourself with 
the dtype argument.  We are only talking about what the default should 
be.

-Travis
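
A short sketch of what that looks like in practice (an illustration, not code 
from the thread): the dtype argument selects the accumulator, and the result 
can be cast back down if a float32 answer is wanted.

import numpy

a = numpy.random.random(100000).astype(numpy.float32)
m = a.mean(dtype=numpy.float64)   # accumulate in double even though the data is float32
m32 = numpy.float32(m)            # cast back down if a float32 result is needed
print(m32)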




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Travis Oliphant
Sebastian Haase wrote:
 I still would argue that getting a good (smaller rounding errors) answer 
 should be the default -- if speed is wanted, then *that* could be still 
 specified by explicitly using dtype=float32  (which would also be a possible 
 choice for int32 input) . 
   
So you are arguing for using long double then ;-)

 In image processing we always want means to be calculated in float64 even 
 though input data is always float32 (if not uint16).

 Also it is simpler to say float64 is the default (full stop.) - instead 
   
 float64 is the default unless you have float32 
   
the type you have is the default except for integers.  Do you really 
want float64 to be the default for float96?

Unless we are going to use long double as the default, then I'm not 
convinced that we should special-case the double type.

-Travis




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Sebastian Haase
Travis Oliphant wrote:
 Sebastian Haase wrote:
 I still would argue that getting a good (smaller rounding errors) answer 
 should be the default -- if speed is wanted, then *that* could be still 
 specified by explicitly using dtype=float32  (which would also be a possible 
 choice for int32 input) . 
   
 So you are arguing for using long double then ;-)
 
 In image processing we always want means to be calculated in float64 even 
 though input data is always float32 (if not uint16).

 Also it is simpler to say float64 is the default (full stop.) - instead 
   
 float64 is the default unless you have float32 
   
 the type you have is the default except for integers.  Do you really 
 want float64 to be the default for float96?
 
 Unless we are going to use long double as the default, then I'm not 
 convinced that we should special-case the double type.
 
I guess I'm not really aware of the float96 type ...
Is that a machine type on any system ?  I always thought that -- e.g. 
coming from C -- double is as good as it gets...
Who uses float96 ?  I heard somewhere that (some) CPUs use 80bits 
internally when calculating 64bit double-precision...

Is this not going into some academic argument !?
For all I know, calculating mean()s (and sum()s, ...) is always done in 
double precision -- never in single precision, even when the data is in 
float32.

Having float32 be the default for float32 data is just requiring more 
typing, and more explaining ...  it would compromise numpy usability as 
a day-to-day replacement for other systems.

Sorry, if I'm being ignorant ...

- Sebastian



Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Charles R Harris
On 9/19/06, Charles R Harris [EMAIL PROTECTED] wrote:
 On 9/19/06, Sebastian Haase [EMAIL PROTECTED] wrote:
 snip

 I'm going to side with Travis here. It is only a default and easily 
 overridden. And yes, there are precisions greater than double. I was using 
 quad precision back in the eighties on a VAX for some inherently ill 
 conditioned problems. And on my machine long double is 12 bytes.

Here is the 754r (revision) spec: http://en.wikipedia.org/wiki/IEEE_754r
It includes quads (128 bits) and half precision (16 bits) floats. I believe the 
latter are used for some imaging stuff, radar for instance, and are also 
available in some high end GPUs from Nvidia and other companies. The 80 bit 
numbers you refer to were defined as extended precision in the original 754 
spec and were mostly intended for temporaries in internal FPU computations. 
They have various alignment requirements for efficient use, which is why they 
show up as 96 bits (4 byte alignment) and sometimes 128 bits (8 byte 
alignment). So actually, float128 would not always distinguish between 
extended precision and quad precision.  I see more work for Travis in the 
future ;)

Chuck
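
A small way to check this on a given machine (an illustration, not code from 
the thread): NumPy's longdouble reports both its storage width, which includes 
any alignment padding, and the precision it actually carries.

import numpy

ld = numpy.longdouble
print(numpy.dtype(ld).itemsize * 8)   # storage width: e.g. 96 or 128 on x86, 64 where long double == double
print(numpy.finfo(ld).nmant)          # mantissa bits in use: 63 for x87 extended precision, 52 for plain double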


Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Robert Kern
Sebastian Haase wrote:
 I know that having too much knowledge of the details often makes one 
 forget what the newcomers will do and expect.

Please be more careful with such accusations. Repeated frequently, they can 
become quite insulting.

 We are only talking 
 about people that will a) work with single-precision data (e.g. large 
 scale-image analysis) and who b) will tend to just use the default 
 (dtype)  --- How else can I say this: these people will just assume that 
 arr.mean() *is* the mean of arr.

I don't understand what you mean, here. arr.mean() is almost never *the* mean 
of 
arr. Double precision can't fix that.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth.
   -- Umberto Eco




Re: [Numpy-discussion] please change mean to use dtype=float

2006-09-19 Thread Sebastian Haase
Robert Kern wrote:
 Sebastian Haase wrote:
 I know that having too much knowledge of the details often makes one 
 forget what the newcomers will do and expect.
 
 Please be more careful with such accusations. Repeated frequently, they can 
 become quite insulting.
 
I did not mean to insult anyone - what I meant was that I'm for numpy 
becoming an easy platform to use. I have spent and enjoyed part of the last 
four years developing and evangelizing Python as an alternative to a 
Matlab- and C/Fortran-based image analysis environment. I often find 
myself arguing for good support of the single-precision data format. So 
I find it actually somewhat ironic to see myself now arguing for 
float64 over float32 ;-)


 We are only talking 
 about people that will a) work with single-precision data (e.g. large 
 scale-image analysis) and who b) will tend to just use the default 
 (dtype)  --- How else can I say this: these people will just assume that 
 arr.mean() *is* the mean of arr.
 
 I don't understand what you mean, here. arr.mean() is almost never *the* mean 
 of 
 arr. Double precision can't fix that.
 
This was not supposed to be a scientific statement -- I'm (again) 
thinking of our students, who do not always appreciate the full complexity 
of computational numerics, data types and such.
The best I can hope for is a sound default for most (practical) cases...
I still think that 80bit vs. 128bit vs 96bit is rather academic for most 
people ... most people seem to only use float64, and then there are some 
that use float32 (like us) ...


Cheers,
Sebastian
