Re: [math] Problems with sparse implementations of RealVector

2012-06-19 Thread Bill Barker



-----Original Message-----
From: Luc Maisonobe

Sent: Monday, June 18, 2012 1:40 AM
To: Commons Developers List
Subject: Re: [math] Problems with sparse implementations of RealVector


Hi Sébastien,

On 18/06/2012 08:11, Sébastien Brisard wrote:

Dear all,

in this thread,
http://markmail.org/thread/hhvm6wv3d3uhkwqs
we had an interesting discussion on a bug which was revealed by
abstract unit tests on all implementations of RealVector. It turns out
that the bug is more far-reaching than we initially thought, and I
would like to make sure that it has been brought to everyone's
attention (as the subject of the previous thread was pretty cryptic).

So here goes. In RealVector, we provide ebeMultiply(RealVector) and
ebeDivide(RealVector). Also, in sparse implementations of RealVector,
zero entries are not stored. This is all very well, but for the fact
that 0.0 is actually signed in Java. The sign of zero is effectively
lost in OpenMapRealVector. This affects the correctness of the
returned values of ebeMultiply() and ebeDivide():

1. For ebeMultiply()
   final RealVector v1 = new ArrayRealVector(new double[] { 1d });
   final RealVector v2 = new OpenMapRealVector(new double[] { -0d });
   final RealVector w = v1.ebeMultiply(v2);
   System.out.println(1d / w.getEntry(0));

prints Infinity, instead of -Infinity (because the sign is lost in
v2). This means that w holds +0d instead of -0d.

2. For ebeDivide()
   final RealVector v1 = new ArrayRealVector(new double[] { 1d });
   final RealVector v2 = new OpenMapRealVector(new double[] { -0d });
   final RealVector w = v1.ebeDivide(v2);
   System.out.println(w.getEntry(0));

prints Infinity, instead of -Infinity. For this last bug, Gilles
suggested the following fix



  public OpenMapRealVector ebeDivide(OpenMapRealVector v) {
    if (v.getDefaultEntry() == 0) {
      throw new ZeroException();
    }

    // ...
  }



which was indeed no big deal, since the exception occurred only when
the expected entry should have been + or -Infinity (which means that
the calculation had effectively failed).
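
To make the sign loss concrete, here is a minimal sketch that reproduces it without Commons Math. It assumes a toy sparse store (the class ToySparseVector and its methods are hypothetical, not part of the library) which, like the sparse implementations discussed here, does not store zero entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy sparse vector: zero entries are not stored. */
final class ToySparseVector {
    private final Map<Integer, Double> entries = new HashMap<>();
    private final int dimension;

    ToySparseVector(double[] values) {
        dimension = values.length;
        for (int i = 0; i < values.length; i++) {
            // -0d == 0d is true in Java, so a negative zero is dropped as well.
            if (values[i] != 0d) {
                entries.put(i, values[i]);
            }
        }
    }

    double getEntry(int index) {
        if (index < 0 || index >= dimension) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Missing entries read back as +0d: the original sign is gone.
        final Double value = entries.get(index);
        return value == null ? 0d : value;
    }

    public static void main(String[] args) {
        final double[] dense = { -0d };
        final ToySparseVector sparse = new ToySparseVector(dense);
        System.out.println(1d / dense[0]);           // -Infinity
        System.out.println(1d / sparse.getEntry(0)); // Infinity
    }
}
```

The comparison values[i] != 0d cannot tell +0d from -0d, which is why the sign disappears at construction time and cannot be recovered later.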

However, this fix is not the end of the story, because it should be
applied to *any* implementation of RealVector.ebeDivide, as long as
the provided argument is an OpenMapRealVector. This makes things
cumbersome. Also, other implementations of RealVector (not only
OpenMapRealVector) might be affected by the same limitation. In my
view, this would require the definition of a new abstract method in
RealVector
   protected boolean preservesSignOfZeroEntries()
which returns true if the sign of zero entries can be reliably
retrieved from this vector. Then, for each implementation of
ebeMultiply and ebeDivide, we should test for
preservesSignOfZeroEntries(), and handle the boundary cases
accordingly.
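
As a rough illustration of this proposal, here is a sketch under stated assumptions. SignAwareVector and its two subclasses are hypothetical stand-ins for RealVector and its implementations, and ArithmeticException stands in for ZeroException. Only the ebeDivide case is shown.

```java
/** Hypothetical stand-in for RealVector, reduced to what this sketch needs. */
abstract class SignAwareVector {
    abstract int getDimension();

    abstract double getEntry(int index);

    /** Returns true if the sign of zero entries can be reliably retrieved. */
    protected abstract boolean preservesSignOfZeroEntries();

    /** Element-by-element division that refuses to guess the sign of a zero divisor. */
    double[] ebeDivide(SignAwareVector v) {
        if (v.getDimension() != getDimension()) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        final double[] result = new double[getDimension()];
        for (int i = 0; i < result.length; i++) {
            final double divisor = v.getEntry(i);
            if (divisor == 0d && !v.preservesSignOfZeroEntries()) {
                // Stand-in for the ZeroException of the suggested fix.
                throw new ArithmeticException("zero of undecidable sign at index " + i);
            }
            result[i] = getEntry(i) / divisor;
        }
        return result;
    }
}

/** Dense storage keeps -0d as it is. */
final class DenseSignAware extends SignAwareVector {
    private final double[] data;

    DenseSignAware(double[] data) { this.data = data.clone(); }

    @Override int getDimension() { return data.length; }

    @Override double getEntry(int index) { return data[index]; }

    @Override protected boolean preservesSignOfZeroEntries() { return true; }
}

/** Mimics a sparse store: every zero, whatever its sign, reads back as +0d. */
final class SignLosingSparse extends SignAwareVector {
    private final double[] data;

    SignLosingSparse(double[] data) {
        this.data = data.clone();
        for (int i = 0; i < this.data.length; i++) {
            if (this.data[i] == 0d) {
                this.data[i] = 0d; // overwrites -0d with +0d
            }
        }
    }

    @Override int getDimension() { return data.length; }

    @Override double getEntry(int index) { return data[index]; }

    @Override protected boolean preservesSignOfZeroEntries() { return false; }
}
```

With a dense divisor holding -0d, dividing 1d yields -Infinity as IEEE arithmetic prescribes. With the sign-losing divisor the same call throws instead of silently returning +Infinity.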

The question is then: how should the boundary case be handled in the
ebeMultiply example? In this case, the expected value is perfectly
valid, and throwing an exception would effectively stop a computation
which is not yet in a failed state.  I would be tempted to quietly
accept operations like: any double * (zero with undecidable sign).
The returned value would be zero with undecidable sign (remember that
the sign of zero is only used to compute (any double) / (signed
zero)). But then, preservesSignOfZeroEntries() must be specified at
construction time, because even ArrayRealVector might in some
circumstances end up with zero entries with undecidable sign... This
quickly gets very complicated!

I think there is no satisfactory implementation of ebeMultiply and
ebeDivide, and I would go as far as deprecating them. Users who need to
perform these operations can always use visitors to do so efficiently
(if not in an absolutely foolproof way).
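
For illustration, a user-side element-by-element product built on a visitor could look roughly like the sketch below. The interface EntryChangingVisitor and the class VisitorEbeDemo are hypothetical and operate on plain arrays. They only mirror the idea of a visitor that changes entries and are not the Commons Math interfaces.

```java
import java.util.Arrays;

/** Hypothetical visitor that computes a new value for each visited entry. */
interface EntryChangingVisitor {
    /** Returns the new value for the entry at the given index. */
    double visit(int index, double value);
}

final class VisitorEbeDemo {
    /** Walks all entries in index order and replaces each by the visitor's result. */
    static void walkInDefaultOrder(double[] data, EntryChangingVisitor visitor) {
        for (int i = 0; i < data.length; i++) {
            data[i] = visitor.visit(i, data[i]);
        }
    }

    /** User-side element-by-element multiplication, performed in place on v. */
    static void ebeMultiplyInPlace(double[] v, final double[] factors) {
        if (v.length != factors.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        walkInDefaultOrder(v, (index, value) -> value * factors[index]);
    }

    public static void main(String[] args) {
        final double[] v = { 1d, 2d, 3d };
        ebeMultiplyInPlace(v, new double[] { -0d, 0.5, 2d });
        System.out.println(Arrays.toString(v)); // [-0.0, 1.0, 6.0]
    }
}
```

Here the factors come from a dense array, so the sign of -0d survives. If the factors were read from a store that drops zeros, the visitor would see +0d and the caveat about not being foolproof would apply.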


This sounds good to me. I am not a big fan of all the ebe methods
(even though I think I am the one who implemented them, following a user
request). I would also be glad if we removed most or even all of the map
methods.



The ebe methods aren't all that interesting, and with the new visitor 
pattern they can be implemented by the user.  Also, the users of 
SparseVector really won't care what value of +-infinity and/or NaN is stored 
and would probably just prefer that an exception is thrown if this case is 
detected.



Luc


Any better idea?
Thanks in advance,
Sébastien


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org








Re: [math] Problems with sparse implementations of RealVector

2012-06-19 Thread Bill Barker
And, it is much worse than that.  Pretty much nobody cares about ebe, but 
dotProduct and outerProduct also assume that 0*NaN = 0 and 0*+-Infinity = 0. 
e.g.:

  RealVector a = new OpenMapRealVector(10);
  RealVector b = new OpenMapRealVector(10);
  a.setEntry(1, 1.0);
  b.setEntry(2, Double.NaN);
  double prod = a.dotProduct(b);
  assert(prod == 0.0);

The OpenMapRealVector class is already so incredibly slow.  I really can't 
see maintaining support for it if it has to handle these edge cases as well.
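
The effect can be shown with a toy sketch that does not depend on Commons Math. The class SparseDotDemo and its methods are hypothetical, and the sparse product below is only an assumption about how a store that skips zero entries typically computes a dot product. It is not the library's code.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseDotDemo {
    /** Dense dot product: every pair is multiplied, so 0 * NaN = NaN propagates. */
    static double denseDot(double[] a, double[] b) {
        double sum = 0d;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Keeps only the non-zero entries, as a sparse store would. */
    static Map<Integer, Double> toSparse(double[] values) {
        final Map<Integer, Double> sparse = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0d) { // NaN != 0d is true, so NaN is stored
                sparse.put(i, values[i]);
            }
        }
        return sparse;
    }

    /** Sparse dot product: only indices stored in both maps contribute. */
    static double sparseDot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0d;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            final Double other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        final double[] a = new double[10];
        final double[] b = new double[10];
        a[1] = 1.0;
        b[2] = Double.NaN;
        System.out.println(denseDot(a, b));                       // NaN
        System.out.println(sparseDot(toSparse(a), toSparse(b)));  // 0.0
    }
}
```

The NaN at index 2 never meets a stored entry of the other vector, so the sparse product silently treats 0 * NaN as 0.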


-----Original Message-----
From: Sébastien Brisard

Sent: Sunday, June 17, 2012 11:11 PM
To: Commons Developers List
Subject: [math] Problems with sparse implementations of RealVector




Re: [math] Problems with sparse implementations of RealVector

2012-06-19 Thread Sébastien Brisard
Hi Bill,

2012/6/19 Bill Barker billwbar...@verizon.net:



 The ebe methods aren't all that interesting, and with the new visitor
 pattern they can be implemented by the user.  Also, the users of
 SparseVector really won't care what value of +-infinity and/or NaN is stored
 and would probably just prefer that an exception is thrown if this case is
 detected.

I agree. I miss the good old division by zero error...
Sébastien





Re: [math] Problems with sparse implementations of RealVector

2012-06-19 Thread Sébastien Brisard
Hi Bill,


 And, it is much worse than that.  Pretty much nobody cares about ebe, but
 dotProduct and outerProduct also assume that 0*NaN = 0 and 0*+-Infinity = 0.
 e.g.:
  RealVector a = new OpenMapRealVector(10);
  RealVector b = new OpenMapRealVector(10);
  a.setEntry(1, 1.0);
  b.setEntry(2, Double.NaN);
  double prod = a.dotProduct(b);
  assert(prod == 0.0);

 The OpenMapRealVector class is already so incredibly slow.  I really can't
 see maintaining support for it if it has to handle these edge cases as well.


Thanks for spotting this. I am in the middle of refactoring all unit
tests for RealVector and haven't reached dotProduct() yet, so I haven't
had the opportunity to uncover this bug.
I think the question you raise is legitimate. Gilles already
questioned support for this class in a recent post.

What do the others think?

Sébastien





Re: [math] Problems with sparse implementations of RealVector

2012-06-18 Thread Luc Maisonobe
Hi Sébastien,

On 18/06/2012 08:11, Sébastien Brisard wrote:
 I think there is no satisfactory implementation of ebeMultiply and
 ebeDivide, and I would go as far as deprecating them. Users who need to
 perform these operations can always use visitors to do so efficiently
 (if not in an absolutely foolproof way).

This sounds good to me. I am not a big fan of all the ebe methods
(even though I think I am the one who implemented them, following a user
request). I would also be glad if we removed most or even all of the map
methods.

Luc

 
 Any better idea?
 Thanks in advance,
 Sébastien
 
 