Re: [math] Problems with sparse implementations of RealVector
-----Original Message-----
From: Luc Maisonobe
Sent: Monday, June 18, 2012 1:40 AM
To: Commons Developers List
Subject: Re: [math] Problems with sparse implementations of RealVector

> Hi Sébastien,
>
> On 18/06/2012 08:11, Sébastien Brisard wrote:
>> Dear all,
>>
>> in this thread,
>> http://markmail.org/thread/hhvm6wv3d3uhkwqs
>> we had an interesting discussion on a bug which was revealed by
>> abstract unit tests on all implementations of RealVector. It turns out
>> that the bug is more far-reaching than we initially thought, and I
>> would like to make sure that it has been brought to everyone's
>> attention (as the subject of the previous thread was pretty cryptic).
>> So here goes.
>>
>> In RealVector, we provide ebeMultiply(RealVector) and
>> ebeDivide(RealVector). Also, in sparse implementations of RealVector,
>> zero entries are not stored. This is all very well, but for the fact
>> that 0.0 is actually signed in Java. The sign of zero is effectively
>> lost in OpenMapRealVector. This affects the correctness of the values
>> returned by ebeMultiply() and ebeDivide().
>>
>> 1. For ebeMultiply()
>>
>>     final RealVector v1 = new ArrayRealVector(new double[] { 1d });
>>     final RealVector v2 = new OpenMapRealVector(new double[] { -0d });
>>     final RealVector w = v1.ebeMultiply(v2);
>>     System.out.println(1d / w.getEntry(0));
>>
>> prints Infinity, instead of -Infinity (because the sign is lost in
>> v2). This means that w holds +0d instead of -0d.
>>
>> 2. For ebeDivide()
>>
>>     final RealVector v1 = new ArrayRealVector(new double[] { 1d });
>>     final RealVector v2 = new OpenMapRealVector(new double[] { -0d });
>>     final RealVector w = v1.ebeDivide(v2);
>>     System.out.println(w.getEntry(0));
>>
>> prints Infinity, instead of -Infinity.
>>
>> For this last bug, Gilles suggested the following fix
>>
>>     public OpenMapRealVector ebeDivide(OpenMapRealVector v) {
>>         if (v.getDefaultEntry() == 0) {
>>             throw new ZeroException();
>>         }
>>         // ...
>>     }
>>
>> which was indeed no big deal, since the exception occurred only when
>> the expected entry should have been + or -Infinity (which means that
>> the calculation had effectively failed).
>>
>> However, this fix is not the end of the story, because it should be
>> applied to *any* implementation of RealVector.ebeDivide, whenever the
>> provided argument is an OpenMapRealVector. This makes things
>> cumbersome. Also, other implementations of RealVector (not only
>> OpenMapRealVector) might be affected by the same limitation. In my
>> view, this would require the definition of a new abstract method in
>> RealVector
>>
>>     protected boolean preservesSignOfZeroEntries()
>>
>> which returns true if the sign of zero entries can be reliably
>> retrieved from this vector. Then, for each implementation of
>> ebeMultiply and ebeDivide, we should test for
>> preservesSignOfZeroEntries(), and handle the boundary cases
>> accordingly.
>>
>> The question is then: how should the boundary case be handled in the
>> ebeMultiply example? In this case, the expected value is perfectly
>> valid, and throwing an exception would effectively stop a computation
>> which is not yet in a failed state. I would be tempted to quietly
>> accept operations like: any double * (zero with undecidable sign). The
>> returned value would be zero with undecidable sign (remember that the
>> sign of zero is only used to compute (any double) / (signed zero)).
>> But then, preservesSignOfZeroEntries() must be specified at
>> construction time, because even ArrayRealVector might in some
>> circumstances end up with zero entries with undecidable sign... This
>> quickly gets very complicated!
>>
>> I think there is no satisfactory implementation of ebeMultiply and
>> ebeDivide, and I would go as far as to deprecate them. Users who need
>> to perform these operations can always use visitors to do so
>> efficiently (if not in an absolutely fool-proof way).
>
> This sounds good to me. I am not a big fan of all the ebe methods (even
> though I think I am the one who implemented them, following a user
> request). I also would be glad if we removed most or even all of the
> map methods.

The ebe methods aren't all that interesting, and with the new visitor
pattern they can be implemented by the user. Also, the users of
SparseVector really won't care what value of +-infinity and/or NaN is
stored and would probably just prefer that an exception is thrown if
this case is detected.

> Luc
>
>> Any better idea?
>> Thanks in advance,
>> Sébastien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
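To make the loss of the sign concrete without depending on Commons Math, here is a minimal sketch. The class `SignedZeroDemo` and its helpers `toSparse` and `getEntry` are hypothetical names invented for illustration; they are not the OpenMapRealVector implementation. The sketch only assumes what the original post states, namely that a sparse container skips entries that compare equal to zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy sparse storage; NOT the Commons Math OpenMapRealVector. */
final class SignedZeroDemo {

    /** Keeps only entries that compare unequal to zero. Since -0.0 == 0.0 in Java, -0.0 is dropped. */
    static Map<Integer, Double> toSparse(double[] dense) {
        Map<Integer, Double> entries = new HashMap<>();
        for (int i = 0; i < dense.length; i++) {
            if (dense[i] != 0.0) {
                entries.put(i, dense[i]);
            }
        }
        return entries;
    }

    /** Missing entries are reported as the default value +0.0. */
    static double getEntry(Map<Integer, Double> sparse, int index) {
        return sparse.getOrDefault(index, 0.0);
    }

    public static void main(String[] args) {
        double[] dense = { -0.0 };
        Map<Integer, Double> sparse = toSparse(dense);

        // Dense storage keeps the sign of zero: 1 * -0.0 is -0.0.
        System.out.println(1.0 / (1.0 * dense[0]));            // -Infinity
        // Sparse storage falls back to +0.0, so the sign is gone.
        System.out.println(1.0 / (1.0 * getEntry(sparse, 0))); // Infinity
    }
}
```

The comparison `dense[i] != 0.0` is the whole problem in miniature: it cannot distinguish the two zeros, so any container built on it returns +0.0 for both.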
Re: [math] Problems with sparse implementations of RealVector
And, it is much worse than that. Pretty much nobody cares about ebe, but
dotProduct and outerProduct also assume that 0 * NaN = 0 and
0 * +-Infinity = 0, e.g.:

    RealVector a = new OpenMapRealVector(10);
    RealVector b = new OpenMapRealVector(10);
    a.setEntry(1, 1.0);
    b.setEntry(2, Double.NaN);
    double prod = a.dotProduct(b);
    assert(prod == 0.0);

The OpenMapRealVector class is already so incredibly slow. I really can't
see maintaining support for it if it has to handle these edge cases as
well.

-----Original Message-----
From: Sébastien Brisard
Sent: Sunday, June 17, 2012 11:11 PM
To: Commons Developers List
Subject: [math] Problems with sparse implementations of RealVector

> In RealVector, we provide ebeMultiply(RealVector) and
> ebeDivide(RealVector). Also, in sparse implementations of RealVector,
> zero entries are not stored. This is all very well, but for the fact
> that 0.0 is actually signed in Java. The sign of zero is effectively
> lost in OpenMapRealVector.
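The dotProduct issue above can be reproduced in plain Java. This is a sketch under assumptions: `SparseDotDemo`, `sparseDot` and `denseDot` are hypothetical names, and the sparse loop (iterate over the stored entries of one operand, treat missing entries of the other as 0.0) is a common way to write a sparse dot product, not necessarily what OpenMapRealVector does internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of a sparse and a dense dot product; not Commons Math code. */
final class SparseDotDemo {

    /** Visits only the stored entries of a; entries of b at other indices are never read. */
    static double sparseDot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    /** Visits every index, so 0 * NaN contributes NaN as IEEE 754 requires. */
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new HashMap<>();
        Map<Integer, Double> b = new HashMap<>();
        a.put(1, 1.0);
        b.put(2, Double.NaN);
        System.out.println(sparseDot(a, b)); // 0.0

        double[] da = new double[10];
        double[] db = new double[10];
        da[1] = 1.0;
        db[2] = Double.NaN;
        System.out.println(denseDot(da, db)); // NaN
    }
}
```

The two results differ because the sparse loop never multiplies the implicit zero at index 2 of `a` by the NaN stored in `b`. Making the sparse version agree would mean visiting the stored entries of both operands, which is part of the cost Bill is worried about.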
Re: [math] Problems with sparse implementations of RealVector
Hi Bill,

2012/6/19 Bill Barker billwbar...@verizon.net:
> The ebe methods aren't all that interesting, and with the new visitor
> pattern they can be implemented by the user. Also, the users of
> SparseVector really won't care what value of +-infinity and/or NaN is
> stored and would probably just prefer that an exception is thrown if
> this case is detected.

I agree. I miss the good old division by zero error...

Sébastien
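As a rough illustration of the visitor-based alternative with an exception on zero divisors, here is a self-contained sketch. Everything in it is an assumption made for illustration: `EntryVisitor`, `walkInPlace` and `divideBy` are hypothetical names, plain arrays stand in for vectors, and the interface is much simpler than whatever visitor API Commons Math actually exposes.

```java
import java.util.Arrays;

/** Hypothetical visitor-style element-wise division; not the Commons Math visitor API. */
final class VisitorDivideDemo {

    /** Hypothetical visitor: receives each entry and returns its replacement. */
    interface EntryVisitor {
        double visit(int index, double value);
    }

    /** Walks every entry of v and overwrites it with the visitor's result. */
    static void walkInPlace(double[] v, EntryVisitor visitor) {
        for (int i = 0; i < v.length; i++) {
            v[i] = visitor.visit(i, v[i]);
        }
    }

    /** Element-wise division that rejects a zero divisor of either sign instead of returning +-Infinity or NaN. */
    static EntryVisitor divideBy(double[] divisor) {
        return (index, value) -> {
            if (divisor[index] == 0.0) {
                throw new ArithmeticException("zero divisor at index " + index);
            }
            return value / divisor[index];
        };
    }

    public static void main(String[] args) {
        double[] v = { 1.0, 4.0 };
        walkInPlace(v, divideBy(new double[] { 2.0, 8.0 }));
        System.out.println(Arrays.toString(v)); // [0.5, 0.5]

        try {
            walkInPlace(new double[] { 1.0 }, divideBy(new double[] { -0.0 }));
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage()); // zero divisor at index 0
        }
    }
}
```

Because the policy lives in the visitor, a user who does want IEEE 754 semantics can simply drop the check, and the library no longer has to decide which behaviour is correct for sparse storage.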
Re: [math] Problems with sparse implementations of RealVector
Hi Bill,

> And, it is much worse than that. Pretty much nobody cares about ebe,
> but dotProduct and outerProduct also assume that 0 * NaN = 0 and
> 0 * +-Infinity = 0, e.g.:
>
>     RealVector a = new OpenMapRealVector(10);
>     RealVector b = new OpenMapRealVector(10);
>     a.setEntry(1, 1.0);
>     b.setEntry(2, Double.NaN);
>     double prod = a.dotProduct(b);
>     assert(prod == 0.0);
>
> The OpenMapRealVector class is already so incredibly slow. I really
> can't see maintaining support for it if it has to handle these edge
> cases as well.

Thanks for spotting this. I am in the middle of refactoring all unit
tests for RealVector and haven't reached dotProduct() yet, so I haven't
had the opportunity to reveal this bug. I think the question you raise is
legitimate. Gilles already questioned the support for this class in a
recent post. What do the others think?

Sébastien
Re: [math] Problems with sparse implementations of RealVector
Hi Sébastien,

On 18/06/2012 08:11, Sébastien Brisard wrote:
> I think there is no satisfactory implementation of ebeMultiply and
> ebeDivide, and I would go as far as to deprecate them. Users who need
> to perform these operations can always use visitors to do so
> efficiently (if not in an absolutely fool-proof way).

This sounds good to me. I am not a big fan of all the ebe methods (even
though I think I am the one who implemented them, following a user
request). I also would be glad if we removed most or even all of the map
methods.

Luc