[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1425


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50303211
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50303204
  
There were some problems with pyspark. Let's call Jenkins again.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50363809
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50364089
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17283/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50370726
  
QA results for PR 1425:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  case class VectorWithCompare(x: Vector) extends Ordered[VectorWithCompare] {
  case class CompareDoubleRightSide(
  case class CompareVectorRightSide(
  class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17283/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-28 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50379081
  
LGTM. I'm merging this into master! Thanks!!




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50265751
  
@dbtsai I think it is very uncommon to combine scientific notation with a 
percentage, like `1e-10 percent`. Shall we switch to `absTol` and `relTol` 
instead? I feel `*Tol` is better than `*Err` because we are testing equality 
within a tolerance, not measuring an error. MATLAB also uses tolerance: 
http://www.mathworks.com/help/matlab/ref/matlab.unittest.constraints.tolerance-class.html
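
As a rough illustration of how an `absTol`/`relTol` syntax could be wired up, here is a minimal sketch using an implicit class. The names `TolSyntaxSketch`, `Rhs`, and `DoubleWithTol` are hypothetical, and the relative check (scaling by the smaller magnitude) is an assumption, not necessarily what this PR implements.

```scala
object TolSyntaxSketch {
  // Right-hand side of a comparison: the expected value, the tolerance,
  // and whether the tolerance is relative. Hypothetical name.
  case class Rhs(expected: Double, eps: Double, relative: Boolean)

  implicit class DoubleWithTol(val x: Double) {
    // Attach an absolute or a relative tolerance to the expected value.
    def absTol(eps: Double): Rhs = Rhs(x, eps, relative = false)
    def relTol(eps: Double): Rhs = Rhs(x, eps, relative = true)

    // True when x is within the tolerance carried by r.
    // Assumed convention: the relative check scales eps by the smaller magnitude.
    def ~=(r: Rhs): Boolean = {
      val diff = math.abs(x - r.expected)
      if (r.relative) diff < r.eps * math.min(math.abs(x), math.abs(r.expected))
      else diff < r.eps
    }
  }
}
```

With `import TolSyntaxSketch._` in scope, `1.0 ~= 1.005 absTol 0.01` evaluates to true. Operators ending in `=` such as `~=` have the lowest precedence in Scala, so the expression parses as `1.0 ~= (1.005 absTol 0.01)` without extra parentheses.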




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50289898
  
QA tests have started for PR 1425. This patch DID NOT merge cleanly!
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17255/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50290044
  
QA tests have started for PR 1425. This patch DID NOT merge cleanly!
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17257/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50290249
  
QA tests have started for PR 1425. This patch DID NOT merge cleanly!
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17258/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15443033
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -40,27 +41,51 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     // No matter how many runs or iterations we use, we should get one cluster,
     // centered at the mean of the points
 
+<<<<<<< HEAD
--- End diff --

This was not merged cleanly.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50290624
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17261/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15443103
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -40,27 +41,51 @@ class KMeansSuite extends FunSuite with LocalSparkContext {
     // No matter how many runs or iterations we use, we should get one cluster,
     // centered at the mean of the points
 
+<<<<<<< HEAD
--- End diff --

I rebased against master and hit conflicts; they are addressed in the next 
push.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50291865
  
QA results for PR 1425:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  case class VectorWithCompare(x: Vector) extends Ordered[VectorWithCompare] {
  case class CompareDoubleRightSide(
  case class CompareVectorRightSide(
  class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17261/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50292780
  
QA results for PR 1425:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17255/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50293019
  
QA results for PR 1425:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17257/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-27 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50293096
  
@mengxr I resolved all the conflicts after rebasing, and all the unit tests 
pass. Thanks.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50045370
  
@dbtsai As discussed with @srowen , the `%` sign is not helpful because we 
need `0.01%` in many cases and people never use notations like 
`1e-10%`.

Instead of adding more sugar in testing, how about using `absErr`/`absTol` 
and `relErr`/`relTol`? For example,

~~~
assert(a ~== b absErr 0.01)
assert(a ~== b relTol 1e-5)
~~~

I still recommend using `err`/`tol` that switches automatically between 
absolute error and relative error at `1.0`. But it is fine to be explicit.
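
The automatic switch at `1.0` can be read in more than one way. The sketch below takes one interpretation: scale the tolerance by `max(1, |a|, |b|)`, so values below 1.0 in magnitude are compared absolutely and larger values relatively. The names `ToleranceSketch`, `withinAbs`, and `withinTol` are hypothetical and not part of this PR.

```scala
object ToleranceSketch {
  // Plain absolute comparison: |a - b| < eps.
  def withinAbs(a: Double, b: Double, eps: Double): Boolean =
    math.abs(a - b) < eps

  // Hybrid comparison (assumed interpretation of "switch at 1.0"):
  // for magnitudes up to 1.0 the scale is 1.0, giving an absolute check;
  // above that the scale is the larger magnitude, giving a relative check.
  def withinTol(a: Double, b: Double, eps: Double): Boolean = {
    val scale = math.max(1.0, math.max(math.abs(a), math.abs(b)))
    math.abs(a - b) <= eps * scale
  }
}
```

Under this reading, `withinTol(1000.0, 1000.001, 1e-5)` holds while `withinAbs(1000.0, 1000.001, 1e-5)` does not, and values near zero never hit the degenerate case of a relative check against a tiny denominator.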




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15361377
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -146,30 +147,39 @@ class KMeansSuite extends FunSuite with 
LocalSparkContext {
 val center = Vectors.sparse(n, Seq((0, 1.0), (1, 3.0), (2, 4.0)))
 
 var model = KMeans.train(data, k=1, maxIterations=1)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=2)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=1, 
initializationMode=RANDOM)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=1, 
initializationMode=K_MEANS_PARALLEL)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 data.unpersist()
   }
 
   test("k-means|| initialization") {
+
+    case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
--- End diff --

`val` is unnecessary for case classes.
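
A small sketch of why the modifier is redundant: constructor parameters of a case class are already public `val`s. The class names below are hypothetical stand-ins, not the classes from this PR.

```scala
object CaseClassSketch {
  // Explicit `val`: compiles, but the modifier adds nothing.
  case class WithVal(val x: Double)

  // No modifier: x is still a public, immutable field.
  case class WithoutVal(x: Double)
}
```

Both `CaseClassSketch.WithVal(1.5).x` and `CaseClassSketch.WithoutVal(1.5).x` are accessible and return `1.5`.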




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15361389
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/clustering/KMeansSuite.scala ---
@@ -146,30 +147,39 @@ class KMeansSuite extends FunSuite with 
LocalSparkContext {
 val center = Vectors.sparse(n, Seq((0, 1.0), (1, 3.0), (2, 4.0)))
 
 var model = KMeans.train(data, k=1, maxIterations=1)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=2)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=5)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=1, 
initializationMode=RANDOM)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 model = KMeans.train(data, k=1, maxIterations=1, runs=1, 
initializationMode=K_MEANS_PARALLEL)
-assert(model.clusterCenters.head === center)
+assert(model.clusterCenters.head ~== center +- 1E-5)
 
 data.unpersist()
   }
 
   test("k-means|| initialization") {
+
+    case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
+  @Override
+   def compare(that: VectorWithCompare): Int = {
--- End diff --

move this line up?




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15361609
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtilsSuite.scala ---
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.util
+
+import org.apache.spark.mllib.linalg.Vectors
+import org.scalatest.FunSuite
+import org.apache.spark.mllib.util.TestingUtils._
+import org.scalatest.exceptions.TestFailedException
+
+class TestingUtilsSuite extends FunSuite {
+
+  test("Comparing doubles using relative error in percentage.") {
+
+    assert(23.1 ~== 23.52 %+- 2.0)
+    assert(23.1 ~== 22.74 %+- 2.0)
+    assert(23.1 ~= 23.52 %+- 2.0)
+    assert(23.1 ~= 22.74 %+- 2.0)
+    assert(!(23.1 !~= 23.52 %+- 2.0))
+    assert(!(23.1 !~= 22.74 %+- 2.0))
+
+    withClue("Should throw exception with message when test fails.") {
+      intercept[TestFailedException] {
+        // This will throw exception with the following message.
+        // Did not expect 23.1 and 23.52 to be within 2.0% using relative error.
+        assert(23.1 !~== 23.52 %+- 2.0)
+      }
+      intercept[TestFailedException] {
+        // Did not expect 23.1 and 22.74 to be within 2.0% using relative error.
+        assert(23.1 !~== 22.74 %+- 2.0)
+      }
+    }
+
+    assert(23.1 !~== 23.63 %+- 2.0)
+    assert(23.1 !~== 22.34 %+- 2.0)
+    assert(23.1 !~= 23.63 %+- 2.0)
+    assert(23.1 !~= 22.34 %+- 2.0)
+    assert(!(23.1 ~= 23.63 %+- 2.0))
+    assert(!(23.1 ~= 22.34 %+- 2.0))
+
+    withClue("Should throw exception with message when test fails.") {
+      intercept[TestFailedException] {
+        // Expected 23.1 and 23.63 to be within 2.0% using relative error.
+        assert(23.1 ~== 23.63 %+- 2.0)
+      }
+      intercept[TestFailedException] {
+        // Expected 23.1 and 22.34 to be within 2.0% using relative error.
+        assert(23.1 ~== 22.34 %+- 2.0)
+      }
+    }
+
+    withClue("Comparing against zero should fail the test and throw exception with message.") {
+      intercept[TestFailedException] {
+        // 0.1 or 0.0 is extremely close to zero, so the relative error is meaningless.
+        assert(0.1 ~== 0.0 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.1 or 0.0 is extremely close to zero, so the relative error is meaningless.
+        assert(0.1 ~= 0.0 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.1 or 0.0 is extremely close to zero, so the relative error is meaningless.
+        assert(0.1 !~== 0.0 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.1 or 0.0 is extremely close to zero, so the relative error is meaningless.
+        assert(0.1 !~= 0.0 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.0 or 0.1 is extremely close to zero, so the relative error is meaningless.
+        assert(0.0 ~== 0.1 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.0 or 0.1 is extremely close to zero, so the relative error is meaningless.
+        assert(0.0 ~= 0.1 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.0 or 0.1 is extremely close to zero, so the relative error is meaningless.
+        assert(0.0 !~== 0.1 %+- 3.2)
+      }
+      intercept[TestFailedException] {
+        // 0.0 or 0.1 is extremely close to zero, so the relative error is meaningless.
+        assert(0.0 !~= 0.1 %+- 3.2)
+      }
+    }
+
+    // Comparisons of numbers very close to zero.
+    assert(10 * Double.MinPositiveValue ~== 9.5 * Double.MinPositiveValue %+- 1.0)
+    assert(10 * Double.MinPositiveValue !~== 11 * Double.MinPositiveValue %+- 1.0)
+
+    assert(-Double.MinPositiveValue ~== 1.18 * -Double.MinPositiveValue %+- 1.2)
+

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15361772
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtilsSuite.scala ---
@@ -0,0 +1,264 @@

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15361805
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtilsSuite.scala ---
@@ -0,0 +1,264 @@

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50064220
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17129/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50064963
  
@mengxr `%+-` is used as an operator to indicate this is relative error. 
Users can write `assert(a ~== b %+- 1E-10)` for relative error, and `assert(a 
~== b +- 1E-10)` for absolute error. 

As a result, the syntactic sugar would be the same as scalatest for 
absolute error except they use `===` instead of `~==`. 

On the other hand, however, using `absErr`/`relErr` seems to be easier to 
remember. I'm open to both, and it's easy to change.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50068870
  
QA results for PR 1425:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17129/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50075991
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17136/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50079139
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17137/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50081089
  
QA results for PR 1425:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17136/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50081864
  
@mengxr I just rebased against master, and it passes the tests. Depending on 
whether we want to use `absErr`/`relErr`, `+-`/`%+-` or both, I can make further 
modifications. Thanks.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-50083800
  
QA results for PR 1425:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(
case class CompareVectorRightSide(
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17137/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49853321
  
@mengxr Sure, maybe the % syntax isn't helpful. I just mean two different 
operators or methods of some kind. Why bother with these issues instead of 
making two methods?

Yes, choosing 1.0 as the switching point removes the discontinuity. I think 
it will surprise readers to find that, in a test with a series of checks like 
0.1 +- 0.01, 1 +- 0.01, 3 +- 0.01, the last one doesn't mean 
[2.99,3.01], when the first two do in fact mean [0.09,0.11] and [0.99,1.01], 
which matches what one would expect from all these other unit testing 
frameworks.

@dbtsai I think developing two operators is a good solution. If there is a 
separate operator for relative error, you don't need to special-case the 
behavior. Sure, it's meaningless to make a relative error test at 0, but you can 
just warn the caller; it's well-defined what happens. 




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49953475
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17071/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49954126
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17073/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49954543
  
@srowen @mengxr and @dorx

Based on our discussion, I've implemented two different APIs for relative 
error, and absolute error. It makes sense that test writers should know which 
one they need depending on their circumstances. 

Developers now also need to specify the eps explicitly; there is no 
default value, since a default sometimes causes confusion. 

When comparing against zero using relative error, an exception will be 
raised to warn users that it's meaningless.

For relative error in percentage, users can now write 

assert(23.1 ~== 23.52 %+- 2.0)
assert(23.1 ~== 22.74 %+- 2.0)
assert(23.1 ~= 23.52 %+- 2.0)
assert(23.1 ~= 22.74 %+- 2.0)
assert(!(23.1 !~= 23.52 %+- 2.0))
assert(!(23.1 !~= 22.74 %+- 2.0))

// This will throw an exception with the following message.
// Did not expect 23.1 and 23.52 to be within 2.0% using relative error.
assert(23.1 !~== 23.52 %+- 2.0)

// Expected 23.1 and 22.34 to be within 2.0% using relative error.
assert(23.1 ~== 22.34 %+- 2.0)
  
For absolute error, 

assert(17.8 ~== 17.99 +- 0.2)
assert(17.8 ~== 17.61 +- 0.2)
assert(17.8 ~= 17.99 +- 0.2)
assert(17.8 ~= 17.61 +- 0.2)
assert(!(17.8 !~= 17.99 +- 0.2))
assert(!(17.8 !~= 17.61 +- 0.2))

// This will throw an exception with the following message.
// Did not expect 17.8 and 17.99 to be within 0.2 using absolute error.
assert(17.8 !~== 17.99 +- 0.2)
 
// Expected 17.8 and 17.59 to be within 0.2 using absolute error.
assert(17.8 ~== 17.59 +- 0.2)
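
For readers curious how operators like these can be wired together, here is a minimal sketch. It is not the code from this pull request: `ToleranceSketch`, `Tolerance`, and `DoubleOps` are hypothetical names, only the non-throwing `~=` is shown, and the relative bound is assumed to be a percentage of the smaller of the two magnitudes.

```scala
object ToleranceSketch {
  // Right-hand side of a comparison: an expected value plus its tolerance.
  // For relative comparisons `eps` is read as a percentage (an assumption).
  final case class Tolerance(expected: Double, eps: Double, relative: Boolean) {
    def accepts(actual: Double): Boolean = {
      val diff = math.abs(actual - expected)
      if (relative) {
        // A relative bound around zero is meaningless, so refuse to evaluate it.
        require(actual != 0.0 && expected != 0.0,
          "relative error is meaningless when comparing against zero")
        diff < eps / 100.0 * math.min(math.abs(actual), math.abs(expected))
      } else {
        diff < eps
      }
    }
  }

  implicit class DoubleOps(val x: Double) {
    // `b +- eps` builds an absolute tolerance, `b %+- pct` a relative one.
    def +-(eps: Double): Tolerance = Tolerance(x, eps, relative = false)
    def %+-(pct: Double): Tolerance = Tolerance(x, pct, relative = true)
    // Non-throwing comparison; a throwing `~==` could wrap this with a message.
    def ~=(rhs: Tolerance): Boolean = rhs.accepts(x)
  }
}
```

After `import ToleranceSketch._`, an expression such as `23.1 ~= 23.52 %+- 2.0` groups as `23.1 ~= (23.52 %+- 2.0)`, because Scala gives operators that end in `=` (other than `<=`, `>=`, `!=` and those starting with `=`) the lowest precedence.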





[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49956756
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17080/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49959451
  
QA results for PR 1425:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17071/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49960072
  
QA results for PR 1425:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17073/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49962119
  
QA results for PR 1425:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17080/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49962839
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17091/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49964740
  
QA results for PR 1425:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
case class VectorWithCompare(val x: Vector) extends Ordered[VectorWithCompare] {
case class CompareDoubleRightSide(val fun: (Double, Double, Double) => Boolean,
case class CompareVectorRightSide(val fun: (Vector, Vector, Double) => Boolean,
class TestingUtilsSuite extends FunSuite {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17091/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-22 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15216588
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/util/TestingUtils.scala ---
@@ -18,28 +18,90 @@
 package org.apache.spark.mllib.util
 
 import org.apache.spark.mllib.linalg.Vector
+import org.scalatest.exceptions.TestFailedException
 
 object TestingUtils {
 
+  val defaultEpsilon = 1E-10
+
   implicit class DoubleWithAlmostEquals(val x: Double) {
-// An improved version of AlmostEquals would always divide by the larger number.
-// This will avoid the problem of diving by zero.
-def almostEquals(y: Double, epsilon: Double = 1E-10): Boolean = {
-  if(x == y) {
+
+def almostEquals(y: Double, eps: Double = defaultEpsilon): Boolean = {
+  val absX = math.abs(x)
+  val absY = math.abs(y)
+  val diff = math.abs(x - y)
+  if (x == y) {
 true
-  } else if(math.abs(x) > math.abs(y)) {
-math.abs(x - y) / math.abs(x) < epsilon
+  } else if (absX < 1E-15 || absY < 1E-15) {
--- End diff --

See commentary at 
https://issues.apache.org/jira/browse/SPARK-2599?focusedCommentId=14068293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068293
I can see the idea here, but all of these kinds of efforts seem to lead to 
errors or unintuitive behavior. For example, this line means that:

Fails:
expected = 1e-15
actual = 2e-15
eps = 0.1

Passes:
expected = 1e-16
actual = 2e-16
eps = 0.1

Why is 1e-15 special anyway?
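
The discontinuity is easy to reproduce. The sketch below is a hypothetical reconstruction, not the code under review: the quoted diff stops at the `1E-15` check, so the absolute-error fallback in that branch is an assumption, and the relative branch (dividing by the larger magnitude) is borrowed from the removed lines. `ThresholdSketch` is an illustrative name.

```scala
object ThresholdSketch {
  def almostEquals(x: Double, y: Double, eps: Double): Boolean = {
    val absX = math.abs(x)
    val absY = math.abs(y)
    val diff = math.abs(x - y)
    if (x == y) {
      true
    } else if (absX < 1E-15 || absY < 1E-15) {
      // Assumed fallback: plain absolute error when either value is tiny.
      diff < eps
    } else {
      // Relative error against the larger magnitude.
      diff / math.max(absX, absY) < eps
    }
  }
}
```

Both pairs above differ by a factor of two, yet `ThresholdSketch.almostEquals(1e-15, 2e-15, 0.1)` is false while `ThresholdSketch.almostEquals(1e-16, 2e-16, 0.1)` is true, purely because the second pair falls under the hard-coded threshold.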




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-22 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49713754
  
Is it possible to support syntax like `0.3 +- 0.1` for absolute error, and 
`0.3 +- 10%` for relative error? Seems like the kind of crazy thing that Scala 
just might support. Maybe it's a nice way to support both semantics; I think 
relative error semantics have to be a separate method anyway.

Also, there is the method `Math.ulp` 
(http://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#ulp(double)), 
which tells you how big the gap is between floating-point values around a given 
value. How about using this to pick a reasonable absolute error around the 
test's expected value? At least it scales automatically.
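
A minimal sketch of that idea, assuming the tolerance is expressed as a number of ULPs measured at the expected value. `UlpSketch`, `withinUlps`, and `maxUlps` are hypothetical names; `math.ulp` is the Scala counterpart of the Java method linked above.

```scala
object UlpSketch {
  // True when `actual` lies within `maxUlps` floating-point steps of `expected`,
  // with the step size measured at `expected`.
  def withinUlps(expected: Double, actual: Double, maxUlps: Int): Boolean =
    math.abs(actual - expected) <= maxUlps * math.ulp(expected)
}
```

For example, `0.1 + 0.2 == 0.3` is false in double arithmetic, but `UlpSketch.withinUlps(0.3, 0.1 + 0.2, 1)` holds because the two values are one step apart. A complex calculation may legitimately need a much larger `maxUlps`.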




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-22 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49808612
  
@srowen `+- 10%` is not very practical because we usually need `+- 
0.0001%`. For most numerical computation, the switching point is `1.0`: 
above `1.0`, use relative error; otherwise, use absolute error. Because the switch happens at `1.0`, 
this is a smooth change.
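
One way to read this rule is as a single bound that scales with the expected value only once its magnitude exceeds `1.0`. The sketch below is an interpretation, not code from the patch; `HybridSketch` and `close` are hypothetical names, and measuring the magnitude at the expected value is an assumption.

```scala
object HybridSketch {
  // Absolute error below magnitude 1.0, relative error above it. Both rules
  // give the same bound at exactly 1.0, so there is no jump at the switch.
  def close(expected: Double, actual: Double, eps: Double): Boolean =
    math.abs(actual - expected) <= eps * math.max(1.0, math.abs(expected))
}
```

Under this rule `HybridSketch.close(3.0, 3.02, 0.01)` is true, since the bound around 3.0 is 0.03 rather than 0.01, while `HybridSketch.close(0.1, 0.12, 0.01)` is false.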




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49681818
  
@dbtsai The suggestion of using `~==` and `~=` looks good to me. But is 
`!~==` really used somewhere in the tests?




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49682436
  
`!~==` will be used in the tests, since `!(a~==b)` will not work: `(a~==b)` 
does not return false on a mismatch but throws an exception carrying the message. I will 
replace the almostEquals with `~==`. Thanks.
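
The point about negation can be illustrated with plain methods. This is a sketch under assumptions, not the PR's implementation: `ThrowingCompareSketch`, `ComparisonFailed`, `mustBeClose`, and `mustNotBeClose` are hypothetical stand-ins for `~==`, `!~==`, and the exception they raise, and only absolute error is modeled.

```scala
object ThrowingCompareSketch {
  final class ComparisonFailed(message: String) extends RuntimeException(message)

  private def within(x: Double, y: Double, eps: Double): Boolean =
    math.abs(x - y) < eps

  // Analogue of `~==`: returns true, or throws with a descriptive message.
  def mustBeClose(x: Double, y: Double, eps: Double): Boolean = {
    if (!within(x, y, eps)) {
      throw new ComparisonFailed(
        s"Expected $x and $y to be within $eps using absolute error.")
    }
    true
  }

  // Analogue of `!~==`: the negated check needs its own throwing variant.
  def mustNotBeClose(x: Double, y: Double, eps: Double): Boolean = {
    if (within(x, y, eps)) {
      throw new ComparisonFailed(
        s"Did not expect $x and $y to be within $eps using absolute error.")
    }
    true
  }
}
```

Writing `!mustBeClose(17.8, 17.59, 0.2)` never evaluates to true: the call throws before the negation is applied. `mustNotBeClose(17.8, 17.59, 0.2)` is what expresses the intended assertion.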




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread dorx
Github user dorx commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49683184
  
@dbtsai  this is awesome! I actually created a JIRA on this after trying to 
use TestUtils in one of my unit suites, but it looks like you're already taking 
care of it. https://issues.apache.org/jira/browse/SPARK-2599




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49698702
  
@dbtsai I see why `!(a~==b)` doesn't work, but my question was whether `!~==` 
is needed: it is not used in our tests except in its own unit tests.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49318202
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16780/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49321741
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16781/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49137261
  
@dbtsai The assertions with `===` were all tested to work, but I agree it 
is more robust to allow numerical errors. One downside of this change is that 
`===` reports the values in comparison when something is wrong but now 
`almostEquals` only returns true/false. It would be great if we can make the 
implementation similar to `===`.

Btw, Scalatest 2.x has this tolerance feature, where you can use `+-` to 
indicate a range. We are not using Scalatest 2.x but it is a useful feature.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r14988046
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
 ---
@@ -20,8 +20,20 @@ package org.apache.spark.mllib.evaluation
 import org.scalatest.FunSuite
 
 import org.apache.spark.mllib.util.LocalSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
 
 class BinaryClassificationMetricsSuite extends FunSuite with LocalSparkContext {
+
+  implicit class SeqDoubleWithAlmostEquals(val x: Seq[Double]) {
+def almostEquals(y: Seq[Double], eps: Double = 1E-6): Boolean =
--- End diff --

1.0e-6 is way bigger than an ulp for a double; 1.0e-12 is more like it. I 
understand a complex calculation might legitimately vary by significantly more 
than an ulp depending on the implementation. As @mengxr says where you mean to 
allow significantly more than machine precision worth of noise, that's probably 
good to do with an explicitly larger epsilon. But this is certainly a good step 
forward already.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15013544
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
 ---
@@ -81,9 +82,8 @@ class LogisticRegressionSuite extends FunSuite with 
LocalSparkContext with Match
 val model = lr.run(testRDD)
 
 // Test the weights
-val weight0 = model.weights(0)
-assert(weight0 >= -1.60 && weight0 <= -1.40, weight0 + " not in [-1.6, -1.4]")
-assert(model.intercept >= 1.9 && model.intercept <= 2.1, model.intercept + " not in [1.9, 2.1]")
+assert(model.weights(0).almostEquals(-1.5244128696247), "weight0 should be -1.5244128696247")
--- End diff --

We can use a higher relative error here instead. If the implementation 
changes, it's also nice to have a test that can catch the slightly different 
behavior. Also, updating those numbers will not take much time compared 
with the implementation work.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/1425#discussion_r15013786
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetricsSuite.scala
 ---
@@ -20,8 +20,20 @@ package org.apache.spark.mllib.evaluation
 import org.scalatest.FunSuite
 
 import org.apache.spark.mllib.util.LocalSparkContext
+import org.apache.spark.mllib.util.TestingUtils._
 
 class BinaryClassificationMetricsSuite extends FunSuite with LocalSparkContext {
+
+  implicit class SeqDoubleWithAlmostEquals(val x: Seq[Double]) {
+def almostEquals(y: Seq[Double], eps: Double = 1E-6): Boolean =
--- End diff --

Yeah, for one ulp, it might be 10e-15. Often I type the numbers manually 
or just copy the first few digits to save line space, 
which is why I chose 1.0e-6; that way I only need to type about 7 digits. 

I agree with you that in this case, we may want to explicitly specify a 
larger epsilon.




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49221370
  
@mengxr  Scalatest 2.x has the tolerance feature, but it uses absolute error, 
not relative error. For large numbers, the absolute error may not be 
meaningful. With `===`, the comparison returns false even if the difference is only one 
unit of least precision (ULP), which often happens when running the unit tests 
on different machine architectures. For example, ARM and x86 may 
round differently, and we don't run any tests on anything other than x86. C++ 
Boost has its own numerical equality test based on relative error for this reason.

I can probably add `~=` and `~==` methods for the `Double` and 
`Vector` types using implicit classes, and `~==` will raise an exception for the 
message purpose like `===` does.
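
To make the large-number argument concrete, the sketch below compares two adjacent doubles near 1e20. The names are hypothetical, and the relative rule (a fraction of the smaller magnitude) is an assumption made for illustration rather than the definition used in the patch.

```scala
object LargeValueSketch {
  val a: Double = 1e20
  // The adjacent representable double above `a`, exactly one ULP away.
  val b: Double = java.lang.Math.nextUp(a)

  def absClose(x: Double, y: Double, eps: Double): Boolean =
    math.abs(x - y) < eps

  // `eps` is a fraction here, not a percentage.
  def relClose(x: Double, y: Double, eps: Double): Boolean =
    math.abs(x - y) < eps * math.min(math.abs(x), math.abs(y))
}
```

`a` and `b` are as close as two distinct doubles can be at that magnitude, yet they are more than ten thousand apart, so `absClose(a, b, 1e-10)` is false while `relClose(a, b, 1e-10)` is true.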


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49221957
  
`almostEquals` reads better than `~===`. The feature we like is having the 
compared values in the error message, not the name :)




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49222983
  
I learned `almostEquals` from the Boost library. Anyway, in this case, how do 
we distinguish the version that throws with a message from the one that just 
returns true/false?

`almostEquals` and `almostEqualsWithMessage`? 




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49252992
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16763/consoleFull




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-16 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49253108
  
@mengxr and @srowen What do you think of `assert((0.0001 !~== 0.0) +- 
1E-5)`? We have `~==` and `!~==`, which include the error message, in the 
latest commit from my co-worker.
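
One way such a syntax could be wired up is sketched below. This is a guess at the shape, not the code from the commit mentioned above: `PlusMinusSyntaxSketch`, `PendingComparison`, and the choice to interpret the tolerance as a relative error are all assumptions made for illustration.

```scala
object PlusMinusSyntaxSketch {
  private def relativeError(x: Double, y: Double): Double =
    if (x == y) 0.0
    else math.abs(x - y) / math.max(math.abs(x), math.abs(y))

  // Hypothetical intermediate value: holds both operands until `+-`
  // supplies the tolerance, then evaluates or throws with a message.
  final class PendingComparison(x: Double, y: Double, expectEqual: Boolean) {
    def +-(eps: Double): Boolean = {
      val close = relativeError(x, y) < eps
      if (close != expectEqual) {
        val relation = if (expectEqual) "to be" else "not to be"
        throw new AssertionError(
          s"Expected $x and $y $relation within relative tolerance $eps.")
      }
      true
    }
  }

  implicit class DoubleWithPendingCompare(val x: Double) {
    def ~==(y: Double): PendingComparison =
      new PendingComparison(x, y, expectEqual = true)

    def !~==(y: Double): PendingComparison =
      new PendingComparison(x, y, expectEqual = false)
  }

  def main(args: Array[String]): Unit = {
    assert((0.0001 !~== 0.0) +- 1E-5)
    assert((1.0 ~== 1.0000001) +- 1E-5)
  }
}
```

The parentheses around the comparison matter: operators ending in `=` such as `~==` and `!~==` get Scala's lowest infix precedence, so without them `+-` would bind to the right-hand operand first.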




[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-15 Thread dbtsai
GitHub user dbtsai opened a pull request:

https://github.com/apache/spark/pull/1425

[SPARK-2479][MLlib] Comparing floating-point numbers using relative error 
in UnitTests

Floating point math is not exact, and most floating-point numbers end up 
being slightly imprecise due to rounding errors. Simple values like 0.1 cannot 
be precisely represented using binary floating point numbers, and the limited 
precision of floating point numbers means that slight changes in the order of 
operations or the precision of intermediates can change the result. That means 
that comparing two floats to see if they are equal is usually not what we want. 
As long as this imprecision stays small, it can usually be ignored.
See the following famous article for details:

http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
For example:
float a = 0.15 + 0.15
float b = 0.1 + 0.2
if(a == b) // can be false!
if(a >= b) // can also be false!
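
The same effect can be reproduced in Scala with `Double` values. The sketch below is only an illustration; `FloatingPointEqualityDemo` and the `closeEnough` helper (with its `relTol` default) are hypothetical names, not part of this patch.

```scala
object FloatingPointEqualityDemo {
  val a: Double = 0.15 + 0.15 // rounds to the same double as the literal 0.3
  val b: Double = 0.1 + 0.2   // rounds to 0.30000000000000004

  // Hypothetical helper: passes when the difference is small relative to
  // the larger magnitude; exact equality is accepted up front.
  def closeEnough(x: Double, y: Double, relTol: Double = 1e-9): Boolean =
    x == y || math.abs(x - y) <= relTol * math.max(math.abs(x), math.abs(y))

  def main(args: Array[String]): Unit = {
    println(a == b)            // prints "false"
    println(a >= b)            // prints "false"
    println(closeEnough(a, b)) // prints "true"
  }
}
```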

(P.S. Not all of the tests involving floating-point comparisons have been 
changed to use almostEquals.) 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/AlpineNow/spark SPARK-2479_comparing_floating_point

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1425.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1425


commit f4da8f4f8693763b4823e36e3d270b74a7ce67bf
Author: DB Tsai dbt...@alpinenow.com
Date:   2014-07-14T23:24:11Z

Alpine Data Labs






[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1425#issuecomment-49114541
  
QA tests have started for PR 1425. This patch merges cleanly.
View progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16698/consoleFull

