[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-08-22 Thread bavardage
Github user bavardage commented on the issue:

https://github.com/apache/spark/pull/21794
  
yep fair - the intent, I think, was clarity rather than necessarily perf: 
it's misleading to have a method named 'nan safe' that has no special handling 
of NaNs. I'll look at opening a different PR that could increase clarity and 
may have a minor perf benefit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-08-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21794
  
I think we'd have to close this due to the behavior change, but would merge 
an optimization of the existing behavior.


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-18 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21794
  
`spark-sql` suggests that -0 and 0 are considered the same though. `SELECT 
-0.0 == 0.0;` returns `true`. It's probably essential not to change behavior 
here, but if performance is the issue, I think the method can be optimized.


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-18 Thread bavardage
Github user bavardage commented on the issue:

https://github.com/apache/spark/pull/21794
  
it does seem that Spark currently does distinguish -0 and 0, at least as 
far as groupBys go

```
scala> case class Thing(x : Float)
defined class Thing

scala> val df = Seq(Thing(0.0f), Thing(-0.0f),Thing(0.0f), 
Thing(-0.0f),Thing(0.0f), Thing(-0.0f),Thing(0.0f), Thing(-0.0f)).toDF
2018-07-17 13:17:08 WARN  ObjectStore:568 - Failed to get database 
global_temp, returning NoSuchObjectException
df: org.apache.spark.sql.DataFrame = [x: float]

scala> df.groupBy($"x").count
res0: org.apache.spark.sql.DataFrame = [x: float, count: bigint]

scala> res0.collect
res1: Array[org.apache.spark.sql.Row] = Array([-0.0,4], [0.0,4])
```

doubles are hashed via `doubleToLongBits` 
(https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L338), 
which gives different bitwise representations for positive and negative zero
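As a quick illustration (plain Scala against the JDK only, outside Spark), the two zeros do map to different bit patterns, which is why a hash built on `doubleToLongBits` keeps them in separate groups:

```scala
// 0.0 and -0.0 differ only in the sign bit, and doubleToLongBits
// preserves that bit, so the two values hash differently.
val bitsPos = java.lang.Double.doubleToLongBits(0.0d)  // all bits zero
val bitsNeg = java.lang.Double.doubleToLongBits(-0.0d) // sign bit set
println(bitsPos == bitsNeg)                  // false
println(java.lang.Long.toHexString(bitsNeg)) // 8000000000000000
```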


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21794
  
BTW if it becomes necessary to not change the semantics, I think the 
methods could at least be streamlined a bit:

```
if (x < y) {
  -1
} else if (x > y) {
  1
} else if (x == y) {
  0
} else if (java.lang.Double.isNaN(x)) {
  if (java.lang.Double.isNaN(y)) 0 else 1
} else {
  -1
}
```

More tests can't hurt, too.
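For reference, here is a minimal standalone sketch of that streamlined shape (the method name is hypothetical, not Spark's actual `Utils` code): it matches `java.lang.Double.compare` for NaN ordering but treats `-0.0` and `0.0` as equal.

```scala
// Hypothetical sketch of the streamlined comparator suggested above.
// NaN sorts greater than everything and equal to itself;
// -0.0 vs 0.0 returns 0, unlike java.lang.Double.compare.
def nanSafeCompare(x: Double, y: Double): Int = {
  if (x < y) -1
  else if (x > y) 1
  else if (x == y) 0 // IEEE equality: also equates -0.0 and 0.0
  else if (java.lang.Double.isNaN(x)) {
    if (java.lang.Double.isNaN(y)) 0 else 1
  } else -1
}

println(nanSafeCompare(-0.0, 0.0))       // 0
println(nanSafeCompare(Double.NaN, 1.0)) // 1
```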


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21794
  
It would be good to add test cases for these since they are not covered now.


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21794
  
I think this behavior is required, since SparkSQL does not distinguish 0.0 from -0.0. 
Am I correct?
cc @gatorsmile @maropu



---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread bavardage
Github user bavardage commented on the issue:

https://github.com/apache/spark/pull/21794
  
There is a way in which the `nanSafeCompare*` methods do differ from the Java 
built-ins, but it wasn't captured in the test cases (or suggested by the 
methods' naming): the handling of `-0.0` vs `0.0`.

`java.lang.Double.compare(-0.0d, 0.0d) == -1`
`Utils.nanSafeCompareDoubles(-0.0d, 0.0d) == 0`

If that behaviour is indeed required, I could abandon this PR and instead 
fix the test cases to capture this.
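The divergence is easy to reproduce in plain Scala without Spark (the `Utils` method itself is Spark-internal, so only the JDK side is shown here):

```scala
// java.lang.Double.compare uses a total order in which -0.0 sorts
// before 0.0, while primitive == follows IEEE 754 and equates them.
println(java.lang.Double.compare(-0.0d, 0.0d) < 0) // true: -0.0 sorts first
println(-0.0d == 0.0d)                             // true: IEEE equality
// NaN differs between the two as well: compare puts it last,
// while == involving NaN is never true.
println(java.lang.Double.compare(Double.NaN, Double.PositiveInfinity) > 0) // true
println(Double.NaN == Double.NaN)                  // false
```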


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread bavardage
Github user bavardage commented on the issue:

https://github.com/apache/spark/pull/21794
  
cc @JoshRosen who introduced this code originally, I think


---




[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...

2018-07-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21794
  
Can one of the admins verify this patch?


---



