[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-07 Thread Dongjoon Hyun (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-21344:
--
Affects Version/s: 2.0.0

> BinaryType comparison does signed byte array comparison
> ---
>
> Key: SPARK-21344
> URL: https://issues.apache.org/jira/browse/SPARK-21344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Shubham Chopra
>
> BinaryType used by Spark SQL defines its ordering using signed byte comparisons, so any byte at or above 0x80 sorts as negative. Range predicates whose encoded values cross that sign boundary therefore match the wrong rows. The following code snippet reproduces the error:
> {code}
> case class TestRecord(col0: Array[Byte])
> def convertToBytes(i: Long): Array[Byte] = {
>   val bb = java.nio.ByteBuffer.allocate(8)
>   bb.putLong(i)
>   bb.array
> }
> def test = {
>   val sql = spark.sqlContext
>   import sql.implicits._
>   val timestamp = 1498772083037L
>   val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
>   val testDF = sc.parallelize(data).toDF
>   val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
>   val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
>   val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))
>   assert(filter1.count == 50)
>   assert(filter2.count == 50)
>   assert(filter3.count == 100)
> }
> {code}
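The ordering problem can be seen without Spark at all: a lexicographic comparison that treats each byte as signed inverts the order of any pair that straddles the 0x7f/0x80 boundary. A minimal, Spark-free sketch of the two orderings (the comparator names here are illustrative, not Spark APIs):

```scala
// Lexicographic comparison treating each byte as SIGNED (-128..127).
// This mirrors the buggy ordering described in the ticket.
def signedCompare(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  while (i < len) {
    if (a(i) != b(i)) return a(i) - b(i) // signed byte subtraction
    i += 1
  }
  a.length - b.length // shorter prefix sorts first
}

// Lexicographic comparison treating each byte as UNSIGNED (0..255),
// the ordering one expects for encoded keys such as big-endian timestamps.
def unsignedCompare(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  while (i < len) {
    val cmp = (a(i) & 0xff) - (b(i) & 0xff) // mask to get unsigned value
    if (cmp != 0) return cmp
    i += 1
  }
  a.length - b.length
}

val lo = Array[Byte](0x7f)        // 127 in both views
val hi = Array[Byte](0x80.toByte) // 128 unsigned, -128 signed

assert(unsignedCompare(lo, hi) < 0) // 0x7f < 0x80: the expected order
assert(signedCompare(lo, hi) > 0)   // signed view: 127 > -128, order flipped
```

As soon as one byte of the encoded value crosses 0x7f, the signed ordering disagrees with the unsigned one, which is how the range filters in the snippet above can return wrong counts.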



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-07 Thread Dongjoon Hyun (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-21344:
--
Priority: Major  (was: Blocker)




[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-07 Thread Shubham Chopra (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chopra updated SPARK-21344:
---
Priority: Blocker  (was: Major)




[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-07 Thread Shubham Chopra (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chopra updated SPARK-21344:
---
Description: 
BinaryType used by Spark SQL defines ordering using signed byte comparisons. 
This can lead to unexpected behavior. Consider the following code snippet that 
shows this error:

{code}
case class TestRecord(col0: Array[Byte])
def convertToBytes(i: Long): Array[Byte] = {
  val bb = java.nio.ByteBuffer.allocate(8)
  bb.putLong(i)
  bb.array
}
def test = {
  val sql = spark.sqlContext
  import sql.implicits._
  val timestamp = 1498772083037L
  val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
  val testDF = sc.parallelize(data).toDF
  val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
  val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
  val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))
  assert(filter1.count == 50)
  assert(filter2.count == 50)
  assert(filter3.count == 100)
}
{code}








