[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-21344:
----------------------------------
    Affects Version/s: 2.0.0

> BinaryType comparison does signed byte array comparison
> -------------------------------------------------------
>
>                 Key: SPARK-21344
>                 URL: https://issues.apache.org/jira/browse/SPARK-21344
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.1
>            Reporter: Shubham Chopra
>
> BinaryType used by Spark SQL defines ordering using signed byte comparisons.
> This can lead to unexpected behavior. Consider the following code snippet
> that shows this error:
> {code}
> case class TestRecord(col0: Array[Byte])
>
> def convertToBytes(i: Long): Array[Byte] = {
>   val bb = java.nio.ByteBuffer.allocate(8)
>   bb.putLong(i)
>   bb.array
> }
>
> def test = {
>   val sql = spark.sqlContext
>   import sql.implicits._
>   val timestamp = 1498772083037L
>   val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
>   val testDF = sc.parallelize(data).toDF
>   val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 50L))
>   val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) && col("col0") < convertToBytes(timestamp + 100L))
>   val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && col("col0") < convertToBytes(timestamp + 100L))
>   assert(filter1.count == 50)
>   assert(filter2.count == 50)
>   assert(filter3.count == 100)
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
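The root cause can be reproduced without Spark at all: on the JVM, `Byte` is signed, so any lexicographic ordering built on its natural comparison treats bytes 0x80 through 0xFF as negative and sorts them before 0x00. A minimal sketch in plain Scala (the `unsigned` helper is illustrative, not part of Spark):

```scala
// 0x80 as a JVM byte is -128, so a signed compare puts it *before* 0x00,
// while the expected unsigned byte order puts 0x80 (= 128) *after* 0x00.
object SignedByteDemo extends App {
  val lo: Byte = 0x00.toByte
  val hi: Byte = 0x80.toByte

  // Signed view: what an ordering built on Byte's natural comparison sees.
  assert(hi < lo) // -128 < 0

  // Unsigned view: the lexicographic byte order most systems that compare
  // binary keys (e.g. memcmp-style comparators) expect.
  def unsigned(b: Byte): Int = b & 0xFF
  assert(unsigned(hi) > unsigned(lo)) // 128 > 0

  println("signed vs unsigned byte ordering disagree, as expected")
}
```

This is why the filters in the snippet above drop rows: once an incremented timestamp flips a byte past 0x7F, the signed comparison places it on the wrong side of the range boundary.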
[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-21344:
----------------------------------
    Priority: Major  (was: Blocker)

(Quoted issue description and code snippet unchanged from the first message above.)
[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chopra updated SPARK-21344:
-----------------------------------
    Priority: Blocker  (was: Major)

(Quoted issue description and code snippet unchanged from the first message above.)
[jira] [Updated] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chopra updated SPARK-21344:
-----------------------------------
    Description: edited to re-tag the code block from {code:scala} to {code}; the description text and code snippet are otherwise unchanged from the first message above.