Shubham Chopra created SPARK-21344:
--------------------------------------

             Summary: BinaryType comparison does signed byte array comparison
                 Key: SPARK-21344
                 URL: https://issues.apache.org/jira/browse/SPARK-21344
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: Shubham Chopra

The BinaryType used by Spark SQL defines its ordering with signed byte comparisons, so bytes 0x80 to 0xFF are treated as negative and sort before 0x00 to 0x7F. This can lead to unexpected results in range filters over binary columns. The following snippet, run in spark-shell, shows the error:

{code:scala}
import org.apache.spark.sql.functions.col

case class TestRecord(col0: Array[Byte])

// Encode a Long as 8 big-endian bytes.
def convertToBytes(i: Long): Array[Byte] = {
  val bb = java.nio.ByteBuffer.allocate(8)
  bb.putLong(i)
  bb.array
}

def test = {
  val sql = spark.sqlContext
  import sql.implicits._

  val timestamp = 1498772083037L
  val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
  val testDF = sc.parallelize(data).toDF

  val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
    col("col0") < convertToBytes(timestamp + 50L))
  val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) &&
    col("col0") < convertToBytes(timestamp + 100L))
  val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
    col("col0") < convertToBytes(timestamp + 100L))

  // Expected counts if the byte arrays were ordered as unsigned bytes.
  assert(filter1.count == 50)
  assert(filter2.count == 50)
  assert(filter3.count == 100)
}
{code}
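
The effect described in this report can be reproduced without Spark. The following is a minimal sketch, not Spark's actual implementation: `compareSigned`, `compareUnsigned`, and `countInRange` are hypothetical helpers written for illustration, and only `convertToBytes` mirrors the report. It contrasts a lexicographic comparison on signed bytes with one that masks each byte to the range 0 to 255.

```scala
import java.nio.ByteBuffer

object ByteOrderingSketch {
  // Big-endian encoding of a Long, mirroring convertToBytes in the report.
  def convertToBytes(i: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(i).array

  // Lexicographic comparison treating each byte as signed (-128..127).
  def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = java.lang.Byte.compare(a(i), b(i))
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Lexicographic comparison treating each byte as unsigned (0..255).
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Hypothetical stand-in for the report's range filters: counts encoded
  // values in [base, base + 1000] that satisfy from <= x < until under cmp.
  def countInRange(cmp: (Array[Byte], Array[Byte]) => Int,
                   base: Long, from: Long, until: Long): Int = {
    val lo = convertToBytes(from)
    val hi = convertToBytes(until)
    (base to base + 1000L).count { i =>
      val x = convertToBytes(i)
      cmp(x, lo) >= 0 && cmp(x, hi) < 0
    }
  }
}
```

With the report's `timestamp`, the last encoded byte is 0x5D, so it crosses from 0x7F to 0x80 between `timestamp + 34` and `timestamp + 35`. Under `compareSigned` the larger value sorts first at that boundary, which is why a range filter spanning it can miss rows; `compareUnsigned` keeps the byte order consistent with the numeric order for non-negative longs.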