[jira] [Created] (SPARK-21344) BinaryType comparison does signed byte array comparison

Shubham Chopra (JIRA) Fri, 07 Jul 2017 13:16:57 -0700

Shubham Chopra created SPARK-21344:
--------------------------------------

             Summary: BinaryType comparison does signed byte array comparison
                 Key: SPARK-21344
                 URL: https://issues.apache.org/jira/browse/SPARK-21344
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: Shubham Chopra



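BinaryType, as used by Spark SQL, defines its ordering with signed byte 
comparisons, so any byte with the high bit set (0x80 to 0xFF) sorts before 
bytes in the range 0x00 to 0x7F. This leads to unexpected results, for example 
in range filters over big-endian encoded longs. The following code snippet 
demonstrates the problem: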

{code:scala}
import org.apache.spark.sql.functions.col

case class TestRecord(col0: Array[Byte])

// Encode a Long as 8 big-endian bytes.
def convertToBytes(i: Long): Array[Byte] = {
  val bb = java.nio.ByteBuffer.allocate(8)
  bb.putLong(i)
  bb.array
}

def test(): Unit = {
  val sql = spark.sqlContext
  import sql.implicits._
  val timestamp = 1498772083037L
  val data = (timestamp to timestamp + 1000L).map(i => TestRecord(convertToBytes(i)))
  val testDF = sc.parallelize(data).toDF
  val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
    col("col0") < convertToBytes(timestamp + 50L))
  val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 50L) &&
    col("col0") < convertToBytes(timestamp + 100L))
  val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) &&
    col("col0") < convertToBytes(timestamp + 100L))
  // Counts expected if the bytes were compared as unsigned values.
  assert(filter1.count == 50)
  assert(filter2.count == 50)
  assert(filter3.count == 100)
}
{code}
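
The effect can also be seen without Spark. The following is a minimal sketch, 
not Spark's actual implementation: compareSigned and compareUnsigned are 
hypothetical helpers written only for this illustration, and convertToBytes 
mirrors the helper from the snippet above. It compares the big-endian 
encodings of timestamp and timestamp + 50, whose last bytes are 0x5D and 0x8F 
respectively.

```scala
object ByteOrderingSketch {
  // Hypothetical helper: lexicographic comparison with each byte read as signed (-128..127).
  def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = java.lang.Byte.compare(a(i), b(i))
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Hypothetical helper: lexicographic comparison with each byte read as unsigned (0..255).
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Same encoding as in the report: a Long as 8 big-endian bytes.
  def convertToBytes(i: Long): Array[Byte] =
    java.nio.ByteBuffer.allocate(8).putLong(i).array

  def main(args: Array[String]): Unit = {
    val timestamp = 1498772083037L
    val lower = convertToBytes(timestamp)       // ends in 0x5D
    val upper = convertToBytes(timestamp + 50L) // ends in 0x8F, negative when read as signed

    // Signed comparison places the numerically smaller value after the larger one.
    println(s"signed:   lower < upper = ${compareSigned(lower, upper) < 0}")
    // Unsigned comparison agrees with the numeric order of the longs.
    println(s"unsigned: lower < upper = ${compareUnsigned(lower, upper) < 0}")
  }
}
```

Under the signed comparison the range [timestamp, timestamp + 50) is empty, 
because its lower bound sorts after its upper bound, which is consistent with 
the failing counts in the test above.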






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

