[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

JoshRosen Wed, 01 Jul 2015 23:27:30 -0700

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7179#discussion_r33750029
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
    @@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
     
         futures.foreach(Await.result(_, 10.seconds))
       }
    +
    +  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
    +  // contain two columns of that type, then for pairs of randomly-generated rows we check that
    +  // GenerateOrdering agrees with RowOrdering.
    +  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
    +    test(s"GenerateOrdering with $dataType") {
    +      val rowOrdering = RowOrdering.forSchema(Seq(dataType, dataType))
    +      val genOrdering = GenerateOrdering.generate(
    +        BoundReference(0, dataType, nullable = true).asc ::
    +        BoundReference(1, dataType, nullable = true).asc :: Nil)
    +      val rowType = StructType(
    +        StructField("a", dataType, nullable = true) ::
    +        StructField("b", dataType, nullable = true) :: Nil)
    +      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
    +      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
    +      def isIncomparable(v: Any): Boolean = v match {
    --- End diff --
    
    Given that we might use sorting for clustering as part of a sort-based
    distinct operator, I wonder whether this has any bad implications for
    performing distinct on columns that contain NaN. Should we warn about this
    undefined behavior somewhere in our documentation?
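
To make the concern concrete, here is a minimal sketch in plain Scala. It is not Spark's code: `naiveCompare` and `dedupAdjacent` are hypothetical helpers, and the "distinct" step simply assumes the input has already been clustered by a sort. It shows that a comparator built from the primitive `<` and `>` operators reports NaN as equal to every other value, so an adjacent-duplicate pass can silently drop it, whereas a total order such as `java.lang.Double.compare` (which places NaN above all other doubles) keeps it.

```scala
object NaNDistinctSketch {
  // Hypothetical comparator built only from primitive `<` and `>`.
  // Every comparison involving NaN is false, so NaN falls through to 0 ("equal").
  def naiveCompare(a: Double, b: Double): Int =
    if (a < b) -1 else if (a > b) 1 else 0

  // A total order over doubles: java.lang.Double.compare treats NaN as
  // greater than every other value and equal to itself.
  def totalCompare(a: Double, b: Double): Int =
    java.lang.Double.compare(a, b)

  // Hypothetical sort-based distinct step: assumes `clustered` is already
  // ordered, and drops an element when the comparator says it equals the
  // previously kept one.
  def dedupAdjacent(clustered: Seq[Double], cmp: (Double, Double) => Int): Vector[Double] =
    clustered.foldLeft(Vector.empty[Double]) { (kept, x) =>
      if (kept.nonEmpty && cmp(kept.last, x) == 0) kept else kept :+ x
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(1.0, Double.NaN, 2.0)
    // With the naive comparator, NaN "equals" 1.0 and is dropped.
    println(dedupAdjacent(input, naiveCompare))
    // With the total order, all three distinct values survive.
    println(dedupAdjacent(input, totalCompare))
  }
}
```

The naive comparator is also not transitive (1.0 "equals" NaN and NaN "equals" 2.0, yet 1.0 is less than 2.0), which is why sorting with it has no well-defined result.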


