GitHub user rednaxelafx commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20757#discussion_r173006281
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
    @@ -1408,11 +1409,37 @@ case class ValidateExternalType(child: Expression, expected: DataType)
     
       override def dataType: DataType = RowEncoder.externalDataTypeForInput(expected)
     
    -  override def eval(input: InternalRow): Any =
    -    throw new UnsupportedOperationException("Only code-generated evaluation is supported")
    -
       private val errMsg = s" is not a valid external type for schema of ${expected.simpleString}"
     
    +  private lazy val checkType = expected match {
    +    case _: DecimalType =>
    +      (value: Any) => {
    +        Seq(classOf[java.math.BigDecimal], classOf[scala.math.BigDecimal], classOf[Decimal])
    +          .exists { x => value.getClass.isAssignableFrom(x) }
    +      }
    +    case _: ArrayType =>
    +      (value: Any) => {
    +        value.getClass.isAssignableFrom(classOf[Seq[_]]) || value.getClass.isArray
    --- End diff --
    
    Hi guys, sorry I'm late.
    
    In your new code you're doing:
    ```diff
    +    case _: ArrayType =>
    +      (value: Any) => {
    +        value.getClass.isArray || value.isInstanceOf[Seq[_]]
    +      }
    ```
    which is good. The `xxx.getClass().isAssignableFrom(some_class_literal)` check in the
    old version of this PR was actually backwards; it should have been
    `some_class_literal.isAssignableFrom(xxx.getClass())`, e.g.
    ```
    scala> classOf[String].isAssignableFrom(classOf[Object])
    res0: Boolean = false
    
    scala> classOf[Object].isAssignableFrom(classOf[String])
    res1: Boolean = true
    ```
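    and the corrected form is semantically the same as `xxx.isInstanceOf[some_class]`
    for a non-null `xxx` (on `null`, `isInstanceOf[]` returns false whereas
    `xxx.getClass()` throws a `NullPointerException`). `isInstanceOf[]` is guaranteed
    to be at least as fast as `some_class_literal.isAssignableFrom(xxx.getClass())`,
    and in general `isInstanceOf[]` is faster.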
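
    To make the direction concrete against a real value, here is a minimal sketch.
    The object and method names are made up for illustration and are not from the
    PR; only the `isAssignableFrom` / `isInstanceOf` calls mirror the code under
    review.
    ```scala
    // Hypothetical helper names, for illustration only.
    object AssignableDirection {
      // Old, backwards form: true only when the value's runtime class is a
      // supertype of Seq (e.g. java.lang.Object), which is not what we want.
      def backwards(value: Any): Boolean =
        value.getClass.isAssignableFrom(classOf[Seq[_]])

      // Corrected form: true when the value's runtime class is a subtype of Seq.
      def corrected(value: Any): Boolean =
        classOf[Seq[_]].isAssignableFrom(value.getClass)

      // Same answer as `corrected` for any non-null value.
      def viaInstanceOf(value: Any): Boolean =
        value.isInstanceOf[Seq[_]]
    }
    ```
    With these, `backwards(List(1, 2, 3))` is false while `backwards(new Object)`
    is true; `corrected` and `viaInstanceOf` give the opposite answers for both
    inputs.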
    
    `xxx.getClass().isArray()` has a fixed overhead, whereas `isInstanceOf[]`
    can take a fast path that is slightly faster than `isArray` or a slow path that
    can be much slower than `isArray`. So putting the `isArray` check first, as your
    new code does, makes more sense to me.
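
    As a standalone sketch of that ordering (the `ArrayLikeCheck` object and the
    `isArrayLike` name are hypothetical, and the null guard is my own assumption
    rather than something in the PR):
    ```scala
    // Hypothetical wrapper around the check from the new revision of the PR.
    object ArrayLikeCheck {
      def isArrayLike(value: Any): Boolean =
        // Null guard is an added assumption: getClass on null would throw.
        value != null &&
          // Fixed-cost isArray check first, then the isInstanceOf check.
          (value.getClass.isArray || value.isInstanceOf[Seq[_]])
    }
    ```
    Here `isArrayLike(Array(1, 2, 3))` and `isArrayLike(List(1, 2, 3))` are both
    true, and `isArrayLike("abc")` is false.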

