Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/12777#discussion_r61565675
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -56,29 +55,35 @@ import org.apache.spark.sql.sources._
  * known to be convertible.
  */
 private[orc] object OrcFilters extends Logging {
-  def createFilter(filters: Array[Filter]): Option[SearchArgument] = {
+  def createFilter(schema: StructType, filters: Array[Filter]): Option[SearchArgument] = {
+    val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
+
     // First, tries to convert each filter individually to see whether it's convertible, and then
     // collect all convertible ones to build the final `SearchArgument`.
     val convertibleFilters = for {
       filter <- filters
-      _ <- buildSearchArgument(filter, SearchArgumentFactory.newBuilder())
+      _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder())
     } yield filter

     for {
       // Combines all convertible filters using `And` to produce a single conjunction
       conjunction <- convertibleFilters.reduceOption(And)
       // Then tries to build a single ORC `SearchArgument` for the conjunction predicate
-      builder <- buildSearchArgument(conjunction, SearchArgumentFactory.newBuilder())
+      builder <- buildSearchArgument(dataTypeMap, conjunction, SearchArgumentFactory.newBuilder())
     } yield builder.build()
   }

-  private def buildSearchArgument(expression: Filter, builder: Builder): Option[Builder] = {
+  private def buildSearchArgument(
+      dataTypeMap: Map[String, DataType],
+      expression: Filter,
+      builder: Builder): Option[Builder] = {
     def newBuilder = SearchArgumentFactory.newBuilder()

-    def isSearchableLiteral(value: Any): Boolean = value match {
-      // These are types recognized by the `SearchArgumentImpl.BuilderImpl.boxLiteral()` method.
-      case _: String | _: Long | _: Double | _: Byte | _: Short | _: Integer | _: Float => true
-      case _: DateWritable | _: HiveDecimal | _: HiveChar | _: HiveVarchar => true
+    def isSearchableType(dataType: DataType): Boolean = dataType match {
+      // Only the values in the Spark types below can be recognized by
+      // the `SearchArgumentImpl.BuilderImpl.boxLiteral()` method.
+      case ByteType | ShortType | FloatType | DoubleType => true
+      case IntegerType | LongType | StringType => true
--- End diff ---
Note to myself: this should be okay because
`CatalystTypeConverters.createToScalaConverter()` is called in
`DataSourceStrategy` for the filter values. I checked all the cases and found
none where a value is converted to one of `DateWritable`, `HiveDecimal`,
`HiveChar` or `HiveVarchar`.
Since all the values in a `sources.Filter` have already been converted
according to their Catalyst `DataType`, matching on the column's `DataType` is
sufficient. I checked the test code as well; `ParquetFilters` handles this the
same way.
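
To make that concrete, here is a minimal, self-contained sketch of the
schema-driven check (the `PushdownSketch` object and the `searchableFilters`
helper are hypothetical illustrations, not code from this patch):

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan, LessThan}
import org.apache.spark.sql.types._

// Hypothetical illustration only -- not part of the patch.
object PushdownSketch {
  // Same whitelist as in the diff: only these Spark SQL types box into
  // literals that `SearchArgumentImpl.BuilderImpl.boxLiteral()` accepts.
  def isSearchableType(dataType: DataType): Boolean = dataType match {
    case ByteType | ShortType | FloatType | DoubleType => true
    case IntegerType | LongType | StringType => true
    case _ => false
  }

  // Keep only the filters whose referenced column has a searchable type;
  // the rest are left for Spark to evaluate after the ORC scan.
  def searchableFilters(schema: StructType, filters: Array[Filter]): Array[Filter] = {
    val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
    filters.filter {
      case EqualTo(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case LessThan(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case GreaterThan(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case _ => false // sketch only; the real code handles more predicate shapes
    }
  }
}

// Example: a DecimalType column is dropped, an IntegerType column is kept.
// PushdownSketch.searchableFilters(
//   StructType(Seq(StructField("a", IntegerType), StructField("d", DecimalType(10, 2)))),
//   Array(EqualTo("a", 1), EqualTo("d", java.math.BigDecimal.ONE)))
// ==> Array(EqualTo("a", 1))
```

The point is that by the time filters reach this code, `DataSourceStrategy`
has already converted the literal values to plain Scala/Java types, so the
column's `DataType` alone decides convertibility.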