Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/12777#discussion_r61565675
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
@@ -56,29 +55,35 @@ import org.apache.spark.sql.sources._
  * known to be convertible.
  */
 private[orc] object OrcFilters extends Logging {
-  def createFilter(filters: Array[Filter]): Option[SearchArgument] = {
+  def createFilter(schema: StructType, filters: Array[Filter]): Option[SearchArgument] = {
+    val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
+
     // First, tries to convert each filter individually to see whether it's convertible, and then
     // collect all convertible ones to build the final `SearchArgument`.
     val convertibleFilters = for {
       filter <- filters
-      _ <- buildSearchArgument(filter, SearchArgumentFactory.newBuilder())
+      _ <- buildSearchArgument(dataTypeMap, filter, SearchArgumentFactory.newBuilder())
     } yield filter

     for {
       // Combines all convertible filters using `And` to produce a single conjunction
       conjunction <- convertibleFilters.reduceOption(And)
       // Then tries to build a single ORC `SearchArgument` for the conjunction predicate
-      builder <- buildSearchArgument(conjunction, SearchArgumentFactory.newBuilder())
+      builder <- buildSearchArgument(dataTypeMap, conjunction, SearchArgumentFactory.newBuilder())
     } yield builder.build()
   }

-  private def buildSearchArgument(expression: Filter, builder: Builder): Option[Builder] = {
+  private def buildSearchArgument(
+      dataTypeMap: Map[String, DataType],
+      expression: Filter,
+      builder: Builder): Option[Builder] = {
     def newBuilder = SearchArgumentFactory.newBuilder()

-    def isSearchableLiteral(value: Any): Boolean = value match {
-      // These are types recognized by the `SearchArgumentImpl.BuilderImpl.boxLiteral()` method.
-      case _: String | _: Long | _: Double | _: Byte | _: Short | _: Integer | _: Float => true
-      case _: DateWritable | _: HiveDecimal | _: HiveChar | _: HiveVarchar => true
+    def isSearchableType(dataType: DataType): Boolean = dataType match {
+      // Only the values in the Spark types below can be recognized by
+      // the `SearchArgumentImpl.BuilderImpl.boxLiteral()` method.
+      case ByteType | ShortType | FloatType | DoubleType => true
+      case IntegerType | LongType | StringType => true
--- End diff ---
Note to myself: this should be okay because
`CatalystTypeConverters.createToScalaConverter()` is called in
`DataSourceStrategy` for the filter values. I checked all the cases and found
none where a value is converted to one of `DateWritable`, `HiveDecimal`,
`HiveChar` or `HiveVarchar`.
Since all the values in a `sources.Filter` have already been converted
according to their Catalyst `DataType`, matching on the column's `DataType` is
sufficient. I checked the test code as well; `ParquetFilters` handles this the
same way.
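
To make that concrete, here is a minimal, self-contained sketch of the
schema-driven check (the `PushdownSketch` object and the `searchableFilters`
helper are hypothetical illustrations, not code from this patch):

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan, LessThan}
import org.apache.spark.sql.types._

// Hypothetical illustration only -- not part of the patch.
object PushdownSketch {
  // Same whitelist as in the diff: only these Spark SQL types box into
  // literals that `SearchArgumentImpl.BuilderImpl.boxLiteral()` accepts.
  def isSearchableType(dataType: DataType): Boolean = dataType match {
    case ByteType | ShortType | FloatType | DoubleType => true
    case IntegerType | LongType | StringType => true
    case _ => false
  }

  // Keep only the filters whose referenced column has a searchable type;
  // the rest are left for Spark to evaluate after the ORC scan.
  def searchableFilters(schema: StructType, filters: Array[Filter]): Array[Filter] = {
    val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
    filters.filter {
      case EqualTo(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case LessThan(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case GreaterThan(attribute, _) => isSearchableType(dataTypeMap(attribute))
      case _ => false // sketch only; the real code handles more predicate shapes
    }
  }
}

// Example: a DecimalType column is dropped, an IntegerType column is kept.
// PushdownSketch.searchableFilters(
//   StructType(Seq(StructField("a", IntegerType), StructField("d", DecimalType(10, 2)))),
//   Array(EqualTo("a", 1), EqualTo("d", java.math.BigDecimal.ONE)))
// ==> Array(EqualTo("a", 1))
```

The point is that by the time filters reach this code, `DataSourceStrategy`
has already converted the literal values to plain Scala/Java types, so the
column's `DataType` alone decides convertibility.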