cloud-fan commented on a change in pull request #24910: [SPARK-28108][SQL] Simplify OrcFilters
URL: https://github.com/apache/spark/pull/24910#discussion_r296462734
##########
File path:
sql/core/v2.3.5/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##########
@@ -115,228 +161,108 @@ private class OrcFilterConverter(val dataTypeMap: Map[String, DataType]) {
case _ => value
}
- import org.apache.spark.sql.sources._
- import OrcFilters._
-
/**
- * Builds a SearchArgument for a Filter by first trimming the non-convertible nodes, and then
- * only building the remaining convertible nodes.
- *
- * Doing the conversion in this way avoids the computational complexity problems introduced by
- * checking whether a node is convertible while building it. The approach implemented here has
- * complexity that's linear in the size of the Filter tree - O(number of Filter nodes) - we run
- * a single pass over the tree to trim it, and then another pass on the trimmed tree to convert
- * the remaining nodes.
+ * Build a SearchArgument and return the builder so far.
*
- * The alternative approach of checking-while-building can (and did) result
- * in exponential complexity in the height of the tree, causing perf problems with Filters with
- * as few as ~35 nodes if they were skewed.
+ * @param dataTypeMap a map from the attribute name to its data type.
+ * @param expression the input filter predicates.
+ * @param builder the input SearchArgument.Builder.
+ * @return the builder so far.
*/
- private[sql] def buildSearchArgument(
+ private def buildSearchArgument(
+ dataTypeMap: Map[String, DataType],
expression: Filter,
builder: Builder): Option[Builder] = {
Review comment:
Isn't it guaranteed that we only call `buildSearchArgument` with convertible filters?
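
To make the question concrete, here is a minimal sketch of the trim-then-build idea described in the removed doc comment. It is not the Spark implementation: the `Filter` types, `trim`, and `build` below are hypothetical stand-ins, and `build` returns a plain string instead of a `SearchArgument.Builder`. It only illustrates that once a first pass has removed the non-convertible nodes, the second pass can be a total function that does not need to return an `Option`.

```scala
// Toy stand-ins for a filter tree. These types are hypothetical and are
// NOT the org.apache.spark.sql.sources classes.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class Unsupported(description: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter
final case class Or(left: Filter, right: Filter) extends Filter
final case class Not(child: Filter) extends Filter

object TrimThenBuild {
  // Pass 1: remove non-convertible nodes. None means nothing convertible is left.
  // Dropping one side of an And is only safe while we are not under a Not.
  def trim(filter: Filter, canPartialPushDown: Boolean = true): Option[Filter] =
    filter match {
      case And(l, r) =>
        (trim(l, canPartialPushDown), trim(r, canPartialPushDown)) match {
          case (Some(a), Some(b)) => Some(And(a, b))
          case (Some(a), None) if canPartialPushDown => Some(a)
          case (None, Some(b)) if canPartialPushDown => Some(b)
          case _ => None
        }
      case Or(l, r) =>
        // Both sides must survive, otherwise the whole Or is dropped.
        for {
          a <- trim(l, canPartialPushDown)
          b <- trim(r, canPartialPushDown)
        } yield Or(a, b)
      case Not(c) => trim(c, canPartialPushDown = false).map(Not(_))
      case e: EqualTo => Some(e)
      case _: Unsupported => None
    }

  // Pass 2: build from an already-trimmed tree. No Option is needed here,
  // because every node that reaches this point is assumed to be convertible.
  def build(filter: Filter): String = filter match {
    case And(l, r) => s"(and ${build(l)} ${build(r)})"
    case Or(l, r) => s"(or ${build(l)} ${build(r)})"
    case Not(c) => s"(not ${build(c)})"
    case EqualTo(attribute, value) => s"(eq $attribute $value)"
    case Unsupported(d) =>
      throw new IllegalArgumentException(s"non-convertible filter reached build: $d")
  }

  // Convenience wrapper: only the trimming pass can yield None.
  def convert(filter: Filter): Option[String] = trim(filter).map(build)
}
```

With this shape, `TrimThenBuild.convert(And(EqualTo("a", 1), Unsupported("x")))` keeps only the convertible side, while the same conjunction wrapped in `Not` is dropped entirely.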