GitHub user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10377#discussion_r48390319
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -26,15 +26,47 @@ import org.apache.spark.Logging
     import org.apache.spark.sql.sources._
     
     /**
    - * It may be optimized by push down partial filters. But we are conservative here.
    - * Because if some filters fail to be parsed, the tree may be corrupted,
    - * and cannot be used anymore.
    + * Helper object for building ORC `SearchArgument`s, which are used for ORC predicate push-down.
    + *
    + * Due to limitation of ORC `SearchArgument` builder, we had to end up with a pretty weird double-
    + * checking pattern when converting `And`/`Or`/`Not` filters.
    + *
    + * An ORC `SearchArgument` must be built in one pass using a single builder.  For example, you can't
    + * build `a = 1` and `b = 2` first, and then combine them into `a = 1 AND b = 2`.  This is quite
    + * different from the cases in Spark SQL or Parquet, where complex filters can be easily built using
    + * existing simpler ones.
    + *
    + * The annoying part is that, `SearchArgument` builder methods like `startAnd()`, `startOr()`, and
    + * `startNot()` mutate internal state of the builder instance.  This forces us to translate all
    + * convertible filters with a single builder instance. However, before actually converting a filter,
    + * we've no idea whether it can be recognized by ORC or not. Thus, when an inconvertible filter is
    + * found, we may already end up with a builder whose internal state is inconsistent.
    + *
    + * For example, to convert an `And` filter with builder `b`, we call `b.startAnd()` first, and then
    + * try to convert its children.  Say we convert `left` child successfully, but find that `right`
    + * child is inconvertible.  Alas, `b.startAnd()` call can't be rolled back, and `b` is inconsistent
    + * now.
    + *
    + * The workaround employed here is that, for `And`/`Or`/`Not`, we first try to convert their
    --- End diff --
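To make the `And` failure mode described in the doc comment concrete, here is a minimal Scala sketch. `ToyBuilder` is a hypothetical logging stand-in, not ORC's real `SearchArgument` builder API; `EqualTo`/`And` are simplified toy mirrors of Spark's source filters; and `Unsupported` is a made-up placeholder for any filter ORC cannot express. The sketch only shows how a naive single-pass conversion leaves the shared builder with an unmatched `startAnd()`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy builder, NOT ORC's real SearchArgument API: it logs calls and
// counts startAnd() scopes that are still waiting for a matching end().
final class ToyBuilder {
  val ops = ArrayBuffer.empty[String]
  var openScopes = 0
  def startAnd(): ToyBuilder = { ops += "startAnd"; openScopes += 1; this }
  def leaf(pred: String): ToyBuilder = { ops += pred; this }
  def end(): ToyBuilder = { ops += "end"; openScopes -= 1; this }
}

// Toy mirrors of Spark's source filters; Unsupported stands in for
// anything the builder cannot express.
sealed trait Filter
final case class EqualTo(attr: String, value: Int) extends Filter
final case class Unsupported(desc: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

// Naive single-pass conversion: startAnd() is called before we know whether
// both children are convertible, and the toy builder offers no way to undo it.
def naiveConvert(f: Filter, b: ToyBuilder): Option[ToyBuilder] = f match {
  case EqualTo(a, v) => Some(b.leaf(s"$a = $v"))
  case Unsupported(_) => None
  case And(l, r) =>
    b.startAnd()
    for (lb <- naiveConvert(l, b); rb <- naiveConvert(r, lb)) yield rb.end()
}

val b = new ToyBuilder
val result = naiveConvert(And(EqualTo("a", 1), Unsupported("b LIKE 'x%'")), b)
// result is None, yet b has already recorded "startAnd" and "a = 1" with one
// scope left open: the builder is now in an inconsistent state.
```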
    
    `buildSearchArgument` is recursive, so nested `And`/`Or`/`Not` filters within top-level filters are also covered.
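A minimal sketch of why recursion extends the double-check to nested filters, again using a hypothetical logging builder and toy filter classes rather than the real ORC and Spark APIs. The diff is cut off mid-sentence, so the exact shape of the workaround here is an assumption: each child is first converted against a throwaway builder, and the real builder is only mutated once every child is known to be convertible. Because `build` (a hypothetical name, not the PR's `buildSearchArgument`) calls itself, that check runs again at every nesting level.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy builder (NOT ORC's real SearchArgument API): it just logs calls,
// mimicking a builder whose startAnd/startOr/startNot mutate internal state.
final class ToyBuilder {
  val ops = ArrayBuffer.empty[String]
  def start(op: String): ToyBuilder = { ops += s"start$op"; this }
  def leaf(pred: String): ToyBuilder = { ops += pred; this }
  def end(): ToyBuilder = { ops += "end"; this }
}

// Toy mirrors of Spark's source filters; Unsupported stands in for
// anything the builder cannot express.
sealed trait Filter
final case class EqualTo(attr: String, value: Int) extends Filter
final case class Unsupported(desc: String) extends Filter
final case class And(left: Filter, right: Filter) extends Filter
final case class Or(left: Filter, right: Filter) extends Filter
final case class Not(child: Filter) extends Filter

// Double-checking pattern: try each child against a throwaway builder first and
// touch the real builder `b` only if every child converts. The recursive call
// means the same check protects nested And/Or/Not at any depth.
def build(f: Filter, b: ToyBuilder): Option[ToyBuilder] = {
  def convertible(child: Filter): Boolean = build(child, new ToyBuilder).isDefined
  f match {
    case EqualTo(a, v) => Some(b.leaf(s"$a = $v"))
    case Unsupported(_) => None
    case And(l, r) if convertible(l) && convertible(r) =>
      for (lb <- build(l, b.start("And")); rb <- build(r, lb)) yield rb.end()
    case Or(l, r) if convertible(l) && convertible(r) =>
      for (lb <- build(l, b.start("Or")); rb <- build(r, lb)) yield rb.end()
    case Not(c) if convertible(c) =>
      build(c, b.start("Not")).map(_.end())
    case _ => None
  }
}

// The And nested inside the Or has an unsupported leaf, so the whole Or is
// rejected and the real builder is never touched.
val rejectedBuilder = new ToyBuilder
val rejected =
  build(Or(EqualTo("a", 1), And(EqualTo("b", 2), Unsupported("c LIKE 'x%'"))), rejectedBuilder)

// A fully convertible nested filter is written to the real builder in one pass.
val acceptedBuilder = new ToyBuilder
val accepted = build(And(EqualTo("a", 1), Not(EqualTo("b", 2))), acceptedBuilder)
```

The trade-off of this pattern is that subtrees are converted more than once: each `And`/`Or`/`Not` level re-converts its children after checking them, so the number of conversion attempts can double with every level of nesting.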

