Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10377#discussion_r48390321
  
    --- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -50,37 +82,18 @@ private[orc] object OrcFilters extends Logging {
           case _ => false
         }
     
    -    // lian: I probably missed something here, and had to end up with a 
pretty weird double-checking
    -    // pattern when converting `And`/`Or`/`Not` filters.
    -    //
    -    // The annoying part is that, `SearchArgument` builder methods like 
`startAnd()` `startOr()`,
    -    // and `startNot()` mutate internal state of the builder instance.  
This forces us to translate
    -    // all convertible filters with a single builder instance. However, 
before actually converting a
    -    // filter, we've no idea whether it can be recognized by ORC or not. 
Thus, when an inconvertible
    -    // filter is found, we may already end up with a builder whose 
internal state is inconsistent.
    -    //
    -    // For example, to convert an `And` filter with builder `b`, we call 
`b.startAnd()` first, and
    -    // then try to convert its children.  Say we convert `left` child 
successfully, but find that
    -    // `right` child is inconvertible.  Alas, `b.startAnd()` call can't be 
rolled back, and `b` is
    -    // inconsistent now.
    -    //
    -    // The workaround employed here is that, for `And`/`Or`/`Not`, we 
first try to convert their
    -    // children with brand new builders, and only do the actual conversion 
with the right builder
    -    // instance when the children are proven to be convertible.
    -    //
    -    // P.S.: Hive seems to use `SearchArgument` together with 
`ExprNodeGenericFuncDesc` only.
    -    // Usage of builder methods mentioned above can only be found in test 
code, where all tested
    -    // filters are known to be convertible.
    -
         expression match {
           case And(left, right) =>
    -        // At here, it is not safe to just convert one side if we do not 
understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not 
understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up 
with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top 
level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as 
an example.
    +        // Conventionally, it's safe to push down only one branch of an 
`And` predicate even if the
    +        // other branch is inconvertible.  However, it's not safe to do so 
here.  Because the `And`
    +        // predicate maybe nested within a `Not` predicate.  Take the 
following predicate as an
    --- End diff --
    
    OK, I'll just use the comment you used in Parquet then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to