gengliangwang opened a new pull request #24910: [SPARK-28108][SQL] Simplify OrcFilters
URL: https://github.com/apache/spark/pull/24910
 
 
   ## What changes were proposed in this pull request?
   
   In #24068, @IvanVergiliev fixes the issue that OrcFilters.createBuilder has 
exponential complexity in the height of the filter tree due to the way the 
check-and-build pattern is implemented.
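   
   To make the cost concrete, here is a toy model of a check-and-build pattern. It is only a sketch: `Node`, `Leaf`, `OrNode`, `NaiveBuilder` and `ToyTrees` are hypothetical names, not Spark classes, and the builder produces a string instead of an ORC SearchArgument. Each `OrNode` first builds both children as a check, then builds them again for real, so the work doubles at every level.
   ```scala
   // Toy filter tree (hypothetical; not Spark's Filter classes).
   sealed trait Node
   case class Leaf(id: Int) extends Node
   case class OrNode(left: Node, right: Node) extends Node

   final class NaiveBuilder {
     // Number of times build() was entered for any node.
     var visits = 0L

     def build(n: Node): Option[String] = {
       visits += 1
       n match {
         case Leaf(id) => Some(s"col = $id")
         case OrNode(l, r) =>
           for {
             _   <- build(l) // check: full recursive build, result discarded
             _   <- build(r)
             lhs <- build(l) // build: the same recursion a second time
             rhs <- build(r)
           } yield s"($lhs OR $rhs)"
       }
     }
   }

   object ToyTrees {
     // Left-deep OR chain, the same shape the benchmark's reduceLeft produces.
     def leftDeep(width: Int): Node =
       (1 to width).map(i => Leaf(i): Node).reduceLeft[Node]((a, b) => OrNode(a, b))
   }
   ```
   In this toy model a left-deep chain of `n` leaves costs `2^(n+1) - 3` calls to `build`, so every additional filter roughly doubles the work.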
   
   Compared to the approach in #24068, I propose a simpler solution for the 
issue:
   1. Separate the logic of building a convertible filter tree from the actual 
SearchArgument builder, since the two procedures are different and have 
different return types. The classes `ActionType`, `TrimUnconvertibleFilters` 
and `BuildSearchArgument` newly introduced in #24068 can then be dropped, and 
the code is more readable.
   2. For most leaf nodes, the convertible result is always `Some(node)`, so 
this PR abstracts that common case.
   3. The change is small relative to the previous code. See 
https://github.com/apache/spark/pull/24783
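   
   The two-step idea in point 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `Filter` hierarchy, `Unsupported`, `TwoPhase.convertible` and `TwoPhase.build` are hypothetical, the builder emits a string rather than a SearchArgument, and dropping the unconvertible side of an `And` is an assumption of the toy model.
   ```scala
   // Toy filter hierarchy (hypothetical; not org.apache.spark.sql.sources).
   sealed trait Filter
   case class EqualTo(attr: String, value: Int) extends Filter
   case class Unsupported(desc: String) extends Filter // stands in for any leaf that cannot be converted
   case class And(left: Filter, right: Filter) extends Filter
   case class Or(left: Filter, right: Filter) extends Filter

   object TwoPhase {
     // Step 1: return the convertible part of the tree, or None. Each node is visited once.
     def convertible(f: Filter): Option[Filter] = f match {
       case And(l, r) =>
         (convertible(l), convertible(r)) match {
           case (Some(a), Some(b)) => Some(And(a, b))
           case (Some(a), None)    => Some(a) // assumption: keep the convertible side of a conjunction
           case (None, Some(b))    => Some(b)
           case (None, None)       => None
         }
       case Or(l, r) =>
         // Both sides must convert, otherwise the disjunction is dropped.
         for { a <- convertible(l); b <- convertible(r) } yield Or(a, b)
       case Unsupported(_) => None
       case leaf           => Some(leaf) // most leaves convert to themselves
     }

     // Step 2: build from a tree already known to be convertible. Again one visit per node.
     def build(f: Filter): String = f match {
       case And(l, r)      => s"and(${build(l)}, ${build(r)})"
       case Or(l, r)       => s"or(${build(l)}, ${build(r)})"
       case EqualTo(a, v)  => s"$a = $v"
       case Unsupported(d) => throw new IllegalArgumentException(s"not convertible: $d")
     }
   }
   ```
   Usage would be `TwoPhase.convertible(filter).map(TwoPhase.build)`. The two steps have different return types (`Option[Filter]` and a builder result), and together they visit each node twice instead of doubling the work per level.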
   
   
   ## How was this patch tested?
   Ran the benchmark provided in #24068:
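   ```
   // Imports assume Spark's sql/core module is on the classpath.
   import org.apache.spark.sql.execution.datasources.orc.OrcFilters
   import org.apache.spark.sql.sources.{EqualTo, Or}
   import org.apache.spark.sql.types.StructType

   val schema = StructType.fromDDL("col INT")
   (20 to 30).foreach { width =>
     // Left-deep OR chain of `width` equality filters.
     val whereFilter = (1 to width).map(i => EqualTo("col", i)).reduceLeft(Or)
     val start = System.currentTimeMillis()
     OrcFilters.createFilter(schema, Seq(whereFilter))
     println(s"With $width filters, conversion takes ${System.currentTimeMillis() - start} ms")
   }
   ```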
   Result:
   ```
   With 20 filters, conversion takes 6 ms
   With 21 filters, conversion takes 0 ms
   With 22 filters, conversion takes 0 ms
   With 23 filters, conversion takes 0 ms
   With 24 filters, conversion takes 0 ms
   With 25 filters, conversion takes 0 ms
   With 26 filters, conversion takes 0 ms
   With 27 filters, conversion takes 0 ms
   With 28 filters, conversion takes 0 ms
   With 29 filters, conversion takes 0 ms
   With 30 filters, conversion takes 0 ms
   ```
   
   Also verified with unit tests.

