MaxGekk opened a new pull request #27366: [WIP][SPARK-30648][SQL] Support 
filters pushdown in JSON datasource
URL: https://github.com/apache/spark/pull/27366
 
 
   ### What changes were proposed in this pull request?
   In this PR, I propose supporting pushed-down filters in the JSON datasource. The 
reason for pushing a filter all the way to `JacksonParser` is to apply the filter 
as soon as all of its attributes become available, i.e. converted from JSON field 
values to the desired values according to the schema. This allows skipping the 
conversion of other values if the filter returns `false`. This can improve 
performance when pushed filters are highly selective and conversion of JSON 
string fields to the desired values is comparatively expensive (for example, 
conversion to `TIMESTAMP` values).
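
The early-filter idea described above can be illustrated with a minimal sketch. This is not the code from the PR or Spark's `JacksonParser`; it is a plain Python model under stated assumptions: `SCHEMA`, `parse_row`, `filter_refs`, and the `stats` counter are hypothetical names, and `json.loads` tokenizes the whole record up front, whereas a streaming parser would not. The sketch only shows how per-field conversions can be skipped once the filter is known to fail.

```python
import json
from datetime import datetime

# Hypothetical schema: field name -> converter from the raw JSON value.
# Fields are converted in this order.
SCHEMA = {
    "id": int,
    "ts": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S"),  # costly conversion
    "payload": str,
}


def parse_row(line, schema, filter_refs, predicate, stats):
    """Convert fields in schema order; return None as soon as the filter fails."""
    raw = json.loads(line)
    row = {}
    pending = set(filter_refs)  # filter attributes that are not converted yet
    checked = not pending       # nothing to check if the filter has no attributes
    for name, convert in schema.items():
        row[name] = convert(raw[name])
        stats["conversions"] += 1
        pending.discard(name)
        if not checked and not pending:
            # All attributes referenced by the filter are now available.
            checked = True
            if not predicate(row):
                return None  # skip the remaining conversions for this record
    return row


lines = [
    '{"id": 1, "ts": "2020-01-27T10:00:00", "payload": "a"}',
    '{"id": 2, "ts": "2020-01-27T11:00:00", "payload": "b"}',
]
stats = {"conversions": 0}
parsed = (parse_row(l, SCHEMA, ["id"], lambda r: r["id"] == 2, stats) for l in lines)
rows = [r for r in parsed if r is not None]
```

In this sketch the first record is rejected after converting only `id`, so its timestamp is never parsed; the second record passes the filter and is converted in full.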
   
   ### Why are the changes needed?
   The changes improve performance on synthetic benchmarks by up to **20 times** 
(on JDK 8):
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.2
   Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
   Filters pushdown:                         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   w/o filters                                       13083          13150          61          0.0      130827.2       1.0X
   pushdown disabled                                 13039          13115          78          0.0      130387.7       1.0X
   w/ filters                                          637            649          10          0.2        6369.7      20.5X
   ```
   
   ### Does this PR introduce any user-facing change?
   No
   
   ### How was this patch tested?
   - Added new test suites `JsonFiltersSuite` and `JacksonParserSuite`.
   - Added a new end-to-end test in `JsonSuite`.

