Hi, I know Spark 3.0 added Parquet predicate pushdown for nested structures (SPARK-17636). Does it also support predicate pushdown for an array of structs?
For example, say I have a Spark table 'individuals' (in Parquet format) with the following schema:
root
 |-- individual_id: string (nullable = true)
 |-- devices: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- carrier_name: string (nullable = true)
 |    |    |-- model: string (nullable = true)
 |    |    |-- vendor: string (nullable = true)
 |    |    |-- year_released: integer (nullable = true)
 |    |    |-- primary_hardware_type: string (nullable = true)
 |    |    |-- browser_name: string (nullable = true)
 |    |    |-- browser_version: string (nullable = true)
 |    |    |-- manufacturer: string (nullable = true)
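
(For reference, the schema above is what printSchema() reports; a minimal sketch of how the table is read in spark-shell, with a placeholder path instead of the real S3 location:)

    // Read the Parquet data and register it as the 'individuals' table (path is a placeholder)
    val individuals = spark.read.parquet("s3://my-bucket/individuals/")
    individuals.printSchema()
    individuals.createOrReplaceTempView("individuals")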

I can then use the following query to find the number of individuals who have at least one device that was released after 2010:
select count(*) as total_count
from individuals
where exists(devices, dev -> dev.year_released > 2010)
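
(For completeness, the same predicate can also be written with the DataFrame API; this is just a sketch, assuming the 'individuals' DataFrame from above and the exists() higher-order function in org.apache.spark.sql.functions that was added in 3.0:)

    // Count individuals with at least one device released after 2010,
    // using the exists() higher-order function instead of SQL
    import org.apache.spark.sql.functions.{col, exists}

    val totalCount = individuals
      .filter(exists(col("devices"), dev => dev("year_released") > 2010))
      .count()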
The query runs fine with Spark 3.0, but it had to read all the columns of the nested structure 'devices', as shown in the physical plan below.

res14: org.apache.spark.sql.execution.SparkPlan =
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[], functions=[finalmerge_count(merge count#59L) AS count(1)#55L], output=[total_count#54L])
   +- Exchange SinglePartition, true, [id=#35]
      +- HashAggregate(keys=[], functions=[partial_count(1) AS count#59L], output=[count#59L])
         +- Project
            +- Filter exists(devices#48, lambdafunction((lambda dev#56.year_released > 2018), lambda dev#56, false))
               +- FileScan parquet [ids#48] Batched: true, DataFilters: [exists(devices#48, lambdafunction((lambda dev#56.year_released > 2018), lambda dev#56, false))], Format: Parquet, Location: InMemoryFileIndex[s3://..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<devices:array<struct<id_last_seen:date,type:string,value:string,carrier_name:string,model:string,vendor:string,in...
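
(In case it helps to reproduce, here is a minimal sketch of how I look at the plan. The two configs below are the ones I believe are relevant, spark.sql.parquet.filterPushdown and spark.sql.optimizer.nestedSchemaPruning.enabled, both of which should already default to true in 3.0; whether either of them covers fields inside an array of structs is exactly what I'm unsure about.)

    // Make sure Parquet filter pushdown and nested column pruning are on
    // (both default to true in Spark 3.0, as far as I know)
    spark.conf.set("spark.sql.parquet.filterPushdown", "true")
    spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")

    val q = spark.sql(
      """select count(*) as total_count
        |from individuals
        |where exists(devices, dev -> dev.year_released > 2010)""".stripMargin)

    // Check the FileScan node for PushedFilters and ReadSchema
    q.explain()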

Any thoughts?
Thanks
Haijia
