[GitHub] [spark] aokolnychyi edited a comment on pull request #35395: [SPARK-38085][SQL] DataSource V2: Handle DELETE commands for group-based sources

GitBox Mon, 21 Mar 2022 12:54:57 -0700


aokolnychyi edited a comment on pull request #35395:
URL: https://github.com/apache/spark/pull/35395#issuecomment-1074348488



   @rdblue @cloud-fan, I assumed the delete condition (not negated) would be 
explicitly passed to both scan builders by Spark. For instance, if the delete 
condition is `part_col = 'a' and id = 1`, Spark would push it to the main scan 
builder and then provide an extra predicate on the filter attributes (e.g. 
`_file_name IN (...)`). Since the scan condition will be the same, data sources 
may cache and reuse some information between the scans.
   
   I can also see data sources delaying the actual split planning in the main 
scan until they receive the runtime filter as well. I guess there are a number 
of ways data sources can behave.
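
To make the caching idea concrete, here is a rough toy model of the two scans. It is only a sketch under stated assumptions: `ToyGroupSource`, `plan_files`, `scan`, and the `file_name_in` argument are hypothetical names invented for illustration and do not correspond to Spark's DataSource V2 interfaces. The pushed-down condition is reduced to a partition value used for coarse file pruning, and the runtime filter is modeled as a set of file names.

```python
# Toy model only: every name here is hypothetical and not part of Spark's API.

class ToyGroupSource:
    def __init__(self, files):
        # files maps a file name to its rows, each row being (part_col, id).
        self.files = files
        self.planning_runs = 0   # counts how often file planning actually runs
        self._planned = {}       # cache of planned files, keyed by the pushed condition

    def plan_files(self, part_value):
        """Coarse pruning by partition value; the result is cached per condition."""
        if part_value not in self._planned:
            self.planning_runs += 1
            self._planned[part_value] = [
                name for name, rows in self.files.items()
                if any(p == part_value for p, _ in rows)
            ]
        return self._planned[part_value]

    def scan(self, part_value, file_name_in=None):
        """Return whole files (groups); optionally narrow by a runtime file filter."""
        names = self.plan_files(part_value)
        if file_name_in is not None:
            names = [n for n in names if n in file_name_in]
        return {n: self.files[n] for n in names}


source = ToyGroupSource({
    "f1": [("a", 1), ("a", 2)],
    "f2": [("a", 3)],
    "f3": [("b", 1)],
})

# First scan: same condition pushed down, then find files that really contain
# rows matching part_col = 'a' and id = 1.
candidates = source.scan("a")
matched = {
    name for name, rows in candidates.items()
    if any(p == "a" and i == 1 for p, i in rows)
}

# Main scan: same condition again, plus the runtime filter on file names.
to_rewrite = source.scan("a", file_name_in=matched)
```

Because both scans receive the same condition, the second call reuses the cached planning result instead of planning again, and the runtime filter only narrows the already planned set of files.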


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



