rdblue commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1261555810
I talked with @aokolnychyi about this and I think this is a data source problem, not something Spark should track right now. The main problem is that some table sources have different versions, and that's not something we're used to handling. Data sources that don't have different versions are unaffected, so option 1 is not great because it forces everyone to deal with a problem that only a few sources have. Spark could use option 2 and track this itself, but that complicates the API as well, and we don't know that we need it yet. If we do add version/history support to Spark, then we'd probably want to add `SHOW HISTORY` and similar commands as well.

We've also found a reliable way to make option 3 work. The underlying table instance is the same, so the filter method just needs to check that the table instance has not been refreshed or modified when the runtime filter is applied to it. I think option 3 is the simplest approach in terms of new Spark APIs (none!) and is the right way forward until Spark decides to model tables with multiple versions.
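To make the option 3 check concrete, here is a minimal, self-contained sketch of the idea: the scan pins the table state it saw at planning time, and the filter method refuses to apply a runtime filter if the table instance has been refreshed since. All names here (`VersionedTable`, `SnapshotScan`, the `filter` signature) are invented for illustration; a real connector would implement this inside Spark's `SupportsRuntimeFiltering` methods rather than with these hypothetical classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table whose metadata can be refreshed.
class VersionedTable {
    private long currentSnapshotId = 1L;

    long currentSnapshotId() { return currentSnapshotId; }

    // Simulates a metadata refresh moving the table to a new version.
    void refresh() { currentSnapshotId++; }
}

// Hypothetical scan that captures the snapshot current at planning time.
class SnapshotScan {
    private final VersionedTable table;
    private final long plannedSnapshotId;          // pinned when the scan is created
    private final List<String> appliedFilters = new ArrayList<>();

    SnapshotScan(VersionedTable table) {
        this.table = table;
        this.plannedSnapshotId = table.currentSnapshotId();
    }

    // Applies a runtime filter only if the table has not been refreshed
    // or modified since the scan was planned; otherwise skips it rather
    // than risk filtering against a different table version.
    boolean filter(String runtimeFilter) {
        if (table.currentSnapshotId() != plannedSnapshotId) {
            return false;
        }
        appliedFilters.add(runtimeFilter);
        return true;
    }

    List<String> appliedFilters() { return appliedFilters; }
}

public class Option3Sketch {
    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        SnapshotScan scan = new SnapshotScan(table);

        System.out.println(scan.filter("id IN (1, 2, 3)")); // same snapshot: applied
        table.refresh();
        System.out.println(scan.filter("id IN (4)"));       // table moved on: skipped
    }
}
```

Because the check lives entirely inside the data source, Spark needs no new API: the source either applies the runtime filter or safely ignores it.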
