rdblue commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1261555810
I talked with @aokolnychyi about this and I think this is a data source problem, not something Spark should track right now. The main problem is that some table sources have different versions, and that's not something we're used to handling. Data sources that don't have different versions are unaffected, so option 1 is not great because it forces everyone to deal with a problem that only a few sources have. Spark could use option 2 and track this itself, but that complicates the API as well, and we don't know that we need it yet. If we do add version/history support to Spark, then we'd probably want to add `SHOW HISTORY` and similar commands as well.

We've also found a reliable way to make option 3 work. The underlying table instance is the same, so the filter method just needs to check that the table instance has not been refreshed or modified when the runtime filter is applied to it. I think option 3 is the simplest approach in terms of new Spark APIs (none!) and is the right way forward until Spark decides to model tables with multiple versions.
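To make the option 3 check concrete, here is a minimal, self-contained sketch of the idea: the scan pins the table state it saw at planning time, and the filter method refuses to apply a runtime filter if the table instance has been refreshed since. All names here (`VersionedTable`, `SnapshotScan`, the `filter` signature) are invented for illustration; a real connector would implement this inside Spark's `SupportsRuntimeFiltering` methods rather than with these hypothetical classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table whose metadata can be refreshed.
class VersionedTable {
    private long currentSnapshotId = 1L;

    long currentSnapshotId() { return currentSnapshotId; }

    // Simulates a metadata refresh moving the table to a new version.
    void refresh() { currentSnapshotId++; }
}

// Hypothetical scan that captures the snapshot current at planning time.
class SnapshotScan {
    private final VersionedTable table;
    private final long plannedSnapshotId;          // pinned when the scan is created
    private final List<String> appliedFilters = new ArrayList<>();

    SnapshotScan(VersionedTable table) {
        this.table = table;
        this.plannedSnapshotId = table.currentSnapshotId();
    }

    // Applies a runtime filter only if the table has not been refreshed
    // or modified since the scan was planned; otherwise skips it rather
    // than risk filtering against a different table version.
    boolean filter(String runtimeFilter) {
        if (table.currentSnapshotId() != plannedSnapshotId) {
            return false;
        }
        appliedFilters.add(runtimeFilter);
        return true;
    }

    List<String> appliedFilters() { return appliedFilters; }
}

public class Option3Sketch {
    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        SnapshotScan scan = new SnapshotScan(table);

        System.out.println(scan.filter("id IN (1, 2, 3)")); // same snapshot: applied
        table.refresh();
        System.out.println(scan.filter("id IN (4)"));       // table moved on: skipped
    }
}
```

Because the check lives entirely inside the data source, Spark needs no new API: the source either applies the runtime filter or safely ignores it.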
