AndrewKL opened a new pull request #26567: [SPARK-29929][SQL][V2-DS]{POC} add 
support for V2 datasources to require a distributio…
URL: https://github.com/apache/spark/pull/26567
 
 
   …n so that all writes to the table satisfy that distribution
   
   WARNING: This is a POC to explore the concept of adding child distribution 
requirements to V2 datasources. This would allow implementers of V2 datasources 
to guarantee that incoming data written to a table has a particular distribution 
across partitions, and to define the number of partitions.
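   As a rough illustration of the idea, here is a minimal Java sketch that 
assumes a hash-clustering requirement: a table declares the columns whose equal 
values must share a partition, plus a partition count, and the write path 
hash-partitions incoming rows to satisfy both. `RequireTableDistribution` is only 
modeled on the name used in this PR; its methods, the `WriteDistribution` helper, 
and the representation of rows as `Map<String, Object>` are hypothetical and are 
not part of Spark's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical interface modeled on the name used in this PR; not part of Spark's API.
interface RequireTableDistribution {
    // Columns whose equal values must land in the same write partition (assumed hash clustering).
    List<String> clusteringColumns();

    // Number of partitions the incoming data must be split into.
    int requiredNumPartitions();
}

// Hypothetical helper: what the write path would have to do to satisfy the requirement.
final class WriteDistribution {
    private WriteDistribution() {}

    // Hash-partitions rows on the table's clustering columns into the required partition count.
    static List<List<Map<String, Object>>> distribute(
            List<Map<String, Object>> rows, RequireTableDistribution table) {
        int n = table.requiredNumPartitions();
        if (n <= 0) {
            throw new IllegalArgumentException("requiredNumPartitions must be positive");
        }
        List<List<Map<String, Object>>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map<String, Object> row : rows) {
            // The clustering key is the row's values for the required columns, in declared order.
            List<Object> key = new ArrayList<>();
            for (String col : table.clusteringColumns()) {
                key.add(row.get(col));
            }
            // Equal keys have equal hash codes, so they always map to the same partition;
            // floorMod keeps the index non-negative when hashCode() is negative.
            partitions.get(Math.floorMod(key.hashCode(), n)).add(row);
        }
        return partitions;
    }
}
```

   For example, a table clustered on `region` with 4 required partitions would 
receive every `eu` row in one partition, whichever index that turns out to be. 
Hash clustering is just one possible distribution; ordered or range-based 
requirements would need a different partitioning step.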
   
   
   Challenges faced:
   1. Distribution requirements must be known before a table is created. 
Many SQL-style commands that create a table, including CTAS, are currently an 
awkward fit for this implementation. For example, SupportsRead extends Table, so 
a capability defined the same way is only available once the table exists. A 
better implementation may be to have the RequireTableDistribution interface 
extend TableProvider, but this will require piping
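   To illustrate why the placement of the hook matters for CTAS, here is a 
hypothetical Java sketch in which the requirement is resolved from a 
provider-level interface (standing in for the TableProvider approach) before the 
table is created. `DistributionRequiringProvider` and `CtasPlanSketch` are 
illustrative names, not Spark classes, and the plan is modeled as a plain list of 
step strings rather than a real logical plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical provider-level hook; a stand-in for the idea, not Spark's TableProvider.
interface DistributionRequiringProvider {
    // Resolved from the requested table properties, so no Table object is needed yet.
    List<String> requiredClusteringColumns(Map<String, String> properties);

    int requiredNumPartitions(Map<String, String> properties);
}

// Hypothetical CTAS planning order: the requirement is known before the table is created.
final class CtasPlanSketch {
    private CtasPlanSketch() {}

    static List<String> plan(String table, Map<String, String> properties,
                             DistributionRequiringProvider provider) {
        List<String> steps = new ArrayList<>();
        // 1. Ask the provider for the requirement while the table does not exist yet.
        List<String> cols = provider.requiredClusteringColumns(properties);
        int n = provider.requiredNumPartitions(properties);
        // 2. Shape the query output to match before any data reaches the sink.
        steps.add("repartition(" + n + ", " + String.join(", ", cols) + ")");
        // 3. Only then create the table and write the already-distributed data.
        steps.add("createTable(" + table + ")");
        steps.add("write(" + table + ")");
        return steps;
    }
}
```

   With a Table-level interface, step 1 could not run until after 
`createTable`, which is the ordering problem described above.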
   
   ### What changes were proposed in this pull request?
   
   This is a POC and is not currently under consideration for merging.
   
   ### Why are the changes needed?
   
   To allow V2 datasources to require a particular distribution of incoming data 
during writes.
   
   
   ### Does this PR introduce any user-facing change?
   
   Theoretically, it would involve updating V2 datasources to support 
RequireDistribution.
   
   
   ### How was this patch tested?
   Tested with an example V2 datasource.
   
