[GitHub] [spark] ericl opened a new pull request #24572: [SPARK-27669][SQL] Refactor DataFrameWriter to resolve datasources in a command

GitBox Thu, 09 May 2019 17:57:30 -0700

ericl opened a new pull request #24572: [SPARK-27669][SQL] Refactor 
DataFrameWriter to resolve datasources in a command
URL: https://github.com/apache/spark/pull/24572
 
 
   ## What changes were proposed in this pull request?
   
   Currently, DataFrameWriter.save() does a large amount of ad-hoc work (e.g., 
loading data source classes and validating options) before executing 
the command.
   
   This code runs outside the scope of any SQL execution, which is 
unfortunate: Spark does not track it (e.g., in the Spark UI), and df.write 
operations cannot be manipulated by custom Catalyst rules prior to execution.
   
   These issues can be largely resolved by creating a command that represents 
df.write.save()/saveAsTable(), which also simplifies the code a bit.
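
To illustrate the idea only, here is a minimal, Spark-free sketch in Python. It is not the code in this patch: `SaveCommand`, `SOURCES`, `execute`, and the log format are all hypothetical names invented for the example. It shows a write request captured as a command object, so that source lookup and option validation happen inside a tracked execution and rewrite rules can run first.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical registry standing in for data source class loading.
SOURCES: Dict[str, Callable[[List[dict], Dict[str, str]], str]] = {
    "parquet": lambda rows, opts: f"wrote {len(rows)} rows as parquet to {opts['path']}",
    "json": lambda rows, opts: f"wrote {len(rows)} rows as json to {opts['path']}",
}


@dataclass(frozen=True)
class SaveCommand:
    """Unresolved write: records what was requested, does no work yet."""
    source: str
    options: Dict[str, str]
    rows: Tuple[dict, ...]

    def run(self) -> str:
        # Resolution and validation happen here, inside the execution,
        # rather than eagerly when the command is constructed.
        if self.source not in SOURCES:
            raise ValueError(f"unknown data source: {self.source}")
        if "path" not in self.options:
            raise ValueError("option 'path' is required")
        return SOURCES[self.source](list(self.rows), self.options)


# A rule maps one command to another (a toy stand-in for a plan rewrite rule).
Rule = Callable[[SaveCommand], SaveCommand]


def execute(cmd: SaveCommand,
            rules: Sequence[Rule] = (),
            log: Optional[List[str]] = None) -> str:
    """Apply rules, then run the command inside a tracked span."""
    for rule in rules:
        cmd = rule(cmd)
    if log is not None:
        log.append(f"start {cmd.source}")
    try:
        return cmd.run()
    finally:
        # The span is closed even if resolution or validation fails.
        if log is not None:
            log.append(f"end {cmd.source}")


if __name__ == "__main__":
    events: List[str] = []
    cmd = SaveCommand("json", {"path": "/tmp/out"}, ({"a": 1}, {"a": 2}))
    print(execute(cmd, rules=[lambda c: replace(c, source="parquet")], log=events))
    print(events)
```

Because the command is plain data until `run()` is called, a rule can change the target format before anything is resolved, and a failed lookup still shows up in the log as a started and ended execution.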
   
   cc @gatorsmile @srinathshankar 
   
   ## How was this patch tested?
   
   Existing tests should still pass.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

