[GitHub] spark pull request #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS so...

HyukjinKwon Sat, 22 Sep 2018 19:33:32 -0700

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22529


    [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources do not respect SessionConfigSupport

    ## What changes were proposed in this pull request?
    
    This PR proposes to backport SPARK-25460 to branch-2.4:
    
    This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently it is respected only in batch sources:
    
    
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203
    
    
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249
    
    Suppose a developer writes a DataSourceV2 source that supports both structured streaming and batch jobs. Batch jobs respect the source-specific session configuration, for example a URL to connect to and fetch data from (which end users might not be aware of). Structured streaming jobs do not, so the same value has to be set explicitly as an option.
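    
    As a rough illustration (not Spark's actual code), the sketch below models in plain Python how session configuration could be turned into source options. It assumes the `spark.datasource.<keyPrefix>.<key>` naming described in the `SessionConfigSupport` documentation, and it assumes that explicitly set options take precedence over session configuration. The function names and the `mysource` prefix are hypothetical.

```python
import re


def extract_session_configs(key_prefix, session_confs):
    """Collect session confs named spark.datasource.<key_prefix>.<key> as {key: value}."""
    if key_prefix is None:
        raise ValueError("The data source config key prefix can't be None.")
    pattern = re.compile(
        r"^spark\.datasource\." + re.escape(key_prefix) + r"\.(.+)$"
    )
    extracted = {}
    for name, value in session_confs.items():
        match = pattern.match(name)
        if match:
            # Strip the prefix so the source sees a plain option key.
            extracted[match.group(1)] = value
    return extracted


def effective_options(key_prefix, session_confs, explicit_options):
    """Merge session-derived options with explicit ones (explicit wins, by assumption)."""
    merged = extract_session_configs(key_prefix, session_confs)
    merged.update(explicit_options)
    return merged


if __name__ == "__main__":
    # Hypothetical session configuration; only the first entry matches "mysource".
    confs = {
        "spark.datasource.mysource.url": "http://example.invalid:8080",
        "spark.datasource.other.url": "http://unrelated.invalid",
        "spark.sql.shuffle.partitions": "200",
    }
    print(effective_options("mysource", confs, {"path": "/tmp/in"}))
```

    With this PR, streaming reads and writes are meant to go through the same kind of lookup that batch reads and writes already use, so a value set once in the session reaches both.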
    
    ## How was this patch tested?
    
    Unit tests were added.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25460-backport

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22529.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22529
    
----
commit b6f8880ad6bdbcb721ca0863502ec4b6c85b162c
Author: hyukjinkwon <gurwls223@...>
Date:   2018-09-20T12:22:55Z

    [SPARK-25460][SS] DataSourceV2: SS sources do not respect SessionConfigSupport
    
    This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently it is respected only in batch sources:
    
    
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203
    
    
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249
    
    Suppose a developer writes a DataSourceV2 source that supports both structured streaming and batch jobs. Batch jobs respect the source-specific session configuration, for example a URL to connect to and fetch data from (which end users might not be aware of). Structured streaming jobs do not, so the same value has to be set explicitly as an option.
    
    Unit tests were added.
    
    Closes #22462 from HyukjinKwon/SPARK-25460.
    
    Authored-by: hyukjinkwon <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>

----

