GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21376

    [SPARK-24250][SQL] support accessing SQLConf inside tasks

    Re-submits https://github.com/apache/spark/pull/21299, which broke the build.
    
    A new commit is added to fix the SQLConf problem in 
`JsonSchemaInference.infer`.
    
    ## What changes were proposed in this pull request?
    
    Previously, in #20136, we decided to forbid tasks from accessing `SQLConf`, 
because it doesn't work and always gives you the default conf value. In #21190 
we fixed the check and all the places that violated it.
    
    Currently the pattern of accessing configs at the executor side is: read 
the configs at the driver side, then access the variables holding the config 
values in the RDD closure, so that they will be serialized to the executor 
side. Something like:
    ```
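    val someConf = conf.getXXX  // read on the driver
    child.execute().mapPartitions { iter =>
      // `someConf` is captured by the closure and serialized to the executor
      if (someConf == ...) ...
      ...
    }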
    ```
    
    However, this pattern is hard to apply if the config needs to be propagated 
via a long call stack. One example is `DataType.sameType`; see how many 
changes were made in #21190.
    
    When it comes to code generation, it's even worse. I tried it locally, and 
we would need to change a ton of files to propagate configs to code generators.
    
    This PR proposes to allow tasks to access `SQLConf`. The idea is that we 
save all the SQL configs to job properties when an SQL execution is triggered. 
On the executor side, we rebuild the `SQLConf` from the job properties.
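    
    The round trip described above can be sketched roughly as follows. This is 
a simplified model built on plain `java.util.Properties`, not the code in this 
PR: `SimpleConf`, `ConfPropagation`, `toJobProperties`, `fromJobProperties` and 
the `spark.sql.` prefix filter are all hypothetical names and assumptions 
introduced for illustration.
    ```scala
    import java.util.Properties

    // Hypothetical, simplified stand-in for SQLConf: an immutable map of settings.
    final case class SimpleConf(settings: Map[String, String]) {
      def get(key: String, default: String): String =
        settings.getOrElse(key, default)
    }

    object ConfPropagation {
      // Assumption: SQL settings are recognised by this key prefix.
      val SqlPrefix = "spark.sql."

      // Driver side: copy every SQL setting into the job's properties.
      def toJobProperties(conf: SimpleConf, props: Properties): Properties = {
        conf.settings.foreach { case (k, v) =>
          if (k.startsWith(SqlPrefix)) props.setProperty(k, v)
        }
        props
      }

      // Executor side: rebuild a conf from the properties that came with the task.
      def fromJobProperties(props: Properties): SimpleConf = {
        val names = props.stringPropertyNames().iterator()
        var settings = Map.empty[String, String]
        while (names.hasNext) {
          val k = names.next()
          if (k.startsWith(SqlPrefix)) {
            settings = settings.updated(k, props.getProperty(k))
          }
        }
        SimpleConf(settings)
      }

      def main(args: Array[String]): Unit = {
        val driverConf = SimpleConf(Map("spark.sql.caseSensitive" -> "true"))
        val props = toJobProperties(driverConf, new Properties())
        // In a real job the properties would travel with the task; here we
        // simply rebuild the conf in the same process.
        val executorConf = fromJobProperties(props)
        println(executorConf.get("spark.sql.caseSensitive", "false"))
      }
    }
    ```
    With this shape, code deep in a call stack can look a setting up from the 
rebuilt conf instead of receiving it as an extra parameter at every level.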
    
    ## How was this patch tested?
    
    A new test suite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark config

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21376.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21376
    
----
commit ba467036fdd2e6efe3ef2be66f378da341c73423
Author: Wenchen Fan <wenchen@...>
Date:   2018-05-19T10:51:02Z

    support accessing SQLConf at executor side

commit a1519d4aa692adceef1f3878a2ccd1715bf6175a
Author: Wenchen Fan <wenchen@...>
Date:   2018-05-20T10:33:00Z

    fix json schema inference

----

