GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21376
[SPARK-24250][SQL] support accessing SQLConf inside tasks
Re-submission of https://github.com/apache/spark/pull/21299, which broke the build.
A new commit is added to fix the SQLConf problem in
`JsonSchemaInference.infer`.
## What changes were proposed in this pull request?
Previously, in #20136, we decided to forbid tasks from accessing `SQLConf`,
because it doesn't work there and always gives you the default conf value. In #21190
we fixed the check and all the places that violated it.
Currently, the pattern for accessing configs on the executor side is: read
the configs on the driver side, then reference the variables holding the config
values inside the RDD closure, so that they are serialized to the executor
side. Something like:
```
// read on the driver side
val someConf = conf.getXXX
child.execute().mapPartitions {
  // someConf is captured by the closure and serialized to the executor
  if (someConf == ...) ...
  ...
}
```
However, this pattern is hard to apply if the config needs to be propagated
through a long call stack. One example is `DataType.sameType`; see how many
changes were needed in #21190.
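To illustrate the call-stack problem, here is a minimal sketch that does not use Spark at all. The `Ty` hierarchy, `TypeCompare`, and the `caseSensitive` flag are hypothetical stand-ins, not Spark's actual `DataType` code. The point is only that a value read once on the driver has to be added as a parameter to every helper along the way.

```scala
// Hypothetical miniature type model; NOT Spark's DataType hierarchy.
sealed trait Ty
case object IntTy extends Ty
final case class ArrayTy(element: Ty) extends Ty
final case class StructTy(fields: Seq[(String, Ty)]) extends Ty

object TypeCompare {
  // `caseSensitive` stands in for a config value read on the driver.
  // It must be threaded explicitly through every level of the recursion.
  def sameType(a: Ty, b: Ty, caseSensitive: Boolean): Boolean = (a, b) match {
    case (ArrayTy(x), ArrayTy(y)) =>
      sameType(x, y, caseSensitive)
    case (StructTy(fa), StructTy(fb)) =>
      fa.length == fb.length && fa.zip(fb).forall { case ((na, ta), (nb, tb)) =>
        sameName(na, nb, caseSensitive) && sameType(ta, tb, caseSensitive)
      }
    case _ =>
      a == b
  }

  // Even leaf helpers need the extra parameter.
  private def sameName(a: String, b: String, caseSensitive: Boolean): Boolean =
    if (caseSensitive) a == b else a.equalsIgnoreCase(b)
}
```

Every caller of `sameType` now has to obtain the flag from somewhere, which is why this kind of change tends to spread across many files.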
When it comes to code generation, it's even worse. I tried it locally and
we need to change a ton of files to propagate configs to code generators.
This PR proposes to allow tasks to access `SQLConf`. The idea is to
save all the SQL configs to the job properties when a SQL execution is triggered.
On the executor side, we rebuild the `SQLConf` from the job properties.
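The round trip described above can be sketched without Spark as follows. This is an illustration under stated assumptions, not the code in this PR: `SimpleConf` and `ConfPropagation` are hypothetical names, and filtering on the `spark.sql.` prefix is an assumed way to pick the SQL entries out of the job properties. In Spark itself, the corresponding hooks would presumably be `SparkContext.setLocalProperty` on the driver and `TaskContext.getLocalProperty` inside the task.

```scala
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical stand-in for SQLConf: a read-only view over string settings.
final class SimpleConf(settings: Map[String, String]) {
  def get(key: String, default: String): String =
    settings.getOrElse(key, default)
}

object ConfPropagation {
  // Driver side: copy every config entry into the job's properties.
  def toJobProperties(settings: Map[String, String]): Properties = {
    val props = new Properties()
    settings.foreach { case (k, v) => props.setProperty(k, v) }
    props
  }

  // Executor side: rebuild a conf from the properties shipped with the task.
  // Assumption: SQL configs are identified by the "spark.sql." key prefix.
  def fromJobProperties(props: Properties): SimpleConf = {
    val restored = props.stringPropertyNames().asScala
      .filter(_.startsWith("spark.sql."))
      .map(k => k -> props.getProperty(k))
      .toMap
    new SimpleConf(restored)
  }
}
```

With something along these lines, code running in a task can look up a config by key and see the value set on the driver, instead of having the value threaded through every closure and helper.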
## How was this patch tested?
A new test suite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark config
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21376.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21376
----
commit ba467036fdd2e6efe3ef2be66f378da341c73423
Author: Wenchen Fan <wenchen@...>
Date: 2018-05-19T10:51:02Z
support accessing SQLConf at executor side
commit a1519d4aa692adceef1f3878a2ccd1715bf6175a
Author: Wenchen Fan <wenchen@...>
Date: 2018-05-20T10:33:00Z
fix json schema inference
----