Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3543#discussion_r21638810
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
def createParquetFile[A <: Product : TypeTag](
path: String,
allowExisting: Boolean = true,
- conf: Configuration = new Configuration()): SchemaRDD = {
--- End diff ---
So let me see if I have things straight:
- Currently, the code uses new Configuration() as the default, which may have
thread safety issues because the Configuration constructor touches shared
static state
- My original patch uses SparkHadoopUtil.get.conf, which is a singleton, so it
should reduce the constructor thread safety problem, but it increases the
problems if the Hadoop configuration is modified. It also won't do the right
thing for people who have altered the SparkConf in code, which makes it no good.
(I haven't run into this in my own usage of the patched version, because I
always pass in a complete SparkConf via properties rather than setting it in
code.)
- @tdas suggested using this.sparkContext.hadoopConfiguration. This will use
the "right" Spark config, but may have thread safety issues both at
construction time, when the SparkContext is created, and if the configuration
is modified later. (The three options are sketched side by side just below
this list.)
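
To make the trade-offs concrete, here is a rough side-by-side sketch of the
three candidate defaults (the wrapper class and method names are made up for
illustration; only the three right-hand-side expressions come from the options
above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import org.apache.spark.deploy.SparkHadoopUtil

// Illustrative wrapper only; `sparkContext` stands in for SQLContext's field.
class ConfDefaultSketch(sparkContext: SparkContext) {

  // 1. Current default: a fresh Configuration per call. The Configuration
  //    constructor reads shared static state, which is why HadoopRDD guards
  //    construction with a lock elsewhere in Spark.
  def currentDefault(): Configuration = new Configuration()

  // 2. Original patch: the process-wide singleton. No per-call construction,
  //    but any mutation is visible to every caller, and it does not reflect
  //    settings applied to the SparkConf in code.
  def patchDefault(): Configuration = SparkHadoopUtil.get.conf

  // 3. @tdas' suggestion: the SparkContext's Hadoop configuration, which does
  //    reflect the SparkConf but is still a single shared mutable object.
  def suggestedDefault(): Configuration = sparkContext.hadoopConfiguration
}
```

None of these removes the shared-mutable-state concern; options 2 and 3 just
trade where it shows up.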
So....
Use @tdas' suggestion, and add a
HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized block to
SparkHadoopUtil.newConfiguration? And people are out of luck if they have code
that used to work because it was modifying a fresh, blank Configuration
instance rather than the now-shared one?
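
To be concrete, what I have in mind is roughly the following (just a sketch:
the package placement and the spark.hadoop.* copying are my assumptions about
how it might look, and the lock is the existing private[spark] one in
HadoopRDD, so this only compiles inside the Spark tree):

```scala
package org.apache.spark.deploy

import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf
import org.apache.spark.rdd.HadoopRDD

// Rough sketch of synchronized Configuration construction; not the actual patch.
object NewConfigurationSketch {
  def newConfiguration(sparkConf: SparkConf): Configuration =
    // Serialize construction on the same lock HadoopRDD already uses, since
    // the Configuration constructor is not thread-safe.
    HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
      val hadoopConf = new Configuration()
      // Copy spark.hadoop.* settings so the result reflects the SparkConf.
      sparkConf.getAll.foreach {
        case (key, value) if key.startsWith("spark.hadoop.") =>
          hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
        case _ => // other Spark settings don't map to Hadoop keys
      }
      hadoopConf
    }
}
```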