Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3543#discussion_r21638810
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
       def createParquetFile[A <: Product : TypeTag](
           path: String,
           allowExisting: Boolean = true,
    -      conf: Configuration = new Configuration()): SchemaRDD = {
    --- End diff --
    
    So let me see if I have things straight
    
    - Currently, the code is using new Configuration() as a default, which may 
have some thread safety issues due to the constructor
    
    - my original patch uses SparkHadoopUtil.get.conf, which is a singleton, so 
should decrease the constructor thread safety problem, but increase the 
problems if the hadoop configuration is modified.  It also won't do the right 
thing for people who have altered the sparkConf, which makes it no good (I 
haven't run into this in personal usage of the patched version, because I 
always pass in a complete sparkConf via properties rather than setting it in 
code)
    
    - @tdas suggested to use this.sparkContext.hadoopConfiguration.  This will 
use the "right" spark config, but may have thread safety issues both at 
construction the time the spark context is created, and if the configuration is 
modified.
    
    So....
    
    Use tdas' suggestion, add a 
HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized block to 
SparkHadoopUtil.newConfiguration?  And people are out of luck if they have code 
that used to work because they were modifying new blank instances of 
Configuration, rather than the now-shared one? 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to