yaroslav-serhiichuk commented on a change in pull request #31768:
URL: https://github.com/apache/spark/pull/31768#discussion_r589082773
##########
File path: python/pyspark/context.py
##########
@@ -1255,6 +1255,16 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
+
+    def hadoopConfiguration(self):
+        """
+        Returns the Hadoop configuration used for the Hadoop code (e.g. file systems) we reuse.
+
+        As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
+        plan to set some global configurations for all Hadoop RDDs.
+        Return :class:`Configuration` object
+        """
+        return self._jsc.hadoopConfiguration()
Review comment:
I faced this issue when trying to set up s3a properties in a PySpark job. The way to do it was not documented anywhere, but I found a solution on [Stack Overflow](https://stackoverflow.com/questions/28844631/how-to-set-hadoop-configuration-values-from-pyspark). Digging deeper, I found a similar ticket open in [Jira](https://issues.apache.org/jira/browse/SPARK-33436), which is why this PR was created. If we really do not need it, it can be closed, but in my mind it is logical to have the same SparkContext API in Scala and Python. A rough sketch of what I mean is below.
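
For illustration only, here is roughly what I had to do versus what this PR would allow. The s3a property values are just placeholders, and the exact credentials handling is not the point:

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Workaround today: reach through the private Py4J gateway, as suggested on
# Stack Overflow. It works, but relies on an undocumented private attribute.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "<access-key>")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "<secret-key>")

# With this PR: the same thing through a public method, mirroring the Scala
# SparkContext.hadoopConfiguration API.
hadoop_conf = sc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<access-key>")
hadoop_conf.set("fs.s3a.secret.key", "<secret-key>")
```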
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]