HyukjinKwon commented on a change in pull request #31768:
URL: https://github.com/apache/spark/pull/31768#discussion_r588972940
##########
File path: python/pyspark/context.py
##########
@@ -1255,6 +1255,16 @@ def getConf(self):
         conf.setAll(self._conf.getAll())
         return conf
+    def hadoopConfiguration(self):
+        """
+        Returns the Hadoop configuration used for the Hadoop code (e.g. file systems) we reuse.
+
+        As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
+        plan to set some global configurations for all Hadoop RDDs.
+        Return :class:`Configuration` object
+        """
+        return self._jsc.hadoopConfiguration()
Review comment:
This exposes a Py4J instance as is, which we should avoid. Defining a
Hadoop configuration class in PySpark doesn't make sense either. I am not sure
which approach is best. We could probably leave it and let advanced users
use `sc._jsc.hadoopConfiguration` for now.
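
For reference, a minimal sketch of what that workaround could look like in user code. The helper name `set_hadoop_options` is hypothetical and not part of PySpark; it assumes `sc` is a live `SparkContext` and relies on the private `_jsc` attribute, so it may break between Spark versions.

```python
def set_hadoop_options(sc, options):
    """Apply key/value pairs to the JVM-side Hadoop configuration of `sc`.

    Hypothetical helper, not a PySpark API. `sc` is expected to be a
    SparkContext; `sc._jsc` is private and unsupported.
    """
    # hadoopConfiguration() hands back a Py4J proxy for the JVM
    # org.apache.hadoop.conf.Configuration object, not a Python object.
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in options.items():
        # Configuration.set takes string arguments, so convert explicitly.
        hadoop_conf.set(key, str(value))
    return hadoop_conf
```

With a running context this would be called as `set_hadoop_options(sc, {"io.file.buffer.size": 65536})`. The returned object is still the raw Py4J proxy, and the change is global to every Hadoop RDD created from that context, which is the concern raised above.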