dongjoon-hyun commented on a change in pull request #26530: [SPARK-25694][SQL]
Add a config for `URL.setURLStreamHandlerFactory`
URL: https://github.com/apache/spark/pull/26530#discussion_r346652327
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
##########
@@ -185,11 +187,26 @@ private[sql] class SharedState(
}
object SharedState extends Logging {
- try {
- URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
- } catch {
- case e: Error =>
- logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory")
+ @volatile private var factory: Option[FsUrlStreamHandlerFactory] = None
+ private lazy val defaultFactory = new FsUrlStreamHandlerFactory()
+ private def setFsUrlStreamHandlerFactory(conf: SparkConf): Unit = {
+ factory match {
+ case Some(_) =>
+ logWarning("FsUrlStreamHandlerFactory has been already initialized, " +
+ "so it can not be modified")
+ case None => synchronized {
+ try {
+ if (conf.getBoolean("spark.fsUrlStreamHandlerFactory.enabled", true)) {
Review comment:
[SPARK-25694](https://issues.apache.org/jira/browse/SPARK-25694) is a
long-standing issue. Originally, [[SPARK-12868][SQL] Allow adding jars from
hdfs](https://github.com/apache/spark/pull/17342) added this
`URL.setURLStreamHandlerFactory` call for better Hive support. However, it has
a side effect when we use Spark without `-Phive`. Because the JVM allows the
factory to be set only once, an exception is thrown when users try to install
another custom factory, or when a 3rd-party library tries to set one. This
configuration will unblock those non-Hive users.
```
scala> sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
scala> java.net.URL.setURLStreamHandlerFactory(new org.apache.hadoop.fs.FsUrlStreamHandlerFactory())
java.lang.Error: factory already defined
at java.net.URL.setURLStreamHandlerFactory(URL.java:1134)
... 47 elided
```
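For illustration, the guarded, config-driven install that this change aims for
can be sketched roughly as below. This is a minimal sketch, not the code in
this PR: `UrlFactoryGuard`, `NoOpFactory`, and `installIfEnabled` are
hypothetical names, `NoOpFactory` stands in for `FsUrlStreamHandlerFactory` so
the snippet runs without Hadoop on the classpath, and the `enabled` argument
stands in for the value read from `spark.fsUrlStreamHandlerFactory.enabled`.

```scala
import java.net.{URL, URLStreamHandler, URLStreamHandlerFactory}

object UrlFactoryGuard {
  // Hypothetical stand-in for FsUrlStreamHandlerFactory. Returning null
  // makes the JVM fall back to its built-in protocol handlers.
  final class NoOpFactory extends URLStreamHandlerFactory {
    override def createURLStreamHandler(protocol: String): URLStreamHandler = null
  }

  // Tracks whether this object has already installed a factory.
  @volatile private var installed = false

  /** Returns true only if this call installed the factory. */
  def installIfEnabled(enabled: Boolean): Boolean = synchronized {
    if (!enabled || installed) {
      // Disabled by config, or already installed by an earlier call.
      false
    } else {
      try {
        URL.setURLStreamHandlerFactory(new NoOpFactory)
        installed = true
        true
      } catch {
        // The JVM throws java.lang.Error("factory already defined") when
        // any code in the process has set a factory before us.
        case _: Error => false
      }
    }
  }
}
```

With `enabled = false`, the JVM-wide slot is left free, so a user or a
3rd-party library can still call `URL.setURLStreamHandlerFactory` itself
without hitting the error shown above.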