dbtsai commented on a change in pull request #26530: [SPARK-25694][SQL] Add a
config for `URL.setURLStreamHandlerFactory`
URL: https://github.com/apache/spark/pull/26530#discussion_r347153431
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
##########
@@ -185,11 +187,26 @@ private[sql] class SharedState(
}
object SharedState extends Logging {
- try {
- URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
- } catch {
- case e: Error =>
- logWarning("URL.setURLStreamHandlerFactory failed to set FsUrlStreamHandlerFactory")
+ @volatile private var factory: Option[FsUrlStreamHandlerFactory] = None
+ private lazy val defaultFactory = new FsUrlStreamHandlerFactory()
+ private def setFsUrlStreamHandlerFactory(conf: SparkConf): Unit = {
+ factory match {
+ case Some(_) =>
+ logWarning("FsUrlStreamHandlerFactory has been already initialized, " +
+ "so it can not be modified")
+ case None => synchronized {
+ try {
+ if (conf.getBoolean("spark.fsUrlStreamHandlerFactory.enabled", true)) {
Review comment:
@jiangzho It's very hard for end users to discover this conf since it's
hidden in the code, and it is not easy for developers to maintain. In the
Spark community, we typically add a new conf with a default value and doc in
`config/package.scala`.
For example, in this scenario, we can add
```scala
private[spark] val USE_DEFAULT_URL_STREAM_HANDLER_FACTORY =
ConfigBuilder("spark.is.useDefaultUrlStreamHandlerFactory")
.doc("Some doc")
.booleanConf
.createWithDefault(true)
```
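To illustrate why a declared entry is easier to maintain than an inline string literal, here is a minimal, self-contained sketch of the pattern. It does not use Spark's internal `ConfigBuilder`; `BooleanConfEntry`, `UrlHandlerConf`, and `shouldInstallFactory` are hypothetical names introduced only for this example, and the doc text is a placeholder.

```scala
// Hypothetical stand-in for a typed config entry (not Spark's real API):
// the key, documentation, and default value are declared together in one place.
final case class BooleanConfEntry(key: String, doc: String, default: Boolean) {
  // Returns the parsed value when the key is set, otherwise the declared default.
  def readFrom(settings: Map[String, String]): Boolean =
    settings.get(key).map(_.trim.toBoolean).getOrElse(default)
}

object UrlHandlerConf {
  // Single declaration of the key suggested in the review comment.
  val UseDefaultUrlStreamHandlerFactory: BooleanConfEntry = BooleanConfEntry(
    key = "spark.is.useDefaultUrlStreamHandlerFactory",
    doc = "Placeholder doc text for this sketch.",
    default = true)

  // Call sites reference the entry, so the key string and the default
  // are never repeated or hidden inside unrelated code.
  def shouldInstallFactory(settings: Map[String, String]): Boolean =
    UseDefaultUrlStreamHandlerFactory.readFrom(settings)
}
```

With this shape, a call site such as `UrlHandlerConf.shouldInstallFactory(settings)` replaces the inline `conf.getBoolean("...", true)` lookup, and the key, default, and doc can be found and changed in one place.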