hvanhovell commented on a change in pull request #23398: [SPARK-26493][SQL]
Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r244412314
##########
File path:
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
##########
@@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
stop(session)
}
}
+
+  test("use multiple custom classes for extensions") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions].getCanonicalName,
+        classOf[MyExtensions2].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
+      assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions2.myFunction._1).isDefined)
Review comment:
There are use cases where you want to execute rules in a certain order, so I
think it is reasonable to add the same rule multiple times. If you want more
control, you could even create 'micro' optimizer batches by calling multiple
rules from one rule.
I think this is more a matter of proper documentation than one where we
should explicitly block things. Also note that this is a pretty advanced
feature and by this stage users are expected to know what they are doing.
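To illustrate the 'micro' batch idea: a Catalyst rule is essentially a
plan-to-plan function, so one extension rule can apply several sub-rules in a
fixed order. The sketch below is a minimal, self-contained model of that
pattern; the `Rule` trait here is a stand-in for
`org.apache.spark.sql.catalyst.rules.Rule`, and the names (`MicroBatch`,
`tagA`, `tagB`) are illustrative, not part of the Spark API.

```scala
// Stand-in for Catalyst's Rule[LogicalPlan]: a single plan -> plan transform.
trait Rule[T] {
  def apply(plan: T): T
}

// A 'micro' batch: one rule that runs its sub-rules left to right,
// giving the author full control over ordering at a single extension point.
case class MicroBatch[T](rules: Seq[Rule[T]]) extends Rule[T] {
  def apply(plan: T): T = rules.foldLeft(plan)((p, r) => r(p))
}

object MicroBatchDemo extends App {
  // Toy "plans" are strings; each sub-rule tags the plan so the
  // application order is visible in the output.
  val tagA: Rule[String] = p => p + "+A"
  val tagB: Rule[String] = p => p + "+B"

  val batch = MicroBatch(Seq(tagA, tagB))
  println(batch("plan"))  // plan+A+B
}
```

In real extension code, `MicroBatch` would extend `Rule[LogicalPlan]` and be
injected once via `SparkSessionExtensions.injectOptimizerRule`, while still
controlling the relative order of its sub-rules internally.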
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]