[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246503902

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -108,12 +124,86 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     try {
       assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
       assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.postHocResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.extendedCheckRules.contains(MyCheckRule(session)))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).contains(MyRule(session)))
+      assert(session.sessionState.sqlParser.isInstanceOf[MyParser])
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+    } finally {
+      stop(session)
+    }
+  }
+
+  test("use multiple custom class for extensions in the specified order") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions2].getCanonicalName,
+        classOf[MyExtensions].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.containsSlice(
+        Seq(MySparkStrategy2(session), MySparkStrategy(session))))
+      val orderedRules = Seq(MyRule2(session), MyRule(session))
+      val orderedCheckRules = Seq(MyCheckRule2(session), MyCheckRule(session))
+      val parser = MyParser(session, CatalystSqlParser)
+      assert(session.sessionState.analyzer.extendedResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.postHocResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.extendedCheckRules.containsSlice(orderedCheckRules))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).filter(orderedRules.contains)
+        .containsSlice(orderedRules ++ orderedRules)) // The optimizer rules are duplicated
+      assert(session.sessionState.sqlParser == parser)
```

Review comment: Thanks for the follow-up, I was not aware of that difference. I updated the tests to use `===` and `!==` as originally recommended.

---
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
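The ordering assertions in the test above rely on `Seq.containsSlice`, which matches only a *contiguous* run of elements; that is why the optimizer check first filters the batch rules down to the injected ones before asserting the slice. A small stand-alone illustration (the rule names here are placeholders, not Spark classes):

```scala
// Seq.containsSlice returns true only when the argument appears as a
// contiguous run in the receiver, which makes it suitable for asserting
// that injected rules appear in a specific relative order.
object ContainsSliceDemo {
  def main(args: Array[String]): Unit = {
    val rules = Seq("rule2", "rule1", "builtin")
    assert(rules.containsSlice(Seq("rule2", "rule1")))  // contiguous, in order
    assert(!rules.containsSlice(Seq("rule1", "rule2"))) // wrong order
    assert(!rules.containsSlice(Seq("rule2", "builtin"))) // not contiguous
    println("ok")
  }
}
```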
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246395549

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -108,12 +124,86 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     try {
       assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
       assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.postHocResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.extendedCheckRules.contains(MyCheckRule(session)))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).contains(MyRule(session)))
+      assert(session.sessionState.sqlParser.isInstanceOf[MyParser])
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+    } finally {
+      stop(session)
+    }
+  }
+
+  test("use multiple custom class for extensions in the specified order") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions2].getCanonicalName,
+        classOf[MyExtensions].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.containsSlice(
+        Seq(MySparkStrategy2(session), MySparkStrategy(session))))
+      val orderedRules = Seq(MyRule2(session), MyRule(session))
+      val orderedCheckRules = Seq(MyCheckRule2(session), MyCheckRule(session))
+      val parser = MyParser(session, CatalystSqlParser)
+      assert(session.sessionState.analyzer.extendedResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.postHocResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.extendedCheckRules.containsSlice(orderedCheckRules))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).filter(orderedRules.contains)
+        .containsSlice(orderedRules ++ orderedRules)) // The optimizer rules are duplicated
+      assert(session.sessionState.sqlParser == parser)
```

Review comment: Based on [databricks/scala-style-guide#36](https://github.com/databricks/scala-style-guide/issues/36), it looks like `==` might now be preferred over `===`. For what it's worth, it seems that in the cases for this test `==` produces reasonable error messages, such as `MyParser(org.apache.spark.sql.SparkSession@6e8a9c30,org.apache.spark.sql.catalyst.parser.CatalystSqlParser$@5d01ea21) did not equal IntentionalErrorThatIInsertedHere` and `2 did not equal 3`. So please let me know if there is newer guidance to use `===` and I can make the changes.
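For context on why the `==` vs `===` choice matters: plain Scala's `assert` reports only "assertion failed", whereas ScalaTest's `assert` macro reports the compared values (the "2 did not equal 3" form quoted above) for both operators in recent versions. This framework-free sketch reproduces the informative message shape; `assertEqual` is a hypothetical helper, not a ScalaTest API:

```scala
// Sketch (no test framework) of the "X did not equal Y" failure message
// shape discussed above. assertEqual is a hypothetical helper.
object EqualityMessageDemo {
  def assertEqual(actual: Any, expected: Any): Unit =
    if (actual != expected)
      throw new AssertionError(s"$actual did not equal $expected")

  def main(args: Array[String]): Unit = {
    assertEqual(2, 2) // passes silently
    val msg =
      try { assertEqual(2, 3); "no error" }
      catch { case e: AssertionError => e.getMessage }
    assert(msg == "2 did not equal 3")
    println("ok")
  }
}
```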
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246389358

File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala

```
@@ -61,6 +61,26 @@ import org.apache.spark.sql.catalyst.rules.Rule
  *     .getOrCreate()
  *   }}}
  *
+ * The extensions can also be used by setting the Spark SQL configuration property
+ * spark.sql.extensions, for example:
```

Review comment: I added the recommended documentation updates and also added code marks around `withExtensions` and fixed the typo there. I didn't think it was necessary to provide the example of using the comma-separated string, but I did note it in the documentation.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246384135

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars

 object MyExtensions {

   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }

 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
```

Review comment: Done.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246383290

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars

 object MyExtensions {

   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }

 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
+    (_: Seq[Expression]) => Literal(5, IntegerType))
+}
+
+class MyExtensions2 extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectPlannerStrategy(MySparkStrategy2)
+    e.injectResolutionRule(MyRule2)
+    e.injectPostHocResolutionRule(MyRule2)
+    e.injectCheckRule(MyCheckRule2)
+    e.injectOptimizerRule(MyRule2)
+    e.injectParser((_, _) => CatalystSqlParser)
+    e.injectFunction(MyExtensions2.myFunction)
+  }
+}
+
+object MyExtensions2Duplicate {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "last wins" ),
```

Review comment: I made both changes and also updated one of the tests to validate the `ExpressionInfo` object rather than the extended usage text.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246382931

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars

 object MyExtensions {

   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
```

Review comment: Done.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246382805

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars

 object MyExtensions {

   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }

 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
+    (_: Seq[Expression]) => Literal(5, IntegerType))
+}
+
+class MyExtensions2 extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectPlannerStrategy(MySparkStrategy2)
+    e.injectResolutionRule(MyRule2)
+    e.injectPostHocResolutionRule(MyRule2)
+    e.injectCheckRule(MyCheckRule2)
+    e.injectOptimizerRule(MyRule2)
+    e.injectParser((_, _) => CatalystSqlParser)
```

Review comment: I also made the suggested change in 2 other places in this file so that it is consistent.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246382567

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars

 object MyExtensions {

   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
```

Review comment: Done.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246382355

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

```
@@ -103,7 +104,13 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       name: FunctionIdentifier,
       info: ExpressionInfo,
       builder: FunctionBuilder): Unit = synchronized {
-    functionBuilders.put(normalizeFuncName(name), (info, builder))
+    val normalizedName = normalizeFuncName(name)
+    val newFunction = (info, builder)
+    functionBuilders.put(normalizedName, newFunction) match {
+      case Some(previousFunction) if previousFunction != newFunction =>
+        logWarning(s"The function $normalizedName replaced a previously registered function.")
+      case _ => Unit
```

Review comment: I made a change here which I think was what you were recommending. I didn't want to remove the `case _ =>`, because otherwise I think it may result in a non-exhaustive match exception. If you wanted me to remove the entire match statement, just let me know.
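The pattern under discussion can be sketched in isolation: `mutable.HashMap.put` returns the previous value as an `Option`, so a `match` can warn only when a *different* entry was overwritten. The registry below is a hypothetical stand-in (simplified types, an in-memory warning buffer instead of `logWarning`), not Spark's `SimpleFunctionRegistry`:

```scala
import scala.collection.mutable

// Hypothetical stand-in for the registry pattern discussed above.
// put returns Option[previous value]; the guard warns only when an
// existing, different entry was replaced. The `case _ =>` arm keeps
// the match exhaustive (first registration, or same value re-put).
object RegistryDemo {
  private val builders = mutable.HashMap.empty[String, Int]
  private val warnings = mutable.Buffer.empty[String]

  def register(name: String, builder: Int): Unit =
    builders.put(name, builder) match {
      case Some(previous) if previous != builder =>
        warnings += s"The function $name replaced a previously registered function."
      case _ => // no warning
    }

  def loggedWarnings: Seq[String] = warnings.toSeq
}
```

Registering the same value twice stays silent, while registering a different value under an existing name produces one warning, mirroring the behavior the tests in this thread check.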
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r246381002

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -78,6 +86,14 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     }
   }

+  test("inject multiple rules") {
+    withSession(Seq(_.injectOptimizerRule(MyRule),
+      _.injectPlannerStrategy(MySparkStrategy))) { session =>
```

Review comment: Done - I indented it.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r245484843

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala

```
@@ -99,9 +99,15 @@ object StaticSQLConf {
     .createWithDefault(false)

   val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions")
-    .doc("Name of the class used to configure Spark Session extensions. The class should " +
-      "implement Function1[SparkSessionExtension, Unit], and must have a no-args constructor.")
+    .doc("A comma-separated list of classes that implement " +
+      "Function1[SparkSessionExtension, Unit] used to configure Spark Session extensions. The " +
+      "classes must have a no-args constructor. If multiple extensions are specified, they are " +
+      "applied in the specified order. For the case of rules and planner strategies, they are " +
+      "applied in the specified order. For the case of parsers, the last parser is used and each " +
+      "parser can delegate to its predecessor. For the case of function name conflicts, the last " +
```

Review comment: I added an example to the documentation of `SparkSessionExtensions`.
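The ordering semantics described in the doc change above (rules accumulate in listed order; each parser wraps its predecessor, so the last-listed parser is outermost) can be modeled without Spark at all. This is a simplified sketch, not Spark code; the `Extensions` class and string-based "rules" are stand-ins:

```scala
// Simplified model (not Spark code) of the extension-ordering semantics:
// injected rules accumulate in the order the extensions are listed, while
// each injected parser receives its predecessor and may delegate to it.
object ExtensionOrderDemo {
  type Parser = String => String

  final class Extensions {
    var rules: List[String] = Nil
    var parser: Parser = identity
    def injectRule(r: String): Unit = rules = rules :+ r
    def injectParser(build: Parser => Parser): Unit = parser = build(parser)
  }

  def main(args: Array[String]): Unit = {
    val e = new Extensions
    // first listed extension
    e.injectRule("rule1")
    e.injectParser(prev => s => "p1(" + prev(s) + ")")
    // second listed extension
    e.injectRule("rule2")
    e.injectParser(prev => s => "p2(" + prev(s) + ")")

    assert(e.rules == List("rule1", "rule2")) // applied in listed order
    assert(e.parser("sql") == "p2(p1(sql))")  // last parser is outermost
    println("ok")
  }
}
```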
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r244759859

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

```
@@ -103,7 +104,10 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       name: FunctionIdentifier,
       info: ExpressionInfo,
       builder: FunctionBuilder): Unit = synchronized {
-    functionBuilders.put(normalizeFuncName(name), (info, builder))
+    val normalizedName = normalizeFuncName(name)
+    if (functionBuilders.put(normalizedName, (info, builder)).isDefined) {
```

Review comment: I added a check that only logs when different function objects are registered. The "allow an extension to be duplicated" unit test that I previously added registers the same object twice; that test no longer prints the warning. The "use the last registered function name when there are duplicates" unit test that I previously added registers different functions with the same name; that test prints the warning.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r244523719

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala

```
@@ -99,9 +99,10 @@ object StaticSQLConf {
     .createWithDefault(false)

   val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions")
-    .doc("Name of the class used to configure Spark Session extensions. The class should " +
+    .doc("List of the class names used to configure Spark Session extensions. The classes should " +
```

Review comment: I updated the documentation as suggested with respect to the ordering. I think the comment about "listeners" relates to code that is close in proximity to the code for this pull request, but it is a different configuration item, so I didn't update the `spark.sql.queryExecutionListeners` documentation. If you think that documentation should be updated, let me know whether I should include it as part of this pull request or as a separate pull request.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r244523670

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
       stop(session)
     }
   }
+
+  test("use multiple custom class for extensions") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions].getCanonicalName,
+        classOf[MyExtensions2].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
+      assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions2.myFunction._1).isDefined)
```

Review comment:
> Can we have a test case for duplicated extension names?

Done.

> Can we have a negative test case for function name conflicts? MyExtension2.myFunction and MyExtension3.myFunction?

Done. I added documentation for the behavior. I added a warning message if a registered function is replaced. I added a test case for the ordering.
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
URL: https://github.com/apache/spark/pull/23398#discussion_r244414275

File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala

```
@@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
       stop(session)
     }
   }
+
+  test("use multiple custom class for extensions") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions].getCanonicalName,
+        classOf[MyExtensions2].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
+      assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions2.myFunction._1).isDefined)
```

Review comment: Prior to this change, it was possible to programmatically register multiple extensions, but it was not possible to do so through the spark.sql.extensions configuration (although it wasn't documented/tested until this pull request). For example, the following works without this pull request:

```
SparkSession.builder()
  .master("..")
  .withExtensions(sparkSessionExtensions1)
  .withExtensions(sparkSessionExtensions2)
  .getOrCreate()
```

So I think conflicting function names are already possible (but not documented) in the following cases:
1. Conflicting function names are registered by calling `.withExtensions()` multiple times.
2. An extension accidentally registers a function that was already registered with the builtin functions.
3. An extension accidentally registers a function multiple times by calling `injectFunction(myFunction)`.

As for the order, it looks to me like the last function to be stored under a conflicting name is the one that is retrieved:

```
class SimpleFunctionRegistry extends FunctionRegistry {

  @GuardedBy("this")
  private val functionBuilders =
    new mutable.HashMap[FunctionIdentifier, (ExpressionInfo, FunctionBuilder)]

  override def registerFunction(
      name: FunctionIdentifier,
      info: ExpressionInfo,
      builder: FunctionBuilder): Unit = synchronized {
    functionBuilders.put(normalizeFuncName(name), (info, builder))
  }
```

I will update this PR to document what happens with order of operations and conflicts. If we need to explicitly block duplicate functions from being registered, I can temporarily drop this PR and see about making those changes first.
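The last-wins behavior described above follows directly from backing the registry with a `mutable.HashMap`: a keyed `put` overwrites, so the most recently registered builder for a name is the one `get` returns. A stand-alone sketch with simplified types (Spark keys on `FunctionIdentifier` and stores `(ExpressionInfo, FunctionBuilder)` pairs; plain strings are used here for illustration):

```scala
import scala.collection.mutable

// Stand-alone sketch of last-wins registration: put into a HashMap
// overwrites any existing entry for the normalized name, so lookup
// returns whatever was registered most recently under that name.
// (Simplified types; not Spark's SimpleFunctionRegistry.)
object LastWinsDemo {
  private val functionBuilders = mutable.HashMap.empty[String, String]

  def registerFunction(name: String, builder: String): Unit =
    functionBuilders.put(name.toLowerCase, builder) // normalize, then overwrite

  def lookupFunction(name: String): Option[String] =
    functionBuilders.get(name.toLowerCase)
}
```

Registering two builders under the same (case-normalized) name and looking the name up returns only the second, which is exactly the conflict behavior the PR sets out to document.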