[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246503902
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -108,12 +124,86 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     try {
       assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
       assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.postHocResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.extendedCheckRules.contains(MyCheckRule(session)))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).contains(MyRule(session)))
+      assert(session.sessionState.sqlParser.isInstanceOf[MyParser])
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+    } finally {
+      stop(session)
+    }
+  }
+
+  test("use multiple custom class for extensions in the specified order") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions2].getCanonicalName,
+        classOf[MyExtensions].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.containsSlice(
+        Seq(MySparkStrategy2(session), MySparkStrategy(session))))
+      val orderedRules = Seq(MyRule2(session), MyRule(session))
+      val orderedCheckRules = Seq(MyCheckRule2(session), MyCheckRule(session))
+      val parser = MyParser(session, CatalystSqlParser)
+      assert(session.sessionState.analyzer.extendedResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.postHocResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.extendedCheckRules.containsSlice(orderedCheckRules))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).filter(orderedRules.contains)
+        .containsSlice(orderedRules ++ orderedRules)) // The optimizer rules are duplicated
+      assert(session.sessionState.sqlParser == parser)
 
 Review comment:
   Thanks for the follow-up; I was not aware of that difference. I updated the tests to use `===` and `!==` as originally recommended.





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246395549
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -108,12 +124,86 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     try {
       assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
       assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.postHocResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.analyzer.extendedCheckRules.contains(MyCheckRule(session)))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).contains(MyRule(session)))
+      assert(session.sessionState.sqlParser.isInstanceOf[MyParser])
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+    } finally {
+      stop(session)
+    }
+  }
+
+  test("use multiple custom class for extensions in the specified order") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions2].getCanonicalName,
+        classOf[MyExtensions].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.containsSlice(
+        Seq(MySparkStrategy2(session), MySparkStrategy(session))))
+      val orderedRules = Seq(MyRule2(session), MyRule(session))
+      val orderedCheckRules = Seq(MyCheckRule2(session), MyCheckRule(session))
+      val parser = MyParser(session, CatalystSqlParser)
+      assert(session.sessionState.analyzer.extendedResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.postHocResolutionRules.containsSlice(orderedRules))
+      assert(session.sessionState.analyzer.extendedCheckRules.containsSlice(orderedCheckRules))
+      assert(session.sessionState.optimizer.batches.flatMap(_.rules).filter(orderedRules.contains)
+        .containsSlice(orderedRules ++ orderedRules)) // The optimizer rules are duplicated
+      assert(session.sessionState.sqlParser == parser)
 
 Review comment:
   Based on [databricks/scala-style-guide#36](https://github.com/databricks/scala-style-guide/issues/36), it looks like `==` might now be preferred over `===`. For what it's worth, in the cases in this test `==` produces reasonable error messages such as `MyParser(org.apache.spark.sql.SparkSession@6e8a9c30,org.apache.spark.sql.catalyst.parser.CatalystSqlParser$@5d01ea21) did not equal IntentionalErrorThatIInsertedHere` and `2 did not equal 3`. So please let me know if there is newer guidance to use `===`, and I can make the changes.
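   For illustration, a minimal sketch of the two assertion styles being weighed here, assuming a plain ScalaTest suite (Spark's `SparkFunSuite` mixes in the same `Assertions` trait):
   ```
   import org.scalatest.FunSuite

   class EqualityStyleSuite extends FunSuite {
     test("== and === both report operands on failure") {
       val count = 2
       // ScalaTest's assert macro inspects the expression, so a failing
       // assert(count == 3) reports "2 did not equal 3".
       assert(count == 2)
       // === comes from the Assertions/TripleEquals machinery and yields the
       // same style of message; the style-guide thread above debates whether
       // it is still needed now that assert with plain == produces readable failures.
       assert(count === 2)
     }
   }
   ```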





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246389358
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
 ##
 @@ -61,6 +61,26 @@ import org.apache.spark.sql.catalyst.rules.Rule
  * .getOrCreate()
  * }}}
  *
+ * The extensions can also be used by setting the Spark SQL configuration property
+ * spark.sql.extensions, for example:
 
 Review comment:
   I added the recommended documentation updates, added code marks around `withExtensions`, and fixed the typo there. I didn't think it was necessary to provide the example of using the comma-separated string, but I did note it in the documentation.
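   For illustration, a minimal sketch of the configuration-based route the new doc text describes (the class names here are placeholders; each listed class must implement `SparkSessionExtensions => Unit` and have a no-args constructor):
   ```
   import org.apache.spark.sql.SparkSession

   // Multiple extension class names are comma-separated and applied in order.
   val spark = SparkSession.builder()
     .master("local[1]")
     .config("spark.sql.extensions",
       "com.example.MyFirstExtensions,com.example.MySecondExtensions")
     .getOrCreate()
   ```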





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246384135
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars
 object MyExtensions {
 
   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }
 
 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
 
 Review comment:
   Done





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246383290
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars
 object MyExtensions {
 
   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }
 
 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
+    (_: Seq[Expression]) => Literal(5, IntegerType))
+}
+
+class MyExtensions2 extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectPlannerStrategy(MySparkStrategy2)
+    e.injectResolutionRule(MyRule2)
+    e.injectPostHocResolutionRule(MyRule2)
+    e.injectCheckRule(MyCheckRule2)
+    e.injectOptimizerRule(MyRule2)
+    e.injectParser((_, _) => CatalystSqlParser)
+    e.injectFunction(MyExtensions2.myFunction)
+  }
+}
+
+object MyExtensions2Duplicate {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "last wins" ),
 
 Review comment:
   I made both changes and also updated one of the tests to validate the 
`ExpressionInfo` object rather than the extended usage text.
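   For illustration, a hedged sketch of asserting on the `ExpressionInfo` object itself rather than its extended-usage string (the exact assertion in the PR may differ):
   ```
   val registered = session.sessionState.functionRegistry
     .lookupFunction(MyExtensions2Duplicate.myFunction._1)
   // ExpressionInfo does not override equals, so this checks that the very
   // object registered by the last extension is the one retrieved, which is
   // stronger than matching its extended-usage text.
   assert(registered === Some(MyExtensions2Duplicate.myFunction._2))
   ```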





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246382931
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars
 object MyExtensions {
 
   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
 
 Review comment:
   Done





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246382805
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars
 object MyExtensions {
 
   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 }
 
 class MyExtensions extends (SparkSessionExtensions => Unit) {
   def apply(e: SparkSessionExtensions): Unit = {
     e.injectPlannerStrategy(MySparkStrategy)
     e.injectResolutionRule(MyRule)
+    e.injectPostHocResolutionRule(MyRule)
+    e.injectCheckRule(MyCheckRule)
+    e.injectOptimizerRule(MyRule)
+    e.injectParser(MyParser)
     e.injectFunction(MyExtensions.myFunction)
   }
 }
+
+case class MyRule2(spark: SparkSession) extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan
+}
+
+case class MyCheckRule2(spark: SparkSession) extends (LogicalPlan => Unit) {
+  override def apply(plan: LogicalPlan): Unit = { }
+}
+
+case class MySparkStrategy2(spark: SparkSession) extends SparkStrategy {
+  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Seq.empty
+}
+
+object MyExtensions2 {
+
+  val myFunction = (FunctionIdentifier("myFunction2"),
+    new ExpressionInfo("noClass", "myDb", "myFunction2", "usage", "extended usage" ),
+    (_: Seq[Expression]) => Literal(5, IntegerType))
+}
+
+class MyExtensions2 extends (SparkSessionExtensions => Unit) {
+  def apply(e: SparkSessionExtensions): Unit = {
+    e.injectPlannerStrategy(MySparkStrategy2)
+    e.injectResolutionRule(MyRule2)
+    e.injectPostHocResolutionRule(MyRule2)
+    e.injectCheckRule(MyCheckRule2)
+    e.injectOptimizerRule(MyRule2)
+    e.injectParser((_, _) => CatalystSqlParser)
 
 Review comment:
   I also made the suggested change in two other places in this file so that it is consistent.
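   For context, the delegation pattern that `injectParser` enables looks roughly like this (a sketch, not the exact PR code): each builder receives the session and the previously installed parser, so the last-injected parser can forward to its predecessor.
   ```
   // MyParser wraps the delegate it is handed; (_, _) => CatalystSqlParser
   // instead discards the predecessor and installs a fixed parser.
   e.injectParser((session, delegate) => MyParser(session, delegate))
   ```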





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246382567
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -151,14 +241,63 @@ case class MyParser(spark: SparkSession, delegate: ParserInterface) extends Pars
 object MyExtensions {
 
   val myFunction = (FunctionIdentifier("myFunction"),
-    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage" ),
+    new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage"),
     (myArgs: Seq[Expression]) => Literal(5, IntegerType))
+
 
 Review comment:
   Done





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246382355
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ##
 @@ -103,7 +104,13 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       name: FunctionIdentifier,
       info: ExpressionInfo,
       builder: FunctionBuilder): Unit = synchronized {
-    functionBuilders.put(normalizeFuncName(name), (info, builder))
+    val normalizedName = normalizeFuncName(name)
+    val newFunction = (info, builder)
+    functionBuilders.put(normalizedName, newFunction) match {
+      case Some(previousFunction) if previousFunction != newFunction =>
+        logWarning(s"The function $normalizedName replaced a previously registered function.")
+      case _ => Unit
 
 Review comment:
   I made a change here which I think is what you were recommending. I didn't want to remove the `case _ =>`, since without it the match would be non-exhaustive and could throw a `MatchError`. If you wanted me to remove the entire match statement, just let me know.
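   For comparison, a sketch of an equivalent formulation that avoids the wildcard arm entirely by using `Option.foreach` with a plain guard (an assumed alternative, not necessarily what was recommended):
   ```
   // put() returns the previous value as an Option, so foreach runs only
   // when an entry was replaced and no catch-all case is needed.
   functionBuilders.put(normalizedName, newFunction).foreach { previousFunction =>
     if (previousFunction != newFunction) {
       logWarning(s"The function $normalizedName replaced a previously registered function.")
     }
   }
   ```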





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r246381002
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -78,6 +86,14 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
     }
   }
 
+  test("inject multiple rules") {
+    withSession(Seq(_.injectOptimizerRule(MyRule),
+      _.injectPlannerStrategy(MySparkStrategy))) { session =>
 
 Review comment:
   Done - I indented it.





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-05 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r245484843
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -99,9 +99,15 @@ object StaticSQLConf {
     .createWithDefault(false)
 
   val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions")
-    .doc("Name of the class used to configure Spark Session extensions. The class should " +
-      "implement Function1[SparkSessionExtension, Unit], and must have a no-args constructor.")
+    .doc("A comma-separated list of classes that implement " +
+      "Function1[SparkSessionExtension, Unit] used to configure Spark Session extensions. The " +
+      "classes must have a no-args constructor. If multiple extensions are specified, they are " +
+      "applied in the specified order. For the case of rules and planner strategies, they are " +
+      "applied in the specified order. For the case of parsers, the last parser is used and each " +
+      "parser can delegate to its predecessor. For the case of function name conflicts, the last " +
 
 Review comment:
   I added an example to the documentation of `SparkSessionExtensions`.
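   For illustration, a minimal sketch of the ordering semantics the doc text above spells out (the extension class names are placeholders):
   ```
   val spark = SparkSession.builder()
     .master("local[1]")
     .config("spark.sql.extensions",
       "com.example.FirstExtensions,com.example.SecondExtensions")
     .getOrCreate()
   // Rules and planner strategies from FirstExtensions are applied before
   // SecondExtensions'. If both inject a parser, SecondExtensions' parser is
   // used and may delegate to its predecessor; on function name conflicts,
   // the function registered last wins.
   ```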





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-02 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r244759859
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ##
 @@ -103,7 +104,10 @@ class SimpleFunctionRegistry extends FunctionRegistry {
       name: FunctionIdentifier,
       info: ExpressionInfo,
       builder: FunctionBuilder): Unit = synchronized {
-    functionBuilders.put(normalizeFuncName(name), (info, builder))
+    val normalizedName = normalizeFuncName(name)
+    if (functionBuilders.put(normalizedName, (info, builder)).isDefined) {
 
 Review comment:
   I added a check that only logs if different function objects are registered. The "allow an extension to be duplicated" unit test that I previously added registers the same object twice; it no longer prints the warning. The "use the last registered function name when there are duplicates" unit test that I previously added registers different functions with the same name; it prints the warning.
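   For illustration, a hedged sketch of the two scenarios described above, driving the patched `SimpleFunctionRegistry` directly (the names mirror the test suite; the actual tests may differ):
   ```
   import org.apache.spark.sql.catalyst.FunctionIdentifier
   import org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry
   import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Literal}
   import org.apache.spark.sql.types.IntegerType

   val registry = new SimpleFunctionRegistry
   val name = FunctionIdentifier("myFunction")
   val info = new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "extended usage")
   val builder = (_: Seq[Expression]) => Literal(5, IntegerType)

   registry.registerFunction(name, info, builder)  // first registration: silent
   registry.registerFunction(name, info, builder)  // same (info, builder) pair: still silent
   val other = new ExpressionInfo("noClass", "myDb", "myFunction", "usage", "last wins")
   registry.registerFunction(name, other, builder) // different pair: warning is logged
   ```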





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2018-12-29 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r244523719
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
 ##
 @@ -99,9 +99,10 @@ object StaticSQLConf {
     .createWithDefault(false)
 
   val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions")
-    .doc("Name of the class used to configure Spark Session extensions. The class should " +
+    .doc("List of the class names used to configure Spark Session extensions. The classes should " +
 
 Review comment:
   I updated the documentation as suggested with respect to the ordering. I think the comment about "listeners" refers to code that is near the code in this pull request but belongs to a different configuration item, so I didn't update the `spark.sql.queryExecutionListeners` documentation. If you think that documentation should be updated, let me know whether I should include it in this pull request or in a separate one.





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2018-12-29 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r244523670
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
       stop(session)
     }
   }
+
+  test("use multiple custom class for extensions") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions].getCanonicalName,
+        classOf[MyExtensions2].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
+      assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions2.myFunction._1).isDefined)
 
 Review comment:
   Can we have a test case for duplicated extension names? Done.
   Can we have a negative test case for function name conflicts (MyExtension2.myFunction and MyExtension3.myFunction)? Done.

   I added documentation for the behavior.
   I added a warning message if a registered function is replaced.
   I added a test case for the ordering.
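   For illustration, a hedged sketch of what the conflict test can look like (the PR's actual test may differ; `MyExtensions2Duplicate` registers a different function under the same name as `MyExtensions2`):
   ```
   test("use the last registered function name when there are duplicates") {
     val session = SparkSession.builder()
       .master("local[1]")
       .config("spark.sql.extensions", Seq(
         classOf[MyExtensions2].getCanonicalName,
         classOf[MyExtensions2Duplicate].getCanonicalName).mkString(","))
       .getOrCreate()
     try {
       // The extension listed last wins the function-name conflict.
       assert(session.sessionState.functionRegistry
         .lookupFunction(MyExtensions2.myFunction._1)
         .contains(MyExtensions2Duplicate.myFunction._2))
     } finally {
       stop(session)
     }
   }
   ```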





[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2018-12-28 Thread GitBox
URL: https://github.com/apache/spark/pull/23398#discussion_r244414275
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
 ##
 @@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite {
       stop(session)
     }
   }
+
+  test("use multiple custom class for extensions") {
+    val session = SparkSession.builder()
+      .master("local[1]")
+      .config("spark.sql.extensions", Seq(
+        classOf[MyExtensions].getCanonicalName,
+        classOf[MyExtensions2].getCanonicalName).mkString(","))
+      .getOrCreate()
+    try {
+      assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session)))
+      assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session)))
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions.myFunction._1).isDefined)
+      assert(session.sessionState.functionRegistry
+        .lookupFunction(MyExtensions2.myFunction._1).isDefined)
 
 Review comment:
   Prior to this change, it was possible to programmatically register multiple extensions, but it was not possible to do so through the spark.sql.extensions configuration (and the behavior wasn't documented or tested until this pull request). For example, the following works without this pull request:
   ```
   SparkSession.builder()
     .master("..")
     .withExtensions(sparkSessionExtensions1)
     .withExtensions(sparkSessionExtensions2)
     .getOrCreate()
   ```
   
   So I think conflicting function names are already possible (but not documented) in the following cases:
   1. Conflicting function names are registered by calling `.withExtensions()` multiple times
   2. An extension accidentally registers a function that was already registered with the builtin functions
   3. An extension accidentally registers a function multiple times by calling `injectFunction(myFunction)`
   
   As for the order, it looks to me like the last function stored under a conflicting name is the one that is retrieved:
   ```
   class SimpleFunctionRegistry extends FunctionRegistry {
   
     @GuardedBy("this")
     private val functionBuilders =
       new mutable.HashMap[FunctionIdentifier, (ExpressionInfo, FunctionBuilder)]
   
     override def registerFunction(
         name: FunctionIdentifier,
         info: ExpressionInfo,
         builder: FunctionBuilder): Unit = synchronized {
       functionBuilders.put(normalizeFuncName(name), (info, builder))
     }
   ```
   
   I will update this PR to document the order of operations and the conflict behavior. If we need to explicitly block duplicate functions from being registered, I can temporarily drop this PR and see about making those changes first.
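   A tiny self-contained demonstration of the "last put wins" HashMap behavior that the retrieval order relies on:
   ```
   import scala.collection.mutable

   val builders = mutable.HashMap[String, Int]()
   builders.put("myFunction", 1)                // returns None
   val previous = builders.put("myFunction", 2) // returns Some(1), entry overwritten
   assert(previous.contains(1))
   assert(builders("myFunction") == 2)          // the last registration is retrieved
   ```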

