Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1950287853


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)

Review Comment:
   Synced with @cloud-fan offline: this is not constant-folded after this line when analyzing to create EXISTS_DEFAULT. So in the input of analyzeExistsDefault(), EXISTS_DEFAULT sometimes has a top-level CAST.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,23 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {
+      case u: UnresolvedFunction =>
+        assert(u.nameParts.length == 1)
+        assert(!u.isDistinct)
+        assert(u.filter.isEmpty)
+        assert(!u.ignoreNulls)
+        assert(u.orderingWithinGroup.isEmpty)
+        assert(!u.isInternal)
+        FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
+    } match {
+      case c: Cast if c.needsTimeZone =>

Review Comment:
   synced offline, see the other comment.



Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


szehon-ho commented on PR #49840:
URL: https://github.com/apache/spark/pull/49840#issuecomment-2649906152

   Synced with @cloud-fan offline, the current code should work for this case. Made follow-up https://github.com/apache/spark/pull/49881 to do some cleanup and put the logic in the right place.





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1950180575


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)

Review Comment:
   I think the CAST is added here, but it should be constant-folded before we 
generate the existing default string. We need to debug it.
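   
   For illustration, a minimal sketch (not code from the PR; assumes a REPL with spark-catalyst on the classpath) of the folding being discussed: a CAST over a literal is foldable, so ConstantFolding should collapse it to a plain literal before the EXISTS_DEFAULT string is generated:
   ```
   import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
   import org.apache.spark.sql.types.StringType

   val cast = Cast(Literal(1), StringType)
   assert(cast.foldable)
   // ConstantFolding rewrites any foldable expression to a literal of its value:
   val folded = Literal.create(cast.eval(), cast.dataType)
   // folded.sql == "'1'" -- no top-level CAST remains in the generated SQL
   ```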






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1950179313


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,23 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {
+      case u: UnresolvedFunction =>
+        assert(u.nameParts.length == 1)
+        assert(!u.isDistinct)
+        assert(u.filter.isEmpty)
+        assert(!u.ignoreNulls)
+        assert(u.orderingWithinGroup.isEmpty)
+        assert(!u.isInternal)
+        FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
+    } match {
+      case c: Cast if c.needsTimeZone =>

Review Comment:
   I'm looking at the previous test failure:
   ```
   Cause: org.apache.spark.sql.AnalysisException: [INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute  command because the destination column or variable `c` has a DEFAULT value CAST(TIMESTAMP '2018-11-17 00:00:00' AS STRING), which fails to resolve as a valid expression. SQLSTATE: 42623
   ```
   
   `CAST(TIMESTAMP '2018-11-17 00:00:00' AS STRING)` can't be generated by `Literal#sql`. It seems we have some misunderstanding about how this existing default string is generated. @szehon-ho can you take a closer look?
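   
   Illustrative of the mismatch (a sketch, not from the thread; assumes a catalyst REPL): `Literal#sql` on a plain string value emits only a quoted literal, never a CAST:
   ```
   import org.apache.spark.sql.catalyst.expressions.Literal

   Literal("2018-11-17 00:00:00").sql
   // '2018-11-17 00:00:00'
   ```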






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1950141582


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,23 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {
+      case u: UnresolvedFunction =>
+        assert(u.nameParts.length == 1)
+        assert(!u.isDistinct)
+        assert(u.filter.isEmpty)
+        assert(!u.ignoreNulls)
+        assert(u.orderingWithinGroup.isEmpty)
+        assert(!u.isInternal)
+        FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
+    } match {
+      case c: Cast if c.needsTimeZone =>

Review Comment:
   @cloud-fan sure, let me do that.
   
   BTW, I looked a little and couldn't reproduce a failure with the current implementation using the following unit test with a nested cast:
   ```
  test("SPARK-51119: array of timestamp should have timezone if default values castable") {
    withTable("t") {
      sql(s"CREATE TABLE t(key int, c ARRAY<STRING> DEFAULT " +
        s"ARRAY(CAST(timestamp '2018-11-17' AS STRING))) " +
        s"USING parquet")
      sql("INSERT INTO t (key) VALUES(1)")
      checkAnswer(sql("select * from t"), Row(1, Array("2018-11-17 00:00:00")))
    }
  }
   ```
   
   Unlike the failing case of a top-level cast:
   ```
  test("SPARK-46958: timestamp should have timezone for resolvable if default values castable") {
    val defaults = Seq("timestamp '2018-11-17'", "CAST(timestamp '2018-11-17' AS STRING)")
    defaults.foreach { default =>
      withTable("t") {
        sql(s"CREATE TABLE t(key int, c STRING DEFAULT $default) " +
          s"USING parquet")
        sql("INSERT INTO t (key) VALUES(1)")
        checkAnswer(sql("select * from t"), Row(1, "2018-11-17 00:00:00"))
      }
    }
  }
   ```
   
   EXISTS_DEFAULT is saved without a cast in the first case: `ARRAY('2018-11-17 00:00:00')` (looks like it got evaluated), and with a cast in the second case: `CAST(TIMESTAMP '2018-11-17 00:00:00' AS STRING)`.
   
   So I think in this particular scenario it doesn't matter. But I agree it is better to have it, as we are making a generic method.
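   
   A small illustration of the difference (a sketch, not from the thread; assumes a catalyst REPL): calling `.sql` on a top-level cast expression reproduces exactly the stored form seen in the second case:
   ```
   import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
   import org.apache.spark.sql.types.StringType

   val ts = Literal(java.sql.Timestamp.valueOf("2018-11-17 00:00:00"))
   Cast(ts, StringType).sql
   // CAST(TIMESTAMP '2018-11-17 00:00:00' AS STRING)
   ```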





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


szehon-ho commented on PR #49840:
URL: https://github.com/apache/spark/pull/49840#issuecomment-2649577662

   Thanks @cloud-fan @dtenedor @dongjoon-hyun !





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1950090297


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,23 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {
+      case u: UnresolvedFunction =>
+        assert(u.nameParts.length == 1)
+        assert(!u.isDistinct)
+        assert(u.filter.isEmpty)
+        assert(!u.ignoreNulls)
+        assert(u.orderingWithinGroup.isEmpty)
+        assert(!u.isInternal)
+        FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
+    } match {
+      case c: Cast if c.needsTimeZone =>

Review Comment:
   The CAST can be nested inside array/map/struct, so we should put this case match inside the `transformUp`, together with `case u: UnresolvedFunction`.
   
   @szehon-ho can you make a follow-up PR for it?
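   
   A sketch of the suggested shape (illustrative; the asserts from the current version are elided, and this is not the merged follow-up code):
   ```
   import org.apache.spark.sql.catalyst.FunctionIdentifier
   import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, UnresolvedFunction}
   import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
   import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
   import org.apache.spark.sql.internal.SQLConf

   def fromSQL(sql: String): Expression = {
     CatalystSqlParser.parseExpression(sql).transformUp {
       // resolve built-in functions such as array() directly, without catalog access
       case u: UnresolvedFunction =>
         FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
       // attach the session timezone to any CAST, even one nested in array/map/struct
       case c: Cast if c.needsTimeZone =>
         c.withTimeZone(SQLConf.get.sessionLocalTimeZone)
     }
   }
   ```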






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


dongjoon-hyun commented on PR #49840:
URL: https://github.com/apache/spark/pull/49840#issuecomment-2649563613

   According to the affected version, I merged this to master/4.0.





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


dongjoon-hyun closed pull request #49840: [SPARK-51119][SQL] Readers on 
executors resolving EXISTS_DEFAULT should not call catalogs
URL: https://github.com/apache/spark/pull/49840





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1949755752


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most of the
+   * analysis has been done before.
+   */
+  private def analyzeExistingDefault(field: StructField): Expression = {

Review Comment:
   Done.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,19 @@ object Literal {
   s"Literal must have a corresponding value to ${dataType.catalogString}, 
" +
   s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+CatalystSqlParser.parseExpression(sql).transformUp {

Review Comment:
   Thanks for the pointer, saved me a lot of time!






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-10 Thread via GitHub


dtenedor commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1949570677


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +319,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most 
of the
+   * analysis has been done before.
+   */
+  private def analyzeExistingDefault(field: StructField): Expression = {

Review Comment:
   nit: we might name this `analyzeExistenceDefaultValue`, as we use the term "existence default value" to differentiate it from "current default value". These are two separate metadata entries in the `StructField`: the former gets filled in at scan time, for each row where the corresponding column value is not present in storage (e.g. the Parquet vector is absent), and the latter gets filled in at DML time only.
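   
   For illustration, a sketch of the two metadata entries being described (the key names "CURRENT_DEFAULT" and "EXISTS_DEFAULT" match the constants in ResolveDefaultColumns; the field itself is hypothetical):
   ```
   import org.apache.spark.sql.types.{MetadataBuilder, StringType, StructField}

   val field = StructField("c", StringType, nullable = true,
     new MetadataBuilder()
       // current default: applied at DML time when an INSERT omits the column
       .putString("CURRENT_DEFAULT", "CAST(timestamp '2018-11-17' AS STRING)")
       // existence default: applied at scan time for rows written before the column existed
       .putString("EXISTS_DEFAULT", "'2018-11-17 00:00:00'")
       .build())
   ```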






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-09 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1948322440


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,19 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {

Review Comment:
   Looking at the test failure, the missing piece is that `Literal#sql` can generate a CAST, which needs to be resolved to attach a timezone. So we should add one more case match:
   ```
   case c: Cast if c.needsTimeZone =>
     c.withTimeZone(SQLConf.get.sessionLocalTimeZone)
   ```





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-08 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947544742


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,15 @@ object Literal {
       s"Literal must have a corresponding value to ${dataType.catalogString}, " +
         s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+    CatalystSqlParser.parseExpression(sql).transformUp {
+      case u: UnresolvedFunction =>
+      assert(u.nameParts.length == 1)
+      assert(!u.isDistinct)

Review Comment:
   Done, I added a bunch of asserts. I'm not too familiar with these flags, please double-check.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +330,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most of the
+   * analysis has been done before.
+   */
+  def analyzeExistingDefault(field: StructField, defaultSQL: String): Expression = {
+    val defaultSQL = field.metadata.getString(EXISTS_DEFAULT_COLUMN_METADATA_KEY)
+
+    // Parse the expression.
+    val expr = Literal.fromSQL(defaultSQL)
+
+    // Check invariants
+    if (expr.containsPattern(PLAN_EXPRESSION)) {
+      throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions(
+        "", field.name, defaultSQL)
+    }
+    if (!expr.resolved) {
+      throw QueryCompilationErrors.defaultValuesUnresolvedExprError(
+        "", field.name, defaultSQL, null)
+    }
+
+    expr

Review Comment:
   Added






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-08 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947541477


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -407,10 +429,7 @@ object ResolveDefaultColumns extends QueryErrorsBase
   val defaultValue: Option[String] = field.getExistenceDefaultValue()
   defaultValue.map { text: String =>
     val expr = try {
-      val expr = analyze(field, "", EXISTS_DEFAULT_COLUMN_METADATA_KEY)
-      expr match {
-        case _: ExprLiteral | _: Cast => expr
-      }
+      analyzeExistingDefault(field, text)

Review Comment:
   @dtarima yeah, I was back and forth on this. I had this test originally but removed it in https://github.com/apache/spark/pull/49840/commits/0491833ce1f43d3555f7e7969b0dceb1fe3718fc (hope the link works). It seemed a little intrusive on production code and of debatable value, but I'm open.






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


dtenedor commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947462204


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -407,10 +429,7 @@ object ResolveDefaultColumns extends QueryErrorsBase
   val defaultValue: Option[String] = field.getExistenceDefaultValue()
   defaultValue.map { text: String =>
     val expr = try {
-      val expr = analyze(field, "", EXISTS_DEFAULT_COLUMN_METADATA_KEY)
-      expr match {
-        case _: ExprLiteral | _: Cast => expr
-      }
+      analyzeExistingDefault(field, text)

Review Comment:
   unit testing idea: we could try to stub out the catalog manager with 
something that always throws errors, if we wanted to be extra careful. Our 
`DefaultColumnAnalyzer` currently looks like this:
   
   ```
 /**
  * This is an Analyzer for processing default column values using built-in functions only.
  */
 object DefaultColumnAnalyzer extends Analyzer(
   new CatalogManager(BuiltInFunctionCatalog, BuiltInFunctionCatalog.v1Catalog)) {
 }
   ```
   
   We could use a different catalog manager that always throws errors there 
specifically when we exercise this `getExistenceDefaultValues` method. This 
would make sure we're not inadvertently doing catalog manager lookups there 
anymore.
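   
   A rough sketch of that stub (hypothetical test helper, not from the PR; assumes `CatalogManager.catalog` can be overridden, which should be verified):
   ```
   object ThrowingCatalogManager
     extends CatalogManager(BuiltInFunctionCatalog, BuiltInFunctionCatalog.v1Catalog) {
     // fail loudly on any catalog lookup during EXISTS_DEFAULT resolution
     override def catalog(name: String): CatalogPlugin =
       throw new IllegalStateException(s"unexpected catalog lookup: $name")
   }

   object NoCatalogDefaultColumnAnalyzer extends Analyzer(ThrowingCatalogManager)
   ```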



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +330,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most of the
+   * analysis has been done before.
+   */
+  def analyzeExistingDefault(field: StructField, defaultSQL: String): Expression = {
+    val defaultSQL = field.metadata.getString(EXISTS_DEFAULT_COLUMN_METADATA_KEY)
+
+    // Parse the expression.
+    val expr = Literal.fromSQL(defaultSQL)
+
+    // Check invariants
+    if (expr.containsPattern(PLAN_EXPRESSION)) {
+      throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions(
+        "", field.name, defaultSQL)
+    }
+    if (!expr.resolved) {
+      throw QueryCompilationErrors.defaultValuesUnresolvedExprError(
+        "", field.name, defaultSQL, null)
+    }
+
+    expr

Review Comment:
   It shouldn't be strictly necessary, but it would be a no-op on executors and we'll be safer having it, so we might as well keep it to double-check that the existence default value is the type we expect.






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947410556


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,15 @@ object Literal {
   s"Literal must have a corresponding value to ${dataType.catalogString}, 
" +
   s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+CatalystSqlParser.parseExpression(sql).transformUp {
+  case u: UnresolvedFunction =>
+  assert(u.nameParts.length == 1)
+  assert(!u.isDistinct)

Review Comment:
   Can we check the other fields in `UnresolvedFunction` as well? The SQL string produced by `Literal#sql` should not specify any extra fields.






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947410776


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +330,29 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most of the
+   * analysis has been done before.
+   */
+  def analyzeExistingDefault(field: StructField, defaultSQL: String): Expression = {
+    val defaultSQL = field.metadata.getString(EXISTS_DEFAULT_COLUMN_METADATA_KEY)
+
+    // Parse the expression.
+    val expr = Literal.fromSQL(defaultSQL)
+
+    // Check invariants
+    if (expr.containsPattern(PLAN_EXPRESSION)) {
+      throw QueryCompilationErrors.defaultValuesMayNotContainSubQueryExpressions(
+        "", field.name, defaultSQL)
+    }
+    if (!expr.resolved) {
+      throw QueryCompilationErrors.defaultValuesUnresolvedExprError(
+        "", field.name, defaultSQL, null)
+    }
+
+    expr

Review Comment:
   shall we still call `coerceDefaultValue` at the end?






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


cloud-fan commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1947410447


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala:
##
@@ -265,6 +267,15 @@ object Literal {
   s"Literal must have a corresponding value to ${dataType.catalogString}, 
" +
   s"but class ${Utils.getSimpleName(value.getClass)} found.")
   }
+
+  def fromSQL(sql: String): Expression = {
+CatalystSqlParser.parseExpression(sql).transformUp {
+  case u: UnresolvedFunction =>
+  assert(u.nameParts.length == 1)

Review Comment:
   nit: 2 spaces indentation






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1945608585


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -52,6 +52,17 @@ object ResolveDefaultColumns extends QueryErrorsBase
   // CURRENT_DEFAULT_COLUMN_METADATA.
   val CURRENT_DEFAULT_COLUMN_NAME = "DEFAULT"
 
+  var defaultColumnAnalyzer: Analyzer = DefaultColumnAnalyzer
+  var defaultColumnOptimizer: Optimizer = DefaultColumnOptimizer
+
+  /**
+   * Visible for testing
+   */
+  def setAnalyzerAndOptimizer(analyzer: Analyzer, optimizer: Optimizer): Unit = {

Review Comment:
   It's hard to reproduce the issue in a unit test, so I ended up mocking these members to verify that the catalogs are not called.
   
   Let me know, I'm totally fine with removing this if we think the existing test coverage is OK.






Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-07 Thread via GitHub


szehon-ho commented on PR #49840:
URL: https://github.com/apache/spark/pull/49840#issuecomment-2644417195

   Had a chat offline with @cloud-fan, who suggested simplifying the analyzeExistence logic to just the following bare-bones code to resolve functions.
   
   ```
   def fromSQL(sql: String): Expression = {
     CatalystSqlParser.parseExpression(sql).transformUp {
       case u: UnresolvedFunction =>
         assert(u.nameParts.length == 1)
         assert(!u.isDistinct)
         FunctionRegistry.builtin.lookupFunction(FunctionIdentifier(u.nameParts.head), u.arguments)
     }
   }
   ```
 
 Thanks for the suggestion, this simplifies it a lot.
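   
   For example (hypothetical usage, not from the PR discussion), an executor can now resolve a stored EXISTS_DEFAULT string without touching any catalog, since ARRAY(...) parses as an UnresolvedFunction and is looked up directly in the built-in registry:
   ```
   val resolved = Literal.fromSQL("ARRAY('2018-11-17 00:00:00')")
   assert(resolved.resolved) // no analyzer, no catalog involved
   ```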





Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]

2025-02-06 Thread via GitHub


szehon-ho commented on code in PR #49840:
URL: https://github.com/apache/spark/pull/49840#discussion_r1945607596


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala:
##
@@ -320,6 +331,67 @@ object ResolveDefaultColumns extends QueryErrorsBase
 coerceDefaultValue(analyzed, dataType, statementType, colName, defaultSQL)
   }
 
+  /**
+   * Analyze EXISTS_DEFAULT value.  This skips some steps of analyze as most of the
+   * analysis has been done before.
+   *
+   * VisibleForTesting
+   */
+  def analyzeExistingDefault(field: StructField,

Review Comment:
   This is a simpler version of analyze, used for existsDefaultValues (which is called from executors). I made it a separate method since we may have an opportunity to simplify it further, but for now it seems some part of analysis is still needed to resolve functions like array(). The problematic FinishAnalysis code was removed, though.


