[jira] [Resolved] (SPARK-39295) Improve documentation of pandas API support list.

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39295.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36729
[https://github.com/apache/spark/pull/36729]

> Improve documentation of pandas API support list.
> -
>
> Key: SPARK-39295
> URL: https://issues.apache.org/jira/browse/SPARK-39295
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Assignee: Hyunwoo Park
>Priority: Major
> Fix For: 3.4.0
>
>
> The descriptions provided in the supported pandas API list document and the 
> corresponding code comments need improvement. In addition, some of the links 
> to function properties in the document are broken and need to be fixed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39295) Improve documentation of pandas API support list.

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39295:


Assignee: Hyunwoo Park

> Improve documentation of pandas API support list.
> -
>
> Key: SPARK-39295
> URL: https://issues.apache.org/jira/browse/SPARK-39295
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Hyunwoo Park
>Assignee: Hyunwoo Park
>Priority: Major
>
> The descriptions provided in the supported pandas API list document and the 
> corresponding code comments need improvement. In addition, some of the links 
> to function properties in the document are broken and need to be fixed.






[jira] [Updated] (SPARK-39355) Avoid UnresolvedAttribute.apply throwing ParseException

2022-06-01 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-39355:
---
Summary: Avoid UnresolvedAttribute.apply throwing ParseException  (was: 
UnresolvedAttribute should only use CatalystSqlParser if name contains dot)

> Avoid UnresolvedAttribute.apply throwing ParseException
> ---
>
> Key: SPARK-39355
> URL: https://issues.apache.org/jira/browse/SPARK-39355
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
>  
> {code:java}
> select * from (select '2022-06-01' as c1 ) a where c1 in (select 
> date_add('2022-06-01',0)); {code}
> {code:java}
> Error in query:
> mismatched input '(' expecting {<EOF>, '.', '-'} (line 1, pos 8)
> == SQL ==
> date_add(2022-06-01, 0)
> ^^^ {code}
>  






[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545268#comment-17545268
 ] 

Chao Sun commented on SPARK-29260:
--

Thanks [~yumwang]. Spark currently throws an exception when the Hive client version is 
not 3.0/3.1 (see 
[here|https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L360]),
 which I think is unnecessary, since what matters is the Hive version used by the 
HMS. For instance, Spark with built-in Hive 2.x talking to an HMS running Hive 
3.x should still be able to change the database location. Conversely, the 
command won't be effective if Spark with Hive 3.x is talking to an HMS running 
Hive 2.x.

> Enable supported Hive metastore versions once it support altering database 
> location
> ---
>
> Key: SPARK-29260
> URL: https://issues.apache.org/jira/browse/SPARK-29260
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Hive 3.x is supported currently. Hive 2.2.1 and Hive 2.4.0 have not been released yet.






[jira] [Commented] (SPARK-39361) Stop using Log4J2's extended throwable logging pattern in default logging configurations

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545266#comment-17545266
 ] 

Apache Spark commented on SPARK-39361:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/36747

> Stop using Log4J2's extended throwable logging pattern in default logging 
> configurations
> 
>
> Key: SPARK-39361
> URL: https://issues.apache.org/jira/browse/SPARK-39361
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>
> This PR addresses a performance problem in Log4J 2 related to exception 
> logging: in certain scenarios I observed that Log4J2's exception stacktrace 
> logging can be ~10x slower than Log4J 1.
> The problem stems from a new log pattern format in Log4J2 called ["extended 
> exception"|https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternExtendedException],
>  which enriches the regular stacktrace string with information on the name of 
> the JAR files that contained the classes in each stack frame.
> Log4J queries the classloader to determine the source JAR for each class. 
> This isn't cheap, but this information is cached and reused in future 
> exception logging calls. In certain scenarios involving runtime-generated 
> classes, this lookup will fail and the failed lookup result will _not_ be 
> cached. As a result, expensive classloading operations will be performed 
> every time such an exception is logged. In addition to being very slow, these 
> operations take out a lock on the classloader and thus can cause severe lock 
> contention if multiple threads are logging errors. This issue is described in 
> more detail in a comment on a Log4J2 JIRA and in a linked blogpost: 
> https://issues.apache.org/jira/browse/LOG4J2-2391?focusedCommentId=16667140&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16667140
>  . Spark frequently uses generated classes and lambdas and thus Spark 
> executor logs will almost always trigger this edge-case and suffer from poor 
> performance.
> By default, if you do not specify an explicit exception format in your 
> logging pattern then Log4J2 will add this "extended exception" pattern (see 
> PatternLayout's {{alwaysWriteExceptions}} flag in Log4J's documentation, plus 
> [the code implementing that 
> flag|https://github.com/apache/logging-log4j2/blob/d6c8ab0863c551cdf0f8a5b1966ab45e3cddf572/log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/PatternParser.java#L206-L209]
>  in Log4J2).
> In this PR, I have updated Spark's default Log4J2 configurations so that each 
> pattern layout includes an explicit {{%ex}} so that it uses the normal 
> (non-extended) exception logging format.
> Although it's true that any program logging exceptions at a high rate should 
> probably just fix the source of the exceptions, I think it's still a good 
> idea for us to try to fix this out-of-the-box performance difference so that 
> users' existing workloads do not regress when upgrading to 3.3.0.
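For reference, the fix amounts to making the throwable converter explicit in each pattern layout. A minimal illustrative log4j2.properties sketch is below; the appender names and pattern are assumptions for illustration, not Spark's exact shipped configuration:

```properties
# Illustrative log4j2.properties appender (names are hypothetical).
# The trailing %ex makes the plain (non-extended) throwable converter
# explicit, so Log4J2 does not append its default "extended exception"
# (%xEx) pattern that triggers the slow per-frame JAR lookups.
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console
```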






[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545264#comment-17545264
 ] 

Yuming Wang commented on SPARK-29260:
-

The syntax is supported, but it doesn't change the location:
{code:scala}
sql("CREATE DATABASE db1")
sql("ALTER DATABASE db1 SET LOCATION 'file://tmp/spark/db1'")
sql("DESC DATABASE EXTENDED db1").show(false)
{code}
{noformat}
+--------------+-------------------------------------------------------------------------------------------------------------------+
|info_name     |info_value                                                                                                         |
+--------------+-------------------------------------------------------------------------------------------------------------------+
|Namespace Name|db1                                                                                                                |
|Comment       |                                                                                                                   |
|Location      |file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehouse-cbe89b9b-528d-420f-bea5-5f7eb714dc07/db1.db|
|Owner         |yumwang                                                                                                            |
|Properties    |                                                                                                                   |
+--------------+-------------------------------------------------------------------------------------------------------------------+
{noformat}

Hive 3.1 supports it:
{noformat}
bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.0 --conf 
spark.sql.hive.metastore.jars=maven

spark-sql> CREATE DATABASE db1;
Time taken: 2.628 seconds
spark-sql> ALTER DATABASE db1 SET LOCATION 'file://tmp/spark/db1';
Time taken: 0.142 seconds
spark-sql> DESC DATABASE EXTENDED db1;
Namespace Name  db1
Comment
Locationfile:/spark/db1
Owner   yumwang
Properties
Time taken: 0.427 seconds, Fetched 5 row(s)
{noformat}




> Enable supported Hive metastore versions once it support altering database 
> location
> ---
>
> Key: SPARK-29260
> URL: https://issues.apache.org/jira/browse/SPARK-29260
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Hive 3.x is supported currently. Hive 2.2.1 and Hive 2.4.0 have not been released yet.






[jira] [Assigned] (SPARK-39314) Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39314:


Assignee: Yikun Jiang

> Respect ps.concat sort parameter to follow pandas behavior
> --
>
> Key: SPARK-39314
> URL: https://issues.apache.org/jira/browse/SPARK-39314
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>
> [https://github.com/Yikun/spark/pull/101/checks?check_run_id=6621103945]
>  
> [https://github.com/pandas-dev/pandas/issues/47127]
>  
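For context, the pandas behavior that ps.concat is expected to follow can be sketched with plain pandas (not pyspark.pandas); the `sort` parameter controls whether the non-concatenation axis is sorted when the inputs' columns differ:

```python
# Illustrative sketch using plain pandas: `sort` decides the column order
# of the outer-joined result when the inputs' columns do not match.
import pandas as pd

df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4]})

# sort=False keeps columns in order of appearance; sort=True sorts them.
unsorted_cols = list(pd.concat([df1, df2], sort=False).columns)
sorted_cols = list(pd.concat([df1, df2], sort=True).columns)
print(unsorted_cols, sorted_cols)
```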






[jira] [Resolved] (SPARK-39314) Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39314.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36711
[https://github.com/apache/spark/pull/36711]

> Respect ps.concat sort parameter to follow pandas behavior
> --
>
> Key: SPARK-39314
> URL: https://issues.apache.org/jira/browse/SPARK-39314
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
> [https://github.com/Yikun/spark/pull/101/checks?check_run_id=6621103945]
>  
> [https://github.com/pandas-dev/pandas/issues/47127]
>  






[jira] [Resolved] (SPARK-39326) replace "NaN" with real "None" value in indexes in doctest

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-39326.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36712
[https://github.com/apache/spark/pull/36712]

> replace "NaN" with real "None" value in indexes in doctest
> --
>
> Key: SPARK-39326
> URL: https://issues.apache.org/jira/browse/SPARK-39326
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>
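The distinction the summary refers to can be illustrated with plain pandas: whether a missing index entry shows up as None or NaN depends on the index dtype, which is why doctests should use the value the dtype actually produces (this sketch is an illustration, not the doctest in question):

```python
# Illustrative sketch: object-dtype indexes keep a real None, while
# float-dtype indexes coerce None to NaN.
import pandas as pd

obj_idx = pd.Index(["a", None])   # object dtype keeps None as-is
num_idx = pd.Index([1.0, None])   # float dtype coerces None to NaN
print(obj_idx[1], num_idx[1])
```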







[jira] [Assigned] (SPARK-39326) replace "NaN" with real "None" value in indexes in doctest

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-39326:


Assignee: Yikun Jiang

> replace "NaN" with real "None" value in indexes in doctest
> --
>
> Key: SPARK-39326
> URL: https://issues.apache.org/jira/browse/SPARK-39326
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
>







[jira] [Commented] (SPARK-39353) Cannot fetch hdfs data node local

2022-06-01 Thread Jinpeng Chi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545254#comment-17545254
 ] 

Jinpeng Chi commented on SPARK-39353:
-

I enabled short-circuit reads for HDFS deployed on Kubernetes and mounted the 
domain socket on the physical machine. The Spark pods (driver and executors) I 
submit to Kubernetes mount this socket, but every task that loads data from 
HDFS is very slow, and the UI shows the locality level as ANY. How can I get 
NODE_LOCAL reads that go directly to the disk instead of over the network?
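For reference, short-circuit local reads are typically enabled with an hdfs-site.xml fragment like the one below on both DataNodes and clients; the socket path shown is a conventional choice, not taken from the reporter's setup, and it must be mounted into every Spark executor pod for local reads to be possible:

```xml
<!-- Illustrative hdfs-site.xml fragment for HDFS short-circuit reads. -->
<!-- The domain socket path is an assumption; it must exist on the host -->
<!-- and be mounted into every Spark executor pod. -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```

Note that short-circuit reads only help once tasks are actually scheduled NODE_LOCAL; if locality is ANY, the scheduler does not consider the executor co-located with the DataNode in the first place.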

> Cannot fetch hdfs data node local
> -
>
> Key: SPARK-39353
> URL: https://issues.apache.org/jira/browse/SPARK-39353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
> Environment: HDFS on Kubernetes 3.3.1
> Spark on Kubernetes 3.2.1
>Reporter: Jinpeng Chi
>Priority: Major
>
> When I use HDFS short-circuit read, the locality level is always ANY






[jira] [Commented] (SPARK-39353) Cannot fetch hdfs data node local

2022-06-01 Thread Jinpeng Chi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545255#comment-17545255
 ] 

Jinpeng Chi commented on SPARK-39353:
-

[~hyukjin.kwon] 

> Cannot fetch hdfs data node local
> -
>
> Key: SPARK-39353
> URL: https://issues.apache.org/jira/browse/SPARK-39353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
> Environment: HDFS on Kubernetes 3.3.1
> Spark on Kubernetes 3.2.1
>Reporter: Jinpeng Chi
>Priority: Major
>
> When I use HDFS short-circuit read, the locality level is always ANY






[jira] [Assigned] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39360:
-

Assignee: Dongjoon Hyun

> Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation
> 
>
> Key: SPARK-39360
> URL: https://issues.apache.org/jira/browse/SPARK-39360
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Resolved] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39360.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36744
[https://github.com/apache/spark/pull/36744]

> Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation
> 
>
> Key: SPARK-39360
> URL: https://issues.apache.org/jira/browse/SPARK-39360
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.3.0
>
>







[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545249#comment-17545249
 ] 

Hyukjin Kwon commented on SPARK-39354:
--

BTW, I don't think this is a release blocker because it only changes the error 
message, right?

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Updated] (SPARK-39361) Stop using Log4J2's extended throwable logging pattern in default logging configurations

2022-06-01 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-39361:
---
Description: 
This PR addresses a performance problem in Log4J 2 related to exception 
logging: in certain scenarios I observed that Log4J2's exception stacktrace 
logging can be ~10x slower than Log4J 1.

The problem stems from a new log pattern format in Log4J2 called ["extended 
exception"|https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternExtendedException],
 which enriches the regular stacktrace string with information on the name of 
the JAR files that contained the classes in each stack frame.

Log4J queries the classloader to determine the source JAR for each class. This 
isn't cheap, but this information is cached and reused in future exception 
logging calls. In certain scenarios involving runtime-generated classes, this 
lookup will fail and the failed lookup result will _not_ be cached. As a 
result, expensive classloading operations will be performed every time such an 
exception is logged. In addition to being very slow, these operations take out 
a lock on the classloader and thus can cause severe lock contention if multiple 
threads are logging errors. This issue is described in more detail in a comment 
on a Log4J2 JIRA and in a linked blogpost: 
https://issues.apache.org/jira/browse/LOG4J2-2391?focusedCommentId=16667140&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16667140
 . Spark frequently uses generated classes and lambdas and thus Spark executor 
logs will almost always trigger this edge-case and suffer from poor performance.

By default, if you do not specify an explicit exception format in your logging 
pattern then Log4J2 will add this "extended exception" pattern (see 
PatternLayout's {{alwaysWriteExceptions}} flag in Log4J's documentation, plus 
[the code implementing that 
flag|https://github.com/apache/logging-log4j2/blob/d6c8ab0863c551cdf0f8a5b1966ab45e3cddf572/log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/PatternParser.java#L206-L209]
 in Log4J2).

In this PR, I have updated Spark's default Log4J2 configurations so that each 
pattern layout includes an explicit {{%ex}} so that it uses the normal 
(non-extended) exception logging format.

Although it's true that any program logging exceptions at a high rate should 
probably just fix the source of the exceptions, I think it's still a good idea 
for us to try to fix this out-of-the-box performance difference so that users' 
existing workloads do not regress when upgrading to 3.3.0.

  was:
This PR addresses a performance problem in Log4J 2 related to exception 
logging: in certain scenarios I observed that Log4J2's exception stacktrace 
logging can be ~10x slower than Log4J 1.

The problem stems from a new log pattern format in Log4J2 called ["extended 
exception"|https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternExtendedException],
 which enriches the regular stacktrace string with information on the name of 
the JAR files that contained the classes in each stack frame.

Log4J queries the classloader to determine the source JAR for each class. This 
isn't cheap, but this information is cached and reused in future exception 
logging calls. In certain scenarios involving runtime-generated classes, this 
lookup will fail and the failed lookup result will _not_ be cached. As a 
result, expensive classloading operations will be performed every time such an 
exception is logged. In addition to being very slow, these operations take out 
a lock on the classloader and thus can cause severe lock contention if multiple 
threads are logging errors. This issue is described in more detail in a comment 
on a Log4J2 JIRA and in a linked blogpost. Spark frequently uses generated 
classes and lambdas and thus Spark executor logs will almost always trigger 
this edge-case and suffer from poor performance.

By default, if you do not specify an explicit exception format in your logging 
pattern then Log4J2 will add this "extended exception" pattern (see 
PatternLayout's {{alwaysWriteExceptions}} flag in Log4J's documentation, plus 
[the code implementing that 
flag|https://github.com/apache/logging-log4j2/blob/d6c8ab0863c551cdf0f8a5b1966ab45e3cddf572/log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/PatternParser.java#L206-L209]
 in Log4J2).

In this PR, I have updated Spark's default Log4J2 configurations so that each 
pattern layout includes an explicit {{%ex}} so that it uses the normal 
(non-extended) exception logging format.

Although it's true that any program logging exceptions at a high rate should 
probably just fix the source of the exceptions, I think it's still a good idea 
for us to try to fix this out-of-the-box performance difference so that users' 
existing workloads do not regress when upgrading to 3.3.0.


> Stop using Log4J2's extended throwable logging pattern in default logging configurations

[jira] [Assigned] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39354:


Assignee: (was: Apache Spark)

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Assigned] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39354:


Assignee: Apache Spark

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






for us to try to fix this issue.


> Stop using Log4J2's extended throwable logging pattern in default logging 
> configurations
> 
>
> Key: SPARK-39361
> URL: https://issues.apache.org/jira/browse/SPARK-39361
> Pr

[jira] [Created] (SPARK-39361) Stop using Log4J2's extended throwable logging pattern in default logging configurations

2022-06-01 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-39361:
--

 Summary: Stop using Log4J2's extended throwable logging pattern in 
default logging configurations
 Key: SPARK-39361
 URL: https://issues.apache.org/jira/browse/SPARK-39361
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Josh Rosen
Assignee: Josh Rosen


This PR addresses a performance problem in Log4J 2 related to exception 
logging: in certain scenarios I observed that Log4J2's exception stacktrace 
logging can be ~10x slower than Log4J 1.

The problem stems from a new log pattern format in Log4J2 called ["extended 
exception"|https://logging.apache.org/log4j/2.x/manual/layouts.html#PatternExtendedException],
 which enriches the regular stacktrace string with information on the name of 
the JAR files that contained the classes in each stack frame.

Log4J queries the classloader to determine the source JAR for each class. This 
isn't cheap, but this information is cached and reused in future exception 
logging calls. In certain scenarios involving runtime-generated classes, this 
lookup will fail and the failed lookup result will _not_ be cached. As a 
result, expensive classloading operations will be performed every time such an 
exception is logged. In addition to being very slow, these operations take out 
a lock on the classloader and thus can cause severe lock contention if multiple 
threads are logging errors. This issue is described in more detail in a comment 
on a Log4J2 JIRA and in a linked blogpost. Spark frequently uses generated 
classes and lambdas and thus Spark executor logs will almost always trigger 
this edge-case and suffer from poor performance.

By default, if you do not specify an explicit exception format in your logging 
pattern then Log4J2 will add this "extended exception" pattern (see 
PatternLayout's {{alwaysWriteExceptions}} flag in Log4J's documentation, plus 
[the code implementing that 
flag|https://github.com/apache/logging-log4j2/blob/d6c8ab0863c551cdf0f8a5b1966ab45e3cddf572/log4j-core/src/main/java/org/apache/logging/log4j/core/pattern/PatternParser.java#L206-L209]
 in Log4J2).

In this PR, I have updated Spark's default Log4J2 configurations so that each 
pattern layout includes an explicit {{%ex}} so that it uses the normal 
(non-extended) exception logging format.

Although it's true that any program logging exceptions at a high rate should 
probably just fix the source of the exceptions, I think it's still a good idea 
for us to try to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39359) Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39359:


Assignee: (was: Apache Spark)

> Restrict DEFAULT columns to allowlist of supported data source types
> 
>
> Key: SPARK-39359
> URL: https://issues.apache.org/jira/browse/SPARK-39359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Priority: Major
>








[jira] [Assigned] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39360:


Assignee: Apache Spark

> Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation
> 
>
> Key: SPARK-39360
> URL: https://issues.apache.org/jira/browse/SPARK-39360
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Assigned] (SPARK-39359) Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39359:


Assignee: Apache Spark

> Restrict DEFAULT columns to allowlist of supported data source types
> 
>
> Key: SPARK-39359
> URL: https://issues.apache.org/jira/browse/SPARK-39359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39360:


Assignee: (was: Apache Spark)

> Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation
> 
>
> Key: SPARK-39360
> URL: https://issues.apache.org/jira/browse/SPARK-39360
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545214#comment-17545214
 ] 

Apache Spark commented on SPARK-39360:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36744

> Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation
> 
>
> Key: SPARK-39360
> URL: https://issues.apache.org/jira/browse/SPARK-39360
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Commented] (SPARK-39359) Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545213#comment-17545213
 ] 

Apache Spark commented on SPARK-39359:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/36745

> Restrict DEFAULT columns to allowlist of supported data source types
> 
>
> Key: SPARK-39359
> URL: https://issues.apache.org/jira/browse/SPARK-39359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Daniel
>Priority: Major
>







[jira] [Created] (SPARK-39360) Recover spark.kubernetes.memoryOverheadFactor doc and remove deprecation

2022-06-01 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-39360:
-

 Summary: Recover spark.kubernetes.memoryOverheadFactor doc and 
remove deprecation
 Key: SPARK-39360
 URL: https://issues.apache.org/jira/browse/SPARK-39360
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun









[jira] [Commented] (SPARK-29260) Enable supported Hive metastore versions once it support altering database location

2022-06-01 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545166#comment-17545166
 ] 

Chao Sun commented on SPARK-29260:
--

[~yumwang] Looks like HIVE-8472 covers the server-side changes for this feature. 
Even with Hive 2.x, I think Spark should already support altering the database 
location as it is, right? 

> Enable supported Hive metastore versions once it support altering database 
> location
> ---
>
> Key: SPARK-29260
> URL: https://issues.apache.org/jira/browse/SPARK-29260
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> Hive 3.x is currently supported. Hive 2.2.1 and Hive 2.4.0 have not been released yet.






[jira] [Created] (SPARK-39359) Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread Daniel (Jira)
Daniel created SPARK-39359:
--

 Summary: Restrict DEFAULT columns to allowlist of supported data 
source types
 Key: SPARK-39359
 URL: https://issues.apache.org/jira/browse/SPARK-39359
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4
Reporter: Daniel









[jira] [Commented] (SPARK-39346) Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545070#comment-17545070
 ] 

Apache Spark commented on SPARK-39346:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36742

> Convert asserts/illegal state exception to internal errors on each phase
> 
>
> Key: SPARK-39346
> URL: https://issues.apache.org/jira/browse/SPARK-39346
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Wrap assert/illegal state exception by internal errors on each phase of query 
> execution.
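The wrapping described above can be sketched as follows. The class and method names here (SparkInternalError, withPhase) are invented for illustration and are not Spark's actual error-handling API; the point is only the mechanism of catching low-level AssertionError / IllegalStateException inside a query phase and rethrowing it as a clearly labeled internal error:

```java
import java.util.function.Supplier;

public class InternalErrorDemo {
    // Hypothetical stand-in for an "internal error" exception class.
    static final class SparkInternalError extends RuntimeException {
        SparkInternalError(String phase, Throwable cause) {
            super("[INTERNAL_ERROR] phase '" + phase + "' failed: " + cause.getMessage(), cause);
        }
    }

    // Run one phase of query execution, converting asserts/illegal-state
    // failures into the stable internal-error class.
    static <T> T withPhase(String phase, Supplier<T> body) {
        try {
            return body.get();
        } catch (AssertionError | IllegalStateException e) {
            throw new SparkInternalError(phase, e);
        }
    }

    public static void main(String[] args) {
        try {
            withPhase("analysis", () -> { throw new IllegalStateException("unresolved plan"); });
        } catch (SparkInternalError e) {
            // prints: [INTERNAL_ERROR] phase 'analysis' failed: unresolved plan
            System.out.println(e.getMessage());
        }
    }
}
```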






[jira] [Created] (SPARK-39358) Have the arguments passed via spark.executor.extraJavaOptions placed before the classpath when composing the executors commandline

2022-06-01 Thread Alberto Bortolan (Jira)
Alberto Bortolan created SPARK-39358:


 Summary: Have the arguments passed via 
spark.executor.extraJavaOptions placed before the classpath when composing the 
executors commandline
 Key: SPARK-39358
 URL: https://issues.apache.org/jira/browse/SPARK-39358
 Project: Spark
  Issue Type: Wish
  Components: Spark Core
Affects Versions: 3.2.1, 2.4.8
Reporter: Alberto Bortolan


When submitting a job it's possible to pass java options to be applied to the 
executor processes via the configuration parameter 
{{spark.executor.extraJavaOptions}} as in 
{noformat}
--conf 'spark.executor.extraJavaOptions=-Dmycompany.app.name=some_name'{noformat}
The command line for the executors is composed as
{noformat}
java <$SPARK_CONF_DIR/java.opts options> -cp <classpath> -Xmx<memory> 
<extra java options> <main class> ...{noformat}
Since the classpath can be particularly long, it would help visibility, when 
systems are monitored with tools that show only the first part of a process's 
command line (e.g. nmon), to move the extra java options before the classpath, 
as in
{noformat}
java <$SPARK_CONF_DIR/java.opts options> <extra java options> -cp <classpath> 
-Xmx<memory> <main class> ...{noformat}
The extra options passed this way are application- and submission-dependent, so 
it would not be appropriate, or even possible, to place them inside 
{{$SPARK_CONF_DIR/java-opts}}.
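For illustration only, a submission passing executor-side options this way might look like the following. The application class, jar name, and option value are placeholders, not part of the original report:

```shell
# Hypothetical spark-submit invocation; com.example.MyApp and my-app.jar
# are placeholders.
spark-submit \
  --class com.example.MyApp \
  --conf 'spark.executor.extraJavaOptions=-Dmycompany.app.name=some_name' \
  my-app.jar
```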

---

For reference, the command line is built by 
[buildSparkSubmitCommand|https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L263],
 which starts by calling 
[buildJavaCommand|https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java#L92],
 which prepares the first part of the command line, from the "java" executable 
through "-cp classpath" inclusive.






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545047#comment-17545047
 ] 

Dongjoon Hyun commented on SPARK-39354:
---

Ah, got it. I removed my previous message. Yes, this is a regression.

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Dongjoon Hyun (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39354 ]


Dongjoon Hyun deleted comment on SPARK-39354:
---

was (Author: dongjoon):
Hi, [~yumwang]. When `date_sub` is not registered, it's correct, isn't it?

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545045#comment-17545045
 ] 

Dongjoon Hyun commented on SPARK-39354:
---

Hi, [~yumwang]. When `date_sub` is not registered, it's correct, isn't it?

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545043#comment-17545043
 ] 

Dongjoon Hyun commented on SPARK-39354:
---

Thank you for pinging me, [~maxgekk].

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Resolved] (SPARK-39346) Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39346.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36704
[https://github.com/apache/spark/pull/36704]

> Convert asserts/illegal state exception to internal errors on each phase
> 
>
> Key: SPARK-39346
> URL: https://issues.apache.org/jira/browse/SPARK-39346
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Wrap assert/illegal state exception by internal errors on each phase of query 
> execution.






[jira] [Assigned] (SPARK-39346) Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-39346:


Assignee: Max Gekk

> Convert asserts/illegal state exception to internal errors on each phase
> 
>
> Key: SPARK-39346
> URL: https://issues.apache.org/jira/browse/SPARK-39346
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Wrap assert/illegal state exception by internal errors on each phase of query 
> execution.






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545040#comment-17545040
 ] 

Max Gekk commented on SPARK-39354:
--

Ping [~amaliujia], [~wenchen], [~dongjoon], as you reviewed the PR that 
probably introduced the bug.

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Updated] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-39313:
-
Fix Version/s: 3.3.0

> V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be 
> translated
> --
>
> Key: SPARK-39313
> URL: https://issues.apache.org/jira/browse/SPARK-39313
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Blocker
> Fix For: 3.3.0, 3.4.0
>
>







[jira] [Assigned] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned SPARK-39313:


Assignee: Cheng Pan

> V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be 
> translated
> --
>
> Key: SPARK-39313
> URL: https://issues.apache.org/jira/browse/SPARK-39313
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Blocker
>







[jira] [Resolved] (SPARK-39313) V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be translated

2022-06-01 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved SPARK-39313.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36697
[https://github.com/apache/spark/pull/36697]

> V2ExpressionUtils.toCatalystOrdering should fail if V2Expression can not be 
> translated
> --
>
> Key: SPARK-39313
> URL: https://issues.apache.org/jira/browse/SPARK-39313
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Blocker
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545016#comment-17545016
 ] 

Yang Jie commented on SPARK-39354:
--

This issue was introduced by SPARK-38118:

[https://github.com/apache/spark/blob/5a3ba9b0b301a3b0c43f8d0d88e2b6bdce57d0e6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L4353-L4372]
{code:java}
      // HAVING clause will be resolved as a Filter. When having func(column
      // with wrong data type), the column could be wrapped by a
      // TempResolvedColumn, e.g. mean(tempresolvedcolumn(t.c)). Because
      // TempResolvedColumn can still preserve column data type, here is a
      // chance to check if the data type matches with the required data type
      // of the function. We can throw an error when data types mismatches.
      case operator: Filter =>
        operator.expressions.foreach(_.foreachUp {
          case e: Expression if e.childrenResolved && e.checkInputDataTypes().isFailure =>
            e.checkInputDataTypes() match {
              case TypeCheckResult.TypeCheckFailure(message) =>
                e.setTagValue(DATA_TYPE_MISMATCH_ERROR, true)
                e.failAnalysis(
                  s"cannot resolve '${e.sql}' due to data type mismatch: $message" +
                    extraHintForAnsiTypeCoercionExpression(plan))
            }
          case _ =>
        })
      case _ => {code}
 

`case operator: Filter =>` is too broad; some restriction should be added so the data-type check only fires for the intended expressions.
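A minimal sketch (hypothetical, not the actual Spark patch; all names are stand-ins for Catalyst internals) of the kind of restriction meant here: run the data-type check only when every attribute referenced by the Filter's condition is resolved, so that the unresolved relation (the missing `t2`) is reported first.

```java
import java.util.List;

public class FilterCheckSketch {
    // Stand-in for a Catalyst attribute; "resolved" is false when its
    // relation (e.g. the non-existent table t2) could not be found.
    record Attr(String name, boolean resolved) {}

    // Only report a data-type mismatch once every referenced attribute
    // has been resolved; otherwise defer to the resolution error.
    static boolean shouldReportTypeMismatch(List<Attr> refs) {
        return refs.stream().allMatch(Attr::resolved);
    }

    public static void main(String[] args) {
        List<Attr> cond = List.of(
            new Attr("t1.auct_end_dt", true),
            new Attr("t2.user_id", false)); // t2 does not exist
        // Skip the mismatch check here; let "Table or view not found: t2"
        // surface instead of the misleading date_sub type error.
        System.out.println(shouldReportTypeMismatch(cond)); // prints "false"
    }
}
```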

 

 

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Commented] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544997#comment-17544997
 ] 

Apache Spark commented on SPARK-39357:
--

User 'tianshuang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36741

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>
> I found this bug in Spark 2.4.4. Since the related code has not changed, the 
> bug still exists on master. A brief description follows:
> In May 2015, 
> [SPARK-6907|https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568]
> introduced an isolated classloader for HiveMetastore to support loading 
> multiple Hive versions, but that PR broke the [RawStore cleanup 
> mechanism|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java#L27-L42].
> The `ThreadWithGarbageCleanup` class used by `HiveServer2-Handler-Pool`, 
> `HiveServer2-Background-Pool`, and `HiveServer2-HttpHandler-Pool` is loaded 
> by the AppClassLoader. In its source, the line `RawStore threadLocalRawStore 
> = HiveMetaStore.HMSHandler.getRawStore();` reads the static `threadLocalMS` 
> field of the `HiveMetaStore.HMSHandler` class loaded by the AppClassLoader. 
> During thread execution, however, the `client` is actually created by the 
> isolated classloader, so when `HiveMetaStore.HMSHandler#getMSForConf` 
> obtains a `RawStore` instance, the `ms` instance is stored in the 
> `threadLocalMS` field of the `HMSHandler` class loaded by 
> `IsolatedClassLoader$$anon$1`. The set and the get therefore operate on two 
> different `threadLocalMS` fields, so the 
> `ThreadWithGarbageCleanup#cacheThreadLocalRawStore` method obtains a null 
> `RawStore`. As a result, the cleanup logic never runs: the `shutdown` method 
> of the `RawStore` instance is never called, and the `pmCache` of 
> `JDOPersistenceManagerFactory` leaks memory.
> A long-running Spark ThriftServer therefore ends up with frequent GCs and 
> poor performance.
> I analyzed the heap dump using MAT. Executing the OQL `SELECT * FROM 
> INSTANCEOF java.lang.Class c WHERE c.@displayName.contains("class 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler ")` finds two 
> instances of the `HMSHandler` *Class* in the heap, each holding its own 
> static `threadLocalMS` field.
> Executing the OQL `select * from 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory` shows that the 
> `pmCache` of the `JDOPersistenceManagerFactory` instance occupies 1.3GB of 
> memory.
> Executing the OQL `SELECT * FROM INSTANCEOF java.lang.Class c WHERE 
> c.@displayName.contains("class 
> org.apache.hive.service.server.ThreadFactoryWithGarbageCleanup")` shows that 
> the static `threadRawStoreMap` of `ThreadFactoryWithGarbageCleanup` is 
> empty, which confirms the above: `HMSHandler.getRawStore()` in 
> `ThreadWithGarbageCleanup#cacheThreadLocalRawStore` reads the 
> `threadLocalMS` of the `HMSHandler` loaded by the AppClassLoader instead of 
> the one loaded by `IsolatedClassLoader$$anon$1`.
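The classloader mixup described above can be modeled without any Hive code: when the same class is loaded by two classloaders, its static fields exist twice as unrelated fields. A minimal sketch (hypothetical names; the two `ThreadLocal` fields are simplified stand-ins for the two copies of `HMSHandler.threadLocalMS`):

```java
public class TwoLoadersSketch {
    // Pretend these are the "same" HMSHandler.threadLocalMS field as seen
    // through the AppClassLoader and through IsolatedClassLoader$$anon$1:
    // the JVM treats them as two completely independent fields.
    static final ThreadLocal<Object> appLoaderCopy = new ThreadLocal<>();
    static final ThreadLocal<Object> isolatedLoaderCopy = new ThreadLocal<>();

    public static void main(String[] args) {
        isolatedLoaderCopy.set("rawStore");         // getMSForConf stores the RawStore here
        Object seenByCleanup = appLoaderCopy.get(); // cacheThreadLocalRawStore reads here
        // null: shutdown() is never called, so pmCache keeps growing.
        System.out.println(seenByCleanup); // prints "null"
    }
}
```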






[jira] [Assigned] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39357:


Assignee: Apache Spark

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Assignee: Apache Spark
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>






[jira] [Assigned] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39357:


Assignee: (was: Apache Spark)

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>






[jira] [Commented] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544995#comment-17544995
 ] 

Apache Spark commented on SPARK-39357:
--

User 'tianshuang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36741

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>






[jira] [Updated] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianshuang updated SPARK-39357:
---
Description: 
I found this bug in Spark 2.4.4, because the related code has not changed, so 
this bug still exists on master, the following is a brief description of this 
bug:

In May 2015, 
[SPARK-6907|https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568]
 introduced isolated classloader for HiveMetastore to support Hive 
multi-version loading, but this PR resulted in [RawStore cleanup 
mechanism|https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java
 #L27-L42] is broken because the `ThreadWithGarbageCleanup` class used by 
`HiveServer2-Handler-Pool` and `HiveServer2-Background-Pool` and 
`HiveServer2-HttpHandler-Pool` is loaded by AppClassLoader, in the source code 
of `ThreadWithGarbageCleanup` class: `RawStore threadLocalRawStore = 
HiveMetaStore.HMSHandler.getRawStore();` This line of code will use the 
`threadLocalMS` instance in `HiveMetaStore.HMSHandler` (loaded by 
AppClassLoader), and in the process of thread execution, the `client` actually 
created by isolatedClassLoader, in the process of obtaining `RawStore` instance 
through `HiveMetaStore.HMSHandler#getMSForConf`, the `ms` instance is set to 
`threadLocalMS`, but the static `threadLocalMS` instance belongs to 
`HMSHandler`(loaded by IsolatedClassLoader$$anon$1), that is, the set and get 
methods do not operate on the same `threadLocalMS` instance, so in 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` method, the obtained 
`RawStore` instance is null, so the subsequent `RawStore` cleaning logic does 
not take effect, because the `shutdown` method of `RawStore` instance is not 
called, resulting in `pmCache` of `JDOPersistenceManagerFactory` memory leak.

Long-running Spark ThriftServer end up with frequent GCs, resulting in poor 
performance.

I analyzed the heap dump using MAT and executed the following OQL: `SELECT * 
FROM INSTANCEOF java.lang.Class c WHERE c.@displayName.contains("class 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler ")`, two instances of 
the `HMSHandler` *Class* can be found in the heap. Also know that they each 
hold a static `threadLocalMS` instance.

We execute the following OQL: `select * from 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory`, we can see that the 
`pmCache` of the `JDOPersistenceManagerFactory` instance occupies 1.3GB of 
memory.

We execute the following OQL: `SELECT * FROM INSTANCEOF java.lang.Class c WHERE 
c.@displayName.contains("class 
org.apache.hive.service.server.ThreadFactoryWithGarbageCleanup")`, we can see 
that there is no element in the static instance `threadRawStoreMap` of 
`ThreadFactoryWithGarbageCleanup`, which confirms the above statement, because 
`HMSHandler.getRawStore()` in 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` is called on the 
`threadLocalMS` instance in `HMSHandler`(loaded by AppClassLoader) instead of 
`threadLocalMS` instance in `HMSHandler`(loaded by IsolatedClassLoader$$anon$1).


[jira] [Updated] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianshuang updated SPARK-39357:
---
Description: 
I found this bug in Spark 2.4.4, because the related code has not changed, so 
this bug still exists on master, the following is a brief description of this 
bug:

In May 2015, 
[SPARK-6907|https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568]
 introduced isolated classloader for HiveMetastore to support Hive 
multi-version loading, but this PR resulted in [RawStore cleanup 
mechanism](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java
 #L27-L42) is broken because the `ThreadWithGarbageCleanup` class used by 
`HiveServer2-Handler-Pool` and `HiveServer2-Background-Pool` and 
`HiveServer2-HttpHandler-Pool` is loaded by AppClassLoader, in the source code 
of `ThreadWithGarbageCleanup` class: `RawStore threadLocalRawStore = 
HiveMetaStore.HMSHandler.getRawStore();` This line of code will use the 
`threadLocalMS` instance in `HiveMetaStore.HMSHandler` (loaded by 
AppClassLoader), and in the process of thread execution, the `client` actually 
created by isolatedClassLoader, in the process of obtaining `RawStore` instance 
through `HiveMetaStore.HMSHandler#getMSForConf`, the `ms` instance is set to 
`threadLocalMS`, but the static `threadLocalMS` instance belongs to 
`HMSHandler`(loaded by IsolatedClassLoader$$anon$1), that is, the set and get 
methods do not operate on the same `threadLocalMS` instance, so in 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` method, the obtained 
`RawStore` instance is null, so the subsequent `RawStore` cleaning logic does 
not take effect, because the `shutdown` method of `RawStore` instance is not 
called, resulting in `pmCache` of `JDOPersistenceManagerFactory` memory leak.

Long-running Spark ThriftServer end up with frequent GCs, resulting in poor 
performance.

I analyzed the heap dump using MAT and executed the following OQL: `SELECT * 
FROM INSTANCEOF java.lang.Class c WHERE c.@displayName.contains("class 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler ")`, two instances of 
the `HMSHandler` **Class** can be found in the heap. Also know that they each 
hold a static `threadLocalMS` instance.

We execute the following OQL: `select * from 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory`, we can see that the 
`pmCache` of the `JDOPersistenceManagerFactory` instance occupies 1.3GB of 
memory.

We execute the following OQL: `SELECT * FROM INSTANCEOF java.lang.Class c WHERE 
c.@displayName.contains("class 
org.apache.hive.service.server.ThreadFactoryWithGarbageCleanup")`, we can see 
that there is no element in the static instance `threadRawStoreMap` of 
`ThreadFactoryWithGarbageCleanup`, which confirms the above statement, because 
`HMSHandler.getRawStore()` in 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` is called on the 
`threadLocalMS` instance in `HMSHandler`(loaded by AppClassLoader) instead of 
`threadLocalMS` instance in `HMSHandler`(loaded by IsolatedClassLoader$$anon$1).


[jira] [Updated] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianshuang updated SPARK-39357:
---
Attachment: Xnip2022-06-01_23-19-35.jpeg

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)
tianshuang created SPARK-39357:
--

 Summary: pmCache memory leak caused by IsolatedClassLoader
 Key: SPARK-39357
 URL: https://issues.apache.org/jira/browse/SPARK-39357
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1, 2.4.4
Reporter: tianshuang
 Attachments: Xnip2022-06-01_23-09-35.jpg, 
Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg

I found this bug in Spark 2.4.4. Because the related code has not changed, the 
bug still exists on master. A brief description follows:

In May 2015, 
[SPARK-6907](https://github.com/apache/spark/commit/daa70bf135f23381f5f410aa95a1c0e5a2888568) 
introduced an isolated classloader for HiveMetastore to support loading 
multiple Hive versions, but it broke the [RawStore cleanup 
mechanism](https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java#L27-L42). 
The `ThreadWithGarbageCleanup` class used by `HiveServer2-Handler-Pool`, 
`HiveServer2-Background-Pool`, and `HiveServer2-HttpHandler-Pool` is loaded by 
the AppClassLoader. Its source contains the line `RawStore threadLocalRawStore 
= HiveMetaStore.HMSHandler.getRawStore();`, which reads the static 
`threadLocalMS` field of the `HiveMetaStore.HMSHandler` class loaded by the 
AppClassLoader. During thread execution, however, the metastore `client` is 
actually created by the isolated classloader, so when a `RawStore` instance is 
obtained through `HiveMetaStore.HMSHandler#getMSForConf`, the `ms` instance is 
stored in the static `threadLocalMS` field of the `HMSHandler` class loaded by 
`IsolatedClassLoader$$anon$1`. The set and the get therefore operate on two 
different `threadLocalMS` fields, so 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` obtains a null `RawStore` 
and the subsequent cleanup logic never runs: `shutdown` is never called on the 
`RawStore`, and `pmCache` in `JDOPersistenceManagerFactory` leaks memory.
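The core failure mode, a thread-local value written through one copy of a class and read through another, can be sketched in plain Java. The names below are illustrative stand-ins, not the real Hive classes: two separate `ThreadLocal` objects model the two static `threadLocalMS` fields owned by the two `HMSHandler` classes, one per classloader.

```java
public class ThreadLocalMismatchSketch {
    // Stand-ins for the static threadLocalMS field of HMSHandler as loaded
    // by the AppClassLoader and by the isolated classloader, respectively.
    // Two classloaders produce two distinct Class objects, hence two
    // independent static fields even though the source code is identical.
    static final ThreadLocal<Object> appLoaderCopy = new ThreadLocal<>();
    static final ThreadLocal<Object> isolatedLoaderCopy = new ThreadLocal<>();

    // Models getMSForConf storing the RawStore via the isolated loader's
    // copy, then cacheThreadLocalRawStore reading via the app loader's copy.
    static Object storeAndReadBack() {
        isolatedLoaderCopy.set("rawStore");
        // A different ThreadLocal object: the value set above is invisible.
        return appLoaderCopy.get();
    }

    public static void main(String[] args) {
        System.out.println(storeAndReadBack()); // prints: null
    }
}
```

Because the cleanup path only ever sees the null value from its own class copy, the real `RawStore` is never handed to the shutdown logic, matching the leak described above.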

A long-running Spark ThriftServer therefore ends up with frequent GCs, 
resulting in poor performance.

I analyzed the heap dump with MAT and executed the following OQL: `SELECT * 
FROM INSTANCEOF java.lang.Class c WHERE c.@displayName.contains("class 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler ")`. Two instances of 
the `HMSHandler` **Class** can be found in the heap, each holding its own 
static `threadLocalMS` field.

Executing the following OQL: `select * from 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory` shows that the `pmCache` 
of the `JDOPersistenceManagerFactory` instance occupies 1.3GB of memory.

Executing the following OQL: `SELECT * FROM INSTANCEOF java.lang.Class c WHERE 
c.@displayName.contains("class 
org.apache.hive.service.server.ThreadFactoryWithGarbageCleanup")` shows that 
the static `threadRawStoreMap` of `ThreadFactoryWithGarbageCleanup` is empty, 
which confirms the analysis above: `HMSHandler.getRawStore()` in 
`ThreadWithGarbageCleanup#cacheThreadLocalRawStore` reads the `threadLocalMS` 
field of the `HMSHandler` loaded by the AppClassLoader instead of the one 
loaded by `IsolatedClassLoader$$anon$1`.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianshuang updated SPARK-39357:
---
Attachment: Xnip2022-06-01_23-32-39.jpg

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39357) pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread tianshuang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianshuang updated SPARK-39357:
---
Attachment: Xnip2022-06-01_23-09-35.jpg

> pmCache memory leak caused by IsolatedClassLoader
> -
>
> Key: SPARK-39357
> URL: https://issues.apache.org/jira/browse/SPARK-39357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.2.1
>Reporter: tianshuang
>Priority: Major
> Attachments: Xnip2022-06-01_23-09-35.jpg, 
> Xnip2022-06-01_23-19-35.jpeg, Xnip2022-06-01_23-32-39.jpg
>
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39355) UnresolvedAttribute should only use CatalystSqlParser if name contains dot

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39355:


Assignee: (was: Apache Spark)

> UnresolvedAttribute should only use CatalystSqlParser if name contains dot
> --
>
> Key: SPARK-39355
> URL: https://issues.apache.org/jira/browse/SPARK-39355
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
>  
> {code:java}
> select * from (select '2022-06-01' as c1 ) a where c1 in (select 
> date_add('2022-06-01',0)); {code}
> {code:java}
> Error in query:
> mismatched input '(' expecting {<EOF>, '.', '-'}(line 1, pos 8)
> == SQL ==
> date_add(2022-06-01, 0)
> ^^^ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39355) UnresolvedAttribute should only use CatalystSqlParser if name contains dot

2022-06-01 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39355:


Assignee: Apache Spark

> UnresolvedAttribute should only use CatalystSqlParser if name contains dot
> --
>
> Key: SPARK-39355
> URL: https://issues.apache.org/jira/browse/SPARK-39355
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
>  
> {code:java}
> select * from (select '2022-06-01' as c1 ) a where c1 in (select 
> date_add('2022-06-01',0)); {code}
> {code:java}
> Error in query:
> mismatched input '(' expecting {<EOF>, '.', '-'}(line 1, pos 8)
> == SQL ==
> date_add(2022-06-01, 0)
> ^^^ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org






[jira] [Commented] (SPARK-39355) UnresolvedAttribute should only use CatalystSqlParser if name contains dot

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544980#comment-17544980
 ] 

Apache Spark commented on SPARK-39355:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/36740

> UnresolvedAttribute should only use CatalystSqlParser if name contains dot
> --
>
> Key: SPARK-39355
> URL: https://issues.apache.org/jira/browse/SPARK-39355
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: dzcxzl
>Priority: Trivial
>
>  
> {code:java}
> select * from (select '2022-06-01' as c1 ) a where c1 in (select 
> date_add('2022-06-01',0)); {code}
> {code:java}
> Error in query:
> mismatched input '(' expecting {, '.', '-'}(line 1, pos 8)
> == SQL ==
> date_add(2022-06-01, 0)
> ^^^ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org






[jira] [Created] (SPARK-39356) Add option to skip initial message in Pregel API

2022-06-01 Thread Aaron Zolnai-Lucas (Jira)
Aaron Zolnai-Lucas created SPARK-39356:
--

 Summary: Add option to skip initial message in Pregel API
 Key: SPARK-39356
 URL: https://issues.apache.org/jira/browse/SPARK-39356
 Project: Spark
  Issue Type: Improvement
  Components: GraphX
Affects Versions: 3.2.1
Reporter: Aaron Zolnai-Lucas


The current (3.2.1) [Pregel 
API|https://github.com/apache/spark/blob/5a3ba9b0b301a3b0c43f8d0d88e2b6bdce57d0e6/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala#L117]
 takes a parameter {{initialMsg: A}} where {{A : scala.reflect.ClassTag}} is 
the message type for the Pregel iterations. At the start of the iterative 
process, the user-supplied vertex update method {{vprog}} is called with the 
initial message.

However, in some cases the natural start point for a message-passing scheme is 
the {{message}} phase rather than the {{vprog}} phase, and in many cases the 
first message depends on individual vertex data rather than a static initial 
message. In these cases, users are forced to add boilerplate to their {{vprog}} 
function that checks whether the message received is the {{initialMsg}} and, if 
so, ignores it (leaving the vertex state unchanged). This leads to less 
efficient (due to the extra iteration and check) and less readable code.

My proposed solution is to change {{initialMsg}} to a parameter of type 
{{Option[A]}} defaulting to {{None}}, and then, inside the {{Pregel.apply}} 
function, set:
{code:scala}
var g = initialMsg match {
  case Some(msg) => graph.mapVertices((vid, vdata) => vprog(vid, vdata, msg))
  case _ => graph
}
{code}
This way, the user chooses whether to start the iteration from the {{message}} 
or {{vprog}} phase. I believe this small change could improve user code 
readability and efficiency.

Note: the signature of {{GraphOps.pregel}} would have to be changed to match.
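The proposed dispatch can be mirrored outside GraphX as a small Java sketch using `Optional` (all names here are hypothetical, and a single vertex value stands in for a graph): when no initial message is supplied, the vertex program is simply not pre-applied.

```java
import java.util.Optional;
import java.util.function.BiFunction;

public class PregelInitSketch {
    // A toy "graph" is just one vertex value; vprog folds a message into it.
    // Mirrors: var g = initialMsg match { case Some(msg) => mapVertices(...)
    //                                     case _         => graph }
    static int runToy(int vertexValue, Optional<Integer> initialMsg,
                      BiFunction<Integer, Integer, Integer> vprog) {
        return initialMsg.map(msg -> vprog.apply(vertexValue, msg))
                         .orElse(vertexValue);
    }

    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> vprog = Integer::sum;
        // With an initial message, the vprog phase runs first.
        System.out.println(runToy(10, Optional.of(5), vprog));    // 15
        // With None, iteration would start from the message phase.
        System.out.println(runToy(10, Optional.empty(), vprog));  // 10
    }
}
```

The `Optional`-typed parameter keeps the existing behavior available while making the skip explicit at the call site, rather than encoding it as a sentinel message that every `vprog` must recognize.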
 

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39356) Add option to skip initial message in Pregel API

2022-06-01 Thread Aaron Zolnai-Lucas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Zolnai-Lucas updated SPARK-39356:
---
Priority: Minor  (was: Major)

> Add option to skip initial message in Pregel API
> 
>
> Key: SPARK-39356
> URL: https://issues.apache.org/jira/browse/SPARK-39356
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX
>Affects Versions: 3.2.1
>Reporter: Aaron Zolnai-Lucas
>Priority: Minor
>  Labels: graphx, pregel
>
> The current (3.2.1) [Pregel 
> API|https://github.com/apache/spark/blob/5a3ba9b0b301a3b0c43f8d0d88e2b6bdce57d0e6/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala#L117]
>  takes a parameter {{initialMsg: A}} where {{A : scala.reflect.ClassTag}} is 
> the message type for the Pregel iterations. At the start of the iterative 
> process, the user-supplied vertex update method {{vprog}} is called with the 
> initial message.
> However, in some cases the natural start point for a message-passing scheme 
> is the {{message}} phase rather than the {{vprog}} phase, and in many cases 
> the first message depends on individual vertex data rather than a static 
> initial message. In these cases, users are forced to add boilerplate to 
> their {{vprog}} function that checks whether the message received is the 
> {{initialMsg}} and, if so, ignores it (leaving the vertex state unchanged). 
> This leads to less efficient (due to the extra iteration and check) and less 
> readable code.
>  
> My proposed solution is to change {{initialMsg}} to a parameter of type 
> {{Option[A]}} defaulting to {{None}}, and then, inside the {{Pregel.apply}} 
> function, set:
> {code:scala}
> var g = initialMsg match {
>   case Some(msg) => graph.mapVertices((vid, vdata) => vprog(vid, vdata, msg))
>   case _ => graph
> }
> {code}
> This way, the user chooses whether to start the iteration from the 
> {{message}} or {{vprog}} phase. I believe this small change could improve 
> user code readability and efficiency.
> Note: The signature of {{GraphOps.pregel}} would have to be changed to match
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39355) UnresolvedAttribute should only use CatalystSqlParser if name contains dot

2022-06-01 Thread dzcxzl (Jira)
dzcxzl created SPARK-39355:
--

 Summary: UnresolvedAttribute should only use CatalystSqlParser if 
name contains dot
 Key: SPARK-39355
 URL: https://issues.apache.org/jira/browse/SPARK-39355
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: dzcxzl


 
{code:java}
select * from (select '2022-06-01' as c1 ) a where c1 in (select 
date_add('2022-06-01',0)); {code}
{code:java}
Error in query:
mismatched input '(' expecting {<EOF>, '.', '-'}(line 1, pos 8)
== SQL ==
date_add(2022-06-01, 0)
^^^ {code}
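The fix the title suggests, invoking the full `CatalystSqlParser` only when the attribute name actually contains a dot, can be sketched generically. The split below is a simplified hypothetical stand-in for the real parser call, which also handles backtick-quoted parts:

```java
public class AttributeNameSketch {
    // Simplified stand-in: treat a dotted name as multi-part, anything else
    // as a single-part name, without invoking a full SQL parser.
    static String[] parseAttributeName(String name) {
        if (!name.contains(".")) {
            return new String[] { name };  // fast path: no parser needed
        }
        return name.split("\\.");          // stand-in for CatalystSqlParser
    }

    public static void main(String[] args) {
        // A name with no dot never reaches the parser, so characters such as
        // '-' that the SQL grammar rejects can no longer cause a parse error.
        System.out.println(parseAttributeName("2022-06-01").length); // 1
        System.out.println(parseAttributeName("a.b").length);        // 2
    }
}
```

Besides avoiding the spurious parse failure shown above, the guard also skips parser overhead for the common case of simple column names.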
 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544945#comment-17544945
 ] 

Apache Spark commented on SPARK-39040:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/36739

> Respect NaNvl in EquivalentExpressions for expression elimination
> -
>
> Key: SPARK-39040
> URL: https://issues.apache.org/jira/browse/SPARK-39040
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> For example, the following query fails:
> {code:java}
> set spark.sql.ansi.enabled=true;
> set 
> spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding;
> SELECT nanvl(1, 1/0 + 1/0);  {code}
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 
> (TID 4) (10.221.98.68 executor driver): 
> org.apache.spark.SparkArithmeticException: divide by zero. To return NULL 
> instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false 
> (except for ANSI interval type) to bypass this error.
> == SQL(line 1, position 17) ==
> select nanvl(1 , 1/0 + 1/0)
>                  ^^^    at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151)
>  {code}
> We should respect the ordering of conditional expressions, which always 
> evaluate the predicate branch first, so the query above should not fail.
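The contract being violated can be sketched with a toy `nanvl` in plain Java (hypothetical names, not Spark's implementation): the fallback expression must stay inside the conditional branch, because evaluating it eagerly, as a common-subexpression-elimination pass might, raises the very error the branch was guarding against.

```java
import java.util.function.DoubleSupplier;

public class NanvlOrderingSketch {
    // Toy nanvl: return 'value' unless it is NaN; only then evaluate the
    // fallback. The supplier keeps the fallback lazy, like a predicate-first
    // conditional expression.
    static double nanvl(double value, DoubleSupplier fallback) {
        return Double.isNaN(value) ? fallback.getAsDouble() : value;
    }

    public static void main(String[] args) {
        // The fallback would throw (integer divide by zero, standing in for
        // 1/0 under ANSI mode), but it is never evaluated: 1.0 is not NaN.
        double r = nanvl(1.0, () -> 1 / 0);
        System.out.println(r); // 1.0

        // Hoisting the fallback out of the branch evaluates it eagerly:
        // int eager = 1 / 0;  // would throw ArithmeticException
    }
}
```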



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39040) Respect NaNvl in EquivalentExpressions for expression elimination

2022-06-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544946#comment-17544946
 ] 

Apache Spark commented on SPARK-39040:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/36739

> Respect NaNvl in EquivalentExpressions for expression elimination
> -
>
> Key: SPARK-39040
> URL: https://issues.apache.org/jira/browse/SPARK-39040
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> For example, the following query fails:
> {code:java}
> set spark.sql.ansi.enabled=true;
> set 
> spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConstantFolding;
> SELECT nanvl(1, 1/0 + 1/0);  {code}
> {code:java}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 
> (TID 4) (10.221.98.68 executor driver): 
> org.apache.spark.SparkArithmeticException: divide by zero. To return NULL 
> instead, use 'try_divide'. If necessary set spark.sql.ansi.enabled to false 
> (except for ANSI interval type) to bypass this error.
> == SQL(line 1, position 17) ==
> select nanvl(1 , 1/0 + 1/0)
>                  ^^^    at 
> org.apache.spark.sql.errors.QueryExecutionErrors$.divideByZeroError(QueryExecutionErrors.scala:151)
>  {code}
> We should respect the ordering of conditional expressions, which always 
> evaluate the predicate branch first, so the query above should not fail.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39267) Clean up dsl unnecessary symbol

2022-06-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39267.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36646
[https://github.com/apache/spark/pull/36646]

> Clean up dsl unnecessary symbol
> ---
>
> Key: SPARK-39267
> URL: https://issues.apache.org/jira/browse/SPARK-39267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Minor
> Fix For: 3.4.0
>
>
> dsl is a test helper file which provides convenience functions, but some of 
> them are unnecessary. For example:
> {code:java}
> def subquery(alias: Symbol): LogicalPlan {code}
> For a subquery we only need the name, so a string-typed parameter is enough. 
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39267) Clean up dsl unnecessary symbol

2022-06-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39267:
---

Assignee: XiDuo You

> Clean up dsl unnecessary symbol
> ---
>
> Key: SPARK-39267
> URL: https://issues.apache.org/jira/browse/SPARK-39267
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
> Fix For: 3.4.0
>
>
> dsl is a test helper file which provides convenience functions, but some of 
> them are unnecessary. For example:
> {code:java}
> def subquery(alias: Symbol): LogicalPlan {code}
> For a subquery we only need the name, so a string-typed parameter is enough. 
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544914#comment-17544914
 ] 

Yuming Wang commented on SPARK-39354:
-

Yes, it only affects the exception message.

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39354:
-
Priority: Blocker  (was: Major)

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Blocker
>
> {noformat}
> scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
> parquet;")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
> t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
> org.apache.spark.sql.AnalysisException: cannot resolve 
> 'date_sub('2020-12-27', 90)' due to data type mismatch: argument 1 requires 
> date type, however, ''2020-12-27'' is of string type.; line 1 pos 76
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
>   at 
> org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
> {noformat}
> The analysis exception should be:
> {noformat}
> org.apache.spark.sql.AnalysisException: Table or view not found: t2
> {noformat}






[jira] [Updated] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39354:
-
Priority: Minor  (was: Blocker)

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544904#comment-17544904
 ] 

Hyukjin Kwon commented on SPARK-39354:
--

[~yumwang] is this only related to the exception message?

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Minor
>






[jira] [Updated] (SPARK-39353) Cannot fetch hdfs data node local

2022-06-01 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39353:
-
Priority: Major  (was: Blocker)

> Cannot fetch hdfs data node local
> -
>
> Key: SPARK-39353
> URL: https://issues.apache.org/jira/browse/SPARK-39353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
> Environment: HDFS on Kubernetes 3.3.1
> Spark on Kubernetes 3.2.1
>Reporter: Jinpeng Chi
>Priority: Major
>
> When I use HDFS short-circuit read, the locality level is always ANY.






[jira] [Commented] (SPARK-39353) Cannot fetch hdfs data node local

2022-06-01 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544903#comment-17544903
 ] 

Hyukjin Kwon commented on SPARK-39353:
--

[~cutiechi] mind elaborating on the issue a bit more? How do you reproduce it?

> Cannot fetch hdfs data node local
> -
>
> Key: SPARK-39353
> URL: https://issues.apache.org/jira/browse/SPARK-39353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
> Environment: HDFS on Kubernetes 3.3.1
> Spark on Kubernetes 3.2.1
>Reporter: Jinpeng Chi
>Priority: Major
>
> When I use HDFS short-circuit read, the locality level is always ANY.






[jira] [Commented] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544894#comment-17544894
 ] 

Yuming Wang commented on SPARK-39354:
-

[~maxgekk], I think this is a blocker issue for the 3.3.0 release.

> The analysis exception is incorrect
> ---
>
> Key: SPARK-39354
> URL: https://issues.apache.org/jira/browse/SPARK-39354
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>






[jira] [Created] (SPARK-39354) The analysis exception is incorrect

2022-06-01 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-39354:
---

 Summary: The analysis exception is incorrect
 Key: SPARK-39354
 URL: https://issues.apache.org/jira/browse/SPARK-39354
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yuming Wang


{noformat}
scala> spark.sql("create table t1(user_id int, auct_end_dt date) using 
parquet;")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select * from t1 join t2 on t1.user_id = t2.user_id where 
t1.auct_end_dt >= Date_sub('2020-12-27', 90)").show
org.apache.spark.sql.AnalysisException: cannot resolve 'date_sub('2020-12-27', 
90)' due to data type mismatch: argument 1 requires date type, however, 
''2020-12-27'' is of string type.; line 1 pos 76
  at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at 
org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82(Analyzer.scala:4334)
  at 
org.apache.spark.sql.catalyst.analysis.RemoveTempResolvedColumn$.$anonfun$apply$82$adapted(Analyzer.scala:4327)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:365)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:364)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:364)
{noformat}

The analysis exception should be:
{noformat}
org.apache.spark.sql.AnalysisException: Table or view not found: t2
{noformat}








[jira] [Assigned] (SPARK-39350) DescribeNamespace should redact properties

2022-06-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-39350:
---

Assignee: angerszhu

> DescribeNamespace should redact properties
> --
>
> Key: SPARK-39350
> URL: https://issues.apache.org/jira/browse/SPARK-39350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> DescribeNamespace should redact properties






[jira] [Resolved] (SPARK-39350) DescribeNamespace should redact properties

2022-06-01 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39350.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36735
[https://github.com/apache/spark/pull/36735]

> DescribeNamespace should redact properties
> --
>
> Key: SPARK-39350
> URL: https://issues.apache.org/jira/browse/SPARK-39350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.4.0
>
>
> DescribeNamespace should redact properties






[jira] [Created] (SPARK-39353) Cannot fetch hdfs data node local

2022-06-01 Thread Jinpeng Chi (Jira)
Jinpeng Chi created SPARK-39353:
---

 Summary: Cannot fetch hdfs data node local
 Key: SPARK-39353
 URL: https://issues.apache.org/jira/browse/SPARK-39353
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.2.1
 Environment: HDFS on Kubernetes 3.3.1

Spark on Kubernetes 3.2.1
Reporter: Jinpeng Chi


When I use HDFS short-circuit read, the locality level is always ANY.






[jira] [Created] (SPARK-39352) There are problems in canUpCast function

2022-06-01 Thread Chao Gao (Jira)
Chao Gao created SPARK-39352:


 Summary: There are problems in canUpCast function
 Key: SPARK-39352
 URL: https://issues.apache.org/jira/browse/SPARK-39352
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.1
Reporter: Chao Gao


Spark's canUpCast is meant to indicate that a value can be cast to another type without 
loss of precision. In practice, however, a long cannot be cast to float/double losslessly.
{code:java}
def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
  case _ if from == to => true
  case (from: NumericType, to: DecimalType) if to.isWiderThan(from) => true
  case (from: DecimalType, to: NumericType) if from.isTighterThan(to) => true
  case (f, t) if legalNumericPrecedence(f, t) => true
  case (DateType, TimestampType) => true
  case (_: AtomicType, StringType) => true
  case (_: CalendarIntervalType, StringType) => true
  case (NullType, _) => true

... {code}
{code:java}
private def legalNumericPrecedence(from: DataType, to: DataType): Boolean = {
  val fromPrecedence = TypeCoercion.numericPrecedence.indexOf(from)
  val toPrecedence = TypeCoercion.numericPrecedence.indexOf(to)
  fromPrecedence >= 0 && fromPrecedence < toPrecedence
}
{code}
{code:java}
val numericPrecedence =
  IndexedSeq(
    ByteType,
    ShortType,
    IntegerType,
    LongType,
    FloatType,
    DoubleType)
{code}
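The precision-loss claim can be checked without Spark at all: a Float has a 24-bit mantissa and a Double a 53-bit one, both narrower than Long's 63 value bits. A minimal standalone sketch (plain Scala, no Spark types involved, object name chosen for illustration):

```scala
// Demonstrates why promoting Long to Float/Double along numericPrecedence
// is not a lossless "up-cast": some Long values cannot be represented.
object LongToFloatPrecision {
  def main(args: Array[String]): Unit = {
    // 2^24 + 1 is the smallest positive Long a Float cannot represent;
    // it rounds to 16777216 under round-to-nearest-even.
    val smallest = 16777217L
    assert(smallest.toFloat.toLong != smallest)

    // 2^53 + 1 is the smallest positive Long a Double cannot represent.
    val big = 9007199254740993L
    assert(big.toDouble.toLong != big)

    println("Long -> Float/Double round-trips lost precision as expected")
  }
}
```

This suggests that, per the report, `legalNumericPrecedence` treating LongType → FloatType/DoubleType as a legal up-cast conflicts with canUpCast's stated no-precision-loss contract.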


