[jira] [Resolved] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog

2022-01-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37963.

Fix Version/s: 3.3.0
   3.2.1
   Resolution: Fixed

Issue resolved by pull request 35251
[https://github.com/apache/spark/pull/35251]

> Need to update Partition URI after renaming table in InMemoryCatalog
> 
>
> Key: SPARK-37963
> URL: https://issues.apache.org/jira/browse/SPARK-37963
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> After renaming a partitioned table, selecting from the renamed table via
> InMemoryCatalog returns an empty result.
> The following checkAnswer will fail as the result is empty.
> {code:java}
> sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)")
> sql("insert into table foo partition(j=2) values (1)")
> sql(s"alter table foo rename to bar")
> checkAnswer(spark.table("bar"), Row(1, 2)) {code}
> To fix the bug, we need to update the partition URIs after renaming a table in
> InMemoryCatalog.
>  






[jira] [Created] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog

2022-01-19 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37963:
--

 Summary: Need to update Partition URI after renaming table in 
InMemoryCatalog
 Key: SPARK-37963
 URL: https://issues.apache.org/jira/browse/SPARK-37963
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


After renaming a partitioned table, selecting from the renamed table via
InMemoryCatalog returns an empty result.

The following checkAnswer will fail as the result is empty.
{code:java}
sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)")
sql("insert into table foo partition(j=2) values (1)")
sql(s"alter table foo rename to bar")
checkAnswer(spark.table("bar"), Row(1, 2)) {code}
To fix the bug, we need to update the partition URIs after renaming a table in
InMemoryCatalog.
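
As a rough illustration of the fix (a minimal sketch in plain Scala, not the actual 
InMemoryCatalog code): after a rename, every partition location that points under the 
old table location has to be rewritten to point under the new location, otherwise scans 
of the renamed table resolve to empty directories.

{code:java}
// Minimal sketch only, assuming plain Scala (not Spark's real implementation):
// rewrite a partition URI that lives under the old table location so that it
// points under the new table location after a rename.
import java.net.URI

def rewritePartitionUri(oldTableLoc: URI, newTableLoc: URI, partitionLoc: URI): URI = {
  val oldPrefix = oldTableLoc.toString.stripSuffix("/")
  val newPrefix = newTableLoc.toString.stripSuffix("/")
  val part = partitionLoc.toString
  if (part.startsWith(oldPrefix)) new URI(newPrefix + part.stripPrefix(oldPrefix))
  else partitionLoc // partition stored outside the table location: keep it as-is
}

// file:/warehouse/foo/j=2 -> file:/warehouse/bar/j=2 after "alter table foo rename to bar"
println(rewritePartitionUri(
  new URI("file:/warehouse/foo"),
  new URI("file:/warehouse/bar"),
  new URI("file:/warehouse/foo/j=2")))
{code}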

 






[jira] [Commented] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471925#comment-17471925
 ] 

Gengliang Wang commented on SPARK-37818:


[~huaxingao] FYI, I set the fix version to 3.2.1. I saw there is already a 
3.2.1-rc1 tag, so I will update the fix version to 3.2.2 if this doc 
change can't make it into 3.2.1.

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.2.1, 3.3.0
>
>







[jira] [Resolved] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37818.

Fix Version/s: 3.2.1
   Resolution: Fixed

Issue resolved by pull request 35107
[https://github.com/apache/spark/pull/35107]

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.3.0, 3.2.1
>
>







[jira] [Assigned] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-37818:
--

Assignee: PengLei

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.3.0
>
>







[jira] [Resolved] (SPARK-37817) Remove unreachable code in complexTypeExtractors.scala

2022-01-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37817.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35106
[https://github.com/apache/spark/pull/35106]

> Remove unreachable code in complexTypeExtractors.scala 
> ---
>
> Key: SPARK-37817
> URL: https://issues.apache.org/jira/browse/SPARK-37817
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
> Fix For: 3.3.0
>
>







[jira] [Created] (SPARK-37817) Remove unreachable code in complexTypeExtractors.scala

2022-01-05 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37817:
--

 Summary: Remove unreachable code in complexTypeExtractors.scala 
 Key: SPARK-37817
 URL: https://issues.apache.org/jira/browse/SPARK-37817
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Resolved] (SPARK-37815) Fix the github action job "test_report"

2022-01-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37815.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35104
[https://github.com/apache/spark/pull/35104]

> Fix the github action job "test_report"
> ---
>
> Key: SPARK-37815
> URL: https://issues.apache.org/jira/browse/SPARK-37815
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Priority: Minor
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37815) Fix the github action job "test_report"

2022-01-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-37815:
--

Assignee: Gengliang Wang

> Fix the github action job "test_report"
> ---
>
> Key: SPARK-37815
> URL: https://issues.apache.org/jira/browse/SPARK-37815
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.3.0
>
>







[jira] [Created] (SPARK-37815) Fix the github action job "test_report"

2022-01-05 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37815:
--

 Summary: Fix the github action job "test_report"
 Key: SPARK-37815
 URL: https://issues.apache.org/jira/browse/SPARK-37815
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Affects Versions: 3.3.0
Reporter: Gengliang Wang









[jira] [Updated] (SPARK-37750) ANSI mode: optionally return null result if element not exists in array/map

2022-01-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37750:
---
Description: 
Add a new configuration `spark.sql.ansi.failOnElementNotExists`, which controls 
whether to throw an exception or return a null result when an element does not 
exist in the [] operator on array/map types.

The default value of the new configuration is true.

  was:
Add a new configuration `spark.sql.ansi.failOnElementNotExists` which controls 
whether throwing exceptions or returning null results when element not exists 
in:
 * [] operator in array/map type
 * element_at()
 * elt()


> ANSI mode: optionally return null result if element not exists in array/map
> ---
>
> Key: SPARK-37750
> URL: https://issues.apache.org/jira/browse/SPARK-37750
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Add a new configuration `spark.sql.ansi.failOnElementNotExists`, which controls 
> whether to throw an exception or return a null result when an element does not 
> exist in the [] operator on array/map types.
> The default value of the new configuration is true.






[jira] [Resolved] (SPARK-37750) ANSI mode: optionally return null result if element not exists in array/map

2022-01-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37750.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35031
[https://github.com/apache/spark/pull/35031]

> ANSI mode: optionally return null result if element not exists in array/map
> ---
>
> Key: SPARK-37750
> URL: https://issues.apache.org/jira/browse/SPARK-37750
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Add a new configuration `spark.sql.ansi.failOnElementNotExists`, which controls 
> whether to throw an exception or return a null result when an element does not 
> exist in:
>  * [] operator in array/map type
>  * element_at()
>  * elt()






[jira] [Updated] (SPARK-37750) ANSI mode: optionally return null result if element not exists in array/map

2021-12-27 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37750:
---
Summary: ANSI mode: optionally return null result if element not exists in 
array/map  (was: ANSI mode: Add a config to optionally return null result if 
element not exists in array/map)

> ANSI mode: optionally return null result if element not exists in array/map
> ---
>
> Key: SPARK-37750
> URL: https://issues.apache.org/jira/browse/SPARK-37750
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Add a new configuration `spark.sql.ansi.failOnElementNotExists`, which controls 
> whether to throw an exception or return a null result when an element does not 
> exist in:
>  * [] operator in array/map type
>  * element_at()
>  * elt()






[jira] [Created] (SPARK-37750) ANSI mode: Add a config to optionally return null result if element not exists in array/map

2021-12-27 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37750:
--

 Summary: ANSI mode: Add a config to optionally return null result 
if element not exists in array/map
 Key: SPARK-37750
 URL: https://issues.apache.org/jira/browse/SPARK-37750
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Add a new configuration `spark.sql.ansi.failOnElementNotExists`, which controls 
whether to throw an exception or return a null result when an element does not 
exist in:
 * [] operator in array/map type
 * element_at()
 * elt()
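
A rough spark-shell sketch of the intended behavior, assuming a Spark 3.3 session and 
the configuration name proposed above (the final config name and exact error may differ):

{code:java}
// Sketch only: assumes a spark-shell session (`spark` predefined) and the proposed
// config name spark.sql.ansi.failOnElementNotExists; illustrative, not authoritative.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Proposed default (true): a missing key or index is a runtime error under ANSI mode.
// spark.sql("SELECT map(1, 'a', 2, 'b')[3]").show()      // would throw at runtime

// With the flag off, the lookup returns NULL instead of failing.
spark.conf.set("spark.sql.ansi.failOnElementNotExists", "false")
spark.sql("SELECT map(1, 'a', 2, 'b')[3]").show()         // NULL
spark.sql("SELECT element_at(array(1, 2, 3), 5)").show()  // NULL
{code}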






[jira] [Resolved] (SPARK-37724) ANSI mode: disable ANSI reserved keywords by default

2021-12-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37724.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34996
[https://github.com/apache/spark/pull/34996]

> ANSI mode: disable ANSI reserved keywords by default
> 
>
> Key: SPARK-37724
> URL: https://issues.apache.org/jira/browse/SPARK-37724
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> The reserved keywords requirement is a big blocker for many users who want to try 
> ANSI mode. They have to update their SQL queries just to pass the parser, which has 
> nothing to do with data quality and is just trouble.
> By disabling the feature by default, I think we can get better adoption of 
> ANSI mode.






[jira] [Updated] (SPARK-37733) Change log level of tests to WARN

2021-12-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37733:
---
Component/s: Build
 (was: Project Infra)

> Change log level of tests to WARN
> -
>
> Key: SPARK-37733
> URL: https://issues.apache.org/jira/browse/SPARK-37733
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Created] (SPARK-37733) Change log level of tests to WARN

2021-12-23 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37733:
--

 Summary: Change log level of tests to WARN
 Key: SPARK-37733
 URL: https://issues.apache.org/jira/browse/SPARK-37733
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Resolved] (SPARK-37714) ANSI mode: allow casting between numeric type and timestamp type

2021-12-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37714.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34985
[https://github.com/apache/spark/pull/34985]

> ANSI mode: allow casting between numeric type and timestamp type 
> -
>
> Key: SPARK-37714
> URL: https://issues.apache.org/jira/browse/SPARK-37714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> h3. What changes were proposed?
>  * By default, allow casting between numeric type and timestamp type under 
> ANSI mode
>  * Remove the user-facing configuration 
> {{spark.sql.ansi.allowCastBetweenDatetimeAndNumeric}}
> h3. Why are the changes needed?
> Same reason as mentioned in 
> [#34459|https://github.com/apache/spark/pull/34459]. It is for better 
> adoption of ANSI SQL mode since users are relying on it:
>  * From some data analysis, we found that many Spark SQL users are 
> actually using {{Cast(Timestamp as Numeric)}} and {{Cast(Numeric as 
> Timestamp)}}.
>  * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)}}
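
For a concrete picture of the behavior this enables, a hedged spark-shell sketch 
(assuming a Spark 3.3 session with ANSI mode on; displayed values depend on the 
session time zone):

{code:java}
// Sketch only, assuming a spark-shell session with ANSI mode enabled.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Numeric -> Timestamp: the value is interpreted as seconds since the epoch.
spark.sql("SELECT CAST(86400 AS TIMESTAMP)").show(false)

// Timestamp -> Numeric: back to seconds since the epoch.
spark.sql("SELECT CAST(TIMESTAMP'1970-01-02 00:00:00' AS BIGINT)").show(false)
{code}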






[jira] [Updated] (SPARK-37724) ANSI mode: disable ANSI reserved keywords by default

2021-12-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37724:
---
Summary: ANSI mode: disable ANSI reserved keywords by default  (was: ANSI 
mode: disable ANSI reserved keyworks by default)

> ANSI mode: disable ANSI reserved keywords by default
> 
>
> Key: SPARK-37724
> URL: https://issues.apache.org/jira/browse/SPARK-37724
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> The reserved keywords requirement is a big blocker for many users who want to try 
> ANSI mode. They have to update their SQL queries just to pass the parser, which has 
> nothing to do with data quality and is just trouble.
> By disabling the feature by default, I think we can get better adoption of 
> ANSI mode.






[jira] [Created] (SPARK-37724) ANSI mode: disable ANSI reserved keyworks by default

2021-12-23 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37724:
--

 Summary: ANSI mode: disable ANSI reserved keyworks by default
 Key: SPARK-37724
 URL: https://issues.apache.org/jira/browse/SPARK-37724
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


The reserved keywords requirement is a big blocker for many users who want to try 
ANSI mode. They have to update their SQL queries just to pass the parser, which has 
nothing to do with data quality and is just trouble.

By disabling the feature by default, I think we can get better adoption of 
ANSI mode.
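
As a rough sketch of the user-facing effect, assuming a Spark 3.3 spark-shell session; 
note that the controlling config name used below (spark.sql.ansi.enforceReservedKeywords) 
is an assumption and is not stated in this issue:

{code:java}
// Sketch only; the config name spark.sql.ansi.enforceReservedKeywords is an assumption.
spark.conf.set("spark.sql.ansi.enabled", "true")

// Proposed default: reserved keywords are NOT enforced, so existing queries that use
// ANSI reserved words (e.g. ORDER) as identifiers keep parsing under ANSI mode.
spark.conf.set("spark.sql.ansi.enforceReservedKeywords", "false")
spark.sql("CREATE TABLE order (i INT) USING PARQUET")      // parses and runs

// With enforcement on, the same statement would fail at the parser:
// spark.conf.set("spark.sql.ansi.enforceReservedKeywords", "true")
// spark.sql("CREATE TABLE order (i INT) USING PARQUET")   // syntax error: reserved keyword
{code}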






[jira] [Commented] (SPARK-37659) Fix FsHistoryProvider race condition between listing and deleting log info

2021-12-22 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17464335#comment-17464335
 ] 

Gengliang Wang commented on SPARK-37659:


Issue resolved by [https://github.com/apache/spark/pull/34919]

> Fix FsHistoryProvider race condition between listing and deleting log info
> 
>
> Key: SPARK-37659
> URL: https://issues.apache.org/jira/browse/SPARK-37659
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.2, 3.2.1, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> After SPARK-29043, FsHistoryProvider lists the log info without waiting for 
> all `mergeApplicationListing` tasks to finish.
> However, the `LevelDBIterator` over the listed log info is not thread-safe if some 
> other threads delete the related log info at the same time.
> This is the error message:
> {code:java}
> 21/12/15 14:12:02 ERROR FsHistoryProvider: Exception in checking for event 
> log updates
> java.util.NoSuchElementException: 
> 1^@__main__^@+hdfs://xxx/application_xxx.inprogress
> at org.apache.spark.util.kvstore.LevelDB.get(LevelDB.java:132)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.next(LevelDBIterator.java:137)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at 
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
> at 
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
> at scala.collection.TraversableLike.to(TraversableLike.scala:678)
> at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
> at scala.collection.AbstractTraversable.to(Traversable.scala:108)
> at scala.collection.TraversableOnce.toList(TraversableOnce.scala:299)
> at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:299)
> at scala.collection.AbstractTraversable.toList(Traversable.scala:108)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs(FsHistoryProvider.scala:588)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$startPolling$3(FsHistoryProvider.scala:299)
> {code}






[jira] [Resolved] (SPARK-37659) Fix FsHistoryProvider race condition between listing and deleting log info

2021-12-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37659.

   Fix Version/s: 3.3.0
Target Version/s: 3.3.0
Assignee: XiDuo You
  Resolution: Fixed

> Fix FsHistoryProvider race condition between listing and deleting log info
> 
>
> Key: SPARK-37659
> URL: https://issues.apache.org/jira/browse/SPARK-37659
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.2, 3.2.1, 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.3.0
>
>
> After SPARK-29043, FsHistoryProvider lists the log info without waiting for 
> all `mergeApplicationListing` tasks to finish.
> However, the `LevelDBIterator` over the listed log info is not thread-safe if some 
> other threads delete the related log info at the same time.
> This is the error message:
> {code:java}
> 21/12/15 14:12:02 ERROR FsHistoryProvider: Exception in checking for event 
> log updates
> java.util.NoSuchElementException: 
> 1^@__main__^@+hdfs://xxx/application_xxx.inprogress
> at org.apache.spark.util.kvstore.LevelDB.get(LevelDB.java:132)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.next(LevelDBIterator.java:137)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
> at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
> at 
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
> at 
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
> at scala.collection.TraversableLike.to(TraversableLike.scala:678)
> at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
> at scala.collection.AbstractTraversable.to(Traversable.scala:108)
> at scala.collection.TraversableOnce.toList(TraversableOnce.scala:299)
> at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:299)
> at scala.collection.AbstractTraversable.toList(Traversable.scala:108)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs(FsHistoryProvider.scala:588)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$startPolling$3(FsHistoryProvider.scala:299)
> {code}






[jira] [Updated] (SPARK-37714) ANSI mode: allow casting between numeric type and timestamp type

2021-12-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37714:
---
Summary: ANSI mode: allow casting between numeric type and timestamp type   
(was: ANSI mode: allow casting between numeric type and timestamp type by 
default)

> ANSI mode: allow casting between numeric type and timestamp type 
> -
>
> Key: SPARK-37714
> URL: https://issues.apache.org/jira/browse/SPARK-37714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> h3. What changes were proposed?
>  * By default, allow casting between numeric type and timestamp type under 
> ANSI mode
>  * Remove the user-facing configuration 
> {{spark.sql.ansi.allowCastBetweenDatetimeAndNumeric}}
> h3. Why are the changes needed?
> Same reason as mentioned in 
> [#34459|https://github.com/apache/spark/pull/34459]. It is for better 
> adoption of ANSI SQL mode since users are relying on it:
>  * From some data analysis, we found that many Spark SQL users are 
> actually using {{Cast(Timestamp as Numeric)}} and {{Cast(Numeric as 
> Timestamp)}}.
>  * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)}}






[jira] [Updated] (SPARK-37714) ANSI mode: allow casting between numeric type and timestamp type by default

2021-12-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37714:
---
Description: 
h3. What changes were proposed?
 * By default, allow casting between numeric type and timestamp type under ANSI 
mode
 * Remove the user-facing configuration 
{{spark.sql.ansi.allowCastBetweenDatetimeAndNumeric}}

h3. Why are the changes needed?

Same reason as mentioned in 
[#34459|https://github.com/apache/spark/pull/34459]. It is for better adoption 
of ANSI SQL mode since users are relying on it:
 * From some data analysis, we found that many Spark SQL users are actually 
using {{Cast(Timestamp as Numeric)}} and {{Cast(Numeric as Timestamp)}}.
 * The Spark SQL connector for Tableau is using this feature for DateTime math. 
e.g.
{{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)}}

  was:
By default, allow casting between numeric type and datetime type under ANSI 
mode 

This is for better adoption of ANSI SQL mode:
 * As we did some data science, we found that many Spark SQL users are actually 
using {{Cast(Timestamp as Numeric)}} and {{{}Cast(Numeric as Timestamp){}}}. 
There are also some usages of {{{}Cast(Date as Numeric){}}}.
 * The Spark SQL connector for Tableau is using this feature for DateTime math. 
e.g.
{{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)}}


> ANSI mode: allow casting between numeric type and timestamp type by default
> ---
>
> Key: SPARK-37714
> URL: https://issues.apache.org/jira/browse/SPARK-37714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> h3. What changes were proposed?
>  * By default, allow casting between numeric type and timestamp type under 
> ANSI mode
>  * Remove the user-facing configuration 
> {{spark.sql.ansi.allowCastBetweenDatetimeAndNumeric}}
> h3. Why are the changes needed?
> Same reason as mentioned in 
> [#34459|https://github.com/apache/spark/pull/34459]. It is for better 
> adoption of ANSI SQL mode since users are relying on it:
>  * As we did some data science, we found that many Spark SQL users are 
> actually using {{Cast(Timestamp as Numeric)}} and {{{}Cast(Numeric as 
> Timestamp){}}}.
>  * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)}}






[jira] [Updated] (SPARK-37714) ANSI mode: allow casting between numeric type and timestamp type by default

2021-12-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37714:
---
Summary: ANSI mode: allow casting between numeric type and timestamp type 
by default  (was: ANSI mode: allow casting between numeric type and datetime 
type by default)

> ANSI mode: allow casting between numeric type and timestamp type by default
> ---
>
> Key: SPARK-37714
> URL: https://issues.apache.org/jira/browse/SPARK-37714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> By default, allow casting between numeric type and datetime type under ANSI 
> mode 
> This is for better adoption of ANSI SQL mode:
>  * From some data analysis, we found that many Spark SQL users are 
> actually using {{Cast(Timestamp as Numeric)}} and {{Cast(Numeric as 
> Timestamp)}}. There are also some usages of {{Cast(Date as Numeric)}}.
>  * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)}}






[jira] [Created] (SPARK-37714) ANSI mode: allow casting between numeric type and datetime type by default

2021-12-22 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37714:
--

 Summary: ANSI mode: allow casting between numeric type and 
datetime type by default
 Key: SPARK-37714
 URL: https://issues.apache.org/jira/browse/SPARK-37714
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


By default, allow casting between numeric type and datetime type under ANSI 
mode 

This is for better adoption of ANSI SQL mode:
 * From some data analysis, we found that many Spark SQL users are actually 
using {{Cast(Timestamp as Numeric)}} and {{Cast(Numeric as Timestamp)}}. 
There are also some usages of {{Cast(Date as Numeric)}}.
 * The Spark SQL connector for Tableau is using this feature for DateTime math. 
e.g.
{{CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)}}






[jira] [Commented] (SPARK-33354) New explicit cast syntax rules in ANSI mode

2021-12-22 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463786#comment-17463786
 ] 

Gengliang Wang commented on SPARK-33354:


[~kwafor] there will be a runtime error if the String can't be parsed as a number 
under ANSI mode.
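
For example (a sketch assuming a spark-shell session):

{code:java}
// Sketch only, assuming a spark-shell session.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('123' AS INT)").show()     // 123: the string parses as a number
// spark.sql("SELECT CAST('abc' AS INT)").show()  // would fail at runtime under ANSI mode
{code}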

> New explicit cast syntax rules in ANSI mode
> ---
>
> Key: SPARK-33354
> URL: https://issues.apache.org/jira/browse/SPARK-33354
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> In section 6.13 of the ANSI SQL standard,  there are syntax rules for valid 
> combinations of the source and target data types.
> To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the 
> following castings in ANSI mode:
> {code:java}
> TimeStamp <=> Boolean
> Date <=> Boolean
> Numeric <=> Timestamp
> Numeric <=> Date
> Numeric <=> Binary
> String <=> Array
> String <=> Map
> String <=> Struct
> {code}
> The following castings are considered invalid in the ANSI SQL standard, but they 
> are quite straightforward. Let's allow them for now:
> {code:java}
> Numeric <=> Boolean
> String <=> Boolean
> String <=> Binary
> {code}






[jira] [Resolved] (SPARK-37707) Allow store assignment between TimestampNTZ and Date/Timestamp

2021-12-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37707.

   Fix Version/s: 3.3.0
Target Version/s: 3.3.0
  Resolution: Fixed

> Allow store assignment between TimestampNTZ  and Date/Timestamp
> ---
>
> Key: SPARK-37707
> URL: https://issues.apache.org/jira/browse/SPARK-37707
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Allow store assignment between:
>  * TimestampNTZ <=> Date
>  * TimestampNTZ <=> Timestamp






[jira] [Created] (SPARK-37707) Allow store assignment between TimestampNTZ and Date/Timestamp

2021-12-21 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37707:
--

 Summary: Allow store assignment between TimestampNTZ  and 
Date/Timestamp
 Key: SPARK-37707
 URL: https://issues.apache.org/jira/browse/SPARK-37707
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Allow store assignment between:
 * TimestampNTZ <=> Date
 * TimestampNTZ <=> Timestamp






[jira] [Resolved] (SPARK-37373) Collect LocalSparkContext worker logs in case of test failure

2021-12-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37373.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34651
[https://github.com/apache/spark/pull/34651]

> Collect LocalSparkContext worker logs in case of test failure
> -
>
> Key: SPARK-37373
> URL: https://issues.apache.org/jira/browse/SPARK-37373
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.3.0
>
>
> About 50 test suites use LocalSparkContext by specifying "local-cluster" as the 
> cluster URL. In this case executor logs end up under the worker dir, which is a 
> temp directory and as such is deleted at shutdown (for details see 
> https://github.com/apache/spark/blob/0a4961df29aab6912492e87e4e719865fe20d981/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala#L70).
> So when a test fails and the error was on the executor side, the log is lost.
> This only affects local-cluster tests, not standalone tests, where logs are kept 
> in the "/work" directory.






[jira] [Resolved] (SPARK-37631) Code clean up on promoting strings in math functions

2021-12-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37631.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34884
[https://github.com/apache/spark/pull/34884]

> Code clean up on promoting strings in math functions
> 
>
> Key: SPARK-37631
> URL: https://issues.apache.org/jira/browse/SPARK-37631
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Trivial
> Fix For: 3.3.0
>
>
> There is similar logic for promoting string to double type in 
> TypeCoercion and AnsiTypeCoercion.
> We can change the functions Abs/UnaryMinus/UnaryPositive to extend 
> ImplicitCastInputTypes so that strings are implicitly cast to Double type by 
> the rule `ImplicitTypeCasts`. Then we don't have to repeat the logic in the rules 
> `PromoteStrings` and `PromoteStringLiterals`.






[jira] [Created] (SPARK-37631) Code clean up on promoting strings in math functions

2021-12-13 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37631:
--

 Summary: Code clean up on promoting strings in math functions
 Key: SPARK-37631
 URL: https://issues.apache.org/jira/browse/SPARK-37631
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


There is similar logic for promoting string to double type in TypeCoercion 
and AnsiTypeCoercion.

We can change the functions Abs/UnaryMinus/UnaryPositive to extend 
ImplicitCastInputTypes so that strings are implicitly cast to Double type by 
the rule `ImplicitTypeCasts`. Then we don't have to repeat the logic in the rules 
`PromoteStrings` and `PromoteStringLiterals`.
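
The user-visible behavior stays the same; for example (a sketch assuming a spark-shell 
session):

{code:java}
// Sketch only, assuming a spark-shell session: the string argument is implicitly
// promoted to double before Abs (and likewise UnaryMinus/UnaryPositive) is evaluated,
// regardless of which coercion rule performs the cast.
spark.sql("SELECT abs('-1.5')").show()   // 1.5
{code}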






[jira] [Resolved] (SPARK-37584) New SQL function: map_contains_key

2021-12-09 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37584.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34836
[https://github.com/apache/spark/pull/34836]

> New SQL function: map_contains_key
> --
>
> Key: SPARK-37584
> URL: https://issues.apache.org/jira/browse/SPARK-37584
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Add a new function map_contains_key, which returns true if the map contains 
> the key
> Examples:
> > SELECT map_contains_key(map(1, 'a', 2, 'b'), 1);
> true
> > SELECT map_contains_key(map(1, 'a', 2, 'b'), 3);
> false






[jira] [Updated] (SPARK-37584) New SQL function: map_contains_key

2021-12-08 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37584:
---
Summary: New SQL function: map_contains_key  (was: New function: 
map_contains_key)

> New SQL function: map_contains_key
> --
>
> Key: SPARK-37584
> URL: https://issues.apache.org/jira/browse/SPARK-37584
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Add a new function map_contains_key, which returns true if the map contains 
> the key
> Examples:
> > SELECT map_contains_key(map(1, 'a', 2, 'b'), 1);
> true
> > SELECT map_contains_key(map(1, 'a', 2, 'b'), 3);
> false






[jira] [Created] (SPARK-37584) New function: map_contains_key

2021-12-08 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37584:
--

 Summary: New function: map_contains_key
 Key: SPARK-37584
 URL: https://issues.apache.org/jira/browse/SPARK-37584
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Add a new function map_contains_key, which returns true if the map contains the 
key

Examples:
> SELECT map_contains_key(map(1, 'a', 2, 'b'), 1);
true
> SELECT map_contains_key(map(1, 'a', 2, 'b'), 3);
false






[jira] [Updated] (SPARK-37571) decouple amplab jenkins from spark website, builds and tests

2021-12-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37571:
---
Affects Version/s: 3.3.0
   (was: 3.2.0)

> decouple amplab jenkins from spark website, builds and tests
> 
>
> Key: SPARK-37571
> URL: https://issues.apache.org/jira/browse/SPARK-37571
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Shane Knapp
>Assignee: Shane Knapp
>Priority: Major
> Attachments: audit.txt, spark-repo-to-be-audited.txt
>
>
> We will be turning off Jenkins on Dec 23rd, and we need to decouple the build 
> infra from Jenkins, as well as remove any AMPLab Jenkins-specific docs from the 
> website, scripts, and infra setup.
> I'll be creating more than one PR for this.






[jira] [Resolved] (SPARK-37533) New SQL function: try_element_at

2021-12-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37533.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34796
[https://github.com/apache/spark/pull/34796]

> New SQL function: try_element_at
> 
>
> Key: SPARK-37533
> URL: https://issues.apache.org/jira/browse/SPARK-37533
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Add a new SQL function `try_element_at`, which is identical to `element_at` 
> except that it returns null if an error occurs.
>  






[jira] [Created] (SPARK-37533) New SQL function: try_element_at

2021-12-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37533:
--

 Summary: New SQL function: try_element_at
 Key: SPARK-37533
 URL: https://issues.apache.org/jira/browse/SPARK-37533
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Add a new SQL function `try_element_at`, which is identical to `element_at` 
except that it returns null if an error occurs.
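
A quick spark-shell sketch of the intended behavior (assuming a Spark 3.3 session):

{code:java}
// Sketch only, assuming a Spark 3.3 spark-shell session.
spark.sql("SELECT try_element_at(array(1, 2, 3), 5)").show()       // NULL instead of an error
spark.sql("SELECT try_element_at(map(1, 'a', 2, 'b'), 3)").show()  // NULL: key 3 is absent
spark.sql("SELECT try_element_at(array(1, 2, 3), 2)").show()       // 2, same as element_at
{code}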

 






[jira] [Resolved] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion

2021-11-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37490.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34747
[https://github.com/apache/spark/pull/34747]

> Show hint if analyzer fails due to ANSI type coercion
> -
>
> Key: SPARK-37490
> URL: https://issues.apache.org/jira/browse/SPARK-37490
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Show a hint in the error message if the analysis fails only under ANSI type 
> coercion:
> {code:java}
> To fix the error, you might need to add explicit type casts.
> To bypass the error with lenient type coercion rules, set 
> spark.sql.ansi.enabled as false. {code}






[jira] [Created] (SPARK-37490) Show hint if analyzer fails due to ANSI type coercion

2021-11-29 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37490:
--

 Summary: Show hint if analyzer fails due to ANSI type coercion
 Key: SPARK-37490
 URL: https://issues.apache.org/jira/browse/SPARK-37490
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Show a hint in the error message if the analysis fails only under ANSI type coercion:
{code:java}
To fix the error, you might need to add explicit type casts.
To bypass the error with lenient type coercion rules, set 
spark.sql.ansi.enabled as false. {code}
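
For context, a sketch of a query that resolves under the default type coercion but 
fails under ANSI type coercion and would therefore surface the hint (assuming a 
spark-shell session; it mirrors the `ceil(s)` example from SPARK-37438):

{code:java}
// Sketch only, assuming a spark-shell session.
spark.sql("CREATE TABLE str_t(s STRING) USING PARQUET")
spark.conf.set("spark.sql.ansi.enabled", "true")
// Under ANSI type coercion a string column is not implicitly cast to a numeric type,
// so the analysis fails here, and the error message should now include the hint above.
// spark.sql("SELECT ceil(s) FROM str_t").show()
{code}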






[jira] [Assigned] (SPARK-34735) Add modified configs for SQL execution in UI

2021-11-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-34735:
--

Assignee: XiDuo You

> Add modified configs for SQL execution in UI
> 
>
> Key: SPARK-34735
> URL: https://issues.apache.org/jira/browse/SPARK-34735
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: sql-ui.jpg
>
>
> For SQL users, it's very common to set some configs to optimize SQL. Within a 
> script, it would look like this:
> {code:java}
> set k1=v1;
> set k2=v2;
> set ...
> INSERT INTO TABLE t1
> SELECT ...
> {code}
>  
>  It's hard to find the configs used by a SQL query without the raw SQL string. 
> The current UI provides an `Environment` tab, but it only shows some global 
> initial configs, which is not enough.
> Some use cases:
>  * Jar-based jobs, where we might set configs many times across many SQL executions.
>  * SQL servers (e.g. SparkThriftServer), where we might execute thousands of scripts 
> every day with different sessions.
> We expect a feature that lists the modified configs that could affect the 
> SQL execution.
>  






[jira] [Resolved] (SPARK-34735) Add modified configs for SQL execution in UI

2021-11-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-34735.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 31830
[https://github.com/apache/spark/pull/31830]

> Add modified configs for SQL execution in UI
> 
>
> Key: SPARK-34735
> URL: https://issues.apache.org/jira/browse/SPARK-34735
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: XiDuo You
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: sql-ui.jpg
>
>
> For SQL users, it's very common to set some configs to optimize SQL. Within a 
> script, it would look like this:
> {code:java}
> set k1=v1;
> set k2=v2;
> set ...
> INSERT INTO TABLE t1
> SELECT ...
> {code}
>  
>  It's hard to find the configs used by a SQL query without the raw SQL string. 
> The current UI provides an `Environment` tab, but it only shows some global 
> initial configs, which is not enough.
> Some use cases:
>  * Jar-based jobs, where we might set configs many times across many SQL executions.
>  * SQL servers (e.g. SparkThriftServer), where we might execute thousands of scripts 
> every day with different sessions.
> We expect a feature that lists the modified configs that could affect the 
> SQL execution.
>  






[jira] [Resolved] (SPARK-37438) ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37438.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34681
[https://github.com/apache/spark/pull/34681]

> ANSI mode: Use store assignment rules for resolving function invocation
> ---
>
> Key: SPARK-37438
> URL: https://issues.apache.org/jira/browse/SPARK-37438
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Under ANSI mode(spark.sql.ansi.enabled=true), the function invocation of 
> Spark SQL:
> - In general, it follows the `Store assignment` rules as storing the input 
> values as the declared parameter type of the SQL functions
> - Special rules apply for string literals and untyped NULL. A NULL can be 
> promoted to any other type, while a string literal can be promoted to any 
> simple data type.
> {code:sql}
> > SET spark.sql.ansi.enabled=true;
> -- implicitly cast Int to String type
> > SELECT concat('total number: ', 1);
> total number: 1
> -- implicitly cast Timestamp to Date type
> > select datediff(now(), current_date);
> 0
> -- specialrule: implicitly cast String literal to Double type
> > SELECT ceil('0.1');
> 1
> -- specialrule: implicitly cast NULL to Date type
> > SELECT year(null);
> NULL
> > CREATE TABLE t(s string);
> -- Can't store String column as Numeric types.
> > SELECT ceil(s) from t;
> Error in query: cannot resolve 'CEIL(spark_catalog.default.t.s)' due to data 
> type mismatch
> -- Can't store String column as Date type.
> > select year(s) from t;
> Error in query: cannot resolve 'year(spark_catalog.default.t.s)' due to data 
> type mismatch
> {code}






[jira] [Updated] (SPARK-37438) ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37438:
---
Description: 
Under ANSI mode (spark.sql.ansi.enabled=true), the function invocation of Spark 
SQL follows these rules:
- In general, it follows the `Store assignment` rules as storing the input 
values as the declared parameter type of the SQL functions
- Special rules apply for string literals and untyped NULL. A NULL can be 
promoted to any other type, while a string literal can be promoted to any 
simple data type.


{code:sql}
> SET spark.sql.ansi.enabled=true;
-- implicitly cast Int to String type
> SELECT concat('total number: ', 1);
total number: 1
-- implicitly cast Timestamp to Date type
> select datediff(now(), current_date);
0

-- special rule: implicitly cast String literal to Double type
> SELECT ceil('0.1');
1
-- special rule: implicitly cast NULL to Date type
> SELECT year(null);
NULL

> CREATE TABLE t(s string);
-- Can't store String column as Numeric types.
> SELECT ceil(s) from t;
Error in query: cannot resolve 'CEIL(spark_catalog.default.t.s)' due to data 
type mismatch
-- Can't store String column as Date type.
> select year(s) from t;
Error in query: cannot resolve 'year(spark_catalog.default.t.s)' due to data 
type mismatch
{code}



  was:
Under ANSI mode(spark.sql.ansi.enabled=true), the function invocation of Spark 
SQL:
- In general, it follows the `Store assignment` rules as storing the input 
values as the declared parameter type of the SQL functions
- Special rules apply for string literals and untyped NULL. A NULL can be 
promoted to any other type, while a string literal can be promoted to any 
simple data type.

```sql
> SET spark.sql.ansi.enabled=true;
-- implicitly cast Int to String type
> SELECT concat('total number: ', 1);
total number: 1
-- implicitly cast Timestamp to Date type
> select datediff(now(), current_date);
0

-- specialrule: implicitly cast String literal to Double type
> SELECT ceil('0.1');
1
-- specialrule: implicitly cast NULL to Date type
> SELECT year(null);
NULL

> CREATE TABLE t(s string);
-- Can't store assign String column as Numeric types.
> SELECT ceil(s) from t;
Error in query: cannot resolve 'CEIL(spark_catalog.default.t.s)' due to data 
type mismatch
-- Can't store assign String column as Date type.
> select year(s) from t;
Error in query: cannot resolve 'year(spark_catalog.default.t.s)' due to data 
type mismatch
```


> ANSI mode: Use store assignment rules for resolving function invocation
> ---
>
> Key: SPARK-37438
> URL: https://issues.apache.org/jira/browse/SPARK-37438
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Under ANSI mode (spark.sql.ansi.enabled=true), function invocation in 
> Spark SQL works as follows:
> - In general, it follows the `Store assignment` rules, storing the input 
> values as the declared parameter types of the SQL functions.
> - Special rules apply for string literals and untyped NULL: a NULL can be 
> promoted to any other type, while a string literal can be promoted to any 
> simple data type.
> {code:sql}
> > SET spark.sql.ansi.enabled=true;
> -- implicitly cast Int to String type
> > SELECT concat('total number: ', 1);
> total number: 1
> -- implicitly cast Timestamp to Date type
> > select datediff(now(), current_date);
> 0
> -- special rule: implicitly cast String literal to Double type
> > SELECT ceil('0.1');
> 1
> -- special rule: implicitly cast NULL to Date type
> > SELECT year(null);
> NULL
> > CREATE TABLE t(s string);
> -- Can't store-assign a String column as Numeric types.
> > SELECT ceil(s) from t;
> Error in query: cannot resolve 'CEIL(spark_catalog.default.t.s)' due to data 
> type mismatch
> -- Can't store-assign a String column as Date type.
> > select year(s) from t;
> Error in query: cannot resolve 'year(spark_catalog.default.t.s)' due to data 
> type mismatch
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37438) ANSI mode: Use store assignment rules for resolving function invocation

2021-11-22 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37438:
--

 Summary: ANSI mode: Use store assignment rules for resolving 
function invocation
 Key: SPARK-37438
 URL: https://issues.apache.org/jira/browse/SPARK-37438
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Under ANSI mode(spark.sql.ansi.enabled=true), the function invocation of Spark 
SQL:
- In general, it follows the `Store assignment` rules as storing the input 
values as the declared parameter type of the SQL functions
- Special rules apply for string literals and untyped NULL. A NULL can be 
promoted to any other type, while a string literal can be promoted to any 
simple data type.

```sql
> SET spark.sql.ansi.enabled=true;
-- implicitly cast Int to String type
> SELECT concat('total number: ', 1);
total number: 1
-- implicitly cast Timestamp to Date type
> select datediff(now(), current_date);
0

-- specialrule: implicitly cast String literal to Double type
> SELECT ceil('0.1');
1
-- specialrule: implicitly cast NULL to Date type
> SELECT year(null);
NULL

> CREATE TABLE t(s string);
-- Can't store assign String column as Numeric types.
> SELECT ceil(s) from t;
Error in query: cannot resolve 'CEIL(spark_catalog.default.t.s)' due to data 
type mismatch
-- Can't store assign String column as Date type.
> select year(s) from t;
Error in query: cannot resolve 'year(spark_catalog.default.t.s)' due to data 
type mismatch
```



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36346) Support TimestampNTZ type in Orc file source

2021-11-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36346:
--

Assignee: jiaan.geng

> Support TimestampNTZ type in Orc file source
> 
>
> Key: SPARK-36346
> URL: https://issues.apache.org/jira/browse/SPARK-36346
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> As per https://orc.apache.org/docs/types.html, Orc supports types matching both 
> TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type):
> * Orc TIMESTAMP => Spark TIMESTAMP_LTZ
> * Orc Timestamp with local time zone => Spark TIMESTAMP_NTZ
> In Spark 3.1 or earlier, Spark only considered TIMESTAMP.
> Since 3.2, with the support of the timestamp without time zone type:
> * The Orc writer follows this mapping and uses "Timestamp with local time zone" 
> when writing TIMESTAMP_NTZ.
> * The Orc reader converts "Timestamp with local time zone" back to TIMESTAMP_NTZ.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36346) Support TimestampNTZ type in Orc file source

2021-11-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36346.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33588
[https://github.com/apache/spark/pull/33588]

> Support TimestampNTZ type in Orc file source
> 
>
> Key: SPARK-36346
> URL: https://issues.apache.org/jira/browse/SPARK-36346
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> As per https://orc.apache.org/docs/types.html, Orc supports types matching both 
> TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type):
> * Orc TIMESTAMP => Spark TIMESTAMP_LTZ
> * Orc Timestamp with local time zone => Spark TIMESTAMP_NTZ
> In Spark 3.1 or earlier, Spark only considered TIMESTAMP.
> Since 3.2, with the support of the timestamp without time zone type:
> * The Orc writer follows this mapping and uses "Timestamp with local time zone" 
> when writing TIMESTAMP_NTZ.
> * The Orc reader converts "Timestamp with local time zone" back to TIMESTAMP_NTZ.
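
For illustration, a minimal end-to-end sketch of the mapping above (assuming a 
Spark build with TIMESTAMP_NTZ support; the table name and literal are 
illustrative):

{code:sql}
> CREATE TABLE ts_ntz_orc(ts TIMESTAMP_NTZ) USING ORC;
> INSERT INTO ts_ntz_orc SELECT CAST('2021-07-01 12:00:00' AS TIMESTAMP_NTZ);
-- Written as Orc "Timestamp with local time zone", read back as TIMESTAMP_NTZ,
-- independent of the session time zone.
> SELECT * FROM ts_ntz_orc;
{code}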



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37179) ANSI mode: Add a config to allow casting between Datetime and Numeric

2021-11-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37179:
---
Description: 
Add a config `spark.sql.ansi.allowCastBetweenDatetimeAndNumeric` to allow 
casting between Datetime and Numeric. The default value of the configuration is 
`false`.
Also, casting double/float type to timestamp should raise exceptions if there 
is overflow or the input is NaN/infinite.

This is for better adoption of ANSI SQL mode:
- From analyzing usage data, we found that many Spark SQL users actually 
use `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. There are 
also some usages of `Cast(Date as Numeric)`.
- The Spark SQL connector for Tableau uses this feature for DateTime math, 
e.g.
 `CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)`

So, having a new configuration gives users an alternative when turning on 
ANSI mode.
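
A minimal sketch of the intended behavior with the new flag enabled (exact results 
depend on the session time zone; the literals are illustrative):

{code:sql}
> SET spark.sql.ansi.enabled=true;
> SET spark.sql.ansi.allowCastBetweenDatetimeAndNumeric=true;
-- Timestamp => Numeric: seconds since the epoch
> SELECT CAST(TIMESTAMP'2021-01-01 00:00:00' AS BIGINT);
-- Numeric => Timestamp: interpreted as seconds since the epoch
> SELECT CAST(86400 AS TIMESTAMP);
{code}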

  was:
We should allow the casting between Timestamp and Numeric types:
* As we did some data science, we found that many Spark SQL users are actually 
using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 
* The Spark SQL connector for Tableau is using this feature for DateTime math. 
e.g.
{code:java}
CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)
{code}
* In the current syntax, we specially allow Numeric <=> Boolean and String <=> 
Binary since they are straight forward and frequently used.  I suggest we allow 
Timestamp <=> Numeric as well for better ANSI mode adoption.


> ANSI mode: Add a config to allow casting between Datetime and Numeric
> -
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Add a config `spark.sql.ansi.allowCastBetweenDatetimeAndNumeric` to allow 
> casting between Datetime and Numeric. The default value of the configuration 
> is `false`.
> Also, casting double/float type to timestamp should raise exceptions if there 
> is overflow or the input is NaN/infinite.
> This is for better adoption of ANSI SQL mode:
> - From analyzing usage data, we found that many Spark SQL users 
> actually use `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 
> There are also some usages of `Cast(Date as Numeric)`.
> - The Spark SQL connector for Tableau uses this feature for DateTime 
> math, e.g.
>  `CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)`
> So, having a new configuration gives users an alternative when 
> turning on ANSI mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37179) ANSI mode: Add a config to allow casting between Datetime and Numeric

2021-11-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37179:
---
Summary: ANSI mode: Add a config to allow casting between Datetime and 
Numeric  (was: ANSI mode: Allow casting between Timestamp and Numeric)

> ANSI mode: Add a config to allow casting between Datetime and Numeric
> -
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> We should allow the casting between Timestamp and Numeric types:
> * As we did some data science, we found that many Spark SQL users are 
> actually using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 
> * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {code:java}
> CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)
> {code}
> * In the current syntax, we specially allow Numeric <=> Boolean and String 
> <=> Binary since they are straight forward and frequently used.  I suggest we 
> allow Timestamp <=> Numeric as well for better ANSI mode adoption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37179) ANSI mode: Allow casting between Timestamp and Numeric

2021-11-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37179:
---
Description: 
We should allow the casting between Timestamp and Numeric types:
* As we did some data science, we found that many Spark SQL users are actually 
using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 
* The Spark SQL connector for Tableau is using this feature for DateTime math. 
e.g.
{code:java}
CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
TIMESTAMP)
{code}
* In the current syntax, we specially allow Numeric <=> Boolean and String <=> 
Binary since they are straight forward and frequently used.  I suggest we allow 
Timestamp <=> Numeric as well for better ANSI mode adoption.

  was:
We should allow casting 
As we did some data science, we found that many Spark SQL users are actually 
using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 


> ANSI mode: Allow casting between Timestamp and Numeric
> --
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> We should allow the casting between Timestamp and Numeric types:
> * As we did some data science, we found that many Spark SQL users are 
> actually using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 
> * The Spark SQL connector for Tableau is using this feature for DateTime 
> math. e.g.
> {code:java}
> CAST(FROM_UNIXTIME(CAST(CAST(%1 AS BIGINT) + (%2 * 86400) AS BIGINT)) AS 
> TIMESTAMP)
> {code}
> * In the current syntax, we specially allow Numeric <=> Boolean and String 
> <=> Binary since they are straight forward and frequently used.  I suggest we 
> allow Timestamp <=> Numeric as well for better ANSI mode adoption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37179) ANSI mode: Allow casting between Timestamp and Numeric

2021-11-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-37179:
--

Assignee: Gengliang Wang

> ANSI mode: Allow casting between Timestamp and Numeric
> --
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> We should allow casting 
> As we did some data science, we found that many Spark SQL users are actually 
> using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37179) ANSI mode: Allow casting between Timestamp and Numeric

2021-11-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37179:
---
Description: 
We should allow casting 
As we did some data science, we found that many Spark SQL users are actually 
using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 

  was:The casting between 


> ANSI mode: Allow casting between Timestamp and Numeric
> --
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Priority: Major
>
> We should allow casting 
> As we did some data science, we found that many Spark SQL users are actually 
> using `Cast(Timestamp as Numeric)` and `Cast(Numeric as Timestamp)`. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37179) ANSI mode: Allow casting between Timestamp and Numeric

2021-11-01 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-37179:
---
Description: The casting between 

> ANSI mode: Allow casting between Timestamp and Numeric
> --
>
> Key: SPARK-37179
> URL: https://issues.apache.org/jira/browse/SPARK-37179
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Priority: Major
>
> The casting between 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37179) ANSI mode: Allow casting between Timestamp and Numeric

2021-11-01 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37179:
--

 Summary: ANSI mode: Allow casting between Timestamp and Numeric
 Key: SPARK-37179
 URL: https://issues.apache.org/jira/browse/SPARK-37179
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37163) Disallow casting Date as Numeric types

2021-10-30 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37163.

Resolution: Won't Do

> Disallow casting Date as Numeric types
> --
>
> Key: SPARK-37163
> URL: https://issues.apache.org/jira/browse/SPARK-37163
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Currently, Date type values can be cast as Numeric types. However, the result 
> is always NULL.
> On the other hand, Numeric values can't be cast as Date type.
> It doesn't make sense to keep a cast from Date to Numeric that always returns 
> NULL. I suggest disallowing the conversion. We can have a legacy flag 
> `spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back 
> to the legacy behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37163) Disallow casting Date as Numeric types

2021-10-29 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37163:
--

 Summary: Disallow casting Date as Numeric types
 Key: SPARK-37163
 URL: https://issues.apache.org/jira/browse/SPARK-37163
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Currently, Date type values can be cast as Numeric types. However, the result 
is always NULL.
On the other hand, Numeric values can't be cast as Date type.
It doesn't make sense to keep a cast from Date to Numeric that always returns 
NULL. I suggest disallowing the conversion. We can have a legacy flag 
`spark.sql.legacy.allowCastDateAsNumeric` if users really want to fall back to 
the legacy behavior.
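
A short sketch of the behavior this proposes to disallow (literals are illustrative):

{code:sql}
-- Accepted today, but the result is always NULL:
> SELECT CAST(DATE'2021-10-29' AS INT);
NULL
-- The reverse direction already fails analysis:
> SELECT CAST(1 AS DATE);
{code}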



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37057) Fix wrong DocSearch facet filter in release-tag.sh

2021-10-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-37057.

Fix Version/s: 3.2.1
   3.3.0
   Resolution: Fixed

Issue resolved by pull request 34328
[https://github.com/apache/spark/pull/34328]

> Fix wrong DocSearch facet filter in release-tag.sh
> --
>
> Key: SPARK-37057
> URL: https://issues.apache.org/jira/browse/SPARK-37057
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.3.0, 3.2.1
>
>
> In release-tag.sh, the DocSearch facet filter should be updated to the 
> release version before creating the git tag. 
> If this step is missed, the facet filter is wrong in the docs of the new release:  
> https://github.com/apache/spark/blame/v3.2.0/docs/_config.yml#L42



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37057) Fix wrong DocSearch facet filter in release-tag.sh

2021-10-18 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-37057:
--

 Summary: Fix wrong DocSearch facet filter in release-tag.sh
 Key: SPARK-37057
 URL: https://issues.apache.org/jira/browse/SPARK-37057
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.0, 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In release-tag.sh, the DocSearch facet filter should be updated to the release 
version before creating the git tag. 
If this step is missed, the facet filter is wrong in the docs of the new release:  
https://github.com/apache/spark/blame/v3.2.0/docs/_config.yml#L42



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-10-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36367:
---
Affects Version/s: (was: 3.3.0)
   3.2.0

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.2.0
>
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34990) Add ParquetEncryptionSuite

2021-10-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-34990:
---
Priority: Minor  (was: Major)

> Add ParquetEncryptionSuite
> --
>
> Key: SPARK-34990
> URL: https://issues.apache.org/jira/browse/SPARK-34990
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Assignee: Maya Anderson
>Priority: Minor
> Fix For: 3.2.0
>
>
> Now that Parquet Modular Encryption is available in Spark as of SPARK-34542 , 
> we need a test to demonstrate and verify its usage from Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34990) Add ParquetEncryptionSuite

2021-10-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-34990:
---
Issue Type: Test  (was: New Feature)

> Add ParquetEncryptionSuite
> --
>
> Key: SPARK-34990
> URL: https://issues.apache.org/jira/browse/SPARK-34990
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Assignee: Maya Anderson
>Priority: Major
> Fix For: 3.2.0
>
>
> Now that Parquet Modular Encryption is available in Spark as of SPARK-34542 , 
> we need a test to demonstrate and verify its usage from Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34198) Add RocksDB StateStore implementation

2021-10-09 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-34198.

  Assignee: Apache Spark
Resolution: Fixed

> Add RocksDB StateStore implementation
> -
>
> Key: SPARK-34198
> URL: https://issues.apache.org/jira/browse/SPARK-34198
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> Currently Spark SS only has one built-in StateStore implementation, 
> HDFSBackedStateStore, which uses an in-memory map to store state rows. As 
> there are more and more streaming applications, some of them require 
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state 
> management, so it is proven to be a good choice for large state usage. But 
> Spark SS still lacks a built-in state store for this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore to 
> Spark SS.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2021-10-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35531:
---
Affects Version/s: 3.0.0
   3.1.1

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.1.1, 3.2.0
>Reporter: Hongyi Zhang
>Priority: Major
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35531) Can not insert into hive bucket table if create table with upper case schema

2021-10-07 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425954#comment-17425954
 ] 

Gengliang Wang commented on SPARK-35531:


I can reproduce the issue on 3.0.0 and 3.1.1. 
It's a long-standing bug.
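
Until this is fixed, a possible workaround (a sketch only; identifiers are 
illustrative) is to declare the bucket and sort columns in lower case, so the 
bucket spec matches Hive's lower-cased field schema:

{code:sql}
create table test1(
 v1 BIGINT,
 s1 INT)
 partitioned by (pk BIGINT)
 clustered by (v1)
 sorted by (s1)
 into 200 buckets
 STORED AS PARQUET;
{code}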

> Can not insert into hive bucket table if create table with upper case schema
> 
>
> Key: SPARK-35531
> URL: https://issues.apache.org/jira/browse/SPARK-35531
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Hongyi Zhang
>Priority: Major
>
>  
>  
> create table TEST1(
>  V1 BIGINT,
>  S1 INT)
>  partitioned by (PK BIGINT)
>  clustered by (V1)
>  sorted by (S1)
>  into 200 buckets
>  STORED AS PARQUET;
>  
> insert into test1
>  select
>  * from values(1,1,1);
>  
>  
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns V1 is not 
> part of the table columns ([FieldSchema(name:v1, type:bigint, comment:null), 
> FieldSchema(name:s1, type:int, comment:null)]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36905) Reading Hive view without explicit column names fails in Spark

2021-10-06 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425072#comment-17425072
 ] 

Gengliang Wang commented on SPARK-36905:


[~shardulm] Thanks for reporting the issue. 
I don't think this is a release blocker. I will mention this one as a known 
issue in the release note if it is not resolved by then.
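
In the meantime, a possible workaround (a sketch only; the alias is illustrative) is 
to give every projected expression an explicit alias in the Hive view definition, so 
Spark does not need to resolve Hive's auto-generated `_c0`-style column names:

{code:sql}
CREATE VIEW test_view AS
SELECT 1 AS one
FROM some_table;
{code}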

> Reading Hive view without explicit column names fails in Spark 
> ---
>
> Key: SPARK-36905
> URL: https://issues.apache.org/jira/browse/SPARK-36905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Priority: Major
>
> Consider a Hive view in which some columns are not explicitly named
> {code:sql}
> CREATE VIEW test_view AS
> SELECT 1
> FROM some_table
> {code}
> Reading this view in Spark leads to an {{AnalysisException}}
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`_c0`' given input 
> columns: [1]
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:188)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:340)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:406)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:404)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:357)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:337)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:242)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:185)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:182)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
> 

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-10-06 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424825#comment-17424825
 ] 

Gengliang Wang commented on SPARK-36892:


[~mridulm80] [~mshen] [~zhouyejoe] [~apatnam] Again, thanks for testing Spark 
3.2 with real workloads. Now that all the blockers are resolved, I will have 
the new RC soon.

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Assignee: Ye Zhou
>Priority: Blocker
> Fix For: 3.2.0
>
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle 
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when 
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can 
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-10-06 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36892.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34156
[https://github.com/apache/spark/pull/34156]

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Assignee: Ye Zhou
>Priority: Blocker
> Fix For: 3.2.0
>
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle 
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when 
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can 
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-10-06 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36892:
--

Assignee: Ye Zhou

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Assignee: Ye Zhou
>Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle 
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when 
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can 
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36926) Discrepancy in Q22 of TPCH for Spark 3.2

2021-10-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36926:
--

Assignee: Wenchen Fan  (was: Gengliang Wang)

> Discrepancy in Q22 of TPCH for Spark 3.2
> 
>
> Key: SPARK-36926
> URL: https://issues.apache.org/jira/browse/SPARK-36926
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Aravind Patnam
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 3.2.0
>
>
> When running TPCH scale 100 against 3.2, Query 22 has a discrepancy in the 
> number of rows returned by the query. This was tested with both AQE on and 
> off. All the other queries were matching in results. Below are the results 
> that we got when testing Q22 on 3.2: 
>  
> {code:java}
>   "results": [
> {
>   "name": "Q22",
>   "mode": "collect",
>   "parameters": {},
>   "joinTypes": [
> "SortMergeJoin"
>   ],
>   "tables": [
> "customer"
>   ],
>   "parsingTime": 0.016522,
>   "analysisTime": 0.004132,
>   "optimizationTime": 39.173868,
>   "planningTime": 23.10939,
>   "executionTime": 13762.183844,
>   "result": 0,
>   "breakDown": [],
>   "queryExecution": "== Parsed Logical Plan ==\n'Sort ['cntrycode ASC 
> NULLS FIRST], true\n+- 'Aggregate ['cntrycode], ['cntrycode, 'count(1) AS 
> numcust#150, 'sum('c_acctbal) AS totacctbal#151]\n   +- 'SubqueryAlias 
> custsale\n  +- 'Project ['substring('c_phone, 1, 2) AS cntrycode#147, 
> 'c_acctbal]\n +- 'Filter (('substring('c_phone, 1, 2) IN 
> (13,31,23,29,30,18,17) AND ('c_acctbal > scalar-subquery#148 [])) AND NOT 
> exists#149 [])\n:  :- 'Project [unresolvedalias('avg('c_acctbal), 
> None)]\n:  :  +- 'Filter (('c_acctbal > 0.00) AND 
> 'substring('c_phone, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- 'UnresolvedRelation [customer], [], false\n:  +- 'Project 
> [*]\n: +- 'Filter ('o_custkey = 'c_custkey)\n:
> +- 'UnresolvedRelation [orders], [], false\n+- 
> 'UnresolvedRelation [customer], [], false\n\n== Analyzed Logical Plan 
> ==\ncntrycode: string, numcust: bigint, totacctbal: decimal(22,2)\nSort 
> [cntrycode#147 ASC NULLS FIRST], true\n+- Aggregate [cntrycode#147], 
> [cntrycode#147, count(1) AS numcust#150L, sum(c_acctbal#11) AS 
> totacctbal#151]\n   +- SubqueryAlias custsale\n  +- Project 
> [substring(c_phone#10, 1, 2) AS cntrycode#147, c_acctbal#11]\n +- 
> Filter ((substring(c_phone#10, 1, 2) IN (13,31,23,29,30,18,17) AND 
> (cast(c_acctbal#11 as decimal(16,6)) > cast(scalar-subquery#148 [] as 
> decimal(16,6 AND NOT exists#149 [c_custkey#6L])\n:  :- 
> Aggregate [avg(c_acctbal#160) AS avg(c_acctbal)#154]\n:  :  +- 
> Filter ((cast(c_acctbal#160 as decimal(12,2)) > cast(0.00 as decimal(12,2))) 
> AND substring(c_phone#159, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n:  
> :+- Relation 
> tpch_data_orc_100.customer[c_custkey#155L,c_name#156,c_address#157,c_nationkey#158L,c_phone#159,c_acctbal#160,c_comment#161,c_mktsegment#162]
>  orc\n:  +- Project [o_orderkey#16L, o_custkey#17L, 
> o_orderstatus#18, o_totalprice#19, o_orderpriority#20, o_clerk#21, 
> o_shippriority#22, o_comment#23, o_orderdate#24]\n: +- Filter 
> (o_custkey#17L = outer(c_custkey#6L))\n:+- SubqueryAlias 
> spark_catalog.tpch_data_orc_100.orders\n:   +- Relation 
> tpch_data_orc_100.orders[o_orderkey#16L,o_custkey#17L,o_orderstatus#18,o_totalprice#19,o_orderpriority#20,o_clerk#21,o_shippriority#22,o_comment#23,o_orderdate#24]
>  orc\n+- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n 
>   +- Relation 
> tpch_data_orc_100.customer[c_custkey#6L,c_name#7,c_address#8,c_nationkey#9L,c_phone#10,c_acctbal#11,c_comment#12,c_mktsegment#13]
>  orc\n\n== Optimized Logical Plan ==\nSort [cntrycode#147 ASC NULLS FIRST], 
> true\n+- Aggregate [cntrycode#147], [cntrycode#147, count(1) AS numcust#150L, 
> sum(c_acctbal#11) AS totacctbal#151]\n   +- Project [substring(c_phone#10, 1, 
> 2) AS cntrycode#147, c_acctbal#11]\n  +- Join LeftAnti, (o_custkey#17L = 
> c_custkey#6L)\n :- Project [c_custkey#6L, c_phone#10, c_acctbal#11]\n 
> :  +- Filter ((isnotnull(c_acctbal#11) AND substring(c_phone#10, 1, 
> 2) IN (13,31,23,29,30,18,17)) AND (cast(c_acctbal#11 as decimal(16,6)) > 
> scalar-subquery#148 []))\n : :  +- Aggregate [avg(c_acctbal#160) 
> AS avg(c_acctbal)#154]\n : : +- Project [c_acctbal#160]\n 
> : :+- Filter

[jira] [Assigned] (SPARK-36926) Discrepancy in Q22 of TPCH for Spark 3.2

2021-10-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36926:
--

Assignee: Gengliang Wang

> Discrepancy in Q22 of TPCH for Spark 3.2
> 
>
> Key: SPARK-36926
> URL: https://issues.apache.org/jira/browse/SPARK-36926
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Aravind Patnam
>Assignee: Gengliang Wang
>Priority: Blocker
> Fix For: 3.2.0
>
>
> When running TPCH scale 100 against 3.2, Query 22 has a discrepancy in the 
> number of rows returned by the query. This was tested with both AQE on and 
> off. All the other queries were matching in results. Below are the results 
> that we got when testing Q22 on 3.2: 
>  
> {code:java}
>   "results": [
> {
>   "name": "Q22",
>   "mode": "collect",
>   "parameters": {},
>   "joinTypes": [
> "SortMergeJoin"
>   ],
>   "tables": [
> "customer"
>   ],
>   "parsingTime": 0.016522,
>   "analysisTime": 0.004132,
>   "optimizationTime": 39.173868,
>   "planningTime": 23.10939,
>   "executionTime": 13762.183844,
>   "result": 0,
>   "breakDown": [],
>   "queryExecution": "== Parsed Logical Plan ==\n'Sort ['cntrycode ASC 
> NULLS FIRST], true\n+- 'Aggregate ['cntrycode], ['cntrycode, 'count(1) AS 
> numcust#150, 'sum('c_acctbal) AS totacctbal#151]\n   +- 'SubqueryAlias 
> custsale\n  +- 'Project ['substring('c_phone, 1, 2) AS cntrycode#147, 
> 'c_acctbal]\n +- 'Filter (('substring('c_phone, 1, 2) IN 
> (13,31,23,29,30,18,17) AND ('c_acctbal > scalar-subquery#148 [])) AND NOT 
> exists#149 [])\n:  :- 'Project [unresolvedalias('avg('c_acctbal), 
> None)]\n:  :  +- 'Filter (('c_acctbal > 0.00) AND 
> 'substring('c_phone, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- 'UnresolvedRelation [customer], [], false\n:  +- 'Project 
> [*]\n: +- 'Filter ('o_custkey = 'c_custkey)\n:
> +- 'UnresolvedRelation [orders], [], false\n+- 
> 'UnresolvedRelation [customer], [], false\n\n== Analyzed Logical Plan 
> ==\ncntrycode: string, numcust: bigint, totacctbal: decimal(22,2)\nSort 
> [cntrycode#147 ASC NULLS FIRST], true\n+- Aggregate [cntrycode#147], 
> [cntrycode#147, count(1) AS numcust#150L, sum(c_acctbal#11) AS 
> totacctbal#151]\n   +- SubqueryAlias custsale\n  +- Project 
> [substring(c_phone#10, 1, 2) AS cntrycode#147, c_acctbal#11]\n +- 
> Filter ((substring(c_phone#10, 1, 2) IN (13,31,23,29,30,18,17) AND 
> (cast(c_acctbal#11 as decimal(16,6)) > cast(scalar-subquery#148 [] as 
> decimal(16,6 AND NOT exists#149 [c_custkey#6L])\n:  :- 
> Aggregate [avg(c_acctbal#160) AS avg(c_acctbal)#154]\n:  :  +- 
> Filter ((cast(c_acctbal#160 as decimal(12,2)) > cast(0.00 as decimal(12,2))) 
> AND substring(c_phone#159, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n:  
> :+- Relation 
> tpch_data_orc_100.customer[c_custkey#155L,c_name#156,c_address#157,c_nationkey#158L,c_phone#159,c_acctbal#160,c_comment#161,c_mktsegment#162]
>  orc\n:  +- Project [o_orderkey#16L, o_custkey#17L, 
> o_orderstatus#18, o_totalprice#19, o_orderpriority#20, o_clerk#21, 
> o_shippriority#22, o_comment#23, o_orderdate#24]\n: +- Filter 
> (o_custkey#17L = outer(c_custkey#6L))\n:+- SubqueryAlias 
> spark_catalog.tpch_data_orc_100.orders\n:   +- Relation 
> tpch_data_orc_100.orders[o_orderkey#16L,o_custkey#17L,o_orderstatus#18,o_totalprice#19,o_orderpriority#20,o_clerk#21,o_shippriority#22,o_comment#23,o_orderdate#24]
>  orc\n+- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n 
>   +- Relation 
> tpch_data_orc_100.customer[c_custkey#6L,c_name#7,c_address#8,c_nationkey#9L,c_phone#10,c_acctbal#11,c_comment#12,c_mktsegment#13]
>  orc\n\n== Optimized Logical Plan ==\nSort [cntrycode#147 ASC NULLS FIRST], 
> true\n+- Aggregate [cntrycode#147], [cntrycode#147, count(1) AS numcust#150L, 
> sum(c_acctbal#11) AS totacctbal#151]\n   +- Project [substring(c_phone#10, 1, 
> 2) AS cntrycode#147, c_acctbal#11]\n  +- Join LeftAnti, (o_custkey#17L = 
> c_custkey#6L)\n :- Project [c_custkey#6L, c_phone#10, c_acctbal#11]\n 
> :  +- Filter ((isnotnull(c_acctbal#11) AND substring(c_phone#10, 1, 
> 2) IN (13,31,23,29,30,18,17)) AND (cast(c_acctbal#11 as decimal(16,6)) > 
> scalar-subquery#148 []))\n : :  +- Aggregate [avg(c_acctbal#160) 
> AS avg(c_acctbal)#154]\n : : +- Project [c_acctbal#160]\n 
> : :+- Filter (isnotnull(c_acc

[jira] [Resolved] (SPARK-36926) Discrepancy in Q22 of TPCH for Spark 3.2

2021-10-05 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36926.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34193
[https://github.com/apache/spark/pull/34193]

> Discrepancy in Q22 of TPCH for Spark 3.2
> 
>
> Key: SPARK-36926
> URL: https://issues.apache.org/jira/browse/SPARK-36926
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Aravind Patnam
>Priority: Blocker
> Fix For: 3.2.0
>
>
> When running TPCH scale 100 against 3.2, Query 22 has a discrepancy in the 
> number of rows returned by the query. This was tested with both AQE on and 
> off. All the other queries were matching in results. Below are the results 
> that we got when testing Q22 on 3.2: 
>  
> {code:java}
>   "results": [
> {
>   "name": "Q22",
>   "mode": "collect",
>   "parameters": {},
>   "joinTypes": [
> "SortMergeJoin"
>   ],
>   "tables": [
> "customer"
>   ],
>   "parsingTime": 0.016522,
>   "analysisTime": 0.004132,
>   "optimizationTime": 39.173868,
>   "planningTime": 23.10939,
>   "executionTime": 13762.183844,
>   "result": 0,
>   "breakDown": [],
>   "queryExecution": "== Parsed Logical Plan ==\n'Sort ['cntrycode ASC 
> NULLS FIRST], true\n+- 'Aggregate ['cntrycode], ['cntrycode, 'count(1) AS 
> numcust#150, 'sum('c_acctbal) AS totacctbal#151]\n   +- 'SubqueryAlias 
> custsale\n  +- 'Project ['substring('c_phone, 1, 2) AS cntrycode#147, 
> 'c_acctbal]\n +- 'Filter (('substring('c_phone, 1, 2) IN 
> (13,31,23,29,30,18,17) AND ('c_acctbal > scalar-subquery#148 [])) AND NOT 
> exists#149 [])\n:  :- 'Project [unresolvedalias('avg('c_acctbal), 
> None)]\n:  :  +- 'Filter (('c_acctbal > 0.00) AND 
> 'substring('c_phone, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- 'UnresolvedRelation [customer], [], false\n:  +- 'Project 
> [*]\n: +- 'Filter ('o_custkey = 'c_custkey)\n:
> +- 'UnresolvedRelation [orders], [], false\n+- 
> 'UnresolvedRelation [customer], [], false\n\n== Analyzed Logical Plan 
> ==\ncntrycode: string, numcust: bigint, totacctbal: decimal(22,2)\nSort 
> [cntrycode#147 ASC NULLS FIRST], true\n+- Aggregate [cntrycode#147], 
> [cntrycode#147, count(1) AS numcust#150L, sum(c_acctbal#11) AS 
> totacctbal#151]\n   +- SubqueryAlias custsale\n  +- Project 
> [substring(c_phone#10, 1, 2) AS cntrycode#147, c_acctbal#11]\n +- 
> Filter ((substring(c_phone#10, 1, 2) IN (13,31,23,29,30,18,17) AND 
> (cast(c_acctbal#11 as decimal(16,6)) > cast(scalar-subquery#148 [] as 
> decimal(16,6 AND NOT exists#149 [c_custkey#6L])\n:  :- 
> Aggregate [avg(c_acctbal#160) AS avg(c_acctbal)#154]\n:  :  +- 
> Filter ((cast(c_acctbal#160 as decimal(12,2)) > cast(0.00 as decimal(12,2))) 
> AND substring(c_phone#159, 1, 2) IN (13,31,23,29,30,18,17))\n:  : 
> +- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n:  
> :+- Relation 
> tpch_data_orc_100.customer[c_custkey#155L,c_name#156,c_address#157,c_nationkey#158L,c_phone#159,c_acctbal#160,c_comment#161,c_mktsegment#162]
>  orc\n:  +- Project [o_orderkey#16L, o_custkey#17L, 
> o_orderstatus#18, o_totalprice#19, o_orderpriority#20, o_clerk#21, 
> o_shippriority#22, o_comment#23, o_orderdate#24]\n: +- Filter 
> (o_custkey#17L = outer(c_custkey#6L))\n:+- SubqueryAlias 
> spark_catalog.tpch_data_orc_100.orders\n:   +- Relation 
> tpch_data_orc_100.orders[o_orderkey#16L,o_custkey#17L,o_orderstatus#18,o_totalprice#19,o_orderpriority#20,o_clerk#21,o_shippriority#22,o_comment#23,o_orderdate#24]
>  orc\n+- SubqueryAlias spark_catalog.tpch_data_orc_100.customer\n 
>   +- Relation 
> tpch_data_orc_100.customer[c_custkey#6L,c_name#7,c_address#8,c_nationkey#9L,c_phone#10,c_acctbal#11,c_comment#12,c_mktsegment#13]
>  orc\n\n== Optimized Logical Plan ==\nSort [cntrycode#147 ASC NULLS FIRST], 
> true\n+- Aggregate [cntrycode#147], [cntrycode#147, count(1) AS numcust#150L, 
> sum(c_acctbal#11) AS totacctbal#151]\n   +- Project [substring(c_phone#10, 1, 
> 2) AS cntrycode#147, c_acctbal#11]\n  +- Join LeftAnti, (o_custkey#17L = 
> c_custkey#6L)\n :- Project [c_custkey#6L, c_phone#10, c_acctbal#11]\n 
> :  +- Filter ((isnotnull(c_acctbal#11) AND substring(c_phone#10, 1, 
> 2) IN (13,31,23,29,30,18,17)) AND (cast(c_acctbal#11 as decimal(16,6)) > 
> scalar-subquery#148 []))\n : :  +- Aggregate [avg(c_acctbal#160) 
> AS avg(c_acctbal)#154]\n : : +- Project [c_acctba

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-30 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422964#comment-17422964
 ] 

Gengliang Wang commented on SPARK-36892:


[~mshen] Thanks for the tests. I understand it is not easy work. 
I talked to [~mridulm80] on Slack as well. I will hold the next RC until your 
tests are complete. 


> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing shuffle 
> fetch to hang and/or duplicate data to be fetched, causing correctness issues.
> Given batch fetch does not benefit spark stages reading merged blocks when 
> push based shuffle is enabled, ShuffleBlockFetcherIterator.doBatchFetch can 
> be disabled when push based shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36904) The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH

2021-09-30 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422768#comment-17422768
 ] 

Gengliang Wang commented on SPARK-36904:


I tried building the code with
{code}
./build/mvn -Phive,hive-thriftserver -DskipTests clean package
{code}
and creating/reading a table works.

[~jlaskowski] Please provide more details for reproducing, for example, how was 
the table "covid_19" created?

> The specified datastore driver ("org.postgresql.Driver") was not found in the 
> CLASSPATH
> ---
>
> Key: SPARK-36904
> URL: https://issues.apache.org/jira/browse/SPARK-36904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: Spark 3.2.0 (RC6)
> {code:java}
> $ ./bin/spark-shell --version 
>   
>
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.2.0
>   /_/
> Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12
> Branch heads/v3.2.0-rc6
> Compiled by user jacek on 2021-09-30T10:44:35Z
> Revision dde73e2e1c7e55c8e740cb159872e081ddfa7ed6
> Url https://github.com/apache/spark.git
> Type --help for more information.
> {code}
> Built from [https://github.com/apache/spark/commits/v3.2.0-rc6] using the 
> following command:
> {code:java}
> $ ./build/mvn \
> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
> -DskipTests \
> clean install
> {code}
> {code:java}
> $ java -version
> openjdk version "11.0.12" 2021-07-20
> OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
> OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode) 
> {code}
>Reporter: Jacek Laskowski
>Priority: Critical
> Attachments: exception.txt
>
>
> It looks similar to [hivethriftserver built into spark3.0.0. is throwing 
> error "org.postgresql.Driver" was not found in the 
> CLASSPATH|https://stackoverflow.com/q/62534653/1305344], but reporting here 
> for future reference.
> After I built the 3.2.0 (RC6) I ran `spark-shell` to execute `sql("describe 
> table covid_19")`. That gave me the exception (a full version is attached):
> {code}
> Caused by: java.lang.reflect.InvocationTargetException: 
> org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" 
> plugin to create a ConnectionPool gave an error : The specified datastore 
> driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check 
> your CLASSPATH specification, and the name of the driver.
>   at jdk.internal.reflect.GeneratedConstructorAccessor64.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:330)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:203)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:162)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:285)
>   at jdk.internal.reflect.GeneratedConstructorAccessor63.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContextHelper.createStoreManagerForProperties(NucleusContextHelper.java:133)
>   at 
> org.datanucleus.PersistenceNucleusContextImpl.initialise(PersistenceNucleusContextImpl.java:422)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:817)
>   ... 171 more
> Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the 
> "BONECP" plugin to create a ConnectionPool gave an error : The specified 
> datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. 
> Please check your CLASSPATH specification, and the name of the driver.
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
>   at 
> 

[jira] [Commented] (SPARK-36904) The specified datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH

2021-09-30 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422740#comment-17422740
 ] 

Gengliang Wang commented on SPARK-36904:


[~jlaskowski] Have you tried using the binary tarball instead of building Spark 
from the source code? 
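
As a quick way to narrow this down, here is a small diagnostic sketch (not from the 
ticket; the jar path mentioned in the comments is an assumption) that checks from 
spark-shell whether the driver class named in the error is visible at all:

{code:scala}
// Hedged diagnostic sketch: verify whether org.postgresql.Driver (the class named
// in the NucleusException) can be loaded by the current spark-shell session.
try {
  Class.forName("org.postgresql.Driver")
  println("org.postgresql.Driver is on the classpath")
} catch {
  case _: ClassNotFoundException =>
    // If this branch is hit, the JDBC driver jar is missing. One common remedy is to
    // restart spark-shell with e.g. --jars /path/to/postgresql-42.x.jar (the path and
    // version here are only assumptions, adjust to your environment).
    println("org.postgresql.Driver is NOT on the classpath")
}
{code}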

> The specified datastore driver ("org.postgresql.Driver") was not found in the 
> CLASSPATH
> ---
>
> Key: SPARK-36904
> URL: https://issues.apache.org/jira/browse/SPARK-36904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: Spark 3.2.0 (RC6)
> {code:java}
> $ ./bin/spark-shell --version 
>   
>
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.2.0
>   /_/
> Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 11.0.12
> Branch heads/v3.2.0-rc6
> Compiled by user jacek on 2021-09-30T10:44:35Z
> Revision dde73e2e1c7e55c8e740cb159872e081ddfa7ed6
> Url https://github.com/apache/spark.git
> Type --help for more information.
> {code}
> Built from [https://github.com/apache/spark/commits/v3.2.0-rc6] using the 
> following command:
> {code:java}
> $ ./build/mvn \
> -Pyarn,kubernetes,hadoop-cloud,hive,hive-thriftserver \
> -DskipTests \
> clean install
> {code}
> {code:java}
> $ java -version
> openjdk version "11.0.12" 2021-07-20
> OpenJDK Runtime Environment Temurin-11.0.12+7 (build 11.0.12+7)
> OpenJDK 64-Bit Server VM Temurin-11.0.12+7 (build 11.0.12+7, mixed mode) 
> {code}
>Reporter: Jacek Laskowski
>Priority: Critical
> Attachments: exception.txt
>
>
> It looks similar to [hivethriftserver built into spark3.0.0. is throwing 
> error "org.postgresql.Driver" was not found in the 
> CLASSPATH|https://stackoverflow.com/q/62534653/1305344], but reporting here 
> for future reference.
> After I built the 3.2.0 (RC6) I ran `spark-shell` to execute `sql("describe 
> table covid_19")`. That gave me the exception (a full version is attached):
> {code}
> Caused by: java.lang.reflect.InvocationTargetException: 
> org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" 
> plugin to create a ConnectionPool gave an error : The specified datastore 
> driver ("org.postgresql.Driver") was not found in the CLASSPATH. Please check 
> your CLASSPATH specification, and the name of the driver.
>   at jdk.internal.reflect.GeneratedConstructorAccessor64.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:330)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:203)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:162)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:285)
>   at jdk.internal.reflect.GeneratedConstructorAccessor63.newInstance(Unknown 
> Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:606)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContextHelper.createStoreManagerForProperties(NucleusContextHelper.java:133)
>   at 
> org.datanucleus.PersistenceNucleusContextImpl.initialise(PersistenceNucleusContextImpl.java:422)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:817)
>   ... 171 more
> Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the 
> "BONECP" plugin to create a ConnectionPool gave an error : The specified 
> datastore driver ("org.postgresql.Driver") was not found in the CLASSPATH. 
> Please check your CLASSPATH specification, and the name of the driver.
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:232)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:117)
>   at 
> org.datanucleus.store.rdbms.Conne

[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-30 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422593#comment-17422593
 ] 

Gengliang Wang commented on SPARK-36892:


[~mridulm80] [~zhouyejoe] [~mshen] The push-based shuffle feature has failed the 
3.2.0 RC multiple times. Could you run some real workloads (e.g. TPC-DS) against 
the latest branch-3.2 after the fix is merged? 

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing the shuffle 
> fetch to hang and/or duplicate data to be fetched, leading to correctness issues.
> Given that batch fetch does not benefit Spark stages reading merged blocks, 
> ShuffleBlockFetcherIterator.doBatchFetch can be disabled when push-based 
> shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]
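
For anyone reproducing or validating this by hand, below is a minimal sketch of the 
configuration combination the description refers to. The explicit 
fetchShuffleBlocksInBatch override is my own assumption of a manual workaround 
before the fix, not an official recommendation:

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: spark.shuffle.push.enabled turns on push-based shuffle (it also needs
// the external shuffle service on YARN), and spark.sql.adaptive.fetchShuffleBlocksInBatch
// is the AQE batch-fetch flag the description says interacts badly with it.
val spark = SparkSession.builder()
  .appName("push-shuffle-batch-fetch-check")
  .config("spark.shuffle.push.enabled", "true")
  // Manually disabling batch fetch mirrors what the fix is expected to do implicitly.
  .config("spark.sql.adaptive.fetchShuffleBlocksInBatch", "false")
  .getOrCreate()
{code}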



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36892) Disable batch fetch for a shuffle when push based shuffle is enabled

2021-09-29 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422533#comment-17422533
 ] 

Gengliang Wang commented on SPARK-36892:


[~zhouyejoe] Thank you!

> Disable batch fetch for a shuffle when push based shuffle is enabled
> 
>
> Key: SPARK-36892
> URL: https://issues.apache.org/jira/browse/SPARK-36892
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Priority: Blocker
>
> When push based shuffle is enabled, efficient fetch of merged mapper shuffle 
> output happens.
> Unfortunately, this currently interacts badly with 
> spark.sql.adaptive.fetchShuffleBlocksInBatch, potentially causing the shuffle 
> fetch to hang and/or duplicate data to be fetched, leading to correctness issues.
> Given that batch fetch does not benefit Spark stages reading merged blocks, 
> ShuffleBlockFetcherIterator.doBatchFetch can be disabled when push-based 
> shuffle is enabled.
> Thx to [~Ngone51] for surfacing this issue.
> +CC [~Gengliang.Wang]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36836) "sha2" expression with bit_length of 224 returns incorrect results

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36836:
--

Assignee: Richard Chen

> "sha2" expression with bit_length of 224 returns incorrect results
> --
>
> Key: SPARK-36836
> URL: https://issues.apache.org/jira/browse/SPARK-36836
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: Richard Chen
>Assignee: Richard Chen
>Priority: Major
> Fix For: 3.2.0
>
>
> {{sha2(input, bit_length)}} returns incorrect results when {{bit_length == 
> 224}}.
>  
> This bug seems to have been present since the {{sha2}} expression was 
> introduced in 1.5.0.
>  
> Repro in spark shell:
> {{spark.sql("SELECT sha2('abc', 224)").show()}}
>  
> Spark currently returns a garbled string, consisting of invalid UTF:
>  {{#\t}"4�"�B�w��U�*��你���l��}}
> The expected return value is: 
> {{23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7}}
>  
> This appears to happen because the {{MessageDigest.digest()}} function 
> returns bytes intended to be interpreted as a {{BigInt}} rather than a string. 
> Thus, the output of {{MessageDigest.digest()}} must first be interpreted as a 
> {{BigInt}} and then transformed into a hex string. 
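
For reference, a small standalone sketch (plain JDK code, not Spark's implementation) 
that produces the expected digest quoted above:

{code:scala}
import java.security.MessageDigest

// Compute SHA-224 of "abc" with the JDK provider and hex-encode the raw digest bytes.
// This is the value the description lists as the expected sha2('abc', 224) result.
val digest = MessageDigest.getInstance("SHA-224").digest("abc".getBytes("UTF-8"))
val hex = digest.map(b => f"${b & 0xff}%02x").mkString
println(hex)  // 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
{code}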



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36836) "sha2" expression with bit_length of 224 returns incorrect results

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36836.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34086
[https://github.com/apache/spark/pull/34086]

> "sha2" expression with bit_length of 224 returns incorrect results
> --
>
> Key: SPARK-36836
> URL: https://issues.apache.org/jira/browse/SPARK-36836
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0, 3.1.0, 3.2.0
>Reporter: Richard Chen
>Priority: Major
> Fix For: 3.2.0
>
>
> {{sha2(input, bit_length)}} returns incorrect results when {{bit_length == 
> 224}}.
>  
> This bug seems to have been present since the {{sha2}} expression was 
> introduced in 1.5.0.
>  
> Repro in spark shell:
> {{spark.sql("SELECT sha2('abc', 224)").show()}}
>  
> Spark currently returns a garbled string, consisting of invalid UTF:
>  {{#\t}"4�"�B�w��U�*��你���l��}}
> The expected return value is: 
> {{23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7}}
>  
> This appears to happen because the {{MessageDigest.digest()}} function 
> returns bytes intended to be interpreted as a {{BigInt}} rather than a string. 
> Thus, the output of {{MessageDigest.digest()}} must first be interpreted as a 
> {{BigInt}} and then transformed into a hex string. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36873) Add provided Guava dependency for network-yarn module

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36873.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34125
[https://github.com/apache/spark/pull/34125]

> Add provided Guava dependency for network-yarn module
> -
>
> Key: SPARK-36873
> URL: https://issues.apache.org/jira/browse/SPARK-36873
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0
>
>
> In Spark 3.1 and earlier the network-yarn module implicitly relies on guava 
> from hadoop-client dependency, which was changed by SPARK-33212 where we 
> moved to shaded Hadoop client which no longer expose the transitive guava 
> dependency. This was fine for a while since we were not using 
> {{createDependencyReducedPom}} so the module picks up the transitive 
> dependency from {{spark-network-common}}. However, this got changed by 
> SPARK-36835 when we restored {{createDependencyReducedPom}} and now it is no 
> longer able to find guava classes:
> {code}
> mvn test -pl common/network-yarn -Phadoop-3.2 -Phive-thriftserver 
> -Pkinesis-asl -Pkubernetes -Pmesos -Pnetlib-lgpl -Pscala-2.12 
> -Pspark-ganglia-lgpl -Pyarn
> ...
> [INFO] Compiling 1 Java source to 
> /Users/sunchao/git/spark/common/network-yarn/target/scala-2.12/classes ...
> [WARNING] [Warn] : bootstrap class path not set in conjunction with -source 8
> [ERROR] [Error] 
> /Users/sunchao/git/spark/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:32:
>  package com.google.common.annotations does not exist
> [ERROR] [Error] 
> /Users/sunchao/git/spark/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:33:
>  package com.google.common.base does not exist
> [ERROR] [Error] 
> /Users/sunchao/git/spark/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:34:
>  package com.google.common.collect does not exist
> [ERROR] [Error] 
> /Users/sunchao/git/spark/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:118:
>  cannot find symbol
>   symbol:   class VisibleForTesting
>   location: class org.apache.spark.network.yarn.YarnShuffleService
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36856:
--

Assignee: copperybean

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Assignee: copperybean
>Priority: Major
> Fix For: 3.2.0
>
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36856:
---
Affects Version/s: (was: 3.0.0)
   3.2.0

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Assignee: copperybean
>Priority: Major
> Fix For: 3.2.0
>
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36856:
---
Priority: Minor  (was: Major)

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Assignee: copperybean
>Priority: Minor
> Fix For: 3.2.0
>
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36856.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34111
[https://github.com/apache/spark/pull/34111]

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Priority: Major
> Fix For: 3.2.0
>
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420549#comment-17420549
 ] 

Gengliang Wang edited comment on SPARK-36861 at 9/27/21, 8:06 AM:
--

Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on 3.2.0 RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}

The issue can be reproduced on Spark master though.



was (Author: gengliang.wang):
Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}


> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC 
> it is parsed as a date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36861:
---
Affects Version/s: (was: 3.2.0)
   3.3.0

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC 
> it is parsed as a date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420549#comment-17420549
 ] 

Gengliang Wang commented on SPARK-36861:


Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}


> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC 
> it is parsed as a date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420538#comment-17420538
 ] 

Gengliang Wang commented on SPARK-36861:


[~tanelk] This is a new behavior introduced by 
https://github.com/apache/spark/pull/33709
However, turning the value into a date and losing the hour part seems wrong. cc 
[~maxgekk] [~cloud_fan]
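
Until the semantics are settled, one possible mitigation (my assumption, not a 
confirmed fix) is to turn off partition column type inference so the value stays a 
string:

{code:scala}
// Sketch of a possible workaround: disable partition column type inference so that
// directory values such as hour=2021-01-01T00 are kept as StringType.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
spark.read.parquet("/tmp/t1").printSchema()
// Expected with inference off: the "hour" partition column is reported as string.
{code}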

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC 
> it is parsed as a date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36851) Incorrect parsing of negative ANSI typed interval literals

2021-09-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36851:
--

Assignee: Peng Lei

> Incorrect parsing of negative ANSI typed interval literals
> --
>
> Key: SPARK-36851
> URL: https://issues.apache.org/jira/browse/SPARK-36851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Peng Lei
>Priority: Major
> Fix For: 3.2.0
>
>
> If the start field and end field are the same, the parser doesn't take into 
> account the sign before the interval literal string. For example:
> Works fine:
> {code:sql}
> spark-sql> select interval -'1-1' year to month;
> -1-1
> {code}
> Incorrect result:
> {code:sql}
> spark-sql> select interval -'1' year;
> 1-0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36851) Incorrect parsing of negative ANSI typed interval literals

2021-09-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36851.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34107
[https://github.com/apache/spark/pull/34107]

> Incorrect parsing of negative ANSI typed interval literals
> --
>
> Key: SPARK-36851
> URL: https://issues.apache.org/jira/browse/SPARK-36851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> If the start field and end field are the same, the parser doesn't take into 
> account the sign before the interval literal string. For example:
> Works fine:
> {code:sql}
> spark-sql> select interval -'1-1' year to month;
> -1-1
> {code}
> Incorrect result:
> {code:sql}
> spark-sql> select interval -'1' year;
> 1-0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36827) Task/Stage/Job data remain in memory leads memory leak

2021-09-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36827.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34092
[https://github.com/apache/spark/pull/34092]

> Task/Stage/Job data remain in memory leads memory leak
> --
>
> Key: SPARK-36827
> URL: https://issues.apache.org/jira/browse/SPARK-36827
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Kohki Nishio
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: mem1.txt, worker.txt
>
>
> We are noticing memory-leak-like behavior: a steady increase of heap after GC 
> that eventually leads to a service failure. 
> The GC histogram shows a very high number of Task/Stage/Job data wrappers
> {code}
>  num #instances #bytes  class name 
> -- 
>6:   7835346 2444627952  org.apache.spark.status.TaskDataWrapper 
>   25:   3765152  180727296  org.apache.spark.status.StageDataWrapper 
>   88:2322559290200  org.apache.spark.status.JobDataWrapper 
> {code}
> Thread dumps clearly show that the cleanup thread is always doing cleanupStages
> {code}
> "element-tracking-store-worker" #355 daemon prio=5 os_prio=0 
> tid=0x7f31b0014800 nid=0x409 runnable [0x7f2f25783000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:162)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:434)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$0(InMemoryStore.java:375)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$9000/574018760.compare(Unknown
>  Source)
>   at java.util.TimSort.gallopLeft(TimSort.java:542)
>   at java.util.TimSort.mergeLo(TimSort.java:752)
>   at java.util.TimSort.mergeAt(TimSort.java:514)
>   at java.util.TimSort.mergeCollapse(TimSort.java:439)
>   at java.util.TimSort.sort(TimSort.java:245)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1464)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.iterator(InMemoryStore.java:375)
>   at 
> org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
>   at 
> org.apache.spark.status.AppStatusListener.$anonfun$cleanupStages$2(AppStatusListener.scala:1269)
>   at 
> org.apache.spark.status.AppStatusListener$$Lambda$9126/608388595.apply(Unknown
>  Source)
>   at scala.collection.immutable.List.map(List.scala:297)
>   at 
> org.apache.spark.status.AppStatusListener.cleanupStages(AppStatusListener.scala:1260)
>   at 
> org.apache.spark.status.AppStatusListener.$anonfun$new$3(AppStatusListener.scala:98)
>   at 
> org.apache.spark.status.AppStatusListener$$Lambda$646/596139882.apply$mcVJ$sp(Unknown
>  Source)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3(ElementTrackingStore.scala:135)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3$adapted(ElementTrackingStore.scala:133)
>   at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$986/162337848.apply(Unknown
>  Source)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2(ElementTrackingStore.scala:133)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2$adapted(ElementTrackingStore.scala:131)
>   at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$984/600376389.apply(Unknown
>  Source)
>   at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers.$anonfun$fireOnce$1(ElementTrackingStore.scala:58)
>   at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers$$Lambda$985/1187323214.apply$mcV$sp(Unknown
>  Source)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.util.Utils$.tryLog(Utils.scala:2013)
>   at 
> org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}

[jira] [Assigned] (SPARK-36827) Task/Stage/Job data remain in memory leads memory leak

2021-09-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36827:
--

Assignee: Gengliang Wang

> Task/Stage/Job data remain in memory leads memory leak
> --
>
> Key: SPARK-36827
> URL: https://issues.apache.org/jira/browse/SPARK-36827
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Kohki Nishio
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: mem1.txt, worker.txt
>
>
> We are noticing memory-leak-like behavior: a steady increase of heap after GC 
> that eventually leads to a service failure. 
> The GC histogram shows a very high number of Task/Stage/Job data wrappers
> {code}
>  num #instances #bytes  class name 
> -- 
>6:   7835346 2444627952  org.apache.spark.status.TaskDataWrapper 
>   25:   3765152  180727296  org.apache.spark.status.StageDataWrapper 
>   88:2322559290200  org.apache.spark.status.JobDataWrapper 
> {code}
> Thread dumps clearly show that the cleanup thread is always doing cleanupStages
> {code}
> "element-tracking-store-worker" #355 daemon prio=5 os_prio=0 
> tid=0x7f31b0014800 nid=0x409 runnable [0x7f2f25783000]
>java.lang.Thread.State: RUNNABLE
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.util.kvstore.KVTypeInfo$MethodAccessor.get(KVTypeInfo.java:162)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.compare(InMemoryStore.java:434)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.lambda$iterator$0(InMemoryStore.java:375)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView$$Lambda$9000/574018760.compare(Unknown
>  Source)
>   at java.util.TimSort.gallopLeft(TimSort.java:542)
>   at java.util.TimSort.mergeLo(TimSort.java:752)
>   at java.util.TimSort.mergeAt(TimSort.java:514)
>   at java.util.TimSort.mergeCollapse(TimSort.java:439)
>   at java.util.TimSort.sort(TimSort.java:245)
>   at java.util.Arrays.sort(Arrays.java:1512)
>   at java.util.ArrayList.sort(ArrayList.java:1464)
>   at 
> org.apache.spark.util.kvstore.InMemoryStore$InMemoryView.iterator(InMemoryStore.java:375)
>   at 
> org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
>   at 
> org.apache.spark.status.AppStatusListener.$anonfun$cleanupStages$2(AppStatusListener.scala:1269)
>   at 
> org.apache.spark.status.AppStatusListener$$Lambda$9126/608388595.apply(Unknown
>  Source)
>   at scala.collection.immutable.List.map(List.scala:297)
>   at 
> org.apache.spark.status.AppStatusListener.cleanupStages(AppStatusListener.scala:1260)
>   at 
> org.apache.spark.status.AppStatusListener.$anonfun$new$3(AppStatusListener.scala:98)
>   at 
> org.apache.spark.status.AppStatusListener$$Lambda$646/596139882.apply$mcVJ$sp(Unknown
>  Source)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3(ElementTrackingStore.scala:135)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$3$adapted(ElementTrackingStore.scala:133)
>   at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$986/162337848.apply(Unknown
>  Source)
>   at scala.collection.immutable.List.foreach(List.scala:431)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2(ElementTrackingStore.scala:133)
>   at 
> org.apache.spark.status.ElementTrackingStore.$anonfun$write$2$adapted(ElementTrackingStore.scala:131)
>   at 
> org.apache.spark.status.ElementTrackingStore$$Lambda$984/600376389.apply(Unknown
>  Source)
>   at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers.$anonfun$fireOnce$1(ElementTrackingStore.scala:58)
>   at 
> org.apache.spark.status.ElementTrackingStore$LatchedTriggers$$Lambda$985/1187323214.apply$mcV$sp(Unknown
>  Source)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.util.Utils$.tryLog(Utils.scala:2013)
>   at 
> org.apache.spark.status.ElementTrackingStore$$anon$1.run(ElementTrackingStore.scala:117)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
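
For clusters still on an affected version, a possible mitigation (my assumption, 
not part of the ticket) is to lower how much job/stage/task data the status store 
is allowed to retain; a minimal sketch:

{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: the spark.ui.retained* settings bound how many job/stage/task entries
// the status store keeps, which limits growth of the TaskDataWrapper/StageDataWrapper/
// JobDataWrapper instances dominating the heap histogram above. The concrete values
// here are illustrative assumptions, not tuned recommendations.
val spark = SparkSession.builder()
  .appName("bounded-status-store")
  .config("spark.ui.retainedJobs", "200")
  .config("spark.ui.retainedStages", "200")
  .config("spark.ui.retainedTasks", "20000")
  .getOrCreate()
{code}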



--
This message was sent by Atlassian Jira

[jira] [Assigned] (SPARK-36835) Spark 3.2.0 POMs are no longer "dependency reduced"

2021-09-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36835:
--

Assignee: Chao Sun

> Spark 3.2.0 POMs are no longer "dependency reduced"
> ---
>
> Key: SPARK-36835
> URL: https://issues.apache.org/jira/browse/SPARK-36835
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Josh Rosen
>Assignee: Chao Sun
>Priority: Blocker
>
> It looks like Spark 3.2.0's POMs are no longer "dependency reduced". As a 
> result, applications may pull in additional unnecessary dependencies when 
> depending on Spark.
> Spark uses the Maven Shade plugin to create effective POMs and to bundle 
> shaded versions of certain libraries with Spark (namely, Jetty, Guava, and 
> JPPML). [By 
> default|https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#createDependencyReducedPom],
>  the Maven Shade plugin generates simplified POMs which remove dependencies 
> on artifacts that have been shaded.
> SPARK-33212 / 
> [b6f46ca29742029efea2790af7fdefbc2fcf52de|https://github.com/apache/spark/commit/b6f46ca29742029efea2790af7fdefbc2fcf52de]
>  changed the configuration of the Maven Shade plugin, setting 
> {{createDependencyReducedPom}} to {{false}}.
> As a result, the generated POMs now include compile-scope dependencies on the 
> shaded libraries. For example, compare the {{org.eclipse.jetty}} dependencies 
> in:
>  * Spark 3.1.2: 
> [https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.2/spark-core_2.12-3.1.2.pom]
>  * Spark 3.2.0 RC2: 
> [https://repository.apache.org/content/repositories/orgapachespark-1390/org/apache/spark/spark-core_2.12/3.2.0/spark-core_2.12-3.2.0.pom]
> I think we should revert back to generating "dependency reduced" POMs to 
> ensure that Spark declares a proper set of dependencies and to avoid "unknown 
> unknown" consequences of changing our generated POM format.
> /cc [~csun]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36835) Spark 3.2.0 POMs are no longer "dependency reduced"

2021-09-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36835.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34085
[https://github.com/apache/spark/pull/34085]

> Spark 3.2.0 POMs are no longer "dependency reduced"
> ---
>
> Key: SPARK-36835
> URL: https://issues.apache.org/jira/browse/SPARK-36835
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Josh Rosen
>Assignee: Chao Sun
>Priority: Blocker
> Fix For: 3.2.0
>
>
> It looks like Spark 3.2.0's POMs are no longer "dependency reduced". As a 
> result, applications may pull in additional unnecessary dependencies when 
> depending on Spark.
> Spark uses the Maven Shade plugin to create effective POMs and to bundle 
> shaded versions of certain libraries with Spark (namely, Jetty, Guava, and 
> JPPML). [By 
> default|https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#createDependencyReducedPom],
>  the Maven Shade plugin generates simplified POMs which remove dependencies 
> on artifacts that have been shaded.
> SPARK-33212 / 
> [b6f46ca29742029efea2790af7fdefbc2fcf52de|https://github.com/apache/spark/commit/b6f46ca29742029efea2790af7fdefbc2fcf52de]
>  changed the configuration of the Maven Shade plugin, setting 
> {{createDependencyReducedPom}} to {{false}}.
> As a result, the generated POMs now include compile-scope dependencies on the 
> shaded libraries. For example, compare the {{org.eclipse.jetty}} dependencies 
> in:
>  * Spark 3.1.2: 
> [https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.12/3.1.2/spark-core_2.12-3.1.2.pom]
>  * Spark 3.2.0 RC2: 
> [https://repository.apache.org/content/repositories/orgapachespark-1390/org/apache/spark/spark-core_2.12/3.2.0/spark-core_2.12-3.2.0.pom]
> I think we should revert back to generating "dependency reduced" POMs to 
> ensure that Spark declares a proper set of dependencies and to avoid "unknown 
> unknown" consequences of changing our generated POM format.
> /cc [~csun]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36782) Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon migrating shuffle blocks

2021-09-23 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36782:
--

Assignee: Fabian Thiele

> Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon 
> migrating shuffle blocks
> --
>
> Key: SPARK-36782
> URL: https://issues.apache.org/jira/browse/SPARK-36782
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.1
>Reporter: Fabian Thiele
>Assignee: Fabian Thiele
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: 
> 0001-Add-test-showing-that-decommission-might-deadlock.patch, 
> spark_stacktrace_deadlock.txt
>
>
> I can observe a deadlock on the driver that can be triggered rather reliably 
> in a job with a larger amount of tasks - upon using
> {code:java}
> spark.decommission.enabled: true
> spark.storage.decommission.rddBlocks.enabled: true
> spark.storage.decommission.shuffleBlocks.enabled: true
> spark.storage.decommission.enabled: true{code}
>  
> It originates in the {{dispatcher-BlockManagerMaster}} making a call to 
> {{updateBlockInfo}} when shuffles are migrated. This is not performed by a 
> thread from the pool but instead by the {{dispatcher-BlockManagerMaster}} 
> itself. I suppose this was done under the assumption that this would be very 
> fast. However if the block that is updated is a shuffle index block it calls
> {code:java}
> mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId){code}
> for which it waits to acquire a write lock as part of the 
> {{MapOutputTracker}}.
> If the timing is bad then one of the {{map-output-dispatchers}} is holding 
> this lock as part of e.g. {{serializedMapStatus}}. In this function 
> {{MapOutputTracker.serializeOutputStatuses}} is called and as part of that we 
> do
> {code:java}
> if (arrSize >= minBroadcastSize) {
>  // Use broadcast instead.
>  // Important arr(0) is the tag == DIRECT, ignore that while deserializing !
>  // arr is a nested Array so that it can handle over 2GB serialized data
>  val arr = chunkedByteBuf.getChunks().map(_.array())
>  val bcast = broadcastManager.newBroadcast(arr, isLocal){code}
> which makes an RPC call to {{dispatcher-BlockManagerMaster}}. That one 
> however is unable to answer as it is blocked while waiting on the 
> aforementioned lock. Hence the deadlock. The ingredients of this deadlock are 
> therefore: sufficient size of the array to go the broadcast-path, as well as 
> timing of incoming {{updateBlockInfo}} call as happens regularly during 
> decommissioning. Potentially earlier versions than 3.1.0 are affected but I 
> could not sufficiently conclude that.
> I have a stacktrace of all driver threads showing the deadlock: 
> [^spark_stacktrace_deadlock.txt]
> A coworker of mine wrote a patch that replicates the issue as a test case as 
> well: [^0001-Add-test-showing-that-decommission-might-deadlock.patch]
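
To make the cycle easier to see, here is a deliberately simplified, self-contained 
sketch of the same shape using generic JDK concurrency primitives (not Spark's actual 
classes); running it hangs by design:

{code:scala}
import java.util.concurrent.SynchronousQueue
import java.util.concurrent.locks.ReentrantReadWriteLock

object DeadlockShape extends App {
  val tracker = new ReentrantReadWriteLock()   // stands in for the MapOutputTracker lock
  val rpc     = new SynchronousQueue[String]() // stands in for the BlockManagerMaster endpoint

  // Like a map-output-dispatcher thread: holds the tracker lock, then blocks on an RPC
  // (newBroadcast) that only the dispatcher thread below could answer.
  val mapOutputDispatcher = new Thread(() => {
    tracker.writeLock().lock()
    try rpc.put("newBroadcast")
    finally tracker.writeLock().unlock()
  })

  // Like dispatcher-BlockManagerMaster handling updateBlockInfo: it first needs the same
  // tracker lock, so it never gets around to answering the RPC above.
  val blockManagerMaster = new Thread(() => {
    tracker.writeLock().lock()
    try println(rpc.take())
    finally tracker.writeLock().unlock()
  })

  mapOutputDispatcher.start()
  blockManagerMaster.start()
}
{code}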



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36782) Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon migrating shuffle blocks

2021-09-22 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36782.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34043
[https://github.com/apache/spark/pull/34043]

> Deadlock between map-output-dispatcher and dispatcher-BlockManagerMaster upon 
> migrating shuffle blocks
> --
>
> Key: SPARK-36782
> URL: https://issues.apache.org/jira/browse/SPARK-36782
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.3.0, 3.2.1
>Reporter: Fabian Thiele
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: 
> 0001-Add-test-showing-that-decommission-might-deadlock.patch, 
> spark_stacktrace_deadlock.txt
>
>
> I can observe a deadlock on the driver that can be triggered rather reliably 
> in a job with a larger amount of tasks - upon using
> {code:java}
> spark.decommission.enabled: true
> spark.storage.decommission.rddBlocks.enabled: true
> spark.storage.decommission.shuffleBlocks.enabled: true
> spark.storage.decommission.enabled: true{code}
>  
> It originates in the {{dispatcher-BlockManagerMaster}} making a call to 
> {{updateBlockInfo}} when shuffles are migrated. This is not performed by a 
> thread from the pool but instead by the {{dispatcher-BlockManagerMaster}} 
> itself. I suppose this was done under the assumption that this would be very 
> fast. However if the block that is updated is a shuffle index block it calls
> {code:java}
> mapOutputTracker.updateMapOutput(shuffleId, mapId, blockManagerId){code}
> for which it waits to acquire a write lock as part of the 
> {{MapOutputTracker}}.
> If the timing is bad then one of the {{map-output-dispatchers}} is holding 
> this lock as part of e.g. {{serializedMapStatus}}. In this function 
> {{MapOutputTracker.serializeOutputStatuses}} is called and as part of that we 
> do
> {code:java}
> if (arrSize >= minBroadcastSize) {
>  // Use broadcast instead.
>  // Important arr(0) is the tag == DIRECT, ignore that while deserializing !
>  // arr is a nested Array so that it can handle over 2GB serialized data
>  val arr = chunkedByteBuf.getChunks().map(_.array())
>  val bcast = broadcastManager.newBroadcast(arr, isLocal){code}
> which makes an RPC call to {{dispatcher-BlockManagerMaster}}. That one 
> however is unable to answer as it is blocked while waiting on the 
> aforementioned lock. Hence the deadlock. The ingredients of this deadlock are 
> therefore: sufficient size of the array to go the broadcast-path, as well as 
> timing of incoming {{updateBlockInfo}} call as happens regularly during 
> decommissioning. Potentially earlier versions than 3.1.0 are affected but I 
> could not sufficiently conclude that.
> I have a stacktrace of all driver threads showing the deadlock: 
> [^spark_stacktrace_deadlock.txt]
> A coworker of mine wrote a patch that replicates the issue as a test case as 
> well: [^0001-Add-test-showing-that-decommission-might-deadlock.patch]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35103) Improve the performance of type coercion rules

2021-09-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35103:
---
Summary: Improve the performance of type coercion rules  (was: Improve type 
coercion rule performance)

> Improve the performance of type coercion rules
> --
>
> Key: SPARK-35103
> URL: https://issues.apache.org/jira/browse/SPARK-35103
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yingyi Bu
>Assignee: Yingyi Bu
>Priority: Major
> Fix For: 3.2.0
>
>
> Reduce the time spent on type coercion rules by running them together 
> one-tree-node-at-a-time in a combined rule.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36772) FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36772:
--

Assignee: Ye Zhou

> FinalizeShuffleMerge fails with an exception due to attempt id not matching
> ---
>
> Key: SPARK-36772
> URL: https://issues.apache.org/jira/browse/SPARK-36772
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Assignee: Ye Zhou
>Priority: Blocker
>
> As part of driver request to external shuffle services (ESS) to finalize the 
> merge, it also passes its [application attempt 
> id|https://github.com/apache/spark/blob/3f09093a21306b0fbcb132d4c9f285e56ac6b43c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java#L180]
>  so that ESS can validate the request is from the correct attempt.
> This attempt id is fetched from the TransportConf passed in when creating the 
> [ExternalBlockStoreClient|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkEnv.scala#L352]
>  - and the transport conf leverages a [cloned 
> copy|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala#L47]
>  of the SparkConf passed to it.
> Application attempt id is set as part of SparkContext 
> [initialization|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L586].
> But this happens after driver SparkEnv has [already been 
> created|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L460].
> Hence the attempt id that ExternalBlockStoreClient uses will always end up 
> being -1, which will not match the attempt id at the ESS (which is based on 
> spark.app.attempt.id), so merge finalization always fails (" 
> java.lang.IllegalArgumentException: The attempt id -1 in this 
> FinalizeShuffleMerge message does not match with the current attempt id 1 
> stored in shuffle service for application ...")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36772) FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36772.

Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 34018
[https://github.com/apache/spark/pull/34018]

> FinalizeShuffleMerge fails with an exception due to attempt id not matching
> ---
>
> Key: SPARK-36772
> URL: https://issues.apache.org/jira/browse/SPARK-36772
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Mridul Muralidharan
>Assignee: Ye Zhou
>Priority: Blocker
> Fix For: 3.2.0
>
>
> As part of driver request to external shuffle services (ESS) to finalize the 
> merge, it also passes its [application attempt 
> id|https://github.com/apache/spark/blob/3f09093a21306b0fbcb132d4c9f285e56ac6b43c/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java#L180]
>  so that ESS can validate the request is from the correct attempt.
> This attempt id is fetched from the TransportConf passed in when creating the 
> [ExternalBlockStoreClient|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkEnv.scala#L352]
>  - and the transport conf leverages a [cloned 
> copy|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/network/netty/SparkTransportConf.scala#L47]
>  of the SparkConf passed to it.
> Application attempt id is set as part of SparkContext 
> [initialization|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L586].
> But this happens after driver SparkEnv has [already been 
> created|https://github.com/apache/spark/blob/67421d80b8935d91b86e8cd3becb211fa2abd54f/core/src/main/scala/org/apache/spark/SparkContext.scala#L460].
> Hence the attempt id that ExternalBlockStoreClient uses will always end up 
> being -1, which will not match the attempt id at the ESS (which is based on 
> spark.app.attempt.id), so merge finalization always fails (" 
> java.lang.IllegalArgumentException: The attempt id -1 in this 
> FinalizeShuffleMerge message does not match with the current attempt id 1 
> stored in shuffle service for application ...")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36433) Logs should show correct URL of where HistoryServer is started

2021-09-16 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416007#comment-17416007
 ] 

Gengliang Wang edited comment on SPARK-36433 at 9/16/21, 9:34 AM:
--

[~thejdeep] [~holden] The regression change SPARK-36237 is merged on master 
only, so I changed the affects version to 3.3.0 and the priority to "minor".


was (Author: gengliang.wang):
[~thejdeep][~holden] The regression change SPARK-36237 is merged on master 
only. So I change the affect version to 3.3.0 and the priority to "minor".

> Logs should show correct URL of where HistoryServer is started
> --
>
> Key: SPARK-36433
> URL: https://issues.apache.org/jira/browse/SPARK-36433
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Thejdeep Gudivada
>Assignee: Thejdeep Gudivada
>Priority: Minor
> Fix For: 3.3.0
>
>
> Due to a recent refactoring of the WebUI bind() code, the log message that 
> prints the bound host and port information got moved, and because of this the 
> printed info is incorrect.
>  
> Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 
> 0.0.0.0, and started at :-1
>  
> Notice above that the port is incorrect



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36433) Logs should show correct URL of where HistoryServer is started

2021-09-16 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36433:
--

Assignee: Thejdeep Gudivada  (was: Thejdeep)

> Logs should show correct URL of where HistoryServer is started
> --
>
> Key: SPARK-36433
> URL: https://issues.apache.org/jira/browse/SPARK-36433
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Thejdeep Gudivada
>Assignee: Thejdeep Gudivada
>Priority: Minor
> Fix For: 3.3.0
>
>
> Due to a recent refactoring of the WebUI bind() code, the log message that 
> prints the bound host and port information got moved, and because of this the 
> printed info is incorrect.
>  
> Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 
> 0.0.0.0, and started at :-1
>  
> Notice above that the port is incorrect



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


