[jira] [Reopened] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-36367:
--

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36367:
-
Fix Version/s: (was: 3.2.0)

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36367.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33614
[https://github.com/apache/spark/pull/33614]

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.2.0
>
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36331) Add SQLSTATE guideline

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36331:


Assignee: Karen Feng

> Add SQLSTATE guideline
> --
>
> Key: SPARK-36331
> URL: https://issues.apache.org/jira/browse/SPARK-36331
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
>
> Add SQLSTATE guideline to the error guidelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36331) Add SQLSTATE guideline

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-36331.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33560
[https://github.com/apache/spark/pull/33560]

> Add SQLSTATE guideline
> --
>
> Key: SPARK-36331
> URL: https://issues.apache.org/jira/browse/SPARK-36331
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Assignee: Karen Feng
>Priority: Major
> Fix For: 3.2.0
>
>
> Add SQLSTATE guideline to the error guidelines.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391948#comment-17391948
 ] 

Apache Spark commented on SPARK-36381:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33618

> ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 
> command.
> --
>
> Key: SPARK-36381
> URL: https://issues.apache.org/jira/browse/SPARK-36381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> The existence check for ALTER TABLE ADD/RENAME COLUMNS does not take case 
> sensitivity into account for the v2 command.
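
For readers, a hedged reproduction sketch (catalog, table, and column names are made up; only the behavior described above is taken from the report):
{code:scala}
// Hypothetical v2 catalog table testcat.ns.t that already has a column `id`.
spark.conf.set("spark.sql.caseSensitive", "false")
// With case-insensitive resolution this should be rejected as a duplicate
// column, but per the report the v2 command's existence check does not honor
// the case-sensitivity setting, so the outcome differs from the v1 command.
spark.sql("ALTER TABLE testcat.ns.t ADD COLUMNS (ID INT)")
{code}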



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391947#comment-17391947
 ] 

Apache Spark commented on SPARK-36381:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33618

> ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 
> command.
> --
>
> Key: SPARK-36381
> URL: https://issues.apache.org/jira/browse/SPARK-36381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> The existence check for ALTER TABLE ADD/RENAME COLUMNS does not take case 
> sensitivity into account for the v2 command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391937#comment-17391937
 ] 

Apache Spark commented on SPARK-35548:
--

User 'zhuqi-lucas' has created a pull request for this issue:
https://github.com/apache/spark/pull/33617

> Handling new attempt has started error message in BlockPushErrorHandler in 
> client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Priority: Major
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushblockStream message is received 
> after a new application attempt has started. This error message should be 
> correctly handled in client without retrying the block push.
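
Schematically, the requested client-side handling treats this error as non-retriable (the method and marker string below are placeholders, not actual Spark identifiers):
{code:scala}
// Placeholder sketch: a push received after a newer application attempt has
// started can never succeed, so the error handler should stop retrying.
def shouldRetryPush(t: Throwable): Boolean = {
  val msg = Option(t.getMessage).getOrElse("")
  !msg.contains("a new app attempt has started")  // placeholder marker text
}
{code}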



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391936#comment-17391936
 ] 

Apache Spark commented on SPARK-35548:
--

User 'zhuqi-lucas' has created a pull request for this issue:
https://github.com/apache/spark/pull/33617

> Handling new attempt has started error message in BlockPushErrorHandler in 
> client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Priority: Major
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushblockStream message is received 
> after a new application attempt has started. This error message should be 
> correctly handled in client without retrying the block push.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35548:


Assignee: (was: Apache Spark)

> Handling new attempt has started error message in BlockPushErrorHandler in 
> client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Priority: Major
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushblockStream message is received 
> after a new application attempt has started. This error message should be 
> correctly handled in client without retrying the block push.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35548:


Assignee: Apache Spark

> Handling new attempt has started error message in BlockPushErrorHandler in 
> client
> -
>
> Key: SPARK-35548
> URL: https://issues.apache.org/jira/browse/SPARK-35548
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Ye Zhou
>Assignee: Apache Spark
>Priority: Major
>
> In SPARK-33350, a new type of error message is introduced in 
> BlockPushErrorHandler which indicates the PushblockStream message is received 
> after a new application attempt has started. This error message should be 
> correctly handled in client without retrying the block push.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36389:


Assignee: (was: Apache Spark)

> Revert the change that accepts negative mapId in ShuffleBlockId
> ---
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 
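
For context, a small sketch of the block-id convention involved (illustrative only):
{code:scala}
import org.apache.spark.storage.ShuffleBlockId
// Under SPARK-32922, mapId = -1 was accepted to mark a push-merged block:
val legacy = ShuffleBlockId(shuffleId = 0, mapId = -1, reduceId = 2)  // name: "shuffle_0_-1_2"
// After SPARK-32923, push-merged blocks have their own BlockId type, so
// ShuffleBlockId can again require a non-negative mapId.
{code}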



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391913#comment-17391913
 ] 

Apache Spark commented on SPARK-36389:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/33616

> Revert the change that accepts negative mapId in ShuffleBlockId
> ---
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36389:


Assignee: Apache Spark

> Revert the change that accepts negative mapId in ShuffleBlockId
> ---
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Assignee: Apache Spark
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391912#comment-17391912
 ] 

Apache Spark commented on SPARK-36389:
--

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/33616

> Revert the change that accepts negative mapId in ShuffleBlockId
> ---
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36389) Revert the change that accepts negative mapId

2021-08-02 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36389:
--
Description: With SPARK-32922, we added a change that {{ShuffleBlockId}} 
can have a negative mapId. This was to support push-based shuffle where {{-1}} 
as mapId indicated a push-merged block. However with SPARK-32923, a different 
type of {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the 
change to {{ShuffleBlockId}} was missed.   (was: With SPARK-32922, we added a 
change that {{ShuffleBlockId}} can have a negative mapId. This was to support 
push-based shuffle where {{-1}} as mapId indicated a push-merged block. However 
with SPARK-32923, a different type of {{BlockId}} was introduce - 
{{ShuffleMergedId}}, but reverting the change to {{ShuffleBlockId}} was missed. 
)

> Revert the change that accepts negative mapId
> -
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId

2021-08-02 Thread Chandni Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated SPARK-36389:
--
Summary: Revert the change that accepts negative mapId in ShuffleBlockId  
(was: Revert the change that accepts negative mapId)

> Revert the change that accepts negative mapId in ShuffleBlockId
> ---
>
> Key: SPARK-36389
> URL: https://issues.apache.org/jira/browse/SPARK-36389
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Chandni Singh
>Priority: Minor
>
> With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a 
> negative mapId. This was to support push-based shuffle where {{-1}} as mapId 
> indicated a push-merged block. However with SPARK-32923, a different type of 
> {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to 
> {{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36389) Revert the change that accepts negative mapId

2021-08-02 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-36389:
-

 Summary: Revert the change that accepts negative mapId
 Key: SPARK-36389
 URL: https://issues.apache.org/jira/browse/SPARK-36389
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.0
Reporter: Chandni Singh


With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a negative 
mapId. This was to support push-based shuffle where {{-1}} as mapId indicated a 
push-merged block. However with SPARK-32923, a different type of {{BlockId}} 
was introduce - {{ShuffleMergedId}}, but reverting the change to 
{{ShuffleBlockId}} was missed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36374) Push-based shuffle documentation

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391906#comment-17391906
 ] 

Apache Spark commented on SPARK-36374:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/33615

> Push-based shuffle documentation
> 
>
> Key: SPARK-36374
> URL: https://issues.apache.org/jira/browse/SPARK-36374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36374) Push-based shuffle documentation

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36374:


Assignee: Apache Spark

> Push-based shuffle documentation
> 
>
> Key: SPARK-36374
> URL: https://issues.apache.org/jira/browse/SPARK-36374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36374) Push-based shuffle documentation

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36374:


Assignee: (was: Apache Spark)

> Push-based shuffle documentation
> 
>
> Key: SPARK-36374
> URL: https://issues.apache.org/jira/browse/SPARK-36374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36374) Push-based shuffle documentation

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391907#comment-17391907
 ] 

Apache Spark commented on SPARK-36374:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/33615

> Push-based shuffle documentation
> 
>
> Key: SPARK-36374
> URL: https://issues.apache.org/jira/browse/SPARK-36374
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391860#comment-17391860
 ] 

Apache Spark commented on SPARK-36367:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33614

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36359) Coalesce drop all expressions after the first non nullable expression

2021-08-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-36359:

Summary: Coalesce drop all expressions after the first non nullable 
expression  (was: Coalesce returns the first expression if it is non nullable)

> Coalesce drop all expressions after the first non nullable expression
> -
>
> Key: SPARK-36359
> URL: https://issues.apache.org/jira/browse/SPARK-36359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
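
The new summary describes an optimizer simplification; a hedged example of the intended rewrite (the query is made up):
{code:scala}
// `id` from range() is non-nullable, so arguments after it can never be
// reached; the expected rewrite drops them:
//   coalesce(nullable_col, id, 0)  =>  coalesce(nullable_col, id)
spark.range(5)
  .selectExpr("CASE WHEN id > 2 THEN id END AS nullable_col", "id")
  .selectExpr("coalesce(nullable_col, id, 0) AS c")
  .explain()
{code}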




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391858#comment-17391858
 ] 

Apache Spark commented on SPARK-36345:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33598

> Add mlflow/sklearn to GHA docker image
> --
>
> Key: SPARK-36345
> URL: https://issues.apache.org/jira/browse/SPARK-36345
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark, Tests
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the "List 
> Python packages (Python 3.9)" step of the "pyspark" job.
>  
> We can reduce the CI cost by building an image with both packages 
> pre-installed.
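
For reference, the per-run installation being avoided is essentially the following (the exact CI step command is an assumption):
{noformat}
python3.9 -m pip install "mlflow>=1.0" sklearn
{noformat}
Baking these packages into the image used by the GitHub Actions job removes that cost from every run.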



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36367:
-
Affects Version/s: (was: 3.2.0)
   3.3.0

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391857#comment-17391857
 ] 

Apache Spark commented on SPARK-36345:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/33598

> Add mlflow/sklearn to GHA docker image
> --
>
> Key: SPARK-36345
> URL: https://issues.apache.org/jira/browse/SPARK-36345
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark, Tests
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.0
>
>
> In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the "List 
> Python packages (Python 3.9)" step of the "pyspark" job.
>  
> We can reduce the CI cost by building an image with both packages 
> pre-installed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36367) Fix the behavior to follow pandas >= 1.3

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-36367:


Assignee: Haejoon Lee

> Fix the behavior to follow pandas >= 1.3
> 
>
> Key: SPARK-36367
> URL: https://issues.apache.org/jira/browse/SPARK-36367
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas 1.3 has been released. We should follow the new pandas behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36373) DecimalPrecision only add necessary cast

2021-08-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-36373:
---

Assignee: Yuming Wang

> DecimalPrecision only add necessary cast
> 
>
> Key: SPARK-36373
> URL: https://issues.apache.org/jira/browse/SPARK-36373
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> For example:
> {noformat}
> EqualTo(AttributeReference("d1", DecimalType(5, 2))(), 
> AttributeReference("d2", DecimalType(2, 1))())
> {noformat}
> It will add a useless cast to {{d1}}:
> {noformat}
> (cast(d1#6 as decimal(5,2)) = cast(d2#7 as decimal(5,2)))
> {noformat}
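
Presumably the intended result after this change keeps only the cast that actually changes a type, e.g.:
{noformat}
(d1#6 = cast(d2#7 as decimal(5,2)))
{noformat}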



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36373) DecimalPrecision only add necessary cast

2021-08-02 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-36373.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33602
[https://github.com/apache/spark/pull/33602]

> DecimalPrecision only add necessary cast
> 
>
> Key: SPARK-36373
> URL: https://issues.apache.org/jira/browse/SPARK-36373
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> For example:
> {noformat}
> EqualTo(AttributeReference("d1", DecimalType(5, 2))(), 
> AttributeReference("d2", DecimalType(2, 1))())
> {noformat}
> It will add a useless cast to {{d1}}:
> {noformat}
> (cast(d1#6 as decimal(5,2)) = cast(d2#7 as decimal(5,2)))
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36137) HiveShim always fallback to getAllPartitionsOf regardless of whether directSQL is enabled in remote HMS

2021-08-02 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36137.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33382
[https://github.com/apache/spark/pull/33382]

> HiveShim always fallback to getAllPartitionsOf regardless of whether 
> directSQL is enabled in remote HMS
> ---
>
> Key: SPARK-36137
> URL: https://issues.apache.org/jira/browse/SPARK-36137
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.3.0
>
>
> At the moment {{getPartitionsByFilter}} in the Hive shim only falls back to 
> {{getAllPartitionsOf}} when {{hive.metastore.try.direct.sql}} is enabled in 
> the remote HMS. However, in certain cases the remote HMS will fall back to 
> ORM (which only supports the string type for partition columns) to query the 
> underlying RDBMS even if this config is set to true, and Spark cannot recover 
> from the error and just fails the query. 
> For instance, we encountered this bug (HIVE-21497) in an HMS running Hive 3.1.2, 
> and Spark was not able to push down a filter on a {{date}} column.
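
A hedged illustration of the failure mode (table and filter are made up):
{code:scala}
// Hypothetical: `events` is a Hive table partitioned by a DATE column `dt`.
// The filter is pushed to the metastore via getPartitionsByFilter; if the HMS
// silently falls back to ORM (HIVE-21497) that call fails, and before this fix
// Spark did not recover by listing all partitions and filtering client-side.
spark.sql("SELECT count(*) FROM events WHERE dt = date '2021-08-02'").show()
{code}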



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36137) HiveShim always fallback to getAllPartitionsOf regardless of whether directSQL is enabled in remote HMS

2021-08-02 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36137:
---

Assignee: Chao Sun

> HiveShim always fallback to getAllPartitionsOf regardless of whether 
> directSQL is enabled in remote HMS
> ---
>
> Key: SPARK-36137
> URL: https://issues.apache.org/jira/browse/SPARK-36137
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> At the moment {{getPartitionsByFilter}} in the Hive shim only falls back to 
> {{getAllPartitionsOf}} when {{hive.metastore.try.direct.sql}} is enabled in 
> the remote HMS. However, in certain cases the remote HMS will fall back to 
> ORM (which only supports the string type for partition columns) to query the 
> underlying RDBMS even if this config is set to true, and Spark cannot recover 
> from the error and just fails the query. 
> For instance, we encountered this bug (HIVE-21497) in an HMS running Hive 3.1.2, 
> and Spark was not able to push down a filter on a {{date}} column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36378:


Assignee: (was: Apache Spark)

> Minor changes to address a few identified server side inefficiencies
> 
>
> Key: SPARK-36378
> URL: https://issues.apache.org/jira/browse/SPARK-36378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Min Shen
>Priority: Major
>
> With the SPIP ticket close to being finished, we have run performance 
> evaluations comparing push-based shuffle in upstream Spark with the 
> production version we run internally at LinkedIn.
> The evaluations revealed a few regressions and also some additional 
> opportunities for performance improvement.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36378) Minor changes to address a few identified server side inefficiencies

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391832#comment-17391832
 ] 

Apache Spark commented on SPARK-36378:
--

User 'Victsm' has created a pull request for this issue:
https://github.com/apache/spark/pull/33613

> Minor changes to address a few identified server side inefficiencies
> 
>
> Key: SPARK-36378
> URL: https://issues.apache.org/jira/browse/SPARK-36378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Min Shen
>Priority: Major
>
> With the SPIP ticket close to being finished, we have run performance 
> evaluations comparing push-based shuffle in upstream Spark with the 
> production version we run internally at LinkedIn.
> The evaluations revealed a few regressions and also some additional 
> opportunities for performance improvement.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36378:


Assignee: Apache Spark

> Minor changes to address a few identified server side inefficiencies
> 
>
> Key: SPARK-36378
> URL: https://issues.apache.org/jira/browse/SPARK-36378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Min Shen
>Assignee: Apache Spark
>Priority: Major
>
> With the SPIP ticket close to being finished, we have run performance 
> evaluations comparing push-based shuffle in upstream Spark with the 
> production version we run internally at LinkedIn.
> The evaluations revealed a few regressions and also some additional 
> opportunities for performance improvement.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36388) Fix DataFrame groupby-rolling to follow pandas 1.3

2021-08-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36388:
-

 Summary: Fix DataFrame groupby-rolling to follow pandas 1.3
 Key: SPARK-36388
 URL: https://issues.apache.org/jira/browse/SPARK-36388
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3

2021-08-02 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-36386:
--
Summary: Fix DataFrame groupby-expanding to follow pandas 1.3  (was: Fix 
groupby-expanding to follow pandas 1.3)

> Fix DataFrame groupby-expanding to follow pandas 1.3
> 
>
> Key: SPARK-36386
> URL: https://issues.apache.org/jira/browse/SPARK-36386
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36387) Fix Series.astype from datetime to nullable string

2021-08-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36387:
-

 Summary: Fix Series.astype from datetime to nullable string
 Key: SPARK-36387
 URL: https://issues.apache.org/jira/browse/SPARK-36387
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36386) Fix groupby-expanding to follow pandas 1.3

2021-08-02 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-36386:
--
Summary: Fix groupby-expanding to follow pandas 1.3  (was: Fix expanding to 
follow pandas 1.3)

> Fix groupby-expanding to follow pandas 1.3
> --
>
> Key: SPARK-36386
> URL: https://issues.apache.org/jira/browse/SPARK-36386
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36386) Fix expanding to follow pandas 1.3

2021-08-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-36386:
-

 Summary: Fix expanding to follow pandas 1.3
 Key: SPARK-36386
 URL: https://issues.apache.org/jira/browse/SPARK-36386
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30602) SPIP: Support push-based shuffle to improve shuffle efficiency

2021-08-02 Thread Min Shen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391745#comment-17391745
 ] 

Min Shen commented on SPARK-30602:
--

[~mridulm80], thanks for shepherding this work and your reviews on the PRs as 
well!

BTW, could you please add me as the assignee of this ticket to properly credit 
the work?

> SPIP: Support push-based shuffle to improve shuffle efficiency
> --
>
> Key: SPARK-30602
> URL: https://issues.apache.org/jira/browse/SPARK-30602
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>  Labels: release-notes
> Fix For: 3.2.0
>
> Attachments: Screen Shot 2020-06-23 at 11.31.22 AM.jpg, 
> vldb_magnet_final.pdf
>
>
> In a large deployment of a Spark compute infrastructure, Spark shuffle is 
> becoming a potential scaling bottleneck and a source of inefficiency in the 
> cluster. When doing Spark on YARN for a large-scale deployment, people 
> usually enable Spark external shuffle service and store the intermediate 
> shuffle files on HDD. Because the number of blocks generated for a particular 
> shuffle grows quadratically compared to the size of shuffled data (# mappers 
> and reducers grows linearly with the size of shuffled data, but # blocks is # 
> mappers * # reducers), one general trend we have observed is that the more 
> data a Spark application processes, the smaller the block size becomes. In a 
> few production clusters we have seen, the average shuffle block size is only 
> 10s of KBs. Because of the inefficiency of performing random reads on HDD for 
> small amount of data, the overall efficiency of the Spark external shuffle 
> services serving the shuffle blocks degrades as we see an increasing # of 
> Spark applications processing an increasing amount of data. In addition, 
> because Spark external shuffle service is a shared service in a multi-tenancy 
> cluster, the inefficiency with one Spark application could propagate to other 
> applications as well.
> In this ticket, we propose a solution to improve Spark shuffle efficiency in 
> above mentioned environments with push-based shuffle. With push-based 
> shuffle, shuffle is performed at the end of mappers and blocks get pre-merged 
> and move towards reducers. In our prototype implementation, we have seen 
> significant efficiency improvements when performing large shuffles. We take a 
> Spark-native approach to achieve this, i.e., extending Spark’s existing 
> shuffle netty protocol, and the behaviors of Spark mappers, reducers and 
> drivers. This way, we can bring the benefits of more efficient shuffle in 
> Spark without incurring the dependency or overhead of either specialized 
> storage layer or external infrastructure pieces.
>  
> Link to dev mailing list discussion: 
> [http://apache-spark-developers-list.1001551.n3.nabble.com/Enabling-push-based-shuffle-in-Spark-td28732.html]
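
For readers who want to try the feature once it is released, enabling it is expected to look roughly like the following (configuration names as documented for Spark 3.2; double-check against the release you use):
{code:scala}
import org.apache.spark.SparkConf
// As described in the SPIP, push-based shuffle targets YARN deployments that
// already run the external shuffle service.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.push.enabled", "true")  // client/driver-side opt-in
// The external shuffle service also needs its merged-shuffle support enabled;
// see the push-based shuffle documentation for the server-side settings.
{code}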



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-36379.
---
Fix Version/s: 3.3.0
   3.2.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/33608

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.
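
The expectation under the default PERMISSIVE mode (hedged; exact output not verified here) is roughly:
{code:scala}
// In spark-shell: rows should be returned for the two valid objects, with the
// null element handled as a malformed record (e.g. a null or corrupt-record
// row) rather than a NullPointerException failing the whole query.
spark.read.json(Seq("""[{"a": "str"}, null, {"a": "str"}]""").toDS).show()
{code}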



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36385) Add possibility for jdbc insert hints

2021-08-02 Thread Nikolay Ivanitskiy (Jira)
Nikolay Ivanitskiy created SPARK-36385:
--

 Summary: Add possibility for jdbc insert hints
 Key: SPARK-36385
 URL: https://issues.apache.org/jira/browse/SPARK-36385
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.7
Reporter: Nikolay Ivanitskiy


Some SQL backends support hints for the SQL INSERT statement, such as
{code:java}
/*+ ignore_row_on_dupkey_index ( table(col1, col2, ...) ) */ {code}
for example:
{code:java}
insert /*+ ignore_row_on_dupkey_index ( table(col1, col2, ...) ) */ into 
table(...{code}
But the Spark JDBC writer does not allow adding hints.

I suggest adding support for hints in 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.getInsertStatement 
and 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable.

The hints should be stored in the options.

I already have a version with hint support, so if this issue is accepted I can 
post the fix.
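
A hedged sketch of how this could surface to users (the option name {{insertHint}} is an assumption, not an agreed API):
{code:scala}
// Hypothetical usage: `df` is the DataFrame to write; the hint string would be
// read from the JDBC options and spliced into the INSERT statement generated
// by getInsertStatement.
df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/orcl")
  .option("dbtable", "target_table")
  .option("insertHint", "/*+ ignore_row_on_dupkey_index(target_table(col1, col2)) */")
  .mode("append")
  .save()
{code}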

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35430) Investigate the failure of "PVs with local storage" integration test on Docker driver

2021-08-02 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-35430.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 32793
[https://github.com/apache/spark/pull/32793]

> Investigate the failure of "PVs with local storage" integration test on 
> Docker driver
> -
>
> Key: SPARK-35430
> URL: https://issues.apache.org/jira/browse/SPARK-35430
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.3.0
>
>
> With https://issues.apache.org/jira/browse/SPARK-34738 the integration tests were 
> migrated to Docker, but "PVs with local storage" was failing, so in 
> https://github.com/apache/spark/pull/31829 we created a separate test tag, 
> "persistentVolume", which is not used by dev-run-integration-tests.sh; this 
> way the test is skipped.
> Here we should revert the "persistentVolume" tag and investigate the error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35430) Investigate the failure of "PVs with local storage" integration test on Docker driver

2021-08-02 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp reassigned SPARK-35430:
---

Assignee: Attila Zsolt Piros

> Investigate the failure of "PVs with local storage" integration test on 
> Docker driver
> -
>
> Key: SPARK-35430
> URL: https://issues.apache.org/jira/browse/SPARK-35430
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
>
> With https://issues.apache.org/jira/browse/SPARK-34738 the integration tests were 
> migrated to Docker, but "PVs with local storage" was failing, so in 
> https://github.com/apache/spark/pull/31829 we created a separate test tag, 
> "persistentVolume", which is not used by dev-run-integration-tests.sh; this 
> way the test is skipped.
> Here we should revert the "persistentVolume" tag and investigate the error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-08-02 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391679#comment-17391679
 ] 

Wenchen Fan commented on SPARK-36086:
-

[~krivosheinruslan] please open a ticket if you are working to improve the v2 
describe table command. This ticket is resolved because the column-name case 
difference has been fixed.

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |lower_id|bigint  
>   |   |
> |id  |bigint  
>   |   |
> ||
>   |   |
> |# Partitioning  |
>   |   |
> |Part 0  |lower_id
>   |   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Name|default.t2  
>   |   |
> |Location
> |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|  
>  |
> |Provider|delta   
>   |   |
> |Table Properties
> |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]  |  
>  |
> ++--+---+
> scala> spark.sql("desc extended t3").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |ID  |bigint  
>   |null   |
> |LOWER_ID|bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |LOWER_ID|bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database|default 
>   |   |
> |Table   |t3  
>   |   |
> |Owner

[jira] [Assigned] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36086:
---

Assignee: angerszhu

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |lower_id|bigint  
>   |   |
> |id  |bigint  
>   |   |
> ||
>   |   |
> |# Partitioning  |
>   |   |
> |Part 0  |lower_id
>   |   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Name|default.t2  
>   |   |
> |Location
> |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|  
>  |
> |Provider|delta   
>   |   |
> |Table Properties
> |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]  |  
>  |
> ++--+---+
> scala> spark.sql("desc extended t3").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |ID  |bigint  
>   |null   |
> |LOWER_ID|bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |LOWER_ID|bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database|default 
>   |   |
> |Table   |t3  
>   |   |
> |Owner   |yumwang 
>   |   |
> |Created Time|Mon Jul 12 14:07:16 CST 2021
>   

[jira] [Resolved] (SPARK-36086) The case of the delta table is inconsistent with parquet

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36086.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33576
[https://github.com/apache/spark/pull/33576]

> The case of the delta table is inconsistent with parquet
> 
>
> Key: SPARK-36086
> URL: https://issues.apache.org/jira/browse/SPARK-36086
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> How to reproduce this issue:
> {noformat}
> 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars.
> 2. bin/spark-shell --conf 
> spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf 
> spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
> {noformat}
> {code:scala}
> spark.sql("create table t1 using parquet as select id, id as lower_id from 
> range(5)")
> spark.sql("CREATE VIEW v1 as SELECT * FROM t1")
> spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT 
> LOWER_ID, ID FROM v1")
> spark.sql("desc extended t2").show(false)
> spark.sql("desc extended t3").show(false)
> {code}
> {noformat}
> scala> spark.sql("desc extended t2").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |lower_id|bigint  
>   |   |
> |id  |bigint  
>   |   |
> ||
>   |   |
> |# Partitioning  |
>   |   |
> |Part 0  |lower_id
>   |   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Name|default.t2  
>   |   |
> |Location
> |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2|  
>  |
> |Provider|delta   
>   |   |
> |Table Properties
> |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2]  |  
>  |
> ++--+---+
> scala> spark.sql("desc extended t3").show(false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |ID  |bigint  
>   |null   |
> |LOWER_ID|bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |LOWER_ID|bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database|default 
>   |   |
> |Table   |t3  
>   |   |
> |Owner   |yumwang 
>   |   |
> |Created Time   

[jira] [Resolved] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36382.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33611
[https://github.com/apache/spark/pull/33611]

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.3.0
>
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36383) NullPointerException throws during executor shutdown

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36383:


Assignee: Apache Spark

> NullPointerException throws during executor shutdown
> 
>
> Key: SPARK-36383
> URL: https://issues.apache.org/jira/browse/SPARK-36383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> {code:java}
> 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:318)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:324)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
> java.lang.NullPointerException
>         at 
> org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
>         at org.apache.spark.executor.Executor.stop(Executor.scala:334)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         

[jira] [Assigned] (SPARK-36383) NullPointerException throws during executor shutdown

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36383:


Assignee: (was: Apache Spark)

> NullPointerException throws during executor shutdown
> 
>
> Key: SPARK-36383
> URL: https://issues.apache.org/jira/browse/SPARK-36383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:318)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:324)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
> java.lang.NullPointerException
>         at 
> org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
>         at org.apache.spark.executor.Executor.stop(Executor.scala:334)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> 

[jira] [Commented] (SPARK-36383) NullPointerException throws during executor shutdown

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391662#comment-17391662
 ] 

Apache Spark commented on SPARK-36383:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/33612
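
The quoted report below is only a stack trace, so here is a minimal, hypothetical sketch of 
the kind of null guard that prevents this class of failure: fields that Executor.stop() 
touches can still be null when the shutdown hook fires before initialization has finished. 
The object, field, and method names are illustrative only; this is neither Spark internals 
nor the contents of the pull request above.

{code:scala}
// Minimal, self-contained sketch (assumption: NOT the actual patch in the PR above) of
// the defensive pattern the stack trace suggests: fields touched during stop() may still
// be null when the shutdown hook fires before initialization finishes, so each step
// guards against null instead of dereferencing directly.
object ShutdownGuardSketch {
  final class Poller { def stop(): Unit = println("poller stopped") }

  @volatile private var metricsPoller: Poller = _   // may never be initialized

  def stop(): Unit = {
    // Skip the step instead of throwing NullPointerException during shutdown.
    Option(metricsPoller).foreach(_.stop())
  }

  def main(args: Array[String]): Unit = {
    sys.addShutdownHook(stop())   // runs even if initialization never happened
  }
}
{code}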

> NullPointerException throws during executor shutdown
> 
>
> Key: SPARK-36383
> URL: https://issues.apache.org/jira/browse/SPARK-36383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:318)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:324)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
> java.lang.NullPointerException
>         at 
> org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
>         at org.apache.spark.executor.Executor.stop(Executor.scala:334)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> 

[jira] [Created] (SPARK-36384) Add documentation for shuffle checksum

2021-08-02 Thread wuyi (Jira)
wuyi created SPARK-36384:


 Summary: Add documentation for shuffle checksum
 Key: SPARK-36384
 URL: https://issues.apache.org/jira/browse/SPARK-36384
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.2.0, 3.3.0
Reporter: wuyi


Add documentation for shuffle checksum
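
For context, a hedged sketch of the configuration such documentation would presumably 
cover. The config keys below are recalled from the Spark 3.2 configuration page and 
should be verified against the released docs before being documented.

{code:scala}
// Hedged sketch only: config key names are assumed from the Spark 3.2 configuration
// docs (spark.shuffle.checksum.*) and should be double-checked before relying on them.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-checksum-example")
  .config("spark.shuffle.checksum.enabled", "true")        // write checksums for shuffle blocks
  .config("spark.shuffle.checksum.algorithm", "ADLER32")   // or "CRC32"
  .getOrCreate()
{code}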



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks

2021-08-02 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391654#comment-17391654
 ] 

wuyi commented on SPARK-35275:
--

[~mridulm80] Yes, we shall have the doc task. Let me create it.

> Add checksum for shuffle blocks
> ---
>
> Key: SPARK-35275
> URL: https://issues.apache.org/jira/browse/SPARK-35275
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: wuyi
>Priority: Major
>
> Shuffle data corruption is a long-standing issue in Spark. For example, in 
> SPARK-18105, people continually report corruption issues. However, data 
> corruption is difficult to reproduce in most cases and even harder to trace 
> to a root cause. We don't know whether it's a Spark issue or not. With checksum 
> support for shuffle, Spark itself can at least distinguish between disk and 
> network as the cause, which is very important for users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36224) Use "void" as the type name of NullType

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36224.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33437
[https://github.com/apache/spark/pull/33437]

> Use "void" as the type name of NullType
> ---
>
> Key: SPARK-36224
> URL: https://issues.apache.org/jira/browse/SPARK-36224
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.2.0
>
>
> In PR [https://github.com/apache/spark/pull/28833], we added support for parsing 
> "void" as NullType, but we still use "null" as the type name. This leads to 
> confusing and inconsistent issues. For example:
> `org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` does not work.
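
To make the round-trip failure concrete, a rough illustration (behavior as understood on 
Spark 3.1; worth re-running in spark-shell to confirm):

{code:scala}
// Rough illustration, assuming Spark 3.1 behavior: the parser accepts "void",
// but the type still renders itself as "null", so DDL produced from the type
// cannot be parsed back.
import org.apache.spark.sql.types.{DataType, NullType}

val parsed = DataType.fromDDL("c VOID")   // "void" is parsed as NullType since PR 28833
println(NullType.simpleString)            // prints "null" before this fix, "void" after
{code}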



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36224) Use "void" as the type name of NullType

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36224:
---

Assignee: Linhong Liu

> Use "void" as the type name of NullType
> ---
>
> Key: SPARK-36224
> URL: https://issues.apache.org/jira/browse/SPARK-36224
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
>
> In PR [https://github.com/apache/spark/pull/28833], we added support for parsing 
> "void" as NullType, but we still use "null" as the type name. This leads to 
> confusing and inconsistent issues. For example:
> `org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` does not work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks

2021-08-02 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391643#comment-17391643
 ] 

Gengliang Wang commented on SPARK-35275:


[~mridulm80] +1. I will cut the RC around the 10th. We still have time.

> Add checksum for shuffle blocks
> ---
>
> Key: SPARK-35275
> URL: https://issues.apache.org/jira/browse/SPARK-35275
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: wuyi
>Priority: Major
>
> Shuffle data corruption is a long-standing issue in Spark. For example, in 
> SPARK-18105, people continually report corruption issues. However, data 
> corruption is difficult to reproduce in most cases and even harder to trace 
> to a root cause. We don't know whether it's a Spark issue or not. With checksum 
> support for shuffle, Spark itself can at least distinguish between disk and 
> network as the cause, which is very important for users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks

2021-08-02 Thread Mridul Muralidharan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391640#comment-17391640
 ] 

Mridul Muralidharan commented on SPARK-35275:
-

Do we want to add a documentation task for this jira as well, [~Ngone51]?

> Add checksum for shuffle blocks
> ---
>
> Key: SPARK-35275
> URL: https://issues.apache.org/jira/browse/SPARK-35275
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: wuyi
>Priority: Major
>
> Shuffle data corruption is a long-standing issue in Spark. For example, in 
> SPARK-18105, people continually report corruption issues. However, data 
> corruption is difficult to reproduce in most cases and even harder to trace 
> to a root cause. We don't know whether it's a Spark issue or not. With checksum 
> support for shuffle, Spark itself can at least distinguish between disk and 
> network as the cause, which is very important for users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36325) Writing to hiveserver through jdbc throws ParseException

2021-08-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-36325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391638#comment-17391638
 ] 

Jesús Ricardo Ballesteros Molina commented on SPARK-36325:
--

Hello, first of all thank you for your reply. I used the dialect, but now I have 
another issue, and I don't know how to address this one.

 

 
{code:java}
import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcType, JdbcDialect}
import org.apache.spark.sql.types.StringType
import java.sql.Types
import org.apache.spark.sql.types.DataType
val HiveDialect = new JdbcDialect { 
override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2") || 
url.contains("hive2")
override def quoteIdentifier(colName: String): String ={ s"$colName" }
override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
 case StringType => Option(JdbcType("STRING", Types.VARCHAR))
 case _ => None
 }
}

JdbcDialects.registerDialect(HiveDialect)
df_linux.write.mode("overwrite")
 .format("jdbc")
 .option("driver","org.apache.hive.jdbc.HiveDriver")
 .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:1")
 .option("dbtable", "o365new")
 //.option("createTableColumnTypes", "_time VARCHAR(1024), raw_log 
VARCHAR(1024), service_name VARCHAR(1024), hostname VARCHAR(1024), pid 
VARCHAR(1024), username VARCHAR(1024), source_ip VARCHAR(1024)")
 .option("createTableColumnTypes", "time STRING, raw_log STRING, service_name 
STRING, hostname STRING, pid STRING, username STRING, source_ip STRING")
 .save()
{code}
 

I get this error:

 
{code:java}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 
(TID 423) (10.103.0.118 executor 2): java.sql.SQLFeatureNotSupportedException: 
Method not supported
 at 
org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:78)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:683)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
 
{code}
 

Maybe JDBC is not the way to write through the thrift server, but I don't know 
how else to do it. At the moment I am using another database, but I really want 
to use Spark SQL. If you think I should close this issue and maybe open it as 
something else, feel free to close the ticket.
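
For what it's worth, a hedged alternative sketch that avoids the Hive JDBC batch path 
entirely: if the Zeppelin Spark interpreter and the Thrift server point at the same Hive 
metastore (an assumption), the dataframe can be saved as a table directly and Superset 
can then read it through the Thrift server.

{code:scala}
// Hedged alternative sketch, assuming the Spark interpreter and the Thrift server share
// one metastore/warehouse. df_linux and o365new are the names used earlier in this thread.
df_linux.write
  .mode("overwrite")
  .format("parquet")
  .saveAsTable("o365new")   // registered in the metastore, so the Thrift server can see it
{code}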

 

 

 

> Writing to hiveserver through jdbc throws ParseException
> -
>
> Key: SPARK-36325
> URL: https://issues.apache.org/jira/browse/SPARK-36325
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: OS: Debian 10
> Spark version: 3.1.2
> Zeppelin Notebook: 0.9.0
> Jdbc driver:  org.apache.hive:hive-jdbc:3.1.2  
>Reporter: Jesús Ricardo Ballesteros Molina
>Priority: Major
>  Labels: spark, spark-sql
>
> Hello everyone, I am new to Spark and this is my first post. If I make a 
> mistake, please be kind to me; I have searched the web and haven't found 
> anything related. If this bug is a duplicate or something, please feel free to 
> close it and tell me where to look. 
> I am working with Zeppelin: I got a dataframe from the Solr API, processed it, 
> and I want to write it to a table through thrift and read that new table from 
> Apache Superset.
>  
> I have this df with this schema:
> {code:java}
> %spark
> df_linux.printSchema()
> root
>  |-- time: string (nullable = false)
>  |-- raw_log: string (nullable = false)
>  |-- service_name: string (nullable = false)
>  |-- hostname: string (nullable = false)
>  |-- pid: string (nullable = false)
>  |-- username: string (nullable = false)
>  |-- source_ip: string (nullable = false)
> {code}
>  
> And this content:
>  
> {code:java}
> %spark
> df_linux.show()
> ++++--+-++-+
> | time| raw_log|service_name| hostname| pid|username|source_ip|
> ++++--+-++-+
> |2021-07-28T07:41:53Z|Jul 28 07:41:52 s...| 
> sshd[11611]|sa3secessuperset01|11611| debian| 10.0.9.3|
> |2021-07-28T07:41:44Z|Jul 28 07:41:43 s...| 
> sshd[11590]|sa3secessuperset01|11590| debian| 10.0.9.3|
> |2021-07-27T08:46:11Z|Jul 27 08:46:10 s...| 
> sshd[16954]|sa3secessuperset01|16954| debian| 10.0.9.3|
> |2021-07-27T08:44:55Z|Jul 27 08:44:54 s...| 
> sshd[16511]|sa3secessuperset01|16511| debian| 10.0.9.3|
> |2021-07-27T08:30:03Z|Jul 27 08:30:02 s...| 
> sshd[14511]|sa3secessuperset01|14511| debian| 10.0.9.3|
> 

[jira] [Assigned] (SPARK-36206) Diagnose shuffle data corruption by checksum

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-36206:
---

Assignee: wuyi

> Diagnose shuffle data corruption by checksum
> 
>
> Key: SPARK-36206
> URL: https://issues.apache.org/jira/browse/SPARK-36206
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> After adding checksums in SPARK-35276, we can now leverage the checksums to 
> diagnose shuffle data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36206) Diagnose shuffle data corruption by checksum

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-36206.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33451
[https://github.com/apache/spark/pull/33451]

> Diagnose shuffle data corruption by checksum
> 
>
> Key: SPARK-36206
> URL: https://issues.apache.org/jira/browse/SPARK-36206
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
>
> After adding checksums in SPARK-35276, we can now leverage the checksums to 
> diagnose shuffle data corruption.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36383) NullPointerException throws during executor shutdown

2021-08-02 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-36383:
-
Summary: NullPointerException throws during executor shutdown  (was: Avoid 
NullPointerException during executor shutdown)

> NullPointerException throws during executor shutdown
> 
>
> Key: SPARK-36383
> URL: https://issues.apache.org/jira/browse/SPARK-36383
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0
>Reporter: wuyi
>Priority: Major
>
> {code:java}
> 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:318)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater
> java.lang.NullPointerException
>         at org.apache.spark.executor.Executor.stop(Executor.scala:324)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
>         at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
> java.lang.NullPointerException
>         at 
> org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
>         at org.apache.spark.executor.Executor.stop(Executor.scala:334)
>         at 
> org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
>         at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
>         at 
> org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
>         at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>         at scala.util.Try$.apply(Try.scala:213)
>         at 
> 

[jira] [Created] (SPARK-36383) Avoid NullPointerException during executor shutdown

2021-08-02 Thread wuyi (Jira)
wuyi created SPARK-36383:


 Summary: Avoid NullPointerException during executor shutdown
 Key: SPARK-36383
 URL: https://issues.apache.org/jira/browse/SPARK-36383
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.1.2, 3.0.3, 3.2.0, 3.3.0
Reporter: wuyi


{code:java}
21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller
java.lang.NullPointerException
        at org.apache.spark.executor.Executor.stop(Executor.scala:318)
        at 
org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
        at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.Try$.apply(Try.scala:213)
        at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater
java.lang.NullPointerException
        at org.apache.spark.executor.Executor.stop(Executor.scala:324)
        at 
org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
        at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.Try$.apply(Try.scala:213)
        at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0
java.lang.NullPointerException
        at 
org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
        at org.apache.spark.executor.Executor.stop(Executor.scala:334)
        at 
org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76)
        at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025)
        at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
        at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.util.Try$.apply(Try.scala:213)
        at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 

[jira] [Assigned] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36382:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36382:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391584#comment-17391584
 ] 

Apache Spark commented on SPARK-36382:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33611

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391585#comment-17391585
 ] 

Apache Spark commented on SPARK-36382:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33611

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36382) Remove noisy footer from the summary table for metrics

2021-08-02 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36382:
---
Summary: Remove noisy footer from the summary table for metrics  (was: 
Remove unnecessary footer from the summary table for metrics)

> Remove noisy footer from the summary table for metrics
> --
>
> Key: SPARK-36382
> URL: https://issues.apache.org/jira/browse/SPARK-36382
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> In the WebUI, some tables are implemented using DataTables 
> (https://datatables.net/).
> By default, tables created with DataTables show a footer that says `Showing 
> x to y of z entries`, which is helpful for tables whose entries can grow.
> But the summary table for metrics in StagePage cannot grow, so the footer is 
> a little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36382) Remove unnecessary footer from the summary table for metrics

2021-08-02 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-36382:
--

 Summary: Remove unnecessary footer from the summary table for 
metrics
 Key: SPARK-36382
 URL: https://issues.apache.org/jira/browse/SPARK-36382
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.3.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


In the WebUI, some tables are implemented using DataTables 
(https://datatables.net/).
By default, tables created with DataTables show a footer that says `Showing x 
to y of z entries`, which is helpful for tables whose entries can grow.
But the summary table for metrics in StagePage cannot grow, so the footer is a 
little bit noisy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages

2021-08-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-35918.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33308
[https://github.com/apache/spark/pull/33308]

> Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch 
> handling and error messages
> -
>
> Key: SPARK-35918
> URL: https://issues.apache.org/jira/browse/SPARK-35918
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0
>
>
> While working on [PR #31490|https://github.com/apache/spark/pull/31490] for 
> SPARK-34365, we discussed that there is room for improvement in how schema 
> mismatch errors are reported 
> ([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], 
> [comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]).
>  We can also consolidate more of the logic between AvroSerializer and 
> AvroDeserializer to avoid some duplication of error handling and consolidate 
> how these error messages are generated.
> This will essentially be taking the [logic from the initial proposal from PR 
> #31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178],
>  but applied separately from the current set of proposed changes to cut down 
> on PR size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages

2021-08-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-35918:
--

Assignee: Erik Krogen

> Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch 
> handling and error messages
> -
>
> Key: SPARK-35918
> URL: https://issues.apache.org/jira/browse/SPARK-35918
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> While working on [PR #31490|https://github.com/apache/spark/pull/31490] for 
> SPARK-34365, we discussed that there is room for improvement in how schema 
> mismatch errors are reported 
> ([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], 
> [comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]).
>  We can also consolidate more of the logic between AvroSerializer and 
> AvroDeserializer to avoid some duplication of error handling and consolidate 
> how these error messages are generated.
> This will essentially be taking the [logic from the initial proposal from PR 
> #31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178],
>  but applied separately from the current set of proposed changes to cut down 
> on PR size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36381:


Assignee: Apache Spark

> ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 
> command.
> --
>
> Key: SPARK-36381
> URL: https://issues.apache.org/jira/browse/SPARK-36381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
>
> The column-existence check for ALTER TABLE ADD/RENAME COLUMNS does not take 
> case sensitivity into account for the v2 command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391559#comment-17391559
 ] 

Apache Spark commented on SPARK-36381:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/33610

> ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 
> command.
> --
>
> Key: SPARK-36381
> URL: https://issues.apache.org/jira/browse/SPARK-36381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> The column-existence check for ALTER TABLE ADD/RENAME COLUMNS does not take 
> case sensitivity into account for the v2 command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36381:


Assignee: (was: Apache Spark)

> ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 
> command.
> --
>
> Key: SPARK-36381
> URL: https://issues.apache.org/jira/browse/SPARK-36381
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: PengLei
>Priority: Major
>
> The column-existence check for ALTER TABLE ADD/RENAME COLUMNS does not take 
> case sensitivity into account for the v2 command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS check exist does not use case sensitive for v2 command.

2021-08-02 Thread PengLei (Jira)
PengLei created SPARK-36381:
---

 Summary: ALTER TABLE ADD/RENAME COLUMNS check exist does not use 
case sensitive for v2 command.
 Key: SPARK-36381
 URL: https://issues.apache.org/jira/browse/SPARK-36381
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: PengLei


The column-existence check for ALTER TABLE ADD/RENAME COLUMNS does not take 
case sensitivity into account for the v2 command.
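
A hypothetical reproduction sketch follows (catalog, namespace, and provider names are 
made up; it assumes a DataSource V2 catalog registered as `testcat`):

{code:scala}
// Hypothetical sketch of the reported behavior; names are illustrative only.
// With spark.sql.caseSensitive=false, the v2 existence check should compare
// column names case-insensitively.
spark.conf.set("spark.sql.caseSensitive", "false")
spark.sql("CREATE TABLE testcat.ns.t (id BIGINT, data STRING) USING foo")
spark.sql("ALTER TABLE testcat.ns.t ADD COLUMNS (ID DOUBLE)")        // expected: duplicate-column error
spark.sql("ALTER TABLE testcat.ns.t RENAME COLUMN DATA TO payload")  // expected: resolves column `data`
{code}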



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36237) SparkUI should bind handler after application started

2021-08-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-36237.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33457
[https://github.com/apache/spark/pull/33457]

> SparkUI should bind handler after application started
> -
>
> Key: SPARK-36237
> URL: https://issues.apache.org/jira/browse/SPARK-36237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>
> When we use Prometheus to fetch metrics, the endpoint is often called before 
> the application has started.
> Then it throws a lot of NoSuchElementException exceptions:
> {code:java}
> 21/07/19 04:53:37 INFO Client: Preparing resources for our AM container
> 21/07/19 04:53:37 INFO Client: Uploading resource 
> hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> 
> hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz
> 21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: 
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
>   at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> 21/07/19 04:53:37 WARN ServletHandler: /jobs/
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
>   at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> 

[jira] [Assigned] (SPARK-36237) SparkUI should bind handler after application started

2021-08-02 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-36237:
--

Assignee: angerszhu

> SparkUI should bind handler after application started
> -
>
> Key: SPARK-36237
> URL: https://issues.apache.org/jira/browse/SPARK-36237
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> When we use Prometheus to fetch metrics, the call always arrives before the 
> application has started.
> Then a lot of exceptions, such as the NoSuchElementException below, are thrown:
> {code:java}
> 21/07/19 04:53:37 INFO Client: Preparing resources for our AM container
> 21/07/19 04:53:37 INFO Client: Uploading resource 
> hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> 
> hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz
> 21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: 
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
>   at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:748)
> 21/07/19 04:53:37 WARN ServletHandler: /jobs/
> java.util.NoSuchElementException: Failed to get the application information. 
> If you are starting up Spark, please wait a while until it's ready.
>   at 
> org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
>   at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> 

[jira] [Assigned] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36380:


Assignee: Apache Spark

> Simplify the logical plan names for ALTER TABLE ... COLUMN
> --
>
> Key: SPARK-36380
> URL: https://issues.apache.org/jira/browse/SPARK-36380
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391505#comment-17391505
 ] 

Apache Spark commented on SPARK-36380:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33609

> Simplify the logical plan names for ALTER TABLE ... COLUMN
> --
>
> Key: SPARK-36380
> URL: https://issues.apache.org/jira/browse/SPARK-36380
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36380:


Assignee: (was: Apache Spark)

> Simplify the logical plan names for ALTER TABLE ... COLUMN
> --
>
> Key: SPARK-36380
> URL: https://issues.apache.org/jira/browse/SPARK-36380
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391506#comment-17391506
 ] 

Apache Spark commented on SPARK-36380:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/33609

> Simplify the logical plan names for ALTER TABLE ... COLUMN
> --
>
> Key: SPARK-36380
> URL: https://issues.apache.org/jira/browse/SPARK-36380
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN

2021-08-02 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-36380:
---

 Summary: Simplify the logical plan names for ALTER TABLE ... COLUMN
 Key: SPARK-36380
 URL: https://issues.apache.org/jira/browse/SPARK-36380
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36372) ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for v2 command

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36372.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 33600
[https://github.com/apache/spark/pull/33600]

> ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for 
> v2 command
> 
>
> Key: SPARK-36372
> URL: https://issues.apache.org/jira/browse/SPARK-36372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.2.0
>
>
> ALTER TABLE ADD COLUMNS currently doesn't check for duplicates among the specified 
> columns for the v2 command.
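
As a hedged illustration of what the check should catch (the catalog and table names below are hypothetical):

{code:scala}
// The same column appearing twice in a single ADD COLUMNS list should be
// rejected during analysis rather than reaching the v2 catalog.
spark.sql("CREATE TABLE testcat.ns.t (id INT) USING parquet")
spark.sql("ALTER TABLE testcat.ns.t ADD COLUMNS (age INT, age LONG)")
{code}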



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36372) ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for v2 command

2021-08-02 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36372:
---

Assignee: Terry Kim

> ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for 
> v2 command
> 
>
> Key: SPARK-36372
> URL: https://issues.apache.org/jira/browse/SPARK-36372
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> ALTER TABLE ADD COLUMNS currently doesn't check for duplicates among the specified 
> columns for the v2 command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36379:


Assignee: Apache Spark

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391400#comment-17391400
 ] 

Apache Spark commented on SPARK-36379:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/33608

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36379:


Assignee: (was: Apache Spark)

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36175) Support TimestampNTZ in Avro data source

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391385#comment-17391385
 ] 

Apache Spark commented on SPARK-36175:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/33607

> Support TimestampNTZ in Avro data source 
> -
>
> Key: SPARK-36175
> URL: https://issues.apache.org/jira/browse/SPARK-36175
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> As per the Avro spec 
> https://avro.apache.org/docs/1.10.2/spec.html#Local+timestamp+%28microsecond+precision%29,
>  Spark can convert TimestampNTZ type from/to Avro's Local timestamp type.
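
A sketch of the Avro side of this mapping; the record name, field name, and path below are placeholders, and the read call only shows where spark-avro would apply the conversion:

{code:scala}
// An Avro long field annotated with the local-timestamp-micros logical type,
// which the linked spec defines as a timestamp without a time zone.
val avroSchema =
  """{
    |  "type": "record",
    |  "name": "Event",
    |  "fields": [
    |    {"name": "ts",
    |     "type": {"type": "long", "logicalType": "local-timestamp-micros"}}
    |  ]
    |}""".stripMargin

// With this ticket, such a field is expected to surface as Spark's TimestampNTZ.
val df = spark.read.format("avro").option("avroSchema", avroSchema).load("/path/to/events")
{code}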



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36175) Support TimestampNTZ in Avro data source

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391386#comment-17391386
 ] 

Apache Spark commented on SPARK-36175:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/33607

> Support TimestampNTZ in Avro data source 
> -
>
> Key: SPARK-36175
> URL: https://issues.apache.org/jira/browse/SPARK-36175
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> As per the Avro spec 
> https://avro.apache.org/docs/1.10.2/spec.html#Local+timestamp+%28microsecond+precision%29,
>  Spark can convert TimestampNTZ type from/to Avro's Local timestamp type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35815) Allow delayThreshold for watermark to be represented as ANSI day-time/year-month interval literals

2021-08-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391373#comment-17391373
 ] 

Apache Spark commented on SPARK-35815:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/33606

> Allow delayThreshold for watermark to be represented as ANSI 
> day-time/year-month interval literals
> --
>
> Key: SPARK-35815
> URL: https://issues.apache.org/jira/browse/SPARK-35815
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Structured Streaming
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.2.0
>
>
> delayThreshold parameter of DataFrame.withWatermark should handle ANSI 
> day-time/year-month interval literals.
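
A minimal sketch of the two spellings, assuming a streaming DataFrame with an {{eventTime}} column; which ANSI literal forms are accepted follows this ticket rather than this sketch:

{code:scala}
import org.apache.spark.sql.functions.col

// The built-in rate source provides a `timestamp` column we can watermark on.
val events = spark.readStream.format("rate").load()
  .withColumn("eventTime", col("timestamp"))

// Already supported: the plain interval string.
val withPlainDelay = events.withWatermark("eventTime", "10 seconds")

// With this change, an ANSI day-time interval literal should also be accepted.
val withAnsiDelay = events.withWatermark("eventTime", "INTERVAL '10' SECOND")
{code}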



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36379:
-
Issue Type: Bug  (was: Improvement)

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-36379:
-
Priority: Minor  (was: Major)

> Null at root level of a JSON array causes the parsing failure (w/ permissive 
> mode)
> --
>
> Key: SPARK-36379
> URL: https://issues.apache.org/jira/browse/SPARK-36379
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
> "str"}]""").toDS).collect()
> {code}
> {code}
> ...
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 
> (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> {code}
> Since the mode (by default) is permissive, we shouldn't just fail like above.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)

2021-08-02 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-36379:


 Summary: Null at root level of a JSON array causes the parsing 
failure (w/ permissive mode)
 Key: SPARK-36379
 URL: https://issues.apache.org/jira/browse/SPARK-36379
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.2, 3.2.0, 3.3.0
Reporter: Hyukjin Kwon



{code}
scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": 
"str"}]""").toDS).collect()
{code}

{code}
...
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 
1) (172.30.3.20 executor driver): java.lang.NullPointerException
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
{code}

Since the mode (by default) is permissive, we shouldn't just fail like above.
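
For reference, a sketch of the behaviour permissive mode is meant to give; the expected output is an assumption based on how permissive mode handles other malformed records, not a statement of the final fix:

{code:scala}
// Run in spark-shell; spelling out the default PERMISSIVE mode for clarity.
val ds = Seq("""[{"a": "str"}, null, {"a": "str"}]""").toDS
val df = spark.read.option("mode", "PERMISSIVE").json(ds)

// Expected: the two objects parse normally and the bare null becomes a
// null/malformed row, instead of a NullPointerException at collect time.
df.show()
{code}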



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-35917:

Fix Version/s: (was: 3.2.0)

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since 3.2 is going to be cut soon, we will not be able to 
> get the pending tasks reviewed and merged. A few of the pending tasks make 
> protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.
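
A minimal sketch of the kind of guard described here, assuming the relevant flag is {{spark.shuffle.push.enabled}}; this is not Spark's actual implementation:

{code:scala}
// Fail fast if a user turns the (incomplete) feature on.
def assertPushBasedShuffleDisabled(conf: Map[String, String]): Unit = {
  val enabled = conf.getOrElse("spark.shuffle.push.enabled", "false").toBoolean
  if (enabled) {
    throw new UnsupportedOperationException(
      "Push-based shuffle is not yet complete and cannot be enabled.")
  }
}

// The same check would run on both the client and the server side.
assertPushBasedShuffleDisabled(Map("spark.shuffle.push.enabled" -> "true"))
{code}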



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan updated SPARK-35917:

Fix Version/s: 3.2.0

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
> Fix For: 3.2.0
>
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since 3.2 is going to be cut soon, we will not be able to 
> get the pending tasks reviewed and merged. A few of the pending tasks make 
> protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-35917:
---

Assignee: (was: Mridul Muralidharan)

> Disable push-based shuffle until the feature is complete
> 
>
> Key: SPARK-35917
> URL: https://issues.apache.org/jira/browse/SPARK-35917
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>
> Push-based shuffle is partially merged in Apache master, but some of the tasks 
> are still incomplete. Since 3.2 is going to be cut soon, we will not be able to 
> get the pending tasks reviewed and merged. A few of the pending tasks make 
> protocol changes to the push-based shuffle protocols, so we would like to 
> prevent users from enabling push-based shuffle on both the client and the 
> server until the push-based shuffle implementation is complete. 
> We can prevent push-based shuffle from being used by throwing 
> {{UnsupportedOperationException}} (or something like that) on both the client 
> and the server when the user tries to enable it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36306) Refactor seventeenth set of 20 query execution errors to use error classes

2021-08-02 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391362#comment-17391362
 ] 

PengLei commented on SPARK-36306:
-

working on this

> Refactor seventeenth set of 20 query execution errors to use error classes
> --
>
> Key: SPARK-36306
> URL: https://issues.apache.org/jira/browse/SPARK-36306
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the seventeenth set of 20.
> {code:java}
> legacyCheckpointDirectoryExistsError
> subprocessExitedError
> outputDataTypeUnsupportedByNodeWithoutSerdeError
> invalidStartIndexError
> concurrentModificationOnExternalAppendOnlyUnsafeRowArrayError
> doExecuteBroadcastNotImplementedError
> databaseNameConflictWithSystemPreservedDatabaseError
> commentOnTableUnsupportedError
> unsupportedUpdateColumnNullabilityError
> renameColumnUnsupportedForOlderMySQLError
> failedToExecuteQueryError
> nestedFieldUnsupportedError
> transformationsAndActionsNotInvokedByDriverError
> repeatedPivotsUnsupportedError
> pivotNotAfterGroupByUnsupportedError
> {code}
> For more detail, see the parent ticket SPARK-36094.
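
As a schematic of the refactor pattern (the registry and method bodies below are made-up stand-ins, not the actual Spark error-class API):

{code:scala}
object ErrorClassSketch {
  // Stand-in for the JSON registry of error classes and message templates.
  private val errorClasses = Map(
    "UNSUPPORTED_OPERATION" -> "The operation is not supported: %s")

  // Before: each error method hard-codes its own message string.
  def commentOnTableUnsupportedErrorOld(): Throwable =
    new UnsupportedOperationException("comment on table is not supported")

  // After: the message comes from a named error class plus parameters, so the
  // wording stays consistent and the class is machine-readable.
  def commentOnTableUnsupportedErrorNew(): Throwable =
    new UnsupportedOperationException(
      errorClasses("UNSUPPORTED_OPERATION").format("COMMENT ON TABLE"))
}
{code}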



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36305) Refactor sixteenth set of 20 query execution errors to use error classes

2021-08-02 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391361#comment-17391361
 ] 

PengLei commented on SPARK-36305:
-

working on this

> Refactor sixteenth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36305
> URL: https://issues.apache.org/jira/browse/SPARK-36305
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the sixteenth set of 20.
> {code:java}
> cannotDropMultiPartitionsOnNonatomicPartitionTableError
> truncateMultiPartitionUnsupportedError
> overwriteTableByUnsupportedExpressionError
> dynamicPartitionOverwriteUnsupportedByTableError
> failedMergingSchemaError
> cannotBroadcastTableOverMaxTableRowsError
> cannotBroadcastTableOverMaxTableBytesError
> notEnoughMemoryToBuildAndBroadcastTableError
> executeCodePathUnsupportedError
> cannotMergeClassWithOtherClassError
> continuousProcessingUnsupportedByDataSourceError
> failedToReadDataError
> failedToGenerateEpochMarkerError
> foreachWriterAbortedDueToTaskFailureError
> integerOverflowError
> failedToReadDeltaFileError
> failedToReadSnapshotFileError
> cannotPurgeAsBreakInternalStateError
> cleanUpSourceFilesUnsupportedError
> latestOffsetNotCalledError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36304) Refactor fifteenth set of 20 query execution errors to use error classes

2021-08-02 Thread PengLei (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391360#comment-17391360
 ] 

PengLei commented on SPARK-36304:
-

working on this

> Refactor fifteenth set of 20 query execution errors to use error classes
> 
>
> Key: SPARK-36304
> URL: https://issues.apache.org/jira/browse/SPARK-36304
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Karen Feng
>Priority: Major
>
> Refactor some exceptions in 
> [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala]
>  to use error classes.
> There are currently ~350 exceptions in this file; so this PR only focuses on 
> the fifteenth set of 20.
> {code:java}
> unsupportedOperationExceptionError
> nullLiteralsCannotBeCastedError
> notUserDefinedTypeError
> cannotLoadUserDefinedTypeError
> timeZoneIdNotSpecifiedForTimestampTypeError
> notPublicClassError
> primitiveTypesNotSupportedError
> fieldIndexOnRowWithoutSchemaError
> valueIsNullError
> onlySupportDataSourcesProvidingFileFormatError
> failToSetOriginalPermissionBackError
> failToSetOriginalACLBackError
> multiFailuresInStageMaterializationError
> unrecognizedCompressionSchemaTypeIDError
> getParentLoggerNotImplementedError
> cannotCreateParquetConverterForTypeError
> cannotCreateParquetConverterForDecimalTypeError
> cannotCreateParquetConverterForDataTypeError
> cannotAddMultiPartitionsOnNonatomicPartitionTableError
> userSpecifiedSchemaUnsupportedByDataSourceError
> {code}
> For more detail, see the parent ticket SPARK-36094.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies

2021-08-02 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-36378:
---

Assignee: (was: Mridul Muralidharan)

> Minor changes to address a few identified server side inefficiencies
> 
>
> Key: SPARK-36378
> URL: https://issues.apache.org/jira/browse/SPARK-36378
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.2.0
>Reporter: Min Shen
>Priority: Major
>
> With the SPIP ticket close to being finished, we have done some performance 
> evaluations to compare the performance of push-based shuffle in upstream 
> Spark with the production version we have internally at LinkedIn.
> The evaluations have revealed a few regressions and also some additional perf 
> improvement opportunity.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


