[jira] [Reopened] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-36367: -- > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36367: - Fix Version/s: (was: 3.2.0) > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36367. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33614 [https://github.com/apache/spark/pull/33614] > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36331) Add SQLSTATE guideline
[ https://issues.apache.org/jira/browse/SPARK-36331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36331: Assignee: Karen Feng > Add SQLSTATE guideline > -- > > Key: SPARK-36331 > URL: https://issues.apache.org/jira/browse/SPARK-36331 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Karen Feng >Assignee: Karen Feng >Priority: Major > > Add SQLSTATE guideline to the error guidelines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36331) Add SQLSTATE guideline
[ https://issues.apache.org/jira/browse/SPARK-36331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36331. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33560 [https://github.com/apache/spark/pull/33560] > Add SQLSTATE guideline > -- > > Key: SPARK-36331 > URL: https://issues.apache.org/jira/browse/SPARK-36331 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Karen Feng >Assignee: Karen Feng >Priority: Major > Fix For: 3.2.0 > > > Add SQLSTATE guideline to the error guidelines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
[ https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391948#comment-17391948 ] Apache Spark commented on SPARK-36381: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33618 > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case > sensitivity for v2 commands. > -- > > Key: SPARK-36381 > URL: https://issues.apache.org/jira/browse/SPARK-36381 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case > sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
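For context, a minimal repro sketch of the reported gap. It assumes a v2 catalog registered as {{testcat}}; the catalog, namespace, table, and column names are hypothetical.
{code:scala}
// Hypothetical repro. With spark.sql.caseSensitive=false (the default),
// "ID" and "id" should be treated as the same column by the existence check.
spark.conf.set("spark.sql.caseSensitive", "false")
spark.sql("CREATE TABLE testcat.ns.t (ID INT) USING parquet")
// Expected: a duplicate-column analysis error; reported: the v2 check does
// not respect the case-sensitivity setting, so the conflict can slip through.
spark.sql("ALTER TABLE testcat.ns.t ADD COLUMNS (id INT)")
{code}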
[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
[ https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391947#comment-17391947 ] Apache Spark commented on SPARK-36381: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33618 > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case > sensitivity for v2 commands. > -- > > Key: SPARK-36381 > URL: https://issues.apache.org/jira/browse/SPARK-36381 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case > sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client
[ https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391937#comment-17391937 ] Apache Spark commented on SPARK-35548: -- User 'zhuqi-lucas' has created a pull request for this issue: https://github.com/apache/spark/pull/33617 > Handling new attempt has started error message in BlockPushErrorHandler in > client > - > > Key: SPARK-35548 > URL: https://issues.apache.org/jira/browse/SPARK-35548 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Ye Zhou >Priority: Major > > In SPARK-33350, a new type of error message was introduced in > BlockPushErrorHandler which indicates that the PushBlockStream message was > received after a new application attempt has started. This error message > should be correctly handled on the client without retrying the block push. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
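A minimal sketch of the client-side handling being proposed; the marker string and helper name below are assumptions for illustration, not Spark's actual API.
{code:scala}
// Sketch only: decide whether a failed block push is worth retrying.
// The server-side error text is assumed to contain this marker.
val tooOldAttemptMarker = "a new application attempt has started"

def shouldRetryBlockPush(t: Throwable): Boolean = {
  val msg = Option(t.getMessage).getOrElse("")
  // Once a new attempt has started, blocks pushed by the old attempt can
  // never be merged, so retrying the push is pure waste.
  !msg.contains(tooOldAttemptMarker)
}
{code}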
[jira] [Commented] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client
[ https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391936#comment-17391936 ] Apache Spark commented on SPARK-35548: -- User 'zhuqi-lucas' has created a pull request for this issue: https://github.com/apache/spark/pull/33617 > Handling new attempt has started error message in BlockPushErrorHandler in > client > - > > Key: SPARK-35548 > URL: https://issues.apache.org/jira/browse/SPARK-35548 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Ye Zhou >Priority: Major > > In SPARK-33350, a new type of error message was introduced in > BlockPushErrorHandler which indicates that the PushBlockStream message was > received after a new application attempt has started. This error message > should be correctly handled on the client without retrying the block push. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client
[ https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35548: Assignee: (was: Apache Spark) > Handling new attempt has started error message in BlockPushErrorHandler in > client > - > > Key: SPARK-35548 > URL: https://issues.apache.org/jira/browse/SPARK-35548 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Ye Zhou >Priority: Major > > In SPARK-33350, a new type of error message was introduced in > BlockPushErrorHandler which indicates that the PushBlockStream message was > received after a new application attempt has started. This error message > should be correctly handled on the client without retrying the block push. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35548) Handling new attempt has started error message in BlockPushErrorHandler in client
[ https://issues.apache.org/jira/browse/SPARK-35548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35548: Assignee: Apache Spark > Handling new attempt has started error message in BlockPushErrorHandler in > client > - > > Key: SPARK-35548 > URL: https://issues.apache.org/jira/browse/SPARK-35548 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Ye Zhou >Assignee: Apache Spark >Priority: Major > > In SPARK-33350, a new type of error message was introduced in > BlockPushErrorHandler which indicates that the PushBlockStream message was > received after a new application attempt has started. This error message > should be correctly handled on the client without retrying the block push. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36389: Assignee: (was: Apache Spark) > Revert the change that accepts negative mapId in ShuffleBlockId > --- > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
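For context, a simplified sketch of the parsing change being reverted; the regexes are illustrative and not Spark's exact {{BlockId}} patterns.
{code:scala}
// SPARK-32922 relaxed the mapId group to accept -1; with push-merged blocks
// now carried by their own BlockId type, the strict form can be restored.
val relaxed = "shuffle_([0-9]+)_(-?[0-9]+)_([0-9]+)".r // accepts shuffle_0_-1_2
val strict  = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r   // rejects negative mapId

def parses(p: scala.util.matching.Regex, s: String): Boolean =
  p.pattern.matcher(s).matches()

parses(relaxed, "shuffle_0_-1_2") // true  (pre-revert behavior)
parses(strict,  "shuffle_0_-1_2") // false (post-revert behavior)
{code}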
[jira] [Commented] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391913#comment-17391913 ] Apache Spark commented on SPARK-36389: -- User 'otterc' has created a pull request for this issue: https://github.com/apache/spark/pull/33616 > Revert the change that accepts negative mapId in ShuffleBlockId > --- > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36389: Assignee: Apache Spark > Revert the change that accepts negative mapId in ShuffleBlockId > --- > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Assignee: Apache Spark >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391912#comment-17391912 ] Apache Spark commented on SPARK-36389: -- User 'otterc' has created a pull request for this issue: https://github.com/apache/spark/pull/33616 > Revert the change that accepts negative mapId in ShuffleBlockId > --- > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36389) Revert the change that accepts negative mapId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-36389: -- Description: With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a negative mapId. This was to support push-based shuffle where {{-1}} as mapId indicated a push-merged block. However with SPARK-32923, a different type of {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to {{ShuffleBlockId}} was missed. (was: With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a negative mapId. This was to support push-based shuffle where {{-1}} as mapId indicated a push-merged block. However with SPARK-32923, a different type of {{BlockId}} was introduce - {{ShuffleMergedId}}, but reverting the change to {{ShuffleBlockId}} was missed. ) > Revert the change that accepts negative mapId > - > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36389) Revert the change that accepts negative mapId in ShuffleBlockId
[ https://issues.apache.org/jira/browse/SPARK-36389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chandni Singh updated SPARK-36389: -- Summary: Revert the change that accepts negative mapId in ShuffleBlockId (was: Revert the change that accepts negative mapId) > Revert the change that accepts negative mapId in ShuffleBlockId > --- > > Key: SPARK-36389 > URL: https://issues.apache.org/jira/browse/SPARK-36389 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Chandni Singh >Priority: Minor > > With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a > negative mapId. This was to support push-based shuffle where {{-1}} as mapId > indicated a push-merged block. However with SPARK-32923, a different type of > {{BlockId}} was introduced - {{ShuffleMergedId}}, but reverting the change to > {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36389) Revert the change that accepts negative mapId
Chandni Singh created SPARK-36389: - Summary: Revert the change that accepts negative mapId Key: SPARK-36389 URL: https://issues.apache.org/jira/browse/SPARK-36389 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.0 Reporter: Chandni Singh With SPARK-32922, we added a change that {{ShuffleBlockId}} can have a negative mapId. This was to support push-based shuffle where {{-1}} as mapId indicated a push-merged block. However with SPARK-32923, a different type of {{BlockId}} was introduce - {{ShuffleMergedId}}, but reverting the change to {{ShuffleBlockId}} was missed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36374) Push-based shuffle documentation
[ https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391906#comment-17391906 ] Apache Spark commented on SPARK-36374: -- User 'venkata91' has created a pull request for this issue: https://github.com/apache/spark/pull/33615 > Push-based shuffle documentation > > > Key: SPARK-36374 > URL: https://issues.apache.org/jira/browse/SPARK-36374 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36374) Push-based shuffle documentation
[ https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36374: Assignee: Apache Spark > Push-based shuffle documentation > > > Key: SPARK-36374 > URL: https://issues.apache.org/jira/browse/SPARK-36374 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Venkata krishnan Sowrirajan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36374) Push-based shuffle documentation
[ https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36374: Assignee: (was: Apache Spark) > Push-based shuffle documentation > > > Key: SPARK-36374 > URL: https://issues.apache.org/jira/browse/SPARK-36374 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36374) Push-based shuffle documentation
[ https://issues.apache.org/jira/browse/SPARK-36374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391907#comment-17391907 ] Apache Spark commented on SPARK-36374: -- User 'venkata91' has created a pull request for this issue: https://github.com/apache/spark/pull/33615 > Push-based shuffle documentation > > > Key: SPARK-36374 > URL: https://issues.apache.org/jira/browse/SPARK-36374 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391860#comment-17391860 ] Apache Spark commented on SPARK-36367: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33614 > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36359) Coalesce drops all expressions after the first non-nullable expression
[ https://issues.apache.org/jira/browse/SPARK-36359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-36359: Summary: Coalesce drops all expressions after the first non-nullable expression (was: Coalesce returns the first expression if it is non nullable) > Coalesce drops all expressions after the first non-nullable expression > - > > Key: SPARK-36359 > URL: https://issues.apache.org/jira/browse/SPARK-36359 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
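An illustration of the intended simplification, runnable in spark-shell; the column and literal choices are arbitrary.
{code:scala}
// Any child after the first non-nullable child of Coalesce is unreachable.
import org.apache.spark.sql.functions._
import spark.implicits._

val df = spark.range(5).select(when($"id" > 2, $"id").as("a")) // "a" is nullable
// lit(1) is non-nullable, so rand() can never be evaluated; the proposed
// rule should prune the expression down to coalesce(a, 1).
df.select(coalesce($"a", lit(1), rand())).explain()
{code}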
[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391858#comment-17391858 ] Apache Spark commented on SPARK-36345: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33598 > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36367: - Affects Version/s: (was: 3.2.0) 3.3.0 > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391857#comment-17391857 ] Apache Spark commented on SPARK-36345: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33598 > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36367: Assignee: Haejoon Lee > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Haejoon Lee >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36373) DecimalPrecision only adds necessary casts
[ https://issues.apache.org/jira/browse/SPARK-36373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-36373: --- Assignee: Yuming Wang > DecimalPrecision only adds necessary casts > > > Key: SPARK-36373 > URL: https://issues.apache.org/jira/browse/SPARK-36373 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > For example: > {noformat} > EqualTo(AttributeReference("d1", DecimalType(5, 2))(), > AttributeReference("d2", DecimalType(2, 1))()) > {noformat} > It will add a useless cast to {{d1}}: > {noformat} > (cast(d1#6 as decimal(5,2)) = cast(d2#7 as decimal(5,2))) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
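A runnable way to observe the cast placement described above; the temp view and column names are arbitrary.
{code:scala}
// With d1: decimal(5,2) and d2: decimal(2,1), the widest common type is
// decimal(5,2), which d1 already has, so only d2 should need a cast.
spark.sql("""CREATE OR REPLACE TEMP VIEW t AS
  SELECT CAST(1.23 AS DECIMAL(5,2)) AS d1, CAST(1.2 AS DECIMAL(2,1)) AS d2""")
spark.sql("SELECT d1 = d2 FROM t").explain(true)
// before: (cast(d1 as decimal(5,2)) = cast(d2 as decimal(5,2)))
// after:  (d1 = cast(d2 as decimal(5,2)))
{code}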
[jira] [Resolved] (SPARK-36373) DecimalPrecision only adds necessary casts
[ https://issues.apache.org/jira/browse/SPARK-36373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-36373. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33602 [https://github.com/apache/spark/pull/33602] > DecimalPrecision only adds necessary casts > > > Key: SPARK-36373 > URL: https://issues.apache.org/jira/browse/SPARK-36373 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.3.0 > > > For example: > {noformat} > EqualTo(AttributeReference("d1", DecimalType(5, 2))(), > AttributeReference("d2", DecimalType(2, 1))()) > {noformat} > It will add a useless cast to {{d1}}: > {noformat} > (cast(d1#6 as decimal(5,2)) = cast(d2#7 as decimal(5,2))) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36137) HiveShim always falls back to getAllPartitionsOf regardless of whether directSQL is enabled in the remote HMS
[ https://issues.apache.org/jira/browse/SPARK-36137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-36137. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33382 [https://github.com/apache/spark/pull/33382] > HiveShim always falls back to getAllPartitionsOf regardless of whether > directSQL is enabled in the remote HMS > --- > > Key: SPARK-36137 > URL: https://issues.apache.org/jira/browse/SPARK-36137 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0 > > > At the moment {{getPartitionsByFilter}} in the Hive shim only falls back to > {{getAllPartitionsOf}} when {{hive.metastore.try.direct.sql}} is enabled in > the remote HMS. However, in certain cases the remote HMS will fall back to > ORM (which only supports string types for partition columns) to query the > underlying RDBMS even if this config is set to true, and Spark will not be > able to recover from the error and will just fail the query. > For instance, we encountered this bug (HIVE-21497) in an HMS running Hive > 3.1.2, and Spark was not able to push down a filter for a {{date}} column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
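A minimal sketch of the fallback shape after the change; the helper names below are stubs standing in for the real metastore calls, not Spark's actual methods.
{code:scala}
// The stubs simulate an HMS that rejects a pushed-down filter despite
// hive.metastore.try.direct.sql=true (e.g. HIVE-21497).
def getPartitionsByFilter(table: String, filter: String): Seq[String] = {
  def pushDownToHms(t: String, f: String): Seq[String] =
    throw new RuntimeException("HMS fell back to ORM; non-string filter rejected")
  def listAllPartitions(t: String): Seq[String] =
    Seq("ds=2021-08-01", "ds=2021-08-02")

  try pushDownToHms(table, filter)
  catch {
    case _: RuntimeException =>
      // Previously gated on the HMS-side directSQL setting; now always
      // attempted, since the HMS may use ORM regardless of that setting.
      listAllPartitions(table)
  }
}

getPartitionsByFilter("t", "ds = date'2021-08-01'") // falls back, lists all
{code}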
[jira] [Assigned] (SPARK-36137) HiveShim always falls back to getAllPartitionsOf regardless of whether directSQL is enabled in the remote HMS
[ https://issues.apache.org/jira/browse/SPARK-36137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-36137: --- Assignee: Chao Sun > HiveShim always falls back to getAllPartitionsOf regardless of whether > directSQL is enabled in the remote HMS > --- > > Key: SPARK-36137 > URL: https://issues.apache.org/jira/browse/SPARK-36137 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > At the moment {{getPartitionsByFilter}} in the Hive shim only falls back to > {{getAllPartitionsOf}} when {{hive.metastore.try.direct.sql}} is enabled in > the remote HMS. However, in certain cases the remote HMS will fall back to > ORM (which only supports string types for partition columns) to query the > underlying RDBMS even if this config is set to true, and Spark will not be > able to recover from the error and will just fail the query. > For instance, we encountered this bug (HIVE-21497) in an HMS running Hive > 3.1.2, and Spark was not able to push down a filter for a {{date}} column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies
[ https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36378: Assignee: (was: Apache Spark) > Minor changes to address a few identified server side inefficiencies > > > Key: SPARK-36378 > URL: https://issues.apache.org/jira/browse/SPARK-36378 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Min Shen >Priority: Major > > With the SPIP ticket close to being finished, we have done some performance > evaluations to compare the performance of push-based shuffle in upstream > Spark with the production version we have internally at LinkedIn. > The evaluations have revealed a few regressions and also some additional perf > improvement opportunities. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36378) Minor changes to address a few identified server side inefficiencies
[ https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391832#comment-17391832 ] Apache Spark commented on SPARK-36378: -- User 'Victsm' has created a pull request for this issue: https://github.com/apache/spark/pull/33613 > Minor changes to address a few identified server side inefficiencies > > > Key: SPARK-36378 > URL: https://issues.apache.org/jira/browse/SPARK-36378 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Min Shen >Priority: Major > > With the SPIP ticket close to being finished, we have done some performance > evaluations to compare the performance of push-based shuffle in upstream > Spark with the production version we have internally at LinkedIn. > The evaluations have revealed a few regressions and also some additional perf > improvement opportunities. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies
[ https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36378: Assignee: Apache Spark > Minor changes to address a few identified server side inefficiencies > > > Key: SPARK-36378 > URL: https://issues.apache.org/jira/browse/SPARK-36378 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Min Shen >Assignee: Apache Spark >Priority: Major > > With the SPIP ticket close to being finished, we have done some performance > evaluations to compare the performance of push-based shuffle in upstream > Spark with the production version we have internally at LinkedIn. > The evaluations have revealed a few regressions and also some additional perf > improvement opportunities. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36388) Fix DataFrame groupby-rolling to follow pandas 1.3
Takuya Ueshin created SPARK-36388: - Summary: Fix DataFrame groupby-rolling to follow pandas 1.3 Key: SPARK-36388 URL: https://issues.apache.org/jira/browse/SPARK-36388 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-36386: -- Summary: Fix DataFrame groupby-expanding to follow pandas 1.3 (was: Fix groupby-expanding to follow pandas 1.3) > Fix DataFrame groupby-expanding to follow pandas 1.3 > > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36387) Fix Series.astype from datetime to nullable string
Takuya Ueshin created SPARK-36387: - Summary: Fix Series.astype from datetime to nullable string Key: SPARK-36387 URL: https://issues.apache.org/jira/browse/SPARK-36387 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36386) Fix groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-36386: -- Summary: Fix groupby-expanding to follow pandas 1.3 (was: Fix expanding to follow pandas 1.3) > Fix groupby-expanding to follow pandas 1.3 > -- > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36386) Fix expanding to follow pandas 1.3
Takuya Ueshin created SPARK-36386: - Summary: Fix expanding to follow pandas 1.3 Key: SPARK-36386 URL: https://issues.apache.org/jira/browse/SPARK-36386 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30602) SPIP: Support push-based shuffle to improve shuffle efficiency
[ https://issues.apache.org/jira/browse/SPARK-30602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391745#comment-17391745 ] Min Shen commented on SPARK-30602: -- [~mridulm80], thanks for shepherding this work and your reviews on the PRs as well! BTW, could you please add me as the assignee of this ticket to properly credit the work? > SPIP: Support push-based shuffle to improve shuffle efficiency > -- > > Key: SPARK-30602 > URL: https://issues.apache.org/jira/browse/SPARK-30602 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > Labels: release-notes > Fix For: 3.2.0 > > Attachments: Screen Shot 2020-06-23 at 11.31.22 AM.jpg, > vldb_magnet_final.pdf > > > In a large deployment of a Spark compute infrastructure, Spark shuffle is > becoming a potential scaling bottleneck and a source of inefficiency in the > cluster. When doing Spark on YARN for a large-scale deployment, people > usually enable Spark external shuffle service and store the intermediate > shuffle files on HDD. Because the number of blocks generated for a particular > shuffle grows quadratically compared to the size of shuffled data (# mappers > and reducers grows linearly with the size of shuffled data, but # blocks is # > mappers * # reducers), one general trend we have observed is that the more > data a Spark application processes, the smaller the block size becomes. In a > few production clusters we have seen, the average shuffle block size is only > 10s of KBs. Because of the inefficiency of performing random reads on HDD for > small amount of data, the overall efficiency of the Spark external shuffle > services serving the shuffle blocks degrades as we see an increasing # of > Spark applications processing an increasing amount of data. In addition, > because Spark external shuffle service is a shared service in a multi-tenancy > cluster, the inefficiency with one Spark application could propagate to other > applications as well. > In this ticket, we propose a solution to improve Spark shuffle efficiency in > above mentioned environments with push-based shuffle. With push-based > shuffle, shuffle is performed at the end of mappers and blocks get pre-merged > and move towards reducers. In our prototype implementation, we have seen > significant efficiency improvements when performing large shuffles. We take a > Spark-native approach to achieve this, i.e., extending Spark’s existing > shuffle netty protocol, and the behaviors of Spark mappers, reducers and > drivers. This way, we can bring the benefits of more efficient shuffle in > Spark without incurring the dependency or overhead of either specialized > storage layer or external infrastructure pieces. > > Link to dev mailing list discussion: > [http://apache-spark-developers-list.1001551.n3.nabble.com/Enabling-push-based-shuffle-in-Spark-td28732.html] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
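A back-of-the-envelope illustration of the quadratic block growth described above; the numbers are illustrative, not measurements from the SPIP.
{code:scala}
// If mappers and reducers both scale linearly with data size, the block
// count (# mappers * # reducers) scales quadratically and block size shrinks.
val shuffledBytes = 10L * 1024 * 1024 * 1024 * 1024 // 10 TiB, illustrative
val mappers       = 10000L
val reducers      = 10000L
val blocks        = mappers * reducers              // 100,000,000 blocks
val avgBlockBytes = shuffledBytes / blocks          // ~107 KiB per block
println(s"$blocks blocks, ~${avgBlockBytes / 1024} KiB each")
{code}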
[jira] [Resolved] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36379. --- Fix Version/s: 3.3.0 3.2.0 Assignee: Hyukjin Kwon Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/33608 > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
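For reference, the expected permissive-mode outcome, runnable in spark-shell; the exact output shape is an assumption based on PERMISSIVE semantics, where a malformed record becomes a row of nulls.
{code:scala}
import spark.implicits._
// Expected after the fix: the top-level null yields a null row, not an NPE.
val df = spark.read.json(Seq("""[{"a": "str"}, null, {"a": "str"}]""").toDS)
df.show()
// +----+
// |   a|
// +----+
// | str|
// |null|
// | str|
// +----+
{code}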
[jira] [Created] (SPARK-36385) Add possibility for jdbc insert hints
Nikolay Ivanitskiy created SPARK-36385: -- Summary: Add possibility for jdbc insert hints Key: SPARK-36385 URL: https://issues.apache.org/jira/browse/SPARK-36385 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.7 Reporter: Nikolay Ivanitskiy Some SQL backends support hints for the SQL insert statement, such as {code:java} /*+ ignore_row_on_dupkey_index ( table(col1, col2, ...) ) */ {code} for example. {code:java} insert /*+ ignore_row_on_dupkey_index ( table(col1, col2, ...) ) */ into table(...{code} But the Spark JDBC writer does not allow adding hints. I suggest adding support for hints in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.getInsertStatement and org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable. Hints should be stored in options. I already have a version with hint support, so if the issue is accepted I can post the fix. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
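A sketch of what the proposed user-facing API might look like; the {{insertHint}} option name is hypothetical, following the suggestion above, and does not exist in Spark today.
{code:scala}
import spark.implicits._
val df = Seq((1, "a"), (2, "b")).toDF("col1", "col2")

// Hypothetical option; requires a JDBC-reachable database to actually run.
df.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//host:1521/svc")
  .option("dbtable", "t")
  .option("insertHint", "/*+ ignore_row_on_dupkey_index(t(col1, col2)) */")
  .mode("append")
  .save()
// getInsertStatement would then emit:
//   INSERT /*+ ignore_row_on_dupkey_index(t(col1, col2)) */ INTO t ("col1","col2") VALUES (?,?)
{code}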
[jira] [Resolved] (SPARK-35430) Investigate the failure of "PVs with local storage" integration test on Docker driver
[ https://issues.apache.org/jira/browse/SPARK-35430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Knapp resolved SPARK-35430. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 32793 [https://github.com/apache/spark/pull/32793] > Investigate the failure of "PVs with local storage" integration test on > Docker driver > - > > Key: SPARK-35430 > URL: https://issues.apache.org/jira/browse/SPARK-35430 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.3.0 > > > With https://issues.apache.org/jira/browse/SPARK-34738, integration tests were > migrated to Docker, but "PVs with local storage" was failing, so in > https://github.com/apache/spark/pull/31829 we created a separate test tag > called "persistentVolume", which is not used by dev-run-integration-tests.sh; > this way the test is skipped. > Here we should revert "persistentVolume" and investigate the error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35430) Investigate the failure of "PVs with local storage" integration test on Docker driver
[ https://issues.apache.org/jira/browse/SPARK-35430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Knapp reassigned SPARK-35430: --- Assignee: Attila Zsolt Piros > Investigate the failure of "PVs with local storage" integration test on > Docker driver > - > > Key: SPARK-35430 > URL: https://issues.apache.org/jira/browse/SPARK-35430 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > With https://issues.apache.org/jira/browse/SPARK-34738, integration tests were > migrated to Docker, but "PVs with local storage" was failing, so in > https://github.com/apache/spark/pull/31829 we created a separate test tag > called "persistentVolume", which is not used by dev-run-integration-tests.sh; > this way the test is skipped. > Here we should revert "persistentVolume" and investigate the error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391679#comment-17391679 ] Wenchen Fan commented on SPARK-36086: - [~krivosheinruslan] please open a ticket if you are working to improve the v2 describe table command. This ticket is resolved because this column name case different is fixed. > The case of the delta table is inconsistent with parquet > > > Key: SPARK-36086 > URL: https://issues.apache.org/jira/browse/SPARK-36086 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yuming Wang >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > How to reproduce this issue: > {noformat} > 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars. > 2. bin/spark-shell --conf > spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf > spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > {noformat} > {code:scala} > spark.sql("create table t1 using parquet as select id, id as lower_id from > range(5)") > spark.sql("CREATE VIEW v1 as SELECT * FROM t1") > spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("desc extended t2").show(false) > spark.sql("desc extended t3").show(false) > {code} > {noformat} > scala> spark.sql("desc extended t2").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |lower_id|bigint > | | > |id |bigint > | | > || > | | > |# Partitioning | > | | > |Part 0 |lower_id > | | > || > | | > |# Detailed Table Information| > | | > |Name|default.t2 > | | > |Location > |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2| > | > |Provider|delta > | | > |Table Properties > |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2] | > | > ++--+---+ > scala> spark.sql("desc extended t3").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |ID |bigint > |null | > |LOWER_ID|bigint > |null | > |# Partition Information | > | | > |# col_name |data_type > |comment| > |LOWER_ID|bigint > |null | > || > | | > |# Detailed Table Information| > | | > |Database|default > | | > |Table |t3 > | | > |Owner
[jira] [Assigned] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36086: --- Assignee: angerszhu > The case of the delta table is inconsistent with parquet > > > Key: SPARK-36086 > URL: https://issues.apache.org/jira/browse/SPARK-36086 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yuming Wang >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > How to reproduce this issue: > {noformat} > 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars. > 2. bin/spark-shell --conf > spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf > spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > {noformat} > {code:scala} > spark.sql("create table t1 using parquet as select id, id as lower_id from > range(5)") > spark.sql("CREATE VIEW v1 as SELECT * FROM t1") > spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("desc extended t2").show(false) > spark.sql("desc extended t3").show(false) > {code} > {noformat} > scala> spark.sql("desc extended t2").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |lower_id|bigint > | | > |id |bigint > | | > || > | | > |# Partitioning | > | | > |Part 0 |lower_id > | | > || > | | > |# Detailed Table Information| > | | > |Name|default.t2 > | | > |Location > |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2| > | > |Provider|delta > | | > |Table Properties > |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2] | > | > ++--+---+ > scala> spark.sql("desc extended t3").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |ID |bigint > |null | > |LOWER_ID|bigint > |null | > |# Partition Information | > | | > |# col_name |data_type > |comment| > |LOWER_ID|bigint > |null | > || > | | > |# Detailed Table Information| > | | > |Database|default > | | > |Table |t3 > | | > |Owner |yumwang > | | > |Created Time|Mon Jul 12 14:07:16 CST 2021 >
[jira] [Resolved] (SPARK-36086) The case of the delta table is inconsistent with parquet
[ https://issues.apache.org/jira/browse/SPARK-36086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36086. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33576 [https://github.com/apache/spark/pull/33576] > The case of the delta table is inconsistent with parquet > > > Key: SPARK-36086 > URL: https://issues.apache.org/jira/browse/SPARK-36086 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.1 >Reporter: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > How to reproduce this issue: > {noformat} > 1. Add delta-core_2.12-1.0.0-SNAPSHOT.jar to ${SPARK_HOME}/jars. > 2. bin/spark-shell --conf > spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf > spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog > {noformat} > {code:scala} > spark.sql("create table t1 using parquet as select id, id as lower_id from > range(5)") > spark.sql("CREATE VIEW v1 as SELECT * FROM t1") > spark.sql("CREATE TABLE t2 USING DELTA PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("CREATE TABLE t3 USING PARQUET PARTITIONED BY (LOWER_ID) SELECT > LOWER_ID, ID FROM v1") > spark.sql("desc extended t2").show(false) > spark.sql("desc extended t3").show(false) > {code} > {noformat} > scala> spark.sql("desc extended t2").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |lower_id|bigint > | | > |id |bigint > | | > || > | | > |# Partitioning | > | | > |Part 0 |lower_id > | | > || > | | > |# Detailed Table Information| > | | > |Name|default.t2 > | | > |Location > |file:/Users/yumwang/Downloads/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t2| > | > |Provider|delta > | | > |Table Properties > |[Type=MANAGED,delta.minReaderVersion=1,delta.minWriterVersion=2] | > | > ++--+---+ > scala> spark.sql("desc extended t3").show(false) > ++--+---+ > |col_name|data_type > |comment| > ++--+---+ > |ID |bigint > |null | > |LOWER_ID|bigint > |null | > |# Partition Information | > | | > |# col_name |data_type > |comment| > |LOWER_ID|bigint > |null | > || > | | > |# Detailed Table Information| > | | > |Database|default > | | > |Table |t3 > | | > |Owner |yumwang > | | > |Created Time
[jira] [Resolved] (SPARK-36382) Remove noisy footer from the summary table for metrics
[ https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-36382. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33611 [https://github.com/apache/spark/pull/33611] > Remove noisy footer from the summary table for metrics > -- > > Key: SPARK-36382 > URL: https://issues.apache.org/jira/browse/SPARK-36382 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.3.0 > > > In the WebUI, some tables are implemented using DataTables > (https://datatables.net/). > By default, tables created with DataTables show a footer which says `Showing > x to y of z entries`. This is helpful for tables whose entries can grow, > but the summary table for metrics in StagePage cannot grow, so the footer is > just noise. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36383) NullPointerException throws during executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36383: Assignee: Apache Spark > NullPointerException throws during executor shutdown > > > Key: SPARK-36383 > URL: https://issues.apache.org/jira/browse/SPARK-36383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > {code:java} > 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:318) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:324) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0 > java.lang.NullPointerException > at > org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > 
org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231) > at org.apache.spark.executor.Executor.stop(Executor.scala:334) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) >
[jira] [Assigned] (SPARK-36383) NullPointerException throws during executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36383: Assignee: (was: Apache Spark) > NullPointerException throws during executor shutdown > > > Key: SPARK-36383 > URL: https://issues.apache.org/jira/browse/SPARK-36383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: wuyi >Priority: Major > > {code:java} > 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:318) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:324) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0 > java.lang.NullPointerException > at > org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > 
org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231) > at org.apache.spark.executor.Executor.stop(Executor.scala:334) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at >
[jira] [Commented] (SPARK-36383) NullPointerException throws during executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391662#comment-17391662 ] Apache Spark commented on SPARK-36383: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/33612 > NullPointerException throws during executor shutdown > > > Key: SPARK-36383 > URL: https://issues.apache.org/jira/browse/SPARK-36383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: wuyi >Priority: Major > > {code:java} > 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:318) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:324) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0 > java.lang.NullPointerException > at > org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334) > at > 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231) > at org.apache.spark.executor.Executor.stop(Executor.scala:334) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at >
[jira] [Created] (SPARK-36384) Add documentation for shuffle checksum
wuyi created SPARK-36384: Summary: Add documentation for shuffle checksum Key: SPARK-36384 URL: https://issues.apache.org/jira/browse/SPARK-36384 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.2.0, 3.3.0 Reporter: wuyi Add documentation for shuffle checksum -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391654#comment-17391654 ] wuyi commented on SPARK-35275: -- [~mridulm80] Yes, we shall have the doc task. Let me create it. > Add checksum for shuffle blocks > --- > > Key: SPARK-35275 > URL: https://issues.apache.org/jira/browse/SPARK-35275 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Shuffle data corruption is a long-standing issue in Spark. For example, in > SPARK-18105, people continually report corruption issues. However, data > corruption is difficult to reproduce in most cases and even harder to trace to > a root cause. We don't know if it's a Spark issue or not. With the checksum > support for the shuffle, Spark itself can at least distinguish the cause > between disk and network, which is very important for users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
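Since several of the threads above concern the shuffle checksum feature (SPARK-35275 and its documentation follow-up), a short configuration sketch may help readers trying it out. The spark.shuffle.checksum.* names below are the settings introduced with this feature in 3.2.0; treat the exact names and defaults as something to verify against your version's docs:
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Minimal sketch: enable shuffle checksums and choose the algorithm.
// Config names per the 3.2.0 configuration docs; verify against your build.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("shuffle-checksum-sketch")
  .config("spark.shuffle.checksum.enabled", "true")      // checksum each shuffle block on write
  .config("spark.shuffle.checksum.algorithm", "ADLER32") // or CRC32
  .getOrCreate()

// Any shuffle now carries checksums, so when a fetch failure occurs Spark
// can help distinguish disk corruption from network corruption.
spark.range(0, 1000).groupBy(expr("id % 10")).count().show()
{code}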
[jira] [Resolved] (SPARK-36224) Use "void" as the type name of NullType
[ https://issues.apache.org/jira/browse/SPARK-36224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36224. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33437 [https://github.com/apache/spark/pull/33437] > Use "void" as the type name of NullType > --- > > Key: SPARK-36224 > URL: https://issues.apache.org/jira/browse/SPARK-36224 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.0 > > > In PR [https://github.com/apache/spark/pull/28833], we support parsing > "void" as NullType, but we still use "null" as the type name. This leads to some > confusing and inconsistent issues. For example: > `org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` does not work -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
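To make the inconsistency in the description concrete, here is a hedged sketch of the round-trip involved. DataType.fromDDL, StructField, and StructType.toDDL are public APIs; the behavior notes assume the 3.2.0 semantics this ticket introduces:
{code:java}
import org.apache.spark.sql.types.{DataType, NullType, StructField, StructType}

// The parsing direction already worked: "void" maps to NullType.
val parsed = DataType.fromDDL("void")

// The printing direction was the problem: NullType used to name itself
// "null", which fromDDL rejects, so this round-trip failed. With the type
// name changed to "void", printing and parsing agree.
val ddl = StructType(Seq(StructField("a", NullType))).toDDL
val roundTripped = DataType.fromDDL(ddl)
{code}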
[jira] [Assigned] (SPARK-36224) Use "void" as the type name of NullType
[ https://issues.apache.org/jira/browse/SPARK-36224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36224: --- Assignee: Linhong Liu > Use "void" as the type name of NullType > --- > > Key: SPARK-36224 > URL: https://issues.apache.org/jira/browse/SPARK-36224 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > > In PR [https://github.com/apache/spark/pull/28833], we support parsing > "void" as NullType, but we still use "null" as the type name. This leads to some > confusing and inconsistent issues. For example: > `org.apache.spark.sql.types.DataType.fromDDL(NullType.toDDL)` does not work -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391643#comment-17391643 ] Gengliang Wang commented on SPARK-35275: [~mridulm80] +1. I will cut the RC around the 10th. We still have time. > Add checksum for shuffle blocks > --- > > Key: SPARK-35275 > URL: https://issues.apache.org/jira/browse/SPARK-35275 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Shuffle data corruption is a long-standing issue in Spark. For example, in > SPARK-18105, people continually report corruption issues. However, data > corruption is difficult to reproduce in most cases and even harder to trace to > a root cause. We don't know if it's a Spark issue or not. With the checksum > support for the shuffle, Spark itself can at least distinguish the cause > between disk and network, which is very important for users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35275) Add checksum for shuffle blocks
[ https://issues.apache.org/jira/browse/SPARK-35275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391640#comment-17391640 ] Mridul Muralidharan commented on SPARK-35275: - Do we want to add a documentation task for this jira as well, [~Ngone51]? > Add checksum for shuffle blocks > --- > > Key: SPARK-35275 > URL: https://issues.apache.org/jira/browse/SPARK-35275 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: wuyi >Priority: Major > > Shuffle data corruption is a long-standing issue in Spark. For example, in > SPARK-18105, people continually report corruption issues. However, data > corruption is difficult to reproduce in most cases and even harder to trace to > a root cause. We don't know if it's a Spark issue or not. With the checksum > support for the shuffle, Spark itself can at least distinguish the cause > between disk and network, which is very important for users. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36325) Writing to hiveserver through jdbc throws ParseException
[ https://issues.apache.org/jira/browse/SPARK-36325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391638#comment-17391638 ] Jesús Ricardo Ballesteros Molina commented on SPARK-36325: -- Hello, first of all thank you for your reply. I used the dialect, but now I have another issue, and this one I don't know how to address. {code:java} import org.apache.spark.sql.jdbc.{JdbcDialects, JdbcType, JdbcDialect} import org.apache.spark.sql.types.StringType import java.sql.Types import org.apache.spark.sql.types.DataType val HiveDialect = new JdbcDialect { override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2") || url.contains("hive2") override def quoteIdentifier(colName: String): String ={ s"$colName" } override def getJDBCType(dt: DataType): Option[JdbcType] = dt match { case StringType => Option(JdbcType("STRING", Types.VARCHAR)) case _ => None } } JdbcDialects.registerDialect(HiveDialect) df_linux.write.mode("overwrite") .format("jdbc") .option("driver","org.apache.hive.jdbc.HiveDriver") .option("url", "jdbc:hive2://sa3secessuperset01.a3sec.local:1") .option("dbtable", "o365new") //.option("createTableColumnTypes", "_time VARCHAR(1024), raw_log VARCHAR(1024), service_name VARCHAR(1024), hostname VARCHAR(1024), pid VARCHAR(1024), username VARCHAR(1024), source_ip VARCHAR(1024)") .option("createTableColumnTypes", "time STRING, raw_log STRING, service_name STRING, hostname STRING, pid STRING, username STRING, source_ip STRING") .save() {code} I get this error: {code:java} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 423) (10.103.0.118 executor 2): java.sql.SQLFeatureNotSupportedException: Method not supported at org.apache.hive.jdbc.HivePreparedStatement.addBatch(HivePreparedStatement.java:78) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:683) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854) {code} Maybe jdbc is not the way to write through the thrift server, but I don't know how else to do it. At the moment I am using another database, but I really want to use Spark SQL. If you think I should close this issue and maybe open it as something else, feel free to close the ticket. > Writing to hiveserver through jdbc throws ParseException > - > > Key: SPARK-36325 > URL: https://issues.apache.org/jira/browse/SPARK-36325 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 > Environment: OS: Debian 10 > Spark version: 3.1.2 > Zeppelin Notebook: 0.9.0 > Jdbc driver: org.apache.hive:hive-jdbc:3.1.2 >Reporter: Jesús Ricardo Ballesteros Molina >Priority: Major > Labels: spark, spark-sql > > Hello everyone, I am new to working on Spark and this is my first post. If I > make a mistake please be kind to me, but I have searched the web and I > haven't found anything related. If this bug is a duplicate or something, please > feel free to close it and tell me where to look. > I am working with Zeppelin; I got a dataframe from the Solr API, processed it, and > I want to write it to a table through thrift and read that new table from Apache > Superset.
> > I have this df with this schema: > {code:java} > %spark > df_linux.printSchema() > root > |-- time: string (nullable = false) > |-- raw_log: string (nullable = false) > |-- service_name: string (nullable = false) > |-- hostname: string (nullable = false) > |-- pid: string (nullable = false) > |-- username: string (nullable = false) > |-- source_ip: string (nullable = false) > {code} > > And this content: > > {code:java} > %spark > df_linux.show() > ++++--+-++-+ > | time| raw_log|service_name| hostname| pid|username|source_ip| > ++++--+-++-+ > |2021-07-28T07:41:53Z|Jul 28 07:41:52 s...| > sshd[11611]|sa3secessuperset01|11611| debian| 10.0.9.3| > |2021-07-28T07:41:44Z|Jul 28 07:41:43 s...| > sshd[11590]|sa3secessuperset01|11590| debian| 10.0.9.3| > |2021-07-27T08:46:11Z|Jul 27 08:46:10 s...| > sshd[16954]|sa3secessuperset01|16954| debian| 10.0.9.3| > |2021-07-27T08:44:55Z|Jul 27 08:44:54 s...| > sshd[16511]|sa3secessuperset01|16511| debian| 10.0.9.3| > |2021-07-27T08:30:03Z|Jul 27 08:30:02 s...| > sshd[14511]|sa3secessuperset01|14511| debian| 10.0.9.3| >
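One usage note on the dialect approach in the comment above: JdbcDialects.registerDialect installs the dialect JVM-wide, and there is a matching unregisterDialect for cleanup. Also, grounded in the stack trace quoted there, the later failure is independent of the dialect: Spark's JDBC writer calls PreparedStatement.addBatch, which the Hive JDBC driver reports as unsupported. A trimmed sketch of the registration lifecycle:
{code:java}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

// Same idea as the dialect in the comment, trimmed to the type mapping.
val hiveDialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("STRING", Types.VARCHAR))
    case _          => None
  }
}

JdbcDialects.registerDialect(hiveDialect)
try {
  // ... run the df.write.format("jdbc")...save() job here ...
} finally {
  // Registration is global to the JVM; unregister afterwards so the mapping
  // does not leak into unrelated JDBC reads and writes.
  JdbcDialects.unregisterDialect(hiveDialect)
}
{code}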
[jira] [Assigned] (SPARK-36206) Diagnose shuffle data corruption by checksum
[ https://issues.apache.org/jira/browse/SPARK-36206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-36206: --- Assignee: wuyi > Diagnose shuffle data corruption by checksum > > > Key: SPARK-36206 > URL: https://issues.apache.org/jira/browse/SPARK-36206 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > After adding checksums in SPARK-35276, we can leverage the checksums to do > diagnosis for shuffle data corruption now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36206) Diagnose shuffle data corruption by checksum
[ https://issues.apache.org/jira/browse/SPARK-36206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-36206. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33451 [https://github.com/apache/spark/pull/33451] > Diagnose shuffle data corruption by checksum > > > Key: SPARK-36206 > URL: https://issues.apache.org/jira/browse/SPARK-36206 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > After adding checksums in SPARK-35276, we can leverage the checksums to do > diagnosis for shuffle data corruption now. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36383) NullPointerException throws during executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-36383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-36383: - Summary: NullPointerException throws during executor shutdown (was: Avoid NullPointerException during executor shutdown) > NullPointerException throws during executor shutdown > > > Key: SPARK-36383 > URL: https://issues.apache.org/jira/browse/SPARK-36383 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: wuyi >Priority: Major > > {code:java} > 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:318) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater > java.lang.NullPointerException > at org.apache.spark.executor.Executor.stop(Executor.scala:324) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0 > java.lang.NullPointerException > at > org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334) > at > 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231) > at org.apache.spark.executor.Executor.stop(Executor.scala:334) > at > org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) > at > org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at scala.util.Try$.apply(Try.scala:213) > at >
[jira] [Created] (SPARK-36383) Avoid NullPointerException during executor shutdown
wuyi created SPARK-36383: Summary: Avoid NullPointerException during executor shutdown Key: SPARK-36383 URL: https://issues.apache.org/jira/browse/SPARK-36383 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.2, 3.0.3, 3.2.0, 3.3.0 Reporter: wuyi {code:java} 21/07/23 16:04:10 WARN Executor: Unable to stop executor metrics poller java.lang.NullPointerException at org.apache.spark.executor.Executor.stop(Executor.scala:318) at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 21/07/23 16:04:10 WARN Executor: Unable to stop heartbeater java.lang.NullPointerException at org.apache.spark.executor.Executor.stop(Executor.scala:324) at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 21/07/23 16:04:10 ERROR Utils: Uncaught exception in thread shutdown-hook-0 java.lang.NullPointerException at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:334) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231) at org.apache.spark.executor.Executor.stop(Executor.scala:334) at org.apache.spark.executor.Executor.$anonfun$stopHookReference$1(Executor.scala:76) at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2025) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at
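All three traces in this report share one shape: a JVM shutdown hook reaches Executor.stop, which dereferences components that can still be null if the executor never finished initializing. A minimal, purely illustrative sketch of the null-guarding pattern that avoids this class of NPE; the names are hypothetical and this is not the actual patch:
{code:java}
// Hypothetical sketch of null-guarded cleanup; not Spark's actual code.
object ShutdownGuardSketch {
  final class Heartbeater { def stop(): Unit = println("heartbeater stopped") }

  // Stands in for an executor whose fields may still be null when the
  // shutdown hook fires (e.g. the process dies during construction).
  class PartiallyInitialized {
    var heartbeater: Heartbeater = _ // null until initialization completes

    def stop(): Unit = {
      // Option(null) is None, so the cleanup step is skipped instead of
      // throwing a NullPointerException from inside the shutdown hook.
      Option(heartbeater).foreach(_.stop())
    }
  }

  def main(args: Array[String]): Unit = {
    new PartiallyInitialized().stop() // completes without an NPE
  }
}
{code}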
[jira] [Assigned] (SPARK-36382) Remove noisy footer from the summary table for metrics
[ https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36382: Assignee: Kousuke Saruta (was: Apache Spark) > Remove noisy footer from the summary table for metrics > -- > > Key: SPARK-36382 > URL: https://issues.apache.org/jira/browse/SPARK-36382 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the WebUI, some tables are implemented using DataTables > (https://datatables.net/). > By default, tables created using DataTables show a footer which says `Showing > x to y of z entries`, which is helpful for some tables if table entries can > grow. > But the summary table for metrics in StagePage cannot grow, so it's a little > bit noisy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36382) Remove noisy footer from the summary table for metrics
[ https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36382: Assignee: Apache Spark (was: Kousuke Saruta) > Remove noisy footer from the summary table for metrics > -- > > Key: SPARK-36382 > URL: https://issues.apache.org/jira/browse/SPARK-36382 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > In the WebUI, some tables are implemented using DataTables > (https://datatables.net/). > By default, tables created using DataTables show a footer which says `Showing > x to y of z entries`, which is helpful for some tables if table entries can > grow. > But the summary table for metrics in StagePage cannot grow, so it's a little > bit noisy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36382) Remove noisy footer from the summary table for metrics
[ https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391584#comment-17391584 ] Apache Spark commented on SPARK-36382: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33611 > Remove noisy footer from the summary table for metrics > -- > > Key: SPARK-36382 > URL: https://issues.apache.org/jira/browse/SPARK-36382 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the WebUI, some tables are implemented using DataTables > (https://datatables.net/). > By default, tables created using DataTables show a footer which says `Showing > x to y of z entries`, which is helpful for some tables if table entries can > grow. > But the summary table for metrics in StagePage cannot grow, so it's a little > bit noisy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36382) Remove noisy footer from the summary table for metrics
[ https://issues.apache.org/jira/browse/SPARK-36382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36382: --- Summary: Remove noisy footer from the summary table for metrics (was: Remove unnecessary footer from the summary table for metrics) > Remove noisy footer from the summary table for metrics > -- > > Key: SPARK-36382 > URL: https://issues.apache.org/jira/browse/SPARK-36382 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > In the WebUI, some tables are implemented using DataTables > (https://datatables.net/). > By default, tables created using DataTables show a footer which says `Showing > x to y of z entries`, which is helpful for some tables if table entries can > grow. > But the summary table for metrics in StagePage cannot grow, so it's a little > bit noisy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36382) Remove unnecessary footer from the summary table for metrics
Kousuke Saruta created SPARK-36382: -- Summary: Remove unnecessary footer from the summary table for metrics Key: SPARK-36382 URL: https://issues.apache.org/jira/browse/SPARK-36382 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta In the WebUI, some tables are implemented using DataTables (https://datatables.net/). By default, tables created using DataTables show a footer which says `Showing x to y of z entries`, which is helpful for some tables if table entries can grow. But the summary table for metrics in StagePage cannot grow, so it's a little bit noisy. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages
[ https://issues.apache.org/jira/browse/SPARK-35918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35918. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33308 [https://github.com/apache/spark/pull/33308] > Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch > handling and error messages > - > > Key: SPARK-35918 > URL: https://issues.apache.org/jira/browse/SPARK-35918 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0 > > > While working on [PR #31490|https://github.com/apache/spark/pull/31490] for > SPARK-34365, we discussed that there is room for improvement in how schema > mismatch errors are reported > ([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], > [comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]). > We can also consolidate more of the logic between AvroSerializer and > AvroDeserializer to avoid some duplication of error handling and consolidate > how these error messages are generated. > This will essentially be taking the [logic from the initial proposal from PR > #31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178], > but applied separately from the current set of proposed changes to cut down > on PR size. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35918) Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch handling and error messages
[ https://issues.apache.org/jira/browse/SPARK-35918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35918: -- Assignee: Erik Krogen > Consolidate logic between AvroSerializer/AvroDeserializer for schema mismatch > handling and error messages > - > > Key: SPARK-35918 > URL: https://issues.apache.org/jira/browse/SPARK-35918 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > While working on [PR #31490|https://github.com/apache/spark/pull/31490] for > SPARK-34365, we discussed that there is room for improvement in how schema > mismatch errors are reported > ([comment1|https://github.com/apache/spark/pull/31490#discussion_r659970793], > [comment2|https://github.com/apache/spark/pull/31490#issuecomment-869866848]). > We can also consolidate more of the logic between AvroSerializer and > AvroDeserializer to avoid some duplication of error handling and consolidate > how these error messages are generated. > This will essentially be taking the [logic from the initial proposal from PR > #31490|https://github.com/apache/spark/pull/31490/commits/83a922fdff08528e59233f67930ac78bfb3fa178], > but applied separately from the current set of proposed changes to cut down > on PR size. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
[ https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36381: Assignee: Apache Spark > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for > v2 commands. > -- > > Key: SPARK-36381 > URL: https://issues.apache.org/jira/browse/SPARK-36381 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > The existence check in ALTER TABLE ADD/RENAME COLUMNS does not respect case > sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
[ https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391559#comment-17391559 ] Apache Spark commented on SPARK-36381: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/33610 > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for > v2 commands. > -- > > Key: SPARK-36381 > URL: https://issues.apache.org/jira/browse/SPARK-36381 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > The existence check in ALTER TABLE ADD/RENAME COLUMNS does not respect case > sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
[ https://issues.apache.org/jira/browse/SPARK-36381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36381: Assignee: (was: Apache Spark) > ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for > v2 commands. > -- > > Key: SPARK-36381 > URL: https://issues.apache.org/jira/browse/SPARK-36381 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: PengLei >Priority: Major > > The existence check in ALTER TABLE ADD/RENAME COLUMNS does not respect case > sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36381) ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands.
PengLei created SPARK-36381: --- Summary: ALTER TABLE ADD/RENAME COLUMNS existence check does not respect case sensitivity for v2 commands. Key: SPARK-36381 URL: https://issues.apache.org/jira/browse/SPARK-36381 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: PengLei The existence check in ALTER TABLE ADD/RENAME COLUMNS does not respect case sensitivity for v2 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
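Read as a repro, the report suggests the following shape: with spark.sql.caseSensitive at its default of false, ID and id should be treated as the same column, but the v2 check misses the collision. A hedged sketch; the catalog name testcat and the table are illustrative only:
{code:java}
import org.apache.spark.sql.SparkSession

// Illustrative repro; assumes a v2 catalog registered under "testcat"
// (for example a test in-memory catalog). Names are hypothetical.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("case-sensitivity-sketch")
  .getOrCreate()

spark.conf.set("spark.sql.caseSensitive", "false") // the default
spark.sql("CREATE TABLE testcat.t (id INT) USING foo")

// Case-insensitive analysis should reject this because ID collides with id;
// per the report, the v2 existence check does not consult the setting.
spark.sql("ALTER TABLE testcat.t ADD COLUMNS (ID STRING)")
{code}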
[jira] [Resolved] (SPARK-36237) SparkUI should bind handler after application started
[ https://issues.apache.org/jira/browse/SPARK-36237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-36237. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33457 [https://github.com/apache/spark/pull/33457] > SparkUI should bind handler after application started > - > > Key: SPARK-36237 > URL: https://issues.apache.org/jira/browse/SPARK-36237 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > When we use Prometheus to fetch metrics, it always calls before the application > has started. > Then a lot of NoSuchElementException errors are thrown: > {code:java} > 21/07/19 04:53:37 INFO Client: Preparing resources for our AM container > 21/07/19 04:53:37 INFO Client: Uploading resource > hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> > hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz > 21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > at > org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43) > at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:539) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at > org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > 21/07/19 04:53:37 WARN ServletHandler: /jobs/ > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > at > org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43) > at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at >
[jira] [Assigned] (SPARK-36237) SparkUI should bind handler after application started
[ https://issues.apache.org/jira/browse/SPARK-36237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-36237: -- Assignee: angerszhu > SparkUI should bind handler after application started > - > > Key: SPARK-36237 > URL: https://issues.apache.org/jira/browse/SPARK-36237 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > When we use Prometheus to fetch metrics, it always calls before the application > has started. > Then a lot of NoSuchElementException errors are thrown: > {code:java} > 21/07/19 04:53:37 INFO Client: Preparing resources for our AM container > 21/07/19 04:53:37 INFO Client: Uploading resource > hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> > hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz > 21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > at > org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43) > at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:539) > at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at > org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > 
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > 21/07/19 04:53:37 WARN ServletHandler: /jobs/ > java.util.NoSuchElementException: Failed to get the application information. > If you are starting up Spark, please wait a while until it's ready. > at > org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43) > at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at >
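The traces make the failure mode clear: the UI's handlers are live before AppStatusStore has an application to report, so an early Prometheus scrape surfaces the NoSuchElementException. A minimal sketch of the gating idea, with no claim about how Spark's actual fix is implemented:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: gate UI request handling on an "application started"
// flag instead of serving real pages from the moment the port is bound.
class GatedUiHandler {
  private val appStarted = new AtomicBoolean(false)

  def onApplicationStart(): Unit = appStarted.set(true)

  def handle(path: String): String =
    if (!appStarted.get) {
      "503 Service Unavailable: Spark is starting up, retry shortly"
    } else {
      s"rendered page for $path" // safe: the application info now exists
    }
}
{code}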
[jira] [Assigned] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN
[ https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36380: Assignee: Apache Spark > Simplify the logical plan names for ALTER TABLE ... COLUMN > -- > > Key: SPARK-36380 > URL: https://issues.apache.org/jira/browse/SPARK-36380 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN
[ https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391505#comment-17391505 ] Apache Spark commented on SPARK-36380: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33609 > Simplify the logical plan names for ALTER TABLE ... COLUMN > -- > > Key: SPARK-36380 > URL: https://issues.apache.org/jira/browse/SPARK-36380 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN
[ https://issues.apache.org/jira/browse/SPARK-36380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36380: Assignee: (was: Apache Spark) > Simplify the logical plan names for ALTER TABLE ... COLUMN > -- > > Key: SPARK-36380 > URL: https://issues.apache.org/jira/browse/SPARK-36380 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36380) Simplify the logical plan names for ALTER TABLE ... COLUMN
Wenchen Fan created SPARK-36380: --- Summary: Simplify the logical plan names for ALTER TABLE ... COLUMN Key: SPARK-36380 URL: https://issues.apache.org/jira/browse/SPARK-36380 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36372) ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for v2 command
[ https://issues.apache.org/jira/browse/SPARK-36372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36372. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33600 [https://github.com/apache/spark/pull/33600] > ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for > v2 command > > > Key: SPARK-36372 > URL: https://issues.apache.org/jira/browse/SPARK-36372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.2.0 > > > ALTER TABLE ADD COLUMNS currently doesn't check duplicates for the specified > columns for the v2 command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36372) ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for v2 command
[ https://issues.apache.org/jira/browse/SPARK-36372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36372: --- Assignee: Terry Kim > ALTER TABLE ADD COLUMNS should check duplicates for the specified columns for > v2 command > > > Key: SPARK-36372 > URL: https://issues.apache.org/jira/browse/SPARK-36372 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > ALTER TABLE ADD COLUMNS currently doesn't check duplicates for the specified > columns for the v2 command. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
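For illustration, a minimal repro sketch of the missing check described in SPARK-36372; the catalog, namespace, and provider names below are assumptions, not taken from the ticket or its PR:
{code:scala}
// Assumes a v2 catalog registered as "testcat" (illustrative name).
spark.sql("CREATE TABLE testcat.ns.tbl (id INT) USING foo")
// Specifying the same column twice is currently accepted by the v2 command;
// after the fix it should fail analysis with a duplicate-column error.
spark.sql("ALTER TABLE testcat.ns.tbl ADD COLUMNS (data STRING, data STRING)")
{code}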
[jira] [Assigned] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36379: Assignee: Apache Spark > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391400#comment-17391400 ] Apache Spark commented on SPARK-36379: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33608 > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36379: Assignee: (was: Apache Spark) > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36175) Support TimestampNTZ in Avro data source
[ https://issues.apache.org/jira/browse/SPARK-36175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391385#comment-17391385 ] Apache Spark commented on SPARK-36175: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/33607 > Support TimestampNTZ in Avro data source > - > > Key: SPARK-36175 > URL: https://issues.apache.org/jira/browse/SPARK-36175 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > As per the Avro spec > https://avro.apache.org/docs/1.10.2/spec.html#Local+timestamp+%28microsecond+precision%29, > Spark can convert TimestampNTZ type from/to Avro's Local timestamp type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
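A sketch of the TimestampNTZ/Avro mapping described in SPARK-36175; the literal syntax, output path, and read-back schema below are assumptions about the intended behavior, not quoted from the PR:
{code:scala}
// A TIMESTAMP_NTZ column should be written as Avro's local-timestamp-micros
// logical type, which likewise carries no time zone.
val df = spark.sql("SELECT TIMESTAMP_NTZ '2021-08-02 12:00:00' AS ts")
df.write.format("avro").save("/tmp/ts_ntz_avro")
// Reading it back should recover timestamp_ntz rather than a zoned timestamp.
spark.read.format("avro").load("/tmp/ts_ntz_avro").printSchema()
{code}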
[jira] [Commented] (SPARK-35815) Allow delayThreshold for watermark to be represented as ANSI day-time/year-month interval literals
[ https://issues.apache.org/jira/browse/SPARK-35815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391373#comment-17391373 ] Apache Spark commented on SPARK-35815: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33606 > Allow delayThreshold for watermark to be represented as ANSI > day-time/year-month interval literals > -- > > Key: SPARK-35815 > URL: https://issues.apache.org/jira/browse/SPARK-35815 > Project: Spark > Issue Type: Sub-task > Components: SQL, Structured Streaming >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > delayThreshold parameter of DataFrame.withWatermark should handle ANSI > day-time/year-month interval literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
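A sketch of the API change SPARK-35815 describes; the streaming DataFrame and its "eventTime" column are illustrative:
{code:scala}
// Previously delayThreshold had to be a CalendarInterval-style string such as
// "10 seconds"; the change also accepts ANSI day-time interval literals.
val watermarked = streamingDf.withWatermark("eventTime", "INTERVAL '10' SECOND")
{code}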
[jira] [Updated] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36379: - Issue Type: Bug (was: Improvement) > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
[ https://issues.apache.org/jira/browse/SPARK-36379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36379: - Priority: Minor (was: Major) > Null at root level of a JSON array causes the parsing failure (w/ permissive > mode) > -- > > Key: SPARK-36379 > URL: https://issues.apache.org/jira/browse/SPARK-36379 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Minor > > {code} > scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": > "str"}]""").toDS).collect() > {code} > {code} > ... > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 > (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) > {code} > Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36379) Null at root level of a JSON array causes the parsing failure (w/ permissive mode)
Hyukjin Kwon created SPARK-36379: Summary: Null at root level of a JSON array causes the parsing failure (w/ permissive mode) Key: SPARK-36379 URL: https://issues.apache.org/jira/browse/SPARK-36379 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2, 3.2.0, 3.3.0 Reporter: Hyukjin Kwon {code} scala> spark.read.json(Seq("""[{"a": "str"}, null, {"a": "str"}]""").toDS).collect() {code} {code} ... org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (172.30.3.20 executor driver): java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759) {code} Since the mode (by default) is permissive, we shouldn't just fail like above. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
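For reference, the behavior one would expect from the repro above under the default mode; the expected output is an assumption about the fix, not quoted from it:
{code:scala}
// In spark-shell, where spark.implicits._ is already in scope.
val ds = Seq("""[{"a": "str"}, null, {"a": "str"}]""").toDS()
// Under PERMISSIVE mode the root-level null should degrade to a row of
// nulls instead of raising the NullPointerException shown above.
spark.read.json(ds).collect()
// expected after the fix: Array([str], [null], [str])
{code}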
[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-35917: Fix Version/s: (was: 3.2.0) > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > Push-based shuffle is partially merged in apache master but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-35917: Fix Version/s: 3.2.0 > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > Fix For: 3.2.0 > > > Push-based shuffle is partially merged in apache master but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35917) Disable push-based shuffle until the feature is complete
[ https://issues.apache.org/jira/browse/SPARK-35917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-35917: --- Assignee: (was: Mridul Muralidharan) > Disable push-based shuffle until the feature is complete > > > Key: SPARK-35917 > URL: https://issues.apache.org/jira/browse/SPARK-35917 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Chandni Singh >Priority: Major > > Push-based shuffle is partially merged in apache master but some of the tasks > are still incomplete. Since 3.2 is going to be cut soon, we will not be able to > get the pending tasks reviewed and merged. A few of the pending tasks make > protocol changes to the push-based shuffle protocols, so we would like to > prevent users from enabling push-based shuffle both on the client and the > server until the push-based shuffle implementation is complete. > We can prevent push-based shuffle from being used by throwing > {{UnsupportedOperationException}} (or something like that) both on the client > and the server when the user tries to enable it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
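A minimal sketch of the guard the SPARK-35917 description proposes; the surrounding code is illustrative and not the actual patch, and the config key is assumed to be the push-based shuffle toggle:
{code:scala}
import org.apache.spark.SparkConf

// Refuse to start with push-based shuffle enabled while the feature is
// incomplete, as the description suggests.
def assertPushBasedShuffleDisabled(conf: SparkConf): Unit = {
  if (conf.getBoolean("spark.shuffle.push.enabled", defaultValue = false)) {
    throw new UnsupportedOperationException(
      "Push-based shuffle is not yet fully implemented and cannot be enabled.")
  }
}
{code}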
[jira] [Commented] (SPARK-36306) Refactor seventeenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391362#comment-17391362 ] PengLei commented on SPARK-36306: - working on this > Refactor seventeenth set of 20 query execution errors to use error classes > -- > > Key: SPARK-36306 > URL: https://issues.apache.org/jira/browse/SPARK-36306 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the seventeenth set of 20. > {code:java} > legacyCheckpointDirectoryExistsError > subprocessExitedError > outputDataTypeUnsupportedByNodeWithoutSerdeError > invalidStartIndexError > concurrentModificationOnExternalAppendOnlyUnsafeRowArrayError > doExecuteBroadcastNotImplementedError > databaseNameConflictWithSystemPreservedDatabaseError > commentOnTableUnsupportedError > unsupportedUpdateColumnNullabilityError > renameColumnUnsupportedForOlderMySQLError > failedToExecuteQueryError > nestedFieldUnsupportedError > transformationsAndActionsNotInvokedByDriverError > repeatedPivotsUnsupportedError > pivotNotAfterGroupByUnsupportedError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36305) Refactor sixteenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391361#comment-17391361 ] PengLei commented on SPARK-36305: - working on this > Refactor sixteenth set of 20 query execution errors to use error classes > > > Key: SPARK-36305 > URL: https://issues.apache.org/jira/browse/SPARK-36305 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the sixteenth set of 20. > {code:java} > cannotDropMultiPartitionsOnNonatomicPartitionTableError > truncateMultiPartitionUnsupportedError > overwriteTableByUnsupportedExpressionError > dynamicPartitionOverwriteUnsupportedByTableError > failedMergingSchemaError > cannotBroadcastTableOverMaxTableRowsError > cannotBroadcastTableOverMaxTableBytesError > notEnoughMemoryToBuildAndBroadcastTableError > executeCodePathUnsupportedError > cannotMergeClassWithOtherClassError > continuousProcessingUnsupportedByDataSourceError > failedToReadDataError > failedToGenerateEpochMarkerError > foreachWriterAbortedDueToTaskFailureError > integerOverflowError > failedToReadDeltaFileError > failedToReadSnapshotFileError > cannotPurgeAsBreakInternalStateError > cleanUpSourceFilesUnsupportedError > latestOffsetNotCalledError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36304) Refactor fifteenth set of 20 query execution errors to use error classes
[ https://issues.apache.org/jira/browse/SPARK-36304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17391360#comment-17391360 ] PengLei commented on SPARK-36304: - working on this > Refactor fifteenth set of 20 query execution errors to use error classes > > > Key: SPARK-36304 > URL: https://issues.apache.org/jira/browse/SPARK-36304 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Karen Feng >Priority: Major > > Refactor some exceptions in > [QueryExecutionErrors|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala] > to use error classes. > There are currently ~350 exceptions in this file; so this PR only focuses on > the fifteenth set of 20. > {code:java} > unsupportedOperationExceptionError > nullLiteralsCannotBeCastedError > notUserDefinedTypeError > cannotLoadUserDefinedTypeError > timeZoneIdNotSpecifiedForTimestampTypeError > notPublicClassError > primitiveTypesNotSupportedError > fieldIndexOnRowWithoutSchemaError > valueIsNullError > onlySupportDataSourcesProvidingFileFormatError > failToSetOriginalPermissionBackError > failToSetOriginalACLBackError > multiFailuresInStageMaterializationError > unrecognizedCompressionSchemaTypeIDError > getParentLoggerNotImplementedError > cannotCreateParquetConverterForTypeError > cannotCreateParquetConverterForDecimalTypeError > cannotCreateParquetConverterForDataTypeError > cannotAddMultiPartitionsOnNonatomicPartitionTableError > userSpecifiedSchemaUnsupportedByDataSourceError > {code} > For more detail, see the parent ticket SPARK-36094. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
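The refactor these three tickets ask for has roughly the following shape, using valueIsNullError from the list above; the error class name and the error-class-aware SparkException constructor are assumptions here, not the merged API:
{code:scala}
// Before: a raw exception with a hard-coded message, e.g.
//   def valueIsNullError(index: Int): Throwable =
//     new NullPointerException(s"Value at index $index is null")
//
// After: the message is keyed by an entry in error-classes.json, so it can
// be audited, tested, and documented centrally.
def valueIsNullError(index: Int): Throwable =
  new org.apache.spark.SparkException(
    errorClass = "NULL_VALUE_AT_INDEX",          // hypothetical error class
    messageParameters = Array(index.toString),
    cause = null)
{code}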
[jira] [Assigned] (SPARK-36378) Minor changes to address a few identified server side inefficiencies
[ https://issues.apache.org/jira/browse/SPARK-36378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-36378: --- Assignee: (was: Mridul Muralidharan) > Minor changes to address a few identified server side inefficiencies > > > Key: SPARK-36378 > URL: https://issues.apache.org/jira/browse/SPARK-36378 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.2.0 >Reporter: Min Shen >Priority: Major > > With the SPIP ticket close to being finished, we have done some performance > evaluations to compare the performance of push-based shuffle in upstream > Spark with the production version we have internally at LinkedIn. > The evaluations have revealed a few regressions and also some additional perf > improvement opportunities. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org