[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Priority: Minor (was: Major) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Minor > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. {{df.select(sin($"col"))}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
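A workaround sketch while no DataFrame-API wrapper exists (assumes a running SparkSession; `cot` is exposed as a Spark SQL function, which this example reaches through `expr` rather than a dedicated `functions.cot` helper):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Sketch only: assumes a local SparkSession. There is no functions.cot
// wrapper, so df.select(cot($"col")) does not compile, but the SQL-level
// COT expression is reachable via expr().
object CotWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cot-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(1.0, 0.5).toDF("col")
    df.select(expr("cot(col)").as("cot_col")).show()
    spark.stop()
  }
}
```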
[jira] [Created] (SPARK-36661) Support TimestampNTZ in PyArrow
Hyukjin Kwon created SPARK-36661: Summary: Support TimestampNTZ in PyArrow Key: SPARK-36661 URL: https://issues.apache.org/jira/browse/SPARK-36661 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Hyukjin Kwon

[jira] [Updated] (SPARK-36661) Support TimestampNTZ in Py4J
[ https://issues.apache.org/jira/browse/SPARK-36661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36661: - Summary: Support TimestampNTZ in Py4J (was: Support TimestampNTZ in PyArrow) > Support TimestampNTZ in Py4J > > > Key: SPARK-36661 > URL: https://issues.apache.org/jira/browse/SPARK-36661 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. {{df.select(sin($"col"))}}). (was: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. df.select(sin($"col"))).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. {{df.select(sin($"col"))}}).
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. \{code}df.select(sin($"col"))\{code}). (was: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. df.select(sin($"col"))).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. > \{code}df.select(sin($"col"))\{code}).
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. df.select(sin($"col"))). (was: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. \{code}df.select(sin($"col"))\{code}).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. df.select(sin($"col"))).
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. df.select(sin($"col"))). (was: Cotangent is implemented in ^mathExpressions^ but cannot be called by dataframe operations like other math expressions (e.g. ^df.select(sin($"col"))^).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. df.select(sin($"col"))).
[jira] [Assigned] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36658: Assignee: Apache Spark > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Assignee: Apache Spark >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed an API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we cannot clearly tell which query this is. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it makes sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in the UI or history server to > track more information about the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. >
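The listener surface quoted above can be sketched as follows (a minimal sketch, assuming a Spark 3.x driver; it illustrates that neither callback carries an executionId, which is the gap this issue describes):

```scala
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// Minimal sketch of the API quoted above: both callbacks receive the
// QueryExecution, but no executionId, so correlating a callback with a
// specific query in the UI / history server is awkward.
class LoggingListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    println(s"$funcName succeeded in ${durationNs / 1e6} ms")
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    println(s"$funcName failed: ${exception.getMessage}")
  }
}
// Registered on a live session via: spark.listenerManager.register(new LoggingListener)
```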
[jira] [Assigned] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36658: Assignee: (was: Apache Spark) > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed an API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we cannot clearly tell which query this is. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it makes sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in the UI or history server to > track more information about the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. >
[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409254#comment-17409254 ] Apache Spark commented on SPARK-36658: -- User 'ivoson' has created a pull request for this issue: https://github.com/apache/spark/pull/33905 > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed an API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we cannot clearly tell which query this is. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it makes sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in the UI or history server to > track more information about the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. >
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in ^mathExpressions^ but cannot be called by dataframe operations like other math expressions (e.g. ^df.select(sin($"col"))^). (was: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. ^df.select(sin($"col"))^).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in ^mathExpressions^ but cannot be called by > dataframe operations like other math expressions (e.g. > ^df.select(sin($"col"))^).
[jira] [Updated] (SPARK-36660) Cotangent is not supported by Dataframe
[ https://issues.apache.org/jira/browse/SPARK-36660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-36660: Description: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. ^df.select(sin($"col"))^). (was: Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. `df.select(sin($"col"))`).) > Cotangent is not supported by Dataframe > --- > > Key: SPARK-36660 > URL: https://issues.apache.org/jira/browse/SPARK-36660 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Yuto Akutsu >Priority: Major > > Cotangent is implemented in mathExpressions but cannot be called by dataframe > operations like other math expressions (e.g. ^df.select(sin($"col"))^).
[jira] [Created] (SPARK-36660) Cotangent is not supported by Dataframe
Yuto Akutsu created SPARK-36660: --- Summary: Cotangent is not supported by Dataframe Key: SPARK-36660 URL: https://issues.apache.org/jira/browse/SPARK-36660 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Yuto Akutsu Cotangent is implemented in mathExpressions but cannot be called by dataframe operations like other math expressions (e.g. `df.select(sin($"col"))`).
[jira] [Commented] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409240#comment-17409240 ] Apache Spark commented on SPARK-36659: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33904 > Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Minor > > spark.sql.execution.topKSortFallbackThreshold is now an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > >
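If the config were user-facing, tuning it would look like the following sketch (a config fragment; the threshold value is an arbitrary example, not a recommendation, and `spark` is an existing SparkSession):

```scala
// Sketch only: with a user-facing config, a user hitting performance issues
// on a query like ORDER BY ... LIMIT <very large K> could lower the
// threshold so Spark falls back to a regular sort + limit instead of a
// top-K sort. The value 10000 is arbitrary.
spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", "10000")
```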
[jira] [Assigned] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36659: Assignee: Apache Spark > Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > spark.sql.execution.topKSortFallbackThreshold is now an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > >
[jira] [Assigned] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36659: Assignee: (was: Apache Spark) > Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Minor > > spark.sql.execution.topKSortFallbackThreshold is now an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > >
[jira] [Commented] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
[ https://issues.apache.org/jira/browse/SPARK-36659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409239#comment-17409239 ] Apache Spark commented on SPARK-36659: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/33904 > Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config > -- > > Key: SPARK-36659 > URL: https://issues.apache.org/jira/browse/SPARK-36659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Minor > > spark.sql.execution.topKSortFallbackThreshold is now an internal config > hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world > cases, if the K is very big, there would be performance issues. > > >
[jira] [Resolved] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36652. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33899 [https://github.com/apache/spark/pull/33899] > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > > Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash > join, and 2. promote shuffled hash join. Both are achieved by adding a join hint > in the query plan, and both only work for equi joins. However, the rule currently matches > the `Join` operator - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hints to non-equi joins by mistake.
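The distinction the fix targets can be sketched as follows (a sketch assuming two existing DataFrames `left` and `right`, each with an integer column `k`):

```scala
// Equi join: DynamicJoinSelection's demote-broadcast-hash-join /
// promote-shuffled-hash-join hints are meaningful here.
left.join(right, left("k") === right("k"))

// Non-equi join: those hints do not apply, but because the rule matches
// the generic Join operator, it could add them here by mistake.
left.join(right, left("k") < right("k"))
```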
[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36652: - Assignee: Cheng Su > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash > join, and 2. promote shuffled hash join. Both are achieved by adding a join hint > in the query plan, and both only work for equi joins. However, the rule currently matches > the `Join` operator - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hints to non-equi joins by mistake.
[jira] [Updated] (SPARK-36633) DivideDTInterval should throw the same exception when divide by zero.
[ https://issues.apache.org/jira/browse/SPARK-36633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36633: --- Summary: DivideDTInterval should throw the same exception when divide by zero. (was: DivideDTInterval should consider ansi mode.) > DivideDTInterval should throw the same exception when divide by zero. > - > > Key: SPARK-36633 > URL: https://issues.apache.org/jira/browse/SPARK-36633 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > DivideDTInterval does not consider ANSI mode; we should support it.
[jira] [Created] (SPARK-36659) Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config
Kent Yao created SPARK-36659: Summary: Promote spark.sql.execution.topKSortFallbackThreshold to user-facing config Key: SPARK-36659 URL: https://issues.apache.org/jira/browse/SPARK-36659 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Kent Yao spark.sql.execution.topKSortFallbackThreshold is now an internal config hidden from users, with Integer.MAX_VALUE - 15 as its default. In many real-world cases, if the K is very big, there would be performance issues.
[jira] [Resolved] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception
[ https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36650. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33897 [https://github.com/apache/spark/pull/33897] > ApplicationMaster shutdown hook should catch timeout exception > -- > > Key: SPARK-36650 > URL: https://issues.apache.org/jira/browse/SPARK-36650 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > ApplicationMaster shutdown hook should catch timeout exception
[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception
[ https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36650: Assignee: angerszhu > ApplicationMaster shutdown hook should catch timeout exception > -- > > Key: SPARK-36650 > URL: https://issues.apache.org/jira/browse/SPARK-36650 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > ApplicationMaster shutdown hook should catch timeout exception
[jira] [Comment Edited] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409210#comment-17409210 ] huangtengfei edited comment on SPARK-36658 at 9/3/21, 2:36 AM: --- cc [~cloud_fan] could you share thoughts about this? was (Author: ivoson): cc [~cloud_fan] could you share any thoughts about this? > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed an API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we cannot clearly tell which query this is. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it makes sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in the UI or history server to > track more information about the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. >
[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409228#comment-17409228 ] huangtengfei commented on SPARK-36658: -- Will create a PR for this. > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed an API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we cannot clearly tell which query this is. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it makes sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in the UI or history server to > track more information about the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. >
[jira] [Commented] (SPARK-36635) spark-sql does NOT support selecting a name expression as a string type now
[ https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409225#comment-17409225 ] Hyukjin Kwon commented on SPARK-36635: -- Can you justify why it should work? Is it in the ANSI standard, or do other DBMSes support it? > spark-sql does NOT support selecting a name expression as a string type now > > > Key: SPARK-36635 > URL: https://issues.apache.org/jira/browse/SPARK-36635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > The following statement would throw an exception. > {code:java} > sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') > people(age, name)") > {code} > {code:java} > // Exception information > Error in query: > mismatched input ''a'' expecting {, ';'}(line 1, pos 14) > == SQL == > SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') > people(age, name) > --^^^ > {code}
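For reference, the alias forms that do parse can be sketched as follows (assumes a running SparkSession `spark`; single-quoted aliases fail as in the error quoted above, while bare and backquoted identifiers work):

```scala
// Fails to parse, per the reported error:
// spark.sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)")

// Parses: bare or backquoted alias identifiers.
spark.sql("SELECT age AS a, name AS `n` FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name)").show()
```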
[jira] [Commented] (SPARK-36635) spark-sql does NOT support selecting a name expression as a string type now
[ https://issues.apache.org/jira/browse/SPARK-36635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409222#comment-17409222 ] weixiuli commented on SPARK-36635: -- It didn't work before either. > spark-sql does NOT support selecting a name expression as a string type now > > > Key: SPARK-36635 > URL: https://issues.apache.org/jira/browse/SPARK-36635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.2 >Reporter: weixiuli >Priority: Major > > The following statement would throw an exception. > {code:java} > sql("SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') > people(age, name)") > {code} > {code:java} > // Exception information > Error in query: > mismatched input ''a'' expecting {, ';'}(line 1, pos 14) > == SQL == > SELECT age as 'a', name as 'n' FROM VALUES (2, 'Alice'), (5, 'Bob') > people(age, name) > --^^^ > {code}
[jira] [Resolved] (SPARK-36351) Separate partition filters and data filters in PushDownUtils
[ https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-36351. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33650 [https://github.com/apache/spark/pull/33650] > Separate partition filters and data filters in PushDownUtils > > > Key: SPARK-36351 > URL: https://issues.apache.org/jira/browse/SPARK-36351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.3.0 > > > Currently, DSv2 partition filters and data filters are separated in > PruneFileSourcePartitions. It's better to separate these in PushDownUtils, > where we do filter/aggregate push down and column pruning, so we can still > push down aggregate for FileScan if the filters are only partition filters.
[jira] [Assigned] (SPARK-36351) Separate partition filters and data filters in PushDownUtils
[ https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-36351: --- Assignee: Huaxin Gao > Separate partition filters and data filters in PushDownUtils > > > Key: SPARK-36351 > URL: https://issues.apache.org/jira/browse/SPARK-36351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > Currently, DSv2 partition filters and data filters are separated in > PruneFileSourcePartitions. It's better to separate these in PushDownUtils, > where we do filter/aggregate push down and column pruning, so we can still > push down aggregate for FileScan if the filters are only partition filters.
[jira] [Updated] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangtengfei updated SPARK-36658: - Description: Now in [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] we have exposed API to get the query execution information: def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit But we can not get a clear information that which query is this. In Spark SQL, I think that executionId is the direct identifier of a query execution. So I think it make sense to expose executionId to the QueryExecutionListener, so that people can easily find the exact query in UI or history server to track more information of the query execution. And there is no easy way we can find the relevant executionId from a QueryExecution object. was: Now in [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] we have exposed API to get the query execution information: def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit But we can not get a clear information that which query is this. In Spark SQL, I think that executionId is the direct identifier of a query execution. So I think it make sense to expose executionId to the QueryExecutionListener, so that people can easily find the exact query in UI or history server to track more information of the query execution. 
> Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we can not get a clear information that which query is this. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it make sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in UI or history server to > track more information of the query execution. And there is no easy way we > can find the relevant executionId from a QueryExecution object. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36658) Expose executionId to QueryExecutionListener
[ https://issues.apache.org/jira/browse/SPARK-36658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409210#comment-17409210 ] huangtengfei commented on SPARK-36658: -- cc [~cloud_fan] could you share any thoughts about this? > Expose executionId to QueryExecutionListener > > > Key: SPARK-36658 > URL: https://issues.apache.org/jira/browse/SPARK-36658 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2 >Reporter: huangtengfei >Priority: Minor > > Now in > [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] > we have exposed API to get the query execution information: > def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit > def onFailure(funcName: String, qe: QueryExecution, exception: Exception): > Unit > > But we can not get a clear information that which query is this. In Spark > SQL, I think that executionId is the direct identifier of a query execution. > So I think it make sense to expose executionId to the QueryExecutionListener, > so that people can easily find the exact query in UI or history server to > track more information of the query execution. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36658) Expose executionId to QueryExecutionListener
huangtengfei created SPARK-36658: Summary: Expose executionId to QueryExecutionListener Key: SPARK-36658 URL: https://issues.apache.org/jira/browse/SPARK-36658 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2 Reporter: huangtengfei Now in [QueryExecutionListener|https://github.com/apache/spark/blob/v3.2.0-rc2/sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala#L38] we expose an API for query execution information: def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit But there is no clear way to tell which query a callback refers to. In Spark SQL, executionId is the direct identifier of a query execution, so it makes sense to expose executionId to the QueryExecutionListener; people could then easily find the exact query in the UI or history server and track more information about its execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
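The shape of the proposal can be sketched in plain Python. This is purely illustrative, not Spark's Scala QueryExecutionListener API: `QueryListener`, `on_success`, `run_query`, and `_execution_ids` are all hypothetical names, and the extra `execution_id` parameter is exactly the addition the ticket requests.

```python
import itertools

# A process-wide counter standing in for Spark's per-query execution id.
_execution_ids = itertools.count()

class QueryListener:
    """Hypothetical listener that also receives an execution id."""
    def __init__(self):
        self.events = []

    def on_success(self, func_name, execution_id, duration_ns):
        # With the id in hand, this event can be correlated with the SQL
        # tab of the UI / history server, which keys query runs by id.
        self.events.append(("success", func_name, execution_id, duration_ns))

def run_query(listener, func_name):
    # Each execution gets a fresh id before callbacks fire.
    execution_id = next(_execution_ids)
    listener.on_success(func_name, execution_id, duration_ns=0)
    return execution_id
```

Without the id, a listener that sees only `funcName` and the `QueryExecution` object cannot cheaply map the callback back to a run shown in the UI.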
[jira] [Updated] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36657: -- Affects Version/s: 3.2.0 > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36657: - Assignee: William Hyun > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36657. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33902 [https://github.com/apache/spark/pull/33902] > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409174#comment-17409174 ] Apache Spark commented on SPARK-36656: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/33903 > CollapseProject should not collapse correlated scalar subqueries > > > Key: SPARK-36656 > URL: https://issues.apache.org/jira/browse/SPARK-36656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > Currently, the optimizer rule `CollapseProject` inlines expressions generated > from correlated scalar subqueries, which can create unnecessary left outer > joins. > {code:sql} > select c1, s, s * 10 from ( > select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) > {code} > {code:scala} > // Before > Project [c1, s, (s * 10)] > +- Project [c1, scalar-subquery [c1] AS s] >: +- Aggregate [c1], [first(c2), c1] >: +- LocalRelation [c1, c2] >+- LocalRelation [c1, c2] > // After (scalar subqueries are inlined) > Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > +- LocalRelation [c1, c2] > {code} > Then this query will have two LeftOuter joins created. We should only > collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409171#comment-17409171 ] Apache Spark commented on SPARK-36657: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33902 > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409173#comment-17409173 ] Apache Spark commented on SPARK-36656: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/33903 > CollapseProject should not collapse correlated scalar subqueries > > > Key: SPARK-36656 > URL: https://issues.apache.org/jira/browse/SPARK-36656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > Currently, the optimizer rule `CollapseProject` inlines expressions generated > from correlated scalar subqueries, which can create unnecessary left outer > joins. > {code:sql} > select c1, s, s * 10 from ( > select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) > {code} > {code:scala} > // Before > Project [c1, s, (s * 10)] > +- Project [c1, scalar-subquery [c1] AS s] >: +- Aggregate [c1], [first(c2), c1] >: +- LocalRelation [c1, c2] >+- LocalRelation [c1, c2] > // After (scalar subqueries are inlined) > Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > +- LocalRelation [c1, c2] > {code} > Then this query will have two LeftOuter joins created. We should only > collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36656: Assignee: Apache Spark > CollapseProject should not collapse correlated scalar subqueries > > > Key: SPARK-36656 > URL: https://issues.apache.org/jira/browse/SPARK-36656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > Currently, the optimizer rule `CollapseProject` inlines expressions generated > from correlated scalar subqueries, which can create unnecessary left outer > joins. > {code:sql} > select c1, s, s * 10 from ( > select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) > {code} > {code:scala} > // Before > Project [c1, s, (s * 10)] > +- Project [c1, scalar-subquery [c1] AS s] >: +- Aggregate [c1], [first(c2), c1] >: +- LocalRelation [c1, c2] >+- LocalRelation [c1, c2] > // After (scalar subqueries are inlined) > Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > +- LocalRelation [c1, c2] > {code} > Then this query will have two LeftOuter joins created. We should only > collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36656: Assignee: (was: Apache Spark) > CollapseProject should not collapse correlated scalar subqueries > > > Key: SPARK-36656 > URL: https://issues.apache.org/jira/browse/SPARK-36656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > Currently, the optimizer rule `CollapseProject` inlines expressions generated > from correlated scalar subqueries, which can create unnecessary left outer > joins. > {code:sql} > select c1, s, s * 10 from ( > select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) > {code} > {code:scala} > // Before > Project [c1, s, (s * 10)] > +- Project [c1, scalar-subquery [c1] AS s] >: +- Aggregate [c1], [first(c2), c1] >: +- LocalRelation [c1, c2] >+- LocalRelation [c1, c2] > // After (scalar subqueries are inlined) > Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > +- LocalRelation [c1, c2] > {code} > Then this query will have two LeftOuter joins created. We should only > collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36657: Assignee: (was: Apache Spark) > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409170#comment-17409170 ] Apache Spark commented on SPARK-36657: -- User 'williamhyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33902 > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
[ https://issues.apache.org/jira/browse/SPARK-36657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36657: Assignee: Apache Spark > Update comment in `gen-sql-config-docs.py` > -- > > Key: SPARK-36657 > URL: https://issues.apache.org/jira/browse/SPARK-36657 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.3.0 >Reporter: William Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36657) Update comment in `gen-sql-config-docs.py`
William Hyun created SPARK-36657: Summary: Update comment in `gen-sql-config-docs.py` Key: SPARK-36657 URL: https://issues.apache.org/jira/browse/SPARK-36657 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.0 Reporter: William Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
[ https://issues.apache.org/jira/browse/SPARK-36656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-36656: - Description: Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins. {code:sql} select c1, s, s * 10 from ( select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) {code} {code:scala} // Before Project [c1, s, (s * 10)] +- Project [c1, scalar-subquery [c1] AS s] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] // After (scalar subqueries are inlined) Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] {code} Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins. was: Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins. {code:scala} // Before Project [c1, s, (s * 10)] +- Project [c1, scalar-subquery [c1] AS s] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] // After (scalar subqueries are inlined) Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] {code} Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins. 
> CollapseProject should not collapse correlated scalar subqueries > > > Key: SPARK-36656 > URL: https://issues.apache.org/jira/browse/SPARK-36656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Allison Wang >Priority: Major > > Currently, the optimizer rule `CollapseProject` inlines expressions generated > from correlated scalar subqueries, which can create unnecessary left outer > joins. > {code:sql} > select c1, s, s * 10 from ( > select c1, (select first(c2) from t2 where t1.c1 = t2.c1) s from t1) > {code} > {code:scala} > // Before > Project [c1, s, (s * 10)] > +- Project [c1, scalar-subquery [c1] AS s] >: +- Aggregate [c1], [first(c2), c1] >: +- LocalRelation [c1, c2] >+- LocalRelation [c1, c2] > // After (scalar subqueries are inlined) > Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > : +- Aggregate [c1], [first(c2), c1] > : +- LocalRelation [c1, c2] > +- LocalRelation [c1, c2] > {code} > Then this query will have two LeftOuter joins created. We should only > collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36656) CollapseProject should not collapse correlated scalar subqueries
Allison Wang created SPARK-36656: Summary: CollapseProject should not collapse correlated scalar subqueries Key: SPARK-36656 URL: https://issues.apache.org/jira/browse/SPARK-36656 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Allison Wang Currently, the optimizer rule `CollapseProject` inlines expressions generated from correlated scalar subqueries, which can create unnecessary left outer joins. {code:scala} // Before Project [c1, s, (s * 10)] +- Project [c1, scalar-subquery [c1] AS s] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] // After (scalar subqueries are inlined) Project [c1, scalar-subquery [c1], (scalar-subquery [c1] * 10)] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] : +- Aggregate [c1], [first(c2), c1] : +- LocalRelation [c1, c2] +- LocalRelation [c1, c2] {code} Then this query will have two LeftOuter joins created. We should only collapse projects after correlated subqueries are rewritten as joins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
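The cost of the inlining described above can be shown with a language-agnostic sketch; this is plain Python standing in for the optimizer's behavior, with `scalar_subquery` as a hypothetical stand-in for the correlated subquery, not Spark code.

```python
calls = {"subquery": 0}

def scalar_subquery():
    """Stand-in for the correlated scalar subquery; counts evaluations."""
    calls["subquery"] += 1
    return 7

# Un-collapsed plan: the inner projection computes s once and the outer
# projection reuses it -> one subquery evaluation.
s = scalar_subquery()
uncollapsed = (s, s * 10)

# Collapsed plan: the subquery expression is substituted into every use
# site, so it is evaluated once per reference -> two evaluations here.
# In Spark, each inlined copy is later rewritten into its own left outer
# join, which is why the query ends up with two LeftOuter joins.
collapsed = (scalar_subquery(), scalar_subquery() * 10)

assert uncollapsed == collapsed  # same answer, strictly more work
```

The results match, but the collapsed form does the subquery's work once per reference, which is the regression the ticket wants to avoid by collapsing projects only after subqueries are rewritten as joins.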
[jira] [Assigned] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0
[ https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36655: Assignee: Apache Spark > Add `versionadded` for API added in Spark 3.3.0 > --- > > Key: SPARK-36655 > URL: https://issues.apache.org/jira/browse/SPARK-36655 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0
[ https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36655: Assignee: (was: Apache Spark) > Add `versionadded` for API added in Spark 3.3.0 > --- > > Key: SPARK-36655 > URL: https://issues.apache.org/jira/browse/SPARK-36655 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0
[ https://issues.apache.org/jira/browse/SPARK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409134#comment-17409134 ] Apache Spark commented on SPARK-36655: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/33901 > Add `versionadded` for API added in Spark 3.3.0 > --- > > Key: SPARK-36655 > URL: https://issues.apache.org/jira/browse/SPARK-36655 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36654) Drop type ignores from numpy imports
[ https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36654: Assignee: (was: Apache Spark) > Drop type ignores from numpy imports > > > Key: SPARK-36654 > URL: https://issues.apache.org/jira/browse/SPARK-36654 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > Currently we use {{type: ingore[import]}} on all numpy imports ‒ this was > necessary because numpy didn't provide annotations at the time when we added > stubs to PySpark. > Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy > is PEP 561 compatible and these ignores are no longer necessary (current > numpy version is 1.21). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36654) Drop type ignores from numpy imports
[ https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36654: Assignee: Apache Spark > Drop type ignores from numpy imports > > > Key: SPARK-36654 > URL: https://issues.apache.org/jira/browse/SPARK-36654 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Minor > > Currently we use {{type: ingore[import]}} on all numpy imports ‒ this was > necessary because numpy didn't provide annotations at the time when we added > stubs to PySpark. > Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy > is PEP 561 compatible and these ignores are no longer necessary (current > numpy version is 1.21). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36654) Drop type ignores from numpy imports
[ https://issues.apache.org/jira/browse/SPARK-36654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409133#comment-17409133 ] Apache Spark commented on SPARK-36654: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/33900 > Drop type ignores from numpy imports > > > Key: SPARK-36654 > URL: https://issues.apache.org/jira/browse/SPARK-36654 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.1.2, 3.2.0, 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > Currently we use {{type: ingore[import]}} on all numpy imports ‒ this was > necessary because numpy didn't provide annotations at the time when we added > stubs to PySpark. > Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy > is PEP 561 compatible and these ignores are no longer necessary (current > numpy version is 1.21). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36655) Add `versionadded` for API added in Spark 3.3.0
Xinrong Meng created SPARK-36655: Summary: Add `versionadded` for API added in Spark 3.3.0 Key: SPARK-36655 URL: https://issues.apache.org/jira/browse/SPARK-36655 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
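For context, `versionadded` refers to the Sphinx docstring directive used throughout PySpark documentation. A minimal sketch of what the ticket asks for, using a hypothetical function (`assert_true` here is just an illustrative name, not a claim about which 3.3.0 APIs are affected):

```python
def assert_true(condition, message=None):
    """Raise an error if the condition does not hold.

    .. versionadded:: 3.3.0
    """
    # The directive above is rendered by Sphinx as a "New in version 3.3.0"
    # note in the generated API docs; it has no runtime effect.
    if not condition:
        raise ValueError(message or "assertion failed")
    return True
```

At runtime the directive is just part of `__doc__`; Sphinx picks it up when building the API reference.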
[jira] [Created] (SPARK-36654) Drop type ignores from numpy imports
Maciej Szymkiewicz created SPARK-36654: -- Summary: Drop type ignores from numpy imports Key: SPARK-36654 URL: https://issues.apache.org/jira/browse/SPARK-36654 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.1.2, 3.2.0, 3.3.0 Reporter: Maciej Szymkiewicz Currently we use {{type: ignore[import]}} on all numpy imports ‒ this was necessary because numpy didn't provide annotations at the time when we added stubs to PySpark. Since numpy 1.20 (https://github.com/numpy/numpy/releases/tag/v1.20.0) numpy is PEP 561 compatible and these ignores are no longer necessary (the current numpy version is 1.21). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
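The change amounts to dropping a mypy suppression comment, assuming numpy >= 1.20 is installed; a minimal before/after sketch (the comments have no runtime effect, they only tell mypy whether to skip the import):

```python
# Before: needed while numpy shipped no inline type information, so mypy
# reported the import as an untyped module.
import numpy as np  # type: ignore[import]

# After: numpy >= 1.20 is PEP 561 compatible (it bundles a py.typed marker
# and its own stubs), so the plain import type-checks cleanly.
import numpy as np

# Calls are now fully typed; mypy can infer the result type here.
total = np.arange(4).sum()
```

The re-imported name is identical at runtime; only static type checking is affected.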
[jira] [Created] (SPARK-36653) Implement Series.__xor__
Takuya Ueshin created SPARK-36653: - Summary: Implement Series.__xor__ Key: SPARK-36653 URL: https://issues.apache.org/jira/browse/SPARK-36653 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
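Implementing `Series.__xor__` means making `s1 ^ s2` dispatch to elementwise exclusive-or, matching pandas semantics. A toy stdlib-only sketch (the `BoolSeries` class is hypothetical, not pandas-on-Spark code) showing the protocol being added:

```python
class BoolSeries:
    """Toy stand-in for a boolean Series, illustrating __xor__:
    s1 ^ s2 calls s1.__xor__(s2) and should return a new series of
    elementwise exclusive-ors, as pandas does."""

    def __init__(self, data):
        self.data = list(data)

    def __xor__(self, other):
        # For booleans, xor is simply inequality of the paired elements.
        return BoolSeries(a != b for a, b in zip(self.data, other.data))

result = BoolSeries([True, False, True]) ^ BoolSeries([True, True, False])
```

Here `result.data` is `[False, True, True]`: positions where exactly one operand is True.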
[jira] [Updated] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Su updated SPARK-36652: - Affects Version/s: (was: 3.2.0) > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Cheng Su >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1.demote broadcast hash > join, and 2.promote shuffled hash join. Both are achieved by adding join hint > in query plan, and only works for equi join. However the rule is matching > with `Join` operator now - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hint for non-equi join by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
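The guard the ticket asks for can be sketched in plain Python; this is a toy model, not the Scala `DynamicJoinSelection` rule, and the `Join` record and hint name are hypothetical. The point is that a rule matching every join node must first check for equi-join keys before attaching a hint:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Join:
    """Toy join node: equi joins carry join keys, non-equi joins do not."""
    keys: Tuple[str, ...] = ()
    hint: Optional[str] = None

def dynamic_join_selection(join: Join) -> Join:
    # Broadcast-hash and shuffled-hash joins only apply to equi joins, so
    # the rule should skip join nodes without keys instead of matching
    # every Join operator and hinting non-equi joins by mistake.
    if join.keys:
        join.hint = "PREFER_SHUFFLE_HASH"
    return join
```

With the guard, a non-equi join (`keys=()`) passes through unhinted, while an equi join on `id` receives the hint.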
[jira] [Commented] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks
[ https://issues.apache.org/jira/browse/SPARK-36443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408958#comment-17408958 ] Dongjoon Hyun commented on SPARK-36443: --- Thank you for the details. > Demote BroadcastJoin causes performance regression and increases OOM risks > -- > > Key: SPARK-36443 > URL: https://issues.apache.org/jira/browse/SPARK-36443 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kent Yao >Priority: Major > Attachments: image-2021-08-06-11-24-34-122.png, > image-2021-08-06-17-57-15-765.png, screenshot-1.png > > > > h2. A test case > Use bin/spark-sql with local mode and all other default settings with 3.1.2 > to run the case below > {code:sql} > // Some comments here > set spark.sql.shuffle.partitions=20; > set spark.sql.adaptive.enabled=true; > -- set spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0; -- > (default 0.2)enable this for not demote bhj > set spark.sql.autoBroadcastJoinThreshold=200; > SELECT > l.id % 12345 k, > sum(l.id) sum, > count(l.id) cnt, > avg(l.id) avg, > min(l.id) min, > max(l.id) max > from (select id % 3 id from range(0, 1e8, 1, 100)) l > left join (SELECT max(id) as id, id % 2 gid FROM range(0, 1000, 2, 100) > group by gid) r ON l.id = r.id > GROUP BY 1; > {code} > > 1. 
demote bhj w/ nonEmptyPartitionRatioForBroadcastJoin commented out (every job runs the same query; the Description and link columns of the Spark UI jobs table are omitted here because their URLs were mangled):
> ||Job Id||Submitted||Duration||Stages: Succeeded/Total||Tasks (for all stages): Succeeded/Total||
> |4|2021/08/06 17:31:37|71 ms|1/1 (4 skipped)|3/3 (205 skipped)|
> |3|2021/08/06 17:31:18|19 s|1/1 (3 skipped)|4/4 (201 skipped)|
> |2|2021/08/06 17:31:18|87 ms|1/1 (1 skipped)|1/1 (100 skipped)|
> |1|2021/08/06 17:31:16|2 s|1/1|100/100|
> |0|2021/08/06 17:31:15|2 s|1/1|100/100|
> 2. set nonEmptyPartitionRatioForBroadcastJoin to 0 to tell Spark not to demote bhj:
> ||Job Id||Submitted||Duration||Stages: Succeeded/Total||Tasks (for all stages): Succeeded/Total||
> |5|2021/08/06 18:25:15|29 ms|1/1 (2 skipped)|3/3 (200 skipped)|
> |4|SELECT l.id %
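The 19-second job in case 1 versus the fast jobs in case 2 comes from AQE demoting the broadcast hash join. The demotion heuristic can be sketched as follows; this is a simplified illustrative model, not Spark's `DynamicJoinSelection` code, and the function name and inputs are assumptions:

```python
def should_demote_broadcast_join(partition_row_counts, non_empty_ratio_threshold=0.2):
    """Demote a broadcast hash join when too few shuffle partitions are
    non-empty, i.e. the non-empty ratio falls below the threshold.
    A threshold of 0 disables demotion (as in case 2 above)."""
    total = len(partition_row_counts)
    if total == 0:
        return False
    non_empty = sum(1 for n in partition_row_counts if n > 0)
    return non_empty / total < non_empty_ratio_threshold

# 20 shuffle partitions, only 2 non-empty: ratio 0.1 < 0.2, so demote.
demoted = should_demote_broadcast_join([5, 0, 0, 3] + [0] * 16)
# With the threshold forced to 0, demotion never triggers.
kept = should_demote_broadcast_join([5, 0, 0, 3] + [0] * 16, non_empty_ratio_threshold=0)
```

This is why commenting out the `nonEmptyPartitionRatioForBroadcastJoin=0` line in the repro reintroduces the regression: the default ratio of 0.2 allows demotion.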
[jira] [Updated] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36652: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Bug) > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1.demote broadcast hash > join, and 2.promote shuffled hash join. Both are achieved by adding join hint > in query plan, and only works for equi join. However the rule is matching > with `Join` operator now - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hint for non-equi join by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36617) Inconsistencies in approxQuantile annotations
[ https://issues.apache.org/jira/browse/SPARK-36617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-36617: -- Assignee: Cary Lee > Inconsistencies in approxQuantile annotations > - > > Key: SPARK-36617 > URL: https://issues.apache.org/jira/browse/SPARK-36617 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Cary Lee >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > I've been reviewing PR in the legacy repo > (https://github.com/zero323/pyspark-stubs/pull/552) and it looks like we have > two problems with annotations for {{approxQuantile}}. > First of all {{DataFrame.approxQuantile}} should overload definition to match > input arguments ‒ if col is a sequence then result should be a list of lists: > {code:python} > @overload > def approxQuantile( > self, > col: str, > probabilities: Union[List[float], Tuple[float]], > relativeError: float > ) -> List[float]: ... > @overload > def approxQuantile( > self, > col: Union[List[str], Tuple[str]], > probabilities: Union[List[float], Tuple[float]], > relativeError: float > ) -> List[List[float]]: ... > {code} > Additionally {{DataFrameStatFunctions.approxQuantile}} should match whatever > we have in {{DataFrame}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36652: Assignee: (was: Apache Spark) > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1.demote broadcast hash > join, and 2.promote shuffled hash join. Both are achieved by adding join hint > in query plan, and only works for equi join. However the rule is matching > with `Join` operator now - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hint for non-equi join by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36652: Assignee: Apache Spark > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Assignee: Apache Spark >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1.demote broadcast hash > join, and 2.promote shuffled hash join. Both are achieved by adding join hint > in query plan, and only works for equi join. However the rule is matching > with `Join` operator now - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hint for non-equi join by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
[ https://issues.apache.org/jira/browse/SPARK-36652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408914#comment-17408914 ] Apache Spark commented on SPARK-36652: -- User 'c21' has created a pull request for this issue: https://github.com/apache/spark/pull/33899 > AQE dynamic join selection should not apply to non-equi join > > > Key: SPARK-36652 > URL: https://issues.apache.org/jira/browse/SPARK-36652 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Cheng Su >Priority: Minor > > Currently `DynamicJoinSelection` has two features: 1.demote broadcast hash > join, and 2.promote shuffled hash join. Both are achieved by adding join hint > in query plan, and only works for equi join. However the rule is matching > with `Join` operator now - > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71,] > so it would add hint for non-equi join by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Parent: (was: SPARK-33828) Issue Type: Question (was: Sub-task) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Question > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Comment: was deleted (was: close it) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 closed SPARK-36630. -- close it > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 resolved SPARK-36630. Resolution: Fixed > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408901#comment-17408901 ] gaoyajun02 commented on SPARK-36630: Found that my issue is similar to https://issues.apache.org/jira/browse/SPARK-35264. I can set spark.sql.autoBroadcastJoinThreshold to -1 to disable logical-plan stats, and set spark.sql.adaptive.autoBroadcastJoinThreshold to control the threshold for physical stats. > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ, and in > some scenarios the estimated value is several orders of magnitude smaller > than the actual broadcast data, which can lead to large tables being broadcast. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
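The workaround in the comment amounts to a decision rule: once a query stage has materialized, gate broadcasting on accurate runtime statistics; before that, the (often underestimated) logical-plan check is disabled by setting its threshold to -1. A rough sketch of that rule; the function and its parameters are illustrative, not Spark's API:

```python
def can_broadcast(size_bytes, stage_materialized,
                  logical_threshold=-1, adaptive_threshold=10 * 1024 * 1024):
    """Decide whether one side of a join may be broadcast.
    logical_threshold=-1 disables the logical-plan estimate entirely;
    adaptive_threshold applies only to measured runtime sizes."""
    if stage_materialized:
        return 0 <= size_bytes <= adaptive_threshold
    return 0 <= logical_threshold and size_bytes <= logical_threshold

# A 1 GiB side measured at runtime is not broadcast, even if its logical
# estimate was tiny:
large_ok = can_broadcast(1 << 30, stage_materialized=True)
# A genuinely small side still broadcasts based on runtime stats:
small_ok = can_broadcast(1 << 20, stage_materialized=True)
# Before materialization, logical estimates are disabled (threshold -1):
pre_ok = can_broadcast(100, stage_materialized=False)
```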
[jira] [Created] (SPARK-36652) AQE dynamic join selection should not apply to non-equi join
Cheng Su created SPARK-36652: Summary: AQE dynamic join selection should not apply to non-equi join Key: SPARK-36652 URL: https://issues.apache.org/jira/browse/SPARK-36652 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Cheng Su Currently `DynamicJoinSelection` has two features: 1. demote broadcast hash join, and 2. promote shuffled hash join. Both are achieved by adding a join hint to the query plan, and both apply only to equi-joins. However, the rule currently matches on the `Join` operator - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DynamicJoinSelection.scala#L71] - so it would add a hint to non-equi joins by mistake. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
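The fix is to match only joins that actually carry equi-join keys before attaching a hint. A rough Python sketch of that guard follows; the class and hint names are illustrative stand-ins (the real rule is Scala code in `DynamicJoinSelection`):

```python
from dataclasses import dataclass, field

@dataclass
class Join:
    join_type: str
    equi_keys: list = field(default_factory=list)  # empty for non-equi joins
    hints: list = field(default_factory=list)

def apply_dynamic_join_selection(join, hint="PREFER_SHUFFLE_HASH"):
    """Attach a join-strategy hint only when the join has equi-join keys.
    Matching on any Join node (the reported bug) would also hint joins
    whose condition is non-equi, e.g. ON l.id < r.id."""
    if join.equi_keys:  # the missing guard from the report
        join.hints.append(hint)
    return join

equi = apply_dynamic_join_selection(Join("left", equi_keys=["l.id = r.id"]))
non_equi = apply_dynamic_join_selection(Join("left"))  # non-equi condition
```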
[jira] [Assigned] (SPARK-36637) Bad error message when using non-existing named window
[ https://issues.apache.org/jira/browse/SPARK-36637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36637: --- Assignee: angerszhu > Bad error message when using non-existing named window > -- > > Key: SPARK-36637 > URL: https://issues.apache.org/jira/browse/SPARK-36637 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > {code:java} > CREATE TABLE employees > (name STRING, dept STRING, salary INT, age INT); > SELECT AVG(age) OVER win AS salary, > AVG(salary) OVER win AS avgsalary, > MIN(salary) OVER win AS minsalary, > MAX(salary) OVER win AS maxsalary, > COUNT(1) OVER win AS numEmps > FROM employees; > Error in query: unresolved operator 'Aggregate > [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) AS > salary#34, unresolvedwindowexpression(avg(salary#42), > WindowSpecReference(win)) AS avgsalary#35, > unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS > minsalary#36, unresolvedwindowexpression(max(salary#42), > WindowSpecReference(win)) AS maxsalary#37, > unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38]; > 'Aggregate [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) > AS salary#34, unresolvedwindowexpression(avg(salary#42), > WindowSpecReference(win)) AS avgsalary#35, > unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS > minsalary#36, unresolvedwindowexpression(max(salary#42), > WindowSpecReference(win)) AS maxsalary#37, > unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38] > +- SubqueryAlias spark_catalog.default.employees > +- HiveTableRelation [`default`.`employees`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#40, > dept#41, salary#42, age#43], Partition Cols: []] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
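A clearer error would name the unresolved window reference instead of dumping the unresolved plan. The check can be sketched in Python; the function name and wording are hypothetical (Spark's analyzer implements this in Scala):

```python
def resolve_named_windows(referenced, defined):
    """Map each OVER <name> reference to its WINDOW-clause definition,
    raising a targeted error for an undefined name instead of an opaque
    'unresolved operator Aggregate [...]' plan dump."""
    missing = [name for name in referenced if name not in defined]
    if missing:
        raise ValueError(
            f"Window specification {missing[0]} is not defined in the WINDOW clause.")
    return {name: defined[name] for name in referenced}

# The query in the report references `win` but defines no WINDOW clause:
try:
    resolve_named_windows(["win"], defined={})
except ValueError as e:
    message = str(e)
```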
[jira] [Resolved] (SPARK-36637) Bad error message when using non-existing named window
[ https://issues.apache.org/jira/browse/SPARK-36637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36637. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33892 [https://github.com/apache/spark/pull/33892] > Bad error message when using non-existing named window > -- > > Key: SPARK-36637 > URL: https://issues.apache.org/jira/browse/SPARK-36637 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > Fix For: 3.2.0 > > > {code:java} > CREATE TABLE employees > (name STRING, dept STRING, salary INT, age INT); > SELECT AVG(age) OVER win AS salary, > AVG(salary) OVER win AS avgsalary, > MIN(salary) OVER win AS minsalary, > MAX(salary) OVER win AS maxsalary, > COUNT(1) OVER win AS numEmps > FROM employees; > Error in query: unresolved operator 'Aggregate > [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) AS > salary#34, unresolvedwindowexpression(avg(salary#42), > WindowSpecReference(win)) AS avgsalary#35, > unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS > minsalary#36, unresolvedwindowexpression(max(salary#42), > WindowSpecReference(win)) AS maxsalary#37, > unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38]; > 'Aggregate [unresolvedwindowexpression(avg(age#43), WindowSpecReference(win)) > AS salary#34, unresolvedwindowexpression(avg(salary#42), > WindowSpecReference(win)) AS avgsalary#35, > unresolvedwindowexpression(min(salary#42), WindowSpecReference(win)) AS > minsalary#36, unresolvedwindowexpression(max(salary#42), > WindowSpecReference(win)) AS maxsalary#37, > unresolvedwindowexpression(count(1), WindowSpecReference(win)) AS numEmps#38] > +- SubqueryAlias spark_catalog.default.employees > +- HiveTableRelation [`default`.`employees`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [name#40, > dept#41, salary#42, age#43], Partition Cols: []] 
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36617) Inconsistencies in approxQuantile annotations
[ https://issues.apache.org/jira/browse/SPARK-36617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-36617. Fix Version/s: 3.1.3 3.2.0 Resolution: Resolved Issue resolved by pull request 33880 https://github.com/apache/spark/pull/33880 > Inconsistencies in approxQuantile annotations > - > > Key: SPARK-36617 > URL: https://issues.apache.org/jira/browse/SPARK-36617 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > I've been reviewing PR in the legacy repo > (https://github.com/zero323/pyspark-stubs/pull/552) and it looks like we have > two problems with annotations for {{approxQuantile}}. > First of all {{DataFrame.approxQuantile}} should overload definition to match > input arguments ‒ if col is a sequence then result should be a list of lists: > {code:python} > @overload > def approxQuantile( > self, > col: str, > probabilities: Union[List[float], Tuple[float]], > relativeError: float > ) -> List[float]: ... > @overload > def approxQuantile( > self, > col: Union[List[str], Tuple[str]], > probabilities: Union[List[float], Tuple[float]], > relativeError: float > ) -> List[List[float]]: ... > {code} > Additionally {{DataFrameStatFunctions.approxQuantile}} should match whatever > we have in {{DataFrame}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
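The overloads above can be exercised with a toy stand-in that mirrors the proposed return types: a single column name yields `List[float]`, a sequence of names yields `List[List[float]]`. The implementation below is illustrative only, not PySpark's `approxQuantile`:

```python
from typing import List, Sequence, Union, overload

@overload
def approx_quantile(col: str, probabilities: Sequence[float]) -> List[float]: ...
@overload
def approx_quantile(col: Sequence[str],
                    probabilities: Sequence[float]) -> List[List[float]]: ...

def approx_quantile(col, probabilities):
    """Toy stand-in: returns one placeholder quantile list per column,
    matching the overloaded shapes from the annotations above."""
    def quantiles(_name: str) -> List[float]:
        return [float(p) for p in probabilities]  # placeholder values
    if isinstance(col, str):
        return quantiles(col)
    return [quantiles(name) for name in col]

single = approx_quantile("age", [0.5])              # -> List[float]
multi = approx_quantile(["age", "salary"], [0.5])   # -> List[List[float]]
```

A type checker running against these overloads would then flag callers that index the single-column result as if it were nested, which is the inconsistency the ticket describes.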
[jira] [Commented] (SPARK-36622) spark.history.kerberos.principal doesn't take value _HOST
[ https://issues.apache.org/jira/browse/SPARK-36622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408813#comment-17408813 ] Thomas Graves commented on SPARK-36622: --- Supporting _HOST for the SHS likely makes sense since it's a server. > spark.history.kerberos.principal doesn't take value _HOST > - > > Key: SPARK-36622 > URL: https://issues.apache.org/jira/browse/SPARK-36622 > Project: Spark > Issue Type: Improvement > Components: Deploy, Security, Spark Core >Affects Versions: 3.0.1, 3.1.2 >Reporter: pralabhkumar >Priority: Minor > > spark.history.kerberos.principal doesn't understand the value _HOST. > It reports a login failure for principal: spark/_HOST@realm. > It would be helpful to take the _HOST value from the config file and replace it with the > current hostname (similar to what Hive does). This would also help run the SHS > on multiple machines without hardcoding the principal hostname in > spark.history.kerberos.principal. > > It requires a minor change in HistoryServer.scala, in the initSecurity method. > > Please let me know if this request makes sense; I'll create the PR. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-35623) Volcano resource manager for Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408750#comment-17408750 ] Senthil Kumar edited comment on SPARK-35623 at 9/2/21, 11:56 AM: - [~dipanjanK] Include me too pls. mail id: senthissen...@gmail.com was (Author: senthh): [~dipanjanK] Include me too pls > Volcano resource manager for Spark on Kubernetes > > > Key: SPARK-35623 > URL: https://issues.apache.org/jira/browse/SPARK-35623 > Project: Spark > Issue Type: Brainstorming > Components: Kubernetes >Affects Versions: 3.1.1, 3.1.2 >Reporter: Dipanjan Kailthya >Priority: Minor > Labels: kubernetes, resourcemanager > > Dear Spark Developers, > > Hello from the Netherlands! Posting this here as I still haven't gotten > accepted to post in the spark dev mailing list. > > My team is planning to use spark with Kubernetes support on our shared > (multi-tenant) on premise Kubernetes cluster. However we would like to have > certain scheduling features like fair-share and preemption which as we > understand are not built into the current spark-kubernetes resource manager > yet. We have been working on and are close to a first successful prototype > integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means > a new resource manager component with lots in common with existing > spark-kubernetes resource manager, but instead of pods it launches Volcano > jobs which delegate the driver and executor pod creation and lifecycle > management to Volcano. We are interested in contributing this to open source, > either directly in spark or as a separate project. > > So, two questions: > > 1. Do the spark maintainers see this as a valuable contribution to the > mainline spark codebase? If so, can we have some guidance on how to publish > the changes? > > 2. Are any other developers / organizations interested to contribute to this > effort? If so, please get in touch. 
> > Best, > Dipanjan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35623) Volcano resource manager for Spark on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408750#comment-17408750 ] Senthil Kumar commented on SPARK-35623: --- [~dipanjanK] Include me too pls > Volcano resource manager for Spark on Kubernetes > > > Key: SPARK-35623 > URL: https://issues.apache.org/jira/browse/SPARK-35623 > Project: Spark > Issue Type: Brainstorming > Components: Kubernetes >Affects Versions: 3.1.1, 3.1.2 >Reporter: Dipanjan Kailthya >Priority: Minor > Labels: kubernetes, resourcemanager > > Dear Spark Developers, > > Hello from the Netherlands! Posting this here as I still haven't gotten > accepted to post in the spark dev mailing list. > > My team is planning to use spark with Kubernetes support on our shared > (multi-tenant) on premise Kubernetes cluster. However we would like to have > certain scheduling features like fair-share and preemption which as we > understand are not built into the current spark-kubernetes resource manager > yet. We have been working on and are close to a first successful prototype > integration with Volcano ([https://volcano.sh/en/docs/]). Briefly this means > a new resource manager component with lots in common with existing > spark-kubernetes resource manager, but instead of pods it launches Volcano > jobs which delegate the driver and executor pod creation and lifecycle > management to Volcano. We are interested in contributing this to open source, > either directly in spark or as a separate project. > > So, two questions: > > 1. Do the spark maintainers see this as a valuable contribution to the > mainline spark codebase? If so, can we have some guidance on how to publish > the changes? > > 2. Are any other developers / organizations interested to contribute to this > effort? If so, please get in touch. 
> > Best, > Dipanjan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36632) DivideYMInterval should throw the same exception when divide by zero.
[ https://issues.apache.org/jira/browse/SPARK-36632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36632: --- Summary: DivideYMInterval should throw the same exception when divide by zero. (was: DivideYMInterval should consider ansi mode.) > DivideYMInterval should throw the same exception when divide by zero. > - > > Key: SPARK-36632 > URL: https://issues.apache.org/jira/browse/SPARK-36632 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > DivideYMInterval does not consider ANSI mode; we should support it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
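The intent, as the retitled summary suggests, is that dividing a year-month interval by zero should surface the same divide-by-zero error as other ANSI-mode arithmetic, not a raw ArithmeticException with a different message. A hedged sketch of those semantics; the non-ANSI null behavior here is an assumption for illustration, not a statement of Spark's actual behavior:

```python
def divide_ym_interval(months: int, divisor: float, ansi_enabled: bool = True):
    """Divide a year-month interval (stored as a month count) by a number.
    Under ANSI mode, divide-by-zero raises one uniform error; with ANSI
    off, this sketch returns None (null) instead."""
    if divisor == 0:
        if ansi_enabled:
            raise ArithmeticError("Division by zero")  # one shared message
        return None
    return int(months / divisor)

# INTERVAL '2' YEAR / 2 = 1 year (12 months):
halved = divide_ym_interval(24, 2)
nulled = divide_ym_interval(24, 0, ansi_enabled=False)
```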
[jira] [Updated] (SPARK-36627) Tasks with Java proxy objects fail to deserialize
[ https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Souza updated SPARK-36627: - Description: In JavaSerializer.JavaDeserializationStream we override resolveClass of ObjectInputStream to use the thread's contextClassLoader. However, we do not override resolveProxyClass, which is used when deserializing Java proxy objects. As a result, Spark uses the wrong classloader when deserializing those objects, and the job fails with the following exception: {code} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:398) at java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829) at java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917) ... at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) {code} was: In JavaSerializer.JavaDeserializationStream we override resolveClass of ObjectInputStream to use the threads' contextClassLoader.
However, we do not override resolveProxyClass, which is used when deserializing Java proxy objects, which makes spark use the wrong classloader when deserializing objects, which causes the job to fail with the following exception: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at java.base/java.lang.Class.forName0(Native Method) at java.base/java.lang.Class.forName(Class.java:398) at java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829) at java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917) ... at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > Tasks with Java proxy objects fail to deserialize > - > > Key: SPARK-36627 > URL: https://issues.apache.org/jira/browse/SPARK-36627 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.3 >Reporter: Samuel Souza >Priority: Minor > > In JavaSerializer.JavaDeserializationStream we override resolveClass of > ObjectInputStream to use the thread's contextClassLoader.
However, we do not > override resolveProxyClass, which is used when deserializing Java proxy > objects. As a result, Spark uses the wrong classloader when deserializing > those objects, and the job fails with the following exception: > {code} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: > > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829) > at > java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917) > ... > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36627) Tasks with Java proxy objects fail to deserialize
[ https://issues.apache.org/jira/browse/SPARK-36627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel Souza updated SPARK-36627: - Summary: Tasks with Java proxy objects fail to deserialize (was: Tasks with Java proxy objects fail to desrialize) > Tasks with Java proxy objects fail to deserialize > - > > Key: SPARK-36627 > URL: https://issues.apache.org/jira/browse/SPARK-36627 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.3 >Reporter: Samuel Souza >Priority: Minor > > In JavaSerializer.JavaDeserializationStream we override resolveClass of > ObjectInputStream to use the thread's contextClassLoader. However, we do not > override resolveProxyClass, which is used when deserializing Java proxy > objects. As a result, Spark uses the wrong classloader when deserializing > those objects, and the job fails with the following exception: > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4, , executor 1): java.lang.ClassNotFoundException: > > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at > java.base/java.io.ObjectInputStream.resolveProxyClass(ObjectInputStream.java:829) > at > java.base/java.io.ObjectInputStream.readProxyDesc(ObjectInputStream.java:1917) > ... > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
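The fix described in SPARK-36627 can be sketched as follows. This is a minimal, hypothetical illustration, not Spark's actual JavaDeserializationStream: the class name ContextAwareObjectInputStream and its helper methods are invented here. It shows resolveProxyClass being overridden alongside resolveClass so that both plain and proxy classes resolve through the thread's context classloader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stream (NOT Spark's code): resolves BOTH plain classes and
// proxy classes via the thread's context classloader. Without the
// resolveProxyClass override, the JDK default falls back to the "latest
// user-defined loader" on the call stack, which may not see user-supplied
// classes and then fails with ClassNotFoundException, as in the report above.
public class ContextAwareObjectInputStream extends ObjectInputStream {
    public ContextAwareObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    private ClassLoader loader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return cl != null ? cl : getClass().getClassLoader();
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        return Class.forName(desc.getName(), false, loader());
    }

    @Override
    protected Class<?> resolveProxyClass(String[] interfaces)
            throws IOException, ClassNotFoundException {
        // Load each interface through the same loader, then build the proxy class.
        Class<?>[] ifaces = new Class<?>[interfaces.length];
        for (int i = 0; i < interfaces.length; i++) {
            ifaces[i] = Class.forName(interfaces[i], false, loader());
        }
        return Proxy.getProxyClass(loader(), ifaces);
    }

    // Serialize an object and read it back through the custom stream.
    public static Object roundTrip(Object obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream ois = new ContextAwareObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A dynamic proxy serializes when its InvocationHandler is serializable.
        Runnable proxy = (Runnable) Proxy.newProxyInstance(
                ClassLoader.getSystemClassLoader(),
                new Class<?>[] {Runnable.class},
                (InvocationHandler & Serializable) (p, m, a) -> null);
        Object restored = roundTrip(proxy);
        System.out.println(Proxy.isProxyClass(restored.getClass()));
    }
}
```

Deserializing the proxy drives ObjectInputStream.readProxyDesc through the override, the same code path that appears in the stack trace above.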
[jira] [Assigned] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36644: Assignee: Apache Spark > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kazuyuki Tanimura >Assignee: Apache Spark >Priority: Major > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ```
[jira] [Commented] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408658#comment-17408658 ] Apache Spark commented on SPARK-36644: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/33898 > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kazuyuki Tanimura >Priority: Major > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ```
[jira] [Assigned] (SPARK-36644) Push down boolean column filter
[ https://issues.apache.org/jira/browse/SPARK-36644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36644: Assignee: (was: Apache Spark) > Push down boolean column filter > --- > > Key: SPARK-36644 > URL: https://issues.apache.org/jira/browse/SPARK-36644 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kazuyuki Tanimura >Priority: Major > > The following query does not push down the filter > ``` > SELECT * FROM t WHERE boolean_field > ``` > although the following query pushes down the filter as expected. > ``` > SELECT * FROM t WHERE boolean_field = true > ```
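The rewrite SPARK-36644 asks for can be modeled in miniature. The sketch below is a toy illustration, not Spark's Catalyst API: the Expr, ColumnRef, and EqualTo classes are invented stand-ins for Catalyst expression nodes. It shows the key idea, which is that a bare boolean column in predicate position (`WHERE boolean_field`) is semantically `boolean_field = true`, and rewriting it into that explicit comparison gives a filter-pushdown translator a shape it already understands.

```java
import java.util.Objects;

// Toy expression model (NOT Spark's Catalyst API), invented for illustration.
public class BooleanFilterNormalizer {
    interface Expr {}

    // A reference to a column by name, e.g. `boolean_field`.
    static final class ColumnRef implements Expr {
        final String name;
        ColumnRef(String name) { this.name = Objects.requireNonNull(name); }
    }

    // An equality comparison, e.g. `boolean_field = true`.
    static final class EqualTo implements Expr {
        final ColumnRef col;
        final Object value;
        EqualTo(ColumnRef col, Object value) { this.col = col; this.value = value; }
    }

    // A bare column reference used as a predicate must be boolean-typed, so it
    // is equivalent to `col = true` and safe to rewrite into a pushable filter.
    static Expr normalize(Expr predicate) {
        if (predicate instanceof ColumnRef) {
            return new EqualTo((ColumnRef) predicate, Boolean.TRUE);
        }
        return predicate; // other predicates pass through unchanged
    }

    public static void main(String[] args) {
        Expr rewritten = normalize(new ColumnRef("boolean_field"));
        System.out.println(rewritten instanceof EqualTo); // prints "true"
    }
}
```

In the real optimizer the same normalization would also need to handle `NOT boolean_field` (as `= false`) and boolean columns nested inside AND/OR trees; the toy keeps only the base case.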
[jira] [Updated] (SPARK-36630) Add the option to use physical statistics to avoid large tables being broadcast
[ https://issues.apache.org/jira/browse/SPARK-36630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaoyajun02 updated SPARK-36630: --- Parent: SPARK-33828 Issue Type: Sub-task (was: Improvement) > Add the option to use physical statistics to avoid large tables being > broadcast > --- > > Key: SPARK-36630 > URL: https://issues.apache.org/jira/browse/SPARK-36630 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: gaoyajun02 >Priority: Major > > Currently, when AQE's queryStage is not materialized, it uses the stats of the > logical plan to estimate whether the plan can be converted to BHJ. In some > scenarios the estimated value is several orders of magnitude smaller than the > actual broadcast data, which can lead to large tables being broadcast.
[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Shikov updated SPARK-36651: - Description: This is still reproducible in Spark 2.4.5. Original issue SPARK-25080 contains steps to reproduce. ``` Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) ... 67 more Caused by: java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) at
[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Shikov updated SPARK-36651: - Issue Type: Bug (was: Improvement) > NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) > -- > > Key: SPARK-36651 > URL: https://issues.apache.org/jira/browse/SPARK-36651 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.5 > Environment: AWS EMR >Reporter: Serge Shikov >Priority: Minor > > This is still reproducible in Spark 2.4.5. Original issue conains steps to > reproduce. > > ``` > Caused by: org.apache.spark.SparkException: Job aborted due to stage > failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost > task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, > executor 487): java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) > ... 67 more > Caused by: java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at >
[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Shikov updated SPARK-36651: - Priority: Major (was: Minor) > NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) > -- > > Key: SPARK-36651 > URL: https://issues.apache.org/jira/browse/SPARK-36651 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.4.5 > Environment: AWS EMR >Reporter: Serge Shikov >Priority: Major > > This is still reproducible in Spark 2.4.5. Original issue conains steps to > reproduce. > > ``` > Caused by: org.apache.spark.SparkException: Job aborted due to stage > failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost > task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, > executor 487): java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) > ... 67 more > Caused by: java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at >
[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Shikov updated SPARK-36651: - Description: This is still reproducible in Spark 2.4.5. Original issue contains steps to reproduce. ``` Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at
org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) ... 67 more Caused by: java.lang.NullPointerException at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) at
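The kind of fix SPARK-36651 calls for can be sketched with a simple null guard. This is an illustrative stand-in, not Spark's actual HiveShim code: the real toCatalystDecimal dereferences a HiveDecimal obtained from a Hive object inspector, which can come back null for malformed or absent decimal values; java.math.BigDecimal stands in for the Hive types here.

```java
import java.math.BigDecimal;

// Illustrative null guard (NOT Spark's actual HiveShim). The reported NPE
// occurs when the decimal handed back by Hive is null and the converter
// dereferences it unconditionally; checking first lets the value propagate
// as a SQL NULL instead of killing the task.
public class SafeDecimalConversion {
    // BigDecimal stands in for Hive's decimal type in this sketch.
    static BigDecimal toCatalystDecimal(BigDecimal hiveValue) {
        if (hiveValue == null) {
            return null; // malformed/absent decimal -> NULL, not NullPointerException
        }
        return hiveValue.stripTrailingZeros();
    }

    public static void main(String[] args) {
        System.out.println(toCatalystDecimal(null));                    // prints "null"
        System.out.println(toCatalystDecimal(new BigDecimal("1.250"))); // prints "1.25"
    }
}
```

Whether returning NULL or raising a clearer error is the right behavior is a design choice for the actual fix; the point is only that the null case must be handled before line HiveShim.scala:110 dereferences the value.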
[jira] [Updated] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
[ https://issues.apache.org/jira/browse/SPARK-36651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Shikov updated SPARK-36651: - Affects Version/s: (was: 2.3.1) 2.4.5 > NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) > -- > > Key: SPARK-36651 > URL: https://issues.apache.org/jira/browse/SPARK-36651 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 2.4.5 > Environment: AWS EMR >Reporter: Serge Shikov >Priority: Minor > > This is still reproducible in Spark 2.4.5. Original issue conains steps to > reproduce. > > ``` > Caused by: org.apache.spark.SparkException: Job aborted due to stage > failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost > task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, > executor 487): java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294) > at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at > 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194) > ... 67 more > Caused by: java.lang.NullPointerException > at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110) > at >
[jira] [Created] (SPARK-36651) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)
Serge Shikov created SPARK-36651: Summary: NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110) Key: SPARK-36651 URL: https://issues.apache.org/jira/browse/SPARK-36651 Project: Spark Issue Type: Improvement Components: Input/Output Affects Versions: 2.3.1 Environment: AWS EMR Reporter: Serge Shikov NPE while reading a Hive table:
```
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1190 in stage 392.0 failed 4 times, most recent failure: Lost task 1190.3 in stage 392.0 (TID 122055, ip-172-31-32-196.ec2.internal, executor 487): java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:217)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:294)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$2.apply(ShuffleExchangeExec.scala:265)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1753)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1741)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1740)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1740)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1974)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1923)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1912)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:194)
... 67 more
Caused by: java.lang.NullPointerException
at org.apache.spark.sql.hive.HiveShim$.toCatalystDecimal(HiveShim.scala:110)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:414)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$11.apply(TableReader.scala:413)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:442)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:433)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at
```
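An NPE at this point usually means a null decimal value came back from the Hive side of the conversion. As a rough illustration only (not the actual Spark fix, and not the real `HiveShim` signature, which works through Hive's ObjectInspector API), a null guard of this shape avoids the crash; the `java.math.BigDecimal` stand-in and the object name are hypothetical:

```scala
// Illustrative sketch: a null-tolerant decimal conversion. In real Spark,
// HiveShim.toCatalystDecimal takes a HiveDecimalObjectInspector and the raw
// Hive object; here a plain java.math.BigDecimal stands in for that value.
object DecimalShimSketch {
  def toCatalystDecimal(hiveValue: java.math.BigDecimal): Option[BigDecimal] =
    // Option(...) maps a null reference to None instead of letting a later
    // method call on the null value throw a NullPointerException.
    Option(hiveValue).map(BigDecimal(_))
}
```

With this shape, a null cell simply yields `None` and the caller can emit a SQL NULL rather than failing the task.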
[jira] [Commented] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception
[ https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408601#comment-17408601 ] Apache Spark commented on SPARK-36650: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33897 > ApplicationMaster shutdown hook should catch timeout exception > -- > > Key: SPARK-36650 > URL: https://issues.apache.org/jira/browse/SPARK-36650 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > ApplicationMaster shutdown hook should catch timeout exception -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
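The idea behind the issue title can be sketched in a few lines: a shutdown hook body should catch a cleanup timeout itself rather than let it propagate and disrupt the remaining shutdown steps. This is a hypothetical sketch of the pattern, not the actual change in the linked pull request; `unregisterFromRM` is a made-up stand-in for the ApplicationMaster's real cleanup work:

```scala
// Illustrative sketch of a timeout-tolerant shutdown hook body. The names
// here are invented; only the catch-and-continue pattern is the point.
object AmShutdownSketch {
  def runHook(unregisterFromRM: () => Unit): String =
    try {
      unregisterFromRM()
      "clean shutdown"
    } catch {
      case e: java.util.concurrent.TimeoutException =>
        // Record the failure and keep going instead of letting the
        // exception escape the hook.
        s"shutdown timed out: ${e.getMessage}"
    }
}
```

A hook written this way degrades to a logged warning when the resource manager is slow to respond, instead of aborting the shutdown sequence.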
[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception
[ https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36650: Assignee: (was: Apache Spark) > ApplicationMaster shutdown hook should catch timeout exception > -- > > Key: SPARK-36650 > URL: https://issues.apache.org/jira/browse/SPARK-36650 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > > ApplicationMaster shutdown hook should catch timeout exception
[jira] [Assigned] (SPARK-36650) ApplicationMaster shutdown hook should catch timeout exception
[ https://issues.apache.org/jira/browse/SPARK-36650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36650: Assignee: Apache Spark > ApplicationMaster shutdown hook should catch timeout exception > -- > > Key: SPARK-36650 > URL: https://issues.apache.org/jira/browse/SPARK-36650 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > ApplicationMaster shutdown hook should catch timeout exception
[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver
[ https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 胡振宇 updated SPARK-36584: Description: When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489], which puts the driver into the `executors` map with `UNKNOWN_RESOURCE_PROFILE_ID`. In my understanding, `ExecutorMonitor` should only monitor executors, not the driver. Although this does not cause any problems at the moment, because `UNKNOWN_RESOURCE_PROFILE_ID` is filtered out, I think it is a potential risk.

was: When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489], which puts the driver into the `executors` map with `UNKNOWN_RESOURCE_PROFILE_ID`. In my understanding, `ExecutorMonitor` should only monitor executors. Although this does not cause any problems at the moment, because `UNKNOWN_RESOURCE_PROFILE_ID` is filtered out, I think it is a potential risk.

> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 3.1.2
> Environment: Spark 3.1.2
> Reporter: 胡振宇
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489], which puts the driver into the `executors` map with `UNKNOWN_RESOURCE_PROFILE_ID`. In my understanding, `ExecutorMonitor` should only monitor executors, not the driver. Although this does not cause any problems at the moment, because `UNKNOWN_RESOURCE_PROFILE_ID` is filtered out, I think it is a potential risk.
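The behavior described above can be sketched as a tiny tracking map that simply skips block updates whose executor id is the driver's. This is illustrative only, not Spark's `ExecutorMonitor` code; the string `"driver"` mirrors `SparkContext.DRIVER_IDENTIFIER`, and all other names here are invented:

```scala
// Illustrative sketch: an executor-tracking map that ignores block-update
// events originating from the driver, so a "driver" entry never appears in
// `executors` with an unknown resource profile id.
object ExecutorMonitorSketch {
  private val DriverId = "driver" // mirrors SparkContext.DRIVER_IDENTIFIER
  private val executors = scala.collection.mutable.Map.empty[String, Int]

  def onBlockUpdated(executorId: String, resourceProfileId: Int): Unit =
    if (executorId != DriverId) {
      // Track only real executors; the driver's own block updates are skipped.
      executors.getOrElseUpdate(executorId, resourceProfileId)
    }

  def trackedExecutors: Set[String] = executors.keySet.toSet
}
```

With the guard in place, a broadcast on the driver leaves the tracked set untouched, which removes the "potential risk" the report describes without relying on downstream filtering of `UNKNOWN_RESOURCE_PROFILE_ID`.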
[jira] [Updated] (SPARK-36584) ExecutorMonitor#onBlockUpdated will receive event from driver
[ https://issues.apache.org/jira/browse/SPARK-36584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 胡振宇 updated SPARK-36584: Description: When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489], which puts the driver into the `executors` map with `UNKNOWN_RESOURCE_PROFILE_ID`. In my understanding, `ExecutorMonitor` should only monitor executors. Although this does not cause any problems at the moment, because `UNKNOWN_RESOURCE_PROFILE_ID` is filtered out, I think it is a potential risk.

was: When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489] to put the driver into the `executors` map, but in my understanding, `ExecutorMonitor` should only monitor executors, not the driver. Moreover, adding the driver to `executors` affects the calculation in [ExecutorAllocationManager#removeExecutors|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L552], since the driver occupies a slot in the executor count.

Issue Type: Question (was: Bug) Priority: Minor (was: Major)

> ExecutorMonitor#onBlockUpdated will receive event from driver
> -
>
> Key: SPARK-36584
> URL: https://issues.apache.org/jira/browse/SPARK-36584
> Project: Spark
> Issue Type: Question
> Components: Spark Core
> Affects Versions: 3.1.2
> Environment: Spark 3.1.2
> Reporter: 胡振宇
> Priority: Minor
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When the driver broadcasts an object, it sends the [SparkListenerBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala#L228] event. [ExecutorMonitor#onBlockUpdated|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L380] receives and handles the event; in this method it calls [ensureExecutorIsTracked|https://github.com/apache/spark/blob/df0ec56723f0b47c3629055fa7a8c63bb4285147/core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala#L489], which puts the driver into the `executors` map with `UNKNOWN_RESOURCE_PROFILE_ID`. In my understanding, `ExecutorMonitor` should only monitor executors. Although this does not cause any problems at the moment, because `UNKNOWN_RESOURCE_PROFILE_ID` is filtered out, I think it is a potential risk.