[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649617#comment-17649617 ] Apache Spark commented on SPARK-41427: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39135 > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41596) Document the new feature "Async Progress Tracking" to Structured Streaming guide doc
[ https://issues.apache.org/jira/browse/SPARK-41596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649616#comment-17649616 ] Jungtaek Lim commented on SPARK-41596: -- cc. [~jerrypeng] Could you please take this up and complete the efforts for this SPIP? Thanks in advance. > Document the new feature "Async Progress Tracking" to Structured Streaming > guide doc > > > Key: SPARK-41596 > URL: https://issues.apache.org/jira/browse/SPARK-41596 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Blocker > > Given that we merged the new SPIP feature SPARK-39591, we have to document > the new feature in the Structured Streaming guide doc so that end users can > refer to the doc and start experimenting with the feature.
[jira] [Created] (SPARK-41596) Document the new feature "Async Progress Tracking" to Structured Streaming guide doc
Jungtaek Lim created SPARK-41596: Summary: Document the new feature "Async Progress Tracking" to Structured Streaming guide doc Key: SPARK-41596 URL: https://issues.apache.org/jira/browse/SPARK-41596 Project: Spark Issue Type: Documentation Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Jungtaek Lim Given that we merged the new SPIP feature SPARK-39591, we have to document the new feature in the Structured Streaming guide doc so that end users can refer to the doc and start experimenting with the feature.
[jira] [Resolved] (SPARK-39591) SPIP: Asynchronous Offset Management in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-39591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-39591. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38517 [https://github.com/apache/spark/pull/38517] > SPIP: Asynchronous Offset Management in Structured Streaming > > > Key: SPARK-39591 > URL: https://issues.apache.org/jira/browse/SPARK-39591 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Boyang Jerry Peng >Assignee: Boyang Jerry Peng >Priority: Major > Labels: SPIP > Fix For: 3.4.0 > > > Currently in Structured Streaming, at the beginning of every micro-batch the > offset to process up to for the current batch is persisted to durable > storage. At the end of every micro-batch, a marker to indicate the > completion of this current micro-batch is persisted to durable storage. For > pipelines such as ones that read from Kafka and write to Kafka, where > end-to-end exactly-once is not supported and latency is sensitive, we can > allow users to configure offset commits to be written asynchronously; this > commit operation will then not contribute to the batch duration, effectively > lowering the overall latency of the pipeline. > > SPIP Doc: > > https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
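The mechanism described in the SPIP is surfaced to users as per-query sink options. The following is a minimal PySpark configuration sketch, not the authoritative API: the option names `asyncProgressTrackingEnabled` and `asyncProgressTrackingCheckpointIntervalMs`, and the broker/topic names, are assumptions to be verified against the Spark 3.4.0 Structured Streaming guide.

```python
# Configuration sketch: asynchronous progress tracking on a Kafka-to-Kafka
# query. Requires the spark-sql-kafka-0-10 package on the classpath; the
# broker address, topic names, and option names below are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("async-progress-sketch").getOrCreate()

source = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "input-topic")
    .load()
)

query = (
    source.writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("topic", "output-topic")
    .option("checkpointLocation", "/tmp/checkpoints/async-sketch")
    # Persist offsets and commit markers asynchronously so the write to
    # durable storage no longer contributes to each micro-batch's duration.
    .option("asyncProgressTrackingEnabled", "true")
    # How often progress is checkpointed; a larger interval lowers latency
    # further but can increase reprocessing on restart.
    .option("asyncProgressTrackingCheckpointIntervalMs", "1000")
    .start()
)
```

Because commits land after a batch completes, a restart may replay records processed since the last persisted offset; that is why the SPIP scopes this feature to pipelines that tolerate at-least-once output rather than requiring end-to-end exactly-once.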
[jira] [Assigned] (SPARK-39591) SPIP: Asynchronous Offset Management in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-39591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-39591: Assignee: Boyang Jerry Peng > SPIP: Asynchronous Offset Management in Structured Streaming > > > Key: SPARK-39591 > URL: https://issues.apache.org/jira/browse/SPARK-39591 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.0 >Reporter: Boyang Jerry Peng >Assignee: Boyang Jerry Peng >Priority: Major > Labels: SPIP > > Currently in Structured Streaming, at the beginning of every micro-batch the > offset to process up to for the current batch is persisted to durable > storage. At the end of every micro-batch, a marker to indicate the > completion of this current micro-batch is persisted to durable storage. For > pipelines such as ones that read from Kafka and write to Kafka, where > end-to-end exactly-once is not supported and latency is sensitive, we can > allow users to configure offset commits to be written asynchronously; this > commit operation will then not contribute to the batch duration, effectively > lowering the overall latency of the pipeline. > > SPIP Doc: > > https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
[jira] [Resolved] (SPARK-41425) Protobuf serializer for RDDStorageInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-41425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41425. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39104 [https://github.com/apache/spark/pull/39104] > Protobuf serializer for RDDStorageInfoWrapper > - > > Key: SPARK-41425 > URL: https://issues.apache.org/jira/browse/SPARK-41425 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-41425) Protobuf serializer for RDDStorageInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-41425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41425: -- Assignee: Sandeep Singh > Protobuf serializer for RDDStorageInfoWrapper > - > > Key: SPARK-41425 > URL: https://issues.apache.org/jira/browse/SPARK-41425 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major >
[jira] [Resolved] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41427. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39100 [https://github.com/apache/spark/pull/39100] > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > >
[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41427: -- Assignee: Gengliang Wang > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major >
[jira] [Resolved] (SPARK-41349) Implement `DataFrame.hint`
[ https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41349. --- Resolution: Fixed Issue resolved by pull request 38984 [https://github.com/apache/spark/pull/38984] > Implement `DataFrame.hint` > -- > > Key: SPARK-41349 > URL: https://issues.apache.org/jira/browse/SPARK-41349 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Deng Ziming >Priority: Major > Fix For: 3.4.0 > > > Implement DataFrame.hint with the proto message added in > https://issues.apache.org/jira/browse/SPARK-41345
[jira] [Commented] (SPARK-41423) Protobuf serializer for StageDataWrapper
[ https://issues.apache.org/jira/browse/SPARK-41423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649528#comment-17649528 ] BingKun Pan commented on SPARK-41423: - I will work on it. > Protobuf serializer for StageDataWrapper > > > Key: SPARK-41423 > URL: https://issues.apache.org/jira/browse/SPARK-41423 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major >
[jira] [Assigned] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause
[ https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41595: Assignee: (was: Apache Spark) > Support generator function explode/explode_outer in the FROM clause > --- > > Key: SPARK-41595 > URL: https://issues.apache.org/jira/browse/SPARK-41595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Priority: Major > > Currently, the table-valued generator function explode/explode_outer can only > be used in the SELECT clause of a query: > SELECT explode(array(1, 2)) > This task is to allow table-valued functions to be used in the FROM clause of > a query: > SELECT * FROM explode(array(1, 2))
[jira] [Commented] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause
[ https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649527#comment-17649527 ] Apache Spark commented on SPARK-41595: -- User 'allisonwang-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39133 > Support generator function explode/explode_outer in the FROM clause > --- > > Key: SPARK-41595 > URL: https://issues.apache.org/jira/browse/SPARK-41595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Priority: Major > > Currently, the table-valued generator function explode/explode_outer can only > be used in the SELECT clause of a query: > SELECT explode(array(1, 2)) > This task is to allow table-valued functions to be used in the FROM clause of > a query: > SELECT * FROM explode(array(1, 2))
[jira] [Assigned] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause
[ https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41595: Assignee: Apache Spark > Support generator function explode/explode_outer in the FROM clause > --- > > Key: SPARK-41595 > URL: https://issues.apache.org/jira/browse/SPARK-41595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > Currently, the table-valued generator function explode/explode_outer can only > be used in the SELECT clause of a query: > SELECT explode(array(1, 2)) > This task is to allow table-valued functions to be used in the FROM clause of > a query: > SELECT * FROM explode(array(1, 2))
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649526#comment-17649526 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- [~xkrogen] I created a new copy, please let me know if you still can't see it. Thank you for your patience! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context.) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Created] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause
Allison Wang created SPARK-41595: Summary: Support generator function explode/explode_outer in the FROM clause Key: SPARK-41595 URL: https://issues.apache.org/jira/browse/SPARK-41595 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Allison Wang Currently, the table-valued generator function explode/explode_outer can only be used in the SELECT clause of a query: SELECT explode(array(1, 2)) This task is to allow table-valued functions to be used in the FROM clause of a query: SELECT * FROM explode(array(1, 2))
[jira] [Created] (SPARK-41594) Support table-valued generator functions in the FROM clause
Allison Wang created SPARK-41594: Summary: Support table-valued generator functions in the FROM clause Key: SPARK-41594 URL: https://issues.apache.org/jira/browse/SPARK-41594 Project: Spark Issue Type: Umbrella Components: SQL Affects Versions: 3.4.0 Reporter: Allison Wang Umbrella Jira for supporting table-valued generator functions in the FROM clause of a query.
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649516#comment-17649516 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- Sorry, I need to update it with a new copy. I will add a new comment on this ticket when the new document is available. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649515#comment-17649515 ] Erik Krogen commented on SPARK-41589: - Nope :( > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41589: - Component/s: PySpark > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Assigned] (SPARK-41535) InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data
[ https://issues.apache.org/jira/browse/SPARK-41535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41535: Assignee: Bruce Robbins > InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt > unsafe buffer when used with calendar interval data > > > Key: SPARK-41535 > URL: https://issues.apache.org/jira/browse/SPARK-41535 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.2.3, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > > This returns the wrong answer: > {noformat} > set spark.sql.codegen.wholeStage=false; > set spark.sql.codegen.factoryMode=NO_CODEGEN; > select first(col1), last(col2) from values > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0)) > as data(col1, col2); > +---+---+ > |first(col1)|last(col2) | > +---+---+ > |16 years 2 days|16 years 2 days| > +---+---+ > {noformat} > In the above case, {{TungstenAggregationIterator}} uses > {{InterpretedUnsafeProjection}} to create the aggregation buffer and then > initializes all the fields to null. {{InterpretedUnsafeProjection}} > incorrectly calls {{UnsafeRowWriter#setNullAt}}, rather than > {{UnsafeRowWriter#write}}, for the two calendar interval fields. As a result, > the writer never allocates memory from the variable length region for the two > intervals, and the pointers in the fixed region get left as zero. Later, when > {{InterpretedMutableProjection}} attempts to update the first field, > {{UnsafeRow#setInterval}} picks up the zero pointer and stores interval data > on top of the null-tracking bit set. The call to {{UnsafeRow#setInterval}} for > the second field also stomps the null-tracking bit set. Later updates to the > null-tracking bit set (e.g., calls to {{setNotNullAt}}) further corrupt the > interval data, turning {{interval 7 years 2 days}} into {{interval 16 years 2 > days}}.
> Even if you fix the above bug in {{InterpretedUnsafeProjection}} so that the > buffer is created correctly, {{InterpretedMutableProjection}} has a bug > similar to SPARK-41395, except this time for calendar interval data: > {noformat} > set spark.sql.codegen.wholeStage=false; > set spark.sql.codegen.factoryMode=NO_CODEGEN; > select first(col1), last(col2), max(col3) from values > (null, null, 1), > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3) > as data(col1, col2, col3); > +---+---+-+ > |first(col1)|last(col2) |max(col3)| > +---+---+-+ > |16 years 2 days|16 years 2 days|3| > +---+---+-+ > {noformat} > These two bugs could get exercised during codegen fallback. Take for example > this case where I forced codegen to fail for the {{Greatest}} expression: > {noformat} > spark-sql> select first(col1), last(col2), max(col3) from values > (null, null, 1), > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3) > as data(col1, col2, col3); > 22/12/15 13:06:23 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 70, Column 1: ';' expected instead of 'if' > ... > 22/12/15 13:06:24 WARN MutableProjection: Expr codegen error and falling back > to interpreter mode > java.util.concurrent.ExecutionException: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 78, Column 1: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 78, Column 1: ';' expected instead of 'boolean' > ... > 16 years 2 days 16 years 2 days 3 > Time taken: 5.852 seconds, Fetched 1 row(s) > spark-sql> > {noformat}
[jira] [Resolved] (SPARK-41535) InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data
[ https://issues.apache.org/jira/browse/SPARK-41535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41535. -- Fix Version/s: 3.3.2 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 39117 [https://github.com/apache/spark/pull/39117] > InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt > unsafe buffer when used with calendar interval data > > > Key: SPARK-41535 > URL: https://issues.apache.org/jira/browse/SPARK-41535 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.2.3, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Fix For: 3.3.2, 3.2.3, 3.4.0 > > > This returns the wrong answer: > {noformat} > set spark.sql.codegen.wholeStage=false; > set spark.sql.codegen.factoryMode=NO_CODEGEN; > select first(col1), last(col2) from values > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0)) > as data(col1, col2); > +---+---+ > |first(col1)|last(col2) | > +---+---+ > |16 years 2 days|16 years 2 days| > +---+---+ > {noformat} > In the above case, {{TungstenAggregationIterator}} uses > {{InterpretedUnsafeProjection}} to create the aggregation buffer and then > initializes all the fields to null. {{InterpretedUnsafeProjection}} > incorrectly calls {{UnsafeRowWriter#setNullAt}}, rather than > {{UnsafeRowWriter#write}}, for the two calendar interval fields. As a result, > the writer never allocates memory from the variable length region for the two > intervals, and the pointers in the fixed region get left as zero. Later, when > {{InterpretedMutableProjection}} attempts to update the first field, > {{UnsafeRow#setInterval}} picks up the zero pointer and stores interval data > on top of the null-tracking bit set. The call to {{UnsafeRow#setInterval}} for > the second field also stomps the null-tracking bit set. 
Later updates to the > null-tracking bit set (e.g., calls to {{setNotNullAt}}) further corrupt the > interval data, turning {{interval 7 years 2 days}} into {{interval 16 years 2 > days}}. > Even if you fix the above bug in {{InterpretedUnsafeProjection}} so that the > buffer is created correctly, {{InterpretedMutableProjection}} has a bug > similar to SPARK-41395, except this time for calendar interval data: > {noformat} > set spark.sql.codegen.wholeStage=false; > set spark.sql.codegen.factoryMode=NO_CODEGEN; > select first(col1), last(col2), max(col3) from values > (null, null, 1), > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3) > as data(col1, col2, col3); > +---+---+-+ > |first(col1)|last(col2) |max(col3)| > +---+---+-+ > |16 years 2 days|16 years 2 days|3| > +---+---+-+ > {noformat} > These two bugs could get exercised during codegen fallback. Take for example > this case where I forced codegen to fail for the {{Greatest}} expression: > {noformat} > spark-sql> select first(col1), last(col2), max(col3) from values > (null, null, 1), > (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3) > as data(col1, col2, col3); > 22/12/15 13:06:23 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 70, Column 1: ';' expected instead of 'if' > ... > 22/12/15 13:06:24 WARN MutableProjection: Expr codegen error and falling back > to interpreter mode > java.util.concurrent.ExecutionException: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 78, Column 1: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 78, Column 1: ';' expected instead of 'boolean' > ... 
> 16 years 2 days 16 years 2 days 3 > Time taken: 5.852 seconds, Fetched 1 row(s) > spark-sql> > {noformat}
[jira] [Comment Edited] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649511#comment-17649511 ] Rithwik Ediga Lakhamsani edited comment on SPARK-41589 at 12/20/22 12:27 AM: - Oh sorry, let me fix that! Does it work now [~xkrogen]? was (Author: JIRAUSER298573): Oh sorry, let me fix that! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context.
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. This was a project determined by the Databricks ML Training Team; please reach out to [~gurwls223] (Spark-side proxy) or [~erithwik] for more context. (was: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. ) > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. This was a project determined by the Databricks ML > Training Team; please reach out to [~gurwls223] (Spark-side proxy) or > [~erithwik] for more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649511#comment-17649511 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- Oh sorry, let me fix that! > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649510#comment-17649510 ] Erik Krogen commented on SPARK-41589: - [~erithwik] can you make the linked documents world-viewable? I get access denied. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41592) Implement functionality for training a PyTorch file on the executors
Rithwik Ediga Lakhamsani created SPARK-41592: Summary: Implement functionality for training a PyTorch file on the executors Key: SPARK-41592 URL: https://issues.apache.org/jira/browse/SPARK-41592 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Created] (SPARK-41593) Implement logging from the executor nodes
Rithwik Ediga Lakhamsani created SPARK-41593: Summary: Implement logging from the executor nodes Key: SPARK-41593 URL: https://issues.apache.org/jira/browse/SPARK-41593 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Created] (SPARK-41591) Implement functionality for training a PyTorch file locally
Rithwik Ediga Lakhamsani created SPARK-41591: Summary: Implement functionality for training a PyTorch file locally Key: SPARK-41591 URL: https://issues.apache.org/jira/browse/SPARK-41591 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Commented] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649509#comment-17649509 ] Rithwik Ediga Lakhamsani commented on SPARK-41589: -- I am working on this. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41589) PyTorch Distributor
[ https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41589: - Description: This is a project to make it easier for PySpark users to distribute PyTorch code using PySpark. The corresponding [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] can give more context. > PyTorch Distributor > --- > > Key: SPARK-41589 > URL: https://issues.apache.org/jira/browse/SPARK-41589 > Project: Spark > Issue Type: Umbrella > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This is a project to make it easier for PySpark users to distribute PyTorch > code using PySpark. The corresponding [Design > Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] > and > [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit] > can give more context. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41590) Implement Baseline API Code
Rithwik Ediga Lakhamsani created SPARK-41590: Summary: Implement Baseline API Code Key: SPARK-41590 URL: https://issues.apache.org/jira/browse/SPARK-41590 Project: Spark Issue Type: Sub-task Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani Creating a baseline API so that we can agree on how the users will interact with the code. This was determined in this [Design Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit] and can be updated as necessary.
[jira] [Created] (SPARK-41589) PyTorch Distributor
Rithwik Ediga Lakhamsani created SPARK-41589: Summary: PyTorch Distributor Key: SPARK-41589 URL: https://issues.apache.org/jira/browse/SPARK-41589 Project: Spark Issue Type: Umbrella Components: ML Affects Versions: 3.4.0 Reporter: Rithwik Ediga Lakhamsani
[jira] [Resolved] (SPARK-41583) Add Spark Connect and protobuf into setup.py with specifying dependencies
[ https://issues.apache.org/jira/browse/SPARK-41583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41583. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39123 [https://github.com/apache/spark/pull/39123] > Add Spark Connect and protobuf into setup.py with specifying dependencies > - > > Key: SPARK-41583 > URL: https://issues.apache.org/jira/browse/SPARK-41583 > Project: Spark > Issue Type: Sub-task > Components: Connect, Protobuf >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > > We should document this, and put both pyspark.sql.connect and > pyspark.sql.protobuf into the PyPi package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41583) Add Spark Connect and protobuf into setup.py with specifying dependencies
[ https://issues.apache.org/jira/browse/SPARK-41583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41583: Assignee: Hyukjin Kwon > Add Spark Connect and protobuf into setup.py with specifying dependencies > - > > Key: SPARK-41583 > URL: https://issues.apache.org/jira/browse/SPARK-41583 > Project: Spark > Issue Type: Sub-task > Components: Connect, Protobuf >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > We should document this, and put both pyspark.sql.connect and > pyspark.sql.protobuf into the PyPi package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41588) Make "Rule id not found" error message more actionable
[ https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41588. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39120 [https://github.com/apache/spark/pull/39120] > Make "Rule id not found" error message more actionable > -- > > Key: SPARK-41588 > URL: https://issues.apache.org/jira/browse/SPARK-41588 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > Fix For: 3.4.0 > > > It was super confusing to me when adding a new rule that I bumped into the > rule id error. We should update the error message to make it more actionable, > i.e. explaining to the developers which file to modify. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41588) Make "Rule id not found" error message more actionable
[ https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41588: Assignee: Reynold Xin (was: Apache Spark) > Make "Rule id not found" error message more actionable > -- > > Key: SPARK-41588 > URL: https://issues.apache.org/jira/browse/SPARK-41588 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > It was super confusing to me when adding a new rule that I bumped into the > rule id error. We should update the error message to make it more actionable, > i.e. explaining to the developers which file to modify. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41588) Make "Rule id not found" error message more actionable
[ https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649501#comment-17649501 ] Apache Spark commented on SPARK-41588: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/39120 > Make "Rule id not found" error message more actionable > -- > > Key: SPARK-41588 > URL: https://issues.apache.org/jira/browse/SPARK-41588 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Major > > It was super confusing to me when adding a new rule that I bumped into the > rule id error. We should update the error message to make it more actionable, > i.e. explaining to the developers which file to modify. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41588) Make "Rule id not found" error message more actionable
[ https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41588: Assignee: Apache Spark (was: Reynold Xin) > Make "Rule id not found" error message more actionable > -- > > Key: SPARK-41588 > URL: https://issues.apache.org/jira/browse/SPARK-41588 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Reynold Xin >Assignee: Apache Spark >Priority: Major > > It was super confusing to me when adding a new rule that I bumped into the > rule id error. We should update the error message to make it more actionable, > i.e. explaining to the developers which file to modify. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41588) Make "Rule id not found" error message more actionable
Reynold Xin created SPARK-41588: --- Summary: Make "Rule id not found" error message more actionable Key: SPARK-41588 URL: https://issues.apache.org/jira/browse/SPARK-41588 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Reynold Xin Assignee: Reynold Xin It was super confusing to me when adding a new rule that I bumped into the rule id error. We should update the error message to make it more actionable, i.e. explaining to the developers which file to modify.
[jira] [Resolved] (SPARK-41420) Protobuf serializer for ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-41420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-41420. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39093 [https://github.com/apache/spark/pull/39093] > Protobuf serializer for ApplicationInfoWrapper > -- > > Key: SPARK-41420 > URL: https://issues.apache.org/jira/browse/SPARK-41420 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41420) Protobuf serializer for ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-41420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-41420: -- Assignee: Sandeep Singh > Protobuf serializer for ApplicationInfoWrapper > -- > > Key: SPARK-41420 > URL: https://issues.apache.org/jira/browse/SPARK-41420 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Sandeep Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649438#comment-17649438 ] Apache Spark commented on SPARK-41427: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39100 > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649439#comment-17649439 ] Gengliang Wang commented on SPARK-41422: [~techaddict] I was commenting on the wrong jira. Feel free to submit the PR. > Protobuf serializer for ExecutorSummaryWrapper > -- > > Key: SPARK-41422 > URL: https://issues.apache.org/jira/browse/SPARK-41422 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649440#comment-17649440 ] Apache Spark commented on SPARK-41427: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39100 > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41427: Assignee: (was: Apache Spark) > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41427: Assignee: Apache Spark > Protobuf serializer for ExecutorStageSummaryWrapper > --- > > Key: SPARK-41427 > URL: https://issues.apache.org/jira/browse/SPARK-41427 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41422 ] Gengliang Wang deleted comment on SPARK-41422: was (Author: apachespark): User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39100 > Protobuf serializer for ExecutorSummaryWrapper > -- > > Key: SPARK-41422 > URL: https://issues.apache.org/jira/browse/SPARK-41422 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-41422 ] Gengliang Wang deleted comment on SPARK-41422: was (Author: gengliang.wang): [~techaddict] I have a PR for this one already. Sorry I didn't claim it. I will claim next time. The ExecutorMetrics is a bit tricky, so I am doing it by myself. > Protobuf serializer for ExecutorSummaryWrapper > -- > > Key: SPARK-41422 > URL: https://issues.apache.org/jira/browse/SPARK-41422 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates
[ https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649418#comment-17649418 ] Apache Spark commented on SPARK-41162: -- User 'EnricoMi' has created a pull request for this issue: https://github.com/apache/spark/pull/39131 > Anti-join must not be pushed below aggregation with ambiguous predicates > > > Key: SPARK-41162 > URL: https://issues.apache.org/jira/browse/SPARK-41162 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.3, 3.3.1, 3.2.3, 3.4.0 >Reporter: Enrico Minack >Priority: Major > Labels: correctness > > The following query should return a single row as all values for {{id}} > except for the largest will be eliminated by the anti-join: > {code} > val ids = Seq(1, 2, 3).toDF("id").distinct() > val result = ids.withColumn("id", $"id" + 1).join(ids, "id", > "left_anti").collect() > assert(result.length == 1) > {code} > Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the > assertion should still hold but is false. > Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left > {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never > be true. > {code} > === Applying Rule > org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin === > !Join LeftAnti, (id#752 = id#750) 'Aggregate [id#750], > [(id#750 + 1) AS id#752] > !:- Aggregate [id#750], [(id#750 + 1) AS id#752] +- 'Join LeftAnti, > ((id#750 + 1) = id#750) > !: +- LocalRelation [id#750] :- LocalRelation > [id#750] > !+- Aggregate [id#750], [id#750] +- Aggregate [id#750], > [id#750] > ! +- LocalRelation [id#750]+- LocalRelation > [id#750] > {code} > The optimizer then rightly removes the left-anti join altogether, returning > the left child only. > Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that > reference left *and* right child. 
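The correctness issue in SPARK-41162 above can be illustrated with plain Python set semantics. This is a sketch of the two plans, not Spark code:

```python
ids = {1, 2, 3}  # ids.distinct()

# Intended plan: the alias id -> id + 1 is computed above the anti-join,
# so the join eliminates every shifted id that still exists in `ids`.
shifted = {i + 1 for i in ids}
correct = shifted - ids
print(sorted(correct))   # [4]: one row, as the assertion expects

# Unsound pushdown: substituting the alias into the join condition gives
# (id + 1) = id, which is never true, so the anti-join filters nothing
# and every shifted row survives.
def condition_holds(i):
    return i + 1 == i    # unsatisfiable predicate after substitution

wrong = {i + 1 for i in ids if not condition_holds(i)}
print(sorted(wrong))     # [2, 3, 4]: three rows instead of one
```

This matches the plan diff quoted above: once the condition becomes `(id#750 + 1) = id#750`, the optimizer correctly recognizes it as unsatisfiable and drops the join, so the wrong rewrite, not the join removal, is the root cause.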
[jira] [Commented] (SPARK-41277) Save and leverage shuffle key in tblproperties
[ https://issues.apache.org/jira/browse/SPARK-41277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649300#comment-17649300 ] Ohad Raviv commented on SPARK-41277: [~gurwls223] - can I please get your opinion here? > Save and leverage shuffle key in tblproperties > -- > > Key: SPARK-41277 > URL: https://issues.apache.org/jira/browse/SPARK-41277 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.1 >Reporter: Ohad Raviv >Priority: Minor > > I'm not sure if I'm not missing anything trivial. > In a typical process, many datasets get materialized and many of them after a > shuffle (e.g join). then they would again be involved in further actions and > often use the same key. > Wouldn't it make sense to save the shuffle key along with the table to avoid > unnecessary shuffles? > Also, the implementation seems quite straightforward - to just leverage the > bucketing mechanism. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41441) Allow Generate with no required child output to host outer references
[ https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-41441. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38968 [https://github.com/apache/spark/pull/38968] > Allow Generate with no required child output to host outer references > - > > Key: SPARK-41441 > URL: https://issues.apache.org/jira/browse/SPARK-41441 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.4.0 > > > Currently, in CheckAnalysis, Spark disallows Generate to host any outer > references when its required child output is not empty. But when the child > output is empty, it can host outer references, which DecorrelateInnerQuery > does not handle. > For example, > {code:java} > select * from t, lateral (select explode(array(c1, c2))){code} > This throws an internal error: > {code:java} > Caused by: java.lang.AssertionError: assertion failed: Correlated column is > not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, > [col#221] +- OneRowRelation{code} > We should support Generate to host outer references when its required child > output is empty.
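For context on what the correlated query above should compute once supported: each outer row feeds its columns into the generator, producing one joined row per array element. A plain-Python illustration (the table `t` and its rows are made up for this sketch, not taken from the report):

```python
t = [(1, 2), (10, 20)]  # rows (c1, c2) of a hypothetical table t

# `select * from t, lateral (select explode(array(c1, c2)))`:
# each outer row is joined with one generated row per array element,
# so the generator references the outer columns c1 and c2.
result = [(c1, c2, col) for (c1, c2) in t for col in (c1, c2)]
print(result)  # [(1, 2, 1), (1, 2, 2), (10, 20, 10), (10, 20, 20)]
```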
[jira] [Assigned] (SPARK-41441) Allow Generate with no required child output to host outer references
[ https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-41441: --- Assignee: Allison Wang > Allow Generate with no required child output to host outer references > - > > Key: SPARK-41441 > URL: https://issues.apache.org/jira/browse/SPARK-41441 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > Currently, in CheckAnalysis, Spark disallows Generate to host any outer > references when its required child output is not empty. But when the child > output is empty, it can host outer references, which DecorrelateInnerQuery > does not handle. > For example, > {code:java} > select * from t, lateral (select explode(array(c1, c2))){code} > This throws an internal error: > {code:java} > Caused by: java.lang.AssertionError: assertion failed: Correlated column is > not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, > [col#221] +- OneRowRelation{code} > We should support Generate to host outer references when its required child > output is empty.
[jira] [Commented] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
[ https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649287#comment-17649287 ] Apache Spark commented on SPARK-41587: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39129 > Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 > > > Key: SPARK-41587 > URL: https://issues.apache.org/jira/browse/SPARK-41587 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
[ https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41587: Assignee: Apache Spark > Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 > > > Key: SPARK-41587 > URL: https://issues.apache.org/jira/browse/SPARK-41587 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
[ https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41587: Assignee: (was: Apache Spark) > Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 > > > Key: SPARK-41587 > URL: https://issues.apache.org/jira/browse/SPARK-41587 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41586: Assignee: (was: Apache Spark) > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Introduce new package `pyspark.errors` for improving PySpark error message.
[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41586: Assignee: Apache Spark > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > Introduce new package `pyspark.errors` for improving PySpark error message.
[jira] [Commented] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649286#comment-17649286 ] Apache Spark commented on SPARK-41586: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/39128 > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Introduce new package `pyspark.errors` for improving PySpark error message.
[jira] [Created] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
Yang Jie created SPARK-41587: Summary: Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7 Key: SPARK-41587 URL: https://issues.apache.org/jira/browse/SPARK-41587 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7
[jira] [Commented] (SPARK-41586) Introduce new PySpark package: pyspark.errors
[ https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649261#comment-17649261 ] Haejoon Lee commented on SPARK-41586: - I'm working on it > Introduce new PySpark package: pyspark.errors > - > > Key: SPARK-41586 > URL: https://issues.apache.org/jira/browse/SPARK-41586 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > > Introduce new package `pyspark.errors` for improving PySpark error message.
[jira] [Created] (SPARK-41586) Introduce new PySpark package: pyspark.errors
Haejoon Lee created SPARK-41586: --- Summary: Introduce new PySpark package: pyspark.errors Key: SPARK-41586 URL: https://issues.apache.org/jira/browse/SPARK-41586 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.4.0 Reporter: Haejoon Lee Introduce new package `pyspark.errors` for improving PySpark error message.
[jira] [Commented] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649235#comment-17649235 ] Apache Spark commented on SPARK-41585: -- User 'LucaCanali' has created a pull request for this issue: https://github.com/apache/spark/pull/39127 > The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Priority: Minor > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Commented] (SPARK-21829) Enable config to permanently blacklist a list of nodes
[ https://issues.apache.org/jira/browse/SPARK-21829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649234#comment-17649234 ] Luca Canali commented on SPARK-21829: - Note: similar functionality was later implemented in https://issues.apache.org/jira/browse/SPARK-26688 > Enable config to permanently blacklist a list of nodes > -- > > Key: SPARK-21829 > URL: https://issues.apache.org/jira/browse/SPARK-21829 > Project: Spark > Issue Type: New Feature > Components: Scheduler, Spark Core >Affects Versions: 2.1.1, 2.2.0 >Reporter: Luca Canali >Priority: Minor > > The idea for this proposal comes from a performance incident in a local > cluster where a job was found very slow because of a long tail of stragglers > due to 2 nodes in the cluster being slow to access a remote filesystem. > The issue was limited to the 2 machines and was related to external > configurations: the 2 machines that performed badly when accessing the remote > file system were behaving normally for other jobs in the cluster (a shared > YARN cluster). > With this new feature I propose to introduce a mechanism to allow users to > specify a list of nodes in the cluster where executors/tasks should not run > for a specific job. > The proposed implementation that I tested (see PR) uses the Spark blacklist > mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list > of user-specified nodes is added to the blacklist at the start of the Spark > Context and it is never expired. > I have tested this on a YARN cluster on a case taken from the original > production problem and I confirm a performance improvement of about 5x for > the specific test case I have. I imagine that there can be other cases where > Spark users may want to blacklist a set of nodes. 
This can be used for > troubleshooting, including cases where certain nodes/executors are slow for a > given workload and this is caused by external agents, so the anomaly is not > picked up by the cluster manager.
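The mechanism proposed above can be sketched as a configuration fragment. Note the assumptions: `spark.blacklist.alwaysBlacklistedNodes` is the parameter name proposed in this issue's PR (it was not merged into mainline Spark; the later SPARK-26688 work exposed `spark.yarn.exclude.nodes` instead), and the node names are hypothetical:

```
# spark-defaults.conf sketch (proposed, unmerged parameter from the SPARK-21829 PR)
# Comma-separated node names to blacklist at SparkContext start, never expired
spark.blacklist.alwaysBlacklistedNodes  node07.example.com,node12.example.com
```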
[jira] [Assigned] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41585: Assignee: Apache Spark > The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Assignee: Apache Spark >Priority: Minor > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Commented] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649233#comment-17649233 ] Apache Spark commented on SPARK-41585: -- User 'LucaCanali' has created a pull request for this issue: https://github.com/apache/spark/pull/39127 > The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Priority: Minor > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Assigned] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41585: Assignee: (was: Apache Spark) > The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Priority: Minor > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Updated] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-41585: Description: The Spark exclude node functionality for Spark on YARN, introduced in SPARK-26688, allows users to specify a list of node names that are excluded from resource allocation. This is done using the configuration parameter: {{spark.yarn.exclude.nodes}} The feature currently works only for executors allocated via dynamic allocation. To use the feature on Spark 3.3.1, for example, one may need also to configure spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, therefore relying on executor resource allocation only via dynamic allocation. This proposes to extend the use of Spark exclude node functionality for YARN beyond dynamic allocation, which I believe makes it more consistent also with what the documentation reports for this feature/configuration parameter. was: The Spark exclude node functionality for YARN, introduced in SPARK-26688, allows users to specify a list of node names that are excluded from resource allocation. This is done using the configuration parameter: {{spark.yarn.exclude.nodes}} The feature currently works only for executors allocated via dynamic allocation. To use the feature on Spark 3.3.1, for example, one may need also to configure spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, therefore relying on executor resource allocation only via dynamic allocation. This proposes to extend the use of Spark exclude node functionality for YARN beyond dynamic allocation, which I believe makes it more consistent also with what the documentation reports for this feature/configuration parameter. 
> The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Priority: Minor > > The Spark exclude node functionality for Spark on YARN, introduced in > SPARK-26688, allows users to specify a list of node names that are excluded > from resource allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Updated] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-41585: Description: The Spark exclude node functionality for YARN, introduced in SPARK-26688, allows users to specify a list of node names that are excluded from resource allocation. This is done using the configuration parameter: {{spark.yarn.exclude.nodes}} The feature currently works only for executors allocated via dynamic allocation. To use the feature on Spark 3.3.1, for example, one may need also to configure spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, therefore relying on executor resource allocation only via dynamic allocation. This proposes to extend the use of Spark exclude node functionality for YARN beyond dynamic allocation, which I believe makes it more consistent also with what the documentation reports for this feature/configuration parameter. was: The Spark exclude node functionality for YARN, introduced in SPARK-26688, allows users to specify a list of node names that are excluded from resource allocation. This is done using the configuration parameter: {{spark.yarn.exclude.nodes}} The feature currently works only for executors allocated via dynamic allocation. To use the feature on Spark 3.3.1, for eaxmple, one needs to configure spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, therefore relying on executor resource allocation only via dynamic allocation. This proposes to extend the use of Spark exclude node functionality for YARN beyond dynamic allocation, which I believe makes it more consistent also with what the documentation reports for this feature/configuration parameter. 
> The Spark exclude node functionality for YARN should work independently of > dynamic allocation > - > > Key: SPARK-41585 > URL: https://issues.apache.org/jira/browse/SPARK-41585 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.3.1 >Reporter: Luca Canali >Priority: Minor > > The Spark exclude node functionality for YARN, introduced in SPARK-26688, > allows users to specify a list of node names that are excluded from resource > allocation. This is done using the configuration parameter: > {{spark.yarn.exclude.nodes}} > The feature currently works only for executors allocated via dynamic > allocation. To use the feature on Spark 3.3.1, for example, one may need also > to configure spark.dynamicAllocation.minExecutors=0 and > spark.executor.instances=0, therefore relying on executor resource allocation > only via dynamic allocation. > This proposes to extend the use of Spark exclude node functionality for YARN > beyond dynamic allocation, which I believe makes it more consistent also with > what the documentation reports for this feature/configuration parameter.
[jira] [Created] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation
Luca Canali created SPARK-41585: --- Summary: The Spark exclude node functionality for YARN should work independently of dynamic allocation Key: SPARK-41585 URL: https://issues.apache.org/jira/browse/SPARK-41585 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 3.3.1 Reporter: Luca Canali The Spark exclude node functionality for YARN, introduced in SPARK-26688, allows users to specify a list of node names that are excluded from resource allocation. This is done using the configuration parameter: {{spark.yarn.exclude.nodes}} The feature currently works only for executors allocated via dynamic allocation. To use the feature on Spark 3.3.1, for example, one needs to configure spark.dynamicAllocation.minExecutors=0 and spark.executor.instances=0, therefore relying on executor resource allocation only via dynamic allocation. This proposes to extend the use of Spark exclude node functionality for YARN beyond dynamic allocation, which I believe makes it more consistent also with what the documentation reports for this feature/configuration parameter.
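The workaround described in the issue can be sketched as a configuration fragment. This is only an illustration of the parameters named above (node names are hypothetical, and whether the dynamic-allocation settings are required depends on the deployment); it shows how, before the proposed fix, a Spark 3.3.1 user might force all executors through dynamic allocation so that the exclusion list takes effect:

```
# spark-defaults.conf sketch for Spark on YARN 3.3.1 (before the proposed fix)
# Nodes excluded from executor resource allocation (hypothetical names)
spark.yarn.exclude.nodes              badnode1.example.com,badnode2.example.com
# Exclusion currently applies only to dynamically allocated executors,
# so route all executor allocation through dynamic allocation:
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.minExecutors  0
spark.executor.instances              0
```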
[jira] [Commented] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649201#comment-17649201 ] Apache Spark commented on SPARK-41584: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39125 > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36
[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41584: Assignee: (was: Apache Spark) > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36
[jira] [Commented] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649200#comment-17649200 ] Apache Spark commented on SPARK-41584: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39125 > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36
[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
[ https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41584: Assignee: Apache Spark > Upgrade RoaringBitmap to 0.9.36 > --- > > Key: SPARK-41584 > URL: https://issues.apache.org/jira/browse/SPARK-41584 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36
[jira] [Created] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36
Yang Jie created SPARK-41584: Summary: Upgrade RoaringBitmap to 0.9.36 Key: SPARK-41584 URL: https://issues.apache.org/jira/browse/SPARK-41584 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: Yang Jie https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36