[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649617#comment-17649617
 ] 

Apache Spark commented on SPARK-41427:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39135

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41596) Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2022-12-19 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649616#comment-17649616
 ] 

Jungtaek Lim commented on SPARK-41596:
--

cc. [~jerrypeng] Could you please take this up and complete the efforts for 
this SPIP? Thanks in advance.

> Document the new feature "Async Progress Tracking" to Structured Streaming 
> guide doc
> 
>
> Key: SPARK-41596
> URL: https://issues.apache.org/jira/browse/SPARK-41596
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Blocker
>
> Given that we merged the new SPIP feature SPARK-39591, we have to document 
> the new feature in the Structured Streaming guide doc so that end users can 
> refer to the doc and start experimenting with the feature.






[jira] [Created] (SPARK-41596) Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2022-12-19 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-41596:


 Summary: Document the new feature "Async Progress Tracking" to 
Structured Streaming guide doc
 Key: SPARK-41596
 URL: https://issues.apache.org/jira/browse/SPARK-41596
 Project: Spark
  Issue Type: Documentation
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


Given that we merged the new SPIP feature SPARK-39591, we have to document the 
new feature in the Structured Streaming guide doc so that end users can refer 
to the doc and start experimenting with the feature.






[jira] [Resolved] (SPARK-39591) SPIP: Asynchronous Offset Management in Structured Streaming

2022-12-19 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-39591.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38517
[https://github.com/apache/spark/pull/38517]

> SPIP: Asynchronous Offset Management in Structured Streaming
> 
>
> Key: SPARK-39591
> URL: https://issues.apache.org/jira/browse/SPARK-39591
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
>  Labels: SPIP
> Fix For: 3.4.0
>
>
> Currently in Structured Streaming, at the beginning of every micro-batch the 
> offset to process up to for the current batch is persisted to durable 
> storage.  At the end of every micro-batch, a marker to indicate the 
> completion of this current micro-batch is persisted to durable storage. For 
> pipelines such as ones that read from Kafka and write to Kafka, where 
> end-to-end exactly-once is not supported and latency is sensitive, we can 
> allow users to configure offset commits to be written asynchronously, so that 
> this commit operation does not contribute to the batch duration, effectively 
> lowering the overall latency of the pipeline.
>  
> SPIP Doc: 
>  
> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
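The trade-off described above can be sketched in plain Python (a conceptual model, not Spark source; the `AsyncOffsetCommitter` class and its method names are hypothetical): offset markers are handed to a background thread, so the durable-storage write latency no longer lands inside the micro-batch loop.

```python
import queue
import threading
import time


class AsyncOffsetCommitter:
    """Toy async committer: offset markers are written to 'durable storage'
    (here, just a list) on a background thread, so commit() returns
    immediately instead of adding the write latency to the batch duration."""

    def __init__(self):
        self._pending = queue.Queue()
        self.log = []  # stands in for the durable offset log
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            batch_id = self._pending.get()
            if batch_id is None:  # shutdown sentinel
                break
            time.sleep(0.01)  # simulate durable-storage write latency
            self.log.append(batch_id)
            self._pending.task_done()

    def commit(self, batch_id):
        # Enqueue and return immediately: no contribution to batch duration.
        self._pending.put(batch_id)

    def close(self):
        self._pending.join()  # wait for all queued commits to land
        self._pending.put(None)
        self._worker.join()


committer = AsyncOffsetCommitter()
for batch_id in range(3):
    # ... process the micro-batch here ...
    committer.commit(batch_id)  # synchronous version would block ~10 ms here
committer.close()
assert committer.log == [0, 1, 2]  # commits still land in batch order
```

The single worker thread and FIFO queue preserve commit ordering, which matters because a later offset marker must never be durable before an earlier one.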






[jira] [Assigned] (SPARK-39591) SPIP: Asynchronous Offset Management in Structured Streaming

2022-12-19 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-39591:


Assignee: Boyang Jerry Peng

> SPIP: Asynchronous Offset Management in Structured Streaming
> 
>
> Key: SPARK-39591
> URL: https://issues.apache.org/jira/browse/SPARK-39591
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>Priority: Major
>  Labels: SPIP
>
> Currently in Structured Streaming, at the beginning of every micro-batch the 
> offset to process up to for the current batch is persisted to durable 
> storage.  At the end of every micro-batch, a marker to indicate the 
> completion of this current micro-batch is persisted to durable storage. For 
> pipelines such as ones that read from Kafka and write to Kafka, where 
> end-to-end exactly-once is not supported and latency is sensitive, we can 
> allow users to configure offset commits to be written asynchronously, so that 
> this commit operation does not contribute to the batch duration, effectively 
> lowering the overall latency of the pipeline.
>  
> SPIP Doc: 
>  
> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing






[jira] [Resolved] (SPARK-41425) Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-41425.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39104
[https://github.com/apache/spark/pull/39104]

> Protobuf serializer for RDDStorageInfoWrapper
> -
>
> Key: SPARK-41425
> URL: https://issues.apache.org/jira/browse/SPARK-41425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41425) Protobuf serializer for RDDStorageInfoWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-41425:
--

Assignee: Sandeep Singh

> Protobuf serializer for RDDStorageInfoWrapper
> -
>
> Key: SPARK-41425
> URL: https://issues.apache.org/jira/browse/SPARK-41425
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Resolved] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-41427.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39100
[https://github.com/apache/spark/pull/39100]

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-41427:
--

Assignee: Gengliang Wang

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-41349) Implement `DataFrame.hint`

2022-12-19 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-41349.
---
Resolution: Fixed

Issue resolved by pull request 38984
[https://github.com/apache/spark/pull/38984]

> Implement `DataFrame.hint`
> --
>
> Key: SPARK-41349
> URL: https://issues.apache.org/jira/browse/SPARK-41349
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Deng Ziming
>Priority: Major
> Fix For: 3.4.0
>
>
> Implement DataFrame.hint with the proto message added in 
> https://issues.apache.org/jira/browse/SPARK-41345






[jira] [Commented] (SPARK-41423) Protobuf serializer for StageDataWrapper

2022-12-19 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649528#comment-17649528
 ] 

BingKun Pan commented on SPARK-41423:
-

I will work on it.

> Protobuf serializer for StageDataWrapper
> 
>
> Key: SPARK-41423
> URL: https://issues.apache.org/jira/browse/SPARK-41423
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41595:


Assignee: (was: Apache Spark)

> Support generator function explode/explode_outer in the FROM clause
> ---
>
> Key: SPARK-41595
> URL: https://issues.apache.org/jira/browse/SPARK-41595
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the table-valued generator function explode/explode_outer can only 
> be used in the SELECT clause of a query:
> SELECT explode(array(1, 2))
> This task is to allow table-valued functions to be used in the FROM clause of 
> a query:
> SELECT * FROM explode(array(1, 2))
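The semantics at stake can be illustrated in plain Python (a conceptual sketch, not Spark code; the helper functions are hypothetical): `explode` emits one output row per array element and drops rows whose array is empty or NULL, while `explode_outer` keeps such rows, emitting a NULL element.

```python
def explode(rows, col):
    """One output row per element of rows[col]; rows with an empty or
    None array produce no output (matching Spark's explode)."""
    for row in rows:
        for value in (row[col] or []):
            yield {**row, col: value}


def explode_outer(rows, col):
    """Like explode, but a row with an empty or None array is kept,
    with None in place of the array element (matching explode_outer)."""
    for row in rows:
        values = row[col] or [None]
        for value in values:
            yield {**row, col: value}


rows = [{"id": 1, "xs": [1, 2]}, {"id": 2, "xs": []}]

# explode drops id=2; explode_outer keeps it with a NULL element.
assert [r["xs"] for r in explode(rows, "xs")] == [1, 2]
assert [(r["id"], r["xs"]) for r in explode_outer(rows, "xs")] == [
    (1, 1), (1, 2), (2, None)]
```

Moving the call from the SELECT clause to the FROM clause does not change these per-row semantics; it changes where in the query the generated rows become visible, letting the rest of the query treat the generator output as a relation.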






[jira] [Commented] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649527#comment-17649527
 ] 

Apache Spark commented on SPARK-41595:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/39133

> Support generator function explode/explode_outer in the FROM clause
> ---
>
> Key: SPARK-41595
> URL: https://issues.apache.org/jira/browse/SPARK-41595
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, the table-valued generator function explode/explode_outer can only 
> be used in the SELECT clause of a query:
> SELECT explode(array(1, 2))
> This task is to allow table-valued functions to be used in the FROM clause of 
> a query:
> SELECT * FROM explode(array(1, 2))






[jira] [Assigned] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41595:


Assignee: Apache Spark

> Support generator function explode/explode_outer in the FROM clause
> ---
>
> Key: SPARK-41595
> URL: https://issues.apache.org/jira/browse/SPARK-41595
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>
> Currently, the table-valued generator function explode/explode_outer can only 
> be used in the SELECT clause of a query:
> SELECT explode(array(1, 2))
> This task is to allow table-valued functions to be used in the FROM clause of 
> a query:
> SELECT * FROM explode(array(1, 2))






[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649526#comment-17649526
 ] 

Rithwik Ediga Lakhamsani commented on SPARK-41589:
--

[~xkrogen] I created a new copy, please let me know if you still can't see it. 
Thank you for your patience!

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Updated] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani updated SPARK-41589:
-
Description: This is a project to make it easier for PySpark users to 
distribute PyTorch code using PySpark. The corresponding [Design 
Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
 can give more context. This was a project determined by the Databricks ML 
Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
[~erithwik] for more context.  (was: This is a project to make it easier for 
PySpark users to distribute PyTorch code using PySpark. The corresponding 
[Design 
Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
 and 
[PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
 can give more context. This was a project determined by the Databricks ML 
Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
[~erithwik] for more context.)

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Created] (SPARK-41595) Support generator function explode/explode_outer in the FROM clause

2022-12-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-41595:


 Summary: Support generator function explode/explode_outer in the 
FROM clause
 Key: SPARK-41595
 URL: https://issues.apache.org/jira/browse/SPARK-41595
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Wang


Currently, the table-valued generator function explode/explode_outer can only 
be used in the SELECT clause of a query:

SELECT explode(array(1, 2))

This task is to allow table-valued functions to be used in the FROM clause of a 
query:

SELECT * FROM explode(array(1, 2))






[jira] [Created] (SPARK-41594) Support table-valued generator functions in the FROM clause

2022-12-19 Thread Allison Wang (Jira)
Allison Wang created SPARK-41594:


 Summary: Support table-valued generator functions in the FROM 
clause
 Key: SPARK-41594
 URL: https://issues.apache.org/jira/browse/SPARK-41594
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Affects Versions: 3.4.0
Reporter: Allison Wang


Umbrella Jira for supporting table-valued generator functions in the FROM 
clause of a query. 






[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649516#comment-17649516
 ] 

Rithwik Ediga Lakhamsani commented on SPARK-41589:
--

Sorry, I need to update it with a new copy. I will add a new comment on this 
ticket when the new document is available.

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649515#comment-17649515
 ] 

Erik Krogen commented on SPARK-41589:
-

Nope :(

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Updated] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41589:
-
Component/s: PySpark

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Assigned] (SPARK-41535) InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data

2022-12-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41535:


Assignee: Bruce Robbins

> InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt 
> unsafe buffer when used with calendar interval data
> 
>
> Key: SPARK-41535
> URL: https://issues.apache.org/jira/browse/SPARK-41535
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>
> This returns the wrong answer:
> {noformat}
> set spark.sql.codegen.wholeStage=false;
> set spark.sql.codegen.factoryMode=NO_CODEGEN;
> select first(col1), last(col2) from values
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0))
> as data(col1, col2);
> +---------------+---------------+
> |first(col1)    |last(col2)     |
> +---------------+---------------+
> |16 years 2 days|16 years 2 days|
> +---------------+---------------+
> {noformat}
> In the above case, {{TungstenAggregationIterator}} uses 
> {{InterpretedUnsafeProjection}} to create the aggregation buffer and then 
> initializes all the fields to null. {{InterpretedUnsafeProjection}} 
> incorrectly calls {{UnsafeRowWriter#setNullAt}}, rather than 
> {{UnsafeRowWriter#write}}, for the two calendar interval fields. As a result, 
> the writer never allocates memory from the variable-length region for the two 
> intervals, and the pointers in the fixed region get left as zero. Later, when 
> {{InterpretedMutableProjection}} attempts to update the first field, 
> {{UnsafeRow#setInterval}} picks up the zero pointer and stores interval data 
> on top of the null-tracking bit set. The call to UnsafeRow#setInterval for 
> the second field also stomps the null-tracking bit set. Later updates to the 
> null-tracking bit set (e.g., calls to setNotNullAt) further corrupt the 
> interval data, turning {{interval 7 years 2 days}} into {{interval 16 years 2 
> days}}.
> Even if you fix the above bug to {{InterpretedUnsafeProjection}} so that the 
> buffer is created correctly, {{InterpretedMutableProjection}} has a similar 
> bug to SPARK-41395, except this time for calendar interval data:
> {noformat}
> set spark.sql.codegen.wholeStage=false;
> set spark.sql.codegen.factoryMode=NO_CODEGEN;
> select first(col1), last(col2), max(col3) from values
> (null, null, 1),
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
> as data(col1, col2, col3);
> +---------------+---------------+---------+
> |first(col1)    |last(col2)     |max(col3)|
> +---------------+---------------+---------+
> |16 years 2 days|16 years 2 days|3        |
> +---------------+---------------+---------+
> {noformat}
> These two bugs could get exercised during codegen fallback. Take for example 
> this case where I forced codegen to fail for the Greatest expression:
> {noformat}
> spark-sql> select first(col1), last(col2), max(col3) from values
> (null, null, 1),
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
> as data(col1, col2, col3);
> 22/12/15 13:06:23 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 1: ';' expected instead of 'if'
> ...
> 22/12/15 13:06:24 WARN MutableProjection: Expr codegen error and falling back 
> to interpreter mode
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 78, Column 1: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 78, Column 1: ';' expected instead of 'boolean'
> ...
> 16 years 2 days   16 years 2 days 3
> Time taken: 5.852 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}






[jira] [Resolved] (SPARK-41535) InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt unsafe buffer when used with calendar interval data

2022-12-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41535.
--
Fix Version/s: 3.3.2
   3.2.3
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 39117
[https://github.com/apache/spark/pull/39117]

> InterpretedUnsafeProjection and InterpretedMutableProjection can corrupt 
> unsafe buffer when used with calendar interval data
> 
>
> Key: SPARK-41535
> URL: https://issues.apache.org/jira/browse/SPARK-41535
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1, 3.2.3, 3.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 3.3.2, 3.2.3, 3.4.0
>
>
> This returns the wrong answer:
> {noformat}
> set spark.sql.codegen.wholeStage=false;
> set spark.sql.codegen.factoryMode=NO_CODEGEN;
> select first(col1), last(col2) from values
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0))
> as data(col1, col2);
> +---------------+---------------+
> |first(col1)    |last(col2)     |
> +---------------+---------------+
> |16 years 2 days|16 years 2 days|
> +---------------+---------------+
> {noformat}
> In the above case, {{TungstenAggregationIterator}} uses 
> {{InterpretedUnsafeProjection}} to create the aggregation buffer and then 
> initializes all the fields to null. {{InterpretedUnsafeProjection}} 
> incorrectly calls {{UnsafeRowWriter#setNullAt}}, rather than 
> {{UnsafeRowWriter#write}}, for the two calendar interval fields. As a result, 
> the writer never allocates memory from the variable-length region for the two 
> intervals, and the pointers in the fixed region get left as zero. Later, when 
> {{InterpretedMutableProjection}} attempts to update the first field, 
> {{UnsafeRow#setInterval}} picks up the zero pointer and stores interval data 
> on top of the null-tracking bit set. The call to UnsafeRow#setInterval for 
> the second field also stomps the null-tracking bit set. Later updates to the 
> null-tracking bit set (e.g., calls to setNotNullAt) further corrupt the 
> interval data, turning {{interval 7 years 2 days}} into {{interval 16 years 2 
> days}}.
> Even if you fix the above bug to {{InterpretedUnsafeProjection}} so that the 
> buffer is created correctly, {{InterpretedMutableProjection}} has a similar 
> bug to SPARK-41395, except this time for calendar interval data:
> {noformat}
> set spark.sql.codegen.wholeStage=false;
> set spark.sql.codegen.factoryMode=NO_CODEGEN;
> select first(col1), last(col2), max(col3) from values
> (null, null, 1),
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
> as data(col1, col2, col3);
> +---------------+---------------+---------+
> |first(col1)    |last(col2)     |max(col3)|
> +---------------+---------------+---------+
> |16 years 2 days|16 years 2 days|3        |
> +---------------+---------------+---------+
> {noformat}
> These two bugs could get exercised during codegen fallback. Take for example 
> this case where I forced codegen to fail for the Greatest expression:
> {noformat}
> spark-sql> select first(col1), last(col2), max(col3) from values
> (null, null, 1),
> (make_interval(0, 0, 0, 7, 0, 0, 0), make_interval(17, 0, 0, 2, 0, 0, 0), 3)
> as data(col1, col2, col3);
> 22/12/15 13:06:23 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 1: ';' expected instead of 'if'
> ...
> 22/12/15 13:06:24 WARN MutableProjection: Expr codegen error and falling back 
> to interpreter mode
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 78, Column 1: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 78, Column 1: ';' expected instead of 'boolean'
> ...
> 16 years 2 days   16 years 2 days 3
> Time taken: 5.852 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}






[jira] [Comment Edited] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649511#comment-17649511
 ] 

Rithwik Ediga Lakhamsani edited comment on SPARK-41589 at 12/20/22 12:27 AM:
-

Oh sorry, let me fix that! Does it work now [~xkrogen]?


was (Author: JIRAUSER298573):
Oh sorry, let me fix that! 

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Updated] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani updated SPARK-41589:
-
Description: This is a project to make it easier for PySpark users to 
distribute PyTorch code using PySpark. The corresponding [Design 
Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
 and 
[PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
 can give more context. This was a project determined by the Databricks ML 
Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
[~erithwik] for more context.  (was: This is a project to make it easier for 
PySpark users to distribute PyTorch code using PySpark. The corresponding 
[Design 
Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
 and 
[PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
 can give more context. )

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
> [~erithwik] for more context.






[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649511#comment-17649511
 ] 

Rithwik Ediga Lakhamsani commented on SPARK-41589:
--

Oh sorry, let me fix that! 

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. 






[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649510#comment-17649510
 ] 

Erik Krogen commented on SPARK-41589:
-

[~erithwik] can you make the linked documents world-viewable? I get access 
denied.

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. 






[jira] [Created] (SPARK-41592) Implement functionality for training a PyTorch file on the executors

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)
Rithwik Ediga Lakhamsani created SPARK-41592:


 Summary: Implement functionality for training a PyTorch file on 
the executors
 Key: SPARK-41592
 URL: https://issues.apache.org/jira/browse/SPARK-41592
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani









[jira] [Created] (SPARK-41593) Implement logging from the executor nodes

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)
Rithwik Ediga Lakhamsani created SPARK-41593:


 Summary: Implement logging from the executor nodes
 Key: SPARK-41593
 URL: https://issues.apache.org/jira/browse/SPARK-41593
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani









[jira] [Created] (SPARK-41591) Implement functionality for training a PyTorch file locally

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)
Rithwik Ediga Lakhamsani created SPARK-41591:


 Summary: Implement functionality for training a PyTorch file 
locally
 Key: SPARK-41591
 URL: https://issues.apache.org/jira/browse/SPARK-41591
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani









[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649509#comment-17649509
 ] 

Rithwik Ediga Lakhamsani commented on SPARK-41589:
--

I am working on this.

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. 






[jira] [Updated] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani updated SPARK-41589:
-
Description: This is a project to make it easier for PySpark users to 
distribute PyTorch code using PySpark. The corresponding [Design 
Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
 and 
[PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
 can give more context. 

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
>  and 
> [PRD|https://docs.google.com/document/d/1KprHkzx9r3lv47TLgO6FnkYZT92xOx6OeKvTJPxqpfk/edit]
>  can give more context. 






[jira] [Created] (SPARK-41590) Implement Baseline API Code

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)
Rithwik Ediga Lakhamsani created SPARK-41590:


 Summary: Implement Baseline API Code
 Key: SPARK-41590
 URL: https://issues.apache.org/jira/browse/SPARK-41590
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani


Creating a baseline API so that we can agree on how the users will interact 
with the code. This was determined in this [Design 
Document|https://docs.google.com/document/d/1_nhUP46cHnYmnZoyirySXvuY1KDMU3vdHRx9MngSVtA/edit]
 and can be updated as necessary.






[jira] [Created] (SPARK-41589) PyTorch Distributor

2022-12-19 Thread Rithwik Ediga Lakhamsani (Jira)
Rithwik Ediga Lakhamsani created SPARK-41589:


 Summary: PyTorch Distributor
 Key: SPARK-41589
 URL: https://issues.apache.org/jira/browse/SPARK-41589
 Project: Spark
  Issue Type: Umbrella
  Components: ML
Affects Versions: 3.4.0
Reporter: Rithwik Ediga Lakhamsani









[jira] [Resolved] (SPARK-41583) Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41583.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39123
[https://github.com/apache/spark/pull/39123]

> Add Spark Connect and protobuf into setup.py with specifying dependencies
> -
>
> Key: SPARK-41583
> URL: https://issues.apache.org/jira/browse/SPARK-41583
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Protobuf
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> We should document this, and put both pyspark.sql.connect and 
> pyspark.sql.protobuf into the PyPI package.






[jira] [Assigned] (SPARK-41583) Add Spark Connect and protobuf into setup.py with specifying dependencies

2022-12-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41583:


Assignee: Hyukjin Kwon

> Add Spark Connect and protobuf into setup.py with specifying dependencies
> -
>
> Key: SPARK-41583
> URL: https://issues.apache.org/jira/browse/SPARK-41583
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Protobuf
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> We should document this, and put both pyspark.sql.connect and 
> pyspark.sql.protobuf into the PyPI package.






[jira] [Resolved] (SPARK-41588) Make "Rule id not found" error message more actionable

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-41588.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39120
[https://github.com/apache/spark/pull/39120]

> Make "Rule id not found" error message more actionable
> --
>
> Key: SPARK-41588
> URL: https://issues.apache.org/jira/browse/SPARK-41588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
> Fix For: 3.4.0
>
>
> It was super confusing to me when adding a new rule that I bumped into the 
> rule id error. We should update the error message to make it more actionable, 
> i.e. explaining to the developers which file to modify.






[jira] [Assigned] (SPARK-41588) Make "Rule id not found" error message more actionable

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41588:


Assignee: Reynold Xin  (was: Apache Spark)

> Make "Rule id not found" error message more actionable
> --
>
> Key: SPARK-41588
> URL: https://issues.apache.org/jira/browse/SPARK-41588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> It was super confusing to me when adding a new rule that I bumped into the 
> rule id error. We should update the error message to make it more actionable, 
> i.e. explaining to the developers which file to modify.






[jira] [Commented] (SPARK-41588) Make "Rule id not found" error message more actionable

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649501#comment-17649501
 ] 

Apache Spark commented on SPARK-41588:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/39120

> Make "Rule id not found" error message more actionable
> --
>
> Key: SPARK-41588
> URL: https://issues.apache.org/jira/browse/SPARK-41588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> It was super confusing to me when adding a new rule that I bumped into the 
> rule id error. We should update the error message to make it more actionable, 
> i.e. explaining to the developers which file to modify.






[jira] [Assigned] (SPARK-41588) Make "Rule id not found" error message more actionable

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41588:


Assignee: Apache Spark  (was: Reynold Xin)

> Make "Rule id not found" error message more actionable
> --
>
> Key: SPARK-41588
> URL: https://issues.apache.org/jira/browse/SPARK-41588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Apache Spark
>Priority: Major
>
> It was super confusing to me when adding a new rule that I bumped into the 
> rule id error. We should update the error message to make it more actionable, 
> i.e. explaining to the developers which file to modify.






[jira] [Created] (SPARK-41588) Make "Rule id not found" error message more actionable

2022-12-19 Thread Reynold Xin (Jira)
Reynold Xin created SPARK-41588:
---

 Summary: Make "Rule id not found" error message more actionable
 Key: SPARK-41588
 URL: https://issues.apache.org/jira/browse/SPARK-41588
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Reynold Xin
Assignee: Reynold Xin


It was super confusing to me when adding a new rule that I bumped into the rule 
id error. We should update the error message to make it more actionable, i.e. 
explaining to the developers which file to modify.






[jira] [Resolved] (SPARK-41420) Protobuf serializer for ApplicationInfoWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-41420.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39093
[https://github.com/apache/spark/pull/39093]

> Protobuf serializer for ApplicationInfoWrapper
> --
>
> Key: SPARK-41420
> URL: https://issues.apache.org/jira/browse/SPARK-41420
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41420) Protobuf serializer for ApplicationInfoWrapper

2022-12-19 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-41420:
--

Assignee: Sandeep Singh

> Protobuf serializer for ApplicationInfoWrapper
> --
>
> Key: SPARK-41420
> URL: https://issues.apache.org/jira/browse/SPARK-41420
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649438#comment-17649438
 ] 

Apache Spark commented on SPARK-41427:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39100

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper

2022-12-19 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649439#comment-17649439
 ] 

Gengliang Wang commented on SPARK-41422:


[~techaddict] I was commenting on the wrong jira. Feel free to submit the PR.

> Protobuf serializer for ExecutorSummaryWrapper
> --
>
> Key: SPARK-41422
> URL: https://issues.apache.org/jira/browse/SPARK-41422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649440#comment-17649440
 ] 

Apache Spark commented on SPARK-41427:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39100

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41427:


Assignee: (was: Apache Spark)

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-41427) Protobuf serializer for ExecutorStageSummaryWrapper

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41427:


Assignee: Apache Spark

> Protobuf serializer for ExecutorStageSummaryWrapper
> ---
>
> Key: SPARK-41427
> URL: https://issues.apache.org/jira/browse/SPARK-41427
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper

2022-12-19 Thread Gengliang Wang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41422 ]


Gengliang Wang deleted comment on SPARK-41422:


was (Author: apachespark):
User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39100

> Protobuf serializer for ExecutorSummaryWrapper
> --
>
> Key: SPARK-41422
> URL: https://issues.apache.org/jira/browse/SPARK-41422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] (SPARK-41422) Protobuf serializer for ExecutorSummaryWrapper

2022-12-19 Thread Gengliang Wang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-41422 ]


Gengliang Wang deleted comment on SPARK-41422:


was (Author: gengliang.wang):
[~techaddict] I have a PR for this one already. Sorry I didn't claim it. 

I will claim next time. The ExecutorMetrics is a bit tricky, so I am doing it 
by myself.

> Protobuf serializer for ExecutorSummaryWrapper
> --
>
> Key: SPARK-41422
> URL: https://issues.apache.org/jira/browse/SPARK-41422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-41162) Anti-join must not be pushed below aggregation with ambiguous predicates

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649418#comment-17649418
 ] 

Apache Spark commented on SPARK-41162:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/39131

> Anti-join must not be pushed below aggregation with ambiguous predicates
> 
>
> Key: SPARK-41162
> URL: https://issues.apache.org/jira/browse/SPARK-41162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.3, 3.3.1, 3.2.3, 3.4.0
>Reporter: Enrico Minack
>Priority: Major
>  Labels: correctness
>
> The following query should return a single row as all values for {{id}} 
> except for the largest will be eliminated by the anti-join:
> {code}
> val ids = Seq(1, 2, 3).toDF("id").distinct()
> val result = ids.withColumn("id", $"id" + 1).join(ids, "id", 
> "left_anti").collect()
> assert(result.length == 1)
> {code}
> Without the {{distinct()}}, the assertion is true. With {{distinct()}}, the 
> assertion should still hold but is false.
> Rule {{PushDownLeftSemiAntiJoin}} pushes the {{Join}} below the left 
> {{Aggregate}} with join condition {{(id#750 + 1) = id#750}}, which can never 
> be true.
> {code}
> === Applying Rule 
> org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin ===
> !Join LeftAnti, (id#752 = id#750)  'Aggregate [id#750], 
> [(id#750 + 1) AS id#752]
> !:- Aggregate [id#750], [(id#750 + 1) AS id#752]   +- 'Join LeftAnti, 
> ((id#750 + 1) = id#750)
> !:  +- LocalRelation [id#750] :- LocalRelation 
> [id#750]
> !+- Aggregate [id#750], [id#750]  +- Aggregate [id#750], 
> [id#750]
> !   +- LocalRelation [id#750]+- LocalRelation 
> [id#750]
> {code}
> The optimizer then rightly removes the left-anti join altogether, returning 
> the left child only.
> Rule {{PushDownLeftSemiAntiJoin}} should not push down predicates that 
> reference both the left *and* the right child.
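The correctness issue is easy to model outside Spark. In the plain-Python sketch below (illustrative only; list comprehensions stand in for the relational plans, not Spark's optimizer), the valid plan keeps exactly one row, while the pushed-down condition compares an attribute to itself, can never hold, and so filters nothing:

```python
# Toy model of SPARK-41162 (illustrative only; not Spark internals).
ids = [1, 2, 3]                     # ids.distinct()

# Correct plan: project (id + 1) above the aggregate, then anti-join
# against the original ids, eliminating every value except the largest.
projected = [i + 1 for i in ids]    # [2, 3, 4]
correct = [v for v in projected if v not in ids]
assert correct == [4]               # one row, as the assertion expects

# After the bad pushdown the condition (id#750 + 1) = id#750 references
# the SAME attribute on both sides, so it can never be true; the anti-join
# eliminates nothing and the optimizer then drops it entirely.
def never_true(i):
    return i + 1 == i               # always False

buggy = [i + 1 for i in ids if not never_true(i)]
assert buggy == [2, 3, 4]           # three rows instead of one
```

The sketch makes the fix concrete: a predicate that mentions attributes from both sides of the join must block the pushdown.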






[jira] [Commented] (SPARK-41277) Save and leverage shuffle key in tblproperties

2022-12-19 Thread Ohad Raviv (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649300#comment-17649300
 ] 

Ohad Raviv commented on SPARK-41277:


[~gurwls223] - can I please get your opinion here?

> Save and leverage shuffle key in tblproperties
> --
>
> Key: SPARK-41277
> URL: https://issues.apache.org/jira/browse/SPARK-41277
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Ohad Raviv
>Priority: Minor
>
> I'm not sure if I'm missing something trivial here.
> In a typical process, many datasets get materialized and many of them after a 
> shuffle (e.g join). then they would again be involved in further actions and 
> often use the same key.
> Wouldn't it make sense to save the shuffle key along with the table to avoid 
> unnecessary shuffles?
> Also, the implementation seems quite straightforward - to just leverage the 
> bucketing mechanism.
>  






[jira] [Resolved] (SPARK-41441) Allow Generate with no required child output to host outer references

2022-12-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41441.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38968
[https://github.com/apache/spark/pull/38968]

> Allow Generate with no required child output to host outer references
> -
>
> Key: SPARK-41441
> URL: https://issues.apache.org/jira/browse/SPARK-41441
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, in CheckAnalysis, Spark disallows Generate to host any outer 
> references when its required child output is not empty. But when the child 
> output is empty, it can host outer references, which DecorrelateInnerQuery 
> does not handle.
> For example,
> {code:java}
> select * from t, lateral (select explode(array(c1, c2))){code}
> This throws an internal error :
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: Correlated column is 
> not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, 
> [col#221] +- OneRowRelation{code}
>  We should support Generate to host outer references when its required child 
> output is empty.
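The intended semantics are straightforward to state. This plain-Python sketch (illustrative only; not Spark's DecorrelateInnerQuery logic) models what the lateral explode should produce once Generate is allowed to host the outer references c1 and c2:

```python
# Toy model of `select * from t, lateral (select explode(array(c1, c2)))`
# (illustrative only; names and rows are hypothetical, not Spark internals).
t = [(1, 2), (10, 20)]              # rows of t: (c1, c2)

def lateral_explode(rows):
    # The Generate has no required child output, but its generator
    # expression references the outer columns c1 and c2 of the current row.
    for c1, c2 in rows:
        for col in (c1, c2):        # explode(array(c1, c2))
            yield (c1, c2, col)

result = list(lateral_explode(t))
assert result == [(1, 2, 1), (1, 2, 2), (10, 20, 10), (10, 20, 20)]
```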






[jira] [Assigned] (SPARK-41441) Allow Generate with no required child output to host outer references

2022-12-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41441:
---

Assignee: Allison Wang

> Allow Generate with no required child output to host outer references
> -
>
> Key: SPARK-41441
> URL: https://issues.apache.org/jira/browse/SPARK-41441
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>
> Currently, in CheckAnalysis, Spark disallows Generate to host any outer 
> references when its required child output is not empty. But when the child 
> output is empty, it can host outer references, which DecorrelateInnerQuery 
> does not handle.
> For example,
> {code:java}
> select * from t, lateral (select explode(array(c1, c2))){code}
> This throws an internal error:
> {code:java}
> Caused by: java.lang.AssertionError: assertion failed: Correlated column is 
> not allowed in Generate explode(array(outer(c1#219), outer(c2#220))), false, 
> [col#221] +- OneRowRelation{code}
>  We should support Generate to host outer references when its required child 
> output is empty.






[jira] [Commented] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649287#comment-17649287
 ] 

Apache Spark commented on SPARK-41587:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39129

> Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
> 
>
> Key: SPARK-41587
> URL: https://issues.apache.org/jira/browse/SPARK-41587
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7






[jira] [Assigned] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41587:


Assignee: Apache Spark

> Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
> 
>
> Key: SPARK-41587
> URL: https://issues.apache.org/jira/browse/SPARK-41587
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7






[jira] [Assigned] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41587:


Assignee: (was: Apache Spark)

> Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
> 
>
> Key: SPARK-41587
> URL: https://issues.apache.org/jira/browse/SPARK-41587
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7






[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41586:


Assignee: (was: Apache Spark)

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Introduce new package `pyspark.errors` for improving PySpark error messages.






[jira] [Assigned] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41586:


Assignee: Apache Spark

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> Introduce new package `pyspark.errors` for improving PySpark error messages.






[jira] [Commented] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649286#comment-17649286
 ] 

Apache Spark commented on SPARK-41586:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/39128

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Introduce new package `pyspark.errors` for improving PySpark error messages.






[jira] [Created] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7

2022-12-19 Thread Yang Jie (Jira)
Yang Jie created SPARK-41587:


 Summary: Upgrade org.scalatestplus:selenium-4-4 to 
org.scalatestplus:selenium-4-7
 Key: SPARK-41587
 URL: https://issues.apache.org/jira/browse/SPARK-41587
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7






[jira] [Commented] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2022-12-19 Thread Haejoon Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649261#comment-17649261
 ] 

Haejoon Lee commented on SPARK-41586:
-

I'm working on it

> Introduce new PySpark package: pyspark.errors
> -
>
> Key: SPARK-41586
> URL: https://issues.apache.org/jira/browse/SPARK-41586
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Introduce new package `pyspark.errors` for improving PySpark error messages.






[jira] [Created] (SPARK-41586) Introduce new PySpark package: pyspark.errors

2022-12-19 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-41586:
---

 Summary: Introduce new PySpark package: pyspark.errors
 Key: SPARK-41586
 URL: https://issues.apache.org/jira/browse/SPARK-41586
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


Introduce new package `pyspark.errors` for improving PySpark error messages.






[jira] [Commented] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649235#comment-17649235
 ] 

Apache Spark commented on SPARK-41585:
--

User 'LucaCanali' has created a pull request for this issue:
https://github.com/apache/spark/pull/39127

> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Priority: Minor
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.
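
For illustration, a hedged sketch of the configuration combination described above, in spark-defaults.conf form. The node names are hypothetical; {{spark.yarn.exclude.nodes}} is the documented parameter, {{spark.dynamicAllocation.enabled}} is assumed here as the standard switch for dynamic allocation, and the zero-executor settings are those the description says are currently needed for the exclusion to take effect:

{code}
# spark-defaults.conf fragment (hypothetical node names)
spark.yarn.exclude.nodes              badhost1.example.com,badhost2.example.com
# Per this report, the exclusion currently only applies to executors obtained
# via dynamic allocation, so one may also need:
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.minExecutors  0
spark.executor.instances              0
{code}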






[jira] [Commented] (SPARK-21829) Enable config to permanently blacklist a list of nodes

2022-12-19 Thread Luca Canali (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649234#comment-17649234
 ] 

Luca Canali commented on SPARK-21829:
-

Note: similar functionality was later implemented in 
https://issues.apache.org/jira/browse/SPARK-26688

> Enable config to permanently blacklist a list of nodes
> --
>
> Key: SPARK-21829
> URL: https://issues.apache.org/jira/browse/SPARK-21829
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.1, 2.2.0
>Reporter: Luca Canali
>Priority: Minor
>
> The idea for this proposal comes from a performance incident in a local 
> cluster where a job was found to be very slow because of a long tail of 
> stragglers due to 2 nodes in the cluster being slow to access a remote 
> filesystem.
> The issue was limited to the 2 machines and was related to external 
> configurations: the 2 machines that performed badly when accessing the remote 
> file system were behaving normally for other jobs in the cluster (a shared 
> YARN cluster).
> With this new feature I propose to introduce a mechanism to allow users to 
> specify a list of nodes in the cluster where executors/tasks should not run 
> for a specific job.
> The proposed implementation that I tested (see PR) uses the Spark blacklist 
> mechanism. With the parameter spark.blacklist.alwaysBlacklistedNodes, a list 
> of user-specified nodes is added to the blacklist at the start of the Spark 
> Context and never expires. 
> I have tested this on a YARN cluster on a case taken from the original 
> production problem and I confirm a performance improvement of about 5x for 
> the specific test case I have. I imagine that there can be other cases where 
> Spark users may want to blacklist a set of nodes. This can be used for 
> troubleshooting, including cases where certain nodes/executors are slow for a 
> given workload and this is caused by external agents, so the anomaly is not 
> picked up by the cluster manager.
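
A minimal configuration sketch of the proposal described above. The property name comes from the description and the node names are hypothetical; the parameter appears never to have been merged into Spark as proposed (see SPARK-26688 for the similar functionality that was later implemented as {{spark.yarn.exclude.nodes}}):

{code}
# spark-defaults.conf fragment (proposed parameter, hypothetical node names)
spark.blacklist.alwaysBlacklistedNodes  badnode1.example.com,badnode2.example.com
{code}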






[jira] [Assigned] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41585:


Assignee: Apache Spark

> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Assignee: Apache Spark
>Priority: Minor
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.






[jira] [Commented] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649233#comment-17649233
 ] 

Apache Spark commented on SPARK-41585:
--

User 'LucaCanali' has created a pull request for this issue:
https://github.com/apache/spark/pull/39127

> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Priority: Minor
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.






[jira] [Assigned] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41585:


Assignee: (was: Apache Spark)

> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Priority: Minor
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.






[jira] [Updated] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-41585:

Description: 
The Spark exclude node functionality for Spark on YARN, introduced in 
SPARK-26688, allows users to specify a list of node names that are excluded 
from resource allocation. This is done using the configuration parameter: 
{{spark.yarn.exclude.nodes}}

The feature currently works only for executors allocated via dynamic 
allocation. To use the feature on Spark 3.3.1, for example, one may also need 
to configure spark.dynamicAllocation.minExecutors=0 and 
spark.executor.instances=0, thereby relying on executor resource allocation 
only via dynamic allocation.

This issue proposes extending the Spark exclude node functionality for YARN 
beyond dynamic allocation, which I believe would also make it more consistent 
with what the documentation states for this feature/configuration parameter.

  was:
The Spark exclude node functionality for YARN, introduced in SPARK-26688, 
allows users to specify a list of node names that are excluded from resource 
allocation. This is done using the configuration parameter: 
{{spark.yarn.exclude.nodes}}

The feature currently works only for executors allocated via dynamic 
allocation. To use the feature on Spark 3.3.1, for example, one may need also 
to configure spark.dynamicAllocation.minExecutors=0 and 
spark.executor.instances=0, therefore relying on executor resource allocation 
only via dynamic allocation.

This proposes to extend the use of Spark exclude node functionality for YARN 
beyond dynamic allocation, which I believe makes it more consistent also with 
what the documentation reports for this feature/configuration parameter.


> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Priority: Minor
>
> The Spark exclude node functionality for Spark on YARN, introduced in 
> SPARK-26688, allows users to specify a list of node names that are excluded 
> from resource allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.






[jira] [Updated] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-41585:

Description: 
The Spark exclude node functionality for YARN, introduced in SPARK-26688, 
allows users to specify a list of node names that are excluded from resource 
allocation. This is done using the configuration parameter: 
{{spark.yarn.exclude.nodes}}

The feature currently works only for executors allocated via dynamic 
allocation. To use the feature on Spark 3.3.1, for example, one may also need 
to configure spark.dynamicAllocation.minExecutors=0 and 
spark.executor.instances=0, thereby relying on executor resource allocation 
only via dynamic allocation.

This issue proposes extending the Spark exclude node functionality for YARN 
beyond dynamic allocation, which I believe would also make it more consistent 
with what the documentation states for this feature/configuration parameter.

  was:
The Spark exclude node functionality for YARN, introduced in SPARK-26688, 
allows users to specify a list of node names that are excluded from resource 
allocation. This is done using the configuration parameter: 
{{spark.yarn.exclude.nodes}}

The feature currently works only for executors allocated via dynamic 
allocation. To use the feature on Spark 3.3.1, for eaxmple, one needs to 
configure spark.dynamicAllocation.minExecutors=0 and 
spark.executor.instances=0, therefore relying on executor resource allocation 
only via dynamic allocation.

This proposes to extend the use of Spark exclude node functionality for YARN 
beyond dynamic allocation, which I believe makes it more consistent also with 
what the documentation reports for this feature/configuration parameter.


> The Spark exclude node functionality for YARN should work independently of 
> dynamic allocation
> -
>
> Key: SPARK-41585
> URL: https://issues.apache.org/jira/browse/SPARK-41585
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.3.1
>Reporter: Luca Canali
>Priority: Minor
>
> The Spark exclude node functionality for YARN, introduced in SPARK-26688, 
> allows users to specify a list of node names that are excluded from resource 
> allocation. This is done using the configuration parameter: 
> {{spark.yarn.exclude.nodes}}
> The feature currently works only for executors allocated via dynamic 
> allocation. To use the feature on Spark 3.3.1, for example, one may also need 
> to configure spark.dynamicAllocation.minExecutors=0 and 
> spark.executor.instances=0, thereby relying on executor resource allocation 
> only via dynamic allocation.
> This issue proposes extending the Spark exclude node functionality for YARN 
> beyond dynamic allocation, which I believe would also make it more consistent 
> with what the documentation states for this feature/configuration parameter.






[jira] [Created] (SPARK-41585) The Spark exclude node functionality for YARN should work independently of dynamic allocation

2022-12-19 Thread Luca Canali (Jira)
Luca Canali created SPARK-41585:
---

 Summary: The Spark exclude node functionality for YARN should work 
independently of dynamic allocation
 Key: SPARK-41585
 URL: https://issues.apache.org/jira/browse/SPARK-41585
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 3.3.1
Reporter: Luca Canali


The Spark exclude node functionality for YARN, introduced in SPARK-26688, 
allows users to specify a list of node names that are excluded from resource 
allocation. This is done using the configuration parameter: 
{{spark.yarn.exclude.nodes}}

The feature currently works only for executors allocated via dynamic 
allocation. To use the feature on Spark 3.3.1, for example, one needs to 
configure spark.dynamicAllocation.minExecutors=0 and 
spark.executor.instances=0, thereby relying on executor resource allocation 
only via dynamic allocation.

This issue proposes extending the Spark exclude node functionality for YARN 
beyond dynamic allocation, which I believe would also make it more consistent 
with what the documentation states for this feature/configuration parameter.






[jira] [Commented] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649201#comment-17649201
 ] 

Apache Spark commented on SPARK-41584:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39125

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36
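
For dependency bumps like this one, the change typically amounts to a one-line version update. A hypothetical Maven fragment, with coordinates assumed from the RoaringBitmap project's published artifacts rather than quoted from the Spark pom:

{code:xml}
<!-- pom.xml fragment: bump RoaringBitmap from 0.9.35 to 0.9.36 -->
<dependency>
  <groupId>org.roaringbitmap</groupId>
  <artifactId>RoaringBitmap</artifactId>
  <version>0.9.36</version>
</dependency>
{code}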






[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41584:


Assignee: (was: Apache Spark)

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36






[jira] [Commented] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649200#comment-17649200
 ] 

Apache Spark commented on SPARK-41584:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39125

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36






[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41584:


Assignee: Apache Spark

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36






[jira] [Created] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-19 Thread Yang Jie (Jira)
Yang Jie created SPARK-41584:


 Summary: Upgrade RoaringBitmap to 0.9.36
 Key: SPARK-41584
 URL: https://issues.apache.org/jira/browse/SPARK-41584
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie


https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36


