[jira] [Assigned] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41648:


Assignee: Apache Spark

> Deduplicate docstrings in pyspark.sql.connect.readwriter
> 
>
> Key: SPARK-41648
> URL: https://issues.apache.org/jira/browse/SPARK-41648
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41648:


Assignee: (was: Apache Spark)

> Deduplicate docstrings in pyspark.sql.connect.readwriter
> 
>
> Key: SPARK-41648
> URL: https://issues.apache.org/jira/browse/SPARK-41648
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650178#comment-17650178
 ] 

Apache Spark commented on SPARK-41648:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39153

> Deduplicate docstrings in pyspark.sql.connect.readwriter
> 
>
> Key: SPARK-41648
> URL: https://issues.apache.org/jira/browse/SPARK-41648
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650174#comment-17650174
 ] 

Apache Spark commented on SPARK-41648:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39153

> Deduplicate docstrings in pyspark.sql.connect.readwriter
> 
>
> Key: SPARK-41648
> URL: https://issues.apache.org/jira/browse/SPARK-41648
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Commented] (SPARK-41660) only propagate metadata columns if they are used

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650134#comment-17650134
 ] 

Apache Spark commented on SPARK-41660:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/39152

> only propagate metadata columns if they are used
> 
>
> Key: SPARK-41660
> URL: https://issues.apache.org/jira/browse/SPARK-41660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-41660) only propagate metadata columns if they are used

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41660:


Assignee: (was: Apache Spark)

> only propagate metadata columns if they are used
> 
>
> Key: SPARK-41660
> URL: https://issues.apache.org/jira/browse/SPARK-41660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-41660) only propagate metadata columns if they are used

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41660:


Assignee: Apache Spark

> only propagate metadata columns if they are used
> 
>
> Key: SPARK-41660
> URL: https://issues.apache.org/jira/browse/SPARK-41660
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-41660) only propagate metadata columns if they are used

2022-12-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-41660:
---

 Summary: only propagate metadata columns if they are used
 Key: SPARK-41660
 URL: https://issues.apache.org/jira/browse/SPARK-41660
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan









[jira] [Assigned] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41653:


Assignee: Hyukjin Kwon

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507






[jira] [Assigned] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41651:


Assignee: Hyukjin Kwon

> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> {{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Assigned] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41652:


Assignee: Hyukjin Kwon

> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> {{python/pyspark/sql/tests/connect/test_parity_functions.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650122#comment-17650122
 ] 

Hyukjin Kwon commented on SPARK-41653:
--

cc jiaan.geng and Deng Ziming in case you guys are interested in this.

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507






[jira] [Commented] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650121#comment-17650121
 ] 

Hyukjin Kwon commented on SPARK-41653:
--

If a file is too big, feel free to split the JIRA or make multiple 
follow-ups (e.g., pyspark.sql.connect.functions).

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507






[jira] [Created] (SPARK-41659) Enable doctests in pyspark.sql.connect.readwriter

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41659:


 Summary: Enable doctests in pyspark.sql.connect.readwriter
 Key: SPARK-41659
 URL: https://issues.apache.org/jira/browse/SPARK-41659
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41657) Enable doctests in pyspark.sql.connect.session

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41657:


 Summary: Enable doctests in pyspark.sql.connect.session
 Key: SPARK-41657
 URL: https://issues.apache.org/jira/browse/SPARK-41657
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41658) Enable doctests in pyspark.sql.connect.functions

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41658:


 Summary: Enable doctests in pyspark.sql.connect.functions
 Key: SPARK-41658
 URL: https://issues.apache.org/jira/browse/SPARK-41658
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41655) Enable doctests in pyspark.sql.connect.column

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41655:


 Summary: Enable doctests in pyspark.sql.connect.column
 Key: SPARK-41655
 URL: https://issues.apache.org/jira/browse/SPARK-41655
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41654) Enable doctests in pyspark.sql.connect.window

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41654:


 Summary: Enable doctests in pyspark.sql.connect.window
 Key: SPARK-41654
 URL: https://issues.apache.org/jira/browse/SPARK-41654
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41656) Enable doctests in pyspark.sql.connect.dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41656:


 Summary: Enable doctests in pyspark.sql.connect.dataframe
 Key: SPARK-41656
 URL: https://issues.apache.org/jira/browse/SPARK-41656
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41653:
-
 Epic Link: SPARK-39375
Issue Type: Umbrella  (was: Bug)

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507






[jira] [Updated] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41653:
-
Parent: (was: SPARK-39375)
Issue Type: Bug  (was: Sub-task)

> Test parity: enable doctests in Spark Connect
> -
>
> Key: SPARK-41653
> URL: https://issues.apache.org/jira/browse/SPARK-41653
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should actually run the doctests of Spark Connect.
> We should add something like 
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
>  to Spark Connect modules, and add the module into 
> https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507






[jira] [Created] (SPARK-41653) Test parity: enable doctests in Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41653:


 Summary: Test parity: enable doctests in Spark Connect
 Key: SPARK-41653
 URL: https://issues.apache.org/jira/browse/SPARK-41653
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


We should actually run the doctests of Spark Connect.

We should add something like 
https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1227-L1247
 to Spark Connect modules, and add the module into 
https://github.com/apache/spark/blob/master/dev/sparktestsupport/modules.py#L507
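
For illustration, a minimal stdlib-only sketch of the doctest-runner pattern the description points to. PySpark modules typically end with a `_test()` helper that copies the module globals, injects a `spark` session, and runs `doctest.testmod`. Here `FakeSession`, `count_rows`, and `run_docstring_tests` are hypothetical stand-ins so the sketch runs without Spark, and it drives `doctest.DocTestParser`/`DocTestRunner` directly instead of `testmod`.

```python
import doctest


class FakeSession:
    """Hypothetical stand-in for a SparkSession (or a Spark Connect session)."""

    def range(self, n):
        return list(range(n))


def count_rows(session, n):
    """Return the number of rows produced by ``session.range(n)``.

    >>> count_rows(spark, 3)
    3
    >>> spark.range(2)
    [0, 1]
    """
    return len(session.range(n))


def run_docstring_tests(funcs, session):
    """Run the doctests of ``funcs`` with ``spark`` bound to ``session``.

    Returns ``(failed, attempted)``, like ``doctest.testmod``.
    """
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(
        optionflags=doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
    )
    for func in funcs:
        # Copy the function's globals and inject the session, mirroring the
        # globs["spark"] = spark step of a typical PySpark _test() helper.
        globs = dict(func.__globals__)
        globs["spark"] = session
        test = parser.get_doctest(func.__doc__, globs, func.__name__, None, 0)
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    failed, attempted = run_docstring_tests([count_rows], FakeSession())
    # A real _test() would exit with a non-zero status on failure so CI notices.
    print(f"{attempted} examples, {failed} failures")
```

In an actual Spark Connect module the session would presumably be a real remote session, and the module would also need to be listed in `dev/sparktestsupport/modules.py` so the test runner picks it up.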






[jira] [Updated] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41651:
-
Description: 
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
{{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.

  was:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
`python/pyspark/sql/tests/connect/test_parity_dataframe.py`.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.


> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> {{python/pyspark/sql/tests/connect/test_parity_dataframe.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Updated] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41652:
-
Description: 
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
{{python/pyspark/sql/tests/connect/test_parity_functions.py}}.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.

  was:
After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
`python/pyspark/sql/tests/connect/test_parity_functions.py`.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.


> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> {{python/pyspark/sql/tests/connect/test_parity_functions.py}}.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650118#comment-17650118
 ] 

Hyukjin Kwon commented on SPARK-41651:
--

cc [~beliefer] and [~dengziming] in case you guys are interested in this.


> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_dataframe.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650119#comment-17650119
 ] 

Hyukjin Kwon commented on SPARK-41652:
--

cc [~beliefer] and [~dengziming] in case you guys are interested in this.


> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_functions.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41642) Deduplicate docstrings in Python Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650117#comment-17650117
 ] 

Hyukjin Kwon commented on SPARK-41642:
--

cc [~beliefer] and [~dengziming] in case you guys are interested in this.


> Deduplicate docstrings in Python Spark Connect
> --
>
> Key: SPARK-41642
> URL: https://issues.apache.org/jira/browse/SPARK-41642
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> There is a lot of duplication in the current docstrings on the PySpark Spark 
> Connect API side.
> We should deduplicate them all.
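
A minimal sketch of one common way to deduplicate docstrings: write the documentation once on the classic class and copy `__doc__` onto the undocumented Connect counterpart at class-definition time. `ClassicReader`, `ConnectReader`, and the `copy_docstrings` decorator are hypothetical names, and this is not necessarily how the linked PRs implement it.

```python
class ClassicReader:
    """Hypothetical stand-in for the classic pyspark.sql.DataFrameReader."""

    def format(self, source):
        """Specifies the input data source format.

        Parameters
        ----------
        source : str
            Name of the data source, e.g. 'json', 'parquet'.
        """
        self._format = source
        return self


def copy_docstrings(source_cls):
    """Class decorator: give undocumented methods the docstring of the same-named
    attribute on ``source_cls``."""

    def decorate(target_cls):
        for name, member in vars(target_cls).items():
            original = getattr(source_cls, name, None)
            if callable(member) and member.__doc__ is None and original is not None:
                member.__doc__ = original.__doc__
        return target_cls

    return decorate


@copy_docstrings(ClassicReader)
class ConnectReader:
    """Hypothetical stand-in for the Spark Connect DataFrameReader."""

    def format(self, source):
        # No docstring here: it is copied from ClassicReader.format.
        self._format = source
        return self
```

Because the text lives in one place, edits to the classic docstring also show up in `help()` for the Connect class, and only methods without their own docstring are touched.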






[jira] [Updated] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41652:
-
Epic Link: SPARK-39375

> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_functions.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Created] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41652:


 Summary: Test parity: pyspark.sql.tests.test_functions
 Key: SPARK-41652
 URL: https://issues.apache.org/jira/browse/SPARK-41652
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
`python/pyspark/sql/tests/connect/test_parity_functions.py`.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.






[jira] [Commented] (SPARK-41652) Test parity: pyspark.sql.tests.test_functions

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650115#comment-17650115
 ] 

Hyukjin Kwon commented on SPARK-41652:
--

Please create a subtask and go ahead.

> Test parity: pyspark.sql.tests.test_functions
> -
>
> Key: SPARK-41652
> URL: https://issues.apache.org/jira/browse/SPARK-41652
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_functions.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41650) json expressions much slower in optimized mode

2022-12-20 Thread Yi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650114#comment-17650114
 ] 

Yi Zhang commented on SPARK-41650:
--

[~gurwls223], [~viirya], can you help look into this?

> json expressions much slower in optimized mode
> --
>
> Key: SPARK-41650
> URL: https://issues.apache.org/jira/browse/SPARK-41650
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Structured Streaming
>Affects Versions: 3.2.2
>Reporter: Yi Zhang
>Priority: Major
>
> I noticed that Spark Structured Streaming parsing JSON strings read from Kafka 
> into a struct type is much slower in Spark 3.1+ than in Spark 3.0. Profiling 
> shows that in Spark 3.0 the JSON expressions spend most of their time 
> evaluating subExpr, while in Spark 3.1/3.2 a lot of time is spent in writeField. 
> I suspect this may be related to SPARK-32948, so I tried adding a bogus option:
> from_json($"value", mySchema, Map("bogus_key" -> "bogus_value"))
> This turns off the optimization, and the performance is much better. For 
> reference, for the same number of records, it is 30 seconds vs. 3 minutes on a 
> task processing 500k records. This is a big difference for a streaming job.






[jira] [Updated] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41651:
-
Issue Type: Umbrella  (was: Improvement)

> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_dataframe.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Commented] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650112#comment-17650112
 ] 

Hyukjin Kwon commented on SPARK-41651:
--

Please create a subtask and work on it.

> Test parity: pyspark.sql.tests.test_dataframe
> -
>
> Key: SPARK-41651
> URL: https://issues.apache.org/jira/browse/SPARK-41651
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
> the same test cases, see 
> `python/pyspark/sql/tests/connect/test_parity_dataframe.py`.
> We should remove all the test cases defined there, and fix Spark Connect 
> behaviours accordingly.






[jira] [Created] (SPARK-41651) Test parity: pyspark.sql.tests.test_dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41651:


 Summary: Test parity: pyspark.sql.tests.test_dataframe
 Key: SPARK-41651
 URL: https://issues.apache.org/jira/browse/SPARK-41651
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon


After https://github.com/apache/spark/pull/39041 (SPARK-41528), we now reuse 
the same test cases, see 
`python/pyspark/sql/tests/connect/test_parity_dataframe.py`.

We should remove all the test cases defined there, and fix Spark Connect 
behaviours accordingly.






[jira] [Created] (SPARK-41650) json expressions much slower in optimized mode

2022-12-20 Thread Yi Zhang (Jira)
Yi Zhang created SPARK-41650:


 Summary: json expressions much slower in optimized mode
 Key: SPARK-41650
 URL: https://issues.apache.org/jira/browse/SPARK-41650
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Structured Streaming
Affects Versions: 3.2.2
Reporter: Yi Zhang


I noticed that Spark Structured Streaming parsing JSON strings read from Kafka 
into a struct type is much slower in Spark 3.1+ than in Spark 3.0. Profiling 
shows that in Spark 3.0 the JSON expressions spend most of their time 
evaluating subexpressions (subExpr), while Spark 3.1/3.2 spend a lot of time 
in writeField. 

I suspect this may be related to SPARK-32948, so I tried adding a bogus option:

from_json($"value", mySchema, Map("bogus_key" -> "bogus_value"))

This turns off the optimization, and the performance is much better. For 
reference, for the same number of records, it is 30 seconds vs. 3 minutes on a 
task processing 500k records. This is a big difference for a streaming job.






[jira] [Created] (SPARK-41649) Deduplicate docstrings in pyspark.sql.connect.window

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41649:


 Summary: Deduplicate docstrings in pyspark.sql.connect.window
 Key: SPARK-41649
 URL: https://issues.apache.org/jira/browse/SPARK-41649
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Commented] (SPARK-41642) Deduplicate docstrings in Python Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650111#comment-17650111
 ] 

Hyukjin Kwon commented on SPARK-41642:
--

If a file is too big, feel free to split the JIRA or make multiple 
follow-ups (e.g., pyspark.sql.connect.functions).

> Deduplicate docstrings in Python Spark Connect
> --
>
> Key: SPARK-41642
> URL: https://issues.apache.org/jira/browse/SPARK-41642
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> There is a lot of duplication in the current docstrings on the PySpark Spark 
> Connect API side.
> We should deduplicate them all.






[jira] [Created] (SPARK-41648) Deduplicate docstrings in pyspark.sql.connect.readwriter

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41648:


 Summary: Deduplicate docstrings in pyspark.sql.connect.readwriter
 Key: SPARK-41648
 URL: https://issues.apache.org/jira/browse/SPARK-41648
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41646) Deduplicate docstrings in pyspark.sql.connect.session

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41646:


 Summary: Deduplicate docstrings in pyspark.sql.connect.session
 Key: SPARK-41646
 URL: https://issues.apache.org/jira/browse/SPARK-41646
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41647) Deduplicate docstrings in pyspark.sql.connect.functions

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41647:


 Summary: Deduplicate docstrings in pyspark.sql.connect.functions
 Key: SPARK-41647
 URL: https://issues.apache.org/jira/browse/SPARK-41647
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-41645) Deduplicate docstrings in pyspark.sql.connect.dataframe

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41645:


 Summary: Deduplicate docstrings in pyspark.sql.connect.dataframe
 Key: SPARK-41645
 URL: https://issues.apache.org/jira/browse/SPARK-41645
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650108#comment-17650108
 ] 

Apache Spark commented on SPARK-41643:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39150

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41643:


Assignee: Apache Spark

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41643:


Assignee: (was: Apache Spark)

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41644:


Assignee: (was: Apache Spark)

> Introducing SPI mechanism to make it easy for other modules to register 
> ProtoBufSerializer
> --
>
> Key: SPARK-41644
> URL: https://issues.apache.org/jira/browse/SPARK-41644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41644:


Assignee: Apache Spark

> Introducing SPI mechanism to make it easy for other modules to register 
> ProtoBufSerializer
> --
>
> Key: SPARK-41644
> URL: https://issues.apache.org/jira/browse/SPARK-41644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650100#comment-17650100
 ] 

Apache Spark commented on SPARK-41644:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39148

> Introducing SPI mechanism to make it easy for other modules to register 
> ProtoBufSerializer
> --
>
> Key: SPARK-41644
> URL: https://issues.apache.org/jira/browse/SPARK-41644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Created] (SPARK-41644) Introducing SPI mechanism to make it easy for other modules to register ProtoBufSerializer

2022-12-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-41644:


 Summary: Introducing SPI mechanism to make it easy for other 
modules to register ProtoBufSerializer
 Key: SPARK-41644
 URL: https://issues.apache.org/jira/browse/SPARK-41644
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-41597) Improve PySpark errors

2022-12-20 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-41597:

Component/s: Connect
 Tests

> Improve PySpark errors
> --
>
> Key: SPARK-41597
> URL: https://issues.apache.org/jira/browse/SPARK-41597
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> This ticket aims to introduce a new PySpark error framework that centralizes 
> PySpark error messages into a single path and makes them more actionable and 
> consistent.
> This umbrella JIRA might include:
>  * Introduce a new error framework for PySpark.
>  * Migrate existing errors generated by the Python driver into error classes.
>  * Migrate existing errors generated by the Python worker into error classes.
>  * Migrate existing errors generated by Py4J into error classes.
>  * Introduce test utils for testing errors by their error class instead of 
> their error messages.
>  * Improve the error messages.
>  * Documentation for PySpark error framework.






[jira] [Resolved] (SPARK-41539) stats and constraints in LogicalRDD may not be in sync with output attributes

2022-12-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-41539.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39082
[https://github.com/apache/spark/pull/39082]

> stats and constraints in LogicalRDD may not be in sync with output attributes
> -
>
> Key: SPARK-41539
> URL: https://issues.apache.org/jira/browse/SPARK-41539
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.4.0
>
>
> We encountered a case where the output of the logical plan and the optimized 
> plan differed in LogicalRDD (in this case, the difference was the exprId), 
> which left stats and constraints out of sync with the output attributes and 
> eventually caused the query to fail.
> We should remap stats and constraints based on the output of the logical plan, 
> assuming that the outputs of the logical plan and the optimized plan are 
> "slightly" different (e.g. exprId) but "semantically" the same.
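The remapping described above can be illustrated with a toy model. This is not Spark's implementation. `Attr`, `ColStat` and `RemapSketch.remap` are hypothetical names. The sketch assumes the two outputs line up by position and differ only in their exprIds. Constraints are expressions over the same attributes, so they would be rewritten with the same attribute mapping.

```scala
// Toy model (hypothetical types, not Spark's classes): an attribute is a name plus an exprId.
final case class Attr(name: String, exprId: Long)

// Hypothetical per-column statistic keyed by attribute.
final case class ColStat(distinctCount: Long)

object RemapSketch {
  // Re-keys stats from the optimized plan's output attributes to the logical plan's
  // output attributes, pairing attributes by position. Assumes both outputs are
  // "semantically" the same (same length and order) and differ only in exprIds.
  def remap(
      optimizedOutput: Seq[Attr],
      logicalOutput: Seq[Attr],
      stats: Map[Attr, ColStat]): Map[Attr, ColStat] = {
    require(optimizedOutput.length == logicalOutput.length,
      "outputs must line up positionally")
    val rewrite: Map[Attr, Attr] = optimizedOutput.zip(logicalOutput).toMap
    // Stats for attributes that are not in the optimized output are dropped.
    stats.collect { case (attr, stat) if rewrite.contains(attr) => rewrite(attr) -> stat }
  }

  def main(args: Array[String]): Unit = {
    val optimized = Seq(Attr("id", 1L), Attr("value", 2L))
    val logical   = Seq(Attr("id", 7L), Attr("value", 8L)) // same columns, new exprIds
    val stats     = Map(Attr("id", 1L) -> ColStat(100L))
    println(remap(optimized, logical, stats)) // Map(Attr(id,7) -> ColStat(100))
  }
}
```

Pairing by position is the simplest choice under the "semantically the same" assumption. If the outputs could differ in length or order, a positional zip would silently mis-map columns, which is why the sketch fails fast with `require`.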






[jira] [Assigned] (SPARK-41539) stats and constraints in LogicalRDD may not be in sync with output attributes

2022-12-20 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-41539:


Assignee: Jungtaek Lim

> stats and constraints in LogicalRDD may not be in sync with output attributes
> -
>
> Key: SPARK-41539
> URL: https://issues.apache.org/jira/browse/SPARK-41539
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> We encountered a case where the output of the logical plan and the optimized 
> plan differed in LogicalRDD (in this case, the difference was the exprId), 
> which left stats and constraints out of sync with the output attributes and 
> eventually caused the query to fail.
> We should remap stats and constraints based on the output of the logical plan, 
> assuming that the outputs of the logical plan and the optimized plan are 
> "slightly" different (e.g. exprId) but "semantically" the same.






[jira] [Resolved] (SPARK-41634) Upgrade minimatch to 3.1.2

2022-12-20 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-41634.

Fix Version/s: 3.4.0
 Assignee: Bjørn Jørgensen
   Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/39143

> Upgrade minimatch to 3.1.2 
> ---
>
> Key: SPARK-41634
> URL: https://issues.apache.org/jira/browse/SPARK-41634
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [CVE-2022-3517|https://nvd.nist.gov/vuln/detail/CVE-2022-3517]






[jira] [Resolved] (SPARK-41587) Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7

2022-12-20 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-41587.

Fix Version/s: 3.4.0
 Assignee: Yang Jie
   Resolution: Fixed

Issue resolved in https://github.com/apache/spark/pull/39129

> Upgrade org.scalatestplus:selenium-4-4 to org.scalatestplus:selenium-4-7
> 
>
> Key: SPARK-41587
> URL: https://issues.apache.org/jira/browse/SPARK-41587
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/scalatest/scalatestplus-selenium/releases/tag/release-3.2.14.0-for-selenium-4.7






[jira] [Commented] (SPARK-41292) Window-function support

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650073#comment-17650073
 ] 

Apache Spark commented on SPARK-41292:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39149

> Window-function support
> ---
>
> Key: SPARK-41292
> URL: https://issues.apache.org/jira/browse/SPARK-41292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Blocker
>
> For compatibility, we need support for expressing window functions. Window 
> functions are different from regular unresolved expressions as they need a 
> window spec and are generally treated more like aggregate functions.
> Part of this task is to identify if we can fully express the logic of window 
> functions using unresolved functions with expression arguments that represent 
> the window spec.
> Only once this validation is done, we should consider adding a new plan 
> operator / expression type.
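One way to check whether window functions can be expressed with unresolved functions alone is to encode the window spec as extra function arguments. The sketch below uses a hypothetical expression model. `ColumnRef`, `UnresolvedFunction` and the marker names `window`, `partition_by` and `order_by` are all assumptions, not the Spark Connect protocol. The sketch also omits frame specifications (ROWS/RANGE BETWEEN ...), which such an encoding would need to carry as well.

```scala
// Hypothetical, simplified expression model (not Spark Connect's protobuf messages).
sealed trait Expr
final case class ColumnRef(name: String) extends Expr
final case class UnresolvedFunction(name: String, args: Seq[Expr]) extends Expr

object WindowAsFunctionSketch {
  // Encodes `fn OVER (PARTITION BY ... ORDER BY ...)` purely as nested unresolved
  // functions, using hypothetical marker functions for the window spec parts.
  def over(fn: Expr, partitionBy: Seq[Expr], orderBy: Seq[Expr]): Expr =
    UnresolvedFunction("window", Seq(
      fn,
      UnresolvedFunction("partition_by", partitionBy),
      UnresolvedFunction("order_by", orderBy)))

  // Renders an expression tree as a function-call string for inspection.
  def render(e: Expr): String = e match {
    case ColumnRef(n)                => n
    case UnresolvedFunction(n, args) => s"$n(${args.map(render).mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    val rank = UnresolvedFunction("rank", Seq.empty)
    val windowed = over(rank, Seq(ColumnRef("dept")), Seq(ColumnRef("salary")))
    println(render(windowed)) // window(rank(), partition_by(dept), order_by(salary))
  }
}
```

If frames, sort direction and null ordering all fit into marker functions like these, no new plan operator is needed. If they do not, that argues for a dedicated expression type.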






[jira] [Assigned] (SPARK-41292) Window-function support

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41292:


Assignee: (was: Apache Spark)

> Window-function support
> ---
>
> Key: SPARK-41292
> URL: https://issues.apache.org/jira/browse/SPARK-41292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Blocker
>
> For compatibility, we need support for expressing window functions. Window 
> functions are different from regular unresolved expressions as they need a 
> window spec and are generally treated more like aggregate functions.
> Part of this task is to identify if we can fully express the logic of window 
> functions using unresolved functions with expression arguments that represent 
> the window spec.
> Only once this validation is done, we should consider adding a new plan 
> operator / expression type.






[jira] [Assigned] (SPARK-41292) Window-function support

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41292:


Assignee: Apache Spark

> Window-function support
> ---
>
> Key: SPARK-41292
> URL: https://issues.apache.org/jira/browse/SPARK-41292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Blocker
>
> For compatibility, we need support for expressing window functions. Window 
> functions are different from regular unresolved expressions as they need a 
> window spec and are generally treated more like aggregate functions.
> Part of this task is to identify if we can fully express the logic of window 
> functions using unresolved functions with expression arguments that represent 
> the window spec.
> Only once this validation is done, we should consider adding a new plan 
> operator / expression type.






[jira] [Created] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41643:


 Summary: Deduplicate docstrings in pyspark.sql.connect.column
 Key: SPARK-41643
 URL: https://issues.apache.org/jira/browse/SPARK-41643
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Commented] (SPARK-41643) Deduplicate docstrings in pyspark.sql.connect.column

2022-12-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650071#comment-17650071
 ] 

Hyukjin Kwon commented on SPARK-41643:
--

I am working on this

> Deduplicate docstrings in pyspark.sql.connect.column
> 
>
> Key: SPARK-41643
> URL: https://issues.apache.org/jira/browse/SPARK-41643
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Assigned] (SPARK-41642) Deduplicate docstrings in Python Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41642:


Assignee: Hyukjin Kwon

> Deduplicate docstrings in Python Spark Connect
> --
>
> Key: SPARK-41642
> URL: https://issues.apache.org/jira/browse/SPARK-41642
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> There is a lot of duplication in the current docstrings on the PySpark Spark 
> Connect API side.
> We should deduplicate them all.






[jira] [Updated] (SPARK-41642) Deduplicate docstrings in Python Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-41642:
-
Description: 
There is a lot of duplication in the current docstrings on the PySpark Spark 
Connect API side.
We should deduplicate them all.

> Deduplicate docstrings in Python Spark Connect
> --
>
> Key: SPARK-41642
> URL: https://issues.apache.org/jira/browse/SPARK-41642
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> There is a lot of duplication in the current docstrings on the PySpark Spark 
> Connect API side.
> We should deduplicate them all.






[jira] [Created] (SPARK-41642) Deduplicate docstrings in Python Spark Connect

2022-12-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-41642:


 Summary: Deduplicate docstrings in Python Spark Connect
 Key: SPARK-41642
 URL: https://issues.apache.org/jira/browse/SPARK-41642
 Project: Spark
  Issue Type: Umbrella
  Components: Connect
Affects Versions: 3.4.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-41426) Protobuf serializer for ResourceProfileWrapper

2022-12-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-41426.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39105
[https://github.com/apache/spark/pull/39105]

> Protobuf serializer for ResourceProfileWrapper
> --
>
> Key: SPARK-41426
> URL: https://issues.apache.org/jira/browse/SPARK-41426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41426) Protobuf serializer for ResourceProfileWrapper

2022-12-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-41426:
--

Assignee: Sandeep Singh

> Protobuf serializer for ResourceProfileWrapper
> --
>
> Key: SPARK-41426
> URL: https://issues.apache.org/jira/browse/SPARK-41426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Sandeep Singh
>Priority: Major
>







[jira] [Resolved] (SPARK-41434) Support LambdaFunction expression

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41434.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39068
[https://github.com/apache/spark/pull/39068]

> Support LambdaFunction expression
> --
>
> Key: SPARK-41434
> URL: https://issues.apache.org/jira/browse/SPARK-41434
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-41434) Support LambdaFunction expression

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41434:


Assignee: Ruifeng Zheng

> Support LambdaFunction expression
> --
>
> Key: SPARK-41434
> URL: https://issues.apache.org/jira/browse/SPARK-41434
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors

2022-12-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-41192:
---

Assignee: Yazhi Wang

> Task finished before speculative task scheduled leads to holding idle 
> executors
> ---
>
> Key: SPARK-41192
> URL: https://issues.apache.org/jira/browse/SPARK-41192
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Yazhi Wang
>Assignee: Yazhi Wang
>Priority: Minor
>  Labels: dynamic_allocation
> Attachments: dynamic-executors, dynamic-log
>
>
> When a task finishes before its speculative task has been scheduled by the 
> DAGScheduler, the speculative task is still considered pending and counts 
> towards the calculation of the number of needed executors, which leads to 
> requesting more executors than needed.
> h2. Background & Reproduction
> In one of our production jobs, we found that ExecutorAllocationManager was 
> holding more executors than needed. 
> We found it difficult to reproduce in the test environment. To reproduce and 
> debug reliably, we temporarily commented out the scheduling code for 
> speculative tasks in TaskSetManager:363 to ensure that the task completes 
> before the speculative task is scheduled.
> {code:java}
> // Original code
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false).orElse(
>     dequeueTaskHelper(execId, host, maxLocality, true))
> }
> // Modified for reproduction: speculative tasks will never be scheduled
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Only tries regular tasks; the speculative fallback is removed, so
>   // speculative tasks are never dequeued
>   dequeueTaskHelper(execId, host, maxLocality, false)
> }
> {code}
> Adapting the example from SPARK-30511, you will see that while the last task 
> is running, we hold 38 executors (see attachment), which is exactly 
> ceil((149 + 1) / 4) = 38. But actually only 2 tasks are running, which 
> requires only max(20, ceil(2 / 4)) = 20 executors, since minExecutors is 20.
> {code:java}
> ./bin/spark-shell --master yarn \
>   --conf spark.speculation=true \
>   --conf spark.executor.cores=4 \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.minExecutors=20 \
>   --conf spark.dynamicAllocation.maxExecutors=1000
> {code}
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex((index: Int, it: Iterator[Int]) => {
>   if (index > 3998) {
>     Thread.sleep(1000 * 1000) // Fake long-running last task
>   } else if (index > 3850) {
>     Thread.sleep(50 * 1000) // Fake running tasks
>   } else {
>     Thread.sleep(100)
>   }
>   Array.fill[Int](1)(1).iterator
> }).collect()
> {code}
>  
> I will have a PR ready to fix this issue
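The executor arithmetic above can be checked with a simplified model of the allocation target. This is a sketch, not ExecutorAllocationManager's code. `ExecutorTargetSketch.executorsNeeded` is a hypothetical helper that assumes the target is the ceiling of running-or-pending tasks divided by tasks per executor, clamped to [minExecutors, maxExecutors]. Tasks per executor is spark.executor.cores / spark.task.cpus, which is 4 here. The real class also applies spark.dynamicAllocation.executorAllocationRatio and tracks resource profiles separately; both are omitted.

```scala
object ExecutorTargetSketch {
  // Hypothetical helper: ceil(runningOrPendingTasks / tasksPerExecutor),
  // clamped to [minExecutors, maxExecutors].
  def executorsNeeded(
      runningOrPendingTasks: Int,
      tasksPerExecutor: Int,
      minExecutors: Int,
      maxExecutors: Int): Int = {
    // Integer ceiling division: (a + b - 1) / b
    val raw = (runningOrPendingTasks + tasksPerExecutor - 1) / tasksPerExecutor
    math.min(maxExecutors, math.max(minExecutors, raw))
  }

  def main(args: Array[String]): Unit = {
    // 149 + 1 tasks counted as running or pending (stale speculative tasks included)
    println(executorsNeeded(149 + 1, 4, 20, 1000)) // 38
    // Only 2 tasks actually running: minExecutors dominates
    println(executorsNeeded(2, 4, 20, 1000)) // 20
  }
}
```

The gap between the two results (38 vs. 20) is the over-allocation the issue describes: speculative tasks that will never run still count as pending.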



[jira] [Resolved] (SPARK-41192) Task finished before speculative task scheduled leads to holding idle executors

2022-12-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-41192.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38711
[https://github.com/apache/spark/pull/38711]

> Task finished before speculative task scheduled leads to holding idle 
> executors
> ---
>
> Key: SPARK-41192
> URL: https://issues.apache.org/jira/browse/SPARK-41192
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.2, 3.3.1
>Reporter: Yazhi Wang
>Assignee: Yazhi Wang
>Priority: Minor
>  Labels: dynamic_allocation
> Fix For: 3.4.0
>
> Attachments: dynamic-executors, dynamic-log
>
>
> When a task finishes before its speculative copy has been scheduled by the 
> DAGScheduler, the speculative task is still considered pending and counts 
> towards the number of needed executors, which leads to requesting more 
> executors than needed.
> h2. Background & Reproduce
> In one of our production jobs, we found that ExecutorAllocationManager was 
> holding more executors than needed.
> We found it difficult to reproduce in a test environment. To reproduce and 
> debug it reliably, we temporarily commented out the scheduling code for 
> speculative tasks in TaskSetManager:363, so that the task completes before 
> its speculative copy is scheduled.
> {code:java}
> // Original code
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Tries to schedule a regular task first; if it returns None, then schedules
>   // a speculative task
>   dequeueTaskHelper(execId, host, maxLocality, false).orElse(
>     dequeueTaskHelper(execId, host, maxLocality, true))
> }
>
> // Modified for reproduction only: speculative tasks are never scheduled
> private def dequeueTask(
>     execId: String,
>     host: String,
>     maxLocality: TaskLocality.Value): Option[(Int, TaskLocality.Value, Boolean)] = {
>   // Only regular tasks are dequeued; the speculative fallback is commented out
>   dequeueTaskHelper(execId, host, maxLocality, false)
>   // .orElse(dequeueTaskHelper(execId, host, maxLocality, true))
> }
> {code}
> Using the example from SPARK-30511, you will see that while the last task is 
> running we hold 38 executors (see attachment), which is exactly 
> ceil((149 + 1) / 4) = 38. But only 2 tasks are actually running, which should 
> need only Math.max(20, ceil(2 / 4)) = 20 executors, since minExecutors is 20.
> {code:java}
> ./bin/spark-shell --master yarn \
>   --conf spark.speculation=true \
>   --conf spark.executor.cores=4 \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.minExecutors=20 \
>   --conf spark.dynamicAllocation.maxExecutors=1000
> {code}
> {code:java}
> val n = 4000
> val someRDD = sc.parallelize(1 to n, n)
> someRDD.mapPartitionsWithIndex((index: Int, it: Iterator[Int]) => {
>   if (index > 3998) {
>     Thread.sleep(1000 * 1000) // The last task (index 3999) runs for ~1000 s
>   } else if (index > 3850) {
>     Thread.sleep(50 * 1000) // Fake running tasks
>   } else {
>     Thread.sleep(100)
>   }
>   Array.fill[Int](1)(1).iterator
> }).collect() // An action is needed to actually launch the job
> {code}
>  
> I will have a PR ready to fix this issue
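The executor counts in the description can be checked with a small model. This is a simplified sketch under stated assumptions, not Spark's ExecutorAllocationManager code: it assumes the target is the number of running-or-pending tasks divided by the task slots per executor (4 here, from spark.executor.cores=4 with the default of one CPU per task), rounded up and clamped to [minExecutors, maxExecutors]. The object and method names are hypothetical, and the executor allocation ratio and other inputs are ignored.

```scala
// Simplified model (assumption, not Spark's ExecutorAllocationManager code):
// target = ceil(runningOrPendingTasks / tasksPerExecutor), clamped to
// [minExecutors, maxExecutors]. The allocation ratio and other inputs are ignored.
object ExecutorTargetSketch {
  def targetExecutors(
      runningOrPendingTasks: Int,
      tasksPerExecutor: Int,
      minExecutors: Int,
      maxExecutors: Int): Int = {
    require(tasksPerExecutor > 0, "tasksPerExecutor must be positive")
    // Integer ceiling division: (n + d - 1) / d
    val needed = (runningOrPendingTasks + tasksPerExecutor - 1) / tasksPerExecutor
    math.max(minExecutors, math.min(needed, maxExecutors))
  }
}

// Counting 149 + 1 = 150 tasks with 4 task slots per executor, as in the report
println(ExecutorTargetSketch.targetExecutors(150, 4, 20, 1000)) // 38
// Counting only the 2 tasks that are actually running: minExecutors dominates
println(ExecutorTargetSketch.targetExecutors(2, 4, 20, 1000))   // 20
```

Under this model, a stale speculative task that is still counted as pending keeps the target above the minExecutors floor even though almost no work remains.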



[jira] [Commented] (SPARK-41641) Implement `Column.over`

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650065#comment-17650065
 ] 

Apache Spark commented on SPARK-41641:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39149

> Implement `Column.over`
> ---
>
> Key: SPARK-41641
> URL: https://issues.apache.org/jira/browse/SPARK-41641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




[jira] [Assigned] (SPARK-41641) Implement `Column.over`

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41641:


Assignee: (was: Apache Spark)

> Implement `Column.over`
> ---
>
> Key: SPARK-41641
> URL: https://issues.apache.org/jira/browse/SPARK-41641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




[jira] [Assigned] (SPARK-41641) Implement `Column.over`

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41641:


Assignee: Apache Spark

> Implement `Column.over`
> ---
>
> Key: SPARK-41641
> URL: https://issues.apache.org/jira/browse/SPARK-41641
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




[jira] [Assigned] (SPARK-41640) implement `Window` functions

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41640:


Assignee: (was: Apache Spark)

> implement `Window` functions
> 
>
> Key: SPARK-41640
> URL: https://issues.apache.org/jira/browse/SPARK-41640
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




[jira] [Commented] (SPARK-41640) implement `Window` functions

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650064#comment-17650064
 ] 

Apache Spark commented on SPARK-41640:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/39149

> implement `Window` functions
> 
>
> Key: SPARK-41640
> URL: https://issues.apache.org/jira/browse/SPARK-41640
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




[jira] [Assigned] (SPARK-41640) implement `Window` functions

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41640:


Assignee: Apache Spark

> implement `Window` functions
> 
>
> Key: SPARK-41640
> URL: https://issues.apache.org/jira/browse/SPARK-41640
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>




[jira] [Created] (SPARK-41641) Implement `Column.over`

2022-12-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41641:
-

 Summary: Implement `Column.over`
 Key: SPARK-41641
 URL: https://issues.apache.org/jira/browse/SPARK-41641
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






[jira] [Created] (SPARK-41640) implement `Window` functions

2022-12-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-41640:
-

 Summary: implement `Window` functions
 Key: SPARK-41640
 URL: https://issues.apache.org/jira/browse/SPARK-41640
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng






[jira] [Resolved] (SPARK-41631) Support lateral column alias in Aggregate code path

2022-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41631.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39040
[https://github.com/apache/spark/pull/39040]

> Support lateral column alias in Aggregate code path
> ---
>
> Key: SPARK-41631
> URL: https://issues.apache.org/jira/browse/SPARK-41631
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
> Fix For: 3.4.0
>
>




[jira] [Assigned] (SPARK-41631) Support lateral column alias in Aggregate code path

2022-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41631:
---

Assignee: Xinyi Yu

> Support lateral column alias in Aggregate code path
> ---
>
> Key: SPARK-41631
> URL: https://issues.apache.org/jira/browse/SPARK-41631
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinyi Yu
>Assignee: Xinyi Yu
>Priority: Major
>




[jira] [Assigned] (SPARK-41639) Remove ScalaReflectionLock

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41639:


Assignee: (was: Apache Spark)

> Remove ScalaReflectionLock 
> ---
>
> Key: SPARK-41639
> URL: https://issues.apache.org/jira/browse/SPARK-41639
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Sandish Kumar HN
>Priority: Minor
> Fix For: 3.4.0
>
>
> Follow-up to PR [https://github.com/apache/spark/pull/38922]: remove 
> ScalaReflectionLock from SchemaConverters.



[jira] [Assigned] (SPARK-41639) Remove ScalaReflectionLock

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41639:


Assignee: Apache Spark

> Remove ScalaReflectionLock 
> ---
>
> Key: SPARK-41639
> URL: https://issues.apache.org/jira/browse/SPARK-41639
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Sandish Kumar HN
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.4.0
>
>
> Follow-up to PR [https://github.com/apache/spark/pull/38922]: remove 
> ScalaReflectionLock from SchemaConverters.



[jira] [Commented] (SPARK-41639) Remove ScalaReflectionLock

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650060#comment-17650060
 ] 

Apache Spark commented on SPARK-41639:
--

User 'SandishKumarHN' has created a pull request for this issue:
https://github.com/apache/spark/pull/39147

> Remove ScalaReflectionLock 
> ---
>
> Key: SPARK-41639
> URL: https://issues.apache.org/jira/browse/SPARK-41639
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Sandish Kumar HN
>Priority: Minor
> Fix For: 3.4.0
>
>
> Follow-up to PR [https://github.com/apache/spark/pull/38922]: remove 
> ScalaReflectionLock from SchemaConverters.



[jira] [Commented] (SPARK-41639) Remove ScalaReflectionLock

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650059#comment-17650059
 ] 

Apache Spark commented on SPARK-41639:
--

User 'SandishKumarHN' has created a pull request for this issue:
https://github.com/apache/spark/pull/39147

> Remove ScalaReflectionLock 
> ---
>
> Key: SPARK-41639
> URL: https://issues.apache.org/jira/browse/SPARK-41639
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Sandish Kumar HN
>Priority: Minor
> Fix For: 3.4.0
>
>
> Follow-up to PR [https://github.com/apache/spark/pull/38922]: remove 
> ScalaReflectionLock from SchemaConverters.



[jira] [Created] (SPARK-41639) Remove ScalaReflectionLock

2022-12-20 Thread Sandish Kumar HN (Jira)
Sandish Kumar HN created SPARK-41639:


 Summary: Remove ScalaReflectionLock 
 Key: SPARK-41639
 URL: https://issues.apache.org/jira/browse/SPARK-41639
 Project: Spark
  Issue Type: Task
  Components: Protobuf
Affects Versions: 3.4.0
Reporter: Sandish Kumar HN
 Fix For: 3.4.0


Follow-up to PR [https://github.com/apache/spark/pull/38922]: remove 
ScalaReflectionLock from SchemaConverters.



[jira] [Assigned] (SPARK-41440) Implement DataFrame.randomSplit

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41440:


Assignee: jiaan.geng  (was: Ruifeng Zheng)

> Implement DataFrame.randomSplit
> ---
>
> Key: SPARK-41440
> URL: https://issues.apache.org/jira/browse/SPARK-41440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>




[jira] [Resolved] (SPARK-41440) Implement DataFrame.randomSplit

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41440.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39017
[https://github.com/apache/spark/pull/39017]

> Implement DataFrame.randomSplit
> ---
>
> Key: SPARK-41440
> URL: https://issues.apache.org/jira/browse/SPARK-41440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




[jira] [Assigned] (SPARK-41440) Implement DataFrame.randomSplit

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41440:


Assignee: Ruifeng Zheng

> Implement DataFrame.randomSplit
> ---
>
> Key: SPARK-41440
> URL: https://issues.apache.org/jira/browse/SPARK-41440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




[jira] [Resolved] (SPARK-41566) Upgrade netty from 4.1.84.Final to 4.1.86.Final

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41566.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39116
[https://github.com/apache/spark/pull/39116]

> Upgrade netty from 4.1.84.Final to 4.1.86.Final
> ---
>
> Key: SPARK-41566
> URL: https://issues.apache.org/jira/browse/SPARK-41566
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [HAProxyMessageDecoder Stack Exhaustion 
> DoS|https://github.com/netty/netty/security/advisories/GHSA-fx2c-96vj-985v]
> and 
> [HTTP Response splitting from assigning header value 
> iterator|https://github.com/netty/netty/security/advisories/GHSA-hh82-3pmq-7frp]
>  



[jira] [Assigned] (SPARK-41566) Upgrade netty from 4.1.84.Final to 4.1.86.Final

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41566:


Assignee: Bjørn Jørgensen

> Upgrade netty from 4.1.84.Final to 4.1.86.Final
> ---
>
> Key: SPARK-41566
> URL: https://issues.apache.org/jira/browse/SPARK-41566
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
>
> [HAProxyMessageDecoder Stack Exhaustion 
> DoS|https://github.com/netty/netty/security/advisories/GHSA-fx2c-96vj-985v]
> and 
> [HTTP Response splitting from assigning header value 
> iterator|https://github.com/netty/netty/security/advisories/GHSA-hh82-3pmq-7frp]
>  



[jira] [Resolved] (SPARK-40520) Add a script to generate DOI manifest

2022-12-20 Thread Yikun Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yikun Jiang resolved SPARK-40520.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 27
[https://github.com/apache/spark-docker/pull/27]

> Add a script to generate DOI manifest
> --
>
> Key: SPARK-40520
> URL: https://issues.apache.org/jira/browse/SPARK-40520
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.4.0
>
>




[jira] [Assigned] (SPARK-41396) Oneof field support and recursive fields

2022-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-41396:
---

Assignee: Sandish Kumar HN

> Oneof field support and recursive fields
> 
>
> Key: SPARK-41396
> URL: https://issues.apache.org/jira/browse/SPARK-41396
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Assignee: Sandish Kumar HN
>Priority: Major
>
> We should add support for protobuf OneOf fields to Spark-Protobuf. This will 
> involve implementing logic to detect when a protobuf message contains a OneOf 
> field, and to handle it appropriately in from_protobuf and to_protobuf. 
> We should also add unit tests to verify that the OneOf field support is 
> correct.
> With this, users can use protobuf OneOf fields with Spark-Protobuf, making it 
> more complete and useful for processing protobuf data.



[jira] [Resolved] (SPARK-41396) Oneof field support and recursive fields

2022-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-41396.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38922
[https://github.com/apache/spark/pull/38922]

> Oneof field support and recursive fields
> 
>
> Key: SPARK-41396
> URL: https://issues.apache.org/jira/browse/SPARK-41396
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Assignee: Sandish Kumar HN
>Priority: Major
> Fix For: 3.4.0
>
>
> We should add support for protobuf OneOf fields to Spark-Protobuf. This will 
> involve implementing logic to detect when a protobuf message contains a OneOf 
> field, and to handle it appropriately in from_protobuf and to_protobuf. 
> We should also add unit tests to verify that the OneOf field support is 
> correct.
> With this, users can use protobuf OneOf fields with Spark-Protobuf, making it 
> more complete and useful for processing protobuf data.



[jira] [Resolved] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41584.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39125
[https://github.com/apache/spark/pull/39125]

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.4.0
>
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36



[jira] [Assigned] (SPARK-41584) Upgrade RoaringBitmap to 0.9.36

2022-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41584:


Assignee: Yang Jie

> Upgrade RoaringBitmap to 0.9.36
> ---
>
> Key: SPARK-41584
> URL: https://issues.apache.org/jira/browse/SPARK-41584
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.35...0.9.36



[jira] [Assigned] (SPARK-41589) PyTorch Distributor

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41589:


Assignee: (was: Apache Spark)

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] 
> for more context.



[jira] [Commented] (SPARK-41589) PyTorch Distributor

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650043#comment-17650043
 ] 

Apache Spark commented on SPARK-41589:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/39146

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] 
> for more context.



[jira] [Assigned] (SPARK-41589) PyTorch Distributor

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41589:


Assignee: Apache Spark

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Apache Spark
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] 
> for more context.



[jira] [Updated] (SPARK-41589) PyTorch Distributor

2022-12-20 Thread Rithwik Ediga Lakhamsani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rithwik Ediga Lakhamsani updated SPARK-41589:
-
Description: This is a project to make it easier for PySpark users to 
distribute PyTorch code using PySpark. The corresponding [Design 
Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
 can give more context. This was a project determined by the Databricks ML 
Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] for 
more context.  (was: This is a project to make it easier for PySpark users to 
distribute PyTorch code using PySpark. The corresponding [Design 
Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
 can give more context. This was a project determined by the Databricks ML 
Training Team; please reach out to [~gurwls223] (Spark-side proxy) or 
[~erithwik] for more context.)

> PyTorch Distributor
> ---
>
> Key: SPARK-41589
> URL: https://issues.apache.org/jira/browse/SPARK-41589
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Priority: Major
>
> This is a project to make it easier for PySpark users to distribute PyTorch 
> code using PySpark. The corresponding [Design 
> Document|https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit?usp=sharing]
>  can give more context. This was a project determined by the Databricks ML 
> Training Team; please reach out to [~gurwls223] (Spark-side) or [~erithwik] 
> for more context.



[jira] [Created] (SPARK-41638) Move most tests to .sql files

2022-12-20 Thread Xinyi Yu (Jira)
Xinyi Yu created SPARK-41638:


 Summary: Move most tests to .sql files
 Key: SPARK-41638
 URL: https://issues.apache.org/jira/browse/SPARK-41638
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Xinyi Yu






[jira] [Commented] (SPARK-41637) ORDER BY ALL

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650036#comment-17650036
 ] 

Apache Spark commented on SPARK-41637:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/39144

> ORDER BY ALL
> 
>
> Key: SPARK-41637
> URL: https://issues.apache.org/jira/browse/SPARK-41637
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic 
> sugar for sorting the output by all the fields, from left to right. It also 
> allows specifying asc/desc as well as null ordering. This was initially 
> introduced by DuckDB.
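The ORDER BY ALL semantics described above (sort by every output column, left to right) can be illustrated with a small pure-Python emulation. This is a hypothetical sketch, not Spark's implementation: `order_by_all` and its `descending`/`nulls_first` parameters are illustrative names, one direction and one null ordering are assumed to apply to every column, and the default null ordering is assumed to follow Spark SQL's usual convention (NULLS FIRST for ASC, NULLS LAST for DESC).

```python
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[Any, ...]


def order_by_all(
    rows: Sequence[Row],
    descending: bool = False,
    nulls_first: Optional[bool] = None,
) -> List[Row]:
    """Hypothetical emulation of ORDER BY ALL: sort by every column, left to right.

    Assumption: the same direction and null ordering apply to every column.
    """
    if nulls_first is None:
        # Assumed default, mirroring Spark SQL: NULLS FIRST for ASC, NULLS LAST for DESC.
        nulls_first = not descending

    # sorted(..., reverse=True) flips the whole key, including the null rank,
    # so pick the rank that leaves nulls where requested after any reversal.
    null_rank = 0 if nulls_first != descending else 2

    def column_key(value: Any) -> Tuple[Any, ...]:
        # Nulls get a rank distinct from non-null values (rank 1), so None is
        # never compared against a real value with '<'.
        return (null_rank,) if value is None else (1, value)

    def row_key(row: Row) -> Tuple[Tuple[Any, ...], ...]:
        # Left to right: the first column decides, later columns break ties.
        return tuple(column_key(v) for v in row)

    # sorted() returns a new list and is stable, so fully equal rows keep input order.
    return sorted(rows, key=row_key, reverse=descending)


rows = [(2, "b"), (1, None), (1, "a"), (None, "z")]
print(order_by_all(rows))                   # [(None, 'z'), (1, None), (1, 'a'), (2, 'b')]
print(order_by_all(rows, descending=True))  # [(2, 'b'), (1, 'a'), (1, None), (None, 'z')]
```

In SQL, the first call corresponds roughly to `SELECT ... ORDER BY ALL` and the second to `ORDER BY ALL DESC`; the emulation only clarifies the ordering rules and is not meant as a substitute for the clause itself.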






[jira] [Commented] (SPARK-41637) ORDER BY ALL

2022-12-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650035#comment-17650035
 ] 

Apache Spark commented on SPARK-41637:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/39144

> ORDER BY ALL
> 
>
> Key: SPARK-41637
> URL: https://issues.apache.org/jira/browse/SPARK-41637
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>
> This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic 
> sugar for sorting the output by all the fields, from left to right. It also 
> allows specifying asc/desc as well as null ordering. This was initially 
> introduced by DuckDB.






[jira] [Assigned] (SPARK-41637) ORDER BY ALL

2022-12-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41637:


Assignee: Apache Spark  (was: Reynold Xin)

> ORDER BY ALL
> 
>
> Key: SPARK-41637
> URL: https://issues.apache.org/jira/browse/SPARK-41637
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Reynold Xin
>Assignee: Apache Spark
>Priority: Major
>
> This patch adds ORDER BY ALL support to SQL. ORDER BY ALL is syntactic 
> sugar for sorting the output by all the fields, from left to right. It also 
> allows specifying asc/desc as well as null ordering. This was initially 
> introduced by DuckDB.





