[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47891:
-
Description: 
Improve docstring of mapInPandas
 * "using a Python native function that takes and outputs a pandas DataFrame" 
is confusing because the function actually takes and outputs an iterator of 
pandas DataFrames.
 * "All columns are passed together as an iterator of pandas DataFrames" can 
easily mislead users into thinking the entire DataFrame is passed at once; 
"a batch of rows" is used instead.

  was:Improve docstring of mapInPandas


> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInPandas
>  * "using a Python native function that takes and outputs a pandas DataFrame" 
> is confusing because the function actually takes and outputs an iterator of 
> pandas DataFrames.
>  * "All columns are passed together as an iterator of pandas DataFrames" can 
> easily mislead users into thinking the entire DataFrame is passed at once; 
> "a batch of rows" is used instead.
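
To illustrate the distinction the ticket draws, here is a minimal sketch of a function with the shape mapInPandas expects: it receives an iterator of pandas DataFrames (one per batch of rows) and yields pandas DataFrames. The function name keep_adults, the column names, and the sample batches are assumptions made for illustration; the batches are built by hand so the sketch runs without a Spark session.

```python
from typing import Iterator

import pandas as pd


def keep_adults(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Yield, for each incoming batch of rows, only the rows with age >= 18."""
    for batch in batches:
        # Each `batch` is one pandas DataFrame holding a slice of the rows,
        # not the entire Spark DataFrame.
        yield batch[batch["age"] >= 18]


# Hand-built stand-ins for the batches Spark would supply (hypothetical data).
sample_batches = [
    pd.DataFrame({"id": [1, 2], "age": [21, 30]}),
    pd.DataFrame({"id": [3], "age": [15]}),
]
results = list(keep_adults(iter(sample_batches)))
```

With a Spark DataFrame df that has these columns, the same function would be passed as df.mapInPandas(keep_adults, schema=df.schema).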



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow

2024-04-16 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47876.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/46088

> Improve docstring of mapInArrow
> ---
>
> Key: SPARK-47876
> URL: https://issues.apache.org/jira/browse/SPARK-47876
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInArrow






[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow

2024-04-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47876:


 Summary: Improve docstring of mapInArrow
 Key: SPARK-47876
 URL: https://issues.apache.org/jira/browse/SPARK-47876
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Improve docstring of mapInArrow






[jira] [Updated] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect

2024-04-11 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47823:
-
Description: 
 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 

  was:
 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error{{{}{}}}

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 


> Improve appName and getOrCreate usage for Spark Connect
> ---
>
> Key: SPARK-47823
> URL: https://issues.apache.org/jira/browse/SPARK-47823
> Project: Spark
>  Issue Type: Story
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
>  
> In Spark Connect
> {code:java}
> spark = SparkSession.builder.appName("...").getOrCreate(){code}
>  
> raises error
>  
> {code:java}
> [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
> cannot be configured together: Spark master [...], Spark Connect [...]{code}
>  
> We should ban the usage of appName in Spark Connect
>  
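
The error message describes a conflict between two startup settings. The sketch below is a toy model of that kind of check, not PySpark's implementation: the function name, the exception class, and the choice of the "spark.master" and "spark.remote" option keys are assumptions for illustration. The ticket does not say why appName triggers the error, and the sketch does not try to explain that.

```python
from typing import Mapping


class ConflictingStartupOptions(ValueError):
    """Hypothetical stand-in for the error reported in the ticket."""


def validate_startup_options(options: Mapping[str, str]) -> None:
    """Reject option sets that name both a Spark master and a Connect remote."""
    master = options.get("spark.master")
    remote = options.get("spark.remote")
    if master is not None and remote is not None:
        raise ConflictingStartupOptions(
            "[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and "
            "Spark master cannot be configured together"
        )


# A remote-only configuration passes the check.
validate_startup_options({"spark.remote": "sc://localhost"})

# Supplying both settings is rejected.
try:
    validate_startup_options(
        {"spark.master": "local[*]", "spark.remote": "sc://localhost"}
    )
    conflict_message = None
except ConflictingStartupOptions as exc:
    conflict_message = str(exc)
```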






[jira] [Created] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect

2024-04-11 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47823:


 Summary: Improve appName and getOrCreate usage for Spark Connect
 Key: SPARK-47823
 URL: https://issues.apache.org/jira/browse/SPARK-47823
 Project: Spark
  Issue Type: Story
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


 

In Spark Connect
{code:java}
spark = SparkSession.builder.appName("...").getOrCreate(){code}
 

raises error{{{}{}}}

 
{code:java}
[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master 
cannot be configured together: Spark master [...], Spark Connect [...]{code}
 

We should ban the usage of appName in Spark Connect

 






[jira] [Updated] (SPARK-47677) Pandas circular import error in Python 3.10

2024-04-01 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47677:
-
Description: 
{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

Python 3.11, 3.12 and pypy3 don't have the issue.

 

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.

  was:
{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.


> Pandas circular import error in Python 3.10 
> 
>
> Key: SPARK-47677
> URL: https://issues.apache.org/jira/browse/SPARK-47677
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> {{AttributeError: partially initialized module 'pandas' has no attribute 
> '_pandas_datetime_CAPI' (most likely due to a circular import)}}
>  
> The above error appears in multiple tests with Python 3.10.
> Python 3.11, 3.12 and pypy3 don't have the issue.
>  
> See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
> for details.
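
For readers unfamiliar with this class of error, the sketch below reproduces the general mechanism with two throwaway modules that import each other. It is a generic illustration only: the module names circ_demo_a and circ_demo_b are made up, and nothing here identifies the actual cause inside pandas on Python 3.10.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def trigger_circular_import():
    """Import two mutually dependent modules and return the resulting error."""
    with tempfile.TemporaryDirectory() as tmp:
        # circ_demo_a imports circ_demo_b before defining VALUE ...
        Path(tmp, "circ_demo_a.py").write_text("import circ_demo_b\nVALUE = 1\n")
        # ... and circ_demo_b reads VALUE while circ_demo_a is only half loaded.
        Path(tmp, "circ_demo_b.py").write_text(
            "import circ_demo_a\nCOPY = circ_demo_a.VALUE\n"
        )
        sys.path.append(tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("circ_demo_a")
        except AttributeError as exc:
            return exc
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("circ_demo_a", None)
            sys.modules.pop("circ_demo_b", None)
    return None


error = trigger_circular_import()
```

The exact wording of the message varies between Python versions, so the sketch only captures the exception rather than asserting its text.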






[jira] [Created] (SPARK-47677) Pandas circular import error in Python 3.10

2024-04-01 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47677:


 Summary: Pandas circular import error in Python 3.10 
 Key: SPARK-47677
 URL: https://issues.apache.org/jira/browse/SPARK-47677
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng


{{AttributeError: partially initialized module 'pandas' has no attribute 
'_pandas_datetime_CAPI' (most likely due to a circular import)}}

 

The above error appears in multiple tests with Python 3.10.

See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] 
for details.






[jira] [Resolved] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling

2024-03-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47276.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45378
[https://github.com/apache/spark/pull/45378]

> Introduce `spark.profile.clear` for SparkSession-based profiling
> 
>
> Key: SPARK-47276
> URL: https://issues.apache.org/jira/browse/SPARK-47276
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Introduce `spark.profile.clear` for SparkSession-based profiling






[jira] [Created] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling

2024-03-04 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47276:


 Summary: Introduce `spark.profile.clear` for SparkSession-based 
profiling
 Key: SPARK-47276
 URL: https://issues.apache.org/jira/browse/SPARK-47276
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Introduce `spark.profile.clear` for SparkSession-based profiling






[jira] [Resolved] (SPARK-46975) Support dedicated fallback methods

2024-02-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46975.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45026

> Support dedicated fallback methods
> --
>
> Key: SPARK-46975
> URL: https://issues.apache.org/jira/browse/SPARK-46975
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46975) Support dedicated fallback methods

2024-02-23 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46975:


Assignee: Ruifeng Zheng

> Support dedicated fallback methods
> --
>
> Key: SPARK-46975
> URL: https://issues.apache.org/jira/browse/SPARK-46975
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779
 ] 

Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM:
---

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

 

!image-2024-02-22-11-21-30-460.png!


was (Author: xinrongm):
[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png, 
> image-2024-02-22-11-21-30-460.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197
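
The behaviour described above can be modelled in plain Python. This is a sketch of the semantics as the ticket states them, not PySpark code: the standalone head function and the tuple rows are stand-ins chosen so the example runs without a Spark session.

```python
from typing import List, Optional, Sequence, Tuple, Union

Row = Tuple[str, int]  # hypothetical stand-in for a Spark Row


def head(rows: Sequence[Row], n: Optional[int] = None) -> Union[Optional[Row], List[Row]]:
    """Return one row when n is omitted, otherwise a list of up to n rows."""
    if n is None:
        # No argument supplied: a single row, or None if there are no rows.
        return rows[0] if rows else None
    # Argument supplied, even n == 1: always a list.
    return list(rows[:n])


rows = [("a", 1), ("b", 2), ("c", 3)]
single = head(rows)        # a single row
one_item = head(rows, 1)   # a list containing one row
two_items = head(rows, 2)  # a list containing two rows
```

The point of the ticket is the difference between the first two calls: omitting n yields a row, while passing n=1 yields a one-element list.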






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779
 ] 

Xinrong Meng commented on SPARK-47132:
--

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819780#comment-17819780
 ] 

Xinrong Meng commented on SPARK-47132:
--

Resolved by https://github.com/apache/spark/pull/45197.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Attachment: image-2024-02-22-11-18-02-429.png

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Issue Type: Documentation  (was: Bug)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819777#comment-17819777
 ] 

Xinrong Meng commented on SPARK-47132:
--

I modified the ticket to Documentation (from Bug) and 4.0.0 (from 3.5.0).

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197






[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers

2024-02-16 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47078:


 Summary: Documentation for SparkSession-based Profilers
 Key: SPARK-47078
 URL: https://issues.apache.org/jira/browse/SPARK-47078
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Assigned] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-47014:


Assignee: Xinrong Meng

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> -
>
> Key: SPARK-47014
> URL: https://issues.apache.org/jira/browse/SPARK-47014
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Resolved] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-47014.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45073

> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
> -
>
> Key: SPARK-47014
> URL: https://issues.apache.org/jira/browse/SPARK-47014
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Created] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession

2024-02-08 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-47014:


 Summary: Implement methods dumpPerfProfiles and dumpMemoryProfiles 
of SparkSession
 Key: SPARK-47014
 URL: https://issues.apache.org/jira/browse/SPARK-47014
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession






[jira] [Assigned] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46690:


Assignee: Xinrong Meng

> Support profiling on FlatMapCoGroupsInBatchExec
> ---
>
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46690.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapCoGroupsInBatchExec
> ---
>
> Key: SPARK-46690
> URL: https://issues.apache.org/jira/browse/SPARK-46690
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46689.
--
Resolution: Done

Resolved by https://github.com/apache/spark/pull/45050

> Support profiling on FlatMapGroupsInBatchExec
> -
>
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec

2024-02-08 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46689:


Assignee: Xinrong Meng

> Support profiling on FlatMapGroupsInBatchExec
> -
>
> Key: SPARK-46689
> URL: https://issues.apache.org/jira/browse/SPARK-46689
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling

2024-01-30 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46925:


 Summary: Add a warning that instructs to install memory_profiler 
for memory profiling
 Key: SPARK-46925
 URL: https://issues.apache.org/jira/browse/SPARK-46925
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Add a warning that instructs to install memory_profiler for memory profiling
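
One way such a warning could work is sketched below using only the standard library. This is an assumption about the general approach, not the change made for this ticket: the helper name warn_if_missing and the message text are hypothetical, and a deliberately nonexistent module name is used so the warning path is exercised.

```python
import importlib.util
import warnings


def warn_if_missing(module_name: str) -> bool:
    """Warn and return False when `module_name` cannot be found for import."""
    if importlib.util.find_spec(module_name) is None:
        warnings.warn(
            f"Install '{module_name}' to enable memory profiling.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


# Record warnings so the outcome can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    available = warn_if_missing("surely_not_an_installed_module_xyz")
```

In real use the argument would be the importable module name, memory_profiler.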






[jira] [Created] (SPARK-46880) Improve and test warning for Arrow-optimized Python UDF

2024-01-26 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46880:


 Summary: Improve and test warning for Arrow-optimized Python UDF
 Key: SPARK-46880
 URL: https://issues.apache.org/jira/browse/SPARK-46880
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Improve and test warning for Arrow-optimized Python UDF






[jira] [Resolved] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2024-01-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46467.
--
  Assignee: Xinrong Meng
Resolution: Not A Problem

We don't plan to migrate the Pandas API on Spark to the PySpark error 
framework; instead, it should follow pandas conventions, so no changes are 
proposed for now.

> Improve and test exceptions of TimedeltaIndex
> -
>
> Key: SPARK-46467
> URL: https://issues.apache.org/jira/browse/SPARK-46467
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46781) Test data source (pyspark.sql.datasource)

2024-01-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46781:


 Summary: Test data source (pyspark.sql.datasource)
 Key: SPARK-46781
 URL: https://issues.apache.org/jira/browse/SPARK-46781
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Test custom data source and input partition.






[jira] [Updated] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)

2024-01-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46781:
-
Summary: Test custom data source and input partition 
(pyspark.sql.datasource)  (was: Test data source (pyspark.sql.datasource))

> Test custom data source and input partition (pyspark.sql.datasource)
> 
>
> Key: SPARK-46781
> URL: https://issues.apache.org/jira/browse/SPARK-46781
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Test custom data source and input partition.






[jira] [Updated] (SPARK-42862) Review and fix issues in Core API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42862:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in Core API docs
> --
>
> Key: SPARK-42862
> URL: https://issues.apache.org/jira/browse/SPARK-42862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Yuanjian Li
>Priority: Major
>







[jira] [Updated] (SPARK-42863) Review and fix issues in PySpark API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42863:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in PySpark API docs
> -
>
> Key: SPARK-42863
> URL: https://issues.apache.org/jira/browse/SPARK-42863
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Hyukjin Kwon
>Priority: Major
>







[jira] [Updated] (SPARK-42864) Review and fix issues in MLlib API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42864:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in MLlib API docs
> ---
>
> Key: SPARK-42864
> URL: https://issues.apache.org/jira/browse/SPARK-42864
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42861) Review and fix issues in SQL API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42861:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in SQL API docs
> -
>
> Key: SPARK-42861
> URL: https://issues.apache.org/jira/browse/SPARK-42861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42866) Review and fix issues in Spark Connect - Scala API docs

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42866:
-
Parent: SPARK-42523  (was: SPARK-42693)

> Review and fix issues in Spark Connect - Scala API docs
> ---
>
> Key: SPARK-42866
> URL: https://issues.apache.org/jira/browse/SPARK-42866
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Updated] (SPARK-42693) API Auditing

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42693:
-
Parent: SPARK-42523
Issue Type: Sub-task  (was: Story)

> API Auditing
> 
>
> Key: SPARK-42693
> URL: https://issues.apache.org/jira/browse/SPARK-42693
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark, Spark Core, SQL, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Blocker
>
> Audit the user-facing API of Spark 3.4. The main goal is to ensure the public API 
> docs are ready for release; for example, that no private classes or methods are 
> leaking and marked public.
> There are 3 common ways to audit the API:
>  * build docs (into a local website) against branch-3.4 to check
>  * 'git diff' to check the source code differences between v3.3.2 and 
> branch-3.4
>  * [https://github.com/apache/spark-website/pull/443] shows most of the API 
> doc differences between v3.3.2 and the 3.4.0 RC4 (the latest RC); commits are 
> categorized by components
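
One narrow part of such an audit, spotting private names that a module exports as public, can be sketched in Python. This is only an illustration and not the procedure used for the Spark 3.4 audit: the helper name `leaked_private_names`, the `demo` module, and the convention that a module's public surface is whatever it lists in `__all__` are all assumptions made for this example.

```python
import types


def leaked_private_names(module):
    """Return names exported via __all__ that look private.

    A name is treated as private if it starts with a single underscore;
    dunder names such as __version__ are ignored.
    """
    exported = getattr(module, "__all__", None)
    if exported is None:
        # Without __all__ nothing is explicitly exported, so this check
        # has nothing to report.
        return []
    return sorted(
        name
        for name in exported
        if name.startswith("_")
        and not (name.startswith("__") and name.endswith("__"))
    )


# Hypothetical module, built only for demonstration.
demo = types.ModuleType("demo")
demo.__all__ = ["DataFrame", "_InternalHelper"]
print(leaked_private_names(demo))  # prints ['_InternalHelper']
```

A check like this only covers explicit exports; names that reach the rendered docs through other routes would still need the doc build or diff review listed above.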






[jira] [Resolved] (SPARK-42523) Apache Spark 3.4 release

2024-01-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-42523.
--
Resolution: Done

> Apache Spark 3.4 release
> 
>
> Key: SPARK-42523
> URL: https://issues.apache.org/jira/browse/SPARK-42523
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> An umbrella for Apache Spark 3.4 release






[jira] [Created] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2023-12-20 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46467:


 Summary: Improve and test exceptions of TimedeltaIndex
 Key: SPARK-46467
 URL: https://issues.apache.org/jira/browse/SPARK-46467
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46459) Fix bundler to 2.4.22 to unblock CI

2023-12-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46459:


 Summary: Fix bundler to 2.4.22 to unblock CI
 Key: SPARK-46459
 URL: https://issues.apache.org/jira/browse/SPARK-46459
 Project: Spark
  Issue Type: Story
  Components: Build, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Fix bundler to 2.4.22 to unblock CI






[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Summary: Improve assertions of observation (pyspark.sql.observation)  (was: 
Improve and test assertions of observation (pyspark.sql.observation))

> Improve assertions of observation (pyspark.sql.observation)
> ---
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46386:
-
Parent: (was: SPARK-46041)
Issue Type: Improvement  (was: Sub-task)

> Improve and test assertions of observation (pyspark.sql.observation)
> 
>
> Key: SPARK-46386
> URL: https://issues.apache.org/jira/browse/SPARK-46386
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Description: Validate returnType of Arrow Python UDF  (was: Check 
returnType of Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Validate returnType of Arrow Python UDF






[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46413:
-
Summary: Validate returnType of Arrow Python UDF  (was: Check returnType of 
Arrow Python UDF)

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Check returnType of Arrow Python UDF






[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF

2023-12-14 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46413:


 Summary: Check returnType of Arrow Python UDF
 Key: SPARK-46413
 URL: https://issues.apache.org/jira/browse/SPARK-46413
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Check returnType of Arrow Python UDF






[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)

2023-12-13 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46398:


 Summary: Test rangeBetween window function (pyspark.sql.window)
 Key: SPARK-46398
 URL: https://issues.apache.org/jira/browse/SPARK-46398
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)

2023-12-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46386:


 Summary: Improve and test assertions of observation 
(pyspark.sql.observation)
 Key: SPARK-46386
 URL: https://issues.apache.org/jira/browse/SPARK-46386
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)

2023-12-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46385:


 Summary: Test aggregate functions for groups (pyspark.sql.group)
 Key: SPARK-46385
 URL: https://issues.apache.org/jira/browse/SPARK-46385
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Resolved] (SPARK-46277) Validate startup urls with the config being set

2023-12-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46277.
--
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/44194

> Validate startup urls with the config being set
> ---
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Assigned] (SPARK-46277) Validate startup urls with the config being set

2023-12-07 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46277:


Assignee: Xinrong Meng

> Validate startup urls with the config being set
> ---
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Description: Test migration from Koalas to the Spark repository, including 
setting up the testing environment, dependencies, and CI jobs.

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Test migration from Koalas to the Spark repository, including setting up the 
> testing environment, dependencies, and CI jobs.






[jira] [Updated] (SPARK-46291) Koalas Testing Migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46291:
-
Summary: Koalas Testing Migration  (was: Testing migration)

> Koalas Testing Migration
> 
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-46291:


Assignee: Xinrong Meng

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-46291.
--
Resolution: Done

> Testing migration
> -
>
> Key: SPARK-46291
> URL: https://issues.apache.org/jira/browse/SPARK-46291
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-34999) Consolidate PySpark testing utils

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34999:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Consolidate PySpark testing utils
> -
>
> Key: SPARK-34999
> URL: https://issues.apache.org/jira/browse/SPARK-34999
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> `python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and 
> `python/pyspark/testing` contains test utilities for PySpark. Consolidating 
> them makes the code cleaner and easier to maintain.






[jira] [Updated] (SPARK-35012) Port Koalas DataFrame related unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35012:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas DataFrame related unit tests into PySpark
> -
>
> Key: SPARK-35012
> URL: https://issues.apache.org/jira/browse/SPARK-35012
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas DataFrame related unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35300) Standardize module name in install.rst

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35300:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Standardize module name in install.rst
> --
>
> Key: SPARK-35300
> URL: https://issues.apache.org/jira/browse/SPARK-35300
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> We should use the full names of modules in install.rst.






[jira] [Updated] (SPARK-35034) Port Koalas miscellaneous unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35034:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas miscellaneous unit tests into PySpark
> -
>
> Key: SPARK-35034
> URL: https://issues.apache.org/jira/browse/SPARK-35034
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas miscellaneous unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35035) Port Koalas internal implementation unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35035:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas internal implementation unit tests into PySpark
> ---
>
> Key: SPARK-35035
> URL: https://issues.apache.org/jira/browse/SPARK-35035
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas internal implementation related unit tests to 
> [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35040) Remove Spark-version related codes from test codes.

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35040:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Remove Spark-version related codes from test codes.
> ---
>
> Key: SPARK-35040
> URL: https://issues.apache.org/jira/browse/SPARK-35040
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Several places in the test code check the PySpark version and switch the tests 
> accordingly, but those checks are no longer necessary.
> We should remove them.
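
For context, a version-gated test of the kind described above might look like the following sketch. Everything in it is hypothetical: `CURRENT_VERSION` stands in for the installed PySpark version, and `parse_version`, `ExampleTest`, and the 3.0.0 threshold are illustrative names and values, not code from the Spark repository.

```python
import unittest


def parse_version(text):
    """Turn a plain numeric version such as '3.2.0' into (3, 2, 0).

    Assumes the first three dot-separated parts are integers; suffixes
    like '-SNAPSHOT' are not handled in this sketch.
    """
    return tuple(int(part) for part in text.split(".")[:3])


# Hypothetical stand-in for the installed PySpark version string.
CURRENT_VERSION = "3.2.0"


class ExampleTest(unittest.TestCase):
    # The gate below is the pattern that becomes dead code once only one
    # Spark version is supported by the test suite.
    @unittest.skipIf(
        parse_version(CURRENT_VERSION) < (3, 0, 0), "requires Spark 3.0+"
    )
    def test_feature(self):
        self.assertEqual(1 + 1, 2)
```

Once the tests live in the Spark repository itself and always run against the matching version, a decorator like this can simply be deleted and the test kept unconditionally.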






[jira] [Updated] (SPARK-35098) Revisit pandas-on-Spark test cases that are disabled because of pandas nondeterministic return values

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35098:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Revisit pandas-on-Spark test cases that are disabled because of pandas 
> nondeterministic return values
> -
>
> Key: SPARK-35098
> URL: https://issues.apache.org/jira/browse/SPARK-35098
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> Some test cases have been disabled in the places shown below because pandas 
> returns nondeterministic values:
>  * pandas returns `None` or `nan` randomly
> python/pyspark/pandas/tests/test_series.py test_astype
>  * pandas returns `True` or `False` randomly
> python/pyspark/pandas/tests/indexes/test_base.py test_monotonic
> We should revisit them later.






[jira] [Updated] (SPARK-35033) Port Koalas plot unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35033:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas plot unit tests into PySpark
> 
>
> Key: SPARK-35033
> URL: https://issues.apache.org/jira/browse/SPARK-35033
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas plot unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35032) Port Koalas Index unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35032:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas Index unit tests into PySpark
> -
>
> Key: SPARK-35032
> URL: https://issues.apache.org/jira/browse/SPARK-35032
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas Index unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-35031) Port Koalas operations on different frames tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-35031:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas operations on different frames tests into PySpark
> -
>
> Key: SPARK-35031
> URL: https://issues.apache.org/jira/browse/SPARK-35031
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas operations on different frames related unit 
> tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-34996) Port Koalas Series related unit tests into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34996:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port Koalas Series related unit tests into PySpark
> --
>
> Key: SPARK-34996
> URL: https://issues.apache.org/jira/browse/SPARK-34996
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas Series related unit tests to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Updated] (SPARK-34887) Port/integrate Koalas dependencies into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34887:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port/integrate Koalas dependencies into PySpark
> ---
>
> Key: SPARK-34887
> URL: https://issues.apache.org/jira/browse/SPARK-34887
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port Koalas dependencies appropriately to PySpark 
> dependencies.






[jira] [Updated] (SPARK-34886) Port/integrate Koalas DataFrame unit test into PySpark

2023-12-06 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-34886:
-
Parent Issue: SPARK-46291  (was: SPARK-34849)

> Port/integrate Koalas DataFrame unit test into PySpark
> --
>
> Key: SPARK-34886
> URL: https://issues.apache.org/jira/browse/SPARK-34886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Haejoon Lee
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.2.0
>
>
> This JIRA aims to port [Koalas DataFrame 
> test|https://github.com/databricks/koalas/tree/master/databricks/koalas/tests/test_dataframe.py]
>  appropriately to [PySpark 
> tests|https://github.com/apache/spark/tree/master/python/pyspark/tests].






[jira] [Created] (SPARK-46291) Testing migration

2023-12-06 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46291:


 Summary: Testing migration
 Key: SPARK-46291
 URL: https://issues.apache.org/jira/browse/SPARK-46291
 Project: Spark
  Issue Type: Umbrella
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-46277) Validate startup urls with the config to set

2023-12-05 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46277:
-
Attachment: image-2023-12-05-15-39-08-830.png

> Validate startup urls with the config to set
> 
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>







[jira] [Updated] (SPARK-46277) Validate startup urls with the config to set

2023-12-05 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46277:
-
Description: !image-2023-12-05-15-39-08-830.png!

> Validate startup urls with the config to set
> 
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Updated] (SPARK-46277) Validate startup urls with the config being set

2023-12-05 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-46277:
-
Summary: Validate startup urls with the config being set  (was: Validate 
startup urls with the config to set)

> Validate startup urls with the config being set
> ---
>
> Key: SPARK-46277
> URL: https://issues.apache.org/jira/browse/SPARK-46277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-05-15-39-08-830.png
>
>
> !image-2023-12-05-15-39-08-830.png!






[jira] [Created] (SPARK-46277) Validate startup urls with the config to set

2023-12-05 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46277:


 Summary: Validate startup urls with the config to set
 Key: SPARK-46277
 URL: https://issues.apache.org/jira/browse/SPARK-46277
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-46252) Improve test coverage of memory_profiler.py

2023-12-04 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46252:


 Summary: Improve test coverage of memory_profiler.py
 Key: SPARK-46252
 URL: https://issues.apache.org/jira/browse/SPARK-46252
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng









[jira] [Resolved] (SPARK-44560) Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-44560.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42178
[https://github.com/apache/spark/pull/42178]

> Improve tests and documentation for Arrow Python UDF
> 
>
> Key: SPARK-44560
> URL: https://issues.apache.org/jira/browse/SPARK-44560
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Test on complex return type
> Remove complex return type constraints for Arrow Python UDF on Spark Connect
> Update documentation of the related Spark conf






[jira] [Assigned] (SPARK-44560) Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-44560:


Assignee: Xinrong Meng

> Improve tests and documentation for Arrow Python UDF
> 
>
> Key: SPARK-44560
> URL: https://issues.apache.org/jira/browse/SPARK-44560
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Test on complex return type
> Remove complex return type constraints for Arrow Python UDF on Spark Connect
> Update documentation of the related Spark conf






[jira] [Created] (SPARK-44560) Improve tests and documentation for Arrow Python UDF

2023-07-26 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-44560:


 Summary: Improve tests and documentation for Arrow Python UDF
 Key: SPARK-44560
 URL: https://issues.apache.org/jira/browse/SPARK-44560
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Xinrong Meng


Test on complex return type

Remove complex return type constraints for Arrow Python UDF on Spark Connect

Update documentation of the related Spark conf






[jira] [Updated] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`

2023-07-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-44486:
-
Description: 
Implement PyArrow `self_destruct` feature for `toPandas`

Make the Spark configuration 
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled` enable PyArrow’s 
`self_destruct` feature in Spark Connect. The feature can save memory when 
creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory 
while the Pandas DataFrame is being built. 

  was:
Implement PyArrow `self_destruct` feature for `toPandas`

 

Now the Spark configuration 
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable 
PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when 
creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory 
while building the Pandas DataFrame. 


> Implement PyArrow `self_destruct` feature for `toPandas`
> 
>
> Key: SPARK-44486
> URL: https://issues.apache.org/jira/browse/SPARK-44486
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement PyArrow `self_destruct` feature for `toPandas`
> Make the Spark configuration 
> `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` enable PyArrow’s 
> `self_destruct` feature in Spark Connect. The feature can save memory 
> when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated 
> memory while the Pandas DataFrame is being built. 






[jira] [Updated] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`

2023-07-19 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-44486:
-
Description: 
Implement PyArrow `self_destruct` feature for `toPandas`

 

Now the Spark configuration 
`spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable 
PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when 
creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory 
while building the Pandas DataFrame. 

  was:Implement PyArrow `self_destruct` feature for `toPandas`


> Implement PyArrow `self_destruct` feature for `toPandas`
> 
>
> Key: SPARK-44486
> URL: https://issues.apache.org/jira/browse/SPARK-44486
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement PyArrow `self_destruct` feature for `toPandas`
>  
> Now the Spark configuration 
> `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to 
> enable PyArrow’s `self_destruct` feature in Spark Connect, which can save 
> memory when creating a Pandas DataFrame via `toPandas` by freeing 
> Arrow-allocated memory while building the Pandas DataFrame. 






[jira] [Created] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`

2023-07-19 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-44486:


 Summary: Implement PyArrow `self_destruct` feature for `toPandas`
 Key: SPARK-44486
 URL: https://issues.apache.org/jira/browse/SPARK-44486
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Xinrong Meng


Implement PyArrow `self_destruct` feature for `toPandas`






[jira] [Assigned] (SPARK-44446) Add checks for expected list type special cases

2023-07-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-44446:


Assignee: Amanda Liu

> Add checks for expected list type special cases
> ---
>
> Key: SPARK-44446
> URL: https://issues.apache.org/jira/browse/SPARK-44446
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Resolved] (SPARK-44446) Add checks for expected list type special cases

2023-07-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-44446.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42023
[https://github.com/apache/spark/pull/42023]

> Add checks for expected list type special cases
> ---
>
> Key: SPARK-44446
> URL: https://issues.apache.org/jira/browse/SPARK-44446
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Commented] (SPARK-44264) DeepSpeed Distributor

2023-07-14 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743293#comment-17743293
 ] 

Xinrong Meng commented on SPARK-44264:
--

Issue resolved by pull request https://github.com/apache/spark/pull/41946

> DeepSpeed Distributor
> -
>
> Key: SPARK-44264
> URL: https://issues.apache.org/jira/browse/SPARK-44264
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.4.1
>Reporter: Lu Wang
>Priority: Critical
> Fix For: 3.5.0
>
>
> Make it easier for PySpark users to run distributed training and inference 
> with DeepSpeed on Spark clusters. This project was determined 
> by the Databricks ML Training Team.






[jira] [Resolved] (SPARK-44398) Scala foreachBatch API in Streaming Spark Connect

2023-07-13 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-44398.
--
Resolution: Fixed

Issue resolved by pull request 41969
[https://github.com/apache/spark/pull/41969]

> Scala foreachBatch API in Streaming Spark Connect
> -
>
> Key: SPARK-44398
> URL: https://issues.apache.org/jira/browse/SPARK-44398
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> Implement foreachBatch API in Scala Spark Connect






[jira] [Assigned] (SPARK-44398) Scala foreachBatch API in Streaming Spark Connect

2023-07-13 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-44398:


Assignee: Raghu Angadi

> Scala foreachBatch API in Streaming Spark Connect
> -
>
> Key: SPARK-44398
> URL: https://issues.apache.org/jira/browse/SPARK-44398
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> Implement foreachBatch API in Scala Spark Connect






[jira] [Updated] (SPARK-44401) Arrow Python UDF Use Guide

2023-07-12 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-44401:
-
Component/s: Documentation

> Arrow Python UDF Use Guide
> --
>
> Key: SPARK-44401
> URL: https://issues.apache.org/jira/browse/SPARK-44401
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Created] (SPARK-44401) Arrow Python UDF Use Guide

2023-07-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-44401:


 Summary: Arrow Python UDF Use Guide
 Key: SPARK-44401
 URL: https://issues.apache.org/jira/browse/SPARK-44401
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Xinrong Meng









[jira] [Created] (SPARK-44399) Import SparkSession in Python UDF only when useArrow is None

2023-07-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-44399:


 Summary: Import SparkSession in Python UDF only when useArrow is 
None
 Key: SPARK-44399
 URL: https://issues.apache.org/jira/browse/SPARK-44399
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Xinrong Meng


Import SparkSession in Python UDF only when useArrow is None






[jira] [Assigned] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF

2023-06-29 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-44150:


Assignee: Xinrong Meng

> Explicit Arrow casting for mismatched return type in Arrow Python UDF
> -
>
> Key: SPARK-44150
> URL: https://issues.apache.org/jira/browse/SPARK-44150
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF

2023-06-29 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-44150.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41503
[https://github.com/apache/spark/pull/41503]

> Explicit Arrow casting for mismatched return type in Arrow Python UDF
> -
>
> Key: SPARK-44150
> URL: https://issues.apache.org/jira/browse/SPARK-44150
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF

2023-06-22 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-44150:


 Summary: Explicit Arrow casting for mismatched return type in 
Arrow Python UDF
 Key: SPARK-44150
 URL: https://issues.apache.org/jira/browse/SPARK-44150
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Xinrong Meng









[jira] [Updated] (SPARK-40307) Introduce Arrow Python UDFs

2023-06-16 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40307:
-
Affects Version/s: (was: 3.4.0)

> Introduce Arrow Python UDFs
> ---
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> A Python user-defined function (UDF) enables users to run arbitrary code 
> against PySpark columns. It uses Pickle for (de)serialization and executes 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used for 
> (de)serialization in Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default.
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None. Otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> The reason why we introduce these two ways is to provide both a convenient, 
> per-Spark-session control and a finer-grained, per-UDF control of the Arrow 
> optimization for Python UDFs.
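
The precedence between the two controls described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PySpark's implementation: the helper name `resolve_use_arrow` and its `conf_enabled` argument are hypothetical, and `conf_enabled` stands in for the value of `spark.sql.execution.pythonUDF.arrow.enabled`.

```python
from typing import Optional


def resolve_use_arrow(use_arrow: Optional[bool], conf_enabled: bool = False) -> bool:
    """Decide whether a UDF should run with the Arrow optimization.

    Hypothetical helper for illustration only; not part of PySpark.
    `use_arrow` mirrors the per-UDF `useArrow` parameter (None by default).
    `conf_enabled` mirrors the session-level configuration (disabled by default).
    """
    if use_arrow is None:
        # No per-UDF choice was made, so the session-level configuration decides.
        return conf_enabled
    # An explicit per-UDF choice overrides the session-level configuration.
    return use_arrow
```

With this rule, a session can switch the optimization on for every UDF that leaves `useArrow` unset, while an individual UDF can still opt out (or in) explicitly.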






[jira] [Updated] (SPARK-43440) Support registration of an Arrow Python UDF

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43440:
-
Summary: Support registration of an Arrow Python UDF   (was: Support 
registration of an Arrow-optimized Python UDF )

> Support registration of an Arrow Python UDF 
> 
>
> Key: SPARK-43440
> URL: https://issues.apache.org/jira/browse/SPARK-43440
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently, when users register an Arrow-optimized Python UDF, it is 
> registered as a pickled Python UDF and thus executed without Arrow 
> optimization. 
> We should support registering Arrow-optimized Python UDFs and execute them 
> with Arrow optimization.






[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow Python UDF

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43893:
-
Summary: Non-atomic data type support in Arrow Python UDF  (was: Non-atomic 
data type support in Arrow-optimized Python UDF)

> Non-atomic data type support in Arrow Python UDF
> 
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-43412) Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43412:
-
Summary: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs  
(was: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python 
UDFs)

> Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs
> 
>
> Key: SPARK-43412
> URL: https://issues.apache.org/jira/browse/SPARK-43412
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>
> We are about to improve nested non-atomic input/output support of an 
> Arrow-optimized Python UDF.
> However, it currently shares its EvalType with a pickled Python UDF 
> but its implementation with a Pandas UDF.
> Introducing a dedicated EvalType isolates the changes to Arrow-optimized 
> Python UDFs.






[jira] [Updated] (SPARK-43082) Arrow Python UDFs in Spark Connect

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43082:
-
Summary: Arrow Python UDFs in Spark Connect  (was: Arrow-optimized Python 
UDFs in Spark Connect)

> Arrow Python UDFs in Spark Connect
> --
>
> Key: SPARK-43082
> URL: https://issues.apache.org/jira/browse/SPARK-43082
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>
> Implement Arrow-optimized Python UDFs in Spark Connect.






[jira] [Updated] (SPARK-42893) Block Arrow Python UDFs

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42893:
-
Summary: Block Arrow Python UDFs  (was: Block Arrow-optimized Python UDFs)

> Block Arrow Python UDFs
> ---
>
> Key: SPARK-42893
> URL: https://issues.apache.org/jira/browse/SPARK-42893
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Considering the upcoming improvements to the result inconsistencies between 
> traditional pickled Python UDFs and Arrow-optimized Python UDFs, we'd better 
> block the feature; otherwise, users who try it out will see 
> behavior changes in the next release.
> In addition, since the Spark Connect Python Client (SCPC) has been introduced in 
> Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark 
> and SCPC at the same time for compatibility.






[jira] [Updated] (SPARK-40307) Introduce Arrow Python UDFs

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-40307:
-
Summary: Introduce Arrow Python UDFs  (was: Introduce Arrow-optimized 
Python UDFs)

> Introduce Arrow Python UDFs
> ---
>
> Key: SPARK-40307
> URL: https://issues.apache.org/jira/browse/SPARK-40307
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> A Python user-defined function (UDF) enables users to run arbitrary code 
> against PySpark columns. It uses Pickle for (de)serialization and executes 
> row by row.
> One major performance bottleneck of Python UDFs is (de)serialization, that 
> is, the data interchange between the worker JVM and the spawned Python 
> subprocess that actually executes the UDF. We should seek an alternative to 
> handle the (de)serialization: Arrow, which is already used for 
> (de)serialization in Pandas UDFs.
> There should be two ways to enable/disable the Arrow optimization for Python 
> UDFs:
> - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, 
> disabled by default.
> - the `useArrow` parameter of the `udf` function, None by default.
> The Spark configuration takes effect only when `useArrow` is None. Otherwise, 
> `useArrow` decides whether a specific user-defined function is optimized by 
> Arrow or not.
> The reason why we introduce these two ways is to provide both a convenient, 
> per-Spark-session control and a finer-grained, per-UDF control of the Arrow 
> optimization for Python UDFs.






[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow Python UDF

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43903:
-
Summary: Improve ArrayType input support in Arrow Python UDF  (was: Improve 
ArrayType input support in Arrow-optimized Python UDF)

> Improve ArrayType input support in Arrow Python UDF
> ---
>
> Key: SPARK-43903
> URL: https://issues.apache.org/jira/browse/SPARK-43903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow-optimized Python UDF

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43903:
-
Summary: Improve ArrayType input support in Arrow-optimized Python UDF  
(was: Non-atomic data type support in Arrow-optimized Python UDF)

> Improve ArrayType input support in Arrow-optimized Python UDF
> -
>
> Key: SPARK-43903
> URL: https://issues.apache.org/jira/browse/SPARK-43903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>







[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow-optimized Python UDF

2023-06-14 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-43893:
-
Summary: Non-atomic data type support in Arrow-optimized Python UDF  (was: 
StructType input/output support in Arrow-optimized Python UDF)

> Non-atomic data type support in Arrow-optimized Python UDF
> --
>
> Key: SPARK-43893
> URL: https://issues.apache.org/jira/browse/SPARK-43893
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0
>
>






