[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas
[ https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47891: - Description: Improve docstring of mapInPandas * "using a Python native function that takes and outputs a pandas DataFrame" is confusing because the function takes and outputs an "ITERATOR of pandas DataFrames" instead. * "All columns are passed together as an iterator of pandas DataFrames" easily misleads users into thinking the entire DataFrame will be passed together; "a batch of rows" is used instead. was: Improve docstring of mapInPandas > Improve docstring of mapInPandas > > > Key: SPARK-47891 > URL: https://issues.apache.org/jira/browse/SPARK-47891 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInPandas > * "using a Python native function that takes and outputs a pandas DataFrame" > is confusing because the function takes and outputs an "ITERATOR of pandas > DataFrames" instead. > * "All columns are passed together as an iterator of pandas DataFrames" > easily misleads users into thinking the entire DataFrame will be passed together; > "a batch of rows" is used instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
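The distinction the ticket draws can be sketched without a running SparkSession. Below is a minimal, self-contained illustration in plain pandas (the function name and column name are made up) of the contract a mapInPandas function must satisfy: it receives an iterator of pandas DataFrames, each holding a batch of rows rather than the entire DataFrame, and yields pandas DataFrames back.

```python
from typing import Iterator

import pandas as pd

def double_values(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # A mapInPandas-style function: it receives an ITERATOR of pandas
    # DataFrames (each a batch of rows, never the whole DataFrame at once)
    # and yields pandas DataFrames back.
    for batch in batches:
        yield batch.assign(value=batch["value"] * 2)

# Stand-in for Spark feeding the function batch by batch.
batches = iter([pd.DataFrame({"value": [1, 2]}), pd.DataFrame({"value": [3]})])
result = pd.concat(double_values(batches), ignore_index=True)
print(result["value"].tolist())  # → [2, 4, 6]
```

In actual PySpark code the same function would be passed to df.mapInPandas(func, schema); Spark, not a hand-built list, decides the batch boundaries.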
[jira] [Resolved] (SPARK-47876) Improve docstring of mapInArrow
[ https://issues.apache.org/jira/browse/SPARK-47876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47876. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/46088 > Improve docstring of mapInArrow > --- > > Key: SPARK-47876 > URL: https://issues.apache.org/jira/browse/SPARK-47876 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Improve docstring of mapInArrow
[jira] [Created] (SPARK-47876) Improve docstring of mapInArrow
Xinrong Meng created SPARK-47876: Summary: Improve docstring of mapInArrow Key: SPARK-47876 URL: https://issues.apache.org/jira/browse/SPARK-47876 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve docstring of mapInArrow
[jira] [Updated] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-47823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47823: - Description: In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect was: In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect > Improve appName and getOrCreate usage for Spark Connect > --- > > Key: SPARK-47823 > URL: https://issues.apache.org/jira/browse/SPARK-47823 > Project: Spark > Issue Type: Story > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > > In Spark Connect > {code:java} > spark = SparkSession.builder.appName("...").getOrCreate(){code} > > raises error > > {code:java} > [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master > cannot be configured together: Spark master [...], Spark Connect [...]{code} > > We should ban the usage of appName in Spark Connect >
[jira] [Created] (SPARK-47823) Improve appName and getOrCreate usage for Spark Connect
Xinrong Meng created SPARK-47823: Summary: Improve appName and getOrCreate usage for Spark Connect Key: SPARK-47823 URL: https://issues.apache.org/jira/browse/SPARK-47823 Project: Spark Issue Type: Story Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng In Spark Connect {code:java} spark = SparkSession.builder.appName("...").getOrCreate(){code} raises error {code:java} [CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and Spark master cannot be configured together: Spark master [...], Spark Connect [...]{code} We should ban the usage of appName in Spark Connect
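The error above reduces to a mutual-exclusion check between a classic Spark master and a Spark Connect endpoint. The following is a hypothetical sketch of such a check, not Spark's actual implementation; the function name and parameters are illustrative.

```python
from typing import Optional

def validate_connect_config(master: Optional[str], remote: Optional[str]) -> None:
    # Illustrative mutual-exclusion check behind the
    # CANNOT_CONFIGURE_SPARK_CONNECT_MASTER error: a session cannot target
    # both a classic Spark master and a Spark Connect endpoint at once.
    if master is not None and remote is not None:
        raise ValueError(
            "[CANNOT_CONFIGURE_SPARK_CONNECT_MASTER] Spark Connect server and "
            f"Spark master cannot be configured together: Spark master [{master}], "
            f"Spark Connect [{remote}]"
        )

validate_connect_config(None, "sc://localhost:15002")   # Connect only: fine
try:
    validate_connect_config("local[*]", "sc://localhost:15002")
except ValueError as e:
    print(f"rejected: {e}")
```

Banning appName under Spark Connect, as the ticket proposes, would move this failure even earlier, to the builder call itself.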
[jira] [Updated] (SPARK-47677) Pandas circular import error in Python 3.10
[ https://issues.apache.org/jira/browse/SPARK-47677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47677: - Description: {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. Python 3.11, 3.12 and pypy3 don't have the issue. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details. was: {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details. > Pandas circular import error in Python 3.10 > > > Key: SPARK-47677 > URL: https://issues.apache.org/jira/browse/SPARK-47677 > Project: Spark > Issue Type: Test > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > {{AttributeError: partially initialized module 'pandas' has no attribute > '_pandas_datetime_CAPI' (most likely due to a circular import)}} > > The above error appears in multiple tests with Python 3.10. > Python 3.11, 3.12 and pypy3 don't have the issue. > > See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] > for details.
[jira] [Created] (SPARK-47677) Pandas circular import error in Python 3.10
Xinrong Meng created SPARK-47677: Summary: Pandas circular import error in Python 3.10 Key: SPARK-47677 URL: https://issues.apache.org/jira/browse/SPARK-47677 Project: Spark Issue Type: Test Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng {{AttributeError: partially initialized module 'pandas' has no attribute '_pandas_datetime_CAPI' (most likely due to a circular import)}} The above error appears in multiple tests with Python 3.10. See [https://github.com/apache/spark/actions/runs/8469356110/job/23208894575] for details.
[jira] [Resolved] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling
[ https://issues.apache.org/jira/browse/SPARK-47276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47276. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45378 [https://github.com/apache/spark/pull/45378] > Introduce `spark.profile.clear` for SparkSession-based profiling > > > Key: SPARK-47276 > URL: https://issues.apache.org/jira/browse/SPARK-47276 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Introduce `spark.profile.clear` for SparkSession-based profiling
[jira] [Created] (SPARK-47276) Introduce `spark.profile.clear` for SparkSession-based profiling
Xinrong Meng created SPARK-47276: Summary: Introduce `spark.profile.clear` for SparkSession-based profiling Key: SPARK-47276 URL: https://issues.apache.org/jira/browse/SPARK-47276 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Introduce `spark.profile.clear` for SparkSession-based profiling
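As a rough sketch of what session-scoped profiler results with a clear operation look like, here is a hypothetical in-memory stand-in; the class and method names are made up, and the real PySpark API differs in details (per-UDF result types, perf vs. memory profiles).

```python
from typing import Optional

class ProfileResults:
    """Hypothetical stand-in for SparkSession-based profiler results."""

    def __init__(self) -> None:
        self._results = {}  # udf_id -> accumulated profile stats (text here)

    def add(self, udf_id: int, stats: str) -> None:
        self._results[udf_id] = stats

    def show(self) -> str:
        return "\n".join(f"UDF {i}: {s}" for i, s in sorted(self._results.items()))

    def clear(self, udf_id: Optional[int] = None) -> None:
        # Like the proposed spark.profile.clear: drop one UDF's results,
        # or everything when no id is given.
        if udf_id is None:
            self._results.clear()
        else:
            self._results.pop(udf_id, None)

profile = ProfileResults()
profile.add(1, "10 calls, 0.2s")
profile.add(2, "3 calls, 0.1s")
profile.clear(1)
print(profile.show())  # → UDF 2: 3 calls, 0.1s
```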
[jira] [Resolved] (SPARK-46975) Support dedicated fallback methods
[ https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46975. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45026 > Support dedicated fallback methods > -- > > Key: SPARK-46975 > URL: https://issues.apache.org/jira/browse/SPARK-46975 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-46975) Support dedicated fallback methods
[ https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46975: Assignee: Ruifeng Zheng > Support dedicated fallback methods > -- > > Key: SPARK-46975 > URL: https://issues.apache.org/jira/browse/SPARK-46975 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779 ] Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM: --- [~wunderalbert] would you double-check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. !image-2024-02-22-11-21-30-460.png! was (Author: xinrongm): [~wunderalbert] would you double-check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png, > image-2024-02-22-11-21-30-460.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779 ] Xinrong Meng commented on SPARK-47132: -- [~wunderalbert] would you double-check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819780#comment-17819780 ] Xinrong Meng commented on SPARK-47132: -- Resolved by https://github.com/apache/spark/pull/45197. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Attachment: image-2024-02-22-11-18-02-429.png > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Issue Type: Documentation (was: Bug) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Affects Version/s: 4.0.0 (was: 3.5.0) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819777#comment-17819777 ] Xinrong Meng commented on SPARK-47132: -- I modified the ticket to Documentation (from Bug) and 4.0.0 (from 3.5.0). > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{Row}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197
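The corrected contract is easy to restate in plain Python. The stand-in below (not PySpark itself; rows are plain strings rather than Row objects) mirrors the documented behavior: the return type depends on whether n was supplied at all, not on whether n == 1.

```python
from typing import Any, List, Optional, Union

def head(rows: List[Any], n: Optional[int] = None) -> Union[Any, List[Any]]:
    # Mirrors the documented DataFrame.head contract: the return type depends
    # on whether n was supplied at all, not on whether n == 1.
    if n is None:
        return rows[0]   # a single Row
    return rows[:n]      # always a list, even when n == 1

rows = ["row1", "row2", "row3"]
print(head(rows))      # → row1  (a single element)
print(head(rows, 1))   # → ['row1']  (a list of length 1)
```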
[jira] [Created] (SPARK-47078) Documentation for SparkSession-based Profilers
Xinrong Meng created SPARK-47078: Summary: Documentation for SparkSession-based Profilers Key: SPARK-47078 URL: https://issues.apache.org/jira/browse/SPARK-47078 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng
[jira] [Assigned] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[ https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-47014: Assignee: Xinrong Meng > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession > - > > Key: SPARK-47014 > URL: https://issues.apache.org/jira/browse/SPARK-47014 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[jira] [Resolved] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[ https://issues.apache.org/jira/browse/SPARK-47014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-47014. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45073 > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession > - > > Key: SPARK-47014 > URL: https://issues.apache.org/jira/browse/SPARK-47014 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > > Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
[jira] [Created] (SPARK-47014) Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
Xinrong Meng created SPARK-47014: Summary: Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession Key: SPARK-47014 URL: https://issues.apache.org/jira/browse/SPARK-47014 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Implement methods dumpPerfProfiles and dumpMemoryProfiles of SparkSession
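A minimal sketch of what dumping accumulated profiles to a directory could look like, under the assumption (made up here) that results are a dict of per-UDF stats; the actual PySpark methods, signatures, and file layout differ.

```python
import os
import tempfile

def dump_profiles(results: dict, path: str, kind: str) -> list:
    # Write each UDF's accumulated profile text to <path>/udf_<id>_<kind>.txt.
    # The per-UDF dict, file names, and layout here are illustrative
    # assumptions, not PySpark's actual output format.
    os.makedirs(path, exist_ok=True)
    written = []
    for udf_id, stats in sorted(results.items()):
        fname = os.path.join(path, f"udf_{udf_id}_{kind}.txt")
        with open(fname, "w") as f:
            f.write(stats)
        written.append(fname)
    return written

out_dir = tempfile.mkdtemp()
files = dump_profiles({1: "perf stats for UDF 1"}, out_dir, "perf")
print(files)
```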
[jira] [Assigned] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46690: Assignee: Xinrong Meng > Support profiling on FlatMapCoGroupsInBatchExec > --- > > Key: SPARK-46690 > URL: https://issues.apache.org/jira/browse/SPARK-46690 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major >
[jira] [Resolved] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46690. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45050 > Support profiling on FlatMapCoGroupsInBatchExec > --- > > Key: SPARK-46690 > URL: https://issues.apache.org/jira/browse/SPARK-46690 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major >
[jira] [Resolved] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46689. -- Resolution: Done Resolved by https://github.com/apache/spark/pull/45050 > Support profiling on FlatMapGroupsInBatchExec > - > > Key: SPARK-46689 > URL: https://issues.apache.org/jira/browse/SPARK-46689 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
[ https://issues.apache.org/jira/browse/SPARK-46689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46689: Assignee: Xinrong Meng > Support profiling on FlatMapGroupsInBatchExec > - > > Key: SPARK-46689 > URL: https://issues.apache.org/jira/browse/SPARK-46689 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-46925) Add a warning that instructs to install memory_profiler for memory profiling
Xinrong Meng created SPARK-46925: Summary: Add a warning that instructs to install memory_profiler for memory profiling Key: SPARK-46925 URL: https://issues.apache.org/jira/browse/SPARK-46925 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Add a warning that instructs to install memory_profiler for memory profiling
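The proposed warning is an instance of a common optional-dependency pattern: detect the missing package up front and tell the user how to install it, instead of failing cryptically later. A minimal sketch, with an illustrative function name:

```python
import importlib.util
import warnings

def check_optional_dependency(module: str = "memory_profiler") -> bool:
    # Emit an install hint when the optional package needed for memory
    # profiling is not importable; return whether it is available.
    if importlib.util.find_spec(module) is None:
        warnings.warn(
            f"Memory profiling requires '{module}'; "
            f"install it with: pip install {module}",
            UserWarning,
        )
        return False
    return True

# A module name that certainly does not exist triggers the warning path.
check_optional_dependency("surely_not_installed_profiler_xyz")
```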
[jira] [Created] (SPARK-46880) Improve and test warning for Arrow-optimized Python UDF
Xinrong Meng created SPARK-46880: Summary: Improve and test warning for Arrow-optimized Python UDF Key: SPARK-46880 URL: https://issues.apache.org/jira/browse/SPARK-46880 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng Improve and test warning for Arrow-optimized Python UDF
[jira] [Resolved] (SPARK-46467) Improve and test exceptions of TimedeltaIndex
[ https://issues.apache.org/jira/browse/SPARK-46467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46467. -- Assignee: Xinrong Meng Resolution: Not A Problem We don't have a plan to migrate Pandas API on Spark to the PySpark error framework; instead, it should follow the Pandas standard. So there are no proposed changes for now. > Improve and test exceptions of TimedeltaIndex > - > > Key: SPARK-46467 > URL: https://issues.apache.org/jira/browse/SPARK-46467 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-46781) Test data source (pyspark.sql.datasource)
Xinrong Meng created SPARK-46781: Summary: Test data source (pyspark.sql.datasource) Key: SPARK-46781 URL: https://issues.apache.org/jira/browse/SPARK-46781 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Test custom data source and input partition.
[jira] [Updated] (SPARK-46781) Test custom data source and input partition (pyspark.sql.datasource)
[ https://issues.apache.org/jira/browse/SPARK-46781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46781: - Summary: Test custom data source and input partition (pyspark.sql.datasource) (was: Test data source (pyspark.sql.datasource)) > Test custom data source and input partition (pyspark.sql.datasource) > > > Key: SPARK-46781 > URL: https://issues.apache.org/jira/browse/SPARK-46781 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Test custom data source and input partition.
[jira] [Updated] (SPARK-42862) Review and fix issues in Core API docs
[ https://issues.apache.org/jira/browse/SPARK-42862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42862: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in Core API docs > -- > > Key: SPARK-42862 > URL: https://issues.apache.org/jira/browse/SPARK-42862 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Yuanjian Li >Priority: Major >
[jira] [Updated] (SPARK-42863) Review and fix issues in PySpark API docs
[ https://issues.apache.org/jira/browse/SPARK-42863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42863: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in PySpark API docs > - > > Key: SPARK-42863 > URL: https://issues.apache.org/jira/browse/SPARK-42863 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Hyukjin Kwon >Priority: Major >
[jira] [Updated] (SPARK-42864) Review and fix issues in MLlib API docs
[ https://issues.apache.org/jira/browse/SPARK-42864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42864: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in MLlib API docs > --- > > Key: SPARK-42864 > URL: https://issues.apache.org/jira/browse/SPARK-42864 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42861) Review and fix issues in SQL API docs
[ https://issues.apache.org/jira/browse/SPARK-42861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42861: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in SQL API docs > - > > Key: SPARK-42861 > URL: https://issues.apache.org/jira/browse/SPARK-42861 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42866) Review and fix issues in Spark Connect - Scala API docs
[ https://issues.apache.org/jira/browse/SPARK-42866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42866: - Parent: SPARK-42523 (was: SPARK-42693) > Review and fix issues in Spark Connect - Scala API docs > --- > > Key: SPARK-42866 > URL: https://issues.apache.org/jira/browse/SPARK-42866 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42693) API Auditing
[ https://issues.apache.org/jira/browse/SPARK-42693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42693: - Parent: SPARK-42523 Issue Type: Sub-task (was: Story) > API Auditing > > > Key: SPARK-42693 > URL: https://issues.apache.org/jira/browse/SPARK-42693 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark, Spark Core, SQL, Structured Streaming >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Blocker > > Audit the user-facing API of Spark 3.4. The main goal is to ensure the public API > docs are ready for release, for example, that no private classes/methods are > leaked or marked public. > There are 3 common ways to audit the API: > * build docs (into a local website) against branch-3.4 to check > * 'git diff' to check the source code differences between v3.3.2 and > branch-3.4 > * [https://github.com/apache/spark-website/pull/443] shows most of the API > doc differences between v3.3.2 and the 3.4.0 RC4 (the latest RC); commits are > categorized by components -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
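The 'git diff' auditing approach mentioned in SPARK-42693 can be sketched as a shell recipe. This is a minimal illustration, not the project's actual release tooling: the throwaway repository below stands in for a Spark checkout, and only the final `git diff` line is the auditing command itself (in practice run against the real v3.3.2 tag and branch-3.4 ref):

```shell
# Minimal sketch of the diff-based API audit, assuming a git checkout.
# The tiny repo created here substitutes for apache/spark so the command
# shape can be demonstrated end to end.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.email=audit@example.com -c user.name=audit commit -q --allow-empty -m "base"
git tag v3.3.2                      # previous release tag
echo "def new_public_api(): pass" > api.py
git add api.py
git -c user.email=audit@example.com -c user.name=audit commit -q -m "add public API"
git branch branch-3.4               # release branch under audit
# Surface every source change between the two points for manual review:
git diff --stat v3.3.2 branch-3.4 -- '*.py'
```

Reviewing the `--stat` summary (or the full diff) file by file is what catches private classes or methods that were accidentally made public between the two release points.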
[jira] [Resolved] (SPARK-42523) Apache Spark 3.4 release
[ https://issues.apache.org/jira/browse/SPARK-42523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-42523. -- Resolution: Done > Apache Spark 3.4 release > > > Key: SPARK-42523 > URL: https://issues.apache.org/jira/browse/SPARK-42523 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > An umbrella for Apache Spark 3.4 release -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46467) Improve and test exceptions of TimedeltaIndex
Xinrong Meng created SPARK-46467: Summary: Improve and test exceptions of TimedeltaIndex Key: SPARK-46467 URL: https://issues.apache.org/jira/browse/SPARK-46467 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46459) Fix bundler to 2.4.22 to unblock CI
Xinrong Meng created SPARK-46459: Summary: Fix bundler to 2.4.22 to unblock CI Key: SPARK-46459 URL: https://issues.apache.org/jira/browse/SPARK-46459 Project: Spark Issue Type: Story Components: Build, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Fix bundler to 2.4.22 to unblock CI -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Summary: Improve assertions of observation (pyspark.sql.observation) (was: Improve and test assertions of observation (pyspark.sql.observation)) > Improve assertions of observation (pyspark.sql.observation) > --- > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)
[ https://issues.apache.org/jira/browse/SPARK-46386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46386: - Parent: (was: SPARK-46041) Issue Type: Improvement (was: Sub-task) > Improve and test assertions of observation (pyspark.sql.observation) > > > Key: SPARK-46386 > URL: https://issues.apache.org/jira/browse/SPARK-46386 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Description: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Validate returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46413) Validate returnType of Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46413: - Summary: Validate returnType of Arrow Python UDF (was: Check returnType of Arrow Python UDF) > Validate returnType of Arrow Python UDF > --- > > Key: SPARK-46413 > URL: https://issues.apache.org/jira/browse/SPARK-46413 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46413) Check returnType of Arrow Python UDF
Xinrong Meng created SPARK-46413: Summary: Check returnType of Arrow Python UDF Key: SPARK-46413 URL: https://issues.apache.org/jira/browse/SPARK-46413 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Check returnType of Arrow Python UDF -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
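A rough sketch of the kind of check SPARK-46413 calls for: reject a UDF whose declared returnType the Arrow path cannot handle at registration time, instead of failing mid-job. The helper names and the supported-type table below are illustrative stand-ins, not PySpark's actual validation logic:

```python
# Illustrative returnType validation for an Arrow Python UDF.
# SUPPORTED_RETURN_TYPES and both helpers are hypothetical; Spark's real
# check works against its own DataType hierarchy, not strings.
SUPPORTED_RETURN_TYPES = {
    "int": int,
    "double": float,
    "string": str,
    "boolean": bool,
}

def validate_return_type(return_type: str) -> type:
    """Raise early for a returnType the Arrow path cannot serialize."""
    try:
        return SUPPORTED_RETURN_TYPES[return_type]
    except KeyError:
        raise TypeError(
            f"Unsupported returnType for Arrow Python UDF: {return_type!r}"
        ) from None

def matches_return_type(value, return_type: str) -> bool:
    """Check one produced value against the declared returnType."""
    return isinstance(value, validate_return_type(return_type))
```

The design point is to move the failure to UDF definition time, where the error message can name the offending type, rather than letting Arrow serialization fail deep inside a running job.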
[jira] [Created] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)
Xinrong Meng created SPARK-46398: Summary: Test rangeBetween window function (pyspark.sql.window) Key: SPARK-46398 URL: https://issues.apache.org/jira/browse/SPARK-46398 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46386) Improve and test assertions of observation (pyspark.sql.observation)
Xinrong Meng created SPARK-46386: Summary: Improve and test assertions of observation (pyspark.sql.observation) Key: SPARK-46386 URL: https://issues.apache.org/jira/browse/SPARK-46386 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46385) Test aggregate functions for groups (pyspark.sql.group)
Xinrong Meng created SPARK-46385: Summary: Test aggregate functions for groups (pyspark.sql.group) Key: SPARK-46385 URL: https://issues.apache.org/jira/browse/SPARK-46385 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46277) Validate startup urls with the config being set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46277. -- Resolution: Fixed Resolved by https://github.com/apache/spark/pull/44194 > Validate startup urls with the config being set > --- > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46277) Validate startup urls with the config being set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46277: Assignee: Xinrong Meng > Validate startup urls with the config being set > --- > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46291) Koalas Testing Migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46291: - Description: Test migration from Koalas to Spark repository, including setting up the testing environment and dependencies, and CI jobs. > Koalas Testing Migration > > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Test migration from Koalas to Spark repository, including setting up the > testing environment and dependencies, and CI jobs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46291) Koalas Testing Migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46291: - Summary: Koalas Testing Migration (was: Testing migration) > Koalas Testing Migration > > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46291) Testing migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-46291: Assignee: Xinrong Meng > Testing migration > - > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46291) Testing migration
[ https://issues.apache.org/jira/browse/SPARK-46291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-46291. -- Resolution: Done > Testing migration > - > > Key: SPARK-46291 > URL: https://issues.apache.org/jira/browse/SPARK-46291 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34999) Consolidate PySpark testing utils
[ https://issues.apache.org/jira/browse/SPARK-34999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-34999: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Consolidate PySpark testing utils > - > > Key: SPARK-34999 > URL: https://issues.apache.org/jira/browse/SPARK-34999 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > `python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and > `python/pyspark/testing` contains test utilities for PySpark. Consolidating > them makes the code cleaner and easier to maintain. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35012) Port Koalas DataFrame related unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35012: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas DataFrame related unit tests into PySpark > - > > Key: SPARK-35012 > URL: https://issues.apache.org/jira/browse/SPARK-35012 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas DataFrame related unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35300) Standardize module name in install.rst
[ https://issues.apache.org/jira/browse/SPARK-35300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35300: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Standardize module name in install.rst > -- > > Key: SPARK-35300 > URL: https://issues.apache.org/jira/browse/SPARK-35300 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > We should use the full names of modules in install.rst. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35034) Port Koalas miscellaneous unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35034: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas miscellaneous unit tests into PySpark > - > > Key: SPARK-35034 > URL: https://issues.apache.org/jira/browse/SPARK-35034 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas miscellaneous unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35035) Port Koalas internal implementation unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35035: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas internal implementation unit tests into PySpark > --- > > Key: SPARK-35035 > URL: https://issues.apache.org/jira/browse/SPARK-35035 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas internal implementation related unit tests to > [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35040) Remove Spark-version related code from test code
[ https://issues.apache.org/jira/browse/SPARK-35040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35040: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Remove Spark-version related code from test code > --- > > Key: SPARK-35040 > URL: https://issues.apache.org/jira/browse/SPARK-35040 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > There are several places that check the PySpark version and switch the tests, > but those checks are no longer necessary. > We should remove them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
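The version-gated pattern SPARK-35040 removes looks roughly like the sketch below (the version tuple and test body are hypothetical placeholders). Once the minimum supported version is past the gate, the skip condition is always false, so the decorator is dead code and can be deleted outright:

```python
import unittest

# Illustrative shape of a version-gated test: a skipIf on the PySpark
# version. With a floor of 3.2.0, the condition below can never be true,
# so the decorator no longer does anything and should simply be removed.
PYSPARK_VERSION = (3, 2, 0)  # hypothetical stand-in for the real version tuple

class SeriesTests(unittest.TestCase):
    @unittest.skipIf(PYSPARK_VERSION < (3, 0, 0), "requires PySpark 3.0+")
    def test_astype(self):
        # Placeholder body; the gate above, not this assertion, is the point.
        self.assertEqual(int("42"), 42)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SeriesTests)
result = unittest.TestResult()
suite.run(result)
```

Running the suite shows the test executes (not skipped), which is exactly why such gates can be stripped once the old versions are out of support.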
[jira] [Updated] (SPARK-35098) Revisit pandas-on-Spark test cases that are disabled because of pandas nondeterministic return values
[ https://issues.apache.org/jira/browse/SPARK-35098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35098: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Revisit pandas-on-Spark test cases that are disabled because of pandas > nondeterministic return values > - > > Key: SPARK-35098 > URL: https://issues.apache.org/jira/browse/SPARK-35098 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > Some test cases have been disabled in the places as shown below because of > pandas nondeterministic return values: > * pandas returns `None` or `nan` randomly > python/pyspark/pandas/tests/test_series.py test_astype > * pandas returns `True` or `False` randomly > python/pyspark/pandas/tests/indexes/test_base.py test_monotonic > We should revisit them later. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35033) Port Koalas plot unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35033: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas plot unit tests into PySpark > > > Key: SPARK-35033 > URL: https://issues.apache.org/jira/browse/SPARK-35033 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas plot unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35032) Port Koalas Index unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35032: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas Index unit tests into PySpark > - > > Key: SPARK-35032 > URL: https://issues.apache.org/jira/browse/SPARK-35032 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas Index unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35031) Port Koalas operations on different frames tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-35031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-35031: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas operations on different frames tests into PySpark > - > > Key: SPARK-35031 > URL: https://issues.apache.org/jira/browse/SPARK-35031 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas operations on different frames related unit > tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34996) Port Koalas Series related unit tests into PySpark
[ https://issues.apache.org/jira/browse/SPARK-34996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-34996: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port Koalas Series related unit tests into PySpark > -- > > Key: SPARK-34996 > URL: https://issues.apache.org/jira/browse/SPARK-34996 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas Series related unit tests to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34887) Port/integrate Koalas dependencies into PySpark
[ https://issues.apache.org/jira/browse/SPARK-34887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-34887: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port/integrate Koalas dependencies into PySpark > --- > > Key: SPARK-34887 > URL: https://issues.apache.org/jira/browse/SPARK-34887 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port Koalas dependencies appropriately to PySpark > dependencies. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34886) Port/integrate Koalas DataFrame unit test into PySpark
[ https://issues.apache.org/jira/browse/SPARK-34886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-34886: - Parent Issue: SPARK-46291 (was: SPARK-34849) > Port/integrate Koalas DataFrame unit test into PySpark > -- > > Key: SPARK-34886 > URL: https://issues.apache.org/jira/browse/SPARK-34886 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.2.0 > > > This JIRA aims to port [Koalas DataFrame > test|https://github.com/databricks/koalas/tree/master/databricks/koalas/tests/test_dataframe.py] > appropriately to [PySpark > tests|https://github.com/apache/spark/tree/master/python/pyspark/tests]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46291) Testing migration
Xinrong Meng created SPARK-46291: Summary: Testing migration Key: SPARK-46291 URL: https://issues.apache.org/jira/browse/SPARK-46291 Project: Spark Issue Type: Umbrella Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46277) Validate startup urls with the config to set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46277: - Attachment: image-2023-12-05-15-39-08-830.png > Validate startup urls with the config to set > > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46277) Validate startup urls with the config to set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46277: - Description: !image-2023-12-05-15-39-08-830.png! > Validate startup urls with the config to set > > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46277) Validate startup urls with the config being set
[ https://issues.apache.org/jira/browse/SPARK-46277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-46277: - Summary: Validate startup urls with the config being set (was: Validate startup urls with the config to set) > Validate startup urls with the config being set > --- > > Key: SPARK-46277 > URL: https://issues.apache.org/jira/browse/SPARK-46277 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-05-15-39-08-830.png > > > !image-2023-12-05-15-39-08-830.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46277) Validate startup urls with the config to set
Xinrong Meng created SPARK-46277: Summary: Validate startup urls with the config to set Key: SPARK-46277 URL: https://issues.apache.org/jira/browse/SPARK-46277 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-46252) Improve test coverage of memory_profiler.py
Xinrong Meng created SPARK-46252: Summary: Improve test coverage of memory_profiler.py Key: SPARK-46252 URL: https://issues.apache.org/jira/browse/SPARK-46252 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44560) Improve tests and documentation for Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-44560. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42178 [https://github.com/apache/spark/pull/42178] > Improve tests and documentation for Arrow Python UDF > > > Key: SPARK-44560 > URL: https://issues.apache.org/jira/browse/SPARK-44560 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > Test on complex return type > Remove complex return type constraints for Arrow Python UDF on Spark Connect > Update documentation of the related Spark conf
[jira] [Assigned] (SPARK-44560) Improve tests and documentation for Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-44560: Assignee: Xinrong Meng > Improve tests and documentation for Arrow Python UDF > > > Key: SPARK-44560 > URL: https://issues.apache.org/jira/browse/SPARK-44560 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Test on complex return type > Remove complex return type constraints for Arrow Python UDF on Spark Connect > Update documentation of the related Spark conf
[jira] [Created] (SPARK-44560) Improve tests and documentation for Arrow Python UDF
Xinrong Meng created SPARK-44560: Summary: Improve tests and documentation for Arrow Python UDF Key: SPARK-44560 URL: https://issues.apache.org/jira/browse/SPARK-44560 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Xinrong Meng Test on complex return type Remove complex return type constraints for Arrow Python UDF on Spark Connect Update documentation of the related Spark conf
[jira] [Updated] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`
[ https://issues.apache.org/jira/browse/SPARK-44486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-44486: - Description: Implement PyArrow `self_destruct` feature for `toPandas` Make the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` effective for enabling PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while building the Pandas DataFrame. was: Implement PyArrow `self_destruct` feature for `toPandas` Now the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while building the Pandas DataFrame. > Implement PyArrow `self_destruct` feature for `toPandas` > > > Key: SPARK-44486 > URL: https://issues.apache.org/jira/browse/SPARK-44486 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Implement PyArrow `self_destruct` feature for `toPandas` > Make the Spark configuration > `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` effective for > enabling PyArrow’s `self_destruct` feature in Spark Connect, which can save > memory when creating a Pandas DataFrame via `toPandas` by freeing > Arrow-allocated memory while building the Pandas DataFrame.
[jira] [Updated] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`
[ https://issues.apache.org/jira/browse/SPARK-44486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-44486: - Description: Implement PyArrow `self_destruct` feature for `toPandas` Now the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while building the Pandas DataFrame. was:Implement PyArrow `self_destruct` feature for `toPandas` > Implement PyArrow `self_destruct` feature for `toPandas` > > > Key: SPARK-44486 > URL: https://issues.apache.org/jira/browse/SPARK-44486 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Priority: Major > > Implement PyArrow `self_destruct` feature for `toPandas` > > Now the Spark configuration > `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to > enable PyArrow’s `self_destruct` feature in Spark Connect, which can save > memory when creating a Pandas DataFrame via `toPandas` by freeing > Arrow-allocated memory while building the Pandas DataFrame.
[jira] [Created] (SPARK-44486) Implement PyArrow `self_destruct` feature for `toPandas`
Xinrong Meng created SPARK-44486: Summary: Implement PyArrow `self_destruct` feature for `toPandas` Key: SPARK-44486 URL: https://issues.apache.org/jira/browse/SPARK-44486 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Xinrong Meng Implement PyArrow `self_destruct` feature for `toPandas`
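A minimal usage sketch of the feature described above, assuming an existing `SparkSession` named `spark` (this mirrors how the configuration already behaves in vanilla PySpark; SPARK-44486 tracks honoring it in Spark Connect):

```python
# Configuration sketch: enable PyArrow's self_destruct option so that
# toPandas() frees Arrow-allocated memory while the pandas DataFrame is built.
spark.conf.set("spark.sql.execution.arrow.pyspark.selfDestruct.enabled", "true")

# Subsequent conversions can now run with a lower peak memory footprint.
pdf = spark.range(10_000).toPandas()
```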
[jira] [Assigned] (SPARK-44446) Add checks for expected list type special cases
[ https://issues.apache.org/jira/browse/SPARK-44446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-44446: Assignee: Amanda Liu > Add checks for expected list type special cases > --- > > Key: SPARK-44446 > URL: https://issues.apache.org/jira/browse/SPARK-44446 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Resolved] (SPARK-44446) Add checks for expected list type special cases
[ https://issues.apache.org/jira/browse/SPARK-44446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-44446. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 42023 [https://github.com/apache/spark/pull/42023] > Add checks for expected list type special cases > --- > > Key: SPARK-44446 > URL: https://issues.apache.org/jira/browse/SPARK-44446 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Fix For: 3.5.0 > > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
[jira] [Commented] (SPARK-44264) DeepSpeed Distributor
[ https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743293#comment-17743293 ] Xinrong Meng commented on SPARK-44264: -- Issue resolved by pull request https://github.com/apache/spark/pull/41946 > DeepSpeed Distributor > - > > Key: SPARK-44264 > URL: https://issues.apache.org/jira/browse/SPARK-44264 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.4.1 >Reporter: Lu Wang >Priority: Critical > Fix For: 3.5.0 > > > Make it easier for PySpark users to run distributed training and inference > with DeepSpeed on Spark clusters. This was a project determined by the > Databricks ML Training Team.
[jira] [Resolved] (SPARK-44398) Scala foreachBatch API in Streaming Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-44398. -- Resolution: Fixed Issue resolved by pull request 41969 [https://github.com/apache/spark/pull/41969] > Scala foreachBatch API in Streaming Spark Connect > - > > Key: SPARK-44398 > URL: https://issues.apache.org/jira/browse/SPARK-44398 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > Implement foreachBatch API in Scala Spark Connect
[jira] [Assigned] (SPARK-44398) Scala foreachBatch API in Streaming Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-44398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-44398: Assignee: Raghu Angadi > Scala foreachBatch API in Streaming Spark Connect > - > > Key: SPARK-44398 > URL: https://issues.apache.org/jira/browse/SPARK-44398 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > Implement foreachBatch API in Scala Spark Connect
[jira] [Updated] (SPARK-44401) Arrow Python UDF Use Guide
[ https://issues.apache.org/jira/browse/SPARK-44401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-44401: - Component/s: Documentation > Arrow Python UDF Use Guide > -- > > Key: SPARK-44401 > URL: https://issues.apache.org/jira/browse/SPARK-44401 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Priority: Major >
[jira] [Created] (SPARK-44401) Arrow Python UDF Use Guide
Xinrong Meng created SPARK-44401: Summary: Arrow Python UDF Use Guide Key: SPARK-44401 URL: https://issues.apache.org/jira/browse/SPARK-44401 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng
[jira] [Created] (SPARK-44399) Import SparkSession in Python UDF only when useArrow is None
Xinrong Meng created SPARK-44399: Summary: Import SparkSession in Python UDF only when useArrow is None Key: SPARK-44399 URL: https://issues.apache.org/jira/browse/SPARK-44399 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng Import SparkSession in Python UDF only when useArrow is None
[jira] [Assigned] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-44150: Assignee: Xinrong Meng > Explicit Arrow casting for mismatched return type in Arrow Python UDF > - > > Key: SPARK-44150 > URL: https://issues.apache.org/jira/browse/SPARK-44150 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major >
[jira] [Resolved] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-44150. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41503 [https://github.com/apache/spark/pull/41503] > Explicit Arrow casting for mismatched return type in Arrow Python UDF > - > > Key: SPARK-44150 > URL: https://issues.apache.org/jira/browse/SPARK-44150 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > >
[jira] [Created] (SPARK-44150) Explicit Arrow casting for mismatched return type in Arrow Python UDF
Xinrong Meng created SPARK-44150: Summary: Explicit Arrow casting for mismatched return type in Arrow Python UDF Key: SPARK-44150 URL: https://issues.apache.org/jira/browse/SPARK-44150 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0 Reporter: Xinrong Meng
[jira] [Updated] (SPARK-40307) Introduce Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40307: - Affects Version/s: (was: 3.4.0) > Introduce Arrow Python UDFs > --- > > Key: SPARK-40307 > URL: https://issues.apache.org/jira/browse/SPARK-40307 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Python user-defined functions (UDFs) enable users to run arbitrary code > against PySpark columns. They use Pickle for (de)serialization and execute > row by row. > One major performance bottleneck of Python UDFs is (de)serialization, that > is, the data interchange between the worker JVM and the spawned Python > subprocess that actually executes the UDF. We should seek an alternative to > handle the (de)serialization: Arrow, which is already used for the > (de)serialization of Pandas UDFs. > There should be two ways to enable/disable the Arrow optimization for Python > UDFs: > - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, > disabled by default. > - the `useArrow` parameter of the `udf` function, None by default. > The Spark configuration takes effect only when `useArrow` is None. Otherwise, > `useArrow` decides whether a specific user-defined function is optimized by > Arrow or not. > We introduce these two ways to provide both a convenient, per-Spark-session > control and a finer-grained, per-UDF control of the Arrow optimization for > Python UDFs.
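The precedence between the session configuration and the per-UDF flag described above can be sketched in plain Python; `arrow_optimization_enabled` is a hypothetical helper for illustration, not a PySpark API:

```python
def arrow_optimization_enabled(use_arrow, conf_enabled):
    """Hypothetical sketch of the precedence described in SPARK-40307.

    use_arrow:    the per-UDF `useArrow` parameter (True, False, or None).
    conf_enabled: the session-level value of
                  `spark.sql.execution.pythonUDF.arrow.enabled`.
    """
    if use_arrow is None:
        # No per-UDF preference: fall back to the Spark configuration.
        return conf_enabled
    # An explicit per-UDF flag overrides the session configuration.
    return use_arrow
```

So `udf(f, "int", useArrow=True)` is Arrow-optimized even when the session configuration is disabled, while `udf(f, "int")` follows the configuration.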
[jira] [Updated] (SPARK-43440) Support registration of an Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43440: - Summary: Support registration of an Arrow Python UDF (was: Support registration of an Arrow-optimized Python UDF ) > Support registration of an Arrow Python UDF > > > Key: SPARK-43440 > URL: https://issues.apache.org/jira/browse/SPARK-43440 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > > Currently, when users register an Arrow-optimized Python UDF, it will be > registered as a pickled Python UDF and thus executed without Arrow > optimization. > We should support registration of Arrow-optimized Python UDFs and execute > them with Arrow optimization.
[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43893: - Summary: Non-atomic data type support in Arrow Python UDF (was: Non-atomic data type support in Arrow-optimized Python UDF) > Non-atomic data type support in Arrow Python UDF > > > Key: SPARK-43893 > URL: https://issues.apache.org/jira/browse/SPARK-43893 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > >
[jira] [Updated] (SPARK-43412) Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-43412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43412: - Summary: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs (was: Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow-optimized Python UDFs) > Introduce `SQL_ARROW_BATCHED_UDF` EvalType for Arrow Python UDFs > > > Key: SPARK-43412 > URL: https://issues.apache.org/jira/browse/SPARK-43412 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > > We are about to improve nested non-atomic input/output support of an > Arrow-optimized Python UDF. > However, it currently shares the same EvalType as a pickled Python UDF but > the same implementation as a Pandas UDF. > Introducing a dedicated EvalType isolates the changes to Arrow-optimized > Python UDFs.
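As a rough sketch of the idea, eval types are integer constants that the Python worker uses to pick an execution path; the subset below follows PySpark's `PythonEvalType`, though the exact numeric values here should be treated as illustrative:

```python
# Illustrative subset of PySpark's PythonEvalType constants (values assumed).
class PythonEvalType:
    SQL_BATCHED_UDF = 100        # pickled Python UDF, executed row by row
    SQL_ARROW_BATCHED_UDF = 101  # Arrow Python UDF (added by SPARK-43412)
    SQL_SCALAR_PANDAS_UDF = 200  # scalar Pandas UDF

# A distinct eval type lets Arrow Python UDF changes stay isolated from both
# the pickled-UDF and the Pandas-UDF code paths.
```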
[jira] [Updated] (SPARK-43082) Arrow Python UDFs in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43082: - Summary: Arrow Python UDFs in Spark Connect (was: Arrow-optimized Python UDFs in Spark Connect) > Arrow Python UDFs in Spark Connect > -- > > Key: SPARK-43082 > URL: https://issues.apache.org/jira/browse/SPARK-43082 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > > > Implement Arrow-optimized Python UDFs in Spark Connect.
[jira] [Updated] (SPARK-42893) Block Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-42893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42893: - Summary: Block Arrow Python UDFs (was: Block Arrow-optimized Python UDFs) > Block Arrow Python UDFs > --- > > Key: SPARK-42893 > URL: https://issues.apache.org/jira/browse/SPARK-42893 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.4.0 > > > Considering the upcoming improvements on the result inconsistencies between > traditional Pickled Python UDFs and Arrow-optimized Python UDFs, we'd better > block the feature; otherwise, users who try it out will expect behavior > changes in the next release. > In addition, since Spark Connect Python Client (SCPC) has been introduced in > Spark 3.4, we'd better ensure the feature is ready in both vanilla PySpark > and SCPC at the same time for compatibility.
[jira] [Updated] (SPARK-40307) Introduce Arrow Python UDFs
[ https://issues.apache.org/jira/browse/SPARK-40307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-40307: - Summary: Introduce Arrow Python UDFs (was: Introduce Arrow-optimized Python UDFs) > Introduce Arrow Python UDFs > --- > > Key: SPARK-40307 > URL: https://issues.apache.org/jira/browse/SPARK-40307 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.4.0, 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Python user-defined functions (UDFs) enable users to run arbitrary code > against PySpark columns. They use Pickle for (de)serialization and execute > row by row. > One major performance bottleneck of Python UDFs is (de)serialization, that > is, the data interchange between the worker JVM and the spawned Python > subprocess that actually executes the UDF. We should seek an alternative to > handle the (de)serialization: Arrow, which is already used for the > (de)serialization of Pandas UDFs. > There should be two ways to enable/disable the Arrow optimization for Python > UDFs: > - the Spark configuration `spark.sql.execution.pythonUDF.arrow.enabled`, > disabled by default. > - the `useArrow` parameter of the `udf` function, None by default. > The Spark configuration takes effect only when `useArrow` is None. Otherwise, > `useArrow` decides whether a specific user-defined function is optimized by > Arrow or not. > We introduce these two ways to provide both a convenient, per-Spark-session > control and a finer-grained, per-UDF control of the Arrow optimization for > Python UDFs.
[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43903: - Summary: Improve ArrayType input support in Arrow Python UDF (was: Improve ArrayType input support in Arrow-optimized Python UDF) > Improve ArrayType input support in Arrow Python UDF > --- > > Key: SPARK-43903 > URL: https://issues.apache.org/jira/browse/SPARK-43903 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Priority: Major >
[jira] [Updated] (SPARK-43903) Improve ArrayType input support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43903: - Summary: Improve ArrayType input support in Arrow-optimized Python UDF (was: Non-atomic data type support in Arrow-optimized Python UDF) > Improve ArrayType input support in Arrow-optimized Python UDF > - > > Key: SPARK-43903 > URL: https://issues.apache.org/jira/browse/SPARK-43903 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Priority: Major >
[jira] [Updated] (SPARK-43893) Non-atomic data type support in Arrow-optimized Python UDF
[ https://issues.apache.org/jira/browse/SPARK-43893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-43893: - Summary: Non-atomic data type support in Arrow-optimized Python UDF (was: StructType input/output support in Arrow-optimized Python UDF) > Non-atomic data type support in Arrow-optimized Python UDF > -- > > Key: SPARK-43893 > URL: https://issues.apache.org/jira/browse/SPARK-43893 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0 > >