[jira] [Resolved] (SPARK-47921) Fix ExecuteJobTag creation in ExecuteHolder

2024-04-24 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47921.
---
Fix Version/s: 3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46140
[https://github.com/apache/spark/pull/46140]

> Fix ExecuteJobTag creation in ExecuteHolder
> ---
>
> Key: SPARK-47921
> URL: https://issues.apache.org/jira/browse/SPARK-47921
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47346) Make daemon mode configurable when creating Python workers

2024-03-15 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47346.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45468
[https://github.com/apache/spark/pull/45468]

> Make daemon mode configurable when creating Python workers
> --
>
> Key: SPARK-47346
> URL: https://issues.apache.org/jira/browse/SPARK-47346
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47346) Make daemon mode configurable when creating Python workers

2024-03-15 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-47346:
-

Assignee: Allison Wang

> Make daemon mode configurable when creating Python workers
> --
>
> Key: SPARK-47346
> URL: https://issues.apache.org/jira/browse/SPARK-47346
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command

2024-03-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47251:
--
Summary: Block invalid types from the `args` argument for `sql` command  
(was: Block invalid types from the `arg` argument for `sql` command)

> Block invalid types from the `args` argument for `sql` command
> --
>
> Key: SPARK-47251
> URL: https://issues.apache.org/jira/browse/SPARK-47251
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Takuya Ueshin
>Priority: Major
>
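The ticket carries no description, but the title implies a type check on the `args` parameter of `spark.sql`. The sketch below is a plain-Python illustration of that kind of validation only; the names `validate_sql_args` and `ALLOWED_TYPES` are hypothetical and are not PySpark's actual implementation.

```python
# Hypothetical sketch of the validation the issue title describes; not
# PySpark's real code. Only plain mappings/sequences of literal-compatible
# values would be accepted as `args`.
from datetime import datetime
from decimal import Decimal

ALLOWED_TYPES = (bool, int, float, str, bytes, Decimal, datetime)

def validate_sql_args(args):
    """Reject values that cannot be bound as SQL literal parameters."""
    if args is None:
        return
    if not isinstance(args, (dict, list)):
        raise TypeError(f"args must be a dict or list, got {type(args).__name__}")
    values = args.values() if isinstance(args, dict) else args
    for v in values:
        if v is not None and not isinstance(v, ALLOWED_TYPES):
            raise TypeError(f"unsupported literal type: {type(v).__name__}")
```

Checking types eagerly, before the query is sent to the server, surfaces a clear client-side error instead of a confusing failure during execution.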







[jira] [Created] (SPARK-47251) Block invalid types from the `arg` argument for `sql` command

2024-03-01 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-47251:
-

 Summary: Block invalid types from the `arg` argument for `sql` 
command
 Key: SPARK-47251
 URL: https://issues.apache.org/jira/browse/SPARK-47251
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.1
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-47214) Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments

2024-02-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47214.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45319
[https://github.com/apache/spark/pull/45319]

> Create API for 'analyze' method to differentiate constant NULL arguments and 
> other types of arguments
> -
>
> Key: SPARK-47214
> URL: https://issues.apache.org/jira/browse/SPARK-47214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47214) Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments

2024-02-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-47214:
-

Assignee: Daniel

> Create API for 'analyze' method to differentiate constant NULL arguments and 
> other types of arguments
> -
>
> Key: SPARK-47214
> URL: https://issues.apache.org/jira/browse/SPARK-47214
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-26 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-47079:
-

Assignee: Desmond Cheong

> Unable to create PySpark dataframe containing Variant columns
> -
>
> Key: SPARK-47079
> URL: https://issues.apache.org/jira/browse/SPARK-47079
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Desmond Cheong
>Assignee: Desmond Cheong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Trying to create a dataframe containing a variant type results in:
> AssertionError: Undefined error message parameter for error class: 
> CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message 
> parameter for error class: CANNOT_PARSE_DATATYPE. Parameters:
> {'error': 'variant'}
> "}
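The doubly nested text above is the tell-tale of an error formatter failing while formatting its own failure. The sketch below reproduces that pattern in plain Python; the `format_error` function and its template table are hypothetical stand-ins, not Spark's real error framework.

```python
# Hypothetical sketch of the nesting pattern in the quoted AssertionError;
# not Spark's actual error-class machinery.
ERROR_CLASSES = {"CANNOT_PARSE_DATATYPE": "Cannot parse datatype: <msg>"}

def format_error(error_class, parameters):
    template = ERROR_CLASSES[error_class]
    for key, value in parameters.items():
        placeholder = f"<{key}>"
        # A parameter with no matching placeholder means the template and
        # the supplied parameters disagree.
        assert placeholder in template, (
            "Undefined error message parameter for error class: "
            f"{error_class}. Parameters: {parameters}"
        )
        template = template.replace(placeholder, str(value))
    return template

# First attempt: the value arrives under the wrong key -> AssertionError.
try:
    format_error("CANNOT_PARSE_DATATYPE", {"error": "variant"})
except AssertionError as e:
    inner = str(e)

# Re-raising through the same path wraps the message a second time,
# producing the doubly nested text seen in the report.
try:
    format_error("CANNOT_PARSE_DATATYPE", {"error": inner})
except AssertionError as e:
    outer = str(e)
```

The outer message embeds the inner one verbatim, which is why the error class name appears twice in the quoted traceback.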






[jira] [Resolved] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns

2024-02-26 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47079.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45131
[https://github.com/apache/spark/pull/45131]

> Unable to create PySpark dataframe containing Variant columns
> -
>
> Key: SPARK-47079
> URL: https://issues.apache.org/jira/browse/SPARK-47079
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Desmond Cheong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Trying to create a dataframe containing a variant type results in:
> AssertionError: Undefined error message parameter for error class: 
> CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message 
> parameter for error class: CANNOT_PARSE_DATATYPE. Parameters:
> {'error': 'variant'}
> "}






[jira] [Resolved] (SPARK-47035) Protocol for client side StreamingQueryListener

2024-02-23 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47035.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45091
[https://github.com/apache/spark/pull/45091]

> Protocol for client side StreamingQueryListener
> ---
>
> Key: SPARK-47035
> URL: https://issues.apache.org/jira/browse/SPARK-47035
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47137:
--
Summary: Add getAll to spark.conf for feature parity with Scala  (was: Add 
getAll for spark.conf for feature parity with Scala)

> Add getAll to spark.conf for feature parity with Scala
> --
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
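For context, the parity in question is with Scala's `RuntimeConfig.getAll`, which returns every session configuration entry at once. The sketch below only illustrates the shape of such an API; the `RuntimeConf` class is a stand-in, not PySpark's implementation.

```python
# Illustrative stand-in for a runtime-config object; not PySpark's code.
class RuntimeConf:
    def __init__(self, entries):
        self._entries = dict(entries)

    def get(self, key, default=None):
        """Look up a single configuration value."""
        return self._entries.get(key, default)

    def getAll(self):
        """Return a snapshot of all entries, mirroring Scala's getAll."""
        return dict(self._entries)
```

Returning a snapshot rather than the live mapping keeps callers from mutating the session's configuration by accident.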







[jira] [Updated] (SPARK-47137) Add getAll for spark.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47137:
--
Summary: Add getAll for spark.conf for feature parity with Scala  (was: Add 
getAll for pyspark.sql.conf for feature parity with Scala)

> Add getAll for spark.conf for feature parity with Scala
> ---
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Created] (SPARK-47137) Add getAll for pyspark.sql.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-47137:
-

 Summary: Add getAll for pyspark.sql.conf for feature parity with 
Scala
 Key: SPARK-47137
 URL: https://issues.apache.org/jira/browse/SPARK-47137
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-47069) Introduce `spark.profile.show/dump` for SparkSession-based profiling

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47069.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45129
[https://github.com/apache/spark/pull/45129]

> Introduce `spark.profile.show/dump` for SparkSession-based profiling
> 
>
> Key: SPARK-47069
> URL: https://issues.apache.org/jira/browse/SPARK-47069
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Introduce `spark.profile.show/dump` for SparkSession-based profiling






[jira] [Updated] (SPARK-47027) Use temporary directories for profiler test outputs

2024-02-12 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47027:
--
Summary: Use temporary directories for profiler test outputs  (was: Move 
TestUtils to the generic testing utils.)

> Use temporary directories for profiler test outputs
> ---
>
> Key: SPARK-47027
> URL: https://issues.apache.org/jira/browse/SPARK-47027
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Created] (SPARK-47027) Move TestUtils to the generic testing utils.

2024-02-12 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-47027:
-

 Summary: Move TestUtils to the generic testing utils.
 Key: SPARK-47027
 URL: https://issues.apache.org/jira/browse/SPARK-47027
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn

2024-02-08 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47002.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45062
[https://github.com/apache/spark/pull/45062]

> Enforce that 'AnalyzeResult' 'orderBy' field is a list of 
> pyspark.sql.functions.OrderingColumn
> --
>
> Key: SPARK-47002
> URL: https://issues.apache.org/jira/browse/SPARK-47002
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn

2024-02-08 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-47002:
-

Assignee: Daniel

> Enforce that 'AnalyzeResult' 'orderBy' field is a list of 
> pyspark.sql.functions.OrderingColumn
> --
>
> Key: SPARK-47002
> URL: https://issues.apache.org/jira/browse/SPARK-47002
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46691) Support profiling on WindowInPandasExec

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46691.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45035
[https://github.com/apache/spark/pull/45035]

> Support profiling on WindowInPandasExec
> ---
>
> Key: SPARK-46691
> URL: https://issues.apache.org/jira/browse/SPARK-46691
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46688) Support profiling on AggregateInPandasExec

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-46688:
-

Assignee: Xinrong Meng

> Support profiling on AggregateInPandasExec
> --
>
> Key: SPARK-46688
> URL: https://issues.apache.org/jira/browse/SPARK-46688
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46688) Support profiling on AggregateInPandasExec

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46688.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45035
[https://github.com/apache/spark/pull/45035]

> Support profiling on AggregateInPandasExec
> --
>
> Key: SPARK-46688
> URL: https://issues.apache.org/jira/browse/SPARK-46688
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-46691) Support profiling on WindowInPandasExec

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-46691:
-

Assignee: Xinrong Meng

> Support profiling on WindowInPandasExec
> ---
>
> Key: SPARK-46691
> URL: https://issues.apache.org/jira/browse/SPARK-46691
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Assigned] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-46966:
-

Assignee: Daniel

> Create API for 'analyze' method to indicate subset of input table columns to 
> select
> ---
>
> Key: SPARK-46966
> URL: https://issues.apache.org/jira/browse/SPARK-46966
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select

2024-02-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46966.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45007
[https://github.com/apache/spark/pull/45007]

> Create API for 'analyze' method to indicate subset of input table columns to 
> select
> ---
>
> Key: SPARK-46966
> URL: https://issues.apache.org/jira/browse/SPARK-46966
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46687) Implement memory-profiler

2024-01-29 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46687.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44775
[https://github.com/apache/spark/pull/44775]

> Implement memory-profiler
> -
>
> Key: SPARK-46687
> URL: https://issues.apache.org/jira/browse/SPARK-46687
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46691) Support profiling on WindowInPandasExec

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46691:
-

 Summary: Support profiling on WindowInPandasExec
 Key: SPARK-46691
 URL: https://issues.apache.org/jira/browse/SPARK-46691
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46690:
-

 Summary: Support profiling on FlatMapCoGroupsInBatchExec
 Key: SPARK-46690
 URL: https://issues.apache.org/jira/browse/SPARK-46690
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46689:
-

 Summary: Support profiling on FlatMapGroupsInBatchExec
 Key: SPARK-46689
 URL: https://issues.apache.org/jira/browse/SPARK-46689
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46688) Support profiling on AggregateInPandasExec

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46688:
-

 Summary: Support profiling on AggregateInPandasExec
 Key: SPARK-46688
 URL: https://issues.apache.org/jira/browse/SPARK-46688
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46687) Implement memory-profiler

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46687:
-

 Summary: Implement memory-profiler
 Key: SPARK-46687
 URL: https://issues.apache.org/jira/browse/SPARK-46687
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46686) Basic support of SparkSession based Python UDF profiler

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46686:
-

 Summary: Basic support of SparkSession based Python UDF profiler
 Key: SPARK-46686
 URL: https://issues.apache.org/jira/browse/SPARK-46686
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-46685) Introduce SparkSession based PySpark UDF profiler

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46685:
-

 Summary: Introduce SparkSession based PySpark UDF profiler
 Key: SPARK-46685
 URL: https://issues.apache.org/jira/browse/SPARK-46685
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin


The existing UDF profilers are SparkContext-based, which can't support Spark 
Connect.

We should introduce SparkSession-based profilers and support Spark Connect.
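A session-scoped profiler conceptually just accumulates per-UDF stats on the session rather than on the context. The stdlib-only sketch below illustrates that accumulation idea; the `SessionProfiler` class and its method names are hypothetical, not PySpark's actual SparkSession-based profiler API.

```python
# Conceptual sketch of session-scoped UDF profiling using only the stdlib;
# SessionProfiler is a hypothetical stand-in, not PySpark's API.
import cProfile
import io
import pstats

class SessionProfiler:
    """Accumulates per-UDF profile stats, keyed by a UDF id."""

    def __init__(self):
        self._stats = {}

    def profile(self, udf_id, func, *args, **kwargs):
        profiler = cProfile.Profile()
        result = profiler.runcall(func, *args, **kwargs)
        stats = pstats.Stats(profiler)
        # Merge with stats from earlier invocations of the same UDF.
        if udf_id in self._stats:
            self._stats[udf_id].add(stats)
        else:
            self._stats[udf_id] = stats
        return result

    def show(self, udf_id):
        out = io.StringIO()
        self._stats[udf_id].stream = out
        self._stats[udf_id].print_stats()
        return out.getvalue()
```

Because the accumulated stats hang off a session-level object instead of a global SparkContext, the same mechanism can, in principle, be driven from a Spark Connect client.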






[jira] [Created] (SPARK-46684) CoGroup.applyInPandas/Arrow should pass arguments properly

2024-01-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-46684:
-

 Summary: CoGroup.applyInPandas/Arrow should pass arguments properly
 Key: SPARK-46684
 URL: https://issues.apache.org/jira/browse/SPARK-46684
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


In Spark Connect, {{CoGroup.applyInPandas/Arrow}} doesn't pass its arguments 
through properly, so the arguments the UDF receives can be broken:
{noformat}
>>> import pandas as pd
>>>
>>> df1 = spark.createDataFrame(
...     [(1, 1.0, "a"), (2, 2.0, "b"), (1, 3.0, "c"), (2, 4.0, "d")], ("id", "v1", "v2")
... )
>>> df2 = spark.createDataFrame([(1, "x"), (2, "y"), (1, "z")], ("id", "v3"))
>>>
>>> def summarize(left, right):
...     return pd.DataFrame(
...         {
...             "left_rows": [len(left)],
...             "left_columns": [len(left.columns)],
...             "right_rows": [len(right)],
...             "right_columns": [len(right.columns)],
...         }
...     )
...
>>> df = (
...     df1.groupby("id")
...     .cogroup(df2.groupby("id"))
...     .applyInPandas(
...         summarize,
...         schema="left_rows long, left_columns long, right_rows long, right_columns long",
...     )
... )
>>>
>>> df.show()
+---------+------------+----------+-------------+
|left_rows|left_columns|right_rows|right_columns|
+---------+------------+----------+-------------+
|        2|           1|         2|            1|
|        2|           1|         1|            1|
+---------+------------+----------+-------------+
{noformat}

The result should be:

{noformat}
+---------+------------+----------+-------------+
|left_rows|left_columns|right_rows|right_columns|
+---------+------------+----------+-------------+
|        2|           3|         2|            2|
|        2|           3|         1|            2|
+---------+------------+----------+-------------+
{noformat}






[jira] [Resolved] (SPARK-46040) Update API for 'analyze' partitioning/ordering columns to support general expressions

2023-12-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-46040.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43946
[https://github.com/apache/spark/pull/43946]

> Update API for 'analyze' partitioning/ordering columns to support general 
> expressions
> -
>
> Key: SPARK-46040
> URL: https://issues.apache.org/jira/browse/SPARK-46040
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values

2023-11-29 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45746.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43611
https://github.com/apache/spark/pull/43611

> Return specific error messages if UDTF 'analyze' method accepts or returns 
> wrong values
> ---
>
> Key: SPARK-45746
> URL: https://issues.apache.org/jira/browse/SPARK-45746
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45810) Create API to stop consuming rows from the input table

2023-11-15 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45810.
---
  Assignee: Daniel
Resolution: Fixed

Issue resolved by pull request 43682
https://github.com/apache/spark/pull/43682

> Create API to stop consuming rows from the input table
> --
>
> Key: SPARK-45810
> URL: https://issues.apache.org/jira/browse/SPARK-45810
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45813) Return the observed metrics from commands

2023-11-06 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45813:
-

 Summary: Return the observed metrics from commands
 Key: SPARK-45813
 URL: https://issues.apache.org/jira/browse/SPARK-45813
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-45656) Fix observation when named observations with the same name on different datasets.

2023-10-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45656:
-

 Summary: Fix observation when named observations with the same 
name on different datasets.
 Key: SPARK-45656
 URL: https://issues.apache.org/jira/browse/SPARK-45656
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45524) Initial support for Python data source read API

2023-10-24 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45524.
---
Fix Version/s: 4.0.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 43360
https://github.com/apache/spark/pull/43360

> Initial support for Python data source read API
> ---
>
> Key: SPARK-45524
> URL: https://issues.apache.org/jira/browse/SPARK-45524
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add API for data source and data source reader and add Catalyst + execution 
> support.
>  






[jira] [Created] (SPARK-45620) Fix user-facing APIs related to Python UDTF to use camelCase.

2023-10-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45620:
-

 Summary: Fix user-facing APIs related to Python UDTF to use 
camelCase.
 Key: SPARK-45620
 URL: https://issues.apache.org/jira/browse/SPARK-45620
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45523) Return useful error message if UDTF returns None for non-nullable column

2023-10-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45523.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43356
https://github.com/apache/spark/pull/43356

> Return useful error message if UDTF returns None for non-nullable column
> 
>
> Key: SPARK-45523
> URL: https://issues.apache.org/jira/browse/SPARK-45523
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45619) Apply the observed metrics to Observation object.

2023-10-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45619:
-

 Summary: Apply the observed metrics to Observation object.
 Key: SPARK-45619
 URL: https://issues.apache.org/jira/browse/SPARK-45619
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45577) Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments

2023-10-17 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45577.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 43407
https://github.com/apache/spark/pull/43407

> Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from 
> named arguments
> --
>
> Key: SPARK-45577
> URL: https://issues.apache.org/jira/browse/SPARK-45577
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45577) Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments

2023-10-17 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45577:
-

 Summary: Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass 
folded values from named arguments
 Key: SPARK-45577
 URL: https://issues.apache.org/jira/browse/SPARK-45577
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45505) Refactor analyzeInPython function to make it reusable

2023-10-12 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45505.
---
Fix Version/s: 4.0.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 43340
https://github.com/apache/spark/pull/43340

> Refactor analyzeInPython function to make it reusable
> -
>
> Key: SPARK-45505
> URL: https://issues.apache.org/jira/browse/SPARK-45505
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refactor analyzeInPython method in UserDefinedPythonTableFunction object into 
> an abstract class so that it can be reused in the future.






[jira] [Resolved] (SPARK-45402) Add API for 'analyze' method to return a buffer to be consumed on each class creation

2023-10-11 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45402.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43204
https://github.com/apache/spark/pull/43204

> Add API for 'analyze' method to return a buffer to be consumed on each class 
> creation
> -
>
> Key: SPARK-45402
> URL: https://issues.apache.org/jira/browse/SPARK-45402
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils

2023-10-10 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45494:
-

 Summary: Introduce read/write a byte array util functions for 
PythonWorkerUtils
 Key: SPARK-45494
 URL: https://issues.apache.org/jira/browse/SPARK-45494
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-45441) Introduce more util functions for PythonWorkerUtils

2023-10-06 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45441:
-

 Summary: Introduce more util functions for PythonWorkerUtils
 Key: SPARK-45441
 URL: https://issues.apache.org/jira/browse/SPARK-45441
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-45405) Refactor Python UDTF execution

2023-10-03 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45405:
-

 Summary: Refactor Python UDTF execution
 Key: SPARK-45405
 URL: https://issues.apache.org/jira/browse/SPARK-45405
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45362) Project out PARTITION BY expressions before 'eval' method consumes input rows

2023-09-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45362.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43156
https://github.com/apache/spark/pull/43156

> Project out PARTITION BY expressions before 'eval' method consumes input rows
> -
>
> Key: SPARK-45362
> URL: https://issues.apache.org/jira/browse/SPARK-45362
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used

2023-09-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45266.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 43042
https://github.com/apache/spark/pull/43042

> Refactor ResolveFunctions analyzer rule to delay making lateral join when 
> table arguments are used
> --
>
> Key: SPARK-45266
> URL: https://issues.apache.org/jira/browse/SPARK-45266
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used

2023-09-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45266:
-

 Summary: Refactor ResolveFunctions analyzer rule to delay making 
lateral join when table arguments are used
 Key: SPARK-45266
 URL: https://issues.apache.org/jira/browse/SPARK-45266
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters

2023-09-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45118.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42874
https://github.com/apache/spark/pull/42874

> Refactor converters for complex types to short cut when the element types 
> don't need converters
> ---
>
> Key: SPARK-45118
> URL: https://issues.apache.org/jira/browse/SPARK-45118
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45159) Handle named arguments only when necessary

2023-09-13 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45159:
-

 Summary: Handle named arguments only when necessary
 Key: SPARK-45159
 URL: https://issues.apache.org/jira/browse/SPARK-45159
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters

2023-09-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-45118:
-

 Summary: Refactor converters for complex types to short cut when 
the element types don't need converters
 Key: SPARK-45118
 URL: https://issues.apache.org/jira/browse/SPARK-45118
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-44901) Add API in 'analyze' method to return partitioning/ordering expressions

2023-09-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44901.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 42595
https://github.com/apache/spark/pull/42595

> Add API in 'analyze' method to return partitioning/ordering expressions
> ---
>
> Key: SPARK-44901
> URL: https://issues.apache.org/jira/browse/SPARK-44901
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-44952) Add named argument support for aggregate Pandas UDFs

2023-09-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44952.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42663
https://github.com/apache/spark/pull/42663

> Add named argument support for aggregate Pandas UDFs
> 
>
> Key: SPARK-44952
> URL: https://issues.apache.org/jira/browse/SPARK-44952
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-44952) Add named argument support for aggregate Pandas UDFs

2023-08-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44952:
-

 Summary: Add named argument support for aggregate Pandas UDFs
 Key: SPARK-44952
 URL: https://issues.apache.org/jira/browse/SPARK-44952
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-44918) Add named argument support for scalar Python/Pandas UDFs

2023-08-24 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44918.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42617
https://github.com/apache/spark/pull/42617

> Add named argument support for scalar Python/Pandas UDFs
> 
>
> Key: SPARK-44918
> URL: https://issues.apache.org/jira/browse/SPARK-44918
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-44918) Add named argument support for scalar Python/Pandas UDFs

2023-08-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44918:
-

 Summary: Add named argument support for scalar Python/Pandas UDFs
 Key: SPARK-44918
 URL: https://issues.apache.org/jira/browse/SPARK-44918
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-44748) Query execution to support PARTITION BY and ORDER BY clause for table arguments

2023-08-21 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44748.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 42420
https://github.com/apache/spark/pull/42420

> Query execution to support PARTITION BY and ORDER BY clause for table 
> arguments
> ---
>
> Key: SPARK-44748
> URL: https://issues.apache.org/jira/browse/SPARK-44748
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-44876) Enable and fix test_parity_arrow_python_udf

2023-08-18 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44876:
--
Description: 
{{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in 
{{dev/sparktestsupport/modules.py}}, and it fails when running manually.

{code}
==
ERROR [0.072s]: test_register 
(pyspark.sql.tests.connect.test_parity_arrow_python_udf.ArrowPythonUDFParityTests)
--
Traceback (most recent call last):
...
pyspark.errors.exceptions.base.PySparkRuntimeError: 
[SCHEMA_MISMATCH_FOR_PANDAS_UDF] Result vector from pandas_udf was not the 
required length: expected 1, got 38.
{code}

  was:{{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed 
in {{dev/sparktestsupport/modules.py}}, and it fails when running manually.


> Enable and fix test_parity_arrow_python_udf
> ---
>
> Key: SPARK-44876
> URL: https://issues.apache.org/jira/browse/SPARK-44876
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Blocker
>
> {{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in 
> {{dev/sparktestsupport/modules.py}}, and it fails when running manually.
> {code}
> ==
> ERROR [0.072s]: test_register 
> (pyspark.sql.tests.connect.test_parity_arrow_python_udf.ArrowPythonUDFParityTests)
> --
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.base.PySparkRuntimeError: 
> [SCHEMA_MISMATCH_FOR_PANDAS_UDF] Result vector from pandas_udf was not the 
> required length: expected 1, got 38.
> {code}






[jira] [Created] (SPARK-44876) Enable and fix test_parity_arrow_python_udf

2023-08-18 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44876:
-

 Summary: Enable and fix test_parity_arrow_python_udf
 Key: SPARK-44876
 URL: https://issues.apache.org/jira/browse/SPARK-44876
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


{{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in 
{{dev/sparktestsupport/modules.py}}, and it fails when running manually.






[jira] [Resolved] (SPARK-44834) Add SQL query test suites for Python UDTFs

2023-08-17 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44834.
---
Fix Version/s: 3.5.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 42517
https://github.com/apache/spark/pull/42517

> Add SQL query test suites for Python UDTFs
> --
>
> Key: SPARK-44834
> URL: https://issues.apache.org/jira/browse/SPARK-44834
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Add SQL query test suites for executing Python UDTFs in SQL.






[jira] [Resolved] (SPARK-44836) Refactor Arrow Python UDTF

2023-08-16 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44836.
---
Fix Version/s: 3.5.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42520
https://github.com/apache/spark/pull/42520

> Refactor Arrow Python UDTF
> --
>
> Key: SPARK-44836
> URL: https://issues.apache.org/jira/browse/SPARK-44836
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44836) Refactor Arrow Python UDTF

2023-08-16 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44836:
-

 Summary: Refactor Arrow Python UDTF
 Key: SPARK-44836
 URL: https://issues.apache.org/jira/browse/SPARK-44836
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-44749) Support named arguments in Python UDTF

2023-08-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44749.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42422
https://github.com/apache/spark/pull/42422

> Support named arguments in Python UDTF
> --
>
> Key: SPARK-44749
> URL: https://issues.apache.org/jira/browse/SPARK-44749
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-44749) Support named arguments in Python UDTF

2023-08-09 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44749:
-

 Summary: Support named arguments in Python UDTF
 Key: SPARK-44749
 URL: https://issues.apache.org/jira/browse/SPARK-44749
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type

2023-08-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44561.
---
Fix Version/s: 3.5.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42310
https://github.com/apache/spark/pull/42310

> Fix AssertionError when converting UDTF output to a complex type
> 
>
> Key: SPARK-44561
> URL: https://issues.apache.org/jira/browse/SPARK-44561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0
>
>
> {code:java}
> class TestUDTF:
> def eval(self):
> yield {'a': 1, 'b': 2},
> udtf(TestUDTF, returnType="x: map")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`






[jira] [Resolved] (SPARK-44433) Implement termination of Python process for foreachBatch & streaming listener

2023-08-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44433.
---
  Assignee: Wei Liu
Resolution: Fixed

Issue resolved by pull request 42283
https://github.com/apache/spark/pull/42283

> Implement termination of Python process for foreachBatch & streaming listener
> -
>
> Key: SPARK-44433
> URL: https://issues.apache.org/jira/browse/SPARK-44433
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Wei Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> In the first implementation of Python support for foreachBatch, the python 
> process termination is not handled correctly. 
>  
> See the long TODO in 
> [https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala]
>  
> about an outline of the feature to terminate the runners by registering 
> StreamingQueryListners. 






[jira] [Resolved] (SPARK-44663) Disable arrow optimization by default for Python UDTFs

2023-08-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44663.
---
Fix Version/s: 3.5.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 42329
https://github.com/apache/spark/pull/42329

> Disable arrow optimization by default for Python UDTFs
> --
>
> Key: SPARK-44663
> URL: https://issues.apache.org/jira/browse/SPARK-44663
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Disable arrow optimization to make Python UDTFs consistent with Python UDFs.
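The change described above can be illustrated with a short sketch. This is an assumption about the session-level knob, not the patch itself: the config key below follows Spark's naming for similar per-feature Arrow settings and should be checked against the 3.5 release notes.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed config key: with this set to "false", Python UDTFs use the
# pickle-based (non-Arrow) execution path, matching Python UDF defaults.
spark.conf.set("spark.sql.execution.pythonUDTF.arrow.enabled", "false")
```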






[jira] [Resolved] (SPARK-44644) Improve error messages for creating Python UDTFs with pickling errors

2023-08-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44644.
---
   Fix Version/s: 4.0.0
Target Version/s: 3.5.0
Assignee: Allison Wang
  Resolution: Fixed

Issue resolved by pull request 42309
https://github.com/apache/spark/pull/42309

> Improve error messages for creating Python UDTFs with pickling errors
> -
>
> Key: SPARK-44644
> URL: https://issues.apache.org/jira/browse/SPARK-44644
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently, when users create a Python UDTF with a non-pickleable object, it 
> throws this error:
> _pickle.PicklingError: Cannot pickle files that are not opened for reading: w
>  
> We should make this more user-friendly.
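The failure mode above can be reproduced without Spark. This is a minimal sketch using the standard-library pickler, not PySpark's actual code path: PySpark serializes UDTFs with cloudpickle, which emits the "Cannot pickle files that are not opened for reading" message quoted above, while plain {{pickle}} raises a {{TypeError}} instead; the class name is a hypothetical stand-in for a user's UDTF.

```python
import pickle
import tempfile

class UDTFWithFile:  # hypothetical stand-in for a user's UDTF class
    def __init__(self):
        # Capturing an open, write-mode file handle makes the instance
        # unpicklable, which is the situation the issue describes.
        self.log = tempfile.TemporaryFile("w")

    def eval(self):
        yield (1,)

try:
    pickle.dumps(UDTFWithFile())
    pickling_failed = False
except TypeError as exc:
    # stdlib pickle raises TypeError here; cloudpickle raises
    # _pickle.PicklingError with the message quoted in the issue.
    pickling_failed = True
    print(f"pickling failed: {exc}")
```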






[jira] [Resolved] (SPARK-44648) Set up memory limits for analyze in Python.

2023-08-04 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44648.
---
  Assignee: Takuya Ueshin
Resolution: Fixed

Issue resolved by pull request 42328
https://github.com/apache/spark/pull/42328

> Set up memory limits for analyze in Python.
> ---
>
> Key: SPARK-44648
> URL: https://issues.apache.org/jira/browse/SPARK-44648
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>







[jira] [Created] (SPARK-44648) Set up memory limits for analyze in Python.

2023-08-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44648:
-

 Summary: Set up memory limits for analyze in Python.
 Key: SPARK-44648
 URL: https://issues.apache.org/jira/browse/SPARK-44648
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-44643) __repr__ broken for Row when the field is empty Row

2023-08-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44643:
-

 Summary: __repr__ broken for Row when the field is empty Row
 Key: SPARK-44643
 URL: https://issues.apache.org/jira/browse/SPARK-44643
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


PySpark {{Row}} raises an exception if a field is an empty {{Row}}:

{code:python}
>>> repr(Row(Row()))
Traceback (most recent call last):
...
TypeError: not enough arguments for format string
{code}
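The underlying Python behaviour is easy to demonstrate: {{%}}-formatting treats a tuple operand as its argument list, so an empty tuple supplies zero arguments for the format string (a hypothetical sketch of the root cause; the actual {{Row.__repr__}} code path may differ):

```python
# "%s" % value unpacks a tuple operand into format arguments,
# so an empty tuple provides no argument for the single "%s"
try:
    "%s" % ()
except TypeError as e:
    message = str(e)  # "not enough arguments for format string"

# wrapping the value in a one-element tuple formats it as a single argument
formatted = "%s" % ((),)  # "()"
```

The usual fix for this class of bug is to always wrap the formatted value in a one-element tuple, as on the last line.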






[jira] [Updated] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44561:
--
Fix Version/s: (was: 4.0.0)

> Fix AssertionError when converting UDTF output to a complex type
> 
>
> Key: SPARK-44561
> URL: https://issues.apache.org/jira/browse/SPARK-44561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> {code:java}
> class TestUDTF:
> def eval(self):
> yield {'a': 1, 'b': 2},
> udtf(TestUDTF, returnType="x: map<string,int>")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`






[jira] [Assigned] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-44561:
-

Assignee: (was: Allison Wang)

> Fix AssertionError when converting UDTF output to a complex type
> 
>
> Key: SPARK-44561
> URL: https://issues.apache.org/jira/browse/SPARK-44561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> class TestUDTF:
> def eval(self):
> yield {'a': 1, 'b': 2},
> udtf(TestUDTF, returnType="x: map<string,int>")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`






[jira] [Assigned] (SPARK-44559) Improve error messages for Python UDTF arrow type casts

2023-08-02 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin reassigned SPARK-44559:
-

Assignee: Allison Wang

> Improve error messages for Python UDTF arrow type casts
> ---
>
> Key: SPARK-44559
> URL: https://issues.apache.org/jira/browse/SPARK-44559
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently, if a Python UDTF outputs a type that is incompatible with the 
> specified output schema, Spark will throw the following confusing error 
> message:
> {code:java}
>   File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert [1, 2] with type list: tried to 
> convert to int32{code}
> We should improve this.






[jira] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread Takuya Ueshin (Jira)


[ https://issues.apache.org/jira/browse/SPARK-44561 ]


Takuya Ueshin deleted comment on SPARK-44561:
---

was (Author: ueshin):
Issue resolved by pull request 42191
https://github.com/apache/spark/pull/42191

> Fix AssertionError when converting UDTF output to a complex type
> 
>
> Key: SPARK-44561
> URL: https://issues.apache.org/jira/browse/SPARK-44561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> class TestUDTF:
> def eval(self):
> yield {'a': 1, 'b': 2},
> udtf(TestUDTF, returnType="x: map<string,int>")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`






[jira] [Updated] (SPARK-44614) Add missing packages in setup.py

2023-08-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44614:
--
Fix Version/s: 3.5.0

> Add missing packages in setup.py
> 
>
> Key: SPARK-44614
> URL: https://issues.apache.org/jira/browse/SPARK-44614
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Some packages for the SQL module are missing in the {{setup.py}} file.






[jira] [Resolved] (SPARK-44614) Add missing packages in setup.py

2023-08-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44614.
---
Fix Version/s: 4.0.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 42248
https://github.com/apache/spark/pull/42248

> Add missing packages in setup.py
> 
>
> Key: SPARK-44614
> URL: https://issues.apache.org/jira/browse/SPARK-44614
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 4.0.0
>
>
> Some packages for the SQL module are missing in the {{setup.py}} file.






[jira] [Resolved] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type

2023-07-31 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44561.
---
Fix Version/s: 4.0.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 42191
https://github.com/apache/spark/pull/42191

> Fix AssertionError when converting UDTF output to a complex type
> 
>
> Key: SPARK-44561
> URL: https://issues.apache.org/jira/browse/SPARK-44561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> class TestUDTF:
> def eval(self):
> yield {'a': 1, 'b': 2},
> udtf(TestUDTF, returnType="x: map<string,int>")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`






[jira] [Created] (SPARK-44614) Add missing packages in setup.py

2023-07-31 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44614:
-

 Summary: Add missing packages in setup.py
 Key: SPARK-44614
 URL: https://issues.apache.org/jira/browse/SPARK-44614
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


Some packages for the SQL module are missing in the {{setup.py}} file.






[jira] [Resolved] (SPARK-44603) Add pyspark.testing to setup.py

2023-07-31 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44603.
---
Fix Version/s: 3.5.0
 Assignee: Amanda Liu
   Resolution: Fixed

Issue resolved by pull request 42231
https://github.com/apache/spark/pull/42231

> Add pyspark.testing to setup.py
> ---
>
> Key: SPARK-44603
> URL: https://issues.apache.org/jira/browse/SPARK-44603
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44479:
--
Fix Version/s: 3.5.0

> Support Python UDTFs with empty schema
> --
>
> Key: SPARK-44479
> URL: https://issues.apache.org/jira/browse/SPARK-44479
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0
>
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...   def eval(self):
> ... yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
> ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.






[jira] [Resolved] (SPARK-43968) Improve error messages for Python UDTFs with wrong number of outputs

2023-07-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-43968.
---
Fix Version/s: 4.0.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 42157
https://github.com/apache/spark/pull/42157

> Improve error messages for Python UDTFs with wrong number of outputs
> 
>
> Key: SPARK-43968
> URL: https://issues.apache.org/jira/browse/SPARK-43968
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> Improve the error messages for Python UDTFs when the number of output columns 
> does not match the number of columns specified in the UDTF's return type.
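A hypothetical sketch of the kind of check such an error message could come from: compare each yielded row's width against the declared schema and raise a descriptive error. The helper name and wording below are illustrative, not the actual PySpark implementation:

```python
def check_udtf_row(row, expected_cols):
    """Validate that a UDTF output row matches the declared schema width."""
    actual = len(row)
    if actual != expected_cols:
        raise ValueError(
            f"Python UDTF returned {actual} column(s), "
            f"but the declared return type expects {expected_cols}"
        )
    return row

check_udtf_row((1, "a"), 2)  # widths match: no error
```

A mismatched row, e.g. {{check_udtf_row((1,), 2)}}, would raise the descriptive {{ValueError}} instead of an opaque serialization failure.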






[jira] [Resolved] (SPARK-44533) Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze.

2023-07-26 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44533.
---
  Assignee: Takuya Ueshin
Resolution: Fixed

Issue resolved by pull request 42135
https://github.com/apache/spark/pull/42135

> Add support for accumulator, broadcast, and Spark files in Python UDTF's 
> analyze.
> -
>
> Key: SPARK-44533
> URL: https://issues.apache.org/jira/browse/SPARK-44533
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>







[jira] [Resolved] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-26 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44479.
---
  Assignee: Takuya Ueshin
Resolution: Fixed

Issue resolved by pull request 42161
https://github.com/apache/spark/pull/42161

> Support Python UDTFs with empty schema
> --
>
> Key: SPARK-44479
> URL: https://issues.apache.org/jira/browse/SPARK-44479
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...   def eval(self):
> ... yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
> ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.
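The quoted {{ValueError}} is ordinary Python unpacking behaviour: a consumer that expects each row to split into two parts fails on an empty tuple. A minimal sketch (illustrative only, not the actual Arrow serializer code):

```python
def unpack_pairs(rows):
    # mimics a consumer that assumes every row unpacks into two parts
    return [(a, b) for a, b in rows]

try:
    unpack_pairs([()])  # an empty tuple, as yielded by a UDTF with empty schema
    message = None
except ValueError as e:
    message = str(e)  # "not enough values to unpack (expected 2, got 0)"
```

Supporting empty schemas means the Arrow path must handle zero-width rows instead of assuming each row splits into at least two parts.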






[jira] [Resolved] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments

2023-07-24 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44503.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 42100
https://github.com/apache/spark/pull/42100

> Support PARTITION BY and ORDER BY clause for table arguments
> 
>
> Key: SPARK-44503
> URL: https://issues.apache.org/jira/browse/SPARK-44503
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-44533) Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze.

2023-07-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44533:
-

 Summary: Add support for accumulator, broadcast, and Spark files 
in Python UDTF's analyze.
 Key: SPARK-44533
 URL: https://issues.apache.org/jira/browse/SPARK-44533
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-18 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44479:
--
Description: 
Support UDTFs with empty schema, for example:

{code:python}
>>> class TestUDTF:
...   def eval(self):
... yield tuple()
{code}

Currently it fails with `useArrow=True`:

{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
...
ValueError: not enough values to unpack (expected 2, got 0)
{code}

whereas without Arrow:

{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}

Otherwise, we should raise an error without Arrow, too.


  was:
Support UDTFs with empty schema, for example:

{code:python}
>>> class TestUDTF:
...   def eval(self):
... yield tuple()
{code}

Currently it fails with `useArrow=True`:

{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
...
ValueError: not enough values to unpack (expected 2, got 0)
{code}

whereas without Arrow:

{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}



> Support Python UDTFs with empty schema
> --
>
> Key: SPARK-44479
> URL: https://issues.apache.org/jira/browse/SPARK-44479
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...   def eval(self):
> ... yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
> ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too.






[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-18 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44479:
--
Description: 
Support UDTFs with empty schema, for example:

{code:python}
>>> class TestUDTF:
...   def eval(self):
... yield tuple()
{code}

Currently it fails with `useArrow=True`:

{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
...
ValueError: not enough values to unpack (expected 2, got 0)
{code}

whereas without Arrow:

{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}

Otherwise, we should raise an error without Arrow, too, to be consistent.


  was:
Support UDTFs with empty schema, for example:

{code:python}
>>> class TestUDTF:
...   def eval(self):
... yield tuple()
{code}

Currently it fails with `useArrow=True`:

{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
...
ValueError: not enough values to unpack (expected 2, got 0)
{code}

whereas without Arrow:

{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}

Otherwise, we should raise an error without Arrow, too.



> Support Python UDTFs with empty schema
> --
>
> Key: SPARK-44479
> URL: https://issues.apache.org/jira/browse/SPARK-44479
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...   def eval(self):
> ... yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
> ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.






[jira] [Created] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-18 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44479:
-

 Summary: Support Python UDTFs with empty schema
 Key: SPARK-44479
 URL: https://issues.apache.org/jira/browse/SPARK-44479
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


Support UDTFs with empty schema, for example:

{code:python}
>>> class TestUDTF:
...   def eval(self):
... yield tuple()
{code}

Currently it fails with `useArrow=True`:

{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
...
ValueError: not enough values to unpack (expected 2, got 0)
{code}

whereas without Arrow:

{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}







[jira] [Resolved] (SPARK-44395) Update table function arguments to require parentheses around identifier after the TABLE keyword

2023-07-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44395.
---
Fix Version/s: 3.5.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 41965
https://github.com/apache/spark/pull/41965

> Update table function arguments to require parentheses around identifier 
> after the TABLE keyword
> 
>
> Key: SPARK-44395
> URL: https://issues.apache.org/jira/browse/SPARK-44395
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 3.5.0
>
>
> Per the SQL standard, `TABLE identifier` should actually be passed as 
> `TABLE(identifier)`. 






[jira] [Created] (SPARK-44380) Support for UDTF to analyze in Python

2023-07-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44380:
-

 Summary: Support for UDTF to analyze in Python
 Key: SPARK-44380
 URL: https://issues.apache.org/jira/browse/SPARK-44380
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-44249) Refactor PythonUDTFRunner to send its return type separately

2023-06-29 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44249:
-

 Summary: Refactor PythonUDTFRunner to send its return type 
separately
 Key: SPARK-44249
 URL: https://issues.apache.org/jira/browse/SPARK-44249
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44233:
--
Description: 
{code:python}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}
The subquery {{(select * from t)}} does not appear to look up the outer outer 
context and fails to resolve {{t}}.

  was:
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}
The subquery (select * from t) does not appear to look up the outer outer 
context and fails to resolve t.


> Support an outer outer context in subquery resolution
> -
>
> Key: SPARK-44233
> URL: https://issues.apache.org/jira/browse/SPARK-44233
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.captured.AnalysisException: 
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the 
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema() 
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF 
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
>    :  +- 'SubqueryAlias s
>    :     +- 'Project [*]
>    :        +- 'UnresolvedRelation [t], [], false
>    +- SubqueryAlias t
>       +- Range (0, 8, step=1, splits=None){code}
> The subquery {{(select * from t)}} does not appear to look up the outer outer 
> context and fails to resolve {{t}}.






[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44233:
--
Description: 
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}
The subquery (select * from t) does not appear to look up the outer outer 
context and fails to resolve t.

  was:
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}


> Support an outer outer context in subquery resolution
> -
>
> Key: SPARK-44233
> URL: https://issues.apache.org/jira/browse/SPARK-44233
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:java}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.captured.AnalysisException: 
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the 
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema() 
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF 
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
>    :  +- 'SubqueryAlias s
>    :     +- 'Project [*]
>    :        +- 'UnresolvedRelation [t], [], false
>    +- SubqueryAlias t
>       +- Range (0, 8, step=1, splits=None){code}
> The subquery (select * from t) does not seem to look at the outer outer
> context, and so fails to resolve t.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44233:
--
Description: 
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}

> Support an outer outer context in subquery resolution
> -
>
> Key: SPARK-44233
> URL: https://issues.apache.org/jira/browse/SPARK-44233
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:java}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.captured.AnalysisException: 
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the 
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema() 
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF 
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
>    :  +- 'SubqueryAlias s
>    :     +- 'Project [*]
>    :        +- 'UnresolvedRelation [t], [], false
>    +- SubqueryAlias t
>       +- Range (0, 8, step=1, splits=None){code}






[jira] [Created] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44233:
-

 Summary: Support an outer outer context in subquery resolution
 Key: SPARK-44233
 URL: https://issues.apache.org/jira/browse/SPARK-44233
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Takuya Ueshin








