[jira] [Resolved] (SPARK-47921) Fix ExecuteJobTag creation in ExecuteHolder
[ https://issues.apache.org/jira/browse/SPARK-47921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47921. --- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46140 [https://github.com/apache/spark/pull/46140] > Fix ExecuteJobTag creation in ExecuteHolder > --- > > Key: SPARK-47921 > URL: https://issues.apache.org/jira/browse/SPARK-47921 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47346) Make daemon mode configurable when creating Python workers
[ https://issues.apache.org/jira/browse/SPARK-47346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47346. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45468 [https://github.com/apache/spark/pull/45468] > Make daemon mode configurable when creating Python workers > -- > > Key: SPARK-47346 > URL: https://issues.apache.org/jira/browse/SPARK-47346 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47346) Make daemon mode configurable when creating Python workers
[ https://issues.apache.org/jira/browse/SPARK-47346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-47346: - Assignee: Allison Wang > Make daemon mode configurable when creating Python workers > -- > > Key: SPARK-47346 > URL: https://issues.apache.org/jira/browse/SPARK-47346 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47251) Block invalid types from the `args` argument for `sql` command
[ https://issues.apache.org/jira/browse/SPARK-47251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47251: -- Summary: Block invalid types from the `args` argument for `sql` command (was: Block invalid types from the `arg` argument for `sql` command) > Block invalid types from the `args` argument for `sql` command > -- > > Key: SPARK-47251 > URL: https://issues.apache.org/jira/browse/SPARK-47251 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.1 >Reporter: Takuya Ueshin >Priority: Major
[jira] [Created] (SPARK-47251) Block invalid types from the `arg` argument for `sql` command
Takuya Ueshin created SPARK-47251: - Summary: Block invalid types from the `arg` argument for `sql` command Key: SPARK-47251 URL: https://issues.apache.org/jira/browse/SPARK-47251 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.1 Reporter: Takuya Ueshin
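As a rough illustration of the kind of type check this ticket calls for, here is a plain-Python sketch of validating the `args` parameter of a parameterized `sql` call — a dict of named parameters, a list/tuple of positional ones, or None. The function name and error message are hypothetical, not PySpark's actual implementation:

```python
def validate_sql_args(args):
    """Accept only the shapes a parameterized `sql` call supports:
    a dict (named parameters), a list/tuple (positional), or None."""
    if args is None or isinstance(args, (dict, list, tuple)):
        return args
    # Fail fast with a clear message instead of an obscure downstream error.
    raise TypeError(
        f"args should be a dict or a list/tuple, got {type(args).__name__}"
    )

print(validate_sql_args({"minAge": 21}))  # a dict of named parameters passes
```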
[jira] [Resolved] (SPARK-47214) Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments
[ https://issues.apache.org/jira/browse/SPARK-47214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47214. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45319 [https://github.com/apache/spark/pull/45319] > Create API for 'analyze' method to differentiate constant NULL arguments and > other types of arguments > - > > Key: SPARK-47214 > URL: https://issues.apache.org/jira/browse/SPARK-47214 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47214) Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments
[ https://issues.apache.org/jira/browse/SPARK-47214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-47214: - Assignee: Daniel > Create API for 'analyze' method to differentiate constant NULL arguments and > other types of arguments > - > > Key: SPARK-47214 > URL: https://issues.apache.org/jira/browse/SPARK-47214 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available
[jira] [Assigned] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-47079: - Assignee: Desmond Cheong > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Assignee: Desmond Cheong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "}
[jira] [Resolved] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47079. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45131 [https://github.com/apache/spark/pull/45131] > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "}
[jira] [Resolved] (SPARK-47035) Protocol for client side StreamingQueryListener
[ https://issues.apache.org/jira/browse/SPARK-47035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47035. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45091 [https://github.com/apache/spark/pull/45091] > Protocol for client side StreamingQueryListener > --- > > Key: SPARK-47035 > URL: https://issues.apache.org/jira/browse/SPARK-47035 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47137: -- Summary: Add getAll to spark.conf for feature parity with Scala (was: Add getAll for spark.conf for feature parity with Scala) > Add getAll to spark.conf for feature parity with Scala > -- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major
[jira] [Updated] (SPARK-47137) Add getAll for spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47137: -- Summary: Add getAll for spark.conf for feature parity with Scala (was: Add getAll for pyspark.sql.conf for feature parity with Scala) > Add getAll for spark.conf for feature parity with Scala > --- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major
[jira] [Created] (SPARK-47137) Add getAll for pyspark.sql.conf for feature parity with Scala
Takuya Ueshin created SPARK-47137: - Summary: Add getAll for pyspark.sql.conf for feature parity with Scala Key: SPARK-47137 URL: https://issues.apache.org/jira/browse/SPARK-47137 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
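To illustrate the parity this ticket asks for, here is a minimal dict-backed sketch of a runtime-conf object where `getAll` returns every configured entry alongside the existing `get`/`set`. The class name and in-memory storage are hypothetical; PySpark's real `spark.conf` is backed by the session state on the JVM (or Spark Connect) side, and Scala's `spark.conf.getAll` returns a `Map`, for which a plain dict is the natural Python analogue:

```python
class FakeRuntimeConf:
    """Illustrative stand-in for a session runtime configuration."""

    def __init__(self):
        self._entries = {}

    def set(self, key, value):
        self._entries[key] = value

    def get(self, key, default=None):
        return self._entries.get(key, default)

    def getAll(self):
        # Return a snapshot copy so callers cannot mutate internal state.
        return dict(self._entries)

conf = FakeRuntimeConf()
conf.set("spark.sql.shuffle.partitions", "200")
conf.set("spark.sql.ansi.enabled", "true")
print(conf.getAll())
```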
[jira] [Resolved] (SPARK-47069) Introduce `spark.profile.show/dump` for SparkSession-based profiling
[ https://issues.apache.org/jira/browse/SPARK-47069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47069. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45129 [https://github.com/apache/spark/pull/45129] > Introduce `spark.profile.show/dump` for SparkSession-based profiling > > > Key: SPARK-47069 > URL: https://issues.apache.org/jira/browse/SPARK-47069 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Introduce `spark.profile.show/dump` for SparkSession-based profiling
[jira] [Updated] (SPARK-47027) Use temporary directories for profiler test outputs
[ https://issues.apache.org/jira/browse/SPARK-47027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47027: -- Summary: Use temporary directories for profiler test outputs (was: Move TestUtils to the generic testing utils.) > Use temporary directories for profiler test outputs > --- > > Key: SPARK-47027 > URL: https://issues.apache.org/jira/browse/SPARK-47027 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major
[jira] [Created] (SPARK-47027) Move TestUtils to the generic testing utils.
Takuya Ueshin created SPARK-47027: - Summary: Move TestUtils to the generic testing utils. Key: SPARK-47027 URL: https://issues.apache.org/jira/browse/SPARK-47027 Project: Spark Issue Type: Test Components: Tests Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
[ https://issues.apache.org/jira/browse/SPARK-47002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47002. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45062 [https://github.com/apache/spark/pull/45062] > Enforce that 'AnalyzeResult' 'orderBy' field is a list of > pyspark.sql.functions.OrderingColumn > -- > > Key: SPARK-47002 > URL: https://issues.apache.org/jira/browse/SPARK-47002 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47002) Enforce that 'AnalyzeResult' 'orderBy' field is a list of pyspark.sql.functions.OrderingColumn
[ https://issues.apache.org/jira/browse/SPARK-47002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-47002: - Assignee: Daniel > Enforce that 'AnalyzeResult' 'orderBy' field is a list of > pyspark.sql.functions.OrderingColumn > -- > > Key: SPARK-47002 > URL: https://issues.apache.org/jira/browse/SPARK-47002 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-46691) Support profiling on WindowInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-46691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-46691. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45035 [https://github.com/apache/spark/pull/45035] > Support profiling on WindowInPandasExec > --- > > Key: SPARK-46691 > URL: https://issues.apache.org/jira/browse/SPARK-46691 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Fix For: 4.0.0
[jira] [Assigned] (SPARK-46688) Support profiling on AggregateInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-46688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-46688: - Assignee: Xinrong Meng > Support profiling on AggregateInPandasExec > -- > > Key: SPARK-46688 > URL: https://issues.apache.org/jira/browse/SPARK-46688 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-46688) Support profiling on AggregateInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-46688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-46688. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45035 [https://github.com/apache/spark/pull/45035] > Support profiling on AggregateInPandasExec > -- > > Key: SPARK-46688 > URL: https://issues.apache.org/jira/browse/SPARK-46688 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-46691) Support profiling on WindowInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-46691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-46691: - Assignee: Xinrong Meng > Support profiling on WindowInPandasExec > --- > > Key: SPARK-46691 > URL: https://issues.apache.org/jira/browse/SPARK-46691 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Xinrong Meng >Priority: Major
[jira] [Assigned] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select
[ https://issues.apache.org/jira/browse/SPARK-46966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-46966: - Assignee: Daniel > Create API for 'analyze' method to indicate subset of input table columns to > select > --- > > Key: SPARK-46966 > URL: https://issues.apache.org/jira/browse/SPARK-46966 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-46966) Create API for 'analyze' method to indicate subset of input table columns to select
[ https://issues.apache.org/jira/browse/SPARK-46966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-46966. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45007 [https://github.com/apache/spark/pull/45007] > Create API for 'analyze' method to indicate subset of input table columns to > select > --- > > Key: SPARK-46966 > URL: https://issues.apache.org/jira/browse/SPARK-46966 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-46687) Implement memory-profiler
[ https://issues.apache.org/jira/browse/SPARK-46687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-46687. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44775 [https://github.com/apache/spark/pull/44775] > Implement memory-profiler > - > > Key: SPARK-46687 > URL: https://issues.apache.org/jira/browse/SPARK-46687 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-46691) Support profiling on WindowInPandasExec
Takuya Ueshin created SPARK-46691: - Summary: Support profiling on WindowInPandasExec Key: SPARK-46691 URL: https://issues.apache.org/jira/browse/SPARK-46691 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46690) Support profiling on FlatMapCoGroupsInBatchExec
Takuya Ueshin created SPARK-46690: - Summary: Support profiling on FlatMapCoGroupsInBatchExec Key: SPARK-46690 URL: https://issues.apache.org/jira/browse/SPARK-46690 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46689) Support profiling on FlatMapGroupsInBatchExec
Takuya Ueshin created SPARK-46689: - Summary: Support profiling on FlatMapGroupsInBatchExec Key: SPARK-46689 URL: https://issues.apache.org/jira/browse/SPARK-46689 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46688) Support profiling on AggregateInPandasExec
Takuya Ueshin created SPARK-46688: - Summary: Support profiling on AggregateInPandasExec Key: SPARK-46688 URL: https://issues.apache.org/jira/browse/SPARK-46688 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46687) Implement memory-profiler
Takuya Ueshin created SPARK-46687: - Summary: Implement memory-profiler Key: SPARK-46687 URL: https://issues.apache.org/jira/browse/SPARK-46687 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46686) Basic support of SparkSession based Python UDF profiler
Takuya Ueshin created SPARK-46686: - Summary: Basic support of SparkSession based Python UDF profiler Key: SPARK-46686 URL: https://issues.apache.org/jira/browse/SPARK-46686 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-46685) Introduce SparkSession based PySpark UDF profiler
Takuya Ueshin created SPARK-46685: - Summary: Introduce SparkSession based PySpark UDF profiler Key: SPARK-46685 URL: https://issues.apache.org/jira/browse/SPARK-46685 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin The existing UDF profilers are SparkContext based, which can't support Spark Connect. We should introduce SparkSession based profilers and support Spark Connect.
[jira] [Created] (SPARK-46684) CoGroup.applyInPandas/Arrow should pass arguments properly
Takuya Ueshin created SPARK-46684: - Summary: CoGroup.applyInPandas/Arrow should pass arguments properly Key: SPARK-46684 URL: https://issues.apache.org/jira/browse/SPARK-46684 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Takuya Ueshin

In Spark Connect, {{CoGroup.applyInPandas/Arrow}} doesn't take arguments properly, so the arguments of the UDF can be broken:

{noformat}
>>> import pandas as pd
>>>
>>> df1 = spark.createDataFrame(
...     [(1, 1.0, "a"), (2, 2.0, "b"), (1, 3.0, "c"), (2, 4.0, "d")], ("id", "v1", "v2")
... )
>>> df2 = spark.createDataFrame([(1, "x"), (2, "y"), (1, "z")], ("id", "v3"))
>>>
>>> def summarize(left, right):
...     return pd.DataFrame(
...         {
...             "left_rows": [len(left)],
...             "left_columns": [len(left.columns)],
...             "right_rows": [len(right)],
...             "right_columns": [len(right.columns)],
...         }
...     )
...
>>> df = (
...     df1.groupby("id")
...     .cogroup(df2.groupby("id"))
...     .applyInPandas(
...         summarize,
...         schema="left_rows long, left_columns long, right_rows long, right_columns long",
...     )
... )
>>>
>>> df.show()
+---------+------------+----------+-------------+
|left_rows|left_columns|right_rows|right_columns|
+---------+------------+----------+-------------+
|        2|           1|         2|            1|
|        2|           1|         1|            1|
+---------+------------+----------+-------------+
{noformat}

The result should be:

{noformat}
+---------+------------+----------+-------------+
|left_rows|left_columns|right_rows|right_columns|
+---------+------------+----------+-------------+
|        2|           3|         2|            2|
|        2|           3|         1|            2|
+---------+------------+----------+-------------+
{noformat}
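The contract the report relies on can be sketched in plain Python (no Spark involved; `cogroup_apply` is a hypothetical helper): for each key, the user function must receive the full rows of each side, so the column counts match the "should be" table rather than the broken output:

```python
from collections import defaultdict

def cogroup_apply(left_rows, right_rows, key, func):
    """Group both inputs by `key`, then call func(left_group, right_group)
    once per key, passing the complete rows of each side."""
    left_groups, right_groups = defaultdict(list), defaultdict(list)
    for row in left_rows:
        left_groups[row[key]].append(row)
    for row in right_rows:
        right_groups[row[key]].append(row)
    return [
        func(left_groups[k], right_groups[k])
        for k in sorted(set(left_groups) | set(right_groups))
    ]

# The same data as the report, as plain dicts instead of DataFrames.
df1 = [{"id": 1, "v1": 1.0, "v2": "a"}, {"id": 2, "v1": 2.0, "v2": "b"},
       {"id": 1, "v1": 3.0, "v2": "c"}, {"id": 2, "v1": 4.0, "v2": "d"}]
df2 = [{"id": 1, "v3": "x"}, {"id": 2, "v3": "y"}, {"id": 1, "v3": "z"}]

def summarize(left, right):
    return {"left_rows": len(left), "left_columns": len(left[0]),
            "right_rows": len(right), "right_columns": len(right[0])}

# Each side keeps all of its columns: 3 on the left, 2 on the right.
print(cogroup_apply(df1, df2, "id", summarize))
```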
[jira] [Resolved] (SPARK-46040) Update API for 'analyze' partitioning/ordering columns to support general expressions
[ https://issues.apache.org/jira/browse/SPARK-46040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-46040. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43946 [https://github.com/apache/spark/pull/43946] > Update API for 'analyze' partitioning/ordering columns to support general > expressions > - > > Key: SPARK-46040 > URL: https://issues.apache.org/jira/browse/SPARK-46040 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-45746) Return specific error messages if UDTF 'analyze' method accepts or returns wrong values
[ https://issues.apache.org/jira/browse/SPARK-45746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45746. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43611 https://github.com/apache/spark/pull/43611 > Return specific error messages if UDTF 'analyze' method accepts or returns > wrong values > --- > > Key: SPARK-45746 > URL: https://issues.apache.org/jira/browse/SPARK-45746 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-45810) Create API to stop consuming rows from the input table
[ https://issues.apache.org/jira/browse/SPARK-45810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45810. --- Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43682 https://github.com/apache/spark/pull/43682 > Create API to stop consuming rows from the input table > -- > > Key: SPARK-45810 > URL: https://issues.apache.org/jira/browse/SPARK-45810 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-45813) Return the observed metrics from commands
Takuya Ueshin created SPARK-45813: - Summary: Return the observed metrics from commands Key: SPARK-45813 URL: https://issues.apache.org/jira/browse/SPARK-45813 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-45656) Fix observation when named observations with the same name on different datasets.
Takuya Ueshin created SPARK-45656: - Summary: Fix observation when named observations with the same name on different datasets. Key: SPARK-45656 URL: https://issues.apache.org/jira/browse/SPARK-45656 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45524) Initial support for Python data source read API
[ https://issues.apache.org/jira/browse/SPARK-45524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45524. --- Fix Version/s: 4.0.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 43360 https://github.com/apache/spark/pull/43360 > Initial support for Python data source read API > --- > > Key: SPARK-45524 > URL: https://issues.apache.org/jira/browse/SPARK-45524 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add API for data source and data source reader and add Catalyst + execution > support.
[jira] [Created] (SPARK-45620) Fix user-facing APIs related to Python UDTF to use camelCase.
Takuya Ueshin created SPARK-45620: - Summary: Fix user-facing APIs related to Python UDTF to use camelCase. Key: SPARK-45620 URL: https://issues.apache.org/jira/browse/SPARK-45620 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45523) Return useful error message if UDTF returns None for non-nullable column
[ https://issues.apache.org/jira/browse/SPARK-45523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45523. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43356 https://github.com/apache/spark/pull/43356 > Return useful error message if UDTF returns None for non-nullable column > > > Key: SPARK-45523 > URL: https://issues.apache.org/jira/browse/SPARK-45523 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-45619) Apply the observed metrics to Observation object.
Takuya Ueshin created SPARK-45619: - Summary: Apply the observed metrics to Observation object. Key: SPARK-45619 URL: https://issues.apache.org/jira/browse/SPARK-45619 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45577) Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments
[ https://issues.apache.org/jira/browse/SPARK-45577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45577. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 43407 https://github.com/apache/spark/pull/43407 > Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from > named arguments > -- > > Key: SPARK-45577 > URL: https://issues.apache.org/jira/browse/SPARK-45577 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-45577) Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments
Takuya Ueshin created SPARK-45577: - Summary: Fix UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named arguments Key: SPARK-45577 URL: https://issues.apache.org/jira/browse/SPARK-45577 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45505) Refactor analyzeInPython function to make it reusable
[ https://issues.apache.org/jira/browse/SPARK-45505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45505. --- Fix Version/s: 4.0.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 43340 https://github.com/apache/spark/pull/43340 > Refactor analyzeInPython function to make it reusable > - > > Key: SPARK-45505 > URL: https://issues.apache.org/jira/browse/SPARK-45505 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refactor analyzeInPython method in UserDefinedPythonTableFunction object into > an abstract class so that it can be reused in the future.
[jira] [Resolved] (SPARK-45402) Add API for 'analyze' method to return a buffer to be consumed on each class creation
[ https://issues.apache.org/jira/browse/SPARK-45402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45402. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43204 https://github.com/apache/spark/pull/43204 > Add API for 'analyze' method to return a buffer to be consumed on each class > creation > - > > Key: SPARK-45402 > URL: https://issues.apache.org/jira/browse/SPARK-45402 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
Takuya Ueshin created SPARK-45494: - Summary: Introduce read/write a byte array util functions for PythonWorkerUtils Key: SPARK-45494 URL: https://issues.apache.org/jira/browse/SPARK-45494 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-45441) Introduce more util functions for PythonWorkerUtils
Takuya Ueshin created SPARK-45441: - Summary: Introduce more util functions for PythonWorkerUtils Key: SPARK-45441 URL: https://issues.apache.org/jira/browse/SPARK-45441 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-45405) Refactor Python UDTF execution
Takuya Ueshin created SPARK-45405: - Summary: Refactor Python UDTF execution Key: SPARK-45405 URL: https://issues.apache.org/jira/browse/SPARK-45405 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45362) Project out PARTITION BY expressions before 'eval' method consumes input rows
[ https://issues.apache.org/jira/browse/SPARK-45362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45362. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43156 https://github.com/apache/spark/pull/43156 > Project out PARTITION BY expressions before 'eval' method consumes input rows > - > > Key: SPARK-45362 > URL: https://issues.apache.org/jira/browse/SPARK-45362 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used
[ https://issues.apache.org/jira/browse/SPARK-45266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45266. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 43042 https://github.com/apache/spark/pull/43042 > Refactor ResolveFunctions analyzer rule to delay making lateral join when > table arguments are used > -- > > Key: SPARK-45266 > URL: https://issues.apache.org/jira/browse/SPARK-45266 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-45266) Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used
Takuya Ueshin created SPARK-45266: - Summary: Refactor ResolveFunctions analyzer rule to delay making lateral join when table arguments are used Key: SPARK-45266 URL: https://issues.apache.org/jira/browse/SPARK-45266 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
[ https://issues.apache.org/jira/browse/SPARK-45118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45118. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42874 https://github.com/apache/spark/pull/42874 > Refactor converters for complex types to short cut when the element types > don't need converters > --- > > Key: SPARK-45118 > URL: https://issues.apache.org/jira/browse/SPARK-45118 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-45159) Handle named arguments only when necessary
Takuya Ueshin created SPARK-45159: - Summary: Handle named arguments only when necessary Key: SPARK-45159 URL: https://issues.apache.org/jira/browse/SPARK-45159 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
Takuya Ueshin created SPARK-45118: - Summary: Refactor converters for complex types to short cut when the element types don't need converters Key: SPARK-45118 URL: https://issues.apache.org/jira/browse/SPARK-45118 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44901) Add API in 'analyze' method to return partitioning/ordering expressions
[ https://issues.apache.org/jira/browse/SPARK-44901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44901. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 42595 https://github.com/apache/spark/pull/42595 > Add API in 'analyze' method to return partitioning/ordering expressions > --- > > Key: SPARK-44901 > URL: https://issues.apache.org/jira/browse/SPARK-44901 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 4.0.0
[jira] [Resolved] (SPARK-44952) Add named argument support for aggregate Pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-44952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44952. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42663 https://github.com/apache/spark/pull/42663 > Add named argument support for aggregate Pandas UDFs > > > Key: SPARK-44952 > URL: https://issues.apache.org/jira/browse/SPARK-44952 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 4.0.0
[jira] [Created] (SPARK-44952) Add named argument support for aggregate Pandas UDFs
Takuya Ueshin created SPARK-44952: - Summary: Add named argument support for aggregate Pandas UDFs Key: SPARK-44952 URL: https://issues.apache.org/jira/browse/SPARK-44952 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44918) Add named argument support for scalar Python/Pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-44918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44918. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42617 https://github.com/apache/spark/pull/42617 > Add named argument support for scalar Python/Pandas UDFs > > > Key: SPARK-44918 > URL: https://issues.apache.org/jira/browse/SPARK-44918 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 4.0.0
[jira] [Created] (SPARK-44918) Add named argument support for scalar Python/Pandas UDFs
Takuya Ueshin created SPARK-44918: - Summary: Add named argument support for scalar Python/Pandas UDFs Key: SPARK-44918 URL: https://issues.apache.org/jira/browse/SPARK-44918 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44748) Query execution to support PARTITION BY and ORDER BY clause for table arguments
[ https://issues.apache.org/jira/browse/SPARK-44748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44748. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 42420 https://github.com/apache/spark/pull/42420 > Query execution to support PARTITION BY and ORDER BY clause for table > arguments > --- > > Key: SPARK-44748 > URL: https://issues.apache.org/jira/browse/SPARK-44748 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 4.0.0
[jira] [Updated] (SPARK-44876) Enable and fix test_parity_arrow_python_udf
[ https://issues.apache.org/jira/browse/SPARK-44876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-44876: -- Description: {{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in {{dev/sparktestsupport/modules.py}}, and it fails when running manually. {code} == ERROR [0.072s]: test_register (pyspark.sql.tests.connect.test_parity_arrow_python_udf.ArrowPythonUDFParityTests) -- Traceback (most recent call last): ... pyspark.errors.exceptions.base.PySparkRuntimeError: [SCHEMA_MISMATCH_FOR_PANDAS_UDF] Result vector from pandas_udf was not the required length: expected 1, got 38. {code} was:{{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in {{dev/sparktestsupport/modules.py}}, and it fails when running manually. > Enable and fix test_parity_arrow_python_udf > --- > > Key: SPARK-44876 > URL: https://issues.apache.org/jira/browse/SPARK-44876 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Takuya Ueshin >Priority: Blocker > > {{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in > {{dev/sparktestsupport/modules.py}}, and it fails when running manually. > {code} > == > ERROR [0.072s]: test_register > (pyspark.sql.tests.connect.test_parity_arrow_python_udf.ArrowPythonUDFParityTests) > -- > Traceback (most recent call last): > ... > pyspark.errors.exceptions.base.PySparkRuntimeError: > [SCHEMA_MISMATCH_FOR_PANDAS_UDF] Result vector from pandas_udf was not the > required length: expected 1, got 38. > {code}
[jira] [Created] (SPARK-44876) Enable and fix test_parity_arrow_python_udf
Takuya Ueshin created SPARK-44876: - Summary: Enable and fix test_parity_arrow_python_udf Key: SPARK-44876 URL: https://issues.apache.org/jira/browse/SPARK-44876 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0 Reporter: Takuya Ueshin {{pyspark.sql.tests.connect.test_parity_arrow_python_udf}} is not listed in {{dev/sparktestsupport/modules.py}}, and it fails when running manually.
[jira] [Resolved] (SPARK-44834) Add SQL query test suites for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-44834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44834. --- Fix Version/s: 3.5.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 42517 https://github.com/apache/spark/pull/42517 > Add SQL query test suites for Python UDTFs > -- > > Key: SPARK-44834 > URL: https://issues.apache.org/jira/browse/SPARK-44834 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.5.0 > > > Add SQL query test suites for executing Python UDTFs in SQL.
[jira] [Resolved] (SPARK-44836) Refactor Arrow Python UDTF
[ https://issues.apache.org/jira/browse/SPARK-44836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44836. --- Fix Version/s: 3.5.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42520 https://github.com/apache/spark/pull/42520 > Refactor Arrow Python UDTF > -- > > Key: SPARK-44836 > URL: https://issues.apache.org/jira/browse/SPARK-44836 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.5.0
[jira] [Created] (SPARK-44836) Refactor Arrow Python UDTF
Takuya Ueshin created SPARK-44836: - Summary: Refactor Arrow Python UDTF Key: SPARK-44836 URL: https://issues.apache.org/jira/browse/SPARK-44836 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44749) Support named arguments in Python UDTF
[ https://issues.apache.org/jira/browse/SPARK-44749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44749. --- Fix Version/s: 4.0.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42422 https://github.com/apache/spark/pull/42422 > Support named arguments in Python UDTF > -- > > Key: SPARK-44749 > URL: https://issues.apache.org/jira/browse/SPARK-44749 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 4.0.0
[jira] [Created] (SPARK-44749) Support named arguments in Python UDTF
Takuya Ueshin created SPARK-44749: - Summary: Support named arguments in Python UDTF Key: SPARK-44749 URL: https://issues.apache.org/jira/browse/SPARK-44749 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type
[ https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44561. --- Fix Version/s: 3.5.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42310 https://github.com/apache/spark/pull/42310 > Fix AssertionError when converting UDTF output to a complex type > > > Key: SPARK-44561 > URL: https://issues.apache.org/jira/browse/SPARK-44561 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.5.0 > > > {code:java} > class TestUDTF: > def eval(self): > yield {'a': 1, 'b': 2}, > udtf(TestUDTF, returnType="x: map")().show() {code} > This will fail with: > File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer > File "python/pyspark/sql/pandas/types.py", line 804, in convert_map > assert isinstance(value, dict) > AssertionError > Same for `convert_struct`
[jira] [Resolved] (SPARK-44433) Implement termination of Python process for foreachBatch & streaming listener
[ https://issues.apache.org/jira/browse/SPARK-44433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44433. --- Assignee: Wei Liu Resolution: Fixed Issue resolved by pull request 42283 https://github.com/apache/spark/pull/42283 > Implement termination of Python process for foreachBatch & streaming listener > - > > Key: SPARK-44433 > URL: https://issues.apache.org/jira/browse/SPARK-44433 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0 > > > In the first implementation of Python support for foreachBatch, the Python > process termination is not handled correctly. > > See the long TODO in > [https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala] > > about an outline of the feature to terminate the runners by registering > StreamingQueryListeners.
[jira] [Resolved] (SPARK-44663) Disable arrow optimization by default for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-44663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44663. --- Fix Version/s: 3.5.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 42329 https://github.com/apache/spark/pull/42329 > Disable arrow optimization by default for Python UDTFs > -- > > Key: SPARK-44663 > URL: https://issues.apache.org/jira/browse/SPARK-44663 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.5.0 > > > Disable arrow optimization to make Python UDTFs consistent with Python UDFs.
[jira] [Resolved] (SPARK-44644) Improve error messages for creating Python UDTFs with pickling errors
[ https://issues.apache.org/jira/browse/SPARK-44644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44644. --- Fix Version/s: 4.0.0 Target Version/s: 3.5.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 42309 https://github.com/apache/spark/pull/42309 > Improve error messages for creating Python UDTFs with pickling errors > - > > Key: SPARK-44644 > URL: https://issues.apache.org/jira/browse/SPARK-44644 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 4.0.0 > > > Currently, when users create a Python UDTF with a non-pickleable object, it > throws this error: > _pickle.PicklingError: Cannot pickle files that are not opened for reading: w > > We should make this more user-friendly
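[Editor's note] For context on the pickling error quoted above: Spark serializes the UDTF class with a pickle-based serializer before shipping it to worker processes, so any non-picklable attribute (such as an open file handle) surfaces as a low-level serialization error rather than a friendly PySpark one. The sketch below uses only the standard library; stdlib `pickle` raises a `TypeError` here, which is slightly different from the cloudpickle message quoted in the issue, and the class name `FileBackedUDTF` is made up for illustration:

```python
import os
import pickle


# Hypothetical UDTF-like class that captures a non-picklable member.
class FileBackedUDTF:
    def __init__(self):
        # File objects cannot be pickled, so an instance of this class
        # cannot be serialized for shipment to worker processes.
        self.log = open(os.devnull, "w")

    def eval(self, row):
        yield (row,)


try:
    pickle.dumps(FileBackedUDTF())
except TypeError as e:
    # e.g. "cannot pickle '_io.TextIOWrapper' object" -- terse and
    # low-level, which is the usability problem the issue describes.
    print(type(e).__name__, e)
```

The improvement tracked by the issue is to catch this during UDTF creation and re-raise it with a message that points at the offending object.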
[jira] [Resolved] (SPARK-44648) Set up memory limits for analyze in Python.
[ https://issues.apache.org/jira/browse/SPARK-44648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44648. --- Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 42328 https://github.com/apache/spark/pull/42328 > Set up memory limits for analyze in Python. > --- > > Key: SPARK-44648 > URL: https://issues.apache.org/jira/browse/SPARK-44648 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major
[jira] [Created] (SPARK-44648) Set up memory limits for analyze in Python.
Takuya Ueshin created SPARK-44648: - Summary: Set up memory limits for analyze in Python. Key: SPARK-44648 URL: https://issues.apache.org/jira/browse/SPARK-44648 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-44643) __repr__ broken for Row when the field is empty Row
Takuya Ueshin created SPARK-44643: - Summary: __repr__ broken for Row when the field is empty Row Key: SPARK-44643 URL: https://issues.apache.org/jira/browse/SPARK-44643 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0 Reporter: Takuya Ueshin PySpark {{Row}} raises an exception if the field is an empty Row: {code:python} >>> repr(Row(Row())) Traceback (most recent call last): ... TypeError: not enough arguments for format string {code}
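[Editor's note] The traceback above likely stems from old-style "%" formatting: PySpark's {{Row}} subclasses {{tuple}}, and applying "%" to a tuple value unpacks it as the argument list, so an empty Row supplies zero arguments to the format string. A minimal pure-Python sketch of this pitfall (the {{Row}} stand-in below is a deliberately simplified hypothetical, not PySpark's actual implementation):

```python
# Sketch of the likely root cause: applying "%"-formatting to a value
# that is itself a tuple unpacks it as the argument list. An empty
# tuple subclass therefore supplies zero arguments to "%r".
class Row(tuple):  # simplified stand-in for pyspark.sql.Row
    def __repr__(self) -> str:
        # Buggy pattern: self[0] may itself be a (possibly empty) Row,
        # which "%" unpacks instead of treating as a single value.
        return "Row(%r)" % self[0] if self else "Row()"


outer = Row((Row(),))  # a Row whose single field is an empty Row
try:
    repr(outer)
except TypeError as e:
    print(e)  # not enough arguments for format string
```

Formatting with a one-element tuple, `"Row(%r)" % (self[0],)`, or an f-string avoids the unpacking, so `repr(outer)` would produce `Row(Row())` as expected.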
[jira] [Updated] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type
[ https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-44561: -- Fix Version/s: (was: 4.0.0) > Fix AssertionError when converting UDTF output to a complex type > > > Key: SPARK-44561 > URL: https://issues.apache.org/jira/browse/SPARK-44561 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Priority: Major > > {code:java} > class TestUDTF: > def eval(self): > yield {'a': 1, 'b': 2}, > udtf(TestUDTF, returnType="x: map")().show() {code} > This will fail with: > File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer > File "python/pyspark/sql/pandas/types.py", line 804, in convert_map > assert isinstance(value, dict) > AssertionError > Same for `convert_struct`
[jira] [Assigned] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type
[ https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-44561: - Assignee: (was: Allison Wang) > Fix AssertionError when converting UDTF output to a complex type > > > Key: SPARK-44561 > URL: https://issues.apache.org/jira/browse/SPARK-44561 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Priority: Major > Fix For: 4.0.0 > > > {code:java} > class TestUDTF: > def eval(self): > yield {'a': 1, 'b': 2}, > udtf(TestUDTF, returnType="x: map")().show() {code} > This will fail with: > File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer > File "python/pyspark/sql/pandas/types.py", line 804, in convert_map > assert isinstance(value, dict) > AssertionError > Same for `convert_struct`
[jira] [Assigned] (SPARK-44559) Improve error messages for Python UDTF arrow type casts
[ https://issues.apache.org/jira/browse/SPARK-44559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-44559: - Assignee: Allison Wang > Improve error messages for Python UDTF arrow type casts > --- > > Key: SPARK-44559 > URL: https://issues.apache.org/jira/browse/SPARK-44559 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.5.0 > > > Currently, if a Python UDTF outputs a type that is incompatible with the > specified output schema, Spark will throw the following confusing error > message: > {code:java} > File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 316, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert [1, 2] with type list: tried to > convert to int32{code} > We should improve this.
[jira] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type
[ https://issues.apache.org/jira/browse/SPARK-44561 ]

Takuya Ueshin deleted comment on SPARK-44561:
---------------------------------------------

was (Author: ueshin):
Issue resolved by pull request 42191
https://github.com/apache/spark/pull/42191

> Fix AssertionError when converting UDTF output to a complex type
> ----------------------------------------------------------------
>
>                 Key: SPARK-44561
>                 URL: https://issues.apache.org/jira/browse/SPARK-44561
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>             Fix For: 4.0.0
>
> {code:java}
> class TestUDTF:
>     def eval(self):
>         yield {'a': 1, 'b': 2},
>
> udtf(TestUDTF, returnType="x: map")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`
[jira] [Updated] (SPARK-44614) Add missing packages in setup.py
[ https://issues.apache.org/jira/browse/SPARK-44614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44614:
----------------------------------
    Fix Version/s: 3.5.0

> Add missing packages in setup.py
> --------------------------------
>
>                 Key: SPARK-44614
>                 URL: https://issues.apache.org/jira/browse/SPARK-44614
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>             Fix For: 3.5.0, 4.0.0
>
> Some packages for the SQL module are missing from the {{setup.py}} file.
[jira] [Resolved] (SPARK-44614) Add missing packages in setup.py
[ https://issues.apache.org/jira/browse/SPARK-44614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44614.
-----------------------------------
    Fix Version/s: 4.0.0
         Assignee: Takuya Ueshin
       Resolution: Fixed

Issue resolved by pull request 42248
https://github.com/apache/spark/pull/42248

> Add missing packages in setup.py
> --------------------------------
>
>                 Key: SPARK-44614
>                 URL: https://issues.apache.org/jira/browse/SPARK-44614
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>             Fix For: 4.0.0
>
> Some packages for the SQL module are missing from the {{setup.py}} file.
[jira] [Resolved] (SPARK-44561) Fix AssertionError when converting UDTF output to a complex type
[ https://issues.apache.org/jira/browse/SPARK-44561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44561.
-----------------------------------
    Fix Version/s: 4.0.0
         Assignee: Allison Wang
       Resolution: Fixed

Issue resolved by pull request 42191
https://github.com/apache/spark/pull/42191

> Fix AssertionError when converting UDTF output to a complex type
> ----------------------------------------------------------------
>
>                 Key: SPARK-44561
>                 URL: https://issues.apache.org/jira/browse/SPARK-44561
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>             Fix For: 4.0.0
>
> {code:java}
> class TestUDTF:
>     def eval(self):
>         yield {'a': 1, 'b': 2},
>
> udtf(TestUDTF, returnType="x: map")().show() {code}
> This will fail with:
>   File "pandas/_libs/lib.pyx", line 2834, in pandas._libs.lib.map_infer
>   File "python/pyspark/sql/pandas/types.py", line 804, in convert_map
>     assert isinstance(value, dict)
> AssertionError
> Same for `convert_struct`
[jira] [Created] (SPARK-44614) Add missing packages in setup.py
Takuya Ueshin created SPARK-44614:
-------------------------------------

             Summary: Add missing packages in setup.py
                 Key: SPARK-44614
                 URL: https://issues.apache.org/jira/browse/SPARK-44614
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Takuya Ueshin


Some packages for the SQL module are missing from the {{setup.py}} file.
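Hand-maintained package lists are exactly what lets subpackages go missing. One common mitigation (illustrative only; whether Spark's actual `setup.py` adopts it is not stated here) is `setuptools.find_packages`, which discovers every directory containing an `__init__.py` under a source root:

```python
# Illustrative sketch: build a fake package tree ("mylib" is a made-up
# name) and let setuptools discover all of it, so no subpackage can be
# forgotten the way a manually enumerated list allows.
import os
import tempfile

from setuptools import find_packages

root = tempfile.mkdtemp()
for pkg in ("mylib", "mylib/sql", "mylib/sql/connect"):
    d = os.path.join(root, pkg)
    os.makedirs(d)
    open(os.path.join(d, "__init__.py"), "w").close()

pkgs = sorted(find_packages(where=root))
print(pkgs)  # every package under root is found automatically
```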
[jira] [Resolved] (SPARK-44603) Add pyspark.testing to setup.py
[ https://issues.apache.org/jira/browse/SPARK-44603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44603.
-----------------------------------
    Fix Version/s: 3.5.0
         Assignee: Amanda Liu
       Resolution: Fixed

Issue resolved by pull request 42231
https://github.com/apache/spark/pull/42231

> Add pyspark.testing to setup.py
> -------------------------------
>
>                 Key: SPARK-44603
>                 URL: https://issues.apache.org/jira/browse/SPARK-44603
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Amanda Liu
>            Assignee: Amanda Liu
>            Priority: Major
>             Fix For: 3.5.0
>
[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema
[ https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44479:
----------------------------------
    Fix Version/s: 3.5.0

> Support Python UDTFs with empty schema
> --------------------------------------
>
>                 Key: SPARK-44479
>                 URL: https://issues.apache.org/jira/browse/SPARK-44479
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>             Fix For: 3.5.0
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...     def eval(self):
> ...         yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
>   ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.
[jira] [Resolved] (SPARK-43968) Improve error messages for Python UDTFs with wrong number of outputs
[ https://issues.apache.org/jira/browse/SPARK-43968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-43968.
-----------------------------------
    Fix Version/s: 4.0.0
         Assignee: Allison Wang
       Resolution: Fixed

Issue resolved by pull request 42157
https://github.com/apache/spark/pull/42157

> Improve error messages for Python UDTFs with wrong number of outputs
> --------------------------------------------------------------------
>
>                 Key: SPARK-43968
>                 URL: https://issues.apache.org/jira/browse/SPARK-43968
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>             Fix For: 4.0.0
>
> Improve the error messages for Python UDTFs when the number of output
> columns does not match the number specified in the return type of the UDTF.
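The kind of check this issue asks for can be sketched in a few lines. `check_output_row` is a hypothetical helper, not the actual PySpark patch; the idea is simply to compare each yielded row's arity against the declared schema before conversion, so the user gets a direct message instead of a low-level serialization error:

```python
def check_output_row(row, schema_fields):
    """Validate one UDTF output row against the declared column names
    (a hypothetical pre-conversion check, illustrating the issue's intent)."""
    if len(row) != len(schema_fields):
        raise ValueError(
            f"UDTF returned a row with {len(row)} column(s) but the return "
            f"type declares {len(schema_fields)}: {schema_fields}"
        )
    return row

# Matching arity passes through; a mismatch fails with a readable message.
print(check_output_row((1, "a"), ["x", "y"]))
try:
    check_output_row((1,), ["x", "y"])
except ValueError as e:
    print(e)
```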
[jira] [Resolved] (SPARK-44533) Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze.
[ https://issues.apache.org/jira/browse/SPARK-44533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44533.
-----------------------------------
      Assignee: Takuya Ueshin
    Resolution: Fixed

Issue resolved by pull request 42135
https://github.com/apache/spark/pull/42135

> Add support for accumulator, broadcast, and Spark files in Python UDTF's
> analyze.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-44533
>                 URL: https://issues.apache.org/jira/browse/SPARK-44533
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>
[jira] [Resolved] (SPARK-44479) Support Python UDTFs with empty schema
[ https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44479.
-----------------------------------
      Assignee: Takuya Ueshin
    Resolution: Fixed

Issue resolved by pull request 42161
https://github.com/apache/spark/pull/42161

> Support Python UDTFs with empty schema
> --------------------------------------
>
>                 Key: SPARK-44479
>                 URL: https://issues.apache.org/jira/browse/SPARK-44479
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Assignee: Takuya Ueshin
>            Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...     def eval(self):
> ...         yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
>   ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.
[jira] [Resolved] (SPARK-44503) Support PARTITION BY and ORDER BY clause for table arguments
[ https://issues.apache.org/jira/browse/SPARK-44503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44503.
-----------------------------------
    Fix Version/s: 4.0.0
         Assignee: Daniel
       Resolution: Fixed

Issue resolved by pull request 42100
https://github.com/apache/spark/pull/42100

> Support PARTITION BY and ORDER BY clause for table arguments
> ------------------------------------------------------------
>
>                 Key: SPARK-44503
>                 URL: https://issues.apache.org/jira/browse/SPARK-44503
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Daniel
>            Assignee: Daniel
>            Priority: Major
>             Fix For: 4.0.0
>
[jira] [Created] (SPARK-44533) Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze.
Takuya Ueshin created SPARK-44533:
-------------------------------------

             Summary: Add support for accumulator, broadcast, and Spark files in Python UDTF's analyze.
                 Key: SPARK-44533
                 URL: https://issues.apache.org/jira/browse/SPARK-44533
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Takuya Ueshin
[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema
[ https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44479:
----------------------------------
    Description: 
Support UDTFs with empty schema, for example:
{code:python}
>>> class TestUDTF:
...     def eval(self):
...         yield tuple()
{code}
Currently it fails with `useArrow=True`:
{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)
{code}
whereas without Arrow:
{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}
Otherwise, we should raise an error without Arrow, too.

  was:
Support UDTFs with empty schema, for example:
{code:python}
>>> class TestUDTF:
...     def eval(self):
...         yield tuple()
{code}
Currently it fails with `useArrow=True`:
{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)
{code}
whereas without Arrow:
{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}


> Support Python UDTFs with empty schema
> --------------------------------------
>
>                 Key: SPARK-44479
>                 URL: https://issues.apache.org/jira/browse/SPARK-44479
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...     def eval(self):
> ...         yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
>   ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too.
[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema
[ https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44479:
----------------------------------
    Description: 
Support UDTFs with empty schema, for example:
{code:python}
>>> class TestUDTF:
...     def eval(self):
...         yield tuple()
{code}
Currently it fails with `useArrow=True`:
{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)
{code}
whereas without Arrow:
{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}
Otherwise, we should raise an error without Arrow, too, to be consistent.

  was:
Support UDTFs with empty schema, for example:
{code:python}
>>> class TestUDTF:
...     def eval(self):
...         yield tuple()
{code}
Currently it fails with `useArrow=True`:
{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)
{code}
whereas without Arrow:
{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}
Otherwise, we should raise an error without Arrow, too.


> Support Python UDTFs with empty schema
> --------------------------------------
>
>                 Key: SPARK-44479
>                 URL: https://issues.apache.org/jira/browse/SPARK-44479
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...     def eval(self):
> ...         yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
>   ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.
[jira] [Created] (SPARK-44479) Support Python UDTFs with empty schema
Takuya Ueshin created SPARK-44479:
-------------------------------------

             Summary: Support Python UDTFs with empty schema
                 Key: SPARK-44479
                 URL: https://issues.apache.org/jira/browse/SPARK-44479
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Takuya Ueshin


Support UDTFs with empty schema, for example:
{code:python}
>>> class TestUDTF:
...     def eval(self):
...         yield tuple()
{code}
Currently it fails with `useArrow=True`:
{code:python}
>>> udtf(TestUDTF, returnType=StructType())().collect()
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)
{code}
whereas without Arrow:
{code:python}
>>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
[Row()]
{code}
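The "not enough values to unpack (expected 2, got 0)" failure above is characteristic of destructuring an empty sequence of (name, type) pairs. A minimal reproduction outside Spark (illustrative only; this is not the actual Arrow serializer code, just the same Python-level mechanism):

```python
fields = []  # an empty StructType contributes no (name, type) pairs

# Two-target unpacking over zero tuples raises exactly the reported error.
try:
    names, types = zip(*fields)
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 0)

# Guarding the empty case sidesteps the crash and yields empty columns,
# matching the non-Arrow path's [Row()] behavior in spirit.
names, types = zip(*fields) if fields else ((), ())
print(names, types)
```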
[jira] [Resolved] (SPARK-44395) Update table function arguments to require parentheses around identifier after the TABLE keyword
[ https://issues.apache.org/jira/browse/SPARK-44395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin resolved SPARK-44395.
-----------------------------------
    Fix Version/s: 3.5.0
         Assignee: Daniel
       Resolution: Fixed

Issue resolved by pull request 41965
https://github.com/apache/spark/pull/41965

> Update table function arguments to require parentheses around identifier
> after the TABLE keyword
> -------------------------------------------------------------------------
>
>                 Key: SPARK-44395
>                 URL: https://issues.apache.org/jira/browse/SPARK-44395
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Daniel
>            Assignee: Daniel
>            Priority: Major
>             Fix For: 3.5.0
>
> Per the SQL standard, `TABLE identifier` should actually be passed as
> `TABLE(identifier)`.
[jira] [Created] (SPARK-44380) Support for UDTF to analyze in Python
Takuya Ueshin created SPARK-44380:
-------------------------------------

             Summary: Support for UDTF to analyze in Python
                 Key: SPARK-44380
                 URL: https://issues.apache.org/jira/browse/SPARK-44380
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Takuya Ueshin
[jira] [Created] (SPARK-44249) Refactor PythonUDTFRunner to send its return type separately
Takuya Ueshin created SPARK-44249:
-------------------------------------

             Summary: Refactor PythonUDTFRunner to send its return type separately
                 Key: SPARK-44249
                 URL: https://issues.apache.org/jira/browse/SPARK-44249
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Takuya Ueshin
[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution
[ https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44233:
----------------------------------
    Description: 
{code:python}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
  ...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
:  +- 'SubqueryAlias s
:     +- 'Project [*]
:        +- 'UnresolvedRelation [t], [], false
+- SubqueryAlias t
   +- Range (0, 8, step=1, splits=None){code}
The subquery {{(select * from t)}} seems not to look at the outer outer context and fails to resolve {{t}}.

  was:
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
  ...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
:  +- 'SubqueryAlias s
:     +- 'Project [*]
:        +- 'UnresolvedRelation [t], [], false
+- SubqueryAlias t
   +- Range (0, 8, step=1, splits=None){code}
The subquery (select * from t) seems not to look at the outer outer context and fails to resolve t.


> Support an outer outer context in subquery resolution
> -----------------------------------------------------
>
>                 Key: SPARK-44233
>                 URL: https://issues.apache.org/jira/browse/SPARK-44233
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> {code:python}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
>   ...
> pyspark.errors.exceptions.captured.AnalysisException:
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema()
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
> :  +- 'SubqueryAlias s
> :     +- 'Project [*]
> :        +- 'UnresolvedRelation [t], [], false
> +- SubqueryAlias t
>    +- Range (0, 8, step=1, splits=None){code}
> The subquery {{(select * from t)}} seems not to look at the outer outer
> context and fails to resolve {{t}}.
[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution
[ https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44233:
----------------------------------
    Description: 
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
  ...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
:  +- 'SubqueryAlias s
:     +- 'Project [*]
:        +- 'UnresolvedRelation [t], [], false
+- SubqueryAlias t
   +- Range (0, 8, step=1, splits=None){code}
The subquery (select * from t) seems not to look at the outer outer context and fails to resolve t.

  was:
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
  ...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
:  +- 'SubqueryAlias s
:     +- 'Project [*]
:        +- 'UnresolvedRelation [t], [], false
+- SubqueryAlias t
   +- Range (0, 8, step=1, splits=None){code}


> Support an outer outer context in subquery resolution
> -----------------------------------------------------
>
>                 Key: SPARK-44233
>                 URL: https://issues.apache.org/jira/browse/SPARK-44233
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> {code:java}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
>   ...
> pyspark.errors.exceptions.captured.AnalysisException:
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema()
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
> :  +- 'SubqueryAlias s
> :     +- 'Project [*]
> :        +- 'UnresolvedRelation [t], [], false
> +- SubqueryAlias t
>    +- Range (0, 8, step=1, splits=None){code}
> The subquery (select * from t) seems not to look at the outer outer context
> and fails to resolve t.
[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution
[ https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin updated SPARK-44233:
----------------------------------
    Description: 
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
  ...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
:  +- 'SubqueryAlias s
:     +- 'Project [*]
:        +- 'UnresolvedRelation [t], [], false
+- SubqueryAlias t
   +- Range (0, 8, step=1, splits=None){code}

> Support an outer outer context in subquery resolution
> -----------------------------------------------------
>
>                 Key: SPARK-44233
>                 URL: https://issues.apache.org/jira/browse/SPARK-44233
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Takuya Ueshin
>            Priority: Major
>
> {code:java}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
>   ...
> pyspark.errors.exceptions.captured.AnalysisException:
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema()
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
> :  +- 'SubqueryAlias s
> :     +- 'Project [*]
> :        +- 'UnresolvedRelation [t], [], false
> +- SubqueryAlias t
>    +- Range (0, 8, step=1, splits=None){code}
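The resolution gap can be pictured as a scope chain that stops one level short: the lateral subquery consults its immediate outer scope but not the scope above that, where `t` is defined. A toy resolver (purely illustrative, unrelated to Catalyst's actual implementation) that does walk every enclosing scope looks like this:

```python
class Scope:
    """Toy name-resolution scope: each subquery scope links to its parent,
    and lookup walks the whole chain (outer, outer-outer, ...), which is
    what resolving `t` from the lateral subquery above requires."""

    def __init__(self, names, parent=None):
        self.names = set(names)
        self.parent = parent

    def resolve(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return True
            scope = scope.parent
        return False

outer = Scope({"t"})                     # select * from range(8) t
lateral = Scope(set(), parent=outer)     # lateral ( ... )
subquery = Scope(set(), parent=lateral)  # select * from t
print(subquery.resolve("t"))  # found once every enclosing scope is consulted
```

A resolver that only checked `self` and `self.parent` (stopping before `outer`) would fail exactly the way the AnalysisException above describes.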
[jira] [Created] (SPARK-44233) Support an outer outer context in subquery resolution
Takuya Ueshin created SPARK-44233:
-------------------------------------

             Summary: Support an outer outer context in subquery resolution
                 Key: SPARK-44233
                 URL: https://issues.apache.org/jira/browse/SPARK-44233
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Takuya Ueshin