[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, ReferenceSort
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44365: - Description: Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec DataSourceScanExec MergeRowsExec ReferenceSort was:Define the computing logic through PartitionEvaluator API and use it in SQL operators `InMemoryTableScanExec` > Define the computing logic through PartitionEvaluator API and use it in SQL > operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec , > ReferenceSort > -- > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > operators > InMemoryTableScanExec > DataSourceScanExec > MergeRowsExec > ReferenceSort -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
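The tickets above migrate operators to the PartitionEvaluator API, which factors an operator's per-partition computing logic into an evaluator object produced by a serializable factory. Below is a minimal self-contained sketch of that pattern in Python; it is a simplified model for illustration only, not Spark's actual Scala interfaces (the class and function names here are hypothetical stand-ins for `PartitionEvaluatorFactory`, `PartitionEvaluator.eval`, and `RDD.mapPartitionsWithEvaluator`):

```python
from typing import Iterator, List


class PartitionEvaluator:
    """Holds the per-partition computing logic, created on the executor side."""

    def eval(self, partition_index: int, rows: Iterator[int]) -> Iterator[int]:
        raise NotImplementedError


class _EvenFilterEvaluator(PartitionEvaluator):
    def eval(self, partition_index: int, rows: Iterator[int]) -> Iterator[int]:
        # Keep even values; partition_index is available for logging/metrics.
        return (r for r in rows if r % 2 == 0)


class EvenFilterEvaluatorFactory:
    """Serializable factory: the driver ships the factory, and each
    partition gets its own evaluator instance from create_evaluator()."""

    def create_evaluator(self) -> PartitionEvaluator:
        return _EvenFilterEvaluator()


def map_partitions_with_evaluator(
    partitions: List[List[int]], factory: EvenFilterEvaluatorFactory
) -> List[List[int]]:
    # Analogue of mapPartitionsWithEvaluator: one evaluator per partition.
    out = []
    for idx, part in enumerate(partitions):
        evaluator = factory.create_evaluator()
        out.append(list(evaluator.eval(idx, iter(part))))
    return out


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5, 6]]
    print(map_partitions_with_evaluator(parts, EvenFilterEvaluatorFactory()))
```

The design point the migration exploits: because the factory, not the computed closure, is what gets serialized, the per-partition logic becomes a named, testable unit instead of an anonymous closure inside each `SparkPlan` node.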
[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, ReferenceSort
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44365: - Summary: Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec , ReferenceSort (was: Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec) > Define the computing logic through PartitionEvaluator API and use it in SQL > operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec , > ReferenceSort > -- > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > operators `InMemoryTableScanExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec
[ https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44369: - Summary: Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec (was: Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec) > Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, > DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec > -- > > Key: SPARK-44369 > URL: https://issues.apache.org/jira/browse/SPARK-44369 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Use PartitionEvaluator API in > CollectMetricsExec > GenerateExec > ExpandExec > DebugExec > HiveTableScanExec > DataSourceScanExec > SortExec > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec
[ https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44369: - Description: Use PartitionEvaluator API in CollectMetricsExec GenerateExec ExpandExec DebugExec HiveTableScanExec DataSourceScanExec SortExec was: Use PartitionEvaluator API in CollectMetricsExec GenerateExec ExpandExec DebugExec HiveTableScanExec DataSourceScanExec > Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, > DebugExec, HiveTableScanExec, DataSourceScanExec > > > Key: SPARK-44369 > URL: https://issues.apache.org/jira/browse/SPARK-44369 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Use PartitionEvaluator API in > CollectMetricsExec > GenerateExec > ExpandExec > DebugExec > HiveTableScanExec > DataSourceScanExec > SortExec > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec
[ https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741839#comment-17741839 ] Vinod KC commented on SPARK-44369: -- I'm working on it > Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, > DebugExec, HiveTableScanExec, DataSourceScanExec > > > Key: SPARK-44369 > URL: https://issues.apache.org/jira/browse/SPARK-44369 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Use PartitionEvaluator API in > CollectMetricsExec > GenerateExec > ExpandExec > DebugExec > HiveTableScanExec > DataSourceScanExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec
Vinod KC created SPARK-44369: Summary: Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec Key: SPARK-44369 URL: https://issues.apache.org/jira/browse/SPARK-44369 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Vinod KC Use PartitionEvaluator API in CollectMetricsExec GenerateExec ExpandExec DebugExec HiveTableScanExec DataSourceScanExec
[jira] [Commented] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec
[ https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741838#comment-17741838 ] Vinod KC commented on SPARK-44362: -- I'm working on it > Use PartitionEvaluator API in AggregateInPandasExec, > WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec > - > > Key: SPARK-44362 > URL: https://issues.apache.org/jira/browse/SPARK-44362 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Use PartitionEvaluator API in > AggregateInPandasExec > WindowInPandasExec > EvalPythonExec > AttachDistributedSequenceExec
[jira] [Commented] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741837#comment-17741837 ] Vinod KC commented on SPARK-44365: -- I'm working on it > Define the computing logic through PartitionEvaluator API and use it in SQL > operators InMemoryTableScanExec > --- > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > operators `InMemoryTableScanExec`
[jira] [Commented] (SPARK-44361) Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-44361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741836#comment-17741836 ] Vinod KC commented on SPARK-44361: -- I'm working on it > Use PartitionEvaluator API in BatchEvalPythonUDTFExec, > FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec > -- > > Key: SPARK-44361 > URL: https://issues.apache.org/jira/browse/SPARK-44361 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Use PartitionEvaluator API in > BatchEvalPythonUDTFExec, > FlatMapGroupsInPandasExec, > MapInBatchExec, > FlatMapCoGroupsInPandasExec
[jira] [Assigned] (SPARK-44357) Add pyspark_testing module for GHA tests
[ https://issues.apache.org/jira/browse/SPARK-44357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44357: Assignee: Amanda Liu > Add pyspark_testing module for GHA tests > > > Key: SPARK-44357 > URL: https://issues.apache.org/jira/browse/SPARK-44357 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44357) Add pyspark_testing module for GHA tests
[ https://issues.apache.org/jira/browse/SPARK-44357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44357. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41896 [https://github.com/apache/spark/pull/41896] > Add pyspark_testing module for GHA tests > > > Key: SPARK-44357 > URL: https://issues.apache.org/jira/browse/SPARK-44357 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Fix For: 3.5.0 > > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44363) Display percent of unequal rows in DataFrame comparison
[ https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44363. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41926 [https://github.com/apache/spark/pull/41926] > Display percent of unequal rows in DataFrame comparison > --- > > Key: SPARK-44363 > URL: https://issues.apache.org/jira/browse/SPARK-44363 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Fix For: 3.5.0 > > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
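SPARK-44363's "percent of unequal rows" is simple arithmetic over a positional row comparison. The helper below is a hypothetical pure-Python illustration of that calculation, not the actual PySpark utility (which compares `Row` objects with type-aware equality inside `assertDataFrameEqual`):

```python
from typing import Sequence


def percent_unequal_rows(actual: Sequence, expected: Sequence) -> float:
    """Return the percentage of row positions where the two sequences differ.

    Illustrative model only: rows are compared positionally with plain ==,
    whereas the real PySpark comparison handles Row types and float tolerance.
    """
    if len(actual) != len(expected):
        raise ValueError("row counts differ: %d vs %d" % (len(actual), len(expected)))
    if not actual:
        return 0.0  # two empty frames are trivially equal
    unequal = sum(1 for a, e in zip(actual, expected) if a != e)
    return 100.0 * unequal / len(actual)
```

For example, comparing four rows where two differ would report 50.0, giving the test author an immediate sense of how far apart the frames are rather than just a pass/fail.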
[jira] [Assigned] (SPARK-44363) Display percent of unequal rows in DataFrame comparison
[ https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44363: Assignee: Amanda Liu > Display percent of unequal rows in DataFrame comparison > --- > > Key: SPARK-44363 > URL: https://issues.apache.org/jira/browse/SPARK-44363 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44368) Support partition operation on dataframe in Spark Connect Go Client
BoYang created SPARK-44368: -- Summary: Support partition operation on dataframe in Spark Connect Go Client Key: SPARK-44368 URL: https://issues.apache.org/jira/browse/SPARK-44368 Project: Spark Issue Type: Sub-task Components: Connect Contrib Affects Versions: 3.4.1 Reporter: BoYang Support partition operation on dataframe in Spark Connect Go Client
[jira] [Updated] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value
[ https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44251: Fix Version/s: 3.5.0 (was: 4.0.0) > Potential for incorrect results or NPE when full outer USING join has null > key value > > > Key: SPARK-44251 > URL: https://issues.apache.org/jira/browse/SPARK-44251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.3, 3.5.0, 3.4.2 > > > The following query produces incorrect results: > {noformat} > create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values (2, 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > -1 <== should be null > 1 > 2 > {noformat} > The following query fails with a {{NullPointerException}}: > {noformat} > create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values ('2', 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
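The SPARK-44251 repro above hinges on two semantics of `FULL OUTER JOIN ... USING`: NULL keys never match, and the output key column is `COALESCE(left.key, right.key)`. The sketch below is a pure-Python model of those semantics (a hedged illustration of the SQL behavior, not Spark's join implementation), showing why the correct output for the first repro is `null, 1, 2` rather than `-1, 1, 2`:

```python
from typing import List, Optional


def full_outer_join_using(left: List[Optional[int]],
                          right: List[Optional[int]]) -> List[Optional[int]]:
    """Model of FULL OUTER JOIN ... USING(key) over single-column tables.

    Two rules of the SQL semantics are modeled:
      * NULL (None) keys never match any row on the other side.
      * The output key is COALESCE(left.key, right.key), so unmatched
        rows from either side surface their own key, and a NULL left key
        stays NULL - the value the buggy plan mishandled downstream.
    """
    matched_right = set()
    out = []
    for lk in left:
        hits = [i for i, rk in enumerate(right) if lk is not None and rk == lk]
        if hits:
            for i in hits:
                matched_right.add(i)
                out.append(lk)          # matched: left and right keys are equal
        else:
            out.append(lk)              # unmatched left row: COALESCE picks left key
    for i, rk in enumerate(right):
        if i not in matched_right:
            out.append(rk)              # unmatched right row: COALESCE picks right key
    return out
```

With `left = [1, None]` and `right = [2]` (the first repro's `c1` columns), the result contains `1`, `None`, and `2`: the NULL-keyed left row keeps its NULL, which is exactly the value the description says should appear in place of `-1`.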
[jira] [Resolved] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value
[ https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44251. - Fix Version/s: 3.3.3 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 41809 [https://github.com/apache/spark/pull/41809] > Potential for incorrect results or NPE when full outer USING join has null > key value > > > Key: SPARK-44251 > URL: https://issues.apache.org/jira/browse/SPARK-44251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.3, 4.0.0, 3.4.2 > > > The following query produces incorrect results: > {noformat} > create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values (2, 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > -1 <== should be null > 1 > 2 > {noformat} > The following query fails with a {{NullPointerException}}: > {noformat} > create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values ('2', 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown > Source) > at > 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value
[ https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-44251: --- Assignee: Bruce Robbins > Potential for incorrect results or NPE when full outer USING join has null > key value > > > Key: SPARK-44251 > URL: https://issues.apache.org/jira/browse/SPARK-44251 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.2, 3.4.1, 3.5.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > > The following query produces incorrect results: > {noformat} > create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values (2, 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > -1 <== should be null > 1 > 2 > {noformat} > The following query fails with a {{NullPointerException}}: > {noformat} > create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2); > create or replace temp view v2 as values ('2', 3) as (c1, c2); > select explode(array(c1)) as x > from v1 > full outer join v2 > using (c1); > 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11) > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown > Source) > at > 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43) > ... > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44367) Show error message on UI for each query
Kent Yao created SPARK-44367: Summary: Show error message on UI for each query Key: SPARK-44367 URL: https://issues.apache.org/jira/browse/SPARK-44367 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.5.0 Reporter: Kent Yao Display SQL errors on the UI to improve the UX of SQL development
[jira] [Assigned] (SPARK-41997) Test parity: pyspark.sql.tests.test_readwriter
[ https://issues.apache.org/jira/browse/SPARK-41997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41997: Assignee: Hyukjin Kwon > Test parity: pyspark.sql.tests.test_readwriter > -- > > Key: SPARK-41997 > URL: https://issues.apache.org/jira/browse/SPARK-41997 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > See https://issues.apache.org/jira/browse/SPARK-41652 and > https://issues.apache.org/jira/browse/SPARK-41651 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42264) Test Parity: pyspark.sql.tests.test_udf and pyspark.sql.tests.pandas.test_pandas_udf
[ https://issues.apache.org/jira/browse/SPARK-42264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42264: - Shepherd: Hyukjin Kwon > Test Parity: pyspark.sql.tests.test_udf and > pyspark.sql.tests.pandas.test_pandas_udf > > > Key: SPARK-42264 > URL: https://issues.apache.org/jira/browse/SPARK-42264 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43974) Upgrade buf to v1.23.1
[ https://issues.apache.org/jira/browse/SPARK-43974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43974. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41469 [https://github.com/apache/spark/pull/41469] > Upgrade buf to v1.23.1 > -- > > Key: SPARK-43974 > URL: https://issues.apache.org/jira/browse/SPARK-43974 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+
BingKun Pan created SPARK-44366: --- Summary: Migrate antlr4 from 4.9 to 4.10+ Key: SPARK-44366 URL: https://issues.apache.org/jira/browse/SPARK-44366 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Resolved] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page
[ https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-44332. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41887 [https://github.com/apache/spark/pull/41887] > Fix the sorting error of Executor ID Column on Executors UI Page > > > Key: SPARK-44332 > URL: https://issues.apache.org/jira/browse/SPARK-44332 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page
[ https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-44332: Assignee: BingKun Pan > Fix the sorting error of Executor ID Column on Executors UI Page > > > Key: SPARK-44332 > URL: https://issues.apache.org/jira/browse/SPARK-44332 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43983) Implement cross validator estimator
[ https://issues.apache.org/jira/browse/SPARK-43983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu resolved SPARK-43983. Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41881 [https://github.com/apache/spark/pull/41881] > Implement cross validator estimator > --- > > Key: SPARK-43983 > URL: https://issues.apache.org/jira/browse/SPARK-43983 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44216) Add assertSchemaEqual API with ignore_nullable optional flag
[ https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amanda Liu updated SPARK-44216: --- Summary: Add assertSchemaEqual API with ignore_nullable optional flag (was: Add improved error message formatting for assert_df_equality) > Add assertSchemaEqual API with ignore_nullable optional flag > > > Key: SPARK-44216 > URL: https://issues.apache.org/jira/browse/SPARK-44216 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
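The `ignore_nullable` flag on `assertSchemaEqual` relaxes schema comparison so fields need only agree on name and type. The toy model below illustrates that behavior over simplified `(name, type, nullable)` tuples; it is an assumption-laden sketch, since the real PySpark API compares `StructType` objects (including nested fields):

```python
from typing import List, Tuple

# A simplified schema: a list of (field_name, type_name, nullable) tuples.
Schema = List[Tuple[str, str, bool]]


def schemas_equal(s1: Schema, s2: Schema, ignore_nullable: bool = False) -> bool:
    """Return True if the two schemas match.

    With ignore_nullable=True, only field names and types must agree;
    the nullable flag on each field is disregarded. (Toy model only -
    PySpark's assertSchemaEqual raises a formatted error instead of
    returning a bool, and recurses into nested StructTypes.)
    """
    if len(s1) != len(s2):
        return False
    for (n1, t1, null1), (n2, t2, null2) in zip(s1, s2):
        if n1 != n2 or t1 != t2:
            return False
        if not ignore_nullable and null1 != null2:
            return False
    return True
```

This matters in practice because schemas read back from storage often flip nullability (e.g. Parquet marks everything nullable), so strict comparison would fail tests that are semantically fine.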
[jira] [Updated] (SPARK-44216) Make assertSchemaEqual API with ignore_nullable optional flag
[ https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amanda Liu updated SPARK-44216: --- Summary: Make assertSchemaEqual API with ignore_nullable optional flag (was: Add assertSchemaEqual API with ignore_nullable optional flag) > Make assertSchemaEqual API with ignore_nullable optional flag > - > > Key: SPARK-44216 > URL: https://issues.apache.org/jira/browse/SPARK-44216 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44365: - Description: Define the computing logic through PartitionEvaluator API and use it in SQL operators `InMemoryTableScanExec` (was: InMemoryTableScanExec) > Use PartitionEvaluator API in AggregateInPandasExec, > WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec > - > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > operators `InMemoryTableScanExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44365: - Description: InMemoryTableScanExec (was: `BatchEvalPythonUDTFExec` `FlatMapGroupsInPandasExec` `MapInBatchExec` `FlatMapCoGroupsInPandasExec`) > Use PartitionEvaluator API in AggregateInPandasExec, > WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec > - > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > InMemoryTableScanExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec
[ https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-44365: - Summary: Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec (was: Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec) > Define the computing logic through PartitionEvaluator API and use it in SQL > operators InMemoryTableScanExec > --- > > Key: SPARK-44365 > URL: https://issues.apache.org/jira/browse/SPARK-44365 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > operators `InMemoryTableScanExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
Vinod KC created SPARK-44365: Summary: Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec Key: SPARK-44365 URL: https://issues.apache.org/jira/browse/SPARK-44365 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Vinod KC `BatchEvalPythonUDTFExec` `FlatMapGroupsInPandasExec` `MapInBatchExec` `FlatMapCoGroupsInPandasExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44363) Display percent of unequal rows in DataFrame comparison
[ https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amanda Liu updated SPARK-44363: --- Summary: Display percent of unequal rows in DataFrame comparison (was: Display percent of unequal rows in dataframe comparison) > Display percent of unequal rows in DataFrame comparison > --- > > Key: SPARK-44363 > URL: https://issues.apache.org/jira/browse/SPARK-44363 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Priority: Major > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44061) Add assertDataFrameEquality util function
[ https://issues.apache.org/jira/browse/SPARK-44061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amanda Liu updated SPARK-44061: --- Summary: Add assertDataFrameEquality util function (was: Add assert_df_equality util function) > Add assertDataFrameEquality util function > - > > Key: SPARK-44061 > URL: https://issues.apache.org/jira/browse/SPARK-44061 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Fix For: 3.5.0 > > > SPIP: > https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44364) Support List[Row] data type for expected DataFrame argument
Amanda Liu created SPARK-44364: -- Summary: Support List[Row] data type for expected DataFrame argument Key: SPARK-44364 URL: https://issues.apache.org/jira/browse/SPARK-44364 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Amanda Liu SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44363) Display percent of unequal rows in dataframe comparison
Amanda Liu created SPARK-44363: -- Summary: Display percent of unequal rows in dataframe comparison Key: SPARK-44363 URL: https://issues.apache.org/jira/browse/SPARK-44363 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Amanda Liu SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
Vinod KC created SPARK-44362: Summary: Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec Key: SPARK-44362 URL: https://issues.apache.org/jira/browse/SPARK-44362 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Vinod KC Use PartitionEvaluator API in AggregateInPandasExec WindowInPandasExec EvalPythonExec AttachDistributedSequenceExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44361) Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec
Vinod KC created SPARK-44361: Summary: Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec Key: SPARK-44361 URL: https://issues.apache.org/jira/browse/SPARK-44361 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Vinod KC Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44360) Support schema pruning in delta-based MERGE operations
Anton Okolnychyi created SPARK-44360: Summary: Support schema pruning in delta-based MERGE operations Key: SPARK-44360 URL: https://issues.apache.org/jira/browse/SPARK-44360 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Anton Okolnychyi We need to support schema pruning in delta-based MERGE operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44352) Move sameType back to DataType
[ https://issues.apache.org/jira/browse/SPARK-44352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44352. --- Fix Version/s: 3.5.0 Resolution: Fixed > Move sameType back to DataType > -- > > Key: SPARK-44352 > URL: https://issues.apache.org/jira/browse/SPARK-44352 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44353) Remove toAttributes from StructType
[ https://issues.apache.org/jira/browse/SPARK-44353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell reassigned SPARK-44353: - Assignee: Herman van Hövell > Remove toAttributes from StructType > --- > > Key: SPARK-44353 > URL: https://issues.apache.org/jira/browse/SPARK-44353 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44343) Separate encoder inference from expression encoder generation in ScalaReflection
[ https://issues.apache.org/jira/browse/SPARK-44343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44343. --- Fix Version/s: 3.5.0 Resolution: Fixed > Separate encoder inference from expression encoder generation in > ScalaReflection > > > Key: SPARK-44343 > URL: https://issues.apache.org/jira/browse/SPARK-44343 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
[ https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741736#comment-17741736 ] Vinod KC commented on SPARK-44359: -- I'm working on this > Define the computing logic through PartitionEvaluator API and use it in SQL > aggregate operators > --- > > Key: SPARK-44359 > URL: https://issues.apache.org/jira/browse/SPARK-44359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in SQL > aggregate operators > `MergingSessionsExec` > `SortAggregateExec` > `UpdatingSessionsExec` > `HashAggregateExec` > `ObjectHashAggregateExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators
Vinod KC created SPARK-44359: Summary: Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators Key: SPARK-44359 URL: https://issues.apache.org/jira/browse/SPARK-44359 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Vinod KC Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators `MergingSessionsExec` `SortAggregateExec` `UpdatingSessionsExec` `HashAggregateExec` `ObjectHashAggregateExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
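The pattern these sub-tasks apply is the same throughout: the per-partition computing logic moves out of the physical operator into an evaluator produced by a serializable factory, so the engine only ships the factory to executors. A minimal, purely illustrative Python analogue of that pattern (the real API is a pair of Scala traits; all names below are hypothetical, not Spark code):

```python
# Illustrative Python analogue of the PartitionEvaluator pattern.
# The real API lives in Scala; these class and function names are
# hypothetical stand-ins for the shape of the design.

class SumEvaluator:
    """Holds the per-partition computing logic for a toy sum aggregate."""

    def eval(self, partition_index, rows):
        # All computing logic lives here, decoupled from operator plumbing.
        yield (partition_index, sum(rows))


class SumEvaluatorFactory:
    """Serializable factory shipped to executors; one evaluator per partition."""

    def create_evaluator(self):
        return SumEvaluator()


def map_partitions_with_evaluator(partitions, factory):
    """Stand-in for the engine side: it only needs the factory,
    never the operator that defined the logic."""
    out = []
    for idx, part in enumerate(partitions):
        evaluator = factory.create_evaluator()
        out.extend(evaluator.eval(idx, iter(part)))
    return out


partitions = [[1, 2, 3], [10, 20]]
print(map_partitions_with_evaluator(partitions, SumEvaluatorFactory()))
# [(0, 6), (1, 30)]
```

The point of the separation is testability and reuse: the evaluator can be exercised on plain iterators without constructing a physical plan.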
[jira] [Commented] (SPARK-44330) Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
[ https://issues.apache.org/jira/browse/SPARK-44330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741734#comment-17741734 ] Vinod KC commented on SPARK-44330: -- PR raised : https://github.com/apache/spark/pull/41888 > Define the computing logic through PartitionEvaluator API and use it in > BroadcastNestedLoopJoinExec & BroadcastHashJoinExec > --- > > Key: SPARK-44330 > URL: https://issues.apache.org/jira/browse/SPARK-44330 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in > BroadcastNestedLoopJoinExec & BroadcastHashJoinExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44350) Upgrade sbt to 1.9.2
[ https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-44350: - Priority: Trivial (was: Minor) > Upgrade sbt to 1.9.2 > > > Key: SPARK-44350 > URL: https://issues.apache.org/jira/browse/SPARK-44350 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44350) Upgrade sbt to 1.9.2
[ https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-44350: Assignee: BingKun Pan > Upgrade sbt to 1.9.2 > > > Key: SPARK-44350 > URL: https://issues.apache.org/jira/browse/SPARK-44350 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44350) Upgrade sbt to 1.9.2
[ https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-44350. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41916 [https://github.com/apache/spark/pull/41916] > Upgrade sbt to 1.9.2 > > > Key: SPARK-44350 > URL: https://issues.apache.org/jira/browse/SPARK-44350 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44357) Add pyspark_testing module for GHA tests
Amanda Liu created SPARK-44357: -- Summary: Add pyspark_testing module for GHA tests Key: SPARK-44357 URL: https://issues.apache.org/jira/browse/SPARK-44357 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Amanda Liu SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44356) Move INSERT INTO to CTEDef code path
Max Gekk created SPARK-44356: Summary: Move INSERT INTO to CTEDef code path Key: SPARK-44356 URL: https://issues.apache.org/jira/browse/SPARK-44356 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Support the combination WITH ... INSERT INTO. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44355) Move commands to CTEDef code path and deprecate CTE inline path
Max Gekk created SPARK-44355: Summary: Move commands to CTEDef code path and deprecate CTE inline path Key: SPARK-44355 URL: https://issues.apache.org/jira/browse/SPARK-44355 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Right now our CTE resolution code path diverges: queries with commands go into the CTE inline code path whereas non-command queries go into the CTEDef code path (see https://github.com/apache/spark/blob/42719d9425b9a24ef016b5c2874e522b960cf114/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L50 ). In the longer term we should migrate command queries to go through CTEDef as well and deprecate the CTE inline path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44351) Make some syntactic simplification
[ https://issues.apache.org/jira/browse/SPARK-44351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44351: Assignee: Yang Jie > Make some syntactic simplification > -- > > Key: SPARK-44351 > URL: https://issues.apache.org/jira/browse/SPARK-44351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > - Use `exists` instead of `find` and `emptiness check` > - Use `orNull` instead of `getOrElse(null)` > - Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map > - Use `find` instead of `filter` + `headOption` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44351) Make some syntactic simplification
[ https://issues.apache.org/jira/browse/SPARK-44351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44351. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41915 [https://github.com/apache/spark/pull/41915] > Make some syntactic simplification > -- > > Key: SPARK-44351 > URL: https://issues.apache.org/jira/browse/SPARK-44351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > > > - Use `exists` instead of `find` and `emptiness check` > - Use `orNull` instead of `getOrElse(null)` > - Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map > - Use `find` instead of `filter` + `headOption` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
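For readers unfamiliar with the Scala collection idioms listed in SPARK-44351, the same simplifications have direct analogues in most languages. A Python sketch (illustrative only, not code from the Spark patch):

```python
# Python analogues of the Scala simplifications from SPARK-44351.
xs = [1, 3, 5, 8]
d = {"a": 1}

# Use `exists` instead of `find` + emptiness check:
# verbose form would be: [x for x in xs if x % 2 == 0] != []
has_even = any(x % 2 == 0 for x in xs)        # Scala: xs.exists(_ % 2 == 0)

# Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)`:
b = d.get("b", 0)                             # Scala: map.getOrElse("b", 0)

# Use `find` instead of `filter` + `headOption`:
first_even = next((x for x in xs if x % 2 == 0), None)  # Scala: xs.find(_ % 2 == 0)

print(has_even, b, first_even)  # True 0 8
```

Besides brevity, the `exists`/`find` forms short-circuit on the first match instead of materializing an intermediate collection, which is the behavioral motivation behind such cleanups.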
[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column
[ https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai-Michael Roesner updated SPARK-44354: Description: When trying to create a dataframe with a CharType or VarcharType column like so: {code} from datetime import date from decimal import Decimal from pyspark.sql import SparkSession from pyspark.sql.types import * data = [ (1, 'abc', Decimal(3.142), date(2023, 1, 1)), (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), (3, 'cde', Decimal(2.718), date(2023, 1, 3))] schema = StructType([ StructField('INT', IntegerType()), StructField('STR', CharType(3)), StructField('DEC', DecimalType(4, 3)), StructField('DAT', DateType())]) spark = SparkSession.builder.appName('data-types').getOrCreate() df = spark.createDataFrame(data, schema) df.show() {code} a {{java.lang.IllegalStateException}} is thrown [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]. I'm expecting this to work... PS: Excerpt from the logs: {code} py4j.protocol.Py4JJavaError: An error occurred while calling o24.applySchemaToPythonRDD. 
: java.lang.IllegalStateException: [BUG] logical plan should not have output of char/varchar type: LogicalRDD [INT#0, STR#1, DEC#2, DAT#3], false at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:168) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:295) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:163) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:160) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:156) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:146) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202) at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201) at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:88) at org.apache.spark.sql.SparkSession.internalCreateDataFrame(SparkSession.scala:571) at org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:804) at org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:789) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
{code}
[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column
[ https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai-Michael Roesner updated SPARK-44354: Component/s: SQL > Cannot create dataframe with CharType/VarcharType column > > > Key: SPARK-44354 > URL: https://issues.apache.org/jira/browse/SPARK-44354 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Kai-Michael Roesner >Priority: Major > > When trying to create a dataframe with a CharType or VarcharType column like > so: > {code} > from datetime import date > from decimal import Decimal > from pyspark.sql import SparkSession > from pyspark.sql.types import * > data = [ > (1, 'abc', Decimal(3.142), date(2023, 1, 1)), > (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), > (3, 'cde', Decimal(2.718), date(2023, 1, 3))] > schema = StructType([ > StructField('INT', IntegerType()), > StructField('STR', CharType(3)), > StructField('DEC', DecimalType(4, 3)), > StructField('DAT', DateType())]) > spark = SparkSession.builder.appName('data-types').getOrCreate() > df = spark.createDataFrame(data, schema) > df.show() > {code} > a {{java.lang.IllegalStateException}} is thrown > [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]. > I'm expecting this to work... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column
[ https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai-Michael Roesner updated SPARK-44354: Description: When trying to create a dataframe with a CharType or VarcharType column like so: {code} from datetime import date from decimal import Decimal from pyspark.sql import SparkSession from pyspark.sql.types import * data = [ (1, 'abc', Decimal(3.142), date(2023, 1, 1)), (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), (3, 'cde', Decimal(2.718), date(2023, 1, 3))] schema = StructType([ StructField('INT', IntegerType()), StructField('STR', CharType(3)), StructField('DEC', DecimalType(4, 3)), StructField('DAT', DateType())]) spark = SparkSession.builder.appName('data-types').getOrCreate() df = spark.createDataFrame(data, schema) df.show() {code} a {{java.lang.IllegalStateException}} is thrown [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]. I'm expecting this to work... was: When trying to create a dataframe with a CharType or VarcharType column like so: {code} from datetime import date from decimal import Decimal from pyspark.sql import SparkSession from pyspark.sql.types import * data = [ (1, 'abc', Decimal(3.142), date(2023, 1, 1)), (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), (3, 'cde', Decimal(2.718), date(2023, 1, 3))] schema = StructType([ StructField('INT', IntegerType()), StructField('STR', CharType(3)), StructField('DEC', DecimalType(4, 3)), StructField('DAT', DateType())]) spark = SparkSession.builder.appName('data-types').getOrCreate() df = spark.createDataFrame(data, schema) df.show() {code} a {{java.lang.IllegalStateException}} is thrown [here|https://github.com/apache/spark/blame/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168] I'm expecting this to work... 
> Cannot create dataframe with CharType/VarcharType column > > > Key: SPARK-44354 > URL: https://issues.apache.org/jira/browse/SPARK-44354 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Kai-Michael Roesner >Priority: Major > > When trying to create a dataframe with a CharType or VarcharType column like > so: > {code} > from datetime import date > from decimal import Decimal > from pyspark.sql import SparkSession > from pyspark.sql.types import * > data = [ > (1, 'abc', Decimal(3.142), date(2023, 1, 1)), > (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), > (3, 'cde', Decimal(2.718), date(2023, 1, 3))] > schema = StructType([ > StructField('INT', IntegerType()), > StructField('STR', CharType(3)), > StructField('DEC', DecimalType(4, 3)), > StructField('DAT', DateType())]) > spark = SparkSession.builder.appName('data-types').getOrCreate() > df = spark.createDataFrame(data, schema) > df.show() > {code} > a {{java.lang.IllegalStateException}} is thrown > [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]. > I'm expecting this to work... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column
Kai-Michael Roesner created SPARK-44354: --- Summary: Cannot create dataframe with CharType/VarcharType column Key: SPARK-44354 URL: https://issues.apache.org/jira/browse/SPARK-44354 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.4.0 Reporter: Kai-Michael Roesner When trying to create a dataframe with a CharType or VarcharType column like so: {code} from datetime import date from decimal import Decimal from pyspark.sql import SparkSession from pyspark.sql.types import * data = [ (1, 'abc', Decimal(3.142), date(2023, 1, 1)), (2, 'bcd', Decimal(1.414), date(2023, 1, 2)), (3, 'cde', Decimal(2.718), date(2023, 1, 3))] schema = StructType([ StructField('INT', IntegerType()), StructField('STR', CharType(3)), StructField('DEC', DecimalType(4, 3)), StructField('DAT', DateType())]) spark = SparkSession.builder.appName('data-types').getOrCreate() df = spark.createDataFrame(data, schema) df.show() {code} a {{java.lang.IllegalStateException}} is thrown [here|https://github.com/apache/spark/blame/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168] I'm expecting this to work... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
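The report leaves the fix open; until it lands, a common workaround (my assumption, not part of the report) is to declare such columns as `StringType` when calling `createDataFrame`, since CHAR/VARCHAR are chiefly table-schema concepts in Spark. For DDL-style schema strings, the rewrite can be sketched with the standard library alone (`relax_char_types` is a hypothetical helper, not a Spark API):

```python
import re

def relax_char_types(ddl_schema: str) -> str:
    """Rewrite char(n)/varchar(n) column types in a DDL schema string to
    string, so the schema can be passed to spark.createDataFrame without
    hitting the char/varchar check. Hypothetical helper for illustration."""
    return re.sub(r"\b(?:var)?char\s*\(\s*\d+\s*\)", "string",
                  ddl_schema, flags=re.IGNORECASE)

ddl = "INT int, STR char(3), DEC decimal(4,3), DAT date"
print(relax_char_types(ddl))
# INT int, STR string, DEC decimal(4,3), DAT date
```

If length enforcement matters, the CHAR(3)/VARCHAR(n) constraint can still be applied later by writing into a table whose declared schema uses those types.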
[jira] [Updated] (SPARK-38476) Use error classes in org.apache.spark.storage
[ https://issues.apache.org/jira/browse/SPARK-38476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Zhang updated SPARK-38476: - Summary: Use error classes in org.apache.spark.storage (was: Use error classes in org.apache.spark.shuffle) > Use error classes in org.apache.spark.storage > - > > Key: SPARK-38476 > URL: https://issues.apache.org/jira/browse/SPARK-38476 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38477) Use error classes in org.apache.spark.shuffle
[ https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bo Zhang updated SPARK-38477: - Summary: Use error classes in org.apache.spark.shuffle (was: Use error classes in org.apache.spark.storage) > Use error classes in org.apache.spark.shuffle > - > > Key: SPARK-38477 > URL: https://issues.apache.org/jira/browse/SPARK-38477 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Bo Zhang >Assignee: Bo Zhang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44353) Remove toAttributes from StructType
Herman van Hövell created SPARK-44353: - Summary: Remove toAttributes from StructType Key: SPARK-44353 URL: https://issues.apache.org/jira/browse/SPARK-44353 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 3.4.1 Reporter: Herman van Hövell -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44352) Move sameType back to DataType
[ https://issues.apache.org/jira/browse/SPARK-44352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741579#comment-17741579 ] Nikita Awasthi commented on SPARK-44352: User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/41921 > Move sameType back to DataType > -- > > Key: SPARK-44352 > URL: https://issues.apache.org/jira/browse/SPARK-44352 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major >
[jira] [Created] (SPARK-44352) Move sameType back to DataType
Herman van Hövell created SPARK-44352: - Summary: Move sameType back to DataType Key: SPARK-44352 URL: https://issues.apache.org/jira/browse/SPARK-44352 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 3.4.1 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Updated] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-39375: - Description: Please find the full document for discussion here: [Spark Connect SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj] Below, we have just referenced the introduction. h2. What are you trying to do? While Spark is used extensively, it was designed nearly a decade ago, which, in the age of serverless computing and ubiquitous programming language use, poses a number of limitations. Most of the limitations stem from the tightly coupled Spark driver architecture and fact that clusters are typically shared across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark driver runs both the client application and scheduler, which results in a heavyweight architecture that requires proximity to the cluster. There is no built-in capability to remotely connect to a Spark cluster in languages other than SQL and users therefore rely on external solutions such as the inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich developer experience{*}: The current architecture and APIs do not cater for interactive data exploration (as done with Notebooks), or allow for building out rich developer experience common in modern code editors. (3) {*}Stability{*}: with the current shared driver architecture, users causing critical exceptions (e.g. OOM) bring the whole cluster down for all users. (4) {*}Upgradability{*}: the current entangling of platform and client APIs (e.g. first and third-party dependencies in the classpath) does not allow for seamless upgrades between Spark versions (and with that, hinders new feature adoption). We propose to overcome these challenges by building on the DataFrame API and the underlying unresolved logical plans. 
The DataFrame API is widely used and makes it very easy to iteratively express complex logic. We will introduce {_}Spark Connect{_}, a remote option of the DataFrame API that separates the client from the Spark server. With Spark Connect, Spark will become decoupled, allowing for built-in remote connectivity: The decoupled client SDK can be used to run interactive data exploration and connect to the server for DataFrame operations. Spark Connect will benefit Spark developers in different ways: The decoupled architecture will result in improved stability, as clients are separated from the driver. From the Spark Connect client perspective, Spark will be (almost) versionless, and thus enable seamless upgradability, as server APIs can evolve without affecting the client API. The decoupled client-server architecture can be leveraged to build close integrations with local developer tooling. Finally, separating the client process from the Spark server process will improve Spark’s overall security posture by avoiding the tight coupling of the client inside the Spark runtime environment. Spark Connect will strengthen Spark’s position as the modern unified engine for large-scale data analytics and expand applicability to use cases and developers we could not reach with the current setup: Spark will become ubiquitously usable as the DataFrame API can be used with (almost) any programming language. 
|SPARK-41282|Feature parity: Column API in Spark Connect|REOPENED|Ruifeng Zheng| |SPARK-41283|Feature parity: Functions API in Spark Connect|RESOLVED|Ruifeng Zheng| |SPARK-41279|Feature parity: DataFrame API in Spark Connect|OPEN|Ruifeng Zheng| |SPARK-41281|Feature parity: SparkSession API in Spark Connect|OPEN|Ruifeng Zheng| |SPARK-41284|Feature parity: I/O in Spark
[jira] [Resolved] (SPARK-44271) Move util functions from DataType to ResolveDefaultColumns
[ https://issues.apache.org/jira/browse/SPARK-44271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44271. --- Fix Version/s: 3.5.0 Resolution: Fixed > Move util functions from DataType to ResolveDefaultColumns > -- > > Key: SPARK-44271 > URL: https://issues.apache.org/jira/browse/SPARK-44271 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.5.0 > >
[jira] [Resolved] (SPARK-44131) Add call_function and deprecate call_udf for Scala API
[ https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44131. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41687 [https://github.com/apache/spark/pull/41687] > Add call_function and deprecate call_udf for Scala API > -- > > Key: SPARK-44131 > URL: https://issues.apache.org/jira/browse/SPARK-44131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > > > The Scala API for SQL has a method call_udf used to call user-defined > functions. > In fact, call_udf can also call built-in functions. > This behavior is confusing for users.
[jira] [Assigned] (SPARK-44131) Add call_function and deprecate call_udf for Scala API
[ https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44131: - Assignee: jiaan.geng > Add call_function and deprecate call_udf for Scala API > -- > > Key: SPARK-44131 > URL: https://issues.apache.org/jira/browse/SPARK-44131 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > The Scala API for SQL has a method call_udf used to call user-defined > functions. > In fact, call_udf can also call built-in functions. > This behavior is confusing for users.
[jira] [Updated] (SPARK-43628) Enable SparkContext-related tests with Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-43628: Summary: Enable SparkContext-related tests with Spark Connect (was: Enable SparkContext with Spark Connect) > Enable SparkContext-related tests with Spark Connect > > > Key: SPARK-43628 > URL: https://issues.apache.org/jira/browse/SPARK-43628 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > > Enable SparkContext with Spark Connect
[jira] [Commented] (SPARK-44349) Add math functions to SparkR
[ https://issues.apache.org/jira/browse/SPARK-44349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741512#comment-17741512 ] ASF GitHub Bot commented on SPARK-44349: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/41914 > Add math functions to SparkR > > > Key: SPARK-44349 > URL: https://issues.apache.org/jira/browse/SPARK-44349 > Project: Spark > Issue Type: Sub-task > Components: R >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Minor >
[jira] [Assigned] (SPARK-44267) Upgrade `pandas` to 2.0.3
[ https://issues.apache.org/jira/browse/SPARK-44267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44267: Assignee: BingKun Pan > Upgrade `pandas` to 2.0.3 > - > > Key: SPARK-44267 > URL: https://issues.apache.org/jira/browse/SPARK-44267 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Resolved] (SPARK-44267) Upgrade `pandas` to 2.0.3
[ https://issues.apache.org/jira/browse/SPARK-44267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44267. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41812 [https://github.com/apache/spark/pull/41812] > Upgrade `pandas` to 2.0.3 > - > > Key: SPARK-44267 > URL: https://issues.apache.org/jira/browse/SPARK-44267 > Project: Spark > Issue Type: Improvement > Components: Build, PySpark >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.5.0 > >
[jira] [Assigned] (SPARK-44337) Any fields set to Any.getDefaultInstance cause exceptions.
[ https://issues.apache.org/jira/browse/SPARK-44337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44337: Assignee: Raghu Angadi > Any fields set to Any.getDefaultInstance cause exceptions. > -- > > Key: SPARK-44337 > URL: https://issues.apache.org/jira/browse/SPARK-44337 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.2 > > > Protobuf functions added support for converting `Any` fields to JSON strings. > It uses Protobuf's built-in `JsonFormat` to convert to JSON. > JsonFormat fails to handle the fields when they are set to > `Any.getDefaultInstance()` in the original message. This fails only while > using a descriptor set, but does not fail while using Java classes. Since using > descriptor files is the common case, this can be a blocker.
[jira] [Resolved] (SPARK-44337) Any fields set to Any.getDefaultInstance cause exceptions.
[ https://issues.apache.org/jira/browse/SPARK-44337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44337. -- Fix Version/s: 3.5.0 (was: 3.4.2) Resolution: Fixed Issue resolved by pull request 41897 [https://github.com/apache/spark/pull/41897] > Any fields set to Any.getDefaultInstance cause exceptions. > -- > > Key: SPARK-44337 > URL: https://issues.apache.org/jira/browse/SPARK-44337 > Project: Spark > Issue Type: Task > Components: Protobuf >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.5.0 > > > Protobuf functions added support for converting `Any` fields to JSON strings. > It uses Protobuf's built-in `JsonFormat` to convert to JSON. > JsonFormat fails to handle the fields when they are set to > `Any.getDefaultInstance()` in the original message. This fails only while > using a descriptor set, but does not fail while using Java classes. Since using > descriptor files is the common case, this can be a blocker.
[jira] [Created] (SPARK-44351) Make some syntactic simplification
Yang Jie created SPARK-44351: Summary: Make some syntactic simplification Key: SPARK-44351 URL: https://issues.apache.org/jira/browse/SPARK-44351 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Yang Jie - Use `exists` instead of `find` and `emptiness check` - Use `orNull` instead of `getOrElse(null)` - Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map - Use `find` instead of `filter` + `headOption`
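The simplifications listed in SPARK-44351 can be sketched in plain Scala. This is a minimal illustration with hypothetical values (the names `xs`, `m`, and `name` are invented for the example, not taken from the Spark codebase); each comment shows the verbose form the ticket proposes replacing.

```scala
// Hypothetical before/after pairs for the Scala collection simplifications above.
object SimplificationExamples extends App {
  val xs = Seq(1, 2, 3)

  // Use `exists` instead of `find` plus an emptiness check.
  // before: xs.find(_ > 2).isDefined
  val hasBig = xs.exists(_ > 2)

  // Use `orNull` instead of `getOrElse(null)` on an Option.
  // before: name.getOrElse(null)
  val name: Option[String] = None
  val n = name.orNull

  // Use `getOrElse(key, default)` instead of `get(key).getOrElse(default)` on a Map.
  // before: m.get("b").getOrElse(0)
  val m = Map("a" -> 1)
  val v = m.getOrElse("b", 0)

  // Use `find` instead of `filter` + `headOption`.
  // before: xs.filter(_ % 2 == 0).headOption
  val firstEven = xs.find(_ % 2 == 0)

  assert(hasBig && n == null && v == 0 && firstEven.contains(2))
}
```

Each rewritten form is a single method call that short-circuits where the verbose form would build an intermediate collection or Option.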
[jira] [Created] (SPARK-44350) Upgrade sbt to 1.9.2
BingKun Pan created SPARK-44350: --- Summary: Upgrade sbt to 1.9.2 Key: SPARK-44350 URL: https://issues.apache.org/jira/browse/SPARK-44350 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.5.0 Reporter: BingKun Pan
[jira] [Assigned] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
[ https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-44328: Assignee: jiaan.geng > Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328] > -- > > Key: SPARK-44328 > URL: https://issues.apache.org/jira/browse/SPARK-44328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major >
[jira] [Resolved] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
[ https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-44328. -- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 41889 [https://github.com/apache/spark/pull/41889] > Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328] > -- > > Key: SPARK-44328 > URL: https://issues.apache.org/jira/browse/SPARK-44328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.5.0 > >
[jira] [Commented] (SPARK-44348) Reenable Session-based artifact test cases
[ https://issues.apache.org/jira/browse/SPARK-44348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741467#comment-17741467 ] Hyukjin Kwon commented on SPARK-44348: -- I am working on this. > Reenable Session-based artifact test cases > -- > > Key: SPARK-44348 > URL: https://issues.apache.org/jira/browse/SPARK-44348 > Project: Spark > Issue Type: Task > Components: PySpark, Tests >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > > Several tests in https://github.com/apache/spark/pull/41495 were skipped. > They should be investigated and reenabled.