[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, ReferenceSort

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Description: 
Define the computing logic through PartitionEvaluator API and use it in SQL 
operators 

InMemoryTableScanExec

DataSourceScanExec

MergeRowsExec

ReferenceSort

  was:Define the computing logic through PartitionEvaluator API and use it in 
SQL operators `InMemoryTableScanExec`


> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, 
> ReferenceSort
> --
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators 
> InMemoryTableScanExec
> DataSourceScanExec
> MergeRowsExec
> ReferenceSort
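These migrations move each operator's per-partition computation into an evaluator object created by a factory. As a rough conceptual sketch in plain Python (not Spark's Scala API; the names below merely mirror `PartitionEvaluatorFactory.createEvaluator` and `PartitionEvaluator.eval` and are otherwise hypothetical):

```python
# Conceptual model of the PartitionEvaluator pattern: a serializable
# factory creates one evaluator per partition, and the evaluator owns
# the per-partition logic an operator previously inlined in a closure.
from typing import Iterator, List

class SortEvaluator:
    """Per-partition logic for a ReferenceSort-like operator."""
    def eval(self, partition_index: int, rows: Iterator[int]) -> Iterator[int]:
        # ReferenceSort sorts each partition independently.
        return iter(sorted(rows))

class SortEvaluatorFactory:
    def create_evaluator(self) -> SortEvaluator:
        return SortEvaluator()

def map_partitions_with_evaluator(partitions: List[List[int]],
                                  factory: SortEvaluatorFactory) -> List[List[int]]:
    # Stand-in for RDD.mapPartitionsWithEvaluator: a fresh evaluator per partition.
    return [list(factory.create_evaluator().eval(i, iter(part)))
            for i, part in enumerate(partitions)]

print(map_partitions_with_evaluator([[3, 1, 2], [9, 7]], SortEvaluatorFactory()))
# [[1, 2, 3], [7, 9]]
```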



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, ReferenceSort

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Summary: Define the computing logic through PartitionEvaluator API and use 
it in SQL operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, 
ReferenceSort  (was: Define the computing logic through PartitionEvaluator API 
and use it in SQL operators InMemoryTableScanExec)

> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators InMemoryTableScanExec, DataSourceScanExec, MergeRowsExec, 
> ReferenceSort
> --
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators `InMemoryTableScanExec`






[jira] [Updated] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44369:
-
Summary: Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, 
ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec  (was: 
Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, 
DebugExec, HiveTableScanExec, DataSourceScanExec)

> Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, 
> DebugExec, HiveTableScanExec, DataSourceScanExec, SortExec
> --
>
> Key: SPARK-44369
> URL: https://issues.apache.org/jira/browse/SPARK-44369
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use PartitionEvaluator API in 
> CollectMetricsExec
> GenerateExec
> ExpandExec
> DebugExec
> HiveTableScanExec
> DataSourceScanExec
> SortExec
>  






[jira] [Updated] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44369:
-
Description: 
Use PartitionEvaluator API in 

CollectMetricsExec
GenerateExec
ExpandExec
DebugExec
HiveTableScanExec
DataSourceScanExec

SortExec

 

  was:
Use PartitionEvaluator API in 

CollectMetricsExec
GenerateExec
ExpandExec
DebugExec
HiveTableScanExec
DataSourceScanExec


> Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, 
> DebugExec, HiveTableScanExec, DataSourceScanExec
> 
>
> Key: SPARK-44369
> URL: https://issues.apache.org/jira/browse/SPARK-44369
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use PartitionEvaluator API in 
> CollectMetricsExec
> GenerateExec
> ExpandExec
> DebugExec
> HiveTableScanExec
> DataSourceScanExec
> SortExec
>  






[jira] [Commented] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741839#comment-17741839
 ] 

Vinod KC commented on SPARK-44369:
--

I'm working on it

> Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, 
> DebugExec, HiveTableScanExec, DataSourceScanExec
> 
>
> Key: SPARK-44369
> URL: https://issues.apache.org/jira/browse/SPARK-44369
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use PartitionEvaluator API in 
> CollectMetricsExec
> GenerateExec
> ExpandExec
> DebugExec
> HiveTableScanExec
> DataSourceScanExec






[jira] [Created] (SPARK-44369) Use PartitionEvaluator API in CollectMetricsExec, GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec

2023-07-10 Thread Vinod KC (Jira)
Vinod KC created SPARK-44369:


 Summary: Use PartitionEvaluator API in CollectMetricsExec, 
GenerateExec, ExpandExec, DebugExec, HiveTableScanExec, DataSourceScanExec
 Key: SPARK-44369
 URL: https://issues.apache.org/jira/browse/SPARK-44369
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Use PartitionEvaluator API in 

CollectMetricsExec
GenerateExec
ExpandExec
DebugExec
HiveTableScanExec
DataSourceScanExec






[jira] [Commented] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741838#comment-17741838
 ] 

Vinod KC commented on SPARK-44362:
--

I'm working on it

> Use PartitionEvaluator API in AggregateInPandasExec, 
> WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec
> -
>
> Key: SPARK-44362
> URL: https://issues.apache.org/jira/browse/SPARK-44362
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use PartitionEvaluator API in
> AggregateInPandasExec
> WindowInPandasExec
> EvalPythonExec
> AttachDistributedSequenceExec






[jira] [Commented] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741837#comment-17741837
 ] 

Vinod KC commented on SPARK-44365:
--

I'm working on it

> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators InMemoryTableScanExec
> ---
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators `InMemoryTableScanExec`






[jira] [Commented] (SPARK-44361) Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741836#comment-17741836
 ] 

Vinod KC commented on SPARK-44361:
--

I'm working on it

> Use PartitionEvaluator API in BatchEvalPythonUDTFExec, 
> FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec
> --
>
> Key: SPARK-44361
> URL: https://issues.apache.org/jira/browse/SPARK-44361
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Use PartitionEvaluator API in
> BatchEvalPythonUDTFExec,
> FlatMapGroupsInPandasExec,
> MapInBatchExec,
> FlatMapCoGroupsInPandasExec






[jira] [Assigned] (SPARK-44357) Add pyspark_testing module for GHA tests

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44357:


Assignee: Amanda Liu

> Add pyspark_testing module for GHA tests
> 
>
> Key: SPARK-44357
> URL: https://issues.apache.org/jira/browse/SPARK-44357
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Resolved] (SPARK-44357) Add pyspark_testing module for GHA tests

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44357.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41896
[https://github.com/apache/spark/pull/41896]

> Add pyspark_testing module for GHA tests
> 
>
> Key: SPARK-44357
> URL: https://issues.apache.org/jira/browse/SPARK-44357
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Resolved] (SPARK-44363) Display percent of unequal rows in DataFrame comparison

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44363.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41926
[https://github.com/apache/spark/pull/41926]

> Display percent of unequal rows in DataFrame comparison
> ---
>
> Key: SPARK-44363
> URL: https://issues.apache.org/jira/browse/SPARK-44363
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
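The percentage this feature reports can be modeled as the share of row positions where actual and expected rows differ. A stand-in sketch in plain Python (not the actual pyspark.testing implementation):

```python
# Toy version of a "percent of unequal rows" metric for two row lists,
# compared position by position; pyspark's real comparison is richer.
def percent_unequal(actual_rows, expected_rows):
    pairs = list(zip(actual_rows, expected_rows))
    if not pairs:
        return 0.0
    unequal = sum(1 for a, e in pairs if a != e)
    return 100.0 * unequal / len(pairs)

print(round(percent_unequal([(1, "a"), (2, "b"), (3, "x")],
                            [(1, "a"), (2, "b"), (3, "c")]), 2))  # 33.33
```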






[jira] [Assigned] (SPARK-44363) Display percent of unequal rows in DataFrame comparison

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44363:


Assignee: Amanda Liu

> Display percent of unequal rows in DataFrame comparison
> ---
>
> Key: SPARK-44363
> URL: https://issues.apache.org/jira/browse/SPARK-44363
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44368) Support partition operation on dataframe in Spark Connect Go Client

2023-07-10 Thread BoYang (Jira)
BoYang created SPARK-44368:
--

 Summary: Support partition operation on dataframe in Spark Connect 
Go Client
 Key: SPARK-44368
 URL: https://issues.apache.org/jira/browse/SPARK-44368
 Project: Spark
  Issue Type: Sub-task
  Components: Connect Contrib
Affects Versions: 3.4.1
Reporter: BoYang


Support partition operations on DataFrame in the Spark Connect Go client.






[jira] [Updated] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value

2023-07-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44251:

Fix Version/s: 3.5.0
   (was: 4.0.0)

> Potential for incorrect results or NPE when full outer USING join has null 
> key value
> 
>
> Key: SPARK-44251
> URL: https://issues.apache.org/jira/browse/SPARK-44251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: correctness
> Fix For: 3.3.3, 3.5.0, 3.4.2
>
>
> The following query produces incorrect results:
> {noformat}
> create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values (2, 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> -1   <== should be null
> 1
> 2
> {noformat}
> The following query fails with a {{NullPointerException}}:
> {noformat}
> create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values ('2', 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
> ...
> {noformat}
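For context, `FULL OUTER JOIN ... USING (c1)` exposes a single coalesced key column, and a null key matches nothing, so the first query's correct output is null, 1, 2. A toy Python model of that semantics (not Spark code) makes the expectation concrete:

```python
# Minimal model of FULL OUTER JOIN ... USING (c1): unmatched rows from
# either side still contribute their (possibly null) key, and null keys
# never join. Rows are (c1, c2) tuples; None plays the role of SQL NULL.
def full_outer_using(left, right):
    matched = set()
    keys = []
    for lc1, _ in left:
        hits = [j for j, (rc1, _) in enumerate(right)
                if lc1 is not None and lc1 == rc1]
        if hits:
            matched.update(hits)
            keys.extend(lc1 for _ in hits)
        else:
            keys.append(lc1)  # unmatched left row keeps its key (may be None)
    keys.extend(rc1 for j, (rc1, _) in enumerate(right) if j not in matched)
    return keys

print(full_outer_using([(1, 2), (None, 7)], [(2, 3)]))  # [1, None, 2]
```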






[jira] [Resolved] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value

2023-07-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44251.
-
Fix Version/s: 3.3.3
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 41809
[https://github.com/apache/spark/pull/41809]

> Potential for incorrect results or NPE when full outer USING join has null 
> key value
> 
>
> Key: SPARK-44251
> URL: https://issues.apache.org/jira/browse/SPARK-44251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: correctness
> Fix For: 3.3.3, 4.0.0, 3.4.2
>
>
> The following query produces incorrect results:
> {noformat}
> create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values (2, 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> -1   <== should be null
> 1
> 2
> {noformat}
> The following query fails with a {{NullPointerException}}:
> {noformat}
> create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values ('2', 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
> ...
> {noformat}






[jira] [Assigned] (SPARK-44251) Potential for incorrect results or NPE when full outer USING join has null key value

2023-07-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44251:
---

Assignee: Bruce Robbins

> Potential for incorrect results or NPE when full outer USING join has null 
> key value
> 
>
> Key: SPARK-44251
> URL: https://issues.apache.org/jira/browse/SPARK-44251
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.1, 3.5.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: correctness
>
> The following query produces incorrect results:
> {noformat}
> create or replace temp view v1 as values (1, 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values (2, 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> -1   <== should be null
> 1
> 2
> {noformat}
> The following query fails with a {{NullPointerException}}:
> {noformat}
> create or replace temp view v1 as values ('1', 2), (null, 7) as (c1, c2);
> create or replace temp view v2 as values ('2', 3) as (c1, c2);
> select explode(array(c1)) as x
> from v1
> full outer join v2
> using (c1);
> 23/06/25 17:06:39 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 11)
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.generate_doConsume_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.smj_consumeFullOuterJoinRow_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_findNextJoinRows_0$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
> ...
> {noformat}






[jira] [Created] (SPARK-44367) Show error message on UI for each query

2023-07-10 Thread Kent Yao (Jira)
Kent Yao created SPARK-44367:


 Summary: Show error message on UI for each query
 Key: SPARK-44367
 URL: https://issues.apache.org/jira/browse/SPARK-44367
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.5.0
Reporter: Kent Yao


Display SQL error messages on the UI to improve the user experience when developing SQL.






[jira] [Assigned] (SPARK-41997) Test parity: pyspark.sql.tests.test_readwriter

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41997:


Assignee: Hyukjin Kwon

> Test parity: pyspark.sql.tests.test_readwriter
> --
>
> Key: SPARK-41997
> URL: https://issues.apache.org/jira/browse/SPARK-41997
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> See https://issues.apache.org/jira/browse/SPARK-41652 and 
> https://issues.apache.org/jira/browse/SPARK-41651






[jira] [Updated] (SPARK-42264) Test Parity: pyspark.sql.tests.test_udf and pyspark.sql.tests.pandas.test_pandas_udf

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-42264:
-
Shepherd: Hyukjin Kwon

> Test Parity: pyspark.sql.tests.test_udf and 
> pyspark.sql.tests.pandas.test_pandas_udf
> 
>
> Key: SPARK-42264
> URL: https://issues.apache.org/jira/browse/SPARK-42264
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>







[jira] [Resolved] (SPARK-43974) Upgrade buf to v1.23.1

2023-07-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43974.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41469
[https://github.com/apache/spark/pull/41469]

> Upgrade buf to v1.23.1
> --
>
> Key: SPARK-43974
> URL: https://issues.apache.org/jira/browse/SPARK-43974
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+

2023-07-10 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44366:
---

 Summary: Migrate antlr4 from 4.9 to 4.10+
 Key: SPARK-44366
 URL: https://issues.apache.org/jira/browse/SPARK-44366
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page

2023-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-44332.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41887
[https://github.com/apache/spark/pull/41887]

> Fix the sorting error of Executor ID Column on Executors UI Page
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page

2023-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-44332:


Assignee: BingKun Pan

> Fix the sorting error of Executor ID Column on Executors UI Page
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-43983) Implement cross validator estimator

2023-07-10 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu resolved SPARK-43983.

Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41881
[https://github.com/apache/spark/pull/41881]

> Implement cross validator estimator
> ---
>
> Key: SPARK-43983
> URL: https://issues.apache.org/jira/browse/SPARK-43983
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-44216) Add assertSchemaEqual API with ignore_nullable optional flag

2023-07-10 Thread Amanda Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-44216:
---
Summary: Add assertSchemaEqual API with ignore_nullable optional flag  
(was: Add improved error message formatting for assert_df_equality)

> Add assertSchemaEqual API with ignore_nullable optional flag
> 
>
> Key: SPARK-44216
> URL: https://issues.apache.org/jira/browse/SPARK-44216
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
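A minimal sketch of what an `ignore_nullable` comparison can look like, using a hypothetical `(name, dtype, nullable)` field layout rather than pyspark's actual `StructType`:

```python
# Compare two flat schemas field by field; when ignore_nullable is set,
# differences in the nullable flag are tolerated. The tuple layout here
# is illustrative, not pyspark's real schema representation.
def schemas_equal(a, b, ignore_nullable=False):
    if len(a) != len(b):
        return False
    for (n1, t1, null1), (n2, t2, null2) in zip(a, b):
        if n1 != n2 or t1 != t2:
            return False
        if not ignore_nullable and null1 != null2:
            return False
    return True

print(schemas_equal([("id", "bigint", True)],
                    [("id", "bigint", False)], ignore_nullable=True))  # True
```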






[jira] [Updated] (SPARK-44216) Make assertSchemaEqual API with ignore_nullable optional flag

2023-07-10 Thread Amanda Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-44216:
---
Summary: Make assertSchemaEqual API with ignore_nullable optional flag  
(was: Add assertSchemaEqual API with ignore_nullable optional flag)

> Make assertSchemaEqual API with ignore_nullable optional flag
> -
>
> Key: SPARK-44216
> URL: https://issues.apache.org/jira/browse/SPARK-44216
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Description: Define the computing logic through PartitionEvaluator API and 
use it in SQL operators `InMemoryTableScanExec`  (was: InMemoryTableScanExec)

> Use PartitionEvaluator API in AggregateInPandasExec, 
> WindowInPandasExec, EvalPythonExec, AttachDistributedSequenceExec
> -
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators `InMemoryTableScanExec`






[jira] [Updated] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Description: InMemoryTableScanExec  (was: `BatchEvalPythonUDTFExec`
`FlatMapGroupsInPandasExec`
`MapInBatchExec`
`FlatMapCoGroupsInPandasExec`)

> Use  PartitionEvaluator API in AggregateInPandasExec, 
> WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
> -
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> InMemoryTableScanExec






[jira] [Updated] (SPARK-44365) Define the computing logic through PartitionEvaluator API and use it in SQL operators InMemoryTableScanExec

2023-07-10 Thread Vinod KC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod KC updated SPARK-44365:
-
Summary: Define the computing logic through PartitionEvaluator API and use 
it in SQL operators InMemoryTableScanExec  (was: Use  PartitionEvaluator API in 
AggregateInPandasExec, 
WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec)

> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators InMemoryTableScanExec
> ---
>
> Key: SPARK-44365
> URL: https://issues.apache.org/jira/browse/SPARK-44365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> operators `InMemoryTableScanExec`






[jira] [Created] (SPARK-44365) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec

2023-07-10 Thread Vinod KC (Jira)
Vinod KC created SPARK-44365:


 Summary: Use  PartitionEvaluator API in AggregateInPandasExec, 
WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
 Key: SPARK-44365
 URL: https://issues.apache.org/jira/browse/SPARK-44365
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


`BatchEvalPythonUDTFExec`
`FlatMapGroupsInPandasExec`
`MapInBatchExec`
`FlatMapCoGroupsInPandasExec`






[jira] [Updated] (SPARK-44363) Display percent of unequal rows in DataFrame comparison

2023-07-10 Thread Amanda Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-44363:
---
Summary: Display percent of unequal rows in DataFrame comparison  (was: 
Display percent of unequal rows in dataframe comparison)

> Display percent of unequal rows in DataFrame comparison
> ---
>
> Key: SPARK-44363
> URL: https://issues.apache.org/jira/browse/SPARK-44363
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
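A minimal sketch of the metric this issue asks for: the percentage of row pairs that differ between an actual and an expected result. Plain tuples stand in for DataFrame rows, and `percent_unequal` is a hypothetical helper, not the PySpark utility itself:

```python
# Hedged sketch: percentage of unequal rows between actual and expected
# results, as a comparison report might surface it. Rows are plain tuples.

def percent_unequal(actual_rows, expected_rows):
    if len(actual_rows) != len(expected_rows):
        raise ValueError("row counts differ")
    if not actual_rows:
        return 0.0
    unequal = sum(1 for a, e in zip(actual_rows, expected_rows) if a != e)
    return 100.0 * unequal / len(actual_rows)

actual = [(1, "a"), (2, "b"), (3, "x"), (4, "d")]
expected = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
print(f"{percent_unequal(actual, expected):.1f}% of rows differ")  # 25.0% of rows differ
```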






[jira] [Updated] (SPARK-44061) Add assertDataFrameEquality util function

2023-07-10 Thread Amanda Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-44061:
---
Summary: Add assertDataFrameEquality util function  (was: Add 
assert_df_equality util function)

> Add assertDataFrameEquality util function
> -
>
> Key: SPARK-44061
> URL: https://issues.apache.org/jira/browse/SPARK-44061
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44364) Support List[Row] data type for expected DataFrame argument

2023-07-10 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44364:
--

 Summary: Support List[Row] data type for expected DataFrame 
argument
 Key: SPARK-44364
 URL: https://issues.apache.org/jira/browse/SPARK-44364
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44363) Display percent of unequal rows in dataframe comparison

2023-07-10 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44363:
--

 Summary: Display percent of unequal rows in dataframe comparison
 Key: SPARK-44363
 URL: https://issues.apache.org/jira/browse/SPARK-44363
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec

2023-07-10 Thread Vinod KC (Jira)
Vinod KC created SPARK-44362:


 Summary: Use  PartitionEvaluator API in AggregateInPandasExec, 
WindowInPandasExec,EvalPythonExec,AttachDistributedSequenceExec
 Key: SPARK-44362
 URL: https://issues.apache.org/jira/browse/SPARK-44362
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Use the PartitionEvaluator API in

AggregateInPandasExec

WindowInPandasExec

EvalPythonExec

AttachDistributedSequenceExec






[jira] [Created] (SPARK-44361) Use PartitionEvaluator API in BatchEvalPythonUDTFExec, FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec

2023-07-10 Thread Vinod KC (Jira)
Vinod KC created SPARK-44361:


 Summary: Use  PartitionEvaluator API in BatchEvalPythonUDTFExec, 
FlatMapGroupsInPandasExec, MapInBatchExec, FlatMapCoGroupsInPandasExec
 Key: SPARK-44361
 URL: https://issues.apache.org/jira/browse/SPARK-44361
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Use the PartitionEvaluator API in

BatchEvalPythonUDTFExec,

FlatMapGroupsInPandasExec,

MapInBatchExec,

FlatMapCoGroupsInPandasExec






[jira] [Created] (SPARK-44360) Support schema pruning in delta-based MERGE operations

2023-07-10 Thread Anton Okolnychyi (Jira)
Anton Okolnychyi created SPARK-44360:


 Summary: Support schema pruning in delta-based MERGE operations
 Key: SPARK-44360
 URL: https://issues.apache.org/jira/browse/SPARK-44360
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Anton Okolnychyi


We need to support schema pruning in delta-based MERGE operations.






[jira] [Resolved] (SPARK-44352) Move sameType back to DataType

2023-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44352.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move sameType back to DataType
> --
>
> Key: SPARK-44352
> URL: https://issues.apache.org/jira/browse/SPARK-44352
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44353) Remove toAttributes from StructType

2023-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-44353:
-

Assignee: Herman van Hövell

> Remove toAttributes from StructType
> ---
>
> Key: SPARK-44353
> URL: https://issues.apache.org/jira/browse/SPARK-44353
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Resolved] (SPARK-44343) Separate encoder inference from expression encoder generation in ScalaReflection

2023-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44343.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Separate encoder inference from expression encoder generation in 
> ScalaReflection
> 
>
> Key: SPARK-44343
> URL: https://issues.apache.org/jira/browse/SPARK-44343
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741736#comment-17741736
 ] 

Vinod KC commented on SPARK-44359:
--

I'm working on this


> Define the computing logic through PartitionEvaluator API and use it in SQL 
> aggregate operators
> ---
>
> Key: SPARK-44359
> URL: https://issues.apache.org/jira/browse/SPARK-44359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in SQL 
> aggregate operators
> `MergingSessionsExec`
> `SortAggregateExec`
> `UpdatingSessionsExec`
> `HashAggregateExec`
> `ObjectHashAggregateExec`






[jira] [Created] (SPARK-44359) Define the computing logic through PartitionEvaluator API and use it in SQL aggregate operators

2023-07-10 Thread Vinod KC (Jira)
Vinod KC created SPARK-44359:


 Summary: Define the computing logic through PartitionEvaluator API 
and use it in SQL aggregate operators
 Key: SPARK-44359
 URL: https://issues.apache.org/jira/browse/SPARK-44359
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Vinod KC


Define the computing logic through PartitionEvaluator API and use it in SQL 
aggregate operators

`MergingSessionsExec`
`SortAggregateExec`
`UpdatingSessionsExec`
`HashAggregateExec`

`ObjectHashAggregateExec`
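For context, the PartitionEvaluator pattern referenced across these sub-tasks separates the per-partition computing logic from the physical operator: a factory produces an evaluator, which is then applied to each partition's index and row iterator. The Python sketch below only illustrates the shape of the pattern; the class and method names loosely mirror Spark's Scala PartitionEvaluatorFactory/PartitionEvaluator interfaces and are assumptions, not the real API:

```python
# Illustrative sketch of the PartitionEvaluator pattern: the computing
# logic lives in an evaluator created by a factory, invoked once per
# partition with the partition index and an iterator over its rows.

class SumEvaluator:
    def eval(self, partition_index, rows):
        # Aggregate one partition's rows into a single sum.
        total = sum(rows)
        yield (partition_index, total)

class SumEvaluatorFactory:
    def create_evaluator(self):
        return SumEvaluator()

partitions = [[1, 2, 3], [10, 20], [100]]
factory = SumEvaluatorFactory()
results = [
    out
    for idx, part in enumerate(partitions)
    for out in factory.create_evaluator().eval(idx, iter(part))
]
print(results)  # [(0, 6), (1, 30), (2, 100)]
```

Defining each operator's logic behind such an evaluator is what lets the engine drive per-partition execution uniformly, which is the point of migrating these SQL operators.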






[jira] [Commented] (SPARK-44330) Define the computing logic through PartitionEvaluator API and use it in BroadcastNestedLoopJoinExec & BroadcastHashJoinExec

2023-07-10 Thread Vinod KC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741734#comment-17741734
 ] 

Vinod KC commented on SPARK-44330:
--

PR raised : https://github.com/apache/spark/pull/41888

> Define the computing logic through PartitionEvaluator API and use it in 
> BroadcastNestedLoopJoinExec & BroadcastHashJoinExec
> ---
>
> Key: SPARK-44330
> URL: https://issues.apache.org/jira/browse/SPARK-44330
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> BroadcastNestedLoopJoinExec & BroadcastHashJoinExec






[jira] [Updated] (SPARK-44350) Upgrade sbt to 1.9.2

2023-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-44350:
-
Priority: Trivial  (was: Minor)

> Upgrade sbt to 1.9.2
> 
>
> Key: SPARK-44350
> URL: https://issues.apache.org/jira/browse/SPARK-44350
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44350) Upgrade sbt to 1.9.2

2023-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-44350:


Assignee: BingKun Pan

> Upgrade sbt to 1.9.2
> 
>
> Key: SPARK-44350
> URL: https://issues.apache.org/jira/browse/SPARK-44350
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44350) Upgrade sbt to 1.9.2

2023-07-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-44350.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41916
[https://github.com/apache/spark/pull/41916]

> Upgrade sbt to 1.9.2
> 
>
> Key: SPARK-44350
> URL: https://issues.apache.org/jira/browse/SPARK-44350
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44357) Add pyspark_testing module for GHA tests

2023-07-10 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-44357:
--

 Summary: Add pyspark_testing module for GHA tests
 Key: SPARK-44357
 URL: https://issues.apache.org/jira/browse/SPARK-44357
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Amanda Liu


SPIP: 
https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v






[jira] [Created] (SPARK-44356) Move INSERT INTO to CTEDef code path

2023-07-10 Thread Max Gekk (Jira)
Max Gekk created SPARK-44356:


 Summary: Move INSERT INTO to CTEDef code path
 Key: SPARK-44356
 URL: https://issues.apache.org/jira/browse/SPARK-44356
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Support the combination WITH ... INSERT INTO.






[jira] [Created] (SPARK-44355) Move commands to CTEDef code path and deprecate CTE inline path

2023-07-10 Thread Max Gekk (Jira)
Max Gekk created SPARK-44355:


 Summary: Move commands to CTEDef code path and deprecate CTE 
inline path
 Key: SPARK-44355
 URL: https://issues.apache.org/jira/browse/SPARK-44355
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Right now our CTE resolution code path is diverged: queries with commands go into 
the CTE inline code path, whereas non-command queries go into the CTEDef code path (see 
https://github.com/apache/spark/blob/42719d9425b9a24ef016b5c2874e522b960cf114/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala#L50
 ).

In the longer term we should migrate command queries to go through the CTEDef path as 
well and deprecate the CTE inline path.






[jira] [Assigned] (SPARK-44351) Make some syntactic simplification

2023-07-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44351:


Assignee: Yang Jie

> Make some syntactic simplification
> --
>
> Key: SPARK-44351
> URL: https://issues.apache.org/jira/browse/SPARK-44351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> - Use `exists` instead of `find` and `emptiness check`
> - Use `orNull` instead of `getOrElse(null)`
> - Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map
> - Use `find` instead of `filter` + `headOption`
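The listed Scala refactorings have direct Python analogues, shown below purely to illustrate the intent of each simplification; the actual change is in Spark's Scala code:

```python
# Python analogues of the Scala simplifications above (illustrative only).

xs = [1, 2, 3, 4]
d = {"k": 1}

# `exists` instead of `find` + emptiness check:
# express "is there any match?" directly rather than materializing a match.
verbose = next((x for x in xs if x > 2), None) is not None
concise = any(x > 2 for x in xs)
assert verbose == concise

# `getOrElse(key, value)` instead of `get(key).getOrElse(value)`:
# one lookup with a default, not a lookup followed by a fallback.
assert d.get("missing", 0) == 0

# `find` instead of `filter` + `headOption`:
# stop at the first match instead of filtering everything first.
first_even = next((x for x in xs if x % 2 == 0), None)
assert first_even == 2
```

Each rewrite removes an intermediate step without changing behavior, which is why the issue is tagged as a minor improvement.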






[jira] [Resolved] (SPARK-44351) Make some syntactic simplification

2023-07-10 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44351.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41915
[https://github.com/apache/spark/pull/41915]

> Make some syntactic simplification
> --
>
> Key: SPARK-44351
> URL: https://issues.apache.org/jira/browse/SPARK-44351
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>
> - Use `exists` instead of `find` and `emptiness check`
> - Use `orNull` instead of `getOrElse(null)`
> - Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map
> - Use `find` instead of `filter` + `headOption`






[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column

2023-07-10 Thread Kai-Michael Roesner (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai-Michael Roesner updated SPARK-44354:

Description: 
When trying to create a dataframe with a CharType or VarcharType column like so:
{code}
from datetime import date
from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import *

data = [
  (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
  (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
  (3, 'cde', Decimal(2.718), date(2023, 1, 3))]

schema = StructType([
  StructField('INT', IntegerType()),
  StructField('STR', CharType(3)),
  StructField('DEC', DecimalType(4, 3)),
  StructField('DAT', DateType())])

spark = SparkSession.builder.appName('data-types').getOrCreate()
df = spark.createDataFrame(data, schema)
df.show()
{code}
a {{java.lang.IllegalStateException}} is thrown 
[here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168].
I'm expecting this to work...

PS: Excerpt from the logs:
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
o24.applySchemaToPythonRDD.
: java.lang.IllegalStateException: [BUG] logical plan should not have output of 
char/varchar type: LogicalRDD [INT#0, STR#1, DEC#2, DAT#3], false

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1(CheckAnalysis.scala:168)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$1$adapted(CheckAnalysis.scala:163)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:295)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:163)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:160)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:156)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:146)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:202)
at 
org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:526)
at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:202)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:201)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:88)
at 
org.apache.spark.sql.SparkSession.internalCreateDataFrame(SparkSession.scala:571)
at 
org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:804)
at 
org.apache.spark.sql.SparkSession.applySchemaToPythonRDD(SparkSession.scala:789)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at 
py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
   

[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column

2023-07-10 Thread Kai-Michael Roesner (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai-Michael Roesner updated SPARK-44354:

Component/s: SQL

> Cannot create dataframe with CharType/VarcharType column
> 
>
> Key: SPARK-44354
> URL: https://issues.apache.org/jira/browse/SPARK-44354
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Kai-Michael Roesner
>Priority: Major
>
> When trying to create a dataframe with a CharType or VarcharType column like 
> so:
> {code}
> from datetime import date
> from decimal import Decimal
> from pyspark.sql import SparkSession
> from pyspark.sql.types import *
> data = [
>   (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
>   (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
>   (3, 'cde', Decimal(2.718), date(2023, 1, 3))]
> schema = StructType([
>   StructField('INT', IntegerType()),
>   StructField('STR', CharType(3)),
>   StructField('DEC', DecimalType(4, 3)),
>   StructField('DAT', DateType())])
> spark = SparkSession.builder.appName('data-types').getOrCreate()
> df = spark.createDataFrame(data, schema)
> df.show()
> {code}
> a {{java.lang.IllegalStateException}} is thrown 
> [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168].
> I'm expecting this to work...






[jira] [Updated] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column

2023-07-10 Thread Kai-Michael Roesner (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai-Michael Roesner updated SPARK-44354:

Description: 
When trying to create a dataframe with a CharType or VarcharType column like so:
{code}
from datetime import date
from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import *

data = [
  (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
  (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
  (3, 'cde', Decimal(2.718), date(2023, 1, 3))]

schema = StructType([
  StructField('INT', IntegerType()),
  StructField('STR', CharType(3)),
  StructField('DEC', DecimalType(4, 3)),
  StructField('DAT', DateType())])

spark = SparkSession.builder.appName('data-types').getOrCreate()
df = spark.createDataFrame(data, schema)
df.show()
{code}
a {{java.lang.IllegalStateException}} is thrown 
[here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168].
I'm expecting this to work...

  was:
When trying to create a dataframe with a CharType or VarcharType column like so:
{code}
from datetime import date
from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import *

data = [
  (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
  (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
  (3, 'cde', Decimal(2.718), date(2023, 1, 3))]

schema = StructType([
  StructField('INT', IntegerType()),
  StructField('STR', CharType(3)),
  StructField('DEC', DecimalType(4, 3)),
  StructField('DAT', DateType())])

spark = SparkSession.builder.appName('data-types').getOrCreate()
df = spark.createDataFrame(data, schema)
df.show()
{code}
a {{java.lang.IllegalStateException}} is thrown 
[here|https://github.com/apache/spark/blame/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]
I'm expecting this to work...


> Cannot create dataframe with CharType/VarcharType column
> 
>
> Key: SPARK-44354
> URL: https://issues.apache.org/jira/browse/SPARK-44354
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Kai-Michael Roesner
>Priority: Major
>
> When trying to create a dataframe with a CharType or VarcharType column like 
> so:
> {code}
> from datetime import date
> from decimal import Decimal
> from pyspark.sql import SparkSession
> from pyspark.sql.types import *
> data = [
>   (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
>   (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
>   (3, 'cde', Decimal(2.718), date(2023, 1, 3))]
> schema = StructType([
>   StructField('INT', IntegerType()),
>   StructField('STR', CharType(3)),
>   StructField('DEC', DecimalType(4, 3)),
>   StructField('DAT', DateType())])
> spark = SparkSession.builder.appName('data-types').getOrCreate()
> df = spark.createDataFrame(data, schema)
> df.show()
> {code}
> a {{java.lang.IllegalStateException}} is thrown 
> [here|https://github.com/apache/spark/blob/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168].
> I'm expecting this to work...






[jira] [Created] (SPARK-44354) Cannot create dataframe with CharType/VarcharType column

2023-07-10 Thread Kai-Michael Roesner (Jira)
Kai-Michael Roesner created SPARK-44354:
---

 Summary: Cannot create dataframe with CharType/VarcharType column
 Key: SPARK-44354
 URL: https://issues.apache.org/jira/browse/SPARK-44354
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Kai-Michael Roesner


When trying to create a dataframe with a CharType or VarcharType column like so:
{code}
from datetime import date
from decimal import Decimal
from pyspark.sql import SparkSession
from pyspark.sql.types import *

data = [
  (1, 'abc', Decimal(3.142), date(2023, 1, 1)),
  (2, 'bcd', Decimal(1.414), date(2023, 1, 2)),
  (3, 'cde', Decimal(2.718), date(2023, 1, 3))]

schema = StructType([
  StructField('INT', IntegerType()),
  StructField('STR', CharType(3)),
  StructField('DEC', DecimalType(4, 3)),
  StructField('DAT', DateType())])

spark = SparkSession.builder.appName('data-types').getOrCreate()
df = spark.createDataFrame(data, schema)
df.show()
{code}
a {{java.lang.IllegalStateException}} is thrown 
[here|https://github.com/apache/spark/blame/85e252e8503534009f4fb5ea005d44c9eda31447/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L168]
I'm expecting this to work...






[jira] [Updated] (SPARK-38476) Use error classes in org.apache.spark.storage

2023-07-10 Thread Bo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Zhang updated SPARK-38476:
-
Summary: Use error classes in org.apache.spark.storage  (was: Use error 
classes in org.apache.spark.shuffle)

> Use error classes in org.apache.spark.storage
> -
>
> Key: SPARK-38476
> URL: https://issues.apache.org/jira/browse/SPARK-38476
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>







[jira] [Updated] (SPARK-38477) Use error classes in org.apache.spark.shuffle

2023-07-10 Thread Bo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Zhang updated SPARK-38477:
-
Summary: Use error classes in org.apache.spark.shuffle  (was: Use error 
classes in org.apache.spark.storage)

> Use error classes in org.apache.spark.shuffle
> -
>
> Key: SPARK-38477
> URL: https://issues.apache.org/jira/browse/SPARK-38477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Assignee: Bo Zhang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Created] (SPARK-44353) Remove toAttributes from StructType

2023-07-10 Thread Jira
Herman van Hövell created SPARK-44353:
-

 Summary: Remove toAttributes from StructType
 Key: SPARK-44353
 URL: https://issues.apache.org/jira/browse/SPARK-44353
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.4.1
Reporter: Herman van Hövell









[jira] [Commented] (SPARK-44352) Move sameType back to DataType

2023-07-10 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741579#comment-17741579
 ] 

Nikita Awasthi commented on SPARK-44352:


User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/41921

> Move sameType back to DataType
> --
>
> Key: SPARK-44352
> URL: https://issues.apache.org/jira/browse/SPARK-44352
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>







[jira] [Created] (SPARK-44352) Move sameType back to DataType

2023-07-10 Thread Jira
Herman van Hövell created SPARK-44352:
-

 Summary: Move sameType back to DataType
 Key: SPARK-44352
 URL: https://issues.apache.org/jira/browse/SPARK-44352
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39375) SPIP: Spark Connect - A client and server interface for Apache Spark

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-39375:
-
Description: 
Please find the full document for discussion here: [Spark Connect 
SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
 Below, we have just referenced the introduction.
h2. What are you trying to do?

While Spark is used extensively, it was designed nearly a decade ago, which, in 
the age of serverless computing and ubiquitous programming language use, poses 
a number of limitations. Most of the limitations stem from the tightly coupled 
Spark driver architecture and the fact that clusters are typically shared across 
users: (1) {*}Lack of built-in remote connectivity{*}: the Spark driver runs 
both the client application and scheduler, which results in a heavyweight 
architecture that requires proximity to the cluster. There is no built-in 
capability to  remotely connect to a Spark cluster in languages other than SQL 
and users therefore rely on external solutions such as the inactive project 
[Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich developer 
experience{*}: The current architecture and APIs do not cater for interactive 
data exploration (as done with Notebooks), or allow for building out rich 
developer experience common in modern code editors. (3) {*}Stability{*}: with 
the current shared driver architecture, users causing critical exceptions (e.g. 
OOM) bring the whole cluster down for all users. (4) {*}Upgradability{*}: the 
current entangling of platform and client APIs (e.g. first and third-party 
dependencies in the classpath) does not allow for seamless upgrades between 
Spark versions (and with that, hinders new feature adoption).

 

We propose to overcome these challenges by building on the DataFrame API and 
the underlying unresolved logical plans. The DataFrame API is widely used and 
makes it very easy to iteratively express complex logic. We will introduce 
{_}Spark Connect{_}, a remote option of the DataFrame API that separates the 
client from the Spark server. With Spark Connect, Spark will become decoupled, 
allowing for built-in remote connectivity: The decoupled client SDK can be used 
to run interactive data exploration and connect to the server for DataFrame 
operations. 

 

Spark Connect will benefit Spark developers in different ways: The decoupled 
architecture will result in improved stability, as clients are separated from 
the driver. From the Spark Connect client perspective, Spark will be (almost) 
versionless, and thus enable seamless upgradability, as server APIs can evolve 
without affecting the client API. The decoupled client-server architecture can 
be leveraged to build close integrations with local developer tooling. Finally, 
separating the client process from the Spark server process will improve 
Spark’s overall security posture by avoiding the tight coupling of the client 
inside the Spark runtime environment.

 

Spark Connect will strengthen Spark’s position as the modern unified engine for 
large-scale data analytics and expand applicability to use cases and developers 
we could not reach with the current setup: Spark will become ubiquitously 
usable as the DataFrame API can be used with (almost) any programming language.
 
| |SPARK-41282|Feature parity: Column API in Spark Connect|REOPENED|Ruifeng Zheng|
| |SPARK-41283|Feature parity: Functions API in Spark Connect|RESOLVED|Ruifeng Zheng|
| |SPARK-41279|Feature parity: DataFrame API in Spark Connect|OPEN|Ruifeng Zheng|
| |SPARK-41281|Feature parity: SparkSession API in Spark Connect|OPEN|Ruifeng Zheng|
| |SPARK-41284|Feature parity: I/O in Spark 

[jira] [Resolved] (SPARK-44271) Move util functions from DataType to ResolveDefaultColumns

2023-07-10 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44271.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move util functions from DataType to ResolveDefaultColumns
> --
>
> Key: SPARK-44271
> URL: https://issues.apache.org/jira/browse/SPARK-44271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-44131) Add call_function and deprecate call_udf for Scala API

2023-07-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-44131.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41687
[https://github.com/apache/spark/pull/41687]

> Add call_function and deprecate call_udf for Scala API
> --
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>
> The Scala API for SQL has a method call_udf used to call user-defined 
> functions.
> In fact, call_udf can also call built-in functions.
> This behavior is confusing for users.
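The naming confusion described above can be illustrated with a small pure-Python analogy (hypothetical names, not Spark's actual implementation): a single call-by-name entry point that resolves both built-in and user-registered functions, which is the role {{call_function}} takes over from {{call_udf}}:

```python
# Toy registry illustrating the idea: one lookup path that covers both
# built-in and user-registered functions, so callers need not know which
# kind a name refers to.
builtins = {"lower": str.lower, "upper": str.upper}
udfs = {}

def register_udf(name, fn):
    udfs[name] = fn

def call_function(name, *args):
    # User-registered functions shadow built-ins; unknown names fail fast.
    fn = udfs.get(name) or builtins.get(name)
    if fn is None:
        raise ValueError(f"undefined function: {name}")
    return fn(*args)

register_udf("shout", lambda s: s.upper() + "!")
```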






[jira] [Assigned] (SPARK-44131) Add call_function and deprecate call_udf for Scala API

2023-07-10 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-44131:
-

Assignee: jiaan.geng

> Add call_function and deprecate call_udf for Scala API
> --
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> The Scala API for SQL has a method call_udf used to call user-defined 
> functions.
> In fact, call_udf can also call built-in functions.
> This behavior is confusing for users.






[jira] [Updated] (SPARK-43628) Enable SparkContext-related tests with Spark Connect

2023-07-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-43628:

Summary: Enable SparkContext-related tests with Spark Connect  (was: Enable 
SparkContext with Spark Connect)

> Enable SparkContext-related tests with Spark Connect
> 
>
> Key: SPARK-43628
> URL: https://issues.apache.org/jira/browse/SPARK-43628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Enable SparkContext with Spark Connect






[jira] [Commented] (SPARK-44349) Add math functions to SparkR

2023-07-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741512#comment-17741512
 ] 

ASF GitHub Bot commented on SPARK-44349:


User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/41914

> Add math functions to SparkR
> 
>
> Key: SPARK-44349
> URL: https://issues.apache.org/jira/browse/SPARK-44349
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Assigned] (SPARK-44267) Upgrade `pandas` to 2.0.3

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44267:


Assignee: BingKun Pan

> Upgrade `pandas` to 2.0.3
> -
>
> Key: SPARK-44267
> URL: https://issues.apache.org/jira/browse/SPARK-44267
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, PySpark
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-44267) Upgrade `pandas` to 2.0.3

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44267.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41812
[https://github.com/apache/spark/pull/41812]

> Upgrade `pandas` to 2.0.3
> -
>
> Key: SPARK-44267
> URL: https://issues.apache.org/jira/browse/SPARK-44267
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, PySpark
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-44337) Any fields set to Any.getDefaultInstance cause exceptions.

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44337:


Assignee: Raghu Angadi

> Any fields set to Any.getDefaultInstance cause exceptions.
> --
>
> Key: SPARK-44337
> URL: https://issues.apache.org/jira/browse/SPARK-44337
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.4.2
>
>
> Protobuf functions added support for converting `Any` fields to JSON strings. 
> It uses Protobuf's built-in `JsonFormat` to convert to JSON.
> JsonFormat fails to handle the fields when they are set to 
> `Any.getDefaultInstance()` in the original message. This fails only while 
> using a descriptor set, but does not fail while using Java classes. Since 
> using descriptor files is the common case, this can be a blocker. 






[jira] [Resolved] (SPARK-44337) Any fields set to Any.getDefaultInstance cause exceptions.

2023-07-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44337.
--
Fix Version/s: 3.5.0
   (was: 3.4.2)
   Resolution: Fixed

Issue resolved by pull request 41897
[https://github.com/apache/spark/pull/41897]

> Any fields set to Any.getDefaultInstance cause exceptions.
> --
>
> Key: SPARK-44337
> URL: https://issues.apache.org/jira/browse/SPARK-44337
> Project: Spark
>  Issue Type: Task
>  Components: Protobuf
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
>Priority: Major
> Fix For: 3.5.0
>
>
> Protobuf functions added support for converting `Any` fields to JSON strings. 
> It uses Protobuf's built-in `JsonFormat` to convert to JSON.
> JsonFormat fails to handle the fields when they are set to 
> `Any.getDefaultInstance()` in the original message. This fails only while 
> using a descriptor set, but does not fail while using Java classes. Since 
> using descriptor files is the common case, this can be a blocker. 






[jira] [Created] (SPARK-44351) Make some syntactic simplification

2023-07-10 Thread Yang Jie (Jira)
Yang Jie created SPARK-44351:


 Summary: Make some syntactic simplification
 Key: SPARK-44351
 URL: https://issues.apache.org/jira/browse/SPARK-44351
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Yang Jie


- Use `exists` instead of `find` and `emptiness check`
- Use `orNull` instead of `getOrElse(null)`
- Use `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on map
- Use `find` instead of `filter` + `headOption`
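For illustration, rough Python analogues of the Scala patterns listed above (the ticket itself targets Scala code; these are just equivalents of the "simplified" forms):

```python
items = [3, 1, 4]
d = {"a": 1}

# `exists` instead of `find` + emptiness check:
assert any(x > 3 for x in items)
# `getOrElse(key, value)` instead of `get(key).getOrElse(value)` on a map:
assert d.get("b", 0) == 0
# `find` instead of `filter` + `headOption`:
assert next(filter(lambda x: x % 2 == 0, items), None) == 4
```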






[jira] [Created] (SPARK-44350) Upgrade sbt to 1.9.2

2023-07-10 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-44350:
---

 Summary: Upgrade sbt to 1.9.2
 Key: SPARK-44350
 URL: https://issues.apache.org/jira/browse/SPARK-44350
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.5.0
Reporter: BingKun Pan









[jira] [Assigned] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]

2023-07-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-44328:


Assignee: jiaan.geng

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> --
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>







[jira] [Resolved] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]

2023-07-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-44328.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41889
[https://github.com/apache/spark/pull/41889]

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> --
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Commented] (SPARK-44348) Reenable Session-based artifact test cases

2023-07-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741467#comment-17741467
 ] 

Hyukjin Kwon commented on SPARK-44348:
--

I am working on this.

> Reenable Session-based artifact test cases
> --
>
> Key: SPARK-44348
> URL: https://issues.apache.org/jira/browse/SPARK-44348
> Project: Spark
>  Issue Type: Task
>  Components: PySpark, Tests
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Several tests in https://github.com/apache/spark/pull/41495 were skipped. 
> They should be investigated and reenabled.


