[jira] [Resolved] (SPARK-46392) In DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering

2023-12-20 Thread jiahong.li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiahong.li resolved SPARK-46392.

Resolution: Abandoned

> In DataSourceStrategy.translateFilterWithMapping, we need to transfer the 
> cast expression to the data source for filtering
> -
>
> Key: SPARK-46392
> URL: https://issues.apache.org/jira/browse/SPARK-46392
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiahong.li
>Priority: Minor
>  Labels: pull-request-available
>
> Consider this situation:
>  We create a partitioned table via a source that extends TableProvider. If 
> we select data from specific partitions and the dataType of the chosen 
> partition value differs from the table's partition type, the partition 
> filter cannot be pushed down.
>  
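> A minimal repro sketch (hypothetical table and provider names; the exact 
> behavior depends on the TableProvider implementation): the partition column 
> is a STRING, but the filter literal is an INT, so the analyzer wraps the 
> column in a Cast and the predicate is no longer a plain 
> attribute-vs-literal filter that translateFilterWithMapping can push down.
> {code:sql}
> -- pt is a STRING partition column backed by a custom TableProvider
> CREATE TABLE events (id INT, pt STRING)
> USING com.example.MyProvider
> PARTITIONED BY (pt);
> 
> -- The INT literal forces Cast(pt AS INT) = 20231220, so pruning is skipped.
> SELECT * FROM events WHERE pt = 20231220;
> 
> -- Matching the partition type introduces no cast, and pruning works.
> SELECT * FROM events WHERE pt = '20231220';
> {code}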



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46473) Reuse `getPartitionedFile` method

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46473:
---
Labels: pull-request-available  (was: )

> Reuse `getPartitionedFile` method
> -
>
> Key: SPARK-46473
> URL: https://issues.apache.org/jira/browse/SPARK-46473
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46473) Reuse `getPartitionedFile` method

2023-12-20 Thread xiaoping.huang (Jira)
xiaoping.huang created SPARK-46473:
--

 Summary: Reuse `getPartitionedFile` method
 Key: SPARK-46473
 URL: https://issues.apache.org/jira/browse/SPARK-46473
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: xiaoping.huang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46472) Refine docstring of `array_prepend/array_append/array_insert`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46472:
---
Labels: pull-request-available  (was: )

> Refine docstring of `array_prepend/array_append/array_insert`
> -
>
> Key: SPARK-46472
> URL: https://issues.apache.org/jira/browse/SPARK-46472
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46472) Refine docstring of `array_prepend/array_append/array_insert`

2023-12-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-46472:


 Summary: Refine docstring of 
`array_prepend/array_append/array_insert`
 Key: SPARK-46472
 URL: https://issues.apache.org/jira/browse/SPARK-46472
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46469) Clean up useless local variables in `InsertIntoHiveTable`

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46469:


Assignee: Yang Jie

> Clean up useless local variables in `InsertIntoHiveTable`
> -
>
> Key: SPARK-46469
> URL: https://issues.apache.org/jira/browse/SPARK-46469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46469) Clean up useless local variables in `InsertIntoHiveTable`

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46469.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44433
[https://github.com/apache/spark/pull/44433]

> Clean up useless local variables in `InsertIntoHiveTable`
> -
>
> Key: SPARK-46469
> URL: https://issues.apache.org/jira/browse/SPARK-46469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46471) Reorganize `OpsOnDiffFramesEnabledTests`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46471:
---
Labels: pull-request-available  (was: )

> Reorganize `OpsOnDiffFramesEnabledTests`
> 
>
> Key: SPARK-46471
> URL: https://issues.apache.org/jira/browse/SPARK-46471
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46471) Reorganize `OpsOnDiffFramesEnabledTests`

2023-12-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46471:
-

 Summary: Reorganize `OpsOnDiffFramesEnabledTests`
 Key: SPARK-46471
 URL: https://issues.apache.org/jira/browse/SPARK-46471
 Project: Spark
  Issue Type: Sub-task
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46470) Move `test_series_datetime` to `pyspark.pandas.tests.connect.series.*`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46470:
---
Labels: pull-request-available  (was: )

> Move `test_series_datetime` to `pyspark.pandas.tests.connect.series.*`
> --
>
> Key: SPARK-46470
> URL: https://issues.apache.org/jira/browse/SPARK-46470
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46469) Clean up useless local variables in `InsertIntoHiveTable`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46469:
---
Labels: pull-request-available  (was: )

> Clean up useless local variables in `InsertIntoHiveTable`
> -
>
> Key: SPARK-46469
> URL: https://issues.apache.org/jira/browse/SPARK-46469
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46469) Clean up useless local variables in `InsertIntoHiveTable`

2023-12-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-46469:


 Summary: Clean up useless local variables in `InsertIntoHiveTable`
 Key: SPARK-46469
 URL: https://issues.apache.org/jira/browse/SPARK-46469
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46207) Support MergeInto in DataFrameWriterV2

2023-12-20 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng reassigned SPARK-46207:
--

Assignee: Huaxin Gao

> Support MergeInto in DataFrameWriterV2
> --
>
> Key: SPARK-46207
> URL: https://issues.apache.org/jira/browse/SPARK-46207
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46207) Support MergeInto in DataFrameWriterV2

2023-12-20 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-46207.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44119
[https://github.com/apache/spark/pull/44119]

> Support MergeInto in DataFrameWriterV2
> --
>
> Key: SPARK-46207
> URL: https://issues.apache.org/jira/browse/SPARK-46207
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46392) In DataSourceStrategy.translateFilterWithMapping, we need to transfer the cast expression to the data source for filtering

2023-12-20 Thread jiahong.li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799227#comment-17799227
 ] 

jiahong.li commented on SPARK-46392:


pr: https://github.com/apache/spark/pull/44431

> In DataSourceStrategy.translateFilterWithMapping, we need to transfer the 
> cast expression to the data source for filtering
> -
>
> Key: SPARK-46392
> URL: https://issues.apache.org/jira/browse/SPARK-46392
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiahong.li
>Priority: Minor
>  Labels: pull-request-available
>
> Consider this situation:
>  We create a partitioned table via a source that extends TableProvider. If 
> we select data from specific partitions and the dataType of the chosen 
> partition value differs from the table's partition type, the partition 
> filter cannot be pushed down.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46468) COUNT bug in lateral/exists subqueries

2023-12-20 Thread Andrey Gubichev (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Gubichev updated SPARK-46468:

Description: 
Some further instances of a COUNT bug.

 

One example is this test from join-lateral.sql

[https://github.com/apache/spark/blame/master/sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out#L757]

 

According to PostgreSQL, the query should return 2 rows:

 c1 | c2 | sum
----+----+-----
  0 |  1 |   2
  1 |  2 |    NULL

 

whereas Spark SQL only returns the first one.

 

A similar instance is the following query, which should return 1 row from t1 but 
has an empty result now:

{{create temporary view t1(c1, c2) as values (0, 1), (1, 2);}}
{{create temporary view t2(c1, c2) as values (0, 2), (0, 3);}}

{{SELECT tt1.c2}}
{{FROM t1 as tt1}}
{{WHERE tt1.c1 in (}}
{{  select max(tt2.c1)}}
{{  from t2 as tt2}}
{{  where tt1.c2 is null);}}

  was:
Some further instances of a COUNT bug.

 

One example is this test from join-lateral.sql

[https://github.com/apache/spark/blame/master/sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out#L757]

 

According to PostgreSQL, the query should return 2 rows:

 c1 | c2 | sum
----+----+-----
  0 |  1 |   2
  1 |  2 |    NULL

 

whereas Spark SQL only returns the first one.

 

A similar instance is the following query, which should return 1 row from t1 but 
has an empty result now:

{{create temporary view t1(c1, c2) as values (0, 1), (1, 2);}}
{{create temporary view t2(c1, c2) as values (0, 2), (0, 3);}}


{{SELECT tt1.c2}}
{{FROM t1 as tt1}}
{{WHERE tt1.c1 in (}}
{{  select max(tt2.c1)}}
{{  from t2 as tt2}}
{{  where tt1.c2 is null);}}



> COUNT bug in lateral/exists subqueries
> --
>
> Key: SPARK-46468
> URL: https://issues.apache.org/jira/browse/SPARK-46468
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Andrey Gubichev
>Priority: Major
>
> Some further instances of a COUNT bug.
>  
> One example is this test from join-lateral.sql
> [https://github.com/apache/spark/blame/master/sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out#L757]
>  
> According to PostgreSQL, the query should return 2 rows:
>  c1 | c2 | sum
> ----+----+-----
>   0 |  1 |   2
>   1 |  2 |    NULL
>  
> whereas Spark SQL only returns the first one.
>  
> A similar instance is the following query, which should return 1 row from t1 
> but has an empty result now:
> {{create temporary view t1(c1, c2) as values (0, 1), (1, 2);}}
> {{create temporary view t2(c1, c2) as values (0, 2), (0, 3);}}
> {{SELECT tt1.c2}}
> {{FROM t1 as tt1}}
> {{WHERE tt1.c1 in (}}
> {{  select max(tt2.c1)}}
> {{  from t2 as tt2}}
> {{  where tt1.c2 is null);}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46468) COUNT bug in lateral/exists subqueries

2023-12-20 Thread Andrey Gubichev (Jira)
Andrey Gubichev created SPARK-46468:
---

 Summary: COUNT bug in lateral/exists subqueries
 Key: SPARK-46468
 URL: https://issues.apache.org/jira/browse/SPARK-46468
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Andrey Gubichev


Some further instances of a COUNT bug.

 

One example is this test from join-lateral.sql

[https://github.com/apache/spark/blame/master/sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out#L757]

 

According to PostgreSQL, the query should return 2 rows:

 c1 | c2 | sum
----+----+-----
  0 |  1 |   2
  1 |  2 |    NULL

 

whereas Spark SQL only returns the first one.

 

A similar instance is the following query, which should return 1 row from t1 but 
has an empty result now:

{{create temporary view t1(c1, c2) as values (0, 1), (1, 2);}}
{{create temporary view t2(c1, c2) as values (0, 2), (0, 3);}}


{{SELECT tt1.c2}}
{{FROM t1 as tt1}}
{{WHERE tt1.c1 in (}}
{{  select max(tt2.c1)}}
{{  from t2 as tt2}}
{{  where tt1.c2 is null);}}

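For context, a minimal sketch of the classic COUNT-bug shape (an illustrative 
query, not taken from this ticket): a correlated scalar aggregate over an 
empty group must surface as a value rather than as a missing row, so a naive 
rewrite into a join changes the result.

{code:sql}
-- t1 rows with no match in t2 must still appear in the output: the
-- correlated COUNT(*) has to evaluate to 0 for them, and a plain inner-join
-- rewrite would silently drop those rows instead.
SELECT t1.c1
FROM t1
WHERE (SELECT COUNT(*) FROM t2 WHERE t2.c1 = t1.c1) = 0;
{code}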



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46465) Implement Column.isNaN

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46465:


Assignee: Ruifeng Zheng

> Implement Column.isNaN
> --
>
> Key: SPARK-46465
> URL: https://issues.apache.org/jira/browse/SPARK-46465
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46465) Implement Column.isNaN

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46465.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44422
[https://github.com/apache/spark/pull/44422]

> Implement Column.isNaN
> --
>
> Key: SPARK-46465
> URL: https://issues.apache.org/jira/browse/SPARK-46465
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46462:


Assignee: Ruifeng Zheng

> Reorganize `OpsOnDiffFramesGroupByRollingTests`
> ---
>
> Key: SPARK-46462
> URL: https://issues.apache.org/jira/browse/SPARK-46462
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46462.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44420
[https://github.com/apache/spark/pull/44420]

> Reorganize `OpsOnDiffFramesGroupByRollingTests`
> ---
>
> Key: SPARK-46462
> URL: https://issues.apache.org/jira/browse/SPARK-46462
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46463.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44421
[https://github.com/apache/spark/pull/44421]

> Reorganize `OpsOnDiffFramesGroupByExpandingTests`
> -
>
> Key: SPARK-46463
> URL: https://issues.apache.org/jira/browse/SPARK-46463
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46463:


Assignee: Ruifeng Zheng

> Reorganize `OpsOnDiffFramesGroupByExpandingTests`
> -
>
> Key: SPARK-46463
> URL: https://issues.apache.org/jira/browse/SPARK-46463
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799207#comment-17799207
 ] 

Dongjoon Hyun commented on SPARK-46466:
---

According to the PR content, I added the `correctness` label and changed this 
to a blocker for Apache Spark 3.5.1 and 3.4.3.

> vectorized parquet reader should never do rebase for timestamp ntz
> --
>
> Key: SPARK-46466
> URL: https://issues.apache.org/jira/browse/SPARK-46466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Blocker
>  Labels: correctness, pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46466:
--
Target Version/s: 3.5.1, 3.4.3

> vectorized parquet reader should never do rebase for timestamp ntz
> --
>
> Key: SPARK-46466
> URL: https://issues.apache.org/jira/browse/SPARK-46466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Blocker
>  Labels: correctness, pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46466:
--
Labels: correctness pull-request-available  (was: pull-request-available)

> vectorized parquet reader should never do rebase for timestamp ntz
> --
>
> Key: SPARK-46466
> URL: https://issues.apache.org/jira/browse/SPARK-46466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: correctness, pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46466:
--
Priority: Blocker  (was: Major)

> vectorized parquet reader should never do rebase for timestamp ntz
> --
>
> Key: SPARK-46466
> URL: https://issues.apache.org/jira/browse/SPARK-46466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Blocker
>  Labels: correctness, pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46413) Validate returnType of Arrow Python UDF

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46413.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44362
[https://github.com/apache/spark/pull/44362]

> Validate returnType of Arrow Python UDF
> ---
>
> Key: SPARK-46413
> URL: https://issues.apache.org/jira/browse/SPARK-46413
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Validate returnType of Arrow Python UDF



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46398.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44339
[https://github.com/apache/spark/pull/44339]

> Test rangeBetween window function (pyspark.sql.window)
> --
>
> Key: SPARK-46398
> URL: https://issues.apache.org/jira/browse/SPARK-46398
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46398) Test rangeBetween window function (pyspark.sql.window)

2023-12-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46398:


Assignee: Xinrong Meng

> Test rangeBetween window function (pyspark.sql.window)
> --
>
> Key: SPARK-46398
> URL: https://issues.apache.org/jira/browse/SPARK-46398
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46456) Shutdown hook timeouts during ui stop

2023-12-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-46456.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44413
[https://github.com/apache/spark/pull/44413]

> Shutdown hook timeouts during ui stop
> -
>
> Key: SPARK-46456
> URL: https://issues.apache.org/jira/browse/SPARK-46456
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46450) session_window doesn't identify sessions with provided gap when used as a window function

2023-12-20 Thread Juan Pumarino (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799140#comment-17799140
 ] 

Juan Pumarino commented on SPARK-46450:
---

[~kabhwan] thanks for the explanation; I learned a bit more about how Spark 
internals work.

> session_window doesn't identify sessions with provided gap when used as a 
> window function
> -
>
> Key: SPARK-46450
> URL: https://issues.apache.org/jira/browse/SPARK-46450
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Juan Pumarino
>Priority: Minor
>
> {{PARTITION BY session_window}} doesn't produce the expected results. Here's 
> an example:
> {code:sql}
> SELECT 
>   id,
>   ts,
>   collect_list(id) OVER (PARTITION BY session_window(ts, '1 hour')) as 
> window_ids
> FROM VALUES
>   (1, "2023-12-11 01:10"),
>   (2, "2023-12-11 01:15"),
>   (3, "2023-12-11 01:40"),
>   (4, "2023-12-11 02:05"),
>   (5, "2023-12-11 03:15"),
>   (6, "2023-12-11 03:20"),
>   (7, "2023-12-11 04:10"),
>   (8, "2023-12-11 05:05")
>   AS tab(id, ts)
> {code}
> Actual result:
> {code:java}
> +---+----------------+----------+
> |id |ts              |window_ids|
> +---+----------------+----------+
> |1  |2023-12-11 01:10|[1]       |
> |2  |2023-12-11 01:15|[2]       |
> |3  |2023-12-11 01:40|[3]       |
> |4  |2023-12-11 02:05|[4]       |
> |5  |2023-12-11 03:15|[5]       |
> |6  |2023-12-11 03:20|[6]       |
> |7  |2023-12-11 04:10|[7]       |
> |8  |2023-12-11 05:05|[8]       |
> +---+----------------+----------+
> {code}
> Expected result, assigning rows to two sessions with a 1-hour gap:
> {code:java}
> +---+----------------+------------+
> |id |ts              |window_ids  |
> +---+----------------+------------+
> |1  |2023-12-11 01:10|[1, 2, 3, 4]|
> |2  |2023-12-11 01:15|[1, 2, 3, 4]|
> |3  |2023-12-11 01:40|[1, 2, 3, 4]|
> |4  |2023-12-11 02:05|[1, 2, 3, 4]|
> |5  |2023-12-11 03:15|[5, 6, 7, 8]|
> |6  |2023-12-11 03:20|[5, 6, 7, 8]|
> |7  |2023-12-11 04:10|[5, 6, 7, 8]|
> |8  |2023-12-11 05:05|[5, 6, 7, 8]|
> +---+----------------+------------+
> {code}
> I compared this with its behavior as a grouping function, and with how 
> {{window()}} behaves in both cases, which seems to confirm that the result is 
> inconsistent. Here are the other examples:
> *{{group by window()}}*
> {code:sql}
> SELECT 
>   collect_list(id) AS ids,
>   collect_list(ts) AS tss,
>   window
> FROM VALUES
>   (1, "2023-12-11 01:10"),
>   (2, "2023-12-11 01:15"),
>   (3, "2023-12-11 01:40"),
>   (4, "2023-12-11 02:05"),
>   (5, "2023-12-11 03:15"),
>   (6, "2023-12-11 03:20"),
>   (7, "2023-12-11 04:10"),
>   (8, "2023-12-11 05:05")
>   AS tab(id, ts)
> GROUP by window(ts, '1 hour')
> {code}
> Correctly assigns rows to 1-hour windows:
> {code:java}
> +---------+------------------------------------------------------+------------------------------------------+
> |ids      |tss                                                   |window                                    |
> +---------+------------------------------------------------------+------------------------------------------+
> |[1, 2, 3]|[2023-12-11 01:10, 2023-12-11 01:15, 2023-12-11 01:40]|{2023-12-11 01:00:00, 2023-12-11 02:00:00}|
> |[4]      |[2023-12-11 02:05]                                    |{2023-12-11 02:00:00, 2023-12-11 03:00:00}|
> |[5, 6]   |[2023-12-11 03:15, 2023-12-11 03:20]                  |{2023-12-11 03:00:00, 2023-12-11 04:00:00}|
> |[7]      |[2023-12-11 04:10]                                    |{2023-12-11 04:00:00, 2023-12-11 05:00:00}|
> |[8]      |[2023-12-11 05:05]                                    |{2023-12-11 05:00:00, 2023-12-11 06:00:00}|
> +---------+------------------------------------------------------+------------------------------------------+
> {code}
>  
> *{{group by session_window()}}*
> {code:sql}
> SELECT 
>   collect_list(id) AS ids,
>   collect_list(ts) AS tss,
>   session_window
> FROM VALUES
>   (1, "2023-12-11 01:10"),
>   (2, "2023-12-11 01:15"),
>   (3, "2023-12-11 01:40"),
>   (4, "2023-12-11 02:05"),
>   (5, "2023-12-11 03:15"),
>   (6, "2023-12-11 03:20"),
>   (7, "2023-12-11 04:10"),
>   (8, "2023-12-11 05:05")
>   AS tab(id, ts)
> GROUP by session_window(ts, '1 hour')
> {code}
> Correctly assigns rows to two sessions with a 1-hour gap:
> {code:java}
> +------------+------------------------------------------------------------------------+------------------------------------------+
> |ids         |tss                                                                     |session_window                            |
> +------------+------------------------------------------------------------------------+------------------------------------------+
> |[1, 2, 3, 

[jira] [Created] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2023-12-20 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-46467:


 Summary: Improve and test exceptions of TimedeltaIndex
 Key: SPARK-46467
 URL: https://issues.apache.org/jira/browse/SPARK-46467
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Xinrong Meng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46467) Improve and test exceptions of TimedeltaIndex

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46467:
---
Labels: pull-request-available  (was: )

> Improve and test exceptions of TimedeltaIndex
> -
>
> Key: SPARK-46467
> URL: https://issues.apache.org/jira/browse/SPARK-46467
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46447) Remove the legacy datetime rebasing SQL configs

2023-12-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-46447.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44402
[https://github.com/apache/spark/pull/44402]

> Remove the legacy datetime rebasing SQL configs
> ---
>
> Key: SPARK-46447
> URL: https://issues.apache.org/jira/browse/SPARK-46447
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Remove the already-deprecated SQL configs (alternatives to other configs):
> - spark.sql.legacy.parquet.int96RebaseModeInWrite
> - spark.sql.legacy.parquet.datetimeRebaseModeInWrite
> - spark.sql.legacy.parquet.int96RebaseModeInRead
> - spark.sql.legacy.parquet.datetimeRebaseModeInRead
> - spark.sql.legacy.avro.datetimeRebaseModeInWrite
> - spark.sql.legacy.avro.datetimeRebaseModeInRead
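> For reference, a sketch of the non-legacy counterparts that remain, assuming 
> the standard SQLConf key names (check SQLConf for the authoritative list):
> {code:java}
> // non-legacy equivalents of the removed spark.sql.legacy.* keys
> spark.sql.parquet.int96RebaseModeInWrite
> spark.sql.parquet.datetimeRebaseModeInWrite
> spark.sql.parquet.int96RebaseModeInRead
> spark.sql.parquet.datetimeRebaseModeInRead
> spark.sql.avro.datetimeRebaseModeInWrite
> spark.sql.avro.datetimeRebaseModeInRead
> {code}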



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46466:
---
Labels: pull-request-available  (was: )

> vectorized parquet reader should never do rebase for timestamp ntz
> --
>
> Key: SPARK-46466
> URL: https://issues.apache.org/jira/browse/SPARK-46466
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46466) vectorized parquet reader should never do rebase for timestamp ntz

2023-12-20 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-46466:
---

 Summary: vectorized parquet reader should never do rebase for 
timestamp ntz
 Key: SPARK-46466
 URL: https://issues.apache.org/jira/browse/SPARK-46466
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46461) The `sbt console` command is not available

2023-12-20 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17798975#comment-17798975
 ] 

Yang Jie commented on SPARK-46461:
--

cc [~srowen] [~dongjoon] [~gurwls223] 

I'm not certain whether the `sbt console` command needs to be fixed, but I've 
found configurations related to the `console` command in `SparkBuild.scala` 
that appear to have been unusable for quite some time.

 

[https://github.com/apache/spark/blob/c9cfaac90fd423c3a38e295234e24744b946cb02/project/SparkBuild.scala#L1126-L1148]

 
{code:java}
object SQL {
  lazy val settings = Seq(
    (console / initialCommands) :=
      """
        |import org.apache.spark.SparkContext
        |import org.apache.spark.sql.SQLContext
        |import org.apache.spark.sql.catalyst.analysis._
        |import org.apache.spark.sql.catalyst.dsl._
        |import org.apache.spark.sql.catalyst.errors._
        |import org.apache.spark.sql.catalyst.expressions._
        |import org.apache.spark.sql.catalyst.plans.logical._
        |import org.apache.spark.sql.catalyst.rules._
        |import org.apache.spark.sql.catalyst.util._
        |import org.apache.spark.sql.execution
        |import org.apache.spark.sql.functions._
        |import org.apache.spark.sql.types._
        |
        |val sc = new SparkContext("local[*]", "dev-shell")
        |val sqlContext = new SQLContext(sc)
        |import sqlContext.implicits._
        |import sqlContext._
      """.stripMargin,
    (console / cleanupCommands) := "sc.stop()"
  )
} {code}
 

[https://github.com/apache/spark/blob/c9cfaac90fd423c3a38e295234e24744b946cb02/project/SparkBuild.scala#L1164-L1180]

 
{code:java}
    (console / initialCommands) :=
      """
        |import org.apache.spark.SparkContext
        |import org.apache.spark.sql.catalyst.analysis._
        |import org.apache.spark.sql.catalyst.dsl._
        |import org.apache.spark.sql.catalyst.errors._
        |import org.apache.spark.sql.catalyst.expressions._
        |import org.apache.spark.sql.catalyst.plans.logical._
        |import org.apache.spark.sql.catalyst.rules._
        |import org.apache.spark.sql.catalyst.util._
        |import org.apache.spark.sql.execution
        |import org.apache.spark.sql.functions._
        |import org.apache.spark.sql.hive._
        |import org.apache.spark.sql.hive.test.TestHive._
        |import org.apache.spark.sql.hive.test.TestHive.implicits._
        |import org.apache.spark.sql.types._""".stripMargin,
    (console / cleanupCommands) := "sparkContext.stop()", {code}

> The `sbt console` command is not available
> --
>
> Key: SPARK-46461
> URL: https://issues.apache.org/jira/browse/SPARK-46461
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> # Unable to define expressions after executing the `build/sbt console` command
> {code:java}
> scala> val i = 1 // show
> package $line3 {
>   sealed class $read extends _root_.scala.Serializable {
>     def <init>() = {
>       super.<init>;
>       ()
>     };
>     sealed class $iw extends _root_.java.io.Serializable {
>       def <init>() = {
>         super.<init>;
>         ()
>       };
>       val i = 1
>     };
>     val $iw = new $iw.<init>
>   };
>   object $read extends scala.AnyRef {
>     def <init>() = {
>       super.<init>;
>       ()
>     };
>     val INSTANCE = new $read.<init>
>   }
> }
> warning: -target is deprecated: Use -release instead to compile against the 
> correct platform API.
> Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation
>        ^
>        error: expected class or object definition {code}
> 2.  Due to the default unused imports check, the error "unused imports" will 
> be reported after executing the `build/sbt sql/console` command
> {code:java}
> Welcome to Scala 2.13.12 (OpenJDK 64-Bit Server VM, Java 17.0.9).
> Type in expressions for evaluation. Or try :help.
> warning: -target is deprecated: Use -release instead to compile against the 
> correct platform API.
> Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation
>        import org.apache.spark.sql.catalyst.errors._
>                                             ^
> On line 6: error: object errors is not a member of package 
> org.apache.spark.sql.catalyst
>        import org.apache.spark.sql.catalyst.analysis._
>                                                      ^
> On line 4: error: Unused import
>        Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=
>        import org.apache.spark.sql.catalyst.dsl._
>                                                 ^
> On line 5: error: Unused import
>        Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-imports, site=
>        

[jira] [Resolved] (SPARK-46452) Add a new API in DSv2 DataWriter to write an iterator of records

2023-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46452.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44410
[https://github.com/apache/spark/pull/44410]

> Add a new API in DSv2 DataWriter to write an iterator of records
> 
>
> Key: SPARK-46452
> URL: https://issues.apache.org/jira/browse/SPARK-46452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add a new API that takes an iterator of records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46452) Add a new API in DSv2 DataWriter to write an iterator of records

2023-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46452:
---

Assignee: Allison Wang

> Add a new API in DSv2 DataWriter to write an iterator of records
> 
>
> Key: SPARK-46452
> URL: https://issues.apache.org/jira/browse/SPARK-46452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add a new API that takes an iterator of records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46464) Fix the scroll issue of tables when overflow

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46464:
---
Labels: pull-request-available  (was: )

> Fix the scroll issue of tables when overflow
> 
>
> Key: SPARK-46464
> URL: https://issues.apache.org/jira/browse/SPARK-46464
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46465) Implement Column.isNaN

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46465:
---
Labels: pull-request-available  (was: )

> Implement Column.isNaN
> --
>
> Key: SPARK-46465
> URL: https://issues.apache.org/jira/browse/SPARK-46465
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46464) Fix the scroll issue of tables when overflow

2023-12-20 Thread Kent Yao (Jira)
Kent Yao created SPARK-46464:


 Summary: Fix the scroll issue of tables when overflow
 Key: SPARK-46464
 URL: https://issues.apache.org/jira/browse/SPARK-46464
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.5.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46463:
---
Labels: pull-request-available  (was: )

> Reorganize `OpsOnDiffFramesGroupByExpandingTests`
> -
>
> Key: SPARK-46463
> URL: https://issues.apache.org/jira/browse/SPARK-46463
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46463) Reorganize `OpsOnDiffFramesGroupByExpandingTests`

2023-12-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46463:
-

 Summary: Reorganize `OpsOnDiffFramesGroupByExpandingTests`
 Key: SPARK-46463
 URL: https://issues.apache.org/jira/browse/SPARK-46463
 Project: Spark
  Issue Type: Sub-task
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46052) Remove unnecessary TaskScheduler.killAllTaskAttempts

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46052:
--

Assignee: Apache Spark

> Remove unnecessary TaskScheduler.killAllTaskAttempts
> 
>
> Key: SPARK-46052
> URL: https://issues.apache.org/jira/browse/SPARK-46052
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Spark has two functions to kill all tasks in a stage:
> * `cancelTasks`: not only kills all the running tasks in all the stage 
> attempts but also aborts all the stage attempts
> * `killAllTaskAttempts`: only kills all the running tasks in all the stage 
> attempts but won't abort the attempts.
> However, there's no use case in Spark where a stage would launch new tasks 
> after all its tasks get killed. So I think we can replace 
> `killAllTaskAttempts` with `cancelTasks` directly.
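> As a rough sketch of the difference (an illustrative trait, not Spark's 
> actual TaskScheduler interface):
> {code:java}
> // Illustrative model of the two semantics described above.
> trait StageControl {
>   def killRunningTasks(stageId: Int, reason: String): Unit
>   def abortStage(stageId: Int, reason: String): Unit
> 
>   // killAllTaskAttempts semantics: running tasks die, but the stage
>   // attempts stay alive and could in principle launch new tasks.
>   def killAllTaskAttempts(stageId: Int, reason: String): Unit =
>     killRunningTasks(stageId, reason)
> 
>   // cancelTasks semantics: running tasks die AND every stage attempt is
>   // aborted. Since no caller relies on the weaker behavior, the method
>   // above can be folded into this one.
>   def cancelTasks(stageId: Int, reason: String): Unit = {
>     killRunningTasks(stageId, reason)
>     abortStage(stageId, reason)
>   }
> }
> {code}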



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46052) Remove unnecessary TaskScheduler.killAllTaskAttempts

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46052:
--

Assignee: (was: Apache Spark)

> Remove unnecessary TaskScheduler.killAllTaskAttempts
> 
>
> Key: SPARK-46052
> URL: https://issues.apache.org/jira/browse/SPARK-46052
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.3, 3.4.1, 3.5.0
>Reporter: wuyi
>Priority: Major
>  Labels: pull-request-available
>
> Spark has two functions to kill all tasks in a stage:
> * `cancelTasks`: not only kills all the running tasks in all the stage 
> attempts but also aborts all the stage attempts
> * `killAllTaskAttempts`: only kills all the running tasks in all the stage 
> attempts but won't abort the attempts.
> However, there's no use case in Spark where a stage would launch new tasks 
> after all its tasks get killed. So I think we can replace 
> `killAllTaskAttempts` with `cancelTasks` directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46330.
--
Fix Version/s: 3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44260
[https://github.com/apache/spark/pull/44260]

> Loading of Spark UI blocks for a long time when HybridStore enabled
> ---
>
> Key: SPARK-46330
> URL: https://issues.apache.org/jira/browse/SPARK-46330
> Project: Spark
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.1.2, 3.3.1
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>
>
> In our SparkHistoryServer, we used these two properties to speed up the 
> Spark UI's loading:
> {code:java}
> spark.history.store.hybridStore.enabled true
> spark.history.store.hybridStore.maxMemoryUsage 16g {code}
> Occasionally, we found that it took minutes to load a small event log that 
> usually took seconds.
> In the jstack output of SparkHistoryServer, we found that 4 threads were 
> blocked and waiting to lock 
> *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was 
> locked by thread "spark-history-task-0" closing a HybridStore.
> {code:java}
> "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 
> tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry 
> [0x7f3f6476]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386)
>     - waiting to lock <0x0004c64433f0> (a 
> org.apache.spark.deploy.history.FsHistoryProvider)
>     at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown
>  Source)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>     - locked <0x00066effc3e8> (a 
> org.sparkproject.guava.cache.LocalCache$StrongAccessEntry)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120)
>     at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251)
>     at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99)
> "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 
> nid=0x1ac6 in Object.wait() [0x7f41b2cc9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1252)
>     - locked <0x00063ccbc9f0> (a java.lang.Thread)
>     at java.lang.Thread.join(Thread.java:1326)
>     at 
> org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106)
>     at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown
>  Source)
>     at scala.Option.foreach(Option.scala:407)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911)
>     - locked <0x0004c64433f0> (a 
> org.apache.spark.deploy.history.FsHistoryProvider)

[jira] [Assigned] (SPARK-46330) Loading of Spark UI blocks for a long time when HybridStore enabled

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46330:


Assignee: Zhou Yifan

> Loading of Spark UI blocks for a long time when HybridStore enabled
> ---
>
> Key: SPARK-46330
> URL: https://issues.apache.org/jira/browse/SPARK-46330
> Project: Spark
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.1.2, 3.3.1
>Reporter: Zhou Yifan
>Assignee: Zhou Yifan
>Priority: Major
>  Labels: pull-request-available
>
> In our SparkHistoryServer, we used these two properties to speed up the 
> Spark UI's loading:
> {code:java}
> spark.history.store.hybridStore.enabled true
> spark.history.store.hybridStore.maxMemoryUsage 16g {code}
> Occasionally, we found that it took minutes to load a small event log that 
> usually took seconds.
> In the jstack output of SparkHistoryServer, we found that 4 threads were 
> blocked and waiting to lock 
> *org.apache.spark.deploy.history.FsHistoryProvider* object monitor, which was 
> locked by thread "spark-history-task-0" closing a HybridStore.
> {code:java}
> "qtp791499503-2688947" #2688947 daemon prio=5 os_prio=0 
> tid=0x7f4044042800 nid=0x8d98 waiting for monitor entry 
> [0x7f3f6476]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:386)
>     - waiting to lock <0x0004c64433f0> (a 
> org.apache.spark.deploy.history.FsHistoryProvider)
>     at 
> org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:194)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:182)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$Lambda$805/90086258.apply(Unknown
>  Source)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:154)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:180)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:71)
>     at 
> org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:58)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>     - locked <0x00066effc3e8> (a 
> org.sparkproject.guava.cache.LocalCache$StrongAccessEntry)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:108)
>     at 
> org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:120)
>     at 
> org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:251)
>     at 
> org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:99)
> "spark-history-task-0" #49 daemon prio=5 os_prio=0 tid=0x7f431e55b800 
> nid=0x1ac6 in Object.wait() [0x7f41b2cc9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Thread.join(Thread.java:1252)
>     - locked <0x00063ccbc9f0> (a java.lang.Thread)
>     at java.lang.Thread.join(Thread.java:1326)
>     at 
> org.apache.spark.deploy.history.HybridStore.close(HybridStore.scala:106)
>     at org.apache.spark.status.AppStatusStore.close(AppStatusStore.scala:553)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1(FsHistoryProvider.scala:913)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$invalidateUI$1$adapted(FsHistoryProvider.scala:911)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider$$Lambda$416/229723341.apply(Unknown
>  Source)
>     at scala.Option.foreach(Option.scala:407)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.invalidateUI(FsHistoryProvider.scala:911)
>     - locked <0x0004c64433f0> (a 
> org.apache.spark.deploy.history.FsHistoryProvider)
>     at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$7(FsHistoryProvider.scala:541)
>     at 
> 
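> {code}
>  
> The contention boils down to a synchronized method that blocks on 
> Thread.join() while holding the provider's monitor. A minimal sketch of the 
> pattern (hypothetical names, not the actual Spark classes):
> {code:scala}
> // invalidateUI() holds the provider's monitor while close() joins the
> // HybridStore writer thread, so every getAppUI() caller blocks until
> // the join completes.
> class Provider {
>   private val store = new Store
> 
>   def getAppUI(): Unit = synchronized {
>     // BLOCKED (on object monitor) while invalidateUI() runs
>   }
> 
>   def invalidateUI(): Unit = synchronized {
>     store.close() // blocks until the writer thread finishes its dump
>   }
> }
> 
> class Store {
>   private val writer = new Thread(() => Thread.sleep(60000)) // long-running dump
>   writer.start()
> 
>   def close(): Unit = writer.join() // shows up as Object.wait() in jstack
> }
> {code}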

[jira] [Updated] (SPARK-46460) The filter of partition including cast function may lead the partition pruning to disable

2023-12-20 Thread Zhou Tong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou Tong updated SPARK-46460:
--
Summary: The filter of partition including cast function may lead the 
partition pruning to disable  (was: The filter of partition includes cast 
function may lead the partition pruning to disable)

> The filter of partition including cast function may lead the partition 
> pruning to disable
> -
>
> Key: SPARK-46460
> URL: https://issues.apache.org/jira/browse/SPARK-46460
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 3.2.0
>Reporter: Zhou Tong
>Priority: Minor
>  Labels: pull-request-available
> Attachments: SPARK-46460.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SQL: select * from test_db.test_table where day between 
> date_sub('2023-12-01',1) and '2023-12-03'
> The physical plan of the SQL above applies a _cast_ to the partition column 
> 'day', like this: {_}cast(day as date) > 2023-11-30{_}. In this situation, 
> Spark only passes the filter condition _day < "2023-12-03"_ to the Hive 
> Metastore, not the condition {_}cast(day as date) > 2023-11-30{_}, which may 
> degrade HMS performance if the Hive table has a huge number of partitions.
>  
> A new rule may solve this problem by converting the binary comparison 
> _cast(day as date) > 2023-11-30_ to {_}day > cast(2023-11-30 as string){_}. 
> The right node is foldable, so it evaluates to {_}day > "2023-11-30"{_}, and 
> the filter condition passed to HMS becomes _day > "2023-11-30" and day < 
> "2023-12-03"_.
>  
>  
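> As a rough sketch (assumed shape and names, not the attached 
> SPARK-46460.patch), such a rule could push the cast onto the literal side so 
> the partition attribute is left bare for HMS pushdown; only the string-typed 
> GreaterThan case is shown, and a real rule must also prove the cast is 
> order-preserving:
> {code:scala}
> import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Cast, GreaterThan, Literal}
> import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
> import org.apache.spark.sql.catalyst.rules.Rule
> import org.apache.spark.sql.types.StringType
> 
> // Hypothetical rule: rewrite cast(day as date) > lit  as  day > cast(lit as string)
> object PushCastToLiteralSide extends Rule[LogicalPlan] {
>   override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
>     case gt @ GreaterThan(c: Cast, lit: Literal) =>
>       c.child match {
>         // leave the partition attribute bare so the comparison can go to HMS
>         case attr: AttributeReference if attr.dataType == StringType =>
>           GreaterThan(attr, Cast(lit, StringType))
>         case _ => gt
>       }
>   }
> }
> {code}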



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-28386:
-
Fix Version/s: 4.0.0

> Cannot resolve ORDER BY columns with GROUP BY and HAVING
> 
>
> Key: SPARK-28386
> URL: https://issues.apache.org/jira/browse/SPARK-28386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> How to reproduce:
> {code:sql}
> CREATE TABLE test_having (a int, b int, c string, d string) USING parquet;
> INSERT INTO test_having VALUES (0, 1, '', 'A');
> INSERT INTO test_having VALUES (1, 2, '', 'b');
> INSERT INTO test_having VALUES (2, 2, '', 'c');
> INSERT INTO test_having VALUES (3, 3, '', 'D');
> INSERT INTO test_having VALUES (4, 3, '', 'e');
> INSERT INTO test_having VALUES (5, 3, '', 'F');
> INSERT INTO test_having VALUES (6, 4, '', 'g');
> INSERT INTO test_having VALUES (7, 4, '', 'h');
> INSERT INTO test_having VALUES (8, 4, '', 'I');
> INSERT INTO test_having VALUES (9, 4, '', 'j');
> SELECT lower(c), count(c) FROM test_having
>   GROUP BY lower(c) HAVING count(*) > 2
>   ORDER BY lower(c);
> {code}
> {noformat}
> spark-sql> SELECT lower(c), count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY lower(c);
> Error in query: cannot resolve '`c`' given input columns: [lower(c), 
> count(c)]; line 3 pos 19;
> 'Sort ['lower('c) ASC NULLS FIRST], true
> +- Project [lower(c)#158, count(c)#159L]
>+- Filter (count(1)#161L > cast(2 as bigint))
>   +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS 
> count(c)#159L, count(1) AS count(1)#161L]
>  +- SubqueryAlias test_having
> +- Relation[a#5,b#6,c#7,d#8] parquet
> {noformat}
> But it works when the expression is given an alias, since ORDER BY can then 
> resolve the output column by name instead of re-resolving c, which the 
> aggregate no longer outputs:
> {noformat}
> spark-sql> SELECT lower(c) withAias, count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY withAias;
> 3
>   4
> {noformat}
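> Ordering by output position also sidesteps the resolution problem (a 
> hypothetical alternative workaround, assuming the default 
> spark.sql.orderByOrdinal=true, so the ordinal resolves against the output 
> columns):
> {code:scala}
> // Sketch: ORDER BY 1 refers to the first output column, lower(c),
> // so the analyzer never has to re-resolve `c` itself.
> spark.sql(
>   """SELECT lower(c), count(c) FROM test_having
>     |GROUP BY lower(c) HAVING count(*) > 2
>     |ORDER BY 1""".stripMargin).show()
> {code}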



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING

2023-12-20 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-28386.
--
  Assignee: Cheng Pan
Resolution: Fixed

Issue resolved by https://github.com/apache/spark/pull/44352

> Cannot resolve ORDER BY columns with GROUP BY and HAVING
> 
>
> Key: SPARK-28386
> URL: https://issues.apache.org/jira/browse/SPARK-28386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>
> How to reproduce:
> {code:sql}
> CREATE TABLE test_having (a int, b int, c string, d string) USING parquet;
> INSERT INTO test_having VALUES (0, 1, '', 'A');
> INSERT INTO test_having VALUES (1, 2, '', 'b');
> INSERT INTO test_having VALUES (2, 2, '', 'c');
> INSERT INTO test_having VALUES (3, 3, '', 'D');
> INSERT INTO test_having VALUES (4, 3, '', 'e');
> INSERT INTO test_having VALUES (5, 3, '', 'F');
> INSERT INTO test_having VALUES (6, 4, '', 'g');
> INSERT INTO test_having VALUES (7, 4, '', 'h');
> INSERT INTO test_having VALUES (8, 4, '', 'I');
> INSERT INTO test_having VALUES (9, 4, '', 'j');
> SELECT lower(c), count(c) FROM test_having
>   GROUP BY lower(c) HAVING count(*) > 2
>   ORDER BY lower(c);
> {code}
> {noformat}
> spark-sql> SELECT lower(c), count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY lower(c);
> Error in query: cannot resolve '`c`' given input columns: [lower(c), 
> count(c)]; line 3 pos 19;
> 'Sort ['lower('c) ASC NULLS FIRST], true
> +- Project [lower(c)#158, count(c)#159L]
>+- Filter (count(1)#161L > cast(2 as bigint))
>   +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS 
> count(c)#159L, count(1) AS count(1)#161L]
>  +- SubqueryAlias test_having
> +- Relation[a#5,b#6,c#7,d#8] parquet
> {noformat}
> But it works when the expression is given an alias, since ORDER BY can then 
> resolve the output column by name instead of re-resolving c, which the 
> aggregate no longer outputs:
> {noformat}
> spark-sql> SELECT lower(c) withAias, count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY withAias;
> 3
>   4
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`

2023-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46462:
---
Labels: pull-request-available  (was: )

> Reorganize `OpsOnDiffFramesGroupByRollingTests`
> ---
>
> Key: SPARK-46462
> URL: https://issues.apache.org/jira/browse/SPARK-46462
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-46462) Reorganize `OpsOnDiffFramesGroupByRollingTests`

2023-12-20 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46462:
-

 Summary: Reorganize `OpsOnDiffFramesGroupByRollingTests`
 Key: SPARK-46462
 URL: https://issues.apache.org/jira/browse/SPARK-46462
 Project: Spark
  Issue Type: Sub-task
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener

2023-12-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-46399:
---

Assignee: Reza Safi

> Add exit status to the Application End event for the use of Spark Listener
> --
>
> Key: SPARK-46399
> URL: https://issues.apache.org/jira/browse/SPARK-46399
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Reza Safi
>Assignee: Reza Safi
>Priority: Minor
>  Labels: pull-request-available
>
> Currently SparkListenerApplicationEnd only has a timestamp value, and no exit 
> status is recorded with it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46399) Add exit status to the Application End event for the use of Spark Listener

2023-12-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-46399.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44340
[https://github.com/apache/spark/pull/44340]

> Add exit status to the Application End event for the use of Spark Listener
> --
>
> Key: SPARK-46399
> URL: https://issues.apache.org/jira/browse/SPARK-46399
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Reza Safi
>Assignee: Reza Safi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently SparkListenerApplicationEnd only has a timestamp value, and no exit 
> status is recorded with it.
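> A minimal sketch of a listener consuming the new field; this assumes the 
> field added by this change is exposed as exitCode: Option[Int] on 
> SparkListenerApplicationEnd (check the merged PR for the exact name):
> {code:scala}
> import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}
> 
> // Hypothetical listener: logs the exit status when the application ends.
> // Register it via spark.extraListeners or SparkContext.addSparkListener.
> class ExitStatusListener extends SparkListener {
>   override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
>     val status = end.exitCode.getOrElse(-1) // assumed Option[Int] field
>     println(s"Application ended at ${end.time} with exit code $status")
>   }
> }
> {code}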



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46272) Support CTAS using DSv2 sources

2023-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-46272.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44190
[https://github.com/apache/spark/pull/44190]

> Support CTAS using DSv2 sources
> ---
>
> Key: SPARK-46272
> URL: https://issues.apache.org/jira/browse/SPARK-46272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46272) Support CTAS using DSv2 sources

2023-12-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-46272:
---

Assignee: Allison Wang

> Support CTAS using DSv2 sources
> ---
>
> Key: SPARK-46272
> URL: https://issues.apache.org/jira/browse/SPARK-46272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org