[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842200#comment-17842200
 ] 

Dongjoon Hyun commented on SPARK-48016:
---

Thank you so much!

> Fix a bug in try_divide function when with decimals
> ---
>
> Key: SPARK-48016
> URL: https://issues.apache.org/jira/browse/SPARK-48016
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Binary Arithmetic operators should include the evalMode during makeCopy. 
> Otherwise, the following query throws a DIVIDE_BY_ZERO error instead of 
> returning null:
>  
> {code:java}
> SELECT try_divide(1, decimal(0)); {code}
> This is caused by the rule DecimalPrecision:
> {code:java}
> case b @ BinaryOperator(left, right) if left.dataType != right.dataType =>
>   (left, right) match {
>     ...
>     case (l: Literal, r) if r.dataType.isInstanceOf[DecimalType] &&
>         l.dataType.isInstanceOf[IntegralType] &&
>         literalPickMinimumPrecision =>
>       b.makeCopy(Array(Cast(l, DataTypeUtils.fromLiteral(l)), r)) {code}
>  
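The failure mode can be sketched outside Spark. Below is a minimal toy model (all names are hypothetical stand-ins, not Spark's actual classes) of an operator whose copy method rebuilds the node from its children only, silently dropping the evaluation mode:

```python
from dataclasses import dataclass

ANSI, TRY = "ANSI", "TRY"

@dataclass
class Divide:
    left: int
    right: int
    eval_mode: str = ANSI

    def make_copy(self, new_children):
        # Buggy makeCopy: rebuilds the node from its children only,
        # silently resetting eval_mode back to the ANSI default.
        return Divide(*new_children)

    def eval(self):
        if self.right == 0:
            if self.eval_mode == TRY:
                return None  # try_divide semantics: null on divide-by-zero
            raise ZeroDivisionError("DIVIDE_BY_ZERO")
        return self.left / self.right

original = Divide(1, 0, eval_mode=TRY)
print(original.eval())  # None: TRY mode tolerates divide-by-zero

# A rewrite rule (analogous to DecimalPrecision casting a literal child)
# copies the node -- and the copy comes back in ANSI mode, so eval() raises.
copied = original.make_copy((original.left, original.right))
print(copied.eval_mode)  # ANSI
```

Including `eval_mode` in the copied arguments (as the fix does for evalMode in makeCopy) makes the copy behave like the original.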



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842191#comment-17842191
 ] 

Dongjoon Hyun commented on SPARK-48016:
---

Hi, [~Gengliang.Wang].
- I updated the JIRA title to match the commit title.
- The umbrella Jira issue was completed in Apache Spark 3.4.0. To give this more 
visibility, shall we move it under SPARK-44111, since the recent ANSI JIRA issues 
are there?







[jira] [Updated] (SPARK-48016) Fix a bug in try_divide function when with decimals

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48016:
--
Summary: Fix a bug in try_divide function when with decimals  (was: Binary 
Arithmetic operators should include the evalMode when makeCopy)







[jira] [Resolved] (SPARK-48042) Don't use a copy of timestamp formatter with a new override zone for each value

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48042.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46282
[https://github.com/apache/spark/pull/46282]

> Don't use a copy of timestamp formatter with a new override zone for each 
> value
> ---
>
> Key: SPARK-48042
> URL: https://issues.apache.org/jira/browse/SPARK-48042
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48044) Cache `DataFrame.isStreaming`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48044.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46281
[https://github.com/apache/spark/pull/46281]

> Cache `DataFrame.isStreaming`
> -
>
> Key: SPARK-48044
> URL: https://issues.apache.org/jira/browse/SPARK-48044
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48046.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46284
[https://github.com/apache/spark/pull/46284]

> Remove `clock` parameter from `DriverServiceFeatureStep`
> 
>
> Key: SPARK-48046
> URL: https://issues.apache.org/jira/browse/SPARK-48046
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48046:
-

Assignee: Dongjoon Hyun

> Remove `clock` parameter from `DriverServiceFeatureStep`
> 
>
> Key: SPARK-48046
> URL: https://issues.apache.org/jira/browse/SPARK-48046
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48046) Remove `clock` parameter from `DriverServiceFeatureStep`

2024-04-29 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48046:
-

 Summary: Remove `clock` parameter from `DriverServiceFeatureStep`
 Key: SPARK-48046
 URL: https://issues.apache.org/jira/browse/SPARK-48046
 Project: Spark
  Issue Type: Task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48038:
-

Assignee: Cheng Pan

> Promote driverServiceName to KubernetesDriverConf
> -
>
> Key: SPARK-48038
> URL: https://issues.apache.org/jira/browse/SPARK-48038
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48038) Promote driverServiceName to KubernetesDriverConf

2024-04-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48038.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46276
[https://github.com/apache/spark/pull/46276]

> Promote driverServiceName to KubernetesDriverConf
> -
>
> Key: SPARK-48038
> URL: https://issues.apache.org/jira/browse/SPARK-48038
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48036.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46271
[https://github.com/apache/spark/pull/46271]

> Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`
> ---
>
> Key: SPARK-48036
> URL: https://issues.apache.org/jira/browse/SPARK-48036
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48029) Update the packages name removed in building the spark docker image

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48029.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46258
[https://github.com/apache/spark/pull/46258]

> Update the packages name removed in building the spark docker image
> ---
>
> Key: SPARK-48029
> URL: https://issues.apache.org/jira/browse/SPARK-48029
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48029) Update the packages name removed in building the spark docker image

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48029:
-

Assignee: BingKun Pan

> Update the packages name removed in building the spark docker image
> ---
>
> Key: SPARK-48029
> URL: https://issues.apache.org/jira/browse/SPARK-48029
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48036) Update `sql-ref-ansi-compliance.md` and `sql-ref-identifier.md`

2024-04-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48036:
-

 Summary: Update `sql-ref-ansi-compliance.md` and 
`sql-ref-identifier.md`
 Key: SPARK-48036
 URL: https://issues.apache.org/jira/browse/SPARK-48036
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-48032) Upgrade `commons-codec` to 1.17.0

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48032.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46268
[https://github.com/apache/spark/pull/46268]

> Upgrade `commons-codec` to 1.17.0
> -
>
> Key: SPARK-48032
> URL: https://issues.apache.org/jira/browse/SPARK-48032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47730:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47730.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46149
[https://github.com/apache/spark/pull/46149]

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47730:
-

Assignee: Xi Chen

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`

2024-04-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48021:
-

Assignee: BingKun Pan

> Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
> ---
>
> Key: SPARK-48021
> URL: https://issues.apache.org/jira/browse/SPARK-48021
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48021) Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`

2024-04-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48021.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46246
[https://github.com/apache/spark/pull/46246]

> Add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
> ---
>
> Key: SPARK-48021
> URL: https://issues.apache.org/jira/browse/SPARK-48021
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47408) Fix mathExpressions that use StringType

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47408.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46227
[https://github.com/apache/spark/pull/46227]

> Fix mathExpressions that use StringType
> ---
>
> Key: SPARK-47408
> URL: https://issues.apache.org/jira/browse/SPARK-47408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47943:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> We need to add a CI task that builds and tests Java code for upcoming operator 
> pull requests.






[jira] [Assigned] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48015:
-

Assignee: Dongjoon Hyun

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48015.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9
[https://github.com/apache/spark-kubernetes-operator/pull/9]

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47929) Setup Static Analysis for Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47929:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Setup Static Analysis for Operator
> --
>
> Key: SPARK-47929
> URL: https://issues.apache.org/jira/browse/SPARK-47929
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> Add common static analysis tasks, including Checkstyle, SpotBugs, and JaCoCo. 
> Also include Spotless for automatic style fixes.






[jira] [Updated] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47950:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> The Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability to 
> generate the YAML spec.






[jira] [Updated] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48015:
--
Fix Version/s: kubernetes-operator-0.1.0
   (was: 4.0.0)

> Update `build.gradle` to fix deprecation warnings
> -
>
> Key: SPARK-48015
> URL: https://issues.apache.org/jira/browse/SPARK-48015
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>







[jira] [Created] (SPARK-48015) Update `build.gradle` to fix deprecation warnings

2024-04-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-48015:
-

 Summary: Update `build.gradle` to fix deprecation warnings
 Key: SPARK-48015
 URL: https://issues.apache.org/jira/browse/SPARK-48015
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Kubernetes
Affects Versions: kubernetes-operator-0.1.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47950:
-

Assignee: Zhou JIANG

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> The Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability to 
> generate the YAML spec.






[jira] [Resolved] (SPARK-47950) Add Java API Module for Spark Operator

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47950.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 8
[https://github.com/apache/spark-kubernetes-operator/pull/8]

> Add Java API Module for Spark Operator
> --
>
> Key: SPARK-47950
> URL: https://issues.apache.org/jira/browse/SPARK-47950
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The Spark Operator API refers to the 
> [CustomResourceDefinition|https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/]
> that represents the spec for a Spark application in k8s.
> This aims to add a Java API library for the Spark Operator, with the ability to 
> generate the YAML spec.






[jira] [Resolved] (SPARK-48010) Avoid repeated calls to conf.resolver in resolveExpression

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48010.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46248
[https://github.com/apache/spark/pull/46248]

> Avoid repeated calls to conf.resolver in resolveExpression
> --
>
> Key: SPARK-48010
> URL: https://issues.apache.org/jira/browse/SPARK-48010
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Nikhil Sheoran
>Assignee: Nikhil Sheoran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Consider a view with a large number of columns (thousands). When resolving this 
> view, the flamegraph showed repeated initializations of `conf` to obtain the 
> `resolver` for each column of the view.
> This can easily be optimized by obtaining the resolver once and reusing it for 
> the various calls to `innerResolve` in `resolveExpression`.
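The optimization can be sketched with a toy resolver (all names are hypothetical stand-ins, not Spark's API): hoist the expensive accessor out of the per-column loop.

```python
calls = {"n": 0}

def resolver_from_conf():
    # Stand-in for conf.resolver, which re-derives the resolver from
    # configuration state on every call.
    calls["n"] += 1
    return lambda a, b: a.lower() == b.lower()

columns = [f"col{i}" for i in range(1000)]

# Before: one conf lookup per column resolution.
for name in columns:
    resolver_from_conf()(name, name.upper())
print(calls["n"])  # 1000

# After: obtain the resolver once and reuse it for every column.
calls["n"] = 0
resolver = resolver_from_conf()
for name in columns:
    resolver(name, name.upper())
print(calls["n"])  # 1
```

The behavior is identical either way; only the number of configuration lookups changes, which is what showed up in the flamegraph.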






[jira] [Assigned] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-46122:
-

Assignee: Dongjoon Hyun

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48005) Enable `DefaultIndexParityTests. test_index_distributed_sequence_cleanup`

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48005.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46242
[https://github.com/apache/spark/pull/46242]

> Enable `DefaultIndexParityTests.test_index_distributed_sequence_cleanup`
> -
>
> Key: SPARK-48005
> URL: https://issues.apache.org/jira/browse/SPARK-48005
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48007) MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11

2024-04-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48007.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46244
[https://github.com/apache/spark/pull/46244]

> MsSQLServer: upgrade mssql.jdbc.version to 12.6.1.jre11
> ---
>
> Key: SPARK-48007
> URL: https://issues.apache.org/jira/browse/SPARK-48007
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47991) Arrange the test cases for window frames and window functions.

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47991.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46226
[https://github.com/apache/spark/pull/46226]

> Arrange the test cases for window frames and window functions.
> --
>
> Key: SPARK-47991
> URL: https://issues.apache.org/jira/browse/SPARK-47991
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jiaan Geng
>Assignee: Jiaan Geng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840934#comment-17840934
 ] 

Dongjoon Hyun commented on SPARK-22231:
---

I removed the outdated target version from this issue.

> Support of map, filter, withField, dropFields in nested list of structures
> --
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: DB Tsai
>Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great 
> content to fulfill the unique tastes of our members. Before building 
> recommendation algorithms, we need to prepare the training, testing, and 
> validation datasets in Apache Spark. Due to the nature of ranking problems, 
> we have a nested list of items to be ranked in one column, and the top level 
> is the context describing the setting in which a model is to be used (e.g. 
> profiles, country, time, device, etc.)  Here is a blog post describing the 
> details, [Distributed Time Travel for Feature 
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
>  
> To be more concrete, for the ranks of videos for a given profile_id at a 
> given country, our data schema can look like this,
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- title_id: integer (nullable = true)
>  |||-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some 
> functions on them. Sometimes, we're dropping or adding new columns in the 
> nested list of structs. Currently, there is no easy solution in open source 
> Apache Spark to perform those operations using SQL primitives; many people 
> just convert the data into RDDs to work on the nested level of data, and then 
> reconstruct a new dataframe as a workaround. This is extremely inefficient 
> because all the optimizations like predicate pushdown in SQL cannot be 
> performed, we cannot leverage the columnar format, and the serialization 
> and deserialization cost becomes huge even when we just want to add a new 
> column in the nested level.
> We built a solution internally at Netflix which we're very happy with. We 
> plan to make it open source in Spark upstream. We would like to socialize the 
> API design to see if we have missed any use cases.
> The first API we added is *mapItems* on dataframes, which takes a function from 
> *Column* to *Column* and applies it to the nested dataframe. Here 
> is an example,
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: double (containsNull = true)
> result.show()
> // +---+----+--------------------+
> // |foo| bar|               items|
> // +---+----+--------------------+
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+----+--------------------+
> {code}
> Now, with the ability to apply a function in the nested dataframe, we can 
> add a new function, *withColumn* in *Column*, to add or replace the existing 
> column that has the same name in the nested list of structs. Here are two 
> examples demonstrating the API together with *mapItems*; the first one 
> replaces the existing column,
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: struct (containsNull = true)
> // |||-- a: integer (nullable = true)
> // |||-- b: double (nullable = true)
> result.show(false)
> // +---+----+----------------------+
> // |foo|bar |items                 |
> // +---+----+----------------------+
> // |10 |10.0|[[10,11.0], [11,12.0]]|
> // |20 |20.0|[[20,21.0], [21,22.0]]|
> // +---+----+----------------------+

[jira] [Updated] (SPARK-22231) Support of map, filter, withField, dropFields in nested list of structures

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-22231:
--
Target Version/s:   (was: 3.2.0)

> Support of map, filter, withField, dropFields in nested list of structures
> --
>
> Key: SPARK-22231
> URL: https://issues.apache.org/jira/browse/SPARK-22231
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: DB Tsai
>Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great 
> content to fulfill the unique tastes of our members. Before building 
> recommendation algorithms, we need to prepare the training, testing, and 
> validation datasets in Apache Spark. Due to the nature of ranking problems, 
> we have a nested list of items to be ranked in one column, and the top level 
> is the context describing the setting in which a model is to be used (e.g. 
> profiles, country, time, device, etc.)  Here is a blog post describing the 
> details, [Distributed Time Travel for Feature 
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
>  
> To be more concrete, for the ranks of videos for a given profile_id at a 
> given country, our data schema can look like this,
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  ||-- element: struct (containsNull = false)
>  |||-- title_id: integer (nullable = true)
>  |||-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some 
> functions on them. Sometimes, we're dropping or adding new columns in the 
> nested list of structs. Currently, there is no easy solution in open source 
> Apache Spark to perform those operations using SQL primitives; many people 
> just convert the data into RDDs to work on the nested level of data, and then 
> reconstruct a new dataframe as a workaround. This is extremely inefficient 
> because all the optimizations like predicate pushdown in SQL cannot be 
> performed, we cannot leverage the columnar format, and the serialization 
> and deserialization cost becomes huge even when we just want to add a new 
> column in the nested level.
> We built a solution internally at Netflix which we're very happy with. We 
> plan to make it open source in Spark upstream. We would like to socialize the 
> API design to see if we have missed any use cases.
> The first API we added is *mapItems* on dataframes, which takes a function from 
> *Column* to *Column* and applies it to the nested dataframe. Here 
> is an example,
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: double (containsNull = true)
> result.show()
> // +---+----+--------------------+
> // |foo| bar|               items|
> // +---+----+--------------------+
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+----+--------------------+
> {code}
> Now, with the ability to apply a function in the nested dataframe, we can 
> add a new function, *withColumn* in *Column*, to add or replace the existing 
> column that has the same name in the nested list of structs. Here are two 
> examples demonstrating the API together with *mapItems*; the first one 
> replaces the existing column,
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // ||-- element: struct (containsNull = true)
> // |||-- a: integer (nullable = true)
> // |||-- b: double (nullable = true)
> result.show(false)
> // +---+----+----------------------+
> // |foo|bar |items                 |
> // +---+----+----------------------+
> // |10 |10.0|[[10,11.0], [11,12.0]]|
> // |20 |20.0|[[20,21.0], [21,22.0]]|
> // +---+----+----------------------+
> {code}
> and the second 

[jira] [Updated] (SPARK-24941) Add RDDBarrier.coalesce() function

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24941:
--
Target Version/s:   (was: 3.2.0)

> Add RDDBarrier.coalesce() function
> --
>
> Key: SPARK-24941
> URL: https://issues.apache.org/jira/browse/SPARK-24941
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r204917245
> The number of partitions from the input data can be unexpectedly large, eg. 
> if you do
> {code}
> sc.textFile(...).barrier().mapPartitions()
> {code}
> The number of input partitions is based on the HDFS input splits. We shall 
> provide a way in RDDBarrier to enable users to specify the number of tasks in 
> a barrier stage, perhaps something like RDDBarrier.coalesce(numPartitions: Int).
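To illustrate what such a coalesce would have to do, here is a sketch (Python, hypothetical helper name, not Spark code) of grouping input splits evenly into the requested number of barrier tasks:

```python
def coalesce_splits(splits, num_partitions):
    """Hypothetical helper: group input splits into `num_partitions` buckets,
    preserving order and balancing bucket sizes -- the kind of grouping a
    RDDBarrier.coalesce(numPartitions) would need."""
    n = len(splits)
    if num_partitions <= 0 or num_partitions > n:
        raise ValueError("num_partitions must be in [1, len(splits)]")
    buckets = [[] for _ in range(num_partitions)]
    for i, split in enumerate(splits):
        # split i goes to bucket floor(i * k / n), yielding near-equal sizes
        buckets[i * num_partitions // n].append(split)
    return buckets
```

For example, ten HDFS splits coalesced to three barrier tasks yields buckets of sizes 4, 3 and 3 with no split lost or reordered.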






[jira] [Updated] (SPARK-25383) Image data source supports sample pushdown

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25383:
--
Target Version/s:   (was: 3.2.0)

> Image data source supports sample pushdown
> --
>
> Key: SPARK-25383
> URL: https://issues.apache.org/jira/browse/SPARK-25383
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, SQL
>Affects Versions: 3.1.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> After SPARK-25349, we should update image data source to support sampling.






[jira] [Updated] (SPARK-25752) Add trait to easily whitelist logical operators that produce named output from CleanupAliases

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25752:
--
Target Version/s:   (was: 3.2.0)

> Add trait to easily whitelist logical operators that produce named output 
> from CleanupAliases
> -
>
> Key: SPARK-25752
> URL: https://issues.apache.org/jira/browse/SPARK-25752
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Minor
>
> The rule `CleanupAliases` cleans up aliases from logical operators that do 
> not match a whitelist. This whitelist is hardcoded inside the rule which is 
> cumbersome. This PR is to clean that up by making a trait `HasNamedOutput` 
> that will be ignored by `CleanupAliases` and other ops that require aliases 
> to be preserved in the operator should extend it.






[jira] [Commented] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840928#comment-17840928
 ] 

Dongjoon Hyun commented on SPARK-28629:
---

I removed the outdated target version from this issue.

> Capture the missing rules in HiveSessionStateBuilder
> 
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> A common mistake for new contributors is forgetting to add the corresponding 
> rules to extendedResolutionRules, postHocResolutionRules, and 
> extendedCheckRules in HiveSessionStateBuilder. We need a way to avoid missing 
> these rules, or to capture them when they are missed.






[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840930#comment-17840930
 ] 

Dongjoon Hyun commented on SPARK-27780:
---

I removed the outdated target version from this issue.

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than Spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the Spark runtime -- it forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error messages to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also lets a Spark 3.0 client talk to a 
> 2.4 shuffle service.  But it may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.
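Option 3 can be sketched as a simple negotiation step. The function below is an illustrative Python sketch with hypothetical names, not Spark's actual wire protocol:

```python
def negotiate_version(client_versions, server_versions):
    """Sketch of option 3 (per-connection version exchange): both sides
    advertise the protocol versions they support and agree on the highest
    common one."""
    common = set(client_versions) & set(server_versions)
    if not common:
        # mixed versions with no overlap: fail with a clear error message
        raise ValueError(
            "no common shuffle protocol version: client=%s server=%s"
            % (sorted(client_versions), sorted(server_versions)))
    return max(common)
```

A newer client that also speaks older versions can still talk to an older server (reduced capabilities), while a genuine mismatch produces an explicit error instead of undefined behavior.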






[jira] [Updated] (SPARK-28629) Capture the missing rules in HiveSessionStateBuilder

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28629:
--
Target Version/s:   (was: 3.2.0)

> Capture the missing rules in HiveSessionStateBuilder
> 
>
> Key: SPARK-28629
> URL: https://issues.apache.org/jira/browse/SPARK-28629
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> A common mistake for new contributors is forgetting to add the corresponding 
> rules to extendedResolutionRules, postHocResolutionRules, and 
> extendedCheckRules in HiveSessionStateBuilder. We need a way to avoid missing 
> these rules, or to capture them when they are missed.






[jira] [Updated] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27780:
--
Target Version/s:   (was: 3.2.0)

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than Spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the Spark runtime -- it forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error messages to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also lets a Spark 3.0 client talk to a 
> 2.4 shuffle service.  But it may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.






[jira] [Commented] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840927#comment-17840927
 ] 

Dongjoon Hyun commented on SPARK-30324:
---

I removed the outdated target version from this issue.

> Simplify API for JSON access in DataFrames/SQL
> --
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to 
> use; e.g. I wasn't expecting the path to a field to have to start with "$.". 
> We can simplify all of this when a column is of StringType and a nested 
> field is requested. In the query planner, this API sugar will be rewritten as 
> get_json_object.
> This nested access can then be extended in the future to other 
> semi-structured formats.
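To make the proposed rewrite concrete, here is an illustrative Python sketch; `get_json_object` below is a minimal stand-in for the Spark UDF, and `rewrite_nested_access` is a hypothetical planner-style rewrite, not actual Spark code:

```python
import json

def get_json_object(value, path):
    """Minimal Python stand-in for Spark's get_json_object: supports only
    simple '$.a.b' paths, enough to illustrate the rewrite."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    obj = json.loads(value)
    for key in path[2:].split("."):
        obj = obj[key]
    return obj

def rewrite_nested_access(column, field):
    # the proposed sugar: raw.field is rewritten by the planner into
    # get_json_object(raw, '$.field')
    return "get_json_object(%s, '$.%s')" % (column, field)
```

The user writes `raw.field`; the planner turns it into the verbose form so no new execution path is needed.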






[jira] [Updated] (SPARK-30324) Simplify API for JSON access in DataFrames/SQL

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30324:
--
Target Version/s:   (was: 3.2.0)

> Simplify API for JSON access in DataFrames/SQL
> --
>
> Key: SPARK-30324
> URL: https://issues.apache.org/jira/browse/SPARK-30324
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> get_json_object() is a UDF to parse JSON fields. It is verbose and hard to 
> use; e.g. I wasn't expecting the path to a field to have to start with "$.". 
> We can simplify all of this when a column is of StringType and a nested 
> field is requested. In the query planner, this API sugar will be rewritten as 
> get_json_object.
> This nested access can then be extended in the future to other 
> semi-structured formats.






[jira] [Updated] (SPARK-30334) Add metadata around semi-structured columns to Spark

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30334:
--
Target Version/s:   (was: 3.2.0)

> Add metadata around semi-structured columns to Spark
> 
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events 
> in a wide variety of formats. Click events in product analytics can be stored 
> as json. Some application logs can be in the form of delimited key=value 
> text. Some data may be in xml.
> The goal of this project is to be able to signal Spark that such a column 
> exists. This will then enable Spark to "auto-parse" these columns on the fly. 
> The proposal is to store this information as part of the column metadata, in 
> the fields:
>  - format: The format of the semi-structured column, e.g. json, xml, avro
>  - options: Options for parsing these columns
> Then imagine having the following data:
> {code:java}
> +------------+-------+--------------------+
> | ts         | event | raw                |
> +------------+-------+--------------------+
> | 2019-10-12 | click | {"field":"value"}  |
> +------------+-------+--------------------+ {code}
> SELECT raw.field FROM data
> will return "value"
> or the following data
> {code:java}
> +------------+-------+----------------------+
> | ts         | event | raw                  |
> +------------+-------+----------------------+
> | 2019-10-12 | click | field1=v1|field2=v2  |
> +------------+-------+----------------------+ {code}
> SELECT raw.field1 FROM data
> will return v1.
>  
> As a first step, we will introduce the function "as_json", which accomplishes 
> this for JSON columns.
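The proposed metadata-driven "auto-parse" could look like the following sketch (Python; the function name and option keys are hypothetical, not an actual Spark API):

```python
import json

def auto_parse(value, fmt, options=None):
    """Sketch of the proposed 'auto-parse' behaviour, driven by the column
    metadata fields described above (`format` and `options`)."""
    options = options or {}
    if fmt == "json":
        return json.loads(value)
    if fmt == "kv":  # delimited key=value text, e.g. "field1=v1|field2=v2"
        delim = options.get("delimiter", "|")
        return dict(pair.split("=", 1) for pair in value.split(delim))
    raise ValueError("unsupported semi-structured format: %r" % fmt)
```

Given the example rows above, `raw.field` would resolve to "value" for the JSON column and `raw.field1` to "v1" for the key=value column.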






[jira] [Commented] (SPARK-30334) Add metadata around semi-structured columns to Spark

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840926#comment-17840926
 ] 

Dongjoon Hyun commented on SPARK-30334:
---

I removed the outdated target version from this issue.

> Add metadata around semi-structured columns to Spark
> 
>
> Key: SPARK-30334
> URL: https://issues.apache.org/jira/browse/SPARK-30334
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Burak Yavuz
>Priority: Major
>
> Semi-structured data is used widely in the data industry for reporting events 
> in a wide variety of formats. Click events in product analytics can be stored 
> as json. Some application logs can be in the form of delimited key=value 
> text. Some data may be in xml.
> The goal of this project is to be able to signal Spark that such a column 
> exists. This will then enable Spark to "auto-parse" these columns on the fly. 
> The proposal is to store this information as part of the column metadata, in 
> the fields:
>  - format: The format of the semi-structured column, e.g. json, xml, avro
>  - options: Options for parsing these columns
> Then imagine having the following data:
> {code:java}
> +------------+-------+--------------------+
> | ts         | event | raw                |
> +------------+-------+--------------------+
> | 2019-10-12 | click | {"field":"value"}  |
> +------------+-------+--------------------+ {code}
> SELECT raw.field FROM data
> will return "value"
> or the following data
> {code:java}
> +------------+-------+----------------------+
> | ts         | event | raw                  |
> +------------+-------+----------------------+
> | 2019-10-12 | click | field1=v1|field2=v2  |
> +------------+-------+----------------------+ {code}
> SELECT raw.field1 FROM data
> will return v1.
>  
> As a first step, we will introduce the function "as_json", which accomplishes 
> this for JSON columns.






[jira] [Commented] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840913#comment-17840913
 ] 

Dongjoon Hyun commented on SPARK-24942:
---

I removed the outdated target version, `3.2.0`, from this Jira. For now, the 
Apache Spark community has no target version for this issue.

> Improve cluster resource management with jobs containing barrier stage
> --
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire 
> some executors (but not enough to launch all the tasks in a barrier stage) 
> and later release them due to executor idle-timeout expiry, and then acquire 
> again.
> - There can be deadlock with two concurrent applications. Each application 
> may acquire some resources, but not enough to launch all the tasks in a 
> barrier stage. And after hitting the idle timeout and releasing them, they 
> may acquire resources again, but just continually trade resources between 
> each other.
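The all-or-nothing nature of barrier scheduling behind both issues can be sketched as follows (an illustrative Python simulation with hypothetical names, not Spark's scheduler):

```python
def try_launch_barrier(available_slots, required_tasks):
    """A barrier stage launches only when ALL of its tasks can run at once;
    partially acquired slots are useless and will eventually be released."""
    return available_slots >= required_tasks

def simulate_two_apps(total_slots, need_a, need_b, acquired_a, acquired_b):
    """Sketch of the deadlock: each app holds some slots but neither can
    launch, so both time out, release, and re-acquire indefinitely."""
    launched_a = try_launch_barrier(acquired_a, need_a)
    launched_b = try_launch_barrier(acquired_b, need_b)
    deadlocked = (not launched_a and not launched_b
                  and acquired_a + acquired_b == total_slots)
    return launched_a, launched_b, deadlocked
```

For example, two applications each needing four barrier tasks on a six-slot cluster can each hold three slots: neither launches, and together they exhaust the cluster.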






[jira] [Updated] (SPARK-24942) Improve cluster resource management with jobs containing barrier stage

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24942:
--
Target Version/s:   (was: 3.2.0)

> Improve cluster resource management with jobs containing barrier stage
> --
>
> Key: SPARK-24942
> URL: https://issues.apache.org/jira/browse/SPARK-24942
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> https://github.com/apache/spark/pull/21758#discussion_r205652317
> We shall improve cluster resource management to address the following issues:
> - With dynamic resource allocation enabled, it may happen that we acquire 
> some executors (but not enough to launch all the tasks in a barrier stage) 
> and later release them due to executor idle-timeout expiry, and then acquire 
> again.
> - There can be deadlock with two concurrent applications. Each application 
> may acquire some resources, but not enough to launch all the tasks in a 
> barrier stage. And after hitting the idle timeout and releasing them, they 
> may acquire resources again, but just continually trade resources between 
> each other.
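For context, the scheduling constraint behind these issues can be sketched with the public barrier execution API (a minimal PySpark illustration; the configuration values are hypothetical):

{code:python}
from pyspark import BarrierTaskContext
from pyspark.sql import SparkSession

# With dynamic allocation on, executors acquired for a barrier stage can be
# released on idle timeout before every barrier task has a slot
# (hypothetical timeout value, for illustration only).
spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
         .getOrCreate())

def stage(iterator):
    # All tasks in a barrier stage must be launched together; barrier()
    # blocks until every task in the stage reaches this point.
    BarrierTaskContext.get().barrier()
    return iterator

rdd = spark.sparkContext.parallelize(range(8), numSlices=8)
# Scheduling requires 8 slots to be available simultaneously, which is what
# makes the partial-acquire/release cycle described above problematic.
rdd.barrier().mapPartitions(stage).collect()
{code}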






[jira] [Commented] (SPARK-44111) Prepare Apache Spark 4.0.0

2024-04-25 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840853#comment-17840853
 ] 

Dongjoon Hyun commented on SPARK-44111:
---

Yes, we will provide `4.0.0-preview` in advance, [~fbiville]. Here is the 
discussion thread on the Apache Spark dev mailing list.
 * [https://lists.apache.org/thread/nxmvz2j7kp96otzlnl3kd277knlb6qgb]

[~cloud_fan] is the release manager leading the Apache Spark 4.0.0 release 
(including the preview).

> Prepare Apache Spark 4.0.0
> --
>
> Key: SPARK-44111
> URL: https://issues.apache.org/jira/browse/SPARK-44111
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>  Labels: pull-request-available
>
> For now, this issue aims to collect ideas for planning Apache Spark 4.0.0.
> We will add more items which will be excluded from Apache Spark 3.5.0 
> (Feature Freeze: July 16th, 2023).
> {code}
> Spark 1: 2014.05 (1.0.0) ~ 2016.11 (1.6.3)
> Spark 2: 2016.07 (2.0.0) ~ 2021.05 (2.4.8)
> Spark 3: 2020.06 (3.0.0) ~ 2026.xx (3.5.x)
> Spark 4: 2024.06 (4.0.0, NEW)
> {code}






[jira] [Resolved] (SPARK-47987) Enable `ArrowParityTests.test_createDataFrame_empty_partition`

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47987.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46220
[https://github.com/apache/spark/pull/46220]

> Enable `ArrowParityTests.test_createDataFrame_empty_partition`
> --
>
> Key: SPARK-47987
> URL: https://issues.apache.org/jira/browse/SPARK-47987
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47990) Upgrade `zstd-jni` to 1.5.6-3

2024-04-25 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47990.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46225
[https://github.com/apache/spark/pull/46225]

> Upgrade `zstd-jni` to 1.5.6-3
> -
>
> Key: SPARK-47990
> URL: https://issues.apache.org/jira/browse/SPARK-47990
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Commented] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840644#comment-17840644
 ] 

Dongjoon Hyun commented on SPARK-46122:
---

I sent the discussion thread for this issue.

- [https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd]

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to `false` by 
default  (was: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default)

> Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
> -
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-46122) Set `spark.sql.legacy.createHiveTableByDefault` to false by default

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-46122:
--
Summary: Set `spark.sql.legacy.createHiveTableByDefault` to false by 
default  (was: Disable spark.sql.legacy.createHiveTableByDefault by default)

> Set `spark.sql.legacy.createHiveTableByDefault` to false by default
> ---
>
> Key: SPARK-46122
> URL: https://issues.apache.org/jira/browse/SPARK-46122
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47979.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46211
[https://github.com/apache/spark/pull/46211]

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47979) Use Hive table explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47979:
-

 Summary: Use Hive table explicitly for Hive table capability tests
 Key: SPARK-47979
 URL: https://issues.apache.org/jira/browse/SPARK-47979
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47979) Use Hive tables explicitly for Hive table capability tests

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47979:
--
Summary: Use Hive tables explicitly for Hive table capability tests  (was: 
Use Hive table explicitly for Hive table capability tests)

> Use Hive tables explicitly for Hive table capability tests
> --
>
> Key: SPARK-47979
> URL: https://issues.apache.org/jira/browse/SPARK-47979
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-45265) Support Hive 4.0 metastore

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45265:
-

Assignee: (was: Attila Zsolt Piros)

> Support Hive 4.0 metastore
> --
>
> Key: SPARK-45265
> URL: https://issues.apache.org/jira/browse/SPARK-45265
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>  Labels: pull-request-available
>
> Although Hive 4.0.0 is still in beta, I would like to work on this, as Hive 
> 4.0.0 will support the pushdown of partition column filters with 
> VARCHAR/CHAR types.
> For details, please see HIVE-26661: Support partition filter for char and 
> varchar types on Hive metastore.
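As an illustration, the kind of query that benefits is a partition filter on a CHAR/VARCHAR partition column (hypothetical table, for illustration only):

{code:sql}
-- Partitioned by a VARCHAR column; without HIVE-26661 the metastore cannot
-- evaluate the filter, so all partitions must be listed and filtered
-- on the client side.
CREATE TABLE events (id INT) PARTITIONED BY (region VARCHAR(16))
STORED AS PARQUET;

-- With a Hive 4.0 metastore, this partition predicate can be pushed down.
SELECT * FROM events WHERE region = 'emea';
{code}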






[jira] [Updated] (SPARK-44677) Drop legacy Hive-based ORC file format

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44677:
--
Parent: (was: SPARK-44111)
Issue Type: Task  (was: Sub-task)

> Drop legacy Hive-based ORC file format
> --
>
> Key: SPARK-44677
> URL: https://issues.apache.org/jira/browse/SPARK-44677
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>
> Currently, Spark allows using spark.sql.orc.impl=native/hive to switch the 
> ORC FileFormat implementation.
> SPARK-23456 (2.4) switched the default value of spark.sql.orc.impl from 
> "hive" to "native" and prepared to drop the "hive" implementation in the 
> future.
> > ... eventually, Apache Spark will drop old Hive-based ORC code.
> The native implementation has worked well throughout the Spark 3.x period, 
> so it is a good time to consider dropping the "hive" one in Spark 4.0.
> We should also take care of backward compatibility during the change.
> > BTW, IIRC, there was a difference in the Hive ORC CHAR implementation 
> > before, so we couldn't remove it for backward-compatibility reasons. Since 
> > Spark implements many CHAR features, we need to re-verify that the 
> > {{native}} implementation has all legacy Hive-based ORC features.
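The switch in question is an ordinary SQL configuration; for example (illustrative only):

{code:sql}
-- "native" selects Spark's vectorized ORC reader/writer; "hive" selects the
-- legacy Hive-based path that this issue proposes to drop.
SET spark.sql.orc.impl=native;
CREATE TABLE t USING ORC AS SELECT 'a' AS c;
SELECT * FROM t;
{code}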






[jira] [Commented] (SPARK-47499) Reuse `test_help_command` in Connect

2024-04-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840514#comment-17840514
 ] 

Dongjoon Hyun commented on SPARK-47499:
---

Thank you for collecting this under the umbrella Jira, [~podongfeng].

> Reuse `test_help_command` in Connect
> 
>
> Key: SPARK-47499
> URL: https://issues.apache.org/jira/browse/SPARK-47499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47633:
-

Assignee: Bruce Robbins

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Resolved] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47633.
---
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46190
[https://github.com/apache/spark/pull/46190]

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Updated] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47974:
--
Parent: SPARK-44111
Issue Type: Sub-task  (was: Improvement)

> Remove install_scala from build/mvn
> ---
>
> Key: SPARK-47974
> URL: https://issues.apache.org/jira/browse/SPARK-47974
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47974) Remove install_scala from build/mvn

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47974.
---
Fix Version/s: 4.0.0
 Assignee: Cheng Pan
   Resolution: Fixed

This is resolved via [https://github.com/apache/spark/pull/46204]

> Remove install_scala from build/mvn
> ---
>
> Key: SPARK-47974
> URL: https://issues.apache.org/jira/browse/SPARK-47974
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47969) Make `test_creation_index` deterministic

2024-04-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47969.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46200
[https://github.com/apache/spark/pull/46200]

> Make `test_creation_index` deterministic
> 
>
> Key: SPARK-47969
> URL: https://issues.apache.org/jira/browse/SPARK-47969
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47956) sanity check for unresolved LCA reference

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47956.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46185
[https://github.com/apache/spark/pull/46185]

> sanity check for unresolved LCA reference
> -
>
> Key: SPARK-47956
> URL: https://issues.apache.org/jira/browse/SPARK-47956
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47956) sanity check for unresolved LCA reference

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47956:
-

Assignee: Wenchen Fan

> sanity check for unresolved LCA reference
> -
>
> Key: SPARK-47956
> URL: https://issues.apache.org/jira/browse/SPARK-47956
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47948:
-

Assignee: Haejoon Lee

> Upgrade the minimum Pandas version to 2.0.0
> ---
>
> Key: SPARK-47948
> URL: https://issues.apache.org/jira/browse/SPARK-47948
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Bump up the minimum supported Pandas version from 1.4.4 to 2.0.0 to support 
> Pandas API on Spark in Apache Spark 4.0.0.






[jira] [Resolved] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47948.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46175
[https://github.com/apache/spark/pull/46175]

> Upgrade the minimum Pandas version to 2.0.0
> ---
>
> Key: SPARK-47948
> URL: https://issues.apache.org/jira/browse/SPARK-47948
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Bump up the minimum supported Pandas version from 1.4.4 to 2.0.0 to support 
> Pandas API on Spark in Apache Spark 4.0.0.






[jira] [Updated] (SPARK-47948) Upgrade the minimum Pandas version to 2.0.0

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47948:
--
Summary: Upgrade the minimum Pandas version to 2.0.0  (was: Bump Pandas to 
2.0.0)

> Upgrade the minimum Pandas version to 2.0.0
> ---
>
> Key: SPARK-47948
> URL: https://issues.apache.org/jira/browse/SPARK-47948
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Bump up the minimum supported Pandas version from 1.4.4 to 2.0.0 to support 
> Pandas API on Spark in Apache Spark 4.0.0.






[jira] [Resolved] (SPARK-47949) MsSQLServer: Bump up docker image version to2022-CU12-GDR1-ubuntu-22.04

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47949.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46176
[https://github.com/apache/spark/pull/46176]

> MsSQLServer: Bump up docker image version to2022-CU12-GDR1-ubuntu-22.04
> ---
>
> Key: SPARK-47949
> URL: https://issues.apache.org/jira/browse/SPARK-47949
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://mcr.microsoft.com/en-us/product/mssql/server/tags






[jira] [Assigned] (SPARK-47949) MsSQLServer: Bump up docker image version to2022-CU12-GDR1-ubuntu-22.04

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47949:
-

Assignee: Kent Yao

> MsSQLServer: Bump up docker image version to2022-CU12-GDR1-ubuntu-22.04
> ---
>
> Key: SPARK-47949
> URL: https://issues.apache.org/jira/browse/SPARK-47949
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Docker
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> https://mcr.microsoft.com/en-us/product/mssql/server/tags






[jira] [Assigned] (SPARK-47953) MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47953:
-

Assignee: Kent Yao

> MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
> --
>
> Key: SPARK-47953
> URL: https://issues.apache.org/jira/browse/SPARK-47953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47953) MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server

2024-04-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47953.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46177
[https://github.com/apache/spark/pull/46177]

> MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server
> --
>
> Key: SPARK-47953
> URL: https://issues.apache.org/jira/browse/SPARK-47953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47943:
-

Assignee: Zhou JIANG

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.






[jira] [Resolved] (SPARK-47943) Add Operator CI Task for Java Build and Test

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47943.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 7
[https://github.com/apache/spark-kubernetes-operator/pull/7]

> Add Operator CI Task for Java Build and Test
> 
>
> Key: SPARK-47943
> URL: https://issues.apache.org/jira/browse/SPARK-47943
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We need to add a CI task to build and test Java code for upcoming operator 
> pull requests.






[jira] [Resolved] (SPARK-47929) Setup Static Analysis for Operator

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47929.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 6
[https://github.com/apache/spark-kubernetes-operator/pull/6]

> Setup Static Analysis for Operator
> --
>
> Key: SPARK-47929
> URL: https://issues.apache.org/jira/browse/SPARK-47929
> Project: Spark
>  Issue Type: Sub-task
>  Components: k8s
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add common static analysis tasks, including Checkstyle, SpotBugs, and JaCoCo.
> Also include Spotless for automatic style fixes.






[jira] [Resolved] (SPARK-47938) MsSQLServer: Cannot find data type BYTE error

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47938.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46164
[https://github.com/apache/spark/pull/46164]

> MsSQLServer: Cannot find data type BYTE error
> -
>
> Key: SPARK-47938
> URL: https://issues.apache.org/jira/browse/SPARK-47938
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47937.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46163
[https://github.com/apache/spark/pull/46163]

> Fix docstring of `hll_sketch_agg`
> -
>
> Key: SPARK-47937
> URL: https://issues.apache.org/jira/browse/SPARK-47937
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47937) Fix docstring of `hll_sketch_agg`

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47937:
-

Assignee: Ruifeng Zheng

> Fix docstring of `hll_sketch_agg`
> -
>
> Key: SPARK-47937
> URL: https://issues.apache.org/jira/browse/SPARK-47937
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47904.
---
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46169
[https://github.com/apache/spark/pull/46169]

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are
> lowercased, which creates a problem when field types are case-sensitive:
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  
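To make the casing problem above concrete, here is a minimal sketch. It is a hypothetical illustration of the name-mangling idea (the `memberName` helper is invented for this example and is not Spark's actual implementation): when stable union-member identifiers are derived by lowercasing the Avro type name, distinct case information such as `myENUM` is irreversibly lost.

```java
import java.util.Locale;

public class StableIdentifierSketch {
    // Hypothetical helper mimicking how a stable union-member name might be
    // derived from an Avro type name. The real Spark logic differs; this only
    // demonstrates the effect of lowercasing vs. preserving case.
    static String memberName(String typeName, boolean lowercase) {
        String base = lowercase ? typeName.toLowerCase(Locale.ROOT) : typeName;
        return "member_" + base;
    }

    public static void main(String[] args) {
        // Lowercasing discards the original casing of the enum name "myENUM".
        System.out.println(memberName("myENUM", true));   // member_myenum (case lost)
        System.out.println(memberName("myENUM", false));  // member_myENUM (case preserved)
    }
}
```

The fix in SPARK-47904 corresponds to the case-preserving behavior: member names keep the original casing of the Avro type.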






[jira] [Updated] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47904:
--
Fix Version/s: 4.0.0

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are
> lowercased, which creates a problem when field types are case-sensitive:
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  






[jira] [Assigned] (SPARK-47904) Preserve case in Avro schema when using enableStableIdentifiersForUnionType

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47904:
-

Assignee: Ivan Sadikov

> Preserve case in Avro schema when using enableStableIdentifiersForUnionType
> ---
>
> Key: SPARK-47904
> URL: https://issues.apache.org/jira/browse/SPARK-47904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.2
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: pull-request-available
>
> When enableStableIdentifiersForUnionType is enabled, all of the types are
> lowercased, which creates a problem when field types are case-sensitive:
> {code:java}
> Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava),
> Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new 
> Schema.Field("F", Schema.create(Type.FLOAT))).asJava){code}
>  would become
> {code:java}
> struct>  {code}
> but instead should be 
> {code:java}
> struct>  {code}
>  






[jira] [Resolved] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47942.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46168
[https://github.com/apache/spark/pull/46168]

> Drop K8s v1.26 Support
> --
>
> Key: SPARK-47942
> URL: https://issues.apache.org/jira/browse/SPARK-47942
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47942:
-

Assignee: Dongjoon Hyun

> Drop K8s v1.26 Support
> --
>
> Key: SPARK-47942
> URL: https://issues.apache.org/jira/browse/SPARK-47942
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47942) Drop K8s v1.26 Support

2024-04-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47942:
-

 Summary: Drop K8s v1.26 Support
 Key: SPARK-47942
 URL: https://issues.apache.org/jira/browse/SPARK-47942
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47940:
--
Reporter: Cheng Pan  (was: Dongjoon Hyun)

> Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
> ---
>
> Key: SPARK-47940
> URL: https://issues.apache.org/jira/browse/SPARK-47940
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47940.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46167
[https://github.com/apache/spark/pull/46167]

> Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
> ---
>
> Key: SPARK-47940
> URL: https://issues.apache.org/jira/browse/SPARK-47940
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47940) Upgrade `guava` dependency to `33.1.0-jre` in Docker IT

2024-04-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47940:
-

 Summary: Upgrade `guava` dependency to `33.1.0-jre` in Docker IT
 Key: SPARK-47940
 URL: https://issues.apache.org/jira/browse/SPARK-47940
 Project: Spark
  Issue Type: Sub-task
  Components: Build, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47935:
-

Assignee: Ruifeng Zheng

> Pin pandas==2.0.3 for pypy3.8
> -
>
> Key: SPARK-47935
> URL: https://issues.apache.org/jira/browse/SPARK-47935
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47935) Pin pandas==2.0.3 for pypy3.8

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47935.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46159
[https://github.com/apache/spark/pull/46159]

> Pin pandas==2.0.3 for pypy3.8
> -
>
> Key: SPARK-47935
> URL: https://issues.apache.org/jira/browse/SPARK-47935
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47930) Upgrade RoaringBitmap to 1.0.6

2024-04-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47930.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46152
[https://github.com/apache/spark/pull/46152]

> Upgrade RoaringBitmap to 1.0.6
> --
>
> Key: SPARK-47930
> URL: https://issues.apache.org/jira/browse/SPARK-47930
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`

2024-04-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47925.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46145
[https://github.com/apache/spark/pull/46145]

> Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
> --
>
> Key: SPARK-47925
> URL: https://issues.apache.org/jira/browse/SPARK-47925
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`

2024-04-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47925:
-

Assignee: Dongjoon Hyun

> Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
> --
>
> Key: SPARK-47925
> URL: https://issues.apache.org/jira/browse/SPARK-47925
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47924) Add a debug log to `DiskStore.moveFileToBlock`

2024-04-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47924.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46144
[https://github.com/apache/spark/pull/46144]

> Add a debug log to `DiskStore.moveFileToBlock`
> --
>
> Key: SPARK-47924
> URL: https://issues.apache.org/jira/browse/SPARK-47924
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47924) Add a debug log to `DiskStore.moveFileToBlock`

2024-04-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47924:
-

Assignee: Dongjoon Hyun

> Add a debug log to `DiskStore.moveFileToBlock`
> --
>
> Key: SPARK-47924
> URL: https://issues.apache.org/jira/browse/SPARK-47924
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47923) Upgrade the minimum version of `arrow` R package to 10.0.0

2024-04-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47923.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46142
[https://github.com/apache/spark/pull/46142]

> Upgrade the minimum version of `arrow` R package to 10.0.0
> --
>
> Key: SPARK-47923
> URL: https://issues.apache.org/jira/browse/SPARK-47923
> Project: Spark
>  Issue Type: Sub-task
>  Components: R
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47925) Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`

2024-04-19 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47925:
-

 Summary: Mark `BloomFilterAggregateQuerySuite` as `ExtendedSQLTest`
 Key: SPARK-47925
 URL: https://issues.apache.org/jira/browse/SPARK-47925
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun








