[jira] [Resolved] (SPARK-46001) Spark UI Test Improvements

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46001.
--
Fix Version/s: 4.0.0
 Assignee: Kent Yao
   Resolution: Fixed

> Spark UI Test Improvements
> --
>
> Key: SPARK-46001
> URL: https://issues.apache.org/jira/browse/SPARK-46001
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL, Tests, UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 4.0.0
>
>
> Spark UI tests are not supported; it's hard for developers to test and for
> the owners to maintain.






[jira] [Assigned] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48299:


Assignee: Yang Jie

> Upgrade scala-maven-plugin to 4.9.1
> ---
>
> Key: SPARK-48299
> URL: https://issues.apache.org/jira/browse/SPARK-48299
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48299.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46593
[https://github.com/apache/spark/pull/46593]

> Upgrade scala-maven-plugin to 4.9.1
> ---
>
> Key: SPARK-48299
> URL: https://issues.apache.org/jira/browse/SPARK-48299
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48299:
---
Labels: pull-request-available  (was: )

> Upgrade scala-maven-plugin to 4.9.1
> ---
>
> Key: SPARK-48299
> URL: https://issues.apache.org/jira/browse/SPARK-48299
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47607) Add documentation for Structured logging framework

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47607:
---
Labels: pull-request-available  (was: )

> Add documentation for Structured logging framework
> --
>
> Key: SPARK-47607
> URL: https://issues.apache.org/jira/browse/SPARK-47607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48214) Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`

2024-05-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-48214:
--

Assignee: BingKun Pan

> Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`
> -
>
> Key: SPARK-48214
> URL: https://issues.apache.org/jira/browse/SPARK-48214
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48214) Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`

2024-05-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-48214.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46502
[https://github.com/apache/spark/pull/46502]

> Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`
> -
>
> Key: SPARK-48214
> URL: https://issues.apache.org/jira/browse/SPARK-48214
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Comment Edited] (SPARK-48298) Add TCP mode to StatsdSink

2024-05-15 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846789#comment-17846789
 ] 

Eric Yang edited comment on SPARK-48298 at 5/16/24 4:48 AM:


PR: https://github.com/apache/spark/pull/46604


was (Author: JIRAUSER304132):
I'm preparing a PR for it.

> Add TCP mode to StatsdSink
> --
>
> Key: SPARK-48298
> URL: https://issues.apache.org/jira/browse/SPARK-48298
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Yang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the StatsdSink in Spark supports UDP mode only, which is the
> default mode of StatsD. However, in real production environments, a more
> reliable transmission of metrics is often needed to avoid metrics loss in
> high-traffic systems.
>
> TCP mode is already supported by StatsD
> ([https://github.com/statsd/statsd/blob/master/docs/server.md]), by
> Prometheus' statsd_exporter ([https://github.com/prometheus/statsd_exporter]),
> and by many other StatsD-based metrics proxies/receivers.






[jira] [Updated] (SPARK-36783) ScanOperation should not push Filter through nondeterministic Project

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-36783:
---
Labels: pull-request-available  (was: )

> ScanOperation should not push Filter through nondeterministic Project
> -
>
> Key: SPARK-36783
> URL: https://issues.apache.org/jira/browse/SPARK-36783
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.2.0, 3.1.3, 3.0.4
>
>







[jira] [Assigned] (SPARK-48287) Apply the builtin `timestamp_diff` method

2024-05-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-48287:
-

Assignee: Ruifeng Zheng

> Apply the builtin `timestamp_diff` method
> -
>
> Key: SPARK-48287
> URL: https://issues.apache.org/jira/browse/SPARK-48287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48287) Apply the builtin `timestamp_diff` method

2024-05-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48287.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46595
[https://github.com/apache/spark/pull/46595]

> Apply the builtin `timestamp_diff` method
> -
>
> Key: SPARK-48287
> URL: https://issues.apache.org/jira/browse/SPARK-48287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48219) StreamReader Charset fix with UTF8

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48219:


Assignee: xy

> StreamReader Charset fix with UTF8
> --
>
> Key: SPARK-48219
> URL: https://issues.apache.org/jira/browse/SPARK-48219
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48219) StreamReader Charset fix with UTF8

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48219.
--
Resolution: Fixed

Issue resolved by pull request 46509
[https://github.com/apache/spark/pull/46509]

> StreamReader Charset fix with UTF8
> --
>
> Key: SPARK-48219
> URL: https://issues.apache.org/jira/browse/SPARK-48219
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.3
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1

2024-05-15 Thread Yang Jie (Jira)
Yang Jie created SPARK-48299:


 Summary: Upgrade scala-maven-plugin to 4.9.1
 Key: SPARK-48299
 URL: https://issues.apache.org/jira/browse/SPARK-48299
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-48298) Add TCP mode to StatsdSink

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48298:
---
Labels: pull-request-available  (was: )

> Add TCP mode to StatsdSink
> --
>
> Key: SPARK-48298
> URL: https://issues.apache.org/jira/browse/SPARK-48298
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Yang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, the StatsdSink in Spark supports UDP mode only, which is the
> default mode of StatsD. However, in real production environments, a more
> reliable transmission of metrics is often needed to avoid metrics loss in
> high-traffic systems.
>
> TCP mode is already supported by StatsD
> ([https://github.com/statsd/statsd/blob/master/docs/server.md]), by
> Prometheus' statsd_exporter ([https://github.com/prometheus/statsd_exporter]),
> and by many other StatsD-based metrics proxies/receivers.






[jira] [Updated] (SPARK-48298) Add TCP mode to StatsdSink

2024-05-15 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated SPARK-48298:
--
Summary: Add TCP mode to StatsdSink  (was: StatsdSink supports TCP mode)

> Add TCP mode to StatsdSink
> --
>
> Key: SPARK-48298
> URL: https://issues.apache.org/jira/browse/SPARK-48298
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Yang
>Priority: Major
>
> Currently, the StatsdSink in Spark supports UDP mode only, which is the
> default mode of StatsD. However, in real production environments, a more
> reliable transmission of metrics is often needed to avoid metrics loss in
> high-traffic systems.
>
> TCP mode is already supported by StatsD
> ([https://github.com/statsd/statsd/blob/master/docs/server.md]), by
> Prometheus' statsd_exporter ([https://github.com/prometheus/statsd_exporter]),
> and by many other StatsD-based metrics proxies/receivers.






[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48295.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46602
[https://github.com/apache/spark/pull/46602]

> Turn on compute.ops_on_diff_frames by default
> -
>
> Key: SPARK-48295
> URL: https://issues.apache.org/jira/browse/SPARK-48295
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48295:


Assignee: Ruifeng Zheng

> Turn on compute.ops_on_diff_frames by default
> -
>
> Key: SPARK-48295
> URL: https://issues.apache.org/jira/browse/SPARK-48295
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-48298) StatsdSink supports TCP mode

2024-05-15 Thread Eric Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated SPARK-48298:
--
Description: 
Currently, the StatsdSink in Spark supports UDP mode only, which is the default
mode of StatsD. However, in real production environments, a more reliable
transmission of metrics is often needed to avoid metrics loss in high-traffic
systems.

TCP mode is already supported by StatsD
([https://github.com/statsd/statsd/blob/master/docs/server.md]), by Prometheus'
statsd_exporter ([https://github.com/prometheus/statsd_exporter]), and by many
other StatsD-based metrics proxies/receivers.

  was:
Currently, the StatsdSink in Spark supports UDP mode only, which is the default 
mode of StatsD. However, in real production environments, we often find that a 
more reliable transmission of metrics is needed to avoid metrics lose in 
high-traffic systems.

 

TCP mode is already supported by Statsd: 
[https://github.com/statsd/statsd/blob/master/docs/server.md]

Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] 

and also many other Statsd-based metrics proxy/receiver.


> StatsdSink supports TCP mode
> 
>
> Key: SPARK-48298
> URL: https://issues.apache.org/jira/browse/SPARK-48298
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Yang
>Priority: Major
>
> Currently, the StatsdSink in Spark supports UDP mode only, which is the
> default mode of StatsD. However, in real production environments, a more
> reliable transmission of metrics is often needed to avoid metrics loss in
> high-traffic systems.
>
> TCP mode is already supported by StatsD
> ([https://github.com/statsd/statsd/blob/master/docs/server.md]), by
> Prometheus' statsd_exporter ([https://github.com/prometheus/statsd_exporter]),
> and by many other StatsD-based metrics proxies/receivers.






[jira] [Created] (SPARK-48298) StatsdSink supports TCP mode

2024-05-15 Thread Eric Yang (Jira)
Eric Yang created SPARK-48298:
-

 Summary: StatsdSink supports TCP mode
 Key: SPARK-48298
 URL: https://issues.apache.org/jira/browse/SPARK-48298
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Eric Yang


Currently, the StatsdSink in Spark supports UDP mode only, which is the default
mode of StatsD. However, in real production environments, a more reliable
transmission of metrics is often needed to avoid metrics loss in high-traffic
systems.

TCP mode is already supported by StatsD
([https://github.com/statsd/statsd/blob/master/docs/server.md]), by Prometheus'
statsd_exporter ([https://github.com/prometheus/statsd_exporter]), and by many
other StatsD-based metrics proxies/receivers.
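As context for how such a sink is wired up, below is a minimal Scala sketch of
configuring StatsdSink through SparkConf. The class/host/port/prefix keys are
StatsdSink's existing options; the `protocol` key is a purely hypothetical
illustration of what a TCP switch might look like, not an existing option.

{code:java}
// Minimal sketch: wiring StatsdSink through SparkConf instead of a
// metrics.properties file. The "protocol" key below is hypothetical -- it
// only illustrates what a TCP/UDP switch might look like for this ticket.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("statsd-sink-demo")
  .set("spark.metrics.conf.*.sink.statsd.class",
       "org.apache.spark.metrics.sink.StatsdSink")
  .set("spark.metrics.conf.*.sink.statsd.host", "127.0.0.1")
  .set("spark.metrics.conf.*.sink.statsd.port", "8125")
  .set("spark.metrics.conf.*.sink.statsd.prefix", "myapp")
  // Hypothetical knob: choose TCP instead of the current UDP-only behavior.
  .set("spark.metrics.conf.*.sink.statsd.protocol", "tcp")
{code}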






[jira] [Commented] (SPARK-48298) StatsdSink supports TCP mode

2024-05-15 Thread Eric Yang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846789#comment-17846789
 ] 

Eric Yang commented on SPARK-48298:
---

I'm preparing a PR for it.

> StatsdSink supports TCP mode
> 
>
> Key: SPARK-48298
> URL: https://issues.apache.org/jira/browse/SPARK-48298
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Yang
>Priority: Major
>
> Currently, the StatsdSink in Spark supports UDP mode only, which is the
> default mode of StatsD. However, in real production environments, a more
> reliable transmission of metrics is often needed to avoid metrics loss in
> high-traffic systems.
>
> TCP mode is already supported by StatsD
> ([https://github.com/statsd/statsd/blob/master/docs/server.md]), by
> Prometheus' statsd_exporter ([https://github.com/prometheus/statsd_exporter]),
> and by many other StatsD-based metrics proxies/receivers.






[jira] [Created] (SPARK-48297) Char/Varchar breaks in TRANSFORM clause

2024-05-15 Thread Kent Yao (Jira)
Kent Yao created SPARK-48297:


 Summary: Char/Varchar breaks in TRANSFORM clause
 Key: SPARK-48297
 URL: https://issues.apache.org/jira/browse/SPARK-48297
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Updated] (SPARK-48296) Codegen Support for `to_xml`

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48296:
---
Labels: pull-request-available  (was: )

> Codegen Support for `to_xml`
> 
>
> Key: SPARK-48296
> URL: https://issues.apache.org/jira/browse/SPARK-48296
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48296) Codegen Support for `to_xml`

2024-05-15 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48296:
---

 Summary: Codegen Support for `to_xml`
 Key: SPARK-48296
 URL: https://issues.apache.org/jira/browse/SPARK-48296
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Resolved] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48289.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46598
[https://github.com/apache/spark/pull/46598]

> Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
> --
>
> Key: SPARK-48289
> URL: https://issues.apache.org/jira/browse/SPARK-48289
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48289:


Assignee: Luca Canali

> Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
> --
>
> Key: SPARK-48289
> URL: https://issues.apache.org/jira/browse/SPARK-48289
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48252.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46552
[https://github.com/apache/spark/pull/46552]

> Update CommonExpressionRef when necessary
> -
>
> Key: SPARK-48252
> URL: https://issues.apache.org/jira/browse/SPARK-48252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48252) Update CommonExpressionRef when necessary

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48252:
---

Assignee: Wenchen Fan

> Update CommonExpressionRef when necessary
> -
>
> Key: SPARK-48252
> URL: https://issues.apache.org/jira/browse/SPARK-48252
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField

2024-05-15 Thread Linhong Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linhong Liu resolved SPARK-47946.
-
Resolution: Not A Problem

> Nested field's nullable value could be invalid after extracted using 
> GetStructField
> ---
>
> Key: SPARK-47946
> URL: https://issues.apache.org/jira/browse/SPARK-47946
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.2
>Reporter: Junyoung Cho
>Priority: Major
>
> I got an error when appending to a table using DataFrameWriterV2.
> The error occurred in TableOutputResolver.checkNullability. This error
> occurs when the data types of the schemas are the same, but the order of
> the fields is different.
> I found that GetStructField.nullable returns an unexpected result.
> {code:java}
> override def nullable: Boolean = child.nullable ||
> childSchema(ordinal).nullable {code}
> Even if the nested field has no nullability attribute, it returns true when
> the parent struct has a nullability attribute.
> ||Parent nullability||Child nullability||Result||
> |true|true|true|
> |true|false|true|
> |false|true|true|
> |false|false|false|
> I think the logic should be changed to use just the child's nullability,
> because both the parent and the child should be nullable for the result to
> be considered nullable.
> {code:java}
> override def nullable: Boolean = childSchema(ordinal).nullable {code}
> I want to check whether the current logic is reasonable, or whether my
> suggestion could cause other side effects.






[jira] [Commented] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField

2024-05-15 Thread Linhong Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846773#comment-17846773
 ] 

Linhong Liu commented on SPARK-47946:
-

No, it's not an issue.

Think about this:

||key||value (nullable=true)||
|a|{"x": 1, "y": 2}|
|b|null|
|c|{"x": null, "y": 3}|

Let's assume `value.y` cannot be null (e.g. nullable = false) and run `select
value.y from tbl`. What's the result, and what's the nullability of this
column? Because the whole struct is null for key "b", it should be:

||y||
|2|
|null|
|3|
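To make this concrete, here is a minimal hand-written Scala sketch
(illustrative only, not from the ticket or any PR) that builds the table above
and shows why the extracted column must stay nullable:

{code:java}
// Illustrative sketch: a struct column that is itself nullable, with a
// non-nullable field `y` inside it.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[1]").appName("nullable-demo").getOrCreate()

val schema = StructType(Seq(
  StructField("key", StringType, nullable = false),
  StructField("value", StructType(Seq(
    StructField("x", IntegerType, nullable = true),
    StructField("y", IntegerType, nullable = false))), nullable = true)))

val rows = java.util.Arrays.asList(
  Row("a", Row(1, 2)),
  Row("b", null),            // the whole struct is null
  Row("c", Row(null, 3)))

val df = spark.createDataFrame(rows, schema)

// Extracting `value.y` yields null for key "b", so the extracted column is
// nullable even though field `y` is not -- exactly what
// `child.nullable || childSchema(ordinal).nullable` encodes.
val extracted = df.select(col("value").getField("y").as("y"))
extracted.printSchema()  // y: integer (nullable = true)
extracted.show()
{code}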

> Nested field's nullable value could be invalid after extracted using 
> GetStructField
> ---
>
> Key: SPARK-47946
> URL: https://issues.apache.org/jira/browse/SPARK-47946
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.4.2
>Reporter: Junyoung Cho
>Priority: Major
>
> I got an error when appending to a table using DataFrameWriterV2.
> The error occurred in TableOutputResolver.checkNullability. This error
> occurs when the data types of the schemas are the same, but the order of
> the fields is different.
> I found that GetStructField.nullable returns an unexpected result.
> {code:java}
> override def nullable: Boolean = child.nullable ||
> childSchema(ordinal).nullable {code}
> Even if the nested field has no nullability attribute, it returns true when
> the parent struct has a nullability attribute.
> ||Parent nullability||Child nullability||Result||
> |true|true|true|
> |true|false|true|
> |false|true|true|
> |false|false|false|
> I think the logic should be changed to use just the child's nullability,
> because both the parent and the child should be nullable for the result to
> be considered nullable.
> {code:java}
> override def nullable: Boolean = childSchema(ordinal).nullable {code}
> I want to check whether the current logic is reasonable, or whether my
> suggestion could cause other side effects.






[jira] [Updated] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48295:
---
Labels: pull-request-available  (was: )

> Turn on compute.ops_on_diff_frames by default
> -
>
> Key: SPARK-48295
> URL: https://issues.apache.org/jira/browse/SPARK-48295
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48295:
-

 Summary: Turn on compute.ops_on_diff_frames by default
 Key: SPARK-48295
 URL: https://issues.apache.org/jira/browse/SPARK-48295
 Project: Spark
  Issue Type: Improvement
  Components: PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-48291) Rename Java Logger as SparkLogger

2024-05-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-48291.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46600
[https://github.com/apache/spark/pull/46600]

> Rename Java Logger as SparkLogger 
> --
>
> Key: SPARK-48291
> URL: https://issues.apache.org/jira/browse/SPARK-48291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Two new classes, org.apache.spark.internal.Logger and
> org.apache.spark.internal.LoggerFactory, were introduced in
> [https://github.com/apache/spark/pull/46301].
> Given that {{Logger}} is a widely recognized interface in Log4j, having a
> class with the same name may lead to confusion. To avoid this and clarify
> its purpose within the Spark framework, I propose renaming
> {{org.apache.spark.internal.Logger}} to
> {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain
> consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to
> {{org.apache.spark.internal.SparkLoggerFactory}}.






[jira] [Comment Edited] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-15 Thread HiuFung Kwok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846764#comment-17846764
 ] 

HiuFung Kwok edited comment on SPARK-48238 at 5/15/24 10:06 PM:


[~dongjoon] I'm still new to the codebase; I will need to check exactly how we
currently provide backward support for Hadoop and Hive before commenting
further.

was (Author: hf):
[~dongjoon] I'm still new to the codebase, I will need to check in exactly how 
we currently provide backward support for Hadoop and Hive, before commenting 
further. 

 

 

> Spark fail to start due to class 
> o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
> ---
>
> Key: SPARK-48238
> URL: https://issues.apache.org/jira/browse/SPARK-48238
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Blocker
>
> I tested the latest master branch; it failed to start in YARN mode
> {code:java}
> dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code}
>  
> {code:java}
> $ bin/spark-sql --master yarn
> WARNING: Using incubator modules: jdk.incubator.vector
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
> spark.yarn.archive} is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
> org.sparkproject.jetty.util.MultiException: Multiple exceptions
>     at 
> org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
> ~[scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
>  ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 

[jira] [Commented] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-15 Thread HiuFung Kwok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846764#comment-17846764
 ] 

HiuFung Kwok commented on SPARK-48238:
--

[~dongjoon] I'm still new to the codebase; I will need to check exactly how we
currently provide backward support for Hadoop and Hive before commenting
further.

> Spark fail to start due to class 
> o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
> ---
>
> Key: SPARK-48238
> URL: https://issues.apache.org/jira/browse/SPARK-48238
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Blocker
>
> I tested the latest master branch; it failed to start in YARN mode
> {code:java}
> dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code}
>  
> {code:java}
> $ bin/spark-sql --master yarn
> WARNING: Using incubator modules: jdk.incubator.vector
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
> spark.yarn.archive} is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
> org.sparkproject.jetty.util.MultiException: Multiple exceptions
>     at 
> org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
> ~[scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
>  ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112)
>  [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> 

[jira] [Created] (SPARK-48294) Make nestedTypeMissingElementTypeError case insensitive

2024-05-15 Thread Michael Zhang (Jira)
Michael Zhang created SPARK-48294:
-

 Summary: Make nestedTypeMissingElementTypeError case insensitive
 Key: SPARK-48294
 URL: https://issues.apache.org/jira/browse/SPARK-48294
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.1, 3.5.0, 4.0.0, 3.5.2
Reporter: Michael Zhang
 Fix For: 4.0.0


When incorrectly declaring a complex data type using nested types (ARRAY, MAP
and STRUCT), the query fails with a MatchError rather than the
`INCOMPLETE_TYPE_DEFINITION` error. This is because the match is case sensitive.
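As a rough illustration of the kind of fix implied (a hypothetical sketch; the
helper names below are stand-ins, not Spark APIs), the match can be made case
insensitive by normalizing the type name first:

{code:java}
// Hypothetical sketch of a case-insensitive match on the nested type name.
// `checkNestedType` and the error helper are illustrative stand-ins.
import java.util.Locale

def incompleteTypeError(typeName: String): Nothing =
  throw new IllegalArgumentException(
    s"INCOMPLETE_TYPE_DEFINITION: $typeName is missing an element type")

def checkNestedType(typeName: String): Unit =
  typeName.toUpperCase(Locale.ROOT) match {
    // Matches ARRAY, array, Array, ... instead of only the exact-case form.
    case "ARRAY" | "MAP" | "STRUCT" => incompleteTypeError(typeName)
    case _ => () // complete type definitions pass through
  }
{code}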






[jira] [Commented] (SPARK-48294) Make nestedTypeMissingElementTypeError case insensitive

2024-05-15 Thread Michael Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846739#comment-17846739
 ] 

Michael Zhang commented on SPARK-48294:
---

I am working on this issue.

> Make nestedTypeMissingElementTypeError case insensitive
> ---
>
> Key: SPARK-48294
> URL: https://issues.apache.org/jira/browse/SPARK-48294
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2
>Reporter: Michael Zhang
>Priority: Major
> Fix For: 4.0.0
>
>
> When incorrectly declaring a complex data type using nested types (ARRAY,
> MAP and STRUCT), the query fails with a MatchError rather than the
> `INCOMPLETE_TYPE_DEFINITION` error. This is because the match is case
> sensitive.






[jira] [Updated] (SPARK-48291) Rename Java Logger as SparkLogger

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48291:
---
Labels: pull-request-available  (was: )

> Rename Java Logger as SparkLogger 
> --
>
> Key: SPARK-48291
> URL: https://issues.apache.org/jira/browse/SPARK-48291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>
> Two new classes, org.apache.spark.internal.Logger and
> org.apache.spark.internal.LoggerFactory, were introduced in
> [https://github.com/apache/spark/pull/46301].
> Given that {{Logger}} is a widely recognized interface in Log4j, having a
> class with the same name may lead to confusion. To avoid this and clarify
> its purpose within the Spark framework, I propose renaming
> {{org.apache.spark.internal.Logger}} to
> {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain
> consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to
> {{org.apache.spark.internal.SparkLoggerFactory}}.






[jira] [Created] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator

2024-05-15 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-48292:
---

 Summary: Improve stage failure reason message in 
OutputCommitCoordinator 
 Key: SPARK-48292
 URL: https://issues.apache.org/jira/browse/SPARK-48292
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.3
Reporter: L. C. Hsieh


When a task attempt fails but is authorized to commit its task,
OutputCommitCoordinator fails the stage with a reason message saying that the
task commit succeeded, but the driver actually never knows whether a task
commit succeeded or not. We should update the reason message to make it less
confusing.

See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Updated] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator

2024-05-15 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-48292:

Affects Version/s: (was: 3.5.1)
   (was: 3.4.3)

> Improve stage failure reason message in OutputCommitCoordinator 
> 
>
> Key: SPARK-48292
> URL: https://issues.apache.org/jira/browse/SPARK-48292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: L. C. Hsieh
>Priority: Minor
>
> When a task attempt fails but is authorized to commit its task,
> OutputCommitCoordinator fails the stage with a reason message saying that
> the task commit succeeded, but the driver actually never knows whether a
> task commit succeeded or not. We should update the reason message to make it
> less confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Updated] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator

2024-05-15 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-48292:

Affects Version/s: 3.5.1
   4.0.0

> Improve stage failure reason message in OutputCommitCoordinator 
> 
>
> Key: SPARK-48292
> URL: https://issues.apache.org/jira/browse/SPARK-48292
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1, 3.4.3
>Reporter: L. C. Hsieh
>Priority: Minor
>
> When a task attempt fails but is authorized to commit its task,
> OutputCommitCoordinator fails the stage with a reason message saying that
> the task commit succeeded, but the driver actually never knows whether a
> task commit succeeded or not. We should update the reason message to make it
> less confusing.
> See https://github.com/apache/spark/pull/36564#discussion_r1598660630






[jira] [Updated] (SPARK-48291) Rename Java Logger as SparkLogger

2024-05-15 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-48291:
---
Summary: Rename Java Logger as SparkLogger   (was: Refactor Java Logger as 
SparkLogger )

> Rename Java Logger as SparkLogger 
> --
>
> Key: SPARK-48291
> URL: https://issues.apache.org/jira/browse/SPARK-48291
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Two new classes, org.apache.spark.internal.Logger and
> org.apache.spark.internal.LoggerFactory, were introduced in
> [https://github.com/apache/spark/pull/46301].
> Given that {{Logger}} is a widely recognized interface in Log4j, having a
> class with the same name may lead to confusion. To avoid this and clarify
> its purpose within the Spark framework, I propose renaming
> {{org.apache.spark.internal.Logger}} to
> {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain
> consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to
> {{org.apache.spark.internal.SparkLoggerFactory}}.






[jira] [Created] (SPARK-48291) Refactor Java Logger as SparkLogger

2024-05-15 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-48291:
--

 Summary: Refactor Java Logger as SparkLogger 
 Key: SPARK-48291
 URL: https://issues.apache.org/jira/browse/SPARK-48291
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Two new classes, org.apache.spark.internal.Logger and
org.apache.spark.internal.LoggerFactory, were introduced in
[https://github.com/apache/spark/pull/46301].

Given that {{Logger}} is a widely recognized interface in Log4j, having a class
with the same name may lead to confusion. To avoid this and clarify its purpose
within the Spark framework, I propose renaming
{{org.apache.spark.internal.Logger}} to
{{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain consistency,
{{org.apache.spark.internal.LoggerFactory}} should be renamed to
{{org.apache.spark.internal.SparkLoggerFactory}}.
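A before/after sketch of the rename at a call site (illustrative only; it
assumes the new factory keeps an SLF4J-style `getLogger`, and `MyComponent` is
a made-up class, not part of Spark):

{code:java}
// Before:
//   import org.apache.spark.internal.{Logger, LoggerFactory}
//   private val log: Logger = LoggerFactory.getLogger(classOf[MyComponent])
//
// After the proposed rename:
import org.apache.spark.internal.{SparkLogger, SparkLoggerFactory}

class MyComponent {  // hypothetical example class
  private val log: SparkLogger = SparkLoggerFactory.getLogger(classOf[MyComponent])

  def start(): Unit = log.info("MyComponent started")
}
{code}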






[jira] [Resolved] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48256.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46557
[https://github.com/apache/spark/pull/46557]

> Add a rule to check file headers for the java side, and fix inconsistent files
> --
>
> Key: SPARK-48256
> URL: https://issues.apache.org/jira/browse/SPARK-48256
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48256:
-

Assignee: BingKun Pan

> Add a rule to check file headers for the java side, and fix inconsistent files
> --
>
> Key: SPARK-48256
> URL: https://issues.apache.org/jira/browse/SPARK-48256
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48218) TransportClientFactory.createClient may NPE cause FetchFailedException

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-48218:
-

Assignee: dzcxzl

> TransportClientFactory.createClient may NPE cause FetchFailedException
> --
>
> Key: SPARK-48218
> URL: https://issues.apache.org/jira/browse/SPARK-48218
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
>
> {code:java}
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1180)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:913)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178)
>   at 
> org.apache.spark.network.shuffle.ExternalBlockStoreClient.lambda$fetchBlocks$0(ExternalBlockStoreClient.java:128)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:154)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:133)
>   at 
> org.apache.spark.network.shuffle.ExternalBlockStoreClient.fetchBlocks(ExternalBlockStoreClient.java:139)
> {code}
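
As a hedged illustration only (this is not Spark's actual TransportClientFactory code, and the real root cause may differ), the sketch below models how a pooled slot can be read as null between a check and its use, and how re-reading with a local fallback avoids returning null to the caller:
{code:java}
import java.util.concurrent.atomic.AtomicReferenceArray

// Hypothetical simplification of a client pool; C stands in for the real
// TransportClient, which this sketch deliberately does not depend on.
final class ClientPool[C <: AnyRef](size: Int) {
  private val slots = new AtomicReferenceArray[C](size)

  def getOrCreate(i: Int, create: () => C): C = {
    val cached = slots.get(i)
    if (cached != null) {
      cached // fast path: reuse the pooled client
    } else {
      val fresh = create()
      slots.compareAndSet(i, null.asInstanceOf[C], fresh)
      // Re-read: another thread may have invalidated the slot concurrently,
      // so fall back to the locally created client instead of returning null.
      val current = slots.get(i)
      if (current != null) current else fresh
    }
  }
}
{code}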



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48218) TransportClientFactory.createClient may NPE cause FetchFailedException

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48218.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46506
[https://github.com/apache/spark/pull/46506]

> TransportClientFactory.createClient may NPE cause FetchFailedException
> --
>
> Key: SPARK-48218
> URL: https://issues.apache.org/jira/browse/SPARK-48218
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 4.0.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1180)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:913)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178)
>   at 
> org.apache.spark.network.shuffle.ExternalBlockStoreClient.lambda$fetchBlocks$0(ExternalBlockStoreClient.java:128)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:154)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:133)
>   at 
> org.apache.spark.network.shuffle.ExternalBlockStoreClient.fetchBlocks(ExternalBlockStoreClient.java:139)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48049) Upgrade Scala to 2.13.14

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48049.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46288
[https://github.com/apache/spark/pull/46288]

> Upgrade Scala to 2.13.14
> 
>
> Key: SPARK-48049
> URL: https://issues.apache.org/jira/browse/SPARK-48049
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48285) Update docs for size function and sizeOfNull configuration

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48285.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46592
[https://github.com/apache/spark/pull/46592]

> Update docs for size function and sizeOfNull configuration
> --
>
> Key: SPARK-48285
> URL: https://issues.apache.org/jira/browse/SPARK-48285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-15 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846701#comment-17846701
 ] 

Dongjoon Hyun commented on SPARK-48238:
---

Hi, [~chengpan], [~HF], and [~cloud_fan]. Is it true that we need to revert 
SPARK-45522 and SPARK-47118 solely for YARN support?
Do you think there is an alternative, like what we did to support both Hadoop 2 
and Hadoop 3, or Hive 1 and Hive 2?
For example, can we isolate the Jetty issues to the YARN module and JettyUtil 
via configurations?

> Spark fail to start due to class 
> o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
> ---
>
> Key: SPARK-48238
> URL: https://issues.apache.org/jira/browse/SPARK-48238
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Blocker
>
> I tested the latest master branch; it failed to start in YARN mode
> {code:java}
> dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code}
>  
> {code:java}
> $ bin/spark-sql --master yarn
> WARNING: Using incubator modules: jdk.incubator.vector
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
> org.sparkproject.jetty.util.MultiException: Multiple exceptions
>     at 
> org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
> ~[scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.SparkContext.(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
>  ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
>     at 
> 

[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-48238:
--
Description: 
I tested the latest master branch; it failed to start in YARN mode
{code:java}
dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code}
 
{code:java}
$ bin/spark-sql --master yarn
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under 
SPARK_HOME.
2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
org.sparkproject.jetty.util.MultiException: Multiple exceptions
    at 
org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
~[scala-library-2.13.13.jar:?]
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
~[scala-library-2.13.13.jar:?]
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
~[scala-library-2.13.13.jar:?]
    at 
org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
 ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
    at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
    at org.apache.spark.SparkContext.(SparkContext.scala:690) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
 ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
    at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) 
[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) 
[spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:405)
 [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:162)
 [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
 [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) ~[?:?]
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
 ~[?:?]
    at 

[jira] [Updated] (SPARK-48290) AQE not working when joining dataframes with more than 2000 partitions

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

André F. updated SPARK-48290:
-
Description: 
We are joining 2 large dataframes with a considerable skew on the left side in 
one specific key (>2000 skew ratio).
{code:java}
left side num partitions: 10335
right side num partitions: 1241

left side num rows: 20181947343
right side num rows: 107462219 {code}
Since we have `spark.sql.adaptive.enabled` set, we expect AQE to act during 
the join, dealing with the skewed partition automatically.

During the join, we can see the following log indicating that the skew was not 
detected, since the statistics look suspiciously equal across min/median/max 
sizes:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 0, right 0
 OptimizeSkewedJoin: 
Optimizing skewed join.
Left side partitions size info:
median size: 780925482, max size: 780925482, min size: 780925482, avg size: 
780925482
Right side partitions size info:
median size: 3325797, max size: 3325797, min size: 3325797, avg size: 3325797
   {code}
Looking at this log line and the Spark configuration possibilities, our two 
main hypotheses to work around this behavior and correctly detect the skew were:
 # Increasing `minNumPartitionsToHighlyCompress` so that Spark doesn’t convert 
the statistics into a `HighlyCompressedMapStatus` and is therefore able to 
identify the skewed partition.
 # Allowing Spark to use a `HighlyCompressedMapStatus`, but changing other 
configurations such as `spark.shuffle.accurateBlockThreshold` and 
`spark.shuffle.accurateBlockSkewedFactor` so that even then the sizes of the 
skewed partitions/blocks are accurately registered and consequently used in the 
optimization.

We tried different values for `spark.shuffle.accurateBlockThreshold` (even 
absurd ones like 1MB) and nothing seems to work. The statistics indicate that 
the min/median/max are somehow the same, and thus the skew is not detected.

However, when forcibly reducing `spark.sql.shuffle.partitions` to less than 
2000 partitions, the statistics looked correct and the optimized skewed join 
acted as it should:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 1, right 0
OptimizeSkewedJoin: Left side partition 42 (263 GB) is skewed, split it into 
337 parts.
OptimizeSkewedJoin: 
Optimizing skewed join.
Left side partitions size info:
median size: 862803419, max size: 282616632301, min size: 842320875, avg size: 
1019367139
Right side partitions size info:
median size: 4320067, max size: 4376957, min size: 4248989, avg size: 4319766 
{code}
Should we assume that the statistics become corrupted when Spark uses 
`HighlyCompressedMapStatus`? Is there another configuration property we should 
try to work around this problem? (Assuming that fine-tuning all dataframes in 
skewed joins in our ETL to have less than 2000 partitions is not an option.)

 

  was:
We are joining 2 large dataframes with a considerable skew on the left side in 
one specific key (>2000 skew ratio). Since we have 
`{{{}spark.sql.adaptive.enabled{}}} ` we expect AQE to act during the join, 
dealing with the skewed partition automatically.

During their join, we can see the following log indicating that the skew was 
not detected since their statistics looks weirdly equal for min/median/max 
sizes:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 0, right 0
 OptimizeSkewedJoin: 
Optimizing skewed join.
Left side partitions size info:
median size: 780925482, max size: 780925482, min size: 780925482, avg size: 
780925482
Right side partitions size info:
median size: 3325797, max size: 3325797, min size: 3325797, avg size: 3325797
   {code}
Looking at this log line and the spark configuration possibilities, our two 
main hypotheses to work around this behavior and correctly detect the skew were:
 # Increasing the `minNumPartitionsToHighlyCompress` so that Spark doesn’t 
convert the statistics into a `CompressedMapStatus` and therefore is able to 
identify the skewed partition.
 # Allowing spark to use a `HighlyCompressedMapStatus`, but change other 
configurations such as `spark.shuffle.accurateBlockThreshold` and 
`spark.shuffle.accurateBlockSkewedFactor` so that even then the size of the 
skewed partitions/blocks is accurately registered and consequently used in the 
optimization.

We tried different values for `spark.shuffle.accurateBlockThreshold` (even 
absurd ones like 1MB) and nothing seem to work. The statistics indicates that 
the min/median and max are the same somehow and thus, the skew is not detected.

However, when forcibly reducing `spark.sql.shuffle.partitions` to less than 
2000 partitions, the statistics looked correct and the optimized skewed join 
acts as it should:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 1, right 0
OptimizeSkewedJoin: Left side partition 42 (263 GB) is skewed, split it into 
337 

[jira] [Created] (SPARK-48290) AQE not working when joining dataframes with more than 2000 partitions

2024-05-15 Thread Jira
André F. created SPARK-48290:


 Summary: AQE not working when joining dataframes with more than 
2000 partitions
 Key: SPARK-48290
 URL: https://issues.apache.org/jira/browse/SPARK-48290
 Project: Spark
  Issue Type: Question
  Components: Optimizer, SQL
Affects Versions: 3.5.1, 3.3.2
 Environment: spark-standalone

spark3.5.1

 
Reporter: André F.


We are joining 2 large dataframes with a considerable skew on the left side in 
one specific key (>2000 skew ratio). Since we have 
`spark.sql.adaptive.enabled` set, we expect AQE to act during the join, 
dealing with the skewed partition automatically.

During the join, we can see the following log indicating that the skew was not 
detected, since the statistics look suspiciously equal across min/median/max 
sizes:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 0, right 0
 OptimizeSkewedJoin: 
Optimizing skewed join.
Left side partitions size info:
median size: 780925482, max size: 780925482, min size: 780925482, avg size: 
780925482
Right side partitions size info:
median size: 3325797, max size: 3325797, min size: 3325797, avg size: 3325797
   {code}
Looking at this log line and the Spark configuration possibilities, our two 
main hypotheses to work around this behavior and correctly detect the skew were:
 # Increasing `minNumPartitionsToHighlyCompress` so that Spark doesn’t convert 
the statistics into a `HighlyCompressedMapStatus` and is therefore able to 
identify the skewed partition.
 # Allowing Spark to use a `HighlyCompressedMapStatus`, but changing other 
configurations such as `spark.shuffle.accurateBlockThreshold` and 
`spark.shuffle.accurateBlockSkewedFactor` so that even then the sizes of the 
skewed partitions/blocks are accurately registered and consequently used in the 
optimization.

We tried different values for `spark.shuffle.accurateBlockThreshold` (even 
absurd ones like 1MB) and nothing seems to work. The statistics indicate that 
the min/median/max are somehow the same, and thus the skew is not detected.

However, when forcibly reducing `spark.sql.shuffle.partitions` to less than 
2000 partitions, the statistics looked correct and the optimized skewed join 
acted as it should:
{code:java}
OptimizeSkewedJoin: number of skewed partitions: left 1, right 0
OptimizeSkewedJoin: Left side partition 42 (263 GB) is skewed, split it into 
337 parts.
OptimizeSkewedJoin: 
Optimizing skewed join.
Left side partitions size info:
median size: 862803419, max size: 282616632301, min size: 842320875, avg size: 
1019367139
Right side partitions size info:
median size: 4320067, max size: 4376957, min size: 4248989, avg size: 4319766 
{code}
Should we assume that the statistics become corrupted when Spark uses 
`HighlyCompressedMapStatus`? Is there another configuration property we should 
try to work around this problem? (Assuming that fine-tuning all dataframes in 
skewed joins in our ETL to have less than 2000 partitions is not an option.)
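
A minimal workaround sketch, assuming the 2000-partition cutoff comes from the 
internal `spark.shuffle.minNumPartitionsToHighlyCompress` setting (default 
2000); the values are illustrative, not tuning advice:
{code:java}
import org.apache.spark.sql.SparkSession

// Keep the shuffle below the threshold at which map statistics switch to
// HighlyCompressedMapStatus, so per-partition sizes stay exact and
// OptimizeSkewedJoin can see the skew.
val spark = SparkSession.builder()
  .appName("skew-join-workaround")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.skewJoin.enabled", "true")
  .config("spark.sql.shuffle.partitions", "1999") // below the 2000 cutoff
  .getOrCreate()
{code}
Raising `spark.shuffle.minNumPartitionsToHighlyCompress` above the shuffle's 
partition count should have the same effect; both only sidestep the compressed 
statistics rather than fix them.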

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48172.
-
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46588
[https://github.com/apache/spark/pull/46588]

> Fix escaping issues in JDBCDialects
> ---
>
> Key: SPARK-48172
> URL: https://issues.apache.org/jira/browse/SPARK-48172
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-32472) Expose confusion matrix elements by threshold in BinaryClassificationMetrics

2024-05-15 Thread Gideon P (Jira)


[ https://issues.apache.org/jira/browse/SPARK-32472 ]


Gideon P deleted comment on SPARK-32472:
--

was (Author: JIRAUSER304403):
[~kmoore] can I raise a PR for this issue? 

> Expose confusion matrix elements by threshold in BinaryClassificationMetrics
> 
>
> Key: SPARK-32472
> URL: https://issues.apache.org/jira/browse/SPARK-32472
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 3.0.0
>Reporter: Kevin Moore
>Priority: Minor
>
> Currently, the only thresholded metrics available from 
> BinaryClassificationMetrics are precision, recall, f-measure, and (indirectly 
> through roc()) the false positive rate.
> Unfortunately, you can't always compute the individual thresholded confusion 
> matrix elements (TP, FP, TN, FN) from these quantities. You can make a system 
> of equations out of the existing thresholded metrics and the total count, but 
> they become underdetermined when there are no true positives.
> Fortunately, the individual confusion matrix elements by threshold are 
> already computed and sitting in the confusions variable. It would be helpful 
> to expose these elements directly. The easiest way would probably be by 
> adding methods like 
> {code:java}
> def truePositivesByThreshold(): RDD[(Double, Double)] = confusions.map{ case 
> (t, c) => (t, c.weightedTruePositives) }{code}
> An alternative could be to expose the entire RDD[(Double, 
> BinaryConfusionMatrix)] in one method, but BinaryConfusionMatrix is also 
> currently package private.
> The closest issue to this I found was this one for adding new calculations to 
> BinaryClassificationMetrics 
> https://issues.apache.org/jira/browse/SPARK-18844, which was closed without 
> any changes being merged.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32472) Expose confusion matrix elements by threshold in BinaryClassificationMetrics

2024-05-15 Thread Gideon P (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846644#comment-17846644
 ] 

Gideon P commented on SPARK-32472:
--

[~kmoore] can I raise a PR for this issue? 

> Expose confusion matrix elements by threshold in BinaryClassificationMetrics
> 
>
> Key: SPARK-32472
> URL: https://issues.apache.org/jira/browse/SPARK-32472
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 3.0.0
>Reporter: Kevin Moore
>Priority: Minor
>
> Currently, the only thresholded metrics available from 
> BinaryClassificationMetrics are precision, recall, f-measure, and (indirectly 
> through roc()) the false positive rate.
> Unfortunately, you can't always compute the individual thresholded confusion 
> matrix elements (TP, FP, TN, FN) from these quantities. You can make a system 
> of equations out of the existing thresholded metrics and the total count, but 
> they become underdetermined when there are no true positives.
> Fortunately, the individual confusion matrix elements by threshold are 
> already computed and sitting in the confusions variable. It would be helpful 
> to expose these elements directly. The easiest way would probably be by 
> adding methods like 
> {code:java}
> def truePositivesByThreshold(): RDD[(Double, Double)] = confusions.map{ case 
> (t, c) => (t, c.weightedTruePositives) }{code}
> An alternative could be to expose the entire RDD[(Double, 
> BinaryConfusionMatrix)] in one method, but BinaryConfusionMatrix is also 
> currently package private.
> The closest issue to this I found was this one for adding new calculations to 
> BinaryClassificationMetrics 
> https://issues.apache.org/jira/browse/SPARK-18844, which was closed without 
> any changes being merged.
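
Until such accessors exist, a self-contained sketch that recomputes all four 
elements per threshold from raw (score, label) pairs; this is illustrative 
only, and far less efficient than reading the precomputed confusions variable:
{code:java}
import org.apache.spark.rdd.RDD

// Count (TP, FP, TN, FN) for each threshold; one pass over the RDD per
// threshold, which is fine for a handful of thresholds.
def confusionByThreshold(
    scoreAndLabels: RDD[(Double, Double)],
    thresholds: Seq[Double]): Map[Double, (Long, Long, Long, Long)] = {
  thresholds.map { t =>
    val counts = scoreAndLabels.map { case (score, label) =>
      val predictedPositive = score >= t
      val actualPositive = label > 0.5
      (predictedPositive, actualPositive) match {
        case (true, true)   => (1L, 0L, 0L, 0L) // true positive
        case (true, false)  => (0L, 1L, 0L, 0L) // false positive
        case (false, false) => (0L, 0L, 1L, 0L) // true negative
        case (false, true)  => (0L, 0L, 0L, 1L) // false negative
      }
    }.reduce((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3, a._4 + b._4))
    t -> counts
  }.toMap
}
{code}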



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48289) Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-48289:

Summary: Clen up Oracle JDBC tests by skipping redundant SYSTEM password 
reset  (was: Improve Oracle JDBC tests by skipping redundant SYSTEM password 
reset)

> Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset
> -
>
> Key: SPARK-48289
> URL: https://issues.apache.org/jira/browse/SPARK-48289
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-48289:

Summary: Clean up Oracle JDBC tests by skipping redundant SYSTEM password 
reset  (was: Clen up Oracle JDBC tests by skipping redundant SYSTEM password 
reset)

> Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
> --
>
> Key: SPARK-48289
> URL: https://issues.apache.org/jira/browse/SPARK-48289
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48289) Improve Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48289:
---
Labels: pull-request-available  (was: )

> Improve Oracle JDBC tests by skipping redundant SYSTEM password reset
> -
>
> Key: SPARK-48289
> URL: https://issues.apache.org/jira/browse/SPARK-48289
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45769) data retrieval fails on executors with spark connect

2024-05-15 Thread Sven Teresniak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846640#comment-17846640
 ] 

Sven Teresniak commented on SPARK-45769:


I can confirm this bug when using Spark Connect to access a standalone Spark 
(v3.5.1) cluster in a k8s environment. In my setup we are *not* using k8s 
operators (static "standalone" cluster). The bug *only* occurs when the Spark 
Connect Server is using a Spark master with adjacent workers. I cannot trigger 
the bug when the driver (read: Spark Connect Server) is doing all the work 
(like a `local[0]` setup).

> data retrieval fails on executors with spark connect
> 
>
> Key: SPARK-45769
> URL: https://issues.apache.org/jira/browse/SPARK-45769
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Steven Ottens
>Priority: Major
>
> We have an OpenShift cluster with Spark and JupyterHub and we use 
> Spark-Connect to access Spark from within Jupyter. This worked fine with 
> Spark 3.4.1. However after upgrading to Spark 3.5.0 we were not able to 
> access any data in our Delta Tables through Spark. Initially I assumed it was 
> a bug in Delta: [https://github.com/delta-io/delta/issues/2235]
> The actual error is
> {code:java}
> SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due 
> to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: 
> Lost task 0.3 in stage 6.0 (TID 13) (172.31.15.72 executor 4): 
> java.lang.ClassCastException: cannot assign instance of 
> java.lang.invoke.SerializedLambda to field 
> org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance 
> of org.apache.spark.rdd.MapPartitionsRDD{code}
> However, after further investigation I discovered that this is a regression 
> in Spark 3.5.0. The issue is similar to SPARK-36917; however, I am not using 
> any custom functions or any classes other than spark-connect, and this setup 
> used to work in 3.4.1. The issue only occurs when remote executors are used 
> in a kubernetes environment. Running a plain Spark-Connect server, e.g.
> {code:java}
> ./sbin/start-connect-server.sh --packages 
> org.apache.spark:spark-connect_2.12:3.5.0{code}
> doesn't produce the error.
> The issue occurs both in a full OpenShift cluster and in a tiny minikube 
> setup. The steps to reproduce are based on the minikube setup.
> You need to have a minimal Spark 3.5.0 setup with 1 driver and at least 1 
> executor and use python to access data through Spark. The query I used to 
> test this is
> {code:java}
> from pyspark.sql import SparkSession
> logFile = '/opt/spark/work-dir/data.csv'
> spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate()
> df = spark.read.csv(logFile)
> df.count()
> {code}
> However it doesn't matter if the data is local, or remote on a S3 storage, 
> nor if the data is plain text, CSV or Delta Table.
> h3. Steps to reproduce:
>  # Install minikube
>  # Create a service account 'spark'
> {code:java}
> kubectl create sa spark{code}
>  # Bind the 'edit' role to the service account
> {code:java}
> kubectl create rolebinding spark-edit \
>  --clusterrole=edit \
>  --serviceaccount=default:spark \
>  --namespace=default{code}
>  # Create a service for spark
> {code:java}
> kubectl create -f service.yml{code}
>  # Create a Spark-Connect deployment with the default Spark docker image: 
> [https://hub.docker.com/_/spark] (do change the deployment.yml to point to 
> the kubernetes API endpoint)
> {code:java}
> kubectl create -f deployment.yml{code}
>  # Add data to both the executor and the driver pods, e.g. login on the 
> terminal of the pods and run on both pods
> {code:java}
> touch data.csv
> echo id,name > data.csv
> echo 1,2 >> data.csv {code}
>  # Start a spark-remote session to access the newly created data. I logged in 
> on the driver pod and installed the necessary python packages:
> {code:java}
> python3 -m pip install pandas pyspark grpcio-tools grpcio-status pyarrow{code}
> Started a python shell and executed:
> {code:java}
> from pyspark.sql import SparkSession
> logFile = '/opt/spark/work-dir/data.csv'
> spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate()
> df = spark.read.csv(logFile)
> df.count() {code}
> h3. Necessary files:
> Service.yml:
> {code:java}
> apiVersion: v1
> kind: Service
> metadata:
>   labels:
> app: spark-connect
>   name: spark-connect
>   namespace: default
> spec:
>   ipFamilies:
> - IPv4
>   ports:
> - name: connect-grpc
>   protocol: TCP
>   port: 15002 # Port the service listens on.
>   targetPort: 15002 # Port on the backing pods to which the service 
> forwards connections
> - name: sparkui
>   protocol: TCP
>   port: 4040 # Port the service listens on.
> 

[jira] [Created] (SPARK-48289) Improve Oracle JDBC tests by skipping redundant SYSTEM password reset

2024-05-15 Thread Luca Canali (Jira)
Luca Canali created SPARK-48289:
---

 Summary: Improve Oracle JDBC tests by skipping redundant SYSTEM 
password reset
 Key: SPARK-48289
 URL: https://issues.apache.org/jira/browse/SPARK-48289
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 4.0.0
Reporter: Luca Canali






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48288) Add source data type to connector.Cast expression

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48288:
---
Labels: pull-request-available  (was: )

> Add source data type to connector.Cast expression
> -
>
> Key: SPARK-48288
> URL: https://issues.apache.org/jira/browse/SPARK-48288
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Priority: Major
>  Labels: pull-request-available
>
> Currently, V2ExpressionBuilder builds a connector.Cast expression from a 
> catalyst.Cast expression.
> The catalyst Cast carries its expression's data type, but the connector Cast 
> does not.
> Since some casts are not allowed on an external engine, we need to know both 
> the source and target data types, so that we have finer granularity to block 
> unsupported casts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48288) Add source data type to connector.Cast expression

2024-05-15 Thread Uros Stankovic (Jira)
Uros Stankovic created SPARK-48288:
--

 Summary: Add source data type to connector.Cast expression
 Key: SPARK-48288
 URL: https://issues.apache.org/jira/browse/SPARK-48288
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Uros Stankovic


Currently, V2ExpressionBuilder builds a connector.Cast expression from a 
catalyst.Cast expression.
The catalyst Cast carries its expression's data type, but the connector Cast 
does not.
Since some casts are not allowed on an external engine, we need to know both 
the source and target data types, so that we have finer granularity to block 
unsupported casts.
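
A hedged sketch of the proposed shape; the type and field names are 
illustrative, not the final API:
{code:java}
import org.apache.spark.sql.types.DataType

// Carry the source type alongside the target type so a dialect can veto
// casts it cannot push down; E stands in for the wrapped connector expression.
final case class CastWithSourceType[E](
    expr: E,
    sourceType: DataType, // proposed addition
    targetType: DataType)

// An external engine could consult an allow-list keyed on (source, target).
def isPushableCast(
    allowed: Set[(DataType, DataType)],
    c: CastWithSourceType[_]): Boolean =
  allowed.contains((c.sourceType, c.targetType))
{code}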



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48287) Apply the builtin `timestamp_diff` method

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48287:
---
Labels: pull-request-available  (was: )

> Apply the builtin `timestamp_diff` method
> -
>
> Key: SPARK-48287
> URL: https://issues.apache.org/jira/browse/SPARK-48287
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48287) Apply the builtin `timestamp_diff` method

2024-05-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-48287:
-

 Summary: Apply the builtin `timestamp_diff` method
 Key: SPARK-48287
 URL: https://issues.apache.org/jira/browse/SPARK-48287
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48286:
---
Labels: pull-request-available  (was: )

> Analyze 'exists' default expression instead of 'current' default expression 
> in structField to v2 column conversion
> --
>
> Key: SPARK-48286
> URL: https://issues.apache.org/jira/browse/SPARK-48286
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uros Stankovic
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method 
> accepts 3 parameters:
> 1) Field to analyze
> 2) Statement type - String
> 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT
> The method 
> org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column 
> passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so 
> EXISTS_DEFAULT is treated as the statement type rather than the metadata 
> key, and the wrong expression is analyzed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion

2024-05-15 Thread Uros Stankovic (Jira)
Uros Stankovic created SPARK-48286:
--

 Summary: Analyze 'exists' default expression instead of 'current' 
default expression in structField to v2 column conversion
 Key: SPARK-48286
 URL: https://issues.apache.org/jira/browse/SPARK-48286
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Uros Stankovic
 Fix For: 4.0.0


The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method 
accepts 3 parameters:

1) Field to analyze
2) Statement type - String
3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT

The method 
org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column 
passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so 
EXISTS_DEFAULT is treated as the statement type rather than the metadata key, 
and the wrong expression is analyzed.
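
A simplified model of the call-shape bug, with hypothetical names standing in 
for the real signatures:
{code:java}
object DefaultsModel {
  val CurrentDefault = "CURRENT_DEFAULT"
  val ExistsDefault  = "EXISTS_DEFAULT"

  // Shape of ResolveDefaultColumns#analyze, reduced to strings:
  // analyze(field, statementType, metadataKey = CurrentDefault)
  def analyze(field: String, statementType: String,
      metadataKey: String = CurrentDefault): String =
    s"analyze $metadataKey of $field for statement '$statementType'"

  def main(args: Array[String]): Unit = {
    // Buggy call shape: ExistsDefault fills the statementType slot, so the
    // metadata key silently stays CurrentDefault.
    println(analyze("col", ExistsDefault))
    // Intended call shape:
    println(analyze("col", "struct field", metadataKey = ExistsDefault))
  }
}
{code}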



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48271.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46586
[https://github.com/apache/spark/pull/46586]

> Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
> -
>
> Key: SPARK-48271
> URL: https://issues.apache.org/jira/browse/SPARK-48271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48271:


Assignee: Wenchen Fan

> Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
> -
>
> Key: SPARK-48271
> URL: https://issues.apache.org/jira/browse/SPARK-48271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48285) Update docs for size function and sizeOfNull configuration

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48285:
---
Labels: pull-request-available  (was: )

> Update docs for size function and sizeOfNull configuration
> --
>
> Key: SPARK-48285
> URL: https://issues.apache.org/jira/browse/SPARK-48285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48285) Update docs for size function and sizeOfNull configuration

2024-05-15 Thread Kent Yao (Jira)
Kent Yao created SPARK-48285:


 Summary: Update docs for size function and sizeOfNull configuration
 Key: SPARK-48285
 URL: https://issues.apache.org/jira/browse/SPARK-48285
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48279) Upgrade ORC to 2.0.1

2024-05-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-48279.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46587
[https://github.com/apache/spark/pull/46587]

> Upgrade ORC to 2.0.1
> 
>
> Key: SPARK-48279
> URL: https://issues.apache.org/jira/browse/SPARK-48279
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: William Hyun
>Assignee: William Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48278) Refine the string representation of `Cast`

2024-05-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48278.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46585
[https://github.com/apache/spark/pull/46585]

> Refine the string representation of `Cast`
> --
>
> Key: SPARK-48278
> URL: https://issues.apache.org/jira/browse/SPARK-48278
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48272) Add function `timestamp_diff`

2024-05-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-48272:
-

Assignee: Ruifeng Zheng

> Add function `timestamp_diff`
> -
>
> Key: SPARK-48272
> URL: https://issues.apache.org/jira/browse/SPARK-48272
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48272) Add function `timestamp_diff`

2024-05-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-48272.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46576
[https://github.com/apache/spark/pull/46576]

> Add function `timestamp_diff`
> -
>
> Key: SPARK-48272
> URL: https://issues.apache.org/jira/browse/SPARK-48272
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2024-05-15 Thread Giambattista Bloisi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846525#comment-17846525
 ] 

Giambattista Bloisi commented on SPARK-24914:
-

Also, SPARK-40038 reports problems because the compressed file size is used in 
place of the uncompressed file size; but in the case of loading the files

> totalSize is not a good estimate for broadcast joins
> 
>
> Key: SPARK-24914
> URL: https://issues.apache.org/jira/browse/SPARK-24914
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Bruce Robbins
>Priority: Major
>
> When determining whether to do a broadcast join, Spark estimates the size of 
> the smaller table as follows:
>  - if totalSize is defined and greater than 0, use it.
>  - else, if rawDataSize is defined and greater than 0, use it
>  - else, use spark.sql.defaultSizeInBytes (default: Long.MaxValue)
> Therefore, Spark prefers totalSize over rawDataSize.
> Unfortunately, totalSize is often quite a bit smaller than the actual table 
> size, since it represents the size of the table's files on disk. Parquet and 
> Orc files, for example, are encoded and compressed. This can result in the 
> JVM throwing an OutOfMemoryError while Spark is loading the table into a 
> HashedRelation, or when Spark actually attempts to broadcast the data.
> On the other hand, rawDataSize represents the uncompressed size of the 
> dataset, according to Hive documentation. This seems like a pretty good 
> number to use in preference to totalSize. However, due to HIVE-20079, this 
> value is simply #columns * #rows. Once that bug is fixed, it may be a 
> superior statistic, at least for managed tables.
> In the meantime, we could apply a configurable "fudge factor" to totalSize, 
> at least for types of files that are encoded and compressed. Hive has the 
> setting hive.stats.deserialization.factor, which defaults to 1.0, and is 
> described as follows:
> {quote}in the absence of uncompressed/raw data size, total file size will be 
> used for statistics annotation. But the file may be compressed, encoded and 
> serialized which may be lesser in size than the actual uncompressed/raw data 
> size. This factor will be multiplied to file size to estimate the raw data 
> size.
> {quote}
> Also, I propose a configuration setting to allow the user to completely 
> ignore rawDataSize, since that value is broken (due to HIVE-20079). When that 
> configuration setting is set to true, Spark would instead estimate the table 
> as follows:
> - if totalSize is defined and greater than 0, use totalSize*fudgeFactor.
>  - else, use spark.sql.defaultSizeInBytes (default: Long.MaxValue)
> Caveat: This mitigates the issue only for Hive tables. It does not help much 
> when the user is reading files using {{spark.read.parquet}}, unless we apply 
> the same fudge factor there.
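
A minimal sketch of the proposed estimation order; ignoreRawDataSize and 
deserializationFactor are hypothetical knobs standing in for the proposed 
settings, not existing Spark configurations:
{code:java}
// Prefer totalSize scaled by the fudge factor, optionally fall back to
// rawDataSize unless the caller opts out because of HIVE-20079, then fall
// back to the configured default.
def estimateTableSize(
    totalSize: Option[Long],
    rawDataSize: Option[Long],
    ignoreRawDataSize: Boolean,     // opt-out for HIVE-20079's broken rawDataSize
    deserializationFactor: Double,  // 1.0 reproduces today's behavior
    defaultSizeInBytes: Long): Long = {
  totalSize.filter(_ > 0)
    .map(s => (s * deserializationFactor).toLong)
    .orElse(if (ignoreRawDataSize) None else rawDataSize.filter(_ > 0))
    .getOrElse(defaultSizeInBytes)
}
{code}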



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search

2024-05-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846527#comment-17846527
 ] 

Uroš Bojanić commented on SPARK-48284:
--

Working on this

> Fix UTF8String indexOf behaviour for empty string search
> 
>
> Key: SPARK-48284
> URL: https://issues.apache.org/jira/browse/SPARK-48284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Currently, UTF8String.indexOf returns 0 when given an empty search string, 
> regardless of the integer start value.
> Examples:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{"abc".indexOf("", 2);  // returns: 0}}
> {{"abc".indexOf("", 9);  // returns: 0}}
> {{"abc".indexOf("", -3); // returns: 0}}
> This is not correct, as "start" is not taken into consideration.
> Correct behaviour would be:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{"abc".indexOf("", 2);  // returns: 2}}
> {{"abc".indexOf("", 9);  // returns: -1}}
> {{"abc".indexOf("", -3); // returns: -1}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-48284:
-
Description: 
Currently, UTF8String.indexOf returns 0 when given an empty search string, 
regardless of the integer start value.

Examples:

{{"abc".indexOf("", 0);  // returns: 0}}

{{"abc".indexOf("", 2);  // returns: 0}}

{{"abc".indexOf("", 9);  // returns: 0}}

{{"abc".indexOf("", -3); // returns: 0}}

This is not correct, as "start" is not taken into consideration.

Correct behaviour would be:

{{"abc".indexOf("", 0);  // returns: 0}}

{{"abc".indexOf("", 2);  // returns: 2}}

{{"abc".indexOf("", 9);  // returns: -1}}

{{"abc".indexOf("", -3); // returns: -1}}

  was:
Currently, UTF8String.indexOf returns 0 when given an empty parameters string, 
and any integer start value.

Examples:

{{"abc".indexOf("", 0);  // returns: 0}}

{{{}"abc".indexOf("", 2);  // returns: 0{}}}{{{}{}}}

{{"abc".indexOf("", 9);  // returns: 0}}

{{{}"abc".indexOf("", -3);  // returns: 0{}}}{{{}{}}}{{{}{}}}


This is not correct, as "start" is not taken into consideration.

Correct behaviour would be:

{{"abc".indexOf("", 0);  // returns: 0}}

{{{}"abc".indexOf("", 2);  // returns: 2{}}}{{{}{}}}

{{"abc".indexOf("", 9);  // returns: -1}}

{{"abc".indexOf("", -3);  // returns: -1}}


> Fix UTF8String indexOf behaviour for empty string search
> 
>
> Key: SPARK-48284
> URL: https://issues.apache.org/jira/browse/SPARK-48284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Currently, UTF8String.indexOf returns 0 when given an empty search string, 
> regardless of the integer start value.
> Examples:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{"abc".indexOf("", 2);  // returns: 0}}
> {{"abc".indexOf("", 9);  // returns: 0}}
> {{"abc".indexOf("", -3); // returns: 0}}
> This is not correct, as "start" is not taken into consideration.
> Correct behaviour would be:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{"abc".indexOf("", 2);  // returns: 2}}
> {{"abc".indexOf("", 9);  // returns: -1}}
> {{"abc".indexOf("", -3); // returns: -1}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-48284:
-
Description: 
Currently, UTF8String.indexOf returns 0 when given an empty parameters string, 
and any integer start value.

Examples:

{{"abc".indexOf("", 0);  // returns: 0}}

{{{}"abc".indexOf("", 2);  // returns: 0{}}}{{{}{}}}

{{"abc".indexOf("", 9);  // returns: 0}}

{{{}"abc".indexOf("", -3);  // returns: 0{}}}{{{}{}}}{{{}{}}}


This is not correct, as "start" is not taken into consideration.

Correct behaviour would be:

{{"abc".indexOf("", 0);  // returns: 0}}

{{{}"abc".indexOf("", 2);  // returns: 2{}}}{{{}{}}}

{{"abc".indexOf("", 9);  // returns: -1}}

{{"abc".indexOf("", -3);  // returns: -1}}

  was: Calling UTF8String.indexOf with an empty search string and any 
integer start value always returns 0.


> Fix UTF8String indexOf behaviour for empty string search
> 
>
> Key: SPARK-48284
> URL: https://issues.apache.org/jira/browse/SPARK-48284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Currently, UTF8String.indexOf returns 0 when given an empty search 
> string, regardless of the integer start value.
> Examples:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{{}"abc".indexOf("", 2);  // returns: 0{}}}{{{}{}}}
> {{"abc".indexOf("", 9);  // returns: 0}}
> {{{}"abc".indexOf("", -3);  // returns: 0{}}}{{{}{}}}{{{}{}}}
> This is not correct, as "start" is not taken into consideration.
> Correct behaviour would be:
> {{"abc".indexOf("", 0);  // returns: 0}}
> {{{}"abc".indexOf("", 2);  // returns: 2{}}}{{{}{}}}
> {{"abc".indexOf("", 9);  // returns: -1}}
> {{"abc".indexOf("", -3);  // returns: -1}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-48284:
-
Description: Calling UTF8String.indexOf with an empty search string and any 
integer start value always returns 0.

> Fix UTF8String indexOf behaviour for empty string search
> 
>
> Key: SPARK-48284
> URL: https://issues.apache.org/jira/browse/SPARK-48284
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>
> Calling UTF8String.indexOf with an empty search string and any integer 
> start value always returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search

2024-05-15 Thread Jira
Uroš Bojanić created SPARK-48284:


 Summary: Fix UTF8String indexOf behaviour for empty string search
 Key: SPARK-48284
 URL: https://issues.apache.org/jira/browse/SPARK-48284
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48283) Implement modified Lowercase operation for UTF8_BINARY_LCASE

2024-05-15 Thread Jira
Uroš Bojanić created SPARK-48283:


 Summary: Implement modified Lowercase operation for 
UTF8_BINARY_LCASE
 Key: SPARK-48283
 URL: https://issues.apache.org/jira/browse/SPARK-48283
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48281) Alter logic for: instr, substring_index (UTF8_BINARY_LCASE)

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48281:
---
Labels: pull-request-available  (was: )

> Alter logic for: instr, substring_index (UTF8_BINARY_LCASE)
> ---
>
> Key: SPARK-48281
> URL: https://issues.apache.org/jira/browse/SPARK-48281
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
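
The ticket body is empty; as a hedged illustration of what case-insensitive 
search could mean for instr under UTF8_BINARY_LCASE (a sketch under 
assumptions, not Spark's actual CollationSupport code; the object and method 
names are hypothetical):

{code:scala}
object LcaseInstrSketch {
  import java.util.Locale

  // 1-based position of `sub` within `str`, 0 if absent (instr semantics),
  // matching on the lowercased forms of both operands.
  def instrLcase(str: String, sub: String): Int =
    str.toLowerCase(Locale.ROOT).indexOf(sub.toLowerCase(Locale.ROOT)) + 1
}
{code}

Caveat: indexing into the lowercased string assumes lowercasing preserves 
length, which does not hold for all of Unicode (e.g. "İ" lowercases to two 
code points), which is presumably part of what the altered logic must handle.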




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48282) Alter logic for: find_in_set, replace (UTF8_BINARY_LCASE)

2024-05-15 Thread Jira
Uroš Bojanić created SPARK-48282:


 Summary: Alter logic for: find_in_set, replace (UTF8_BINARY_LCASE)
 Key: SPARK-48282
 URL: https://issues.apache.org/jira/browse/SPARK-48282
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48221) Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-48221:
-
Summary: Alter logic for: startsWith, endsWith, contains, locate 
(UTF8_BINARY_LCASE)  (was: Alter logic for startsWith, endsWith, contains, 
locate (UTF8_BINARY_LCASE))

> Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
> ---
>
> Key: SPARK-48221
> URL: https://issues.apache.org/jira/browse/SPARK-48221
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
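
Since the ticket carries no description, here is a minimal sketch of 
case-insensitive variants of these predicates, assuming UTF8_BINARY_LCASE 
means comparing lowercased forms (hypothetical names, not Spark's actual 
implementation):

{code:scala}
object LcaseMatchSketch {
  import java.util.Locale
  private def lc(s: String): String = s.toLowerCase(Locale.ROOT)

  def startsWithLcase(l: String, r: String): Boolean = lc(l).startsWith(lc(r))
  def endsWithLcase(l: String, r: String): Boolean   = lc(l).endsWith(lc(r))
  def containsLcase(l: String, r: String): Boolean   = lc(l).contains(lc(r))
  // locate semantics: 1-based position of `r` in `l`, 0 when absent.
  def locateLcase(l: String, r: String): Int         = lc(l).indexOf(lc(r)) + 1
}
{code}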




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48281) Alter logic for: instr, substring_index (UTF8_BINARY_LCASE)

2024-05-15 Thread Jira
Uroš Bojanić created SPARK-48281:


 Summary: Alter logic for: instr, substring_index 
(UTF8_BINARY_LCASE)
 Key: SPARK-48281
 URL: https://issues.apache.org/jira/browse/SPARK-48281
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Uroš Bojanić






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48221) Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)

2024-05-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-48221:
-
Summary: Alter logic for startsWith, endsWith, contains, locate 
(UTF8_BINARY_LCASE)  (was: Alter string search logic for UTF8_BINARY_LCASE 
collation)

> Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
> --
>
> Key: SPARK-48221
> URL: https://issues.apache.org/jira/browse/SPARK-48221
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48269) DB2: Document Mapping Spark SQL Data Types from DB2 and add tests

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-48269:


Assignee: Kent Yao

> DB2: Document Mapping Spark SQL Data Types from DB2 and add tests
> -
>
> Key: SPARK-48269
> URL: https://issues.apache.org/jira/browse/SPARK-48269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48269) DB2: Document Mapping Spark SQL Data Types from DB2 and add tests

2024-05-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-48269.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46572
[https://github.com/apache/spark/pull/46572]

> DB2: Document Mapping Spark SQL Data Types from DB2 and add tests
> -
>
> Key: SPARK-48269
> URL: https://issues.apache.org/jira/browse/SPARK-48269
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
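
For context, a hedged usage sketch of the code path this ticket documents and 
tests: reading a DB2 table through Spark's JDBC source exercises the 
DB2-to-Spark-SQL type mapping. The connection details below are placeholders, 
and an active SparkSession `spark` is assumed:

{code:scala}
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2host:50000/SAMPLE") // placeholder host/db
  .option("dbtable", "MYSCHEMA.MYTABLE")            // placeholder table
  .option("user", "dbuser")                         // placeholder credentials
  .option("password", "secret")
  .load()

// The printed schema shows which Spark SQL type each DB2 column maps to.
df.printSchema()
{code}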




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48280) Add Expression Walker for Testing

2024-05-15 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-48280:
-

 Summary: Add Expression Walker for Testing
 Key: SPARK-48280
 URL: https://issues.apache.org/jira/browse/SPARK-48280
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48277) Improve error message for ErrorClassesJsonReader.getErrorMessage

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48277.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46584
[https://github.com/apache/spark/pull/46584]

> Improve error message for ErrorClassesJsonReader.getErrorMessage
> 
>
> Key: SPARK-48277
> URL: https://issues.apache.org/jira/browse/SPARK-48277
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48160) XPath expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48160:
---

Assignee: Uroš Bojanić

> XPath expressions (all collations)
> --
>
> Key: SPARK-48160
> URL: https://issues.apache.org/jira/browse/SPARK-48160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48160) XPath expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48160.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46508
[https://github.com/apache/spark/pull/46508]

> XPath expressions (all collations)
> --
>
> Key: SPARK-48160
> URL: https://issues.apache.org/jira/browse/SPARK-48160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
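
A hedged example of the expressions in question, assuming an active 
SparkSession `spark` (the XML literal and path are illustrative only):

{code:scala}
// xpath_string extracts the text of the first matching node; with collated
// string inputs, the extraction result itself should be collation-agnostic.
spark.sql("SELECT xpath_string('<a><b>hello</b></a>', 'a/b')").show()
// prints: hello
{code}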




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48160) XPath expressions (all collations)

2024-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48160:
---
Labels: pull-request-available  (was: )

> XPath expressions (all collations)
> --
>
> Key: SPARK-48160
> URL: https://issues.apache.org/jira/browse/SPARK-48160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48162) Miscellaneous expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-48162:
---

Assignee: Uroš Bojanić

> Miscellaneous expressions (all collations)
> --
>
> Key: SPARK-48162
> URL: https://issues.apache.org/jira/browse/SPARK-48162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48162) Miscellaneous expressions (all collations)

2024-05-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-48162.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46461
[https://github.com/apache/spark/pull/46461]

> Miscellaneous expressions (all collations)
> --
>
> Key: SPARK-48162
> URL: https://issues.apache.org/jira/browse/SPARK-48162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org