[jira] [Resolved] (SPARK-46001) Spark UI Test Improvements
[ https://issues.apache.org/jira/browse/SPARK-46001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-46001. -- Fix Version/s: 4.0.0 Assignee: Kent Yao Resolution: Fixed > Spark UI Test Improvements > -- > > Key: SPARK-46001 > URL: https://issues.apache.org/jira/browse/SPARK-46001 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Tests, UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 4.0.0 > > > Spark UI tests are not well supported; they are hard for developers to test > and for owners to maintain -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1
[ https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48299: Assignee: Yang Jie > Upgrade scala-maven-plugin to 4.9.1 > --- > > Key: SPARK-48299 > URL: https://issues.apache.org/jira/browse/SPARK-48299 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1
[ https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48299. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46593 [https://github.com/apache/spark/pull/46593] > Upgrade scala-maven-plugin to 4.9.1 > --- > > Key: SPARK-48299 > URL: https://issues.apache.org/jira/browse/SPARK-48299 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1
[ https://issues.apache.org/jira/browse/SPARK-48299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48299: --- Labels: pull-request-available (was: ) > Upgrade scala-maven-plugin to 4.9.1 > --- > > Key: SPARK-48299 > URL: https://issues.apache.org/jira/browse/SPARK-48299 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47607) Add documentation for Structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47607: --- Labels: pull-request-available (was: ) > Add documentation for Structured logging framework > -- > > Key: SPARK-47607 > URL: https://issues.apache.org/jira/browse/SPARK-47607 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48214) Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`
[ https://issues.apache.org/jira/browse/SPARK-48214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-48214: -- Assignee: BingKun Pan > Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` > - > > Key: SPARK-48214 > URL: https://issues.apache.org/jira/browse/SPARK-48214 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48214) Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory`
[ https://issues.apache.org/jira/browse/SPARK-48214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48214. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46502 [https://github.com/apache/spark/pull/46502] > Ban import `org.slf4j.Logger` & `org.slf4j.LoggerFactory` > - > > Key: SPARK-48214 > URL: https://issues.apache.org/jira/browse/SPARK-48214 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-48298) Add TCP mode to StatsdSink
[ https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846789#comment-17846789 ] Eric Yang edited comment on SPARK-48298 at 5/16/24 4:48 AM: PR: https://github.com/apache/spark/pull/46604 was (Author: JIRAUSER304132): I'm preparing a PR for it. > Add TCP mode to StatsdSink > -- > > Key: SPARK-48298 > URL: https://issues.apache.org/jira/browse/SPARK-48298 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Yang >Priority: Major > Labels: pull-request-available > > Currently, the StatsdSink in Spark supports UDP mode only, which is the > default mode of StatsD. However, in real production environments, we often > find that more reliable transmission of metrics is needed to avoid metrics > loss in high-traffic systems. > > TCP mode is already supported by StatsD: > [https://github.com/statsd/statsd/blob/master/docs/server.md] > Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] > and also many other StatsD-based metrics proxies/receivers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
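The SPARK-48298 description above contrasts StatsD's default fire-and-forget UDP transport with the more reliable TCP mode being proposed. A minimal sketch of what emitting a StatsD counter line over TCP involves (class and method names here are illustrative only, not Spark's actual StatsdSink implementation):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Spark's StatsdSink: format a StatsD counter
// line and push it over a TCP connection instead of a UDP datagram.
public class StatsdTcpSketch {

    // StatsD line protocol: <name>:<value>|<type>; in TCP mode each
    // metric line is newline-terminated so the receiver can frame it.
    static String formatCounter(String name, long value) {
        return name + ":" + value + "|c\n";
    }

    // Unlike UDP, TCP gives delivery and ordering guarantees at the cost
    // of connection setup and error handling.
    static void sendOverTcp(String host, int port, String payload) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) {
        String line = formatCounter("spark.driver.jvm.heap.used", 42);
        System.out.print(line); // spark.driver.jvm.heap.used:42|c
    }
}
```

The line protocol itself is identical in both modes; TCP only swaps the transport, which is why receivers such as statsd_exporter can support both.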
[jira] [Updated] (SPARK-36783) ScanOperation should not push Filter through nondeterministic Project
[ https://issues.apache.org/jira/browse/SPARK-36783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-36783: --- Labels: pull-request-available (was: ) > ScanOperation should not push Filter through nondeterministic Project > - > > Key: SPARK-36783 > URL: https://issues.apache.org/jira/browse/SPARK-36783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.2.0, 3.1.3, 3.0.4 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48287) Apply the builtin `timestamp_diff` method
[ https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-48287: - Assignee: Ruifeng Zheng > Apply the builtin `timestamp_diff` method > - > > Key: SPARK-48287 > URL: https://issues.apache.org/jira/browse/SPARK-48287 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48287) Apply the builtin `timestamp_diff` method
[ https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48287. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46595 [https://github.com/apache/spark/pull/46595] > Apply the builtin `timestamp_diff` method > - > > Key: SPARK-48287 > URL: https://issues.apache.org/jira/browse/SPARK-48287 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48219) StreamReader Charset fix with UTF8
[ https://issues.apache.org/jira/browse/SPARK-48219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48219: Assignee: xy > StreamReader Charset fix with UTF8 > -- > > Key: SPARK-48219 > URL: https://issues.apache.org/jira/browse/SPARK-48219 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48219) StreamReader Charset fix with UTF8
[ https://issues.apache.org/jira/browse/SPARK-48219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48219. -- Resolution: Fixed Issue resolved by pull request 46509 [https://github.com/apache/spark/pull/46509] > StreamReader Charset fix with UTF8 > -- > > Key: SPARK-48219 > URL: https://issues.apache.org/jira/browse/SPARK-48219 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.3 >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48299) Upgrade scala-maven-plugin to 4.9.1
Yang Jie created SPARK-48299: Summary: Upgrade scala-maven-plugin to 4.9.1 Key: SPARK-48299 URL: https://issues.apache.org/jira/browse/SPARK-48299 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48298) Add TCP mode to StatsdSink
[ https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48298: --- Labels: pull-request-available (was: ) > Add TCP mode to StatsdSink > -- > > Key: SPARK-48298 > URL: https://issues.apache.org/jira/browse/SPARK-48298 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Yang >Priority: Major > Labels: pull-request-available > > Currently, the StatsdSink in Spark supports UDP mode only, which is the > default mode of StatsD. However, in real production environments, we often > find that more reliable transmission of metrics is needed to avoid metrics > loss in high-traffic systems. > > TCP mode is already supported by StatsD: > [https://github.com/statsd/statsd/blob/master/docs/server.md] > Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] > and also many other StatsD-based metrics proxies/receivers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48298) Add TCP mode to StatsdSink
[ https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated SPARK-48298: -- Summary: Add TCP mode to StatsdSink (was: StatsdSink supports TCP mode) > Add TCP mode to StatsdSink > -- > > Key: SPARK-48298 > URL: https://issues.apache.org/jira/browse/SPARK-48298 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Yang >Priority: Major > > Currently, the StatsdSink in Spark supports UDP mode only, which is the > default mode of StatsD. However, in real production environments, we often > find that more reliable transmission of metrics is needed to avoid metrics > loss in high-traffic systems. > > TCP mode is already supported by StatsD: > [https://github.com/statsd/statsd/blob/master/docs/server.md] > Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] > and also many other StatsD-based metrics proxies/receivers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48295. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46602 [https://github.com/apache/spark/pull/46602] > Turn on compute.ops_on_diff_frames by default > - > > Key: SPARK-48295 > URL: https://issues.apache.org/jira/browse/SPARK-48295 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48295: Assignee: Ruifeng Zheng > Turn on compute.ops_on_diff_frames by default > - > > Key: SPARK-48295 > URL: https://issues.apache.org/jira/browse/SPARK-48295 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48298) StatsdSink supports TCP mode
[ https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated SPARK-48298: -- Description: Currently, the StatsdSink in Spark supports UDP mode only, which is the default mode of StatsD. However, in real production environments, we often find that more reliable transmission of metrics is needed to avoid metrics loss in high-traffic systems. TCP mode is already supported by StatsD: [https://github.com/statsd/statsd/blob/master/docs/server.md] Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] and also many other StatsD-based metrics proxies/receivers. was: Currently, the StatsdSink in Spark supports UDP mode only, which is the default mode of StatsD. However, in real production environments, we often find that a more reliable transmission of metrics is needed to avoid metrics lose in high-traffic systems. TCP mode is already supported by Statsd: [https://github.com/statsd/statsd/blob/master/docs/server.md] Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] and also many other Statsd-based metrics proxy/receiver. > StatsdSink supports TCP mode > > > Key: SPARK-48298 > URL: https://issues.apache.org/jira/browse/SPARK-48298 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Yang >Priority: Major > > Currently, the StatsdSink in Spark supports UDP mode only, which is the > default mode of StatsD. However, in real production environments, we often > find that more reliable transmission of metrics is needed to avoid metrics > loss in high-traffic systems. > > TCP mode is already supported by StatsD: > [https://github.com/statsd/statsd/blob/master/docs/server.md] > Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] > and also many other StatsD-based metrics proxies/receivers.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48298) StatsdSink supports TCP mode
Eric Yang created SPARK-48298: - Summary: StatsdSink supports TCP mode Key: SPARK-48298 URL: https://issues.apache.org/jira/browse/SPARK-48298 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 4.0.0 Reporter: Eric Yang Currently, the StatsdSink in Spark supports UDP mode only, which is the default mode of StatsD. However, in real production environments, we often find that more reliable transmission of metrics is needed to avoid metrics loss in high-traffic systems. TCP mode is already supported by StatsD: [https://github.com/statsd/statsd/blob/master/docs/server.md] Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] and also many other StatsD-based metrics proxies/receivers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48298) StatsdSink supports TCP mode
[ https://issues.apache.org/jira/browse/SPARK-48298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846789#comment-17846789 ] Eric Yang commented on SPARK-48298: --- I'm preparing a PR for it. > StatsdSink supports TCP mode > > > Key: SPARK-48298 > URL: https://issues.apache.org/jira/browse/SPARK-48298 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Yang >Priority: Major > > Currently, the StatsdSink in Spark supports UDP mode only, which is the > default mode of StatsD. However, in real production environments, we often > find that more reliable transmission of metrics is needed to avoid metrics > loss in high-traffic systems. > > TCP mode is already supported by StatsD: > [https://github.com/statsd/statsd/blob/master/docs/server.md] > Prometheus' statsd_exporter: [https://github.com/prometheus/statsd_exporter] > and also many other StatsD-based metrics proxies/receivers. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48297) Char/Varchar breaks in TRANSFORM clause
Kent Yao created SPARK-48297: Summary: Char/Varchar breaks in TRANSFORM clause Key: SPARK-48297 URL: https://issues.apache.org/jira/browse/SPARK-48297 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48296) Codegen Support for `to_xml`
[ https://issues.apache.org/jira/browse/SPARK-48296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48296: --- Labels: pull-request-available (was: ) > Codegen Support for `to_xml` > > > Key: SPARK-48296 > URL: https://issues.apache.org/jira/browse/SPARK-48296 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48296) Codegen Support for `to_xml`
BingKun Pan created SPARK-48296: --- Summary: Codegen Support for `to_xml` Key: SPARK-48296 URL: https://issues.apache.org/jira/browse/SPARK-48296 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
[ https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48289. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46598 [https://github.com/apache/spark/pull/46598] > Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset > -- > > Key: SPARK-48289 > URL: https://issues.apache.org/jira/browse/SPARK-48289 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
[ https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48289: Assignee: Luca Canali > Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset > -- > > Key: SPARK-48289 > URL: https://issues.apache.org/jira/browse/SPARK-48289 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48252. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46552 [https://github.com/apache/spark/pull/46552] > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48252) Update CommonExpressionRef when necessary
[ https://issues.apache.org/jira/browse/SPARK-48252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48252: --- Assignee: Wenchen Fan > Update CommonExpressionRef when necessary > - > > Key: SPARK-48252 > URL: https://issues.apache.org/jira/browse/SPARK-48252 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField
[ https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Linhong Liu resolved SPARK-47946. - Resolution: Not A Problem > Nested field's nullable value could be invalid after extracted using > GetStructField > --- > > Key: SPARK-47946 > URL: https://issues.apache.org/jira/browse/SPARK-47946 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.2 >Reporter: Junyoung Cho >Priority: Major > > I got an error when appending to a table using DataFrameWriterV2. > The error occurred in TableOutputResolver.checkNullability. This error > occurs when the data type of the schema is the same, but the order of the > fields is different. > I found that GetStructField.nullable returns an unexpected result. > {code:java} > override def nullable: Boolean = child.nullable || > childSchema(ordinal).nullable {code} > Even if the nested field is not nullable, it returns true when the > parent struct is nullable. > ||Parent nullability||Child nullability||Result|| > |true|true|true| > |true|false|true| > |false|true|true| > |false|false|false| > > I think the logic should be changed to use just the child's nullability, > because both parent and child should be nullable for the result to be > considered nullable. > > {code:java} > override def nullable: Boolean = childSchema(ordinal).nullable {code} > > > > I want to check whether the current logic is reasonable, or whether my > suggestion could cause other side effects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47946) Nested field's nullable value could be invalid after extracted using GetStructField
[ https://issues.apache.org/jira/browse/SPARK-47946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846773#comment-17846773 ] Linhong Liu commented on SPARK-47946: - No, it's not an issue. Think about this: ||key||value (nullable=true)|| |a|{"x": 1, "y": 2}| |b|null| |c|{"x": null, "y": 3}| Let's assume `value.y` cannot be null (e.g. nullable = false) and run `select value.y from tbl`. What's the result, and what's the nullability of this column? It should be ||y|| |2| |null| |3| > Nested field's nullable value could be invalid after extracted using > GetStructField > --- > > Key: SPARK-47946 > URL: https://issues.apache.org/jira/browse/SPARK-47946 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.4.2 >Reporter: Junyoung Cho >Priority: Major > > I got an error when appending to a table using DataFrameWriterV2. > The error occurred in TableOutputResolver.checkNullability. This error > occurs when the data type of the schema is the same, but the order of the > fields is different. > I found that GetStructField.nullable returns an unexpected result. > {code:java} > override def nullable: Boolean = child.nullable || > childSchema(ordinal).nullable {code} > Even if the nested field is not nullable, it returns true when the > parent struct is nullable. > ||Parent nullability||Child nullability||Result|| > |true|true|true| > |true|false|true| > |false|true|true| > |false|false|false| > > I think the logic should be changed to use just the child's nullability, > because both parent and child should be nullable for the result to be > considered nullable. > > {code:java} > override def nullable: Boolean = childSchema(ordinal).nullable {code} > > > > I want to check whether the current logic is reasonable, or whether my > suggestion could cause other side effects.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
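Linhong's counter-example in the SPARK-47946 thread above can be condensed into the rule GetStructField already implements. A small sketch (illustrative names only, not Spark's actual code) of why the extracted field's nullability must OR in the parent's:

```java
// Sketch of the nullability rule discussed in SPARK-47946 (hypothetical
// helper, not Spark's GetStructField): extracting value.y from a nullable
// struct can yield null even when y itself is declared non-nullable,
// because the whole struct may be null for some rows.
public class ExtractedNullability {

    static boolean extractedNullable(boolean parentNullable, boolean childNullable) {
        // Matches the ticket's truth table: only a non-nullable field inside
        // a non-nullable struct is guaranteed non-null after extraction.
        return parentNullable || childNullable;
    }

    public static void main(String[] args) {
        // The row where the struct itself is null (key "b" in the example)
        // makes value.y null despite y being non-nullable, so
        // (parent=true, child=false) must report nullable.
        System.out.println(extractedNullable(true, false)); // true
    }
}
```

Using only the child's nullability, as the ticket proposed, would mark the extracted column non-nullable and silently drop the nulls contributed by null parent rows, which is why the issue was resolved as Not A Problem.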
[jira] [Updated] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48295: --- Labels: pull-request-available (was: ) > Turn on compute.ops_on_diff_frames by default > - > > Key: SPARK-48295 > URL: https://issues.apache.org/jira/browse/SPARK-48295 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
Ruifeng Zheng created SPARK-48295: - Summary: Turn on compute.ops_on_diff_frames by default Key: SPARK-48295 URL: https://issues.apache.org/jira/browse/SPARK-48295 Project: Spark Issue Type: Improvement Components: PS Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48291) Rename Java Logger as SparkLogger
[ https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-48291. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46600 [https://github.com/apache/spark/pull/46600] > Rename Java Logger as SparkLogger > -- > > Key: SPARK-48291 > URL: https://issues.apache.org/jira/browse/SPARK-48291 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Two new classes org.apache.spark.internal.Logger and > org.apache.spark.internal.LoggerFactory were introduced from > [https://github.com/apache/spark/pull/46301|https://github.com/apache/spark/pull/46301,] > Given that {{Logger}} is a widely recognized interface in Log4j, it may lead > to confusion to have a class with the same name. To avoid this and clarify > its purpose within the Spark framework, I propose renaming > {{org.apache.spark.internal.Logger}} to > {{{}org.apache.spark.internal.SparkLogger{}}}. Similarly, to maintain > consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to > {{{}org.apache.spark.internal.SparkLoggerFactory{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846764#comment-17846764 ] HiuFung Kwok edited comment on SPARK-48238 at 5/15/24 10:06 PM: [~dongjoon] I'm still new to the codebase, I will need to check in exact how we currently provide backward support for Hadoop and Hive, before commenting further. was (Author: hf): [~dongjoon] I'm still new to the codebase, I will need to check in exactly how we currently provide backward support for Hadoop and Hive, before commenting further. > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at
[jira] [Commented] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846764#comment-17846764 ] HiuFung Kwok commented on SPARK-48238: -- [~dongjoon] I'm still new to the codebase, I will need to check in exactly how we currently provide backward support for Hadoop and Hive, before commenting further. > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) > [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at >
[jira] [Created] (SPARK-48294) Make nestedTypeMissingElementTypeError case insensitive
Michael Zhang created SPARK-48294: - Summary: Make nestedTypeMissingElementTypeError case insensitive Key: SPARK-48294 URL: https://issues.apache.org/jira/browse/SPARK-48294 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.1, 3.5.0, 4.0.0, 3.5.2 Reporter: Michael Zhang Fix For: 4.0.0 When a complex data type is incorrectly declared using nested types (ARRAY, MAP and STRUCT), the query fails with a match error rather than `INCOMPLETE_TYPE_DEFINITION`. This is because the match is case-sensitive.
[jira] [Commented] (SPARK-48294) Make nestedTypeMissingElementTypeError case insensitive
[ https://issues.apache.org/jira/browse/SPARK-48294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846739#comment-17846739 ] Michael Zhang commented on SPARK-48294: --- I am working on this issue. > Make nestedTypeMissingElementTypeError case insensitive > --- > > Key: SPARK-48294 > URL: https://issues.apache.org/jira/browse/SPARK-48294 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2 >Reporter: Michael Zhang >Priority: Major > Fix For: 4.0.0 > > > When a complex data type is incorrectly declared using nested types (ARRAY, > MAP and STRUCT), the query fails with a match error rather than > `INCOMPLETE_TYPE_DEFINITION`. This is because the match is case-sensitive.
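The fix described in SPARK-48294 amounts to normalizing the nested-type keyword before matching, so that a lowercase `array` hits the same branch as `ARRAY` instead of falling through to a match error. A minimal sketch of that idea in plain Python (the helper name and error string are hypothetical, not Spark's actual internals):

```python
# Sketch of the case-insensitivity fix: normalize the type keyword before
# matching, so "array<" and "ARRAY<" take the same branch rather than
# falling through to a MatchError-style failure.

NESTED_TYPES = {"ARRAY", "MAP", "STRUCT"}

def incomplete_type_error(type_text):
    """Return an INCOMPLETE_TYPE_DEFINITION-style error name for a nested
    type declared without its element type, matching case-insensitively."""
    keyword = type_text.strip().upper()  # the one-line fix: normalize case
    if keyword in NESTED_TYPES:
        return f"INCOMPLETE_TYPE_DEFINITION.{keyword}"
    # Without the normalization above, "array" would land here: the
    # analogue of the match error the ticket reports.
    raise ValueError(f"unexpected type: {type_text}")

print(incomplete_type_error("array"))   # matched regardless of case
print(incomplete_type_error("Struct"))
```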
[jira] [Updated] (SPARK-48291) Rename Java Logger as SparkLogger
[ https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48291: --- Labels: pull-request-available (was: ) > Rename Java Logger as SparkLogger > -- > > Key: SPARK-48291 > URL: https://issues.apache.org/jira/browse/SPARK-48291 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Labels: pull-request-available > > Two new classes org.apache.spark.internal.Logger and > org.apache.spark.internal.LoggerFactory were introduced in > [https://github.com/apache/spark/pull/46301]. > Given that {{Logger}} is a widely recognized interface in Log4j, it may lead > to confusion to have a class with the same name. To avoid this and clarify > its purpose within the Spark framework, I propose renaming > {{org.apache.spark.internal.Logger}} to > {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain > consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to > {{org.apache.spark.internal.SparkLoggerFactory}}.
[jira] [Created] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator
L. C. Hsieh created SPARK-48292: --- Summary: Improve stage failure reason message in OutputCommitCoordinator Key: SPARK-48292 URL: https://issues.apache.org/jira/browse/SPARK-48292 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.3 Reporter: L. C. Hsieh When a task attempt fails but is authorized to do the task commit, OutputCommitCoordinator fails the stage with a reason message saying that the task commit succeeded, but the driver actually never knows whether a task commit succeeded or not. We should update the reason message to make it less confusing. See https://github.com/apache/spark/pull/36564#discussion_r1598660630
[jira] [Updated] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-48292: Affects Version/s: (was: 3.5.1) (was: 3.4.3) > Improve stage failure reason message in OutputCommitCoordinator > > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Priority: Minor > > When a task attempt fails but is authorized to do the task commit, > OutputCommitCoordinator fails the stage with a reason message saying that the > task commit succeeded, but the driver actually never knows whether a task > commit succeeded or not. We should update the reason message to make it less > confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630
[jira] [Updated] (SPARK-48292) Improve stage failure reason message in OutputCommitCoordinator
[ https://issues.apache.org/jira/browse/SPARK-48292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh updated SPARK-48292: Affects Version/s: 3.5.1 4.0.0 > Improve stage failure reason message in OutputCommitCoordinator > > > Key: SPARK-48292 > URL: https://issues.apache.org/jira/browse/SPARK-48292 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0, 3.5.1, 3.4.3 >Reporter: L. C. Hsieh >Priority: Minor > > When a task attempt fails but is authorized to do the task commit, > OutputCommitCoordinator fails the stage with a reason message saying that the > task commit succeeded, but the driver actually never knows whether a task > commit succeeded or not. We should update the reason message to make it less > confusing. > See https://github.com/apache/spark/pull/36564#discussion_r1598660630
[jira] [Updated] (SPARK-48291) Rename Java Logger as SparkLogger
[ https://issues.apache.org/jira/browse/SPARK-48291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-48291: --- Summary: Rename Java Logger as SparkLogger (was: Refactor Java Logger as SparkLogger ) > Rename Java Logger as SparkLogger > -- > > Key: SPARK-48291 > URL: https://issues.apache.org/jira/browse/SPARK-48291 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > Two new classes org.apache.spark.internal.Logger and > org.apache.spark.internal.LoggerFactory were introduced in > [https://github.com/apache/spark/pull/46301]. > Given that {{Logger}} is a widely recognized interface in Log4j, it may lead > to confusion to have a class with the same name. To avoid this and clarify > its purpose within the Spark framework, I propose renaming > {{org.apache.spark.internal.Logger}} to > {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain > consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to > {{org.apache.spark.internal.SparkLoggerFactory}}.
[jira] [Created] (SPARK-48291) Refactor Java Logger as SparkLogger
Gengliang Wang created SPARK-48291: -- Summary: Refactor Java Logger as SparkLogger Key: SPARK-48291 URL: https://issues.apache.org/jira/browse/SPARK-48291 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Gengliang Wang Assignee: Gengliang Wang Two new classes org.apache.spark.internal.Logger and org.apache.spark.internal.LoggerFactory were introduced in [https://github.com/apache/spark/pull/46301]. Given that {{Logger}} is a widely recognized interface in Log4j, it may lead to confusion to have a class with the same name. To avoid this and clarify its purpose within the Spark framework, I propose renaming {{org.apache.spark.internal.Logger}} to {{org.apache.spark.internal.SparkLogger}}. Similarly, to maintain consistency, {{org.apache.spark.internal.LoggerFactory}} should be renamed to {{org.apache.spark.internal.SparkLoggerFactory}}.
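The rename proposed in SPARK-48291 keeps the logging facade but gives the wrapper a name that cannot be confused with the library's own `Logger` type. A minimal Python analogue of the same facade pattern (class names mirror the proposal; the implementation is illustrative only, not Spark's actual code):

```python
import logging

# Illustrative facade: SparkLogger wraps the underlying logger instead of
# reusing the library's Logger name, so the wrapper type is unambiguous --
# the concern that motivated the rename in the ticket above.

class SparkLogger:
    def __init__(self, delegate):
        self._delegate = delegate  # the real logging.Logger underneath

    def info(self, msg, *args):
        self._delegate.info(msg, *args)

    def warning(self, msg, *args):
        self._delegate.warning(msg, *args)

class SparkLoggerFactory:
    @staticmethod
    def get_logger(name):
        return SparkLogger(logging.getLogger(name))

log = SparkLoggerFactory.get_logger("org.apache.spark.example")
log.info("facade delegates to the underlying logger")
# The wrapper is a distinct type: no name clash with logging.Logger.
assert isinstance(log, SparkLogger)
assert not isinstance(log, logging.Logger)
```

Because callers only see the facade, renaming it (as the ticket does for `Logger` to `SparkLogger`) is a mechanical change that does not touch the underlying logging backend.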
[jira] [Resolved] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files
[ https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48256. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46557 [https://github.com/apache/spark/pull/46557] > Add a rule to check file headers for the java side, and fix inconsistent files > -- > > Key: SPARK-48256 > URL: https://issues.apache.org/jira/browse/SPARK-48256 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48256) Add a rule to check file headers for the java side, and fix inconsistent files
[ https://issues.apache.org/jira/browse/SPARK-48256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48256: - Assignee: BingKun Pan > Add a rule to check file headers for the java side, and fix inconsistent files > -- > > Key: SPARK-48256 > URL: https://issues.apache.org/jira/browse/SPARK-48256 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
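A file-header rule like the one SPARK-48256 adds can be approximated by scanning source files for the expected license preamble and reporting the inconsistent ones. The sketch below is a simplified stand-in (the phrase checked and the helper name are assumptions, not the project's actual checkstyle configuration):

```python
import os
import tempfile

# Simplified header check: flag files whose first lines lack the expected
# license phrase. Real builds do this with a checkstyle-style rule; this
# sketch only illustrates the idea.
APACHE_HEADER = "Licensed to the Apache Software Foundation"

def files_missing_header(paths):
    """Return the files whose first ten lines lack the license phrase."""
    missing = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            head = "".join(f.readlines()[:10])
        if APACHE_HEADER not in head:
            missing.append(path)
    return missing

# Demo on two temporary files: one with a header, one without.
good = tempfile.NamedTemporaryFile("w", suffix=".java", delete=False)
good.write("/*\n * Licensed to the Apache Software Foundation (ASF)\n */\nclass A {}\n")
good.close()
bad = tempfile.NamedTemporaryFile("w", suffix=".java", delete=False)
bad.write("class B {}\n")
bad.close()
print(files_missing_header([good.name, bad.name]))  # only the headerless file
os.unlink(good.name)
os.unlink(bad.name)
```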
[jira] [Assigned] (SPARK-48218) TransportClientFactory.createClient may NPE cause FetchFailedException
[ https://issues.apache.org/jira/browse/SPARK-48218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-48218: - Assignee: dzcxzl > TransportClientFactory.createClient may NPE cause FetchFailedException > -- > > Key: SPARK-48218 > URL: https://issues.apache.org/jira/browse/SPARK-48218 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 4.0.0 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > Labels: pull-request-available > > {code:java} > org.apache.spark.shuffle.FetchFailedException > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1180) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:913) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) > Caused by: java.lang.NullPointerException > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178) > at > org.apache.spark.network.shuffle.ExternalBlockStoreClient.lambda$fetchBlocks$0(ExternalBlockStoreClient.java:128) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:154) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:133) > at > org.apache.spark.network.shuffle.ExternalBlockStoreClient.fetchBlocks(ExternalBlockStoreClient.java:139) > {code}
[jira] [Resolved] (SPARK-48218) TransportClientFactory.createClient may NPE cause FetchFailedException
[ https://issues.apache.org/jira/browse/SPARK-48218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48218. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46506 [https://github.com/apache/spark/pull/46506] > TransportClientFactory.createClient may NPE cause FetchFailedException > -- > > Key: SPARK-48218 > URL: https://issues.apache.org/jira/browse/SPARK-48218 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 4.0.0 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > org.apache.spark.shuffle.FetchFailedException > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:1180) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:913) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) > Caused by: java.lang.NullPointerException > at > org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:178) > at > org.apache.spark.network.shuffle.ExternalBlockStoreClient.lambda$fetchBlocks$0(ExternalBlockStoreClient.java:128) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.transferAllOutstanding(RetryingBlockTransferor.java:154) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.start(RetryingBlockTransferor.java:133) > at > org.apache.spark.network.shuffle.ExternalBlockStoreClient.fetchBlocks(ExternalBlockStoreClient.java:139) > {code}
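The stack trace in SPARK-48218 shows a `NullPointerException` inside `TransportClientFactory.createClient` surfacing as a `FetchFailedException`. The general shape of the defensive fix is to detect the unexpectedly empty client slot and raise a descriptive I/O error instead of letting a null dereference propagate. A toy Python model of that pattern (class and method names are illustrative, not Spark's actual API):

```python
# Toy model of the defensive pattern: when a pooled client slot turns out
# to be empty (None), raise a descriptive IOError rather than dereferencing
# it and producing an NPE-style failure deep in the fetch path.
# Names are illustrative only, not Spark's actual network API.

class ClientPool:
    def __init__(self):
        self._clients = [None]  # slot may be empty if creation raced/failed

    def create_client(self, host, port):
        client = self._clients[0]
        if client is None:
            # The guard: fail loudly with context instead of an NPE.
            raise IOError(f"Failed to create a client to {host}:{port}")
        return client

pool = ClientPool()
try:
    pool.create_client("shuffle-host", 7337)
except IOError as e:
    print(e)  # clear, contextual error instead of a bare null dereference
```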
[jira] [Resolved] (SPARK-48049) Upgrade Scala to 2.13.14
[ https://issues.apache.org/jira/browse/SPARK-48049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48049. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46288 [https://github.com/apache/spark/pull/46288] > Upgrade Scala to 2.13.14 > > > Key: SPARK-48049 > URL: https://issues.apache.org/jira/browse/SPARK-48049 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48285) Update docs for size function and sizeOfNull configuration
[ https://issues.apache.org/jira/browse/SPARK-48285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48285. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46592 [https://github.com/apache/spark/pull/46592] > Update docs for size function and sizeOfNull configuration > -- > > Key: SPARK-48285 > URL: https://issues.apache.org/jira/browse/SPARK-48285 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
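For context on SPARK-48285: the null behavior of Spark SQL's `size` function is governed by the `spark.sql.legacy.sizeOfNull` configuration. Under the legacy behavior `size(NULL)` returns -1; otherwise it returns NULL. A plain-Python model of that rule (the function below is a sketch of the documented semantics, not Spark code, and does not cover ANSI-mode interactions):

```python
# Model of Spark SQL's size() null semantics: with the legacy
# spark.sql.legacy.sizeOfNull behavior, size(NULL) yields -1; with the
# non-legacy behavior it yields NULL (None here). Non-null collections
# simply report their length.

def size(collection, legacy_size_of_null=True):
    if collection is None:
        return -1 if legacy_size_of_null else None
    return len(collection)

print(size([1, 2, 3]))                       # 3
print(size(None, legacy_size_of_null=True))  # -1
print(size(None, legacy_size_of_null=False)) # None
```

This two-way behavior is exactly the kind of subtlety the docs update in the ticket aims to spell out.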
[jira] [Commented] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846701#comment-17846701 ] Dongjoon Hyun commented on SPARK-48238: --- Hi, [~chengpan] and [~HF] and [~cloud_fan]. Is this true that we need to revert SPARK-45522 and SPARK-47118 for only YARN support? Do you think there is an alternative like we did for Hadoop 2 and Hadoop 3 support or Hive 1 and Hive 2 support? For example, can we isolate Jetty issues to YARN module and JettyUtil via configurations? > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] > at >
[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48238: -- Description: I tested the latest master branch, it failed to start on YARN mode {code:java} dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} {code:java} $ bin/spark-sql --master yarn WARNING: Using incubator modules: jdk.incubator.vector Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive} is set, falling back to uploading libraries under SPARK_HOME. 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. org.sparkproject.jetty.util.MultiException: Multiple exceptions at org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at 
org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) ~[scala-library-2.13.13.jar:?] at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) ~[scala-library-2.13.13.jar:?] at scala.collection.AbstractIterable.foreach(Iterable.scala:935) ~[scala-library-2.13.13.jar:?] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] at org.apache.spark.SparkContext.(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] 
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:405) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:162) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?] at
[jira] [Updated] (SPARK-48290) AQE not working when joining dataframes with more than 2000 partitions
[ https://issues.apache.org/jira/browse/SPARK-48290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] André F. updated SPARK-48290: - Description: We are joining 2 large dataframes with considerable skew on the left side in one specific key (>2000 skew ratio). {code:java} left side num partitions: 10335 right side num partitions: 1241 left side num rows: 20181947343 right side num rows: 107462219 {code} Since we have `spark.sql.adaptive.enabled` set, we expect AQE to act during the join, dealing with the skewed partition automatically. During the join, we can see the following log indicating that the skew was not detected, since the statistics look suspiciously equal for min/median/max sizes: {code:java} OptimizeSkewedJoin: number of skewed partitions: left 0, right 0 OptimizeSkewedJoin: Optimizing skewed join. Left side partitions size info: median size: 780925482, max size: 780925482, min size: 780925482, avg size: 780925482 Right side partitions size info: median size: 3325797, max size: 3325797, min size: 3325797, avg size: 3325797 {code} Looking at this log line and the Spark configuration possibilities, our two main hypotheses to work around this behavior and correctly detect the skew were: # Increasing `minNumPartitionsToHighlyCompress` so that Spark doesn't convert the statistics into a `HighlyCompressedMapStatus` and is therefore able to identify the skewed partition. # Allowing Spark to use a `HighlyCompressedMapStatus`, but changing other configurations such as `spark.shuffle.accurateBlockThreshold` and `spark.shuffle.accurateBlockSkewedFactor` so that even then the sizes of the skewed partitions/blocks are accurately registered and consequently used in the optimization. We tried different values for `spark.shuffle.accurateBlockThreshold` (even absurd ones like 1MB) and nothing seems to work. The statistics indicate that the min, median, and max are somehow the same, and thus the skew is not detected. 
However, when forcibly reducing `spark.sql.shuffle.partitions` to fewer than 2000 partitions, the statistics look correct and the optimized skewed join acts as it should: {code:java} OptimizeSkewedJoin: number of skewed partitions: left 1, right 0 OptimizeSkewedJoin: Left side partition 42 (263 GB) is skewed, split it into 337 parts. OptimizeSkewedJoin: Optimizing skewed join. Left side partitions size info: median size: 862803419, max size: 282616632301, min size: 842320875, avg size: 1019367139 Right side partitions size info: median size: 4320067, max size: 4376957, min size: 4248989, avg size: 4319766 {code} Should we assume that the statistics are becoming corrupted when Spark uses `HighlyCompressedMapStatus`? Should we try another configuration property to work around this problem? (Assuming that fine-tuning all dataframes in skewed joins in our ETL to have fewer than 2000 partitions is not an option)
[jira] [Created] (SPARK-48290) AQE not working when joining dataframes with more than 2000 partitions
André F. created SPARK-48290: Summary: AQE not working when joining dataframes with more than 2000 partitions Key: SPARK-48290 URL: https://issues.apache.org/jira/browse/SPARK-48290 Project: Spark Issue Type: Question Components: Optimizer, SQL Affects Versions: 3.5.1, 3.3.2 Environment: spark-standalone spark3.5.1 Reporter: André F. We are joining 2 large dataframes with considerable skew on the left side in one specific key (>2000 skew ratio). Since we have `spark.sql.adaptive.enabled` set, we expect AQE to act during the join, dealing with the skewed partition automatically. During the join, we can see the following log indicating that the skew was not detected, since the statistics look suspiciously equal for min/median/max sizes: {code:java} OptimizeSkewedJoin: number of skewed partitions: left 0, right 0 OptimizeSkewedJoin: Optimizing skewed join. Left side partitions size info: median size: 780925482, max size: 780925482, min size: 780925482, avg size: 780925482 Right side partitions size info: median size: 3325797, max size: 3325797, min size: 3325797, avg size: 3325797 {code} Looking at this log line and the Spark configuration possibilities, our two main hypotheses to work around this behavior and correctly detect the skew were: # Increasing `minNumPartitionsToHighlyCompress` so that Spark doesn't convert the statistics into a `HighlyCompressedMapStatus` and is therefore able to identify the skewed partition. # Allowing Spark to use a `HighlyCompressedMapStatus`, but changing other configurations such as `spark.shuffle.accurateBlockThreshold` and `spark.shuffle.accurateBlockSkewedFactor` so that even then the sizes of the skewed partitions/blocks are accurately registered and consequently used in the optimization. We tried different values for `spark.shuffle.accurateBlockThreshold` (even absurd ones like 1MB) and nothing seems to work. The statistics indicate that the min, median, and max are somehow the same, and thus the skew is not detected. 
However, when forcibly reducing `spark.sql.shuffle.partitions` to fewer than 2000 partitions, the statistics look correct and the optimized skewed join acts as it should: {code:java} OptimizeSkewedJoin: number of skewed partitions: left 1, right 0 OptimizeSkewedJoin: Left side partition 42 (263 GB) is skewed, split it into 337 parts. OptimizeSkewedJoin: Optimizing skewed join. Left side partitions size info: median size: 862803419, max size: 282616632301, min size: 842320875, avg size: 1019367139 Right side partitions size info: median size: 4320067, max size: 4376957, min size: 4248989, avg size: 4319766 {code} Should we assume that the statistics are becoming corrupted when Spark uses `HighlyCompressedMapStatus`? Should we try another configuration property to work around this problem? (Assuming that fine-tuning all dataframes in skewed joins in our ETL to have fewer than 2000 partitions is not an option) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
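The behavior described above is consistent with how shuffle map statuses are compressed: once the number of shuffle partitions reaches the high-compression threshold (2000 by default), most per-block sizes are replaced by an average, so min/median/max collapse to the same value and the skewed block becomes invisible. The following is a simplified pure-Python sketch of that effect, not Spark's actual implementation (it deliberately ignores refinements such as `spark.shuffle.accurateBlockThreshold`):

```python
# Illustrative sketch (not Spark's code): why per-partition size statistics
# flatten out once a HighlyCompressedMapStatus-style summary kicks in.
def report_block_sizes(block_sizes, highly_compress_threshold=2000):
    """Return the per-block sizes as a reducer would see them.

    Below the threshold, exact sizes are kept (CompressedMapStatus-like).
    At or above it, every non-empty block is reported as the average
    (HighlyCompressedMapStatus-like), so min == median == max.
    """
    if len(block_sizes) < highly_compress_threshold:
        return list(block_sizes)
    non_empty = [s for s in block_sizes if s > 0]
    avg = sum(non_empty) // len(non_empty)
    return [avg if s > 0 else 0 for s in block_sizes]

# One heavily skewed block among 3000 partitions:
sizes = [1_000] * 2999 + [500_000_000]

seen = report_block_sizes(sizes)
print(min(seen), max(seen))  # identical values: the skew is invisible

# The same skewed block among fewer than 2000 partitions:
seen_small = report_block_sizes(sizes[:1500] + [500_000_000])
print(min(seen_small), max(seen_small))  # exact sizes: the skew is visible
```

Under this model, a single skewed block disappears into the average as soon as the partition count crosses the threshold, matching the identical min/median/max statistics in the log above.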
[jira] [Resolved] (SPARK-48172) Fix escaping issues in JDBCDialects
[ https://issues.apache.org/jira/browse/SPARK-48172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48172. - Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46588 [https://github.com/apache/spark/pull/46588] > Fix escaping issues in JDBCDialects > --- > > Key: SPARK-48172 > URL: https://issues.apache.org/jira/browse/SPARK-48172 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Assignee: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-32472) Expose confusion matrix elements by threshold in BinaryClassificationMetrics
[ https://issues.apache.org/jira/browse/SPARK-32472 ] Gideon P deleted comment on SPARK-32472: -- was (Author: JIRAUSER304403): [~kmoore] can I raise a PR for this issue? > Expose confusion matrix elements by threshold in BinaryClassificationMetrics > > > Key: SPARK-32472 > URL: https://issues.apache.org/jira/browse/SPARK-32472 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 3.0.0 >Reporter: Kevin Moore >Priority: Minor > > Currently, the only thresholded metrics available from > BinaryClassificationMetrics are precision, recall, f-measure, and (indirectly > through roc()) the false positive rate. > Unfortunately, you can't always compute the individual thresholded confusion > matrix elements (TP, FP, TN, FN) from these quantities. You can make a system > of equations out of the existing thresholded metrics and the total count, but > they become underdetermined when there are no true positives. > Fortunately, the individual confusion matrix elements by threshold are > already computed and sitting in the confusions variable. It would be helpful > to expose these elements directly. The easiest way would probably be by > adding methods like > {code:java} > def truePositivesByThreshold(): RDD[(Double, Double)] = confusions.map{ case > (t, c) => (t, c.weightedTruePositives) }{code} > An alternative could be to expose the entire RDD[(Double, > BinaryConfusionMatrix)] in one method, but BinaryConfusionMatrix is also > currently package private. > The closest issue to this I found was this one for adding new calculations to > BinaryClassificationMetrics > https://issues.apache.org/jira/browse/SPARK-18844, which was closed without > any changes being merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32472) Expose confusion matrix elements by threshold in BinaryClassificationMetrics
[ https://issues.apache.org/jira/browse/SPARK-32472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846644#comment-17846644 ] Gideon P commented on SPARK-32472: -- [~kmoore] can I raise a PR for this issue? > Expose confusion matrix elements by threshold in BinaryClassificationMetrics > > > Key: SPARK-32472 > URL: https://issues.apache.org/jira/browse/SPARK-32472 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 3.0.0 >Reporter: Kevin Moore >Priority: Minor > > Currently, the only thresholded metrics available from > BinaryClassificationMetrics are precision, recall, f-measure, and (indirectly > through roc()) the false positive rate. > Unfortunately, you can't always compute the individual thresholded confusion > matrix elements (TP, FP, TN, FN) from these quantities. You can make a system > of equations out of the existing thresholded metrics and the total count, but > they become underdetermined when there are no true positives. > Fortunately, the individual confusion matrix elements by threshold are > already computed and sitting in the confusions variable. It would be helpful > to expose these elements directly. The easiest way would probably be by > adding methods like > {code:java} > def truePositivesByThreshold(): RDD[(Double, Double)] = confusions.map{ case > (t, c) => (t, c.weightedTruePositives) }{code} > An alternative could be to expose the entire RDD[(Double, > BinaryConfusionMatrix)] in one method, but BinaryConfusionMatrix is also > currently package private. > The closest issue to this I found was this one for adding new calculations to > BinaryClassificationMetrics > https://issues.apache.org/jira/browse/SPARK-18844, which was closed without > any changes being merged. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
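For intuition, here is a standalone sketch of the thresholded confusion-matrix elements the issue proposes to expose. It is plain Python over (score, label) pairs rather than MLlib's package-private `confusions` RDD, and the function name and return shape are illustrative only:

```python
# Illustration of per-threshold confusion-matrix elements (TP, FP, TN, FN),
# the quantities SPARK-32472 asks BinaryClassificationMetrics to expose.
def confusion_by_threshold(scored_labels):
    """scored_labels: iterable of (score, label) pairs, label in {0.0, 1.0}.

    Returns {threshold: (tp, fp, tn, fn)}, using each distinct score as a
    decision threshold (predict positive when score >= threshold).
    """
    out = {}
    for t in sorted({s for s, _ in scored_labels}):
        tp = sum(1 for s, y in scored_labels if s >= t and y == 1.0)
        fp = sum(1 for s, y in scored_labels if s >= t and y == 0.0)
        fn = sum(1 for s, y in scored_labels if s < t and y == 1.0)
        tn = sum(1 for s, y in scored_labels if s < t and y == 0.0)
        out[t] = (tp, fp, tn, fn)
    return out

data = [(0.9, 1.0), (0.8, 0.0), (0.6, 1.0), (0.2, 0.0)]
print(confusion_by_threshold(data)[0.6])  # (tp, fp, tn, fn) at threshold 0.6
```

Unlike precision/recall/f-measure, these four counts stay well-defined even when there are no true positives at a threshold, which is exactly the underdetermined case described above.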
[jira] [Updated] (SPARK-48289) Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset
[ https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-48289: Summary: Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset (was: Improve Oracle JDBC tests by skipping redundant SYSTEM password reset) > Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset > - > > Key: SPARK-48289 > URL: https://issues.apache.org/jira/browse/SPARK-48289 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Luca Canali >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48289) Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset
[ https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-48289: Summary: Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset (was: Clen up Oracle JDBC tests by skipping redundant SYSTEM password reset) > Clean up Oracle JDBC tests by skipping redundant SYSTEM password reset > -- > > Key: SPARK-48289 > URL: https://issues.apache.org/jira/browse/SPARK-48289 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Luca Canali >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48289) Improve Oracle JDBC tests by skipping redundant SYSTEM password reset
[ https://issues.apache.org/jira/browse/SPARK-48289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48289: --- Labels: pull-request-available (was: ) > Improve Oracle JDBC tests by skipping redundant SYSTEM password reset > - > > Key: SPARK-48289 > URL: https://issues.apache.org/jira/browse/SPARK-48289 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 4.0.0 >Reporter: Luca Canali >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45769) data retrieval fails on executors with spark connect
[ https://issues.apache.org/jira/browse/SPARK-45769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846640#comment-17846640 ] Sven Teresniak commented on SPARK-45769: I can confirm this bug when using Spark Connect to access a standalone Spark (v3.5.1) cluster in a k8s environment. In my setup we are *not* using k8s operators (static "standalone" cluster). The bug *only* occurs when the Spark Connect Server is using a Spark master with adjacent workers. I can not trigger the bug when the driver (read: Spark Connect Server) is doing all the work (like a `local[0]` setup). > data retrieval fails on executors with spark connect > > > Key: SPARK-45769 > URL: https://issues.apache.org/jira/browse/SPARK-45769 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Steven Ottens >Priority: Major > > We have an OpenShift cluster with Spark and JupyterHub and we use > Spark-Connect to access Spark from within Jupyter. This worked fine with > Spark 3.4.1. However after upgrading to Spark 3.5.0 we were not able to > access any data in our Delta Tables through Spark. Initially I assumed it was > a bug in Delta: [https://github.com/delta-io/delta/issues/2235] > The actual error is > {code:java} > SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due > to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: > Lost task 0.3 in stage 6.0 (TID 13) (172.31.15.72 executor 4): > java.lang.ClassCastException: cannot assign instance of > java.lang.invoke.SerializedLambda to field > org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance > of org.apache.spark.rdd.MapPartitionsRDD{code} > However after further investigation I discovered that this is a regression in > Spark 3.5.0. The issue is similar to SPARK-36917, however I am not using any > custom functions, nor any other classes than spark-connect, and this setup > used to work in 3.4.1. 
The issue only occurs when remote executors are used > in a kubernetes environment. Running a plain Spark-Connect eg > {code:java} > ./sbin/start-connect-server.sh --packages > org.apache.spark:spark-connect_2.12:3.5.0{code} > doesn't produce the error. > The issue occurs both in a full OpenShift cluster as in a tiny minikube > setup. The steps to reproduce are based on the minikube setup. > You need to have a minimal Spark 3.5.0 setup with 1 driver and at least 1 > executor and use python to access data through Spark. The query I used to > test this is > {code:java} > from pyspark.sql import SparkSession > logFile = '/opt/spark/work-dir/data.csv' > spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() > df = spark.read.csv(logFile) > df.count() > {code} > However it doesn't matter if the data is local, or remote on a S3 storage, > nor if the data is plain text, CSV or Delta Table. > h3. Steps to reproduce: > # Install minikube > # Create a service account 'spark' > {code:java} > kubectl create sa spark{code} > # Bind the 'edit' role to the service account > {code:java} > kubectl create rolebinding spark-edit \ > --clusterrole=edit \ > --serviceaccount=default:spark \ > --namespace=default{code} > # Create a service for spark > {code:java} > kubectl create -f service.yml{code} > # Create a Spark-Connect deployment with the default Spark docker image: > [https://hub.docker.com/_/spark] (do change the deployment.yml to point to > the kubernetes API endpoint > {code:java} > kubectl create -f deployment.yml{code} > # Add data to both the executor and the driver pods, e.g. login on the > terminal of the pods and run on both pods > {code:java} > touch data.csv > echo id,name > data.csv > echo 1,2 >> data.csv {code} > # Start a spark-remote session to access the newly created data. 
I logged in > on the driver pod and installed the necessary python packages: > {code:java} > python3 -m pip install pandas pyspark grpcio-tools grpcio-status pyarrow{code} > Started a python shell and executed: > {code:java} > from pyspark.sql import SparkSession > logFile = '/opt/spark/work-dir/data.csv' > spark = SparkSession.builder.remote('sc://spark-connect').getOrCreate() > df = spark.read.csv(logFile) > df.count() {code} > h3. Necessary files: > Service.yml: > {code:java} > apiVersion: v1 > kind: Service > metadata: > labels: > app: spark-connect > name: spark-connect > namespace: default > spec: > ipFamilies: > - IPv4 > ports: > - name: connect-grpc > protocol: TCP > port: 15002 # Port the service listens on. > targetPort: 15002 # Port on the backing pods to which the service > forwards connections > - name: sparkui > protocol: TCP > port: 4040 # Port the service listens on. >
[jira] [Created] (SPARK-48289) Improve Oracle JDBC tests by skipping redundant SYSTEM password reset
Luca Canali created SPARK-48289: --- Summary: Improve Oracle JDBC tests by skipping redundant SYSTEM password reset Key: SPARK-48289 URL: https://issues.apache.org/jira/browse/SPARK-48289 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 4.0.0 Reporter: Luca Canali -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48288) Add source data type to connector.Cast expression
[ https://issues.apache.org/jira/browse/SPARK-48288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48288: --- Labels: pull-request-available (was: ) > Add source data type to connector.Cast expression > - > > Key: SPARK-48288 > URL: https://issues.apache.org/jira/browse/SPARK-48288 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Priority: Major > Labels: pull-request-available > > Currently, > V2ExpressionBuilder builds a connector.Cast expression from a catalyst.Cast > expression. > The Catalyst Cast has the expression's data type, but the connector Cast does not. > Since some casts are not allowed on the external engine, we need to know both the source > and target data types, since we want finer granularity to block some > unsupported casts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48288) Add source data type to connector.Cast expression
Uros Stankovic created SPARK-48288: -- Summary: Add source data type to connector.Cast expression Key: SPARK-48288 URL: https://issues.apache.org/jira/browse/SPARK-48288 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Uros Stankovic Currently, V2ExpressionBuilder builds a connector.Cast expression from a catalyst.Cast expression. The Catalyst Cast has the expression's data type, but the connector Cast does not. Since some casts are not allowed on the external engine, we need to know both the source and target data types, since we want finer granularity to block some unsupported casts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
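A rough sketch of the proposal's shape (all names below are hypothetical stand-ins, not Spark's actual connector API): carrying the source type alongside the target type lets a JDBC dialect veto specific source-to-target pairs instead of having to reject every cast to a given target type.

```python
# Hypothetical sketch of a connector Cast that also records the source type.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cast:
    child: str          # pushed-down child expression, simplified to a name
    source_type: str    # the newly proposed field
    target_type: str

# Assumed example rule: the external engine can cast strings to timestamps
# but not binary to timestamps (illustrative only).
UNSUPPORTED_PAIRS = {("binary", "timestamp")}

def can_push_down(cast: Cast) -> bool:
    # With only target_type available, both casts below would look the same;
    # the (source, target) pair gives the finer granularity the issue asks for.
    return (cast.source_type, cast.target_type) not in UNSUPPORTED_PAIRS

print(can_push_down(Cast("col_a", "string", "timestamp")))
print(can_push_down(Cast("col_b", "binary", "timestamp")))
```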
[jira] [Updated] (SPARK-48287) Apply the builtin `timestamp_diff` method
[ https://issues.apache.org/jira/browse/SPARK-48287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48287: --- Labels: pull-request-available (was: ) > Apply the builtin `timestamp_diff` method > - > > Key: SPARK-48287 > URL: https://issues.apache.org/jira/browse/SPARK-48287 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48287) Apply the builtin `timestamp_diff` method
Ruifeng Zheng created SPARK-48287: - Summary: Apply the builtin `timestamp_diff` method Key: SPARK-48287 URL: https://issues.apache.org/jira/browse/SPARK-48287 Project: Spark Issue Type: Improvement Components: Connect, PS Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion
[ https://issues.apache.org/jira/browse/SPARK-48286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48286: --- Labels: pull-request-available (was: ) > Analyze 'exists' default expression instead of 'current' default expression > in structField to v2 column conversion > -- > > Key: SPARK-48286 > URL: https://issues.apache.org/jira/browse/SPARK-48286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > > The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method > accepts 3 parameters: > 1) Field to analyze > 2) Statement type - String > 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT > The method > org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column > passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so EXISTS_DEFAULT is > treated as the statement type rather than the metadata key, and the wrong expression is > analyzed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48286) Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion
Uros Stankovic created SPARK-48286: -- Summary: Analyze 'exists' default expression instead of 'current' default expression in structField to v2 column conversion Key: SPARK-48286 URL: https://issues.apache.org/jira/browse/SPARK-48286 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0 Reporter: Uros Stankovic Fix For: 4.0.0 The org.apache.spark.sql.catalyst.util.ResolveDefaultColumns#analyze method accepts 3 parameters: 1) Field to analyze 2) Statement type - String 3) Metadata key - CURRENT_DEFAULT or EXISTS_DEFAULT. The method org.apache.spark.sql.connector.catalog.CatalogV2Util#structFieldToV2Column passes fieldToAnalyze and EXISTS_DEFAULT as the second parameter, so EXISTS_DEFAULT is treated as the statement type rather than the metadata key, and the wrong expression is analyzed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
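The bug described above is a plain positional-argument mix-up. A minimal illustration of the pattern, using simplified stand-ins rather than Spark's actual signatures: when the metadata key is passed in the statement-type slot, the key parameter silently keeps its default and the wrong default expression is analyzed.

```python
# Simplified illustration (not Spark's code) of the SPARK-48286 bug pattern.
CURRENT_DEFAULT = "CURRENT_DEFAULT"
EXISTS_DEFAULT = "EXISTS_DEFAULT"

def analyze(field, statement_type, metadata_key=CURRENT_DEFAULT):
    # Returns which default expression actually gets analyzed.
    return field[metadata_key]

field = {"CURRENT_DEFAULT": "current-expr", "EXISTS_DEFAULT": "exists-expr"}

# Buggy call: EXISTS_DEFAULT lands in statement_type, so metadata_key
# silently stays CURRENT_DEFAULT and the 'current' expression is analyzed.
buggy = analyze(field, EXISTS_DEFAULT)

# Fixed call: the metadata key is passed in the right slot.
fixed = analyze(field, "STRUCT_FIELD", EXISTS_DEFAULT)

print(buggy, fixed)
```

Passing the key as a keyword argument (`metadata_key=EXISTS_DEFAULT`) would have made the mistake impossible, which is the usual defense against this class of bug.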
[jira] [Resolved] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
[ https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48271. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46586 [https://github.com/apache/spark/pull/46586] > Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER > - > > Key: SPARK-48271 > URL: https://issues.apache.org/jira/browse/SPARK-48271 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48271) Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER
[ https://issues.apache.org/jira/browse/SPARK-48271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48271: Assignee: Wenchen Fan > Turn match error in RowEncoder into UNSUPPORTED_DATA_TYPE_FOR_ENCODER > - > > Key: SPARK-48271 > URL: https://issues.apache.org/jira/browse/SPARK-48271 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48285) Update docs for size function and sizeOfNull configuration
[ https://issues.apache.org/jira/browse/SPARK-48285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48285: --- Labels: pull-request-available (was: ) > Update docs for size function and sizeOfNull configuration > -- > > Key: SPARK-48285 > URL: https://issues.apache.org/jira/browse/SPARK-48285 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48285) Update docs for size function and sizeOfNull configuration
Kent Yao created SPARK-48285: Summary: Update docs for size function and sizeOfNull configuration Key: SPARK-48285 URL: https://issues.apache.org/jira/browse/SPARK-48285 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48279) Upgrade ORC to 2.0.1
[ https://issues.apache.org/jira/browse/SPARK-48279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-48279. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46587 [https://github.com/apache/spark/pull/46587] > Upgrade ORC to 2.0.1 > > > Key: SPARK-48279 > URL: https://issues.apache.org/jira/browse/SPARK-48279 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48278) Refine the string representation of `Cast`
[ https://issues.apache.org/jira/browse/SPARK-48278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48278. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46585 [https://github.com/apache/spark/pull/46585] > Refine the string representation of `Cast` > -- > > Key: SPARK-48278 > URL: https://issues.apache.org/jira/browse/SPARK-48278 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48272) Add function `timestamp_diff`
[ https://issues.apache.org/jira/browse/SPARK-48272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-48272: - Assignee: Ruifeng Zheng > Add function `timestamp_diff` > - > > Key: SPARK-48272 > URL: https://issues.apache.org/jira/browse/SPARK-48272 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48272) Add function `timestamp_diff`
[ https://issues.apache.org/jira/browse/SPARK-48272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-48272. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46576 [https://github.com/apache/spark/pull/46576] > Add function `timestamp_diff` > - > > Key: SPARK-48272 > URL: https://issues.apache.org/jira/browse/SPARK-48272 > Project: Spark > Issue Type: New Feature > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846525#comment-17846525 ] Giambattista Bloisi commented on SPARK-24914: - Also SPARK-40038 reports problems because compressed file size is used in place of the uncompressed file size: but in the case of loading the files > totalSize is not a good estimate for broadcast joins > > > Key: SPARK-24914 > URL: https://issues.apache.org/jira/browse/SPARK-24914 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Bruce Robbins >Priority: Major > > When determining whether to do a broadcast join, Spark estimates the size of > the smaller table as follows: > - if totalSize is defined and greater than 0, use it. > - else, if rawDataSize is defined and greater than 0, use it > - else, use spark.sql.defaultSizeInBytes (default: Long.MaxValue) > Therefore, Spark prefers totalSize over rawDataSize. > Unfortunately, totalSize is often quite a bit smaller than the actual table > size, since it represents the size of the table's files on disk. Parquet and > Orc files, for example, are encoded and compressed. This can result in the > JVM throwing an OutOfMemoryError while Spark is loading the table into a > HashedRelation, or when Spark actually attempts to broadcast the data. > On the other hand, rawDataSize represents the uncompressed size of the > dataset, according to Hive documentation. This seems like a pretty good > number to use in preference to totalSize. However, due to HIVE-20079, this > value is simply #columns * #rows. Once that bug is fixed, it may be a > superior statistic, at least for managed tables. > In the meantime, we could apply a configurable "fudge factor" to totalSize, > at least for types of files that are encoded and compressed. 
Hive has the > setting hive.stats.deserialization.factor, which defaults to 1.0, and is > described as follows: > {quote}in the absence of uncompressed/raw data size, total file size will be > used for statistics annotation. But the file may be compressed, encoded and > serialized which may be lesser in size than the actual uncompressed/raw data > size. This factor will be multiplied to file size to estimate the raw data > size. > {quote} > Also, I propose a configuration setting to allow the user to completely > ignore rawDataSize, since that value is broken (due to HIVE-20079). When that > configuration setting is set to true, Spark would instead estimate the table > as follows: > - if totalSize is defined and greater than 0, use totalSize*fudgeFactor. > - else, use spark.sql.defaultSizeInBytes (default: Long.MaxValue) > Caveat: This mitigates the issue only for Hive tables. It does not help much > when the user is reading files using {{spark.read.parquet}}, unless we apply > the same fudge factor there. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
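The estimation order and the proposed mitigation described in this issue can be sketched as follows. This is a minimal, hypothetical model, not Spark's actual implementation: the method and parameter names (`estimate`, `fudgeFactor`, `ignoreRawDataSize`) are illustrative assumptions standing in for the proposed configuration settings.

```java
import java.util.OptionalLong;

public class BroadcastSizeEstimate {
    // Models the preference order from the issue: totalSize (scaled by a
    // fudge factor), then rawDataSize, then the conservative default.
    static long estimate(OptionalLong totalSize, OptionalLong rawDataSize,
                         double fudgeFactor, boolean ignoreRawDataSize,
                         long defaultSizeInBytes) {
        // Prefer totalSize (on-disk file size), scaled up to compensate for
        // the compression and encoding of formats like Parquet and ORC.
        if (totalSize.isPresent() && totalSize.getAsLong() > 0) {
            return (long) (totalSize.getAsLong() * fudgeFactor);
        }
        // Fall back to rawDataSize unless the user chose to ignore it
        // (e.g. because HIVE-20079 makes it unreliable).
        if (!ignoreRawDataSize && rawDataSize.isPresent() && rawDataSize.getAsLong() > 0) {
            return rawDataSize.getAsLong();
        }
        // Last resort: spark.sql.defaultSizeInBytes (Long.MaxValue by default).
        return defaultSizeInBytes;
    }
}
```

With `ignoreRawDataSize` set, the fallback step is skipped entirely, matching the proposed two-step order (totalSize times fudge factor, else the default).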
[jira] [Commented] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search
[ https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846527#comment-17846527 ] Uroš Bojanić commented on SPARK-48284: -- Working on this > Fix UTF8String indexOf behaviour for empty string search > > > Key: SPARK-48284 > URL: https://issues.apache.org/jira/browse/SPARK-48284 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Currently, UTF8String.indexOf returns 0 when given an empty search > string, and any integer start value. > Examples: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 0}} > {{"abc".indexOf("", 9); // returns: 0}} > {{"abc".indexOf("", -3); // returns: 0}} > This is not correct, as "start" is not taken into consideration. > Correct behaviour would be: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 2}} > {{"abc".indexOf("", 9); // returns: -1}} > {{"abc".indexOf("", -3); // returns: -1}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search
[ https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48284: - Description: Currently, UTF8String.indexOf returns 0 when given an empty parameters string, and any integer start value. Examples: {{"abc".indexOf("", 0); // returns: 0}} {{"abc".indexOf("", 2); // returns: 0}} {{"abc".indexOf("", 9); // returns: 0}} {{"abc".indexOf("", -3); // returns: 0}} This is not correct, as "start" is not taken into consideration. Correct behaviour would be: {{"abc".indexOf("", 0); // returns: 0}} {{"abc".indexOf("", 2); // returns: 2}} {{"abc".indexOf("", 9); // returns: -1}} {{"abc".indexOf("", -3); // returns: -1}} was: Currently, UTF8String.indexOf returns 0 when given an empty parameters string, and any integer start value. Examples: {{"abc".indexOf("", 0); // returns: 0}} {{{}"abc".indexOf("", 2); // returns: 0{}}}{{{}{}}} {{"abc".indexOf("", 9); // returns: 0}} {{{}"abc".indexOf("", -3); // returns: 0{}}}{{{}{}}}{{{}{}}} This is not correct, as "start" is not taken into consideration. Correct behaviour would be: {{"abc".indexOf("", 0); // returns: 0}} {{{}"abc".indexOf("", 2); // returns: 2{}}}{{{}{}}} {{"abc".indexOf("", 9); // returns: -1}} {{"abc".indexOf("", -3); // returns: -1}} > Fix UTF8String indexOf behaviour for empty string search > > > Key: SPARK-48284 > URL: https://issues.apache.org/jira/browse/SPARK-48284 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Currently, UTF8String.indexOf returns 0 when given an empty parameters > string, and any integer start value. > Examples: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 0}} > {{"abc".indexOf("", 9); // returns: 0}} > {{"abc".indexOf("", -3); // returns: 0}} > This is not correct, as "start" is not taken into consideration. 
> Correct behaviour would be: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 2}} > {{"abc".indexOf("", 9); // returns: -1}} > {{"abc".indexOf("", -3); // returns: -1}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
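The corrected semantics described in the issue can be modeled as below. This is a hedged sketch under the examples given above, not the actual UTF8String implementation: an empty search string matches at the requested start position, but only when that position is a valid offset into the target. The class and method here are illustrative only.

```java
public class EmptySearchIndexOf {
    // Illustrative model of the corrected behaviour from SPARK-48284.
    static int indexOf(String target, String search, int start) {
        if (search.isEmpty()) {
            // An empty search matches at `start` itself, but only if
            // `start` is a valid position (0 .. target.length()).
            return (start >= 0 && start <= target.length()) ? start : -1;
        }
        // Non-empty searches can delegate to the standard library.
        return target.indexOf(search, start);
    }
}
```

With this model, `indexOf("abc", "", 2)` yields 2 and `indexOf("abc", "", 9)` yields -1, matching the "correct behaviour" table in the issue description.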
[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search
[ https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48284: - Description: Currently, UTF8String.indexOf returns 0 when given an empty search string, and any integer start value. Examples: {{"abc".indexOf("", 0); // returns: 0}} {{"abc".indexOf("", 2); // returns: 0}} {{"abc".indexOf("", 9); // returns: 0}} {{"abc".indexOf("", -3); // returns: 0}} This is not correct, as "start" is not taken into consideration. Correct behaviour would be: {{"abc".indexOf("", 0); // returns: 0}} {{"abc".indexOf("", 2); // returns: 2}} {{"abc".indexOf("", 9); // returns: -1}} {{"abc".indexOf("", -3); // returns: -1}} was:Calling UTF8String.indexOf with an empty search string, and any integer start value. > Fix UTF8String indexOf behaviour for empty string search > > > Key: SPARK-48284 > URL: https://issues.apache.org/jira/browse/SPARK-48284 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Currently, UTF8String.indexOf returns 0 when given an empty search > string, and any integer start value. > Examples: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 0}} > {{"abc".indexOf("", 9); // returns: 0}} > {{"abc".indexOf("", -3); // returns: 0}} > This is not correct, as "start" is not taken into consideration. > Correct behaviour would be: > {{"abc".indexOf("", 0); // returns: 0}} > {{"abc".indexOf("", 2); // returns: 2}} > {{"abc".indexOf("", 9); // returns: -1}} > {{"abc".indexOf("", -3); // returns: -1}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search
[ https://issues.apache.org/jira/browse/SPARK-48284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48284: - Description: Calling UTF8String.indexOf with an empty search string, and any integer start value, incorrectly returns 0. > Fix UTF8String indexOf behaviour for empty string search > > > Key: SPARK-48284 > URL: https://issues.apache.org/jira/browse/SPARK-48284 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > > Calling UTF8String.indexOf with an empty search > string, and any integer > start value, incorrectly returns 0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48284) Fix UTF8String indexOf behaviour for empty string search
Uroš Bojanić created SPARK-48284: Summary: Fix UTF8String indexOf behaviour for empty string search Key: SPARK-48284 URL: https://issues.apache.org/jira/browse/SPARK-48284 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48283) Implement modified Lowercase operation for UTF8_BINARY_LCASE
Uroš Bojanić created SPARK-48283: Summary: Implement modified Lowercase operation for UTF8_BINARY_LCASE Key: SPARK-48283 URL: https://issues.apache.org/jira/browse/SPARK-48283 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48281) Alter logic for: instr, substring_index (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48281: --- Labels: pull-request-available (was: ) > Alter logic for: instr, substring_index (UTF8_BINARY_LCASE) > --- > > Key: SPARK-48281 > URL: https://issues.apache.org/jira/browse/SPARK-48281 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48282) Alter logic for: find_in_set, replace (UTF8_BINARY_LCASE)
Uroš Bojanić created SPARK-48282: Summary: Alter logic for: find_in_set, replace (UTF8_BINARY_LCASE) Key: SPARK-48282 URL: https://issues.apache.org/jira/browse/SPARK-48282 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48221) Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48221: - Summary: Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE) (was: Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)) > Alter logic for: startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE) > --- > > Key: SPARK-48221 > URL: https://issues.apache.org/jira/browse/SPARK-48221 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48281) Alter logic for: instr, substring_index (UTF8_BINARY_LCASE)
Uroš Bojanić created SPARK-48281: Summary: Alter logic for: instr, substring_index (UTF8_BINARY_LCASE) Key: SPARK-48281 URL: https://issues.apache.org/jira/browse/SPARK-48281 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Uroš Bojanić -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48221) Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE)
[ https://issues.apache.org/jira/browse/SPARK-48221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uroš Bojanić updated SPARK-48221: - Summary: Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE) (was: Alter string search logic for UTF8_BINARY_LCASE collation) > Alter logic for startsWith, endsWith, contains, locate (UTF8_BINARY_LCASE) > -- > > Key: SPARK-48221 > URL: https://issues.apache.org/jira/browse/SPARK-48221 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48269) DB2: Document Mapping Spark SQL Data Types from DB2 and add tests
[ https://issues.apache.org/jira/browse/SPARK-48269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-48269: Assignee: Kent Yao > DB2: Document Mapping Spark SQL Data Types from DB2 and add tests > - > > Key: SPARK-48269 > URL: https://issues.apache.org/jira/browse/SPARK-48269 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48269) DB2: Document Mapping Spark SQL Data Types from DB2 and add tests
[ https://issues.apache.org/jira/browse/SPARK-48269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-48269. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46572 [https://github.com/apache/spark/pull/46572] > DB2: Document Mapping Spark SQL Data Types from DB2 and add tests > - > > Key: SPARK-48269 > URL: https://issues.apache.org/jira/browse/SPARK-48269 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Tests >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48280) Add Expression Walker for Testing
Mihailo Milosevic created SPARK-48280: - Summary: Add Expression Walker for Testing Key: SPARK-48280 URL: https://issues.apache.org/jira/browse/SPARK-48280 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48277) Improve error message for ErrorClassesJsonReader.getErrorMessage
[ https://issues.apache.org/jira/browse/SPARK-48277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48277. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46584 [https://github.com/apache/spark/pull/46584] > Improve error message for ErrorClassesJsonReader.getErrorMessage > > > Key: SPARK-48277 > URL: https://issues.apache.org/jira/browse/SPARK-48277 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48160: --- Assignee: Uroš Bojanić > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48160. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46508 [https://github.com/apache/spark/pull/46508] > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48160) XPath expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-48160: --- Labels: pull-request-available (was: ) > XPath expressions (all collations) > -- > > Key: SPARK-48160 > URL: https://issues.apache.org/jira/browse/SPARK-48160 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-48162: --- Assignee: Uroš Bojanić > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48162) Miscellaneous expressions (all collations)
[ https://issues.apache.org/jira/browse/SPARK-48162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-48162. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46461 [https://github.com/apache/spark/pull/46461] > Miscellaneous expressions (all collations) > -- > > Key: SPARK-48162 > URL: https://issues.apache.org/jira/browse/SPARK-48162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Assignee: Uroš Bojanić >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org