[jira] [Commented] (SPARK-28464) Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide"
[ https://issues.apache.org/jira/browse/SPARK-28464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889641#comment-16889641 ] Arun Pandian commented on SPARK-28464: -- Opened a pull request [https://github.com/apache/spark/pull/25219] > Document kafka minPartitions option in "Structured Streaming + Kafka > Integration Guide" > > > Key: SPARK-28464 > URL: https://issues.apache.org/jira/browse/SPARK-28464 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.3 >Reporter: Arun Pandian >Priority: Minor > > SPARK-23541 added the option "minPartitions" to Kafka source. "minPartitions" > is missing in the "Structured Streaming + Kafka Integration Guide" and needs > to be documented. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28464) Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide"
Arunpandian Ganesan created SPARK-28464: --- Summary: Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide" Key: SPARK-28464 URL: https://issues.apache.org/jira/browse/SPARK-28464 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.3 Reporter: Arunpandian Ganesan SPARK-23541 added the option "minPartitions" to Kafka source. "minPartitions" is missing in the "Structured Streaming + Kafka Integration Guide" and needs to be documented.
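For context, the option being documented is set on the Kafka source like any other read option. A minimal sketch of where it plugs in (the broker address and topic name "events" are hypothetical, and `spark` is assumed to be an existing SparkSession with the spark-sql-kafka package on the classpath):

```python
# Hedged sketch: broker/topic names are placeholders; requires a running
# Spark session with the Kafka connector available.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      # minPartitions asks Spark to split the fetched Kafka offset ranges
      # into at least this many Spark partitions, so read parallelism can
      # exceed the number of topic partitions.
      .option("minPartitions", "10")
      .load())
```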
[jira] [Created] (SPARK-28463) Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal
Yuming Wang created SPARK-28463: --- Summary: Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal Key: SPARK-28463 URL: https://issues.apache.org/jira/browse/SPARK-28463 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang How to reproduce this issue: {code:sh} build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2 export SPARK_PREPEND_CLASSES=true sbin/start-thriftserver.sh [root@spark-3267648 spark]# bin/beeline -u jdbc:hive2://localhost:1/default -e "select cast(1 as decimal(38, 18));" Connecting to jdbc:hive2://localhost:1/default Connected to: Spark SQL (version 3.0.0-SNAPSHOT) Driver: Hive JDBC (version 2.3.5) Transaction isolation: TRANSACTION_REPEATABLE_READ Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0) Closing: 0: jdbc:hive2://localhost:1/default {code}
[jira] [Created] (SPARK-28462) Add a prefix '*' to non-nullable attribute names in PlanTestBase.comparePlans failures
Takeshi Yamamuro created SPARK-28462: Summary: Add a prefix '*' to non-nullable attribute names in PlanTestBase.comparePlans failures Key: SPARK-28462 URL: https://issues.apache.org/jira/browse/SPARK-28462 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Takeshi Yamamuro This ticket proposes to add a prefix '*' to non-nullable attribute names in PlanTestBase.comparePlans failures. In the current master, nullability mismatches might generate the same error message for left/right logical plans like this; {code} // This failure message was extracted from #24765 - constraints should be inferred from aliased literals *** FAILED *** == FAIL: Plans do not match === !'Join Inner, (two#0 = a#0) 'Join Inner, (two#0 = a#0) :- Filter (isnotnull(a#0) AND (2 <=> a#0)) :- Filter (isnotnull(a#0) AND (2 <=> a#0)) : +- LocalRelation , [a#0, b#0, c#0] : +- LocalRelation , [a#0, b#0, c#0] +- Project [2 AS two#0] +- Project [2 AS two#0] +- LocalRelation , [a#0, b#0, c#0] +- LocalRelation , [a#0, b#0, c#0] (PlanTest.scala:145) {code} This ticket intends to change this error message to one below; {code} - constraints should be inferred from aliased literals *** FAILED *** == FAIL: Plans do not match === !'Join Inner, (*two#0 = a#0) 'Join Inner, (*two#0 = *a#0) :- Filter (isnotnull(a#0) AND (2 <=> a#0)) :- Filter (isnotnull(a#0) AND (2 <=> a#0)) : +- LocalRelation , [a#0, b#0, c#0] : +- LocalRelation , [a#0, b#0, c#0] +- Project [2 AS two#0] +- Project [2 AS two#0] +- LocalRelation , [a#0, b#0, c#0] +- LocalRelation , [a#0, b#0, c#0] (PlanTest.scala:145) {code}
[jira] [Comment Edited] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column
[ https://issues.apache.org/jira/browse/SPARK-28461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889627#comment-16889627 ] Dongjoon Hyun edited comment on SPARK-28461 at 7/21/19 1:22 AM: Hi, [~yumwang]. Please link the Hive issue to `Issue Links` instead of embedding in the JIRA description. was (Author: dongjoon): Hi, [~yumwang]. Please link the issue to `Issue Links` instead of embedding in the JIRA description. > Pad Decimal numbers with trailing zeros to the scale of the column > -- > > Key: SPARK-28461 > URL: https://issues.apache.org/jira/browse/SPARK-28461 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > PostgreSQL: > {code:sql} > postgres=# select cast(1 as decimal(38, 18)); > numeric > -- > 1.00 > (1 row) > {code} > Spark SQL: > {code:sql} > spark-sql> select cast(1 as decimal(38, 18)); > 1 > spark-sql> > {code} > Hive fixed this issue in HIVE-12063.
[jira] [Commented] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column
[ https://issues.apache.org/jira/browse/SPARK-28461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889627#comment-16889627 ] Dongjoon Hyun commented on SPARK-28461: --- Hi, [~yumwang]. Please link the issue to `Issue Links` instead of embedding in the JIRA description. > Pad Decimal numbers with trailing zeros to the scale of the column > -- > > Key: SPARK-28461 > URL: https://issues.apache.org/jira/browse/SPARK-28461 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > PostgreSQL: > {code:sql} > postgres=# select cast(1 as decimal(38, 18)); > numeric > -- > 1.00 > (1 row) > {code} > Spark SQL: > {code:sql} > spark-sql> select cast(1 as decimal(38, 18)); > 1 > spark-sql> > {code} > Hive fixed this issue in HIVE-12063.
[jira] [Created] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column
Yuming Wang created SPARK-28461: --- Summary: Pad Decimal numbers with trailing zeros to the scale of the column Key: SPARK-28461 URL: https://issues.apache.org/jira/browse/SPARK-28461 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang PostgreSQL: {code:sql} postgres=# select cast(1 as decimal(38, 18)); numeric -- 1.00 (1 row) {code} Spark SQL: {code:sql} spark-sql> select cast(1 as decimal(38, 18)); 1 spark-sql> {code} Hive fixed this issue in HIVE-12063.
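The padding behavior the ticket asks Spark to match can be mimicked with Python's standard `decimal` module. This is only an illustration of the expected display semantics, not Spark's implementation:

```python
from decimal import Decimal

def pad_to_scale(value, scale):
    """Format a decimal value with trailing zeros out to `scale` fractional
    digits, the way PostgreSQL renders a cast to decimal(38, 18)."""
    # quantize pins the exponent to -scale, which forces trailing zeros.
    return str(Decimal(value).quantize(Decimal(1).scaleb(-scale)))

print(pad_to_scale(1, 18))  # 1.000000000000000000
```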
[jira] [Commented] (SPARK-27441) Add read/write tests to Hive serde tables
[ https://issues.apache.org/jira/browse/SPARK-27441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889620#comment-16889620 ] Yuming Wang commented on SPARK-27441: - [~dongjoon] Could you update the status? > Add read/write tests to Hive serde tables > - > > Key: SPARK-27441 > URL: https://issues.apache.org/jira/browse/SPARK-27441 > Project: Spark > Issue Type: Sub-task > Components: Tests > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > The versions between Hive, Parquet and ORC after the built-in Hive upgrade to > 2.3.4: > built-in Hive is 1.2.1: > || ||ORC||Parquet|| > |Spark datasource table|1.5.5|1.10.1| > |Spark hive table|Hive built-in|1.6.0| > |Hive 1.2.1|Hive built-in|1.6.0| > built-in Hive is 2.3.4: > || ||ORC||Parquet|| > |Spark datasource table|1.5.5|1.10.1| > |Spark hive table|1.5.5|1.8.1| > |Hive 2.3.4|1.3.3|1.8.1| > We should add a test for Hive Serde table.
[jira] [Resolved] (SPARK-28446) Document Kafka Headers support
[ https://issues.apache.org/jira/browse/SPARK-28446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28446. --- Resolution: Duplicate > Document Kafka Headers support > -- > > Key: SPARK-28446 > URL: https://issues.apache.org/jira/browse/SPARK-28446 > Project: Spark > Issue Type: Documentation > Components: Documentation, Structured Streaming > Affects Versions: 3.0.0 > Reporter: Lee Dongjin > Priority: Minor > > This issue is a follow up of SPARK-23539. > After completing SPARK-23539, the following information about the headers > functionality should be noted in Structured Streaming + Kafka Integration > Guide: > * The requirements to use Headers functionality (i.e., Kafka version). > * How to turn on the Headers functionality.
[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889573#comment-16889573 ] Michael Heuer edited comment on SPARK-28457 at 7/20/19 7:12 PM: We're also seeing this in [our CI builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console] on older Spark versions: {code} + curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz curl: (60) SSL certificate problem: unable to get local issuer certificate {code} was (Author: heuermh): We're also seeing this in [our CI builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console] on older Spark versions: {code} + curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz curl: (60) SSL certificate problem: unable to get local issuer certificate curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. Build step 'Execute shell' marked build as failure {code} > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra > Affects Versions: 3.0.0 > Reporter: Xiao Li > Priority: Blocker > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. > gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} >
[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889573#comment-16889573 ] Michael Heuer edited comment on SPARK-28457 at 7/20/19 7:12 PM: We're also seeing this in [our CI builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console] on older Spark versions: {code} + curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz curl: (60) SSL certificate problem: unable to get local issuer certificate curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. Build step 'Execute shell' marked build as failure {code} was (Author: heuermh): We're also seeing this in [our CI builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console] on older Spark versions: {code} + curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 curl: (60) SSL certificate problem: unable to get local issuer certificate curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. Build step 'Execute shell' marked build as failure {code} > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra > Affects Versions: 3.0.0 > Reporter: Xiao Li > Priority: Blocker > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: >
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889573#comment-16889573 ] Michael Heuer commented on SPARK-28457: --- We're also seeing this in [our CI builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console] on older Spark versions: {code} + curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 curl: (60) SSL certificate problem: unable to get local issuer certificate curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. Build step 'Execute shell' marked build as failure {code} > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra > Affects Versions: 3.0.0 > Reporter: Xiao Li > Priority: Blocker > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. > gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} >
[jira] [Created] (SPARK-28460) Port test from HIVE-11835
Yuming Wang created SPARK-28460: --- Summary: Port test from HIVE-11835 Key: SPARK-28460 URL: https://issues.apache.org/jira/browse/SPARK-28460 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.0 Reporter: Yuming Wang
[jira] [Updated] (SPARK-27500) Add tests for built-in Hive 2.3
[ https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27500: Summary: Add tests for built-in Hive 2.3 (was: Add tests for the built-in Hive 2.3) > Add tests for built-in Hive 2.3 > --- > > Key: SPARK-27500 > URL: https://issues.apache.org/jira/browse/SPARK-27500 > Project: Spark > Issue Type: Umbrella > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > Our Spark will use some of the new features and bug fixes of Hive 2.3, and we > should add tests for these. This is an umbrella JIRA for tracking this.
[jira] [Updated] (SPARK-27441) Add read/write tests to Hive serde tables
[ https://issues.apache.org/jira/browse/SPARK-27441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27441: Summary: Add read/write tests to Hive serde tables (was: Add read/write tests to Hive serde tables(include Parquet vectorized reader)) > Add read/write tests to Hive serde tables > - > > Key: SPARK-27441 > URL: https://issues.apache.org/jira/browse/SPARK-27441 > Project: Spark > Issue Type: Sub-task > Components: Tests > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > The versions between Hive, Parquet and ORC after the built-in Hive upgrade to > 2.3.4: > built-in Hive is 1.2.1: > || ||ORC||Parquet|| > |Spark datasource table|1.5.5|1.10.1| > |Spark hive table|Hive built-in|1.6.0| > |Hive 1.2.1|Hive built-in|1.6.0| > built-in Hive is 2.3.4: > || ||ORC||Parquet|| > |Spark datasource table|1.5.5|1.10.1| > |Spark hive table|1.5.5|1.8.1| > |Hive 2.3.4|1.3.3|1.8.1| > We should add a test for Hive Serde table.
[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp
[ https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889559#comment-16889559 ] Maxim Gekk commented on SPARK-28459: [~shivuson...@gmail.com] I already have draft changes for this, similar to https://github.com/apache/spark/pull/25210 > Date/Time Functions: make_timestamp > --- > > Key: SPARK-28459 > URL: https://issues.apache.org/jira/browse/SPARK-28459 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, > _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double > precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, > minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, > 23.5)}}|{{2013-07-15 08:15:23.5}}| > https://www.postgresql.org/docs/11/functions-datetime.html
[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp
[ https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889556#comment-16889556 ] Shivu Sondur commented on SPARK-28459: -- I will check this. > Date/Time Functions: make_timestamp > --- > > Key: SPARK-28459 > URL: https://issues.apache.org/jira/browse/SPARK-28459 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, > _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double > precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, > minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, > 23.5)}}|{{2013-07-15 08:15:23.5}}| > https://www.postgresql.org/docs/11/functions-datetime.html
[jira] [Commented] (SPARK-24907) Migrate JDBC data source to DataSource API v2
[ https://issues.apache.org/jira/browse/SPARK-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889552#comment-16889552 ] Shiv Prashant Sood commented on SPARK-24907: Pushed a PR with scaffolding for the read/write paths and a first draft implementation of the write (append) path based on current APIs: https://github.com/apache/spark/pull/25211 > Migrate JDBC data source to DataSource API v2 > - > > Key: SPARK-24907 > URL: https://issues.apache.org/jira/browse/SPARK-24907 > Project: Spark > Issue Type: New Feature > Components: SQL > Affects Versions: 2.3.0, 3.0.0 > Reporter: Teng Peng > Priority: Major >
[jira] [Updated] (SPARK-28459) Date/Time Functions: make_timestamp
[ https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28459: -- Description: ||Function||Return Type||Description||Example||Result|| |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}| https://www.postgresql.org/docs/11/functions-datetime.html was: ||Function||Return Type||Description||Example||Result|| |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{)}}|{{date}}|Create date from year, month and day fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}| https://www.postgresql.org/docs/11/functions-datetime.html > Date/Time Functions: make_timestamp > --- > > Key: SPARK-28459 > URL: https://issues.apache.org/jira/browse/SPARK-28459 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.0.0 > Reporter: Yuming Wang > Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, > _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double > precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, > minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, > 23.5)}}|{{2013-07-15 08:15:23.5}}| > https://www.postgresql.org/docs/11/functions-datetime.html
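The semantics described in the table above (a timestamp built from integer fields plus a fractional seconds value) can be sketched with Python's standard library. This is only an illustration of the PostgreSQL behavior being referenced, not the eventual Spark implementation:

```python
from datetime import datetime

def make_timestamp(year, month, day, hour, minute, sec):
    """Build a timestamp from its fields; `sec` may carry a fractional
    part, mirroring make_timestamp(2013, 7, 15, 8, 15, 23.5)."""
    whole = int(sec)
    micros = round((sec - whole) * 1_000_000)
    return datetime(year, month, day, hour, minute, whole, micros)

print(make_timestamp(2013, 7, 15, 8, 15, 23.5))  # 2013-07-15 08:15:23.500000
```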
[jira] [Updated] (SPARK-28432) Date/Time Functions: make_date
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28432: -- Description: ||Function||Return Type||Description||Example||Result|| |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{)}}|{{date}}|Create date from year, month and day fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| https://www.postgresql.org/docs/11/functions-datetime.html was: ||Function||Return Type||Description||Example||Result|| |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{)}}|{{date}}|Create date from year, month and day fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}| https://www.postgresql.org/docs/11/functions-datetime.html > Date/Time Functions: make_date > -- > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
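A minimal Python sketch of the make_date semantics described above (illustrative only; the assumption that out-of-range fields raise an error mirrors PostgreSQL's behavior):

```python
from datetime import date

def make_date(year, month, day):
    """Build a date from year, month and day fields; out-of-range
    fields raise ValueError, mirroring the SQL function's error."""
    return date(year, month, day)

print(make_date(2013, 7, 15))  # 2013-07-15
```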
[jira] [Created] (SPARK-28459) Date/Time Functions: make_timestamp
Dongjoon Hyun created SPARK-28459: - Summary: Date/Time Functions: make_timestamp Key: SPARK-28459 URL: https://issues.apache.org/jira/browse/SPARK-28459 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang ||Function||Return Type||Description||Example||Result|| |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{)}}|{{date}}|Create date from year, month and day fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}| https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28432) Date/Time Functions: make_date
[ https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28432: -- Summary: Date/Time Functions: make_date (was: Date/Time Functions: make_date/make_timestamp) > Date/Time Functions: make_date > -- > > Key: SPARK-28432 > URL: https://issues.apache.org/jira/browse/SPARK-28432 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Return Type||Description||Example||Result|| > |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ > }}{{int}}{{)}}|{{date}}|Create date from year, month and day > fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}| > |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, > _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double > precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, > minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, > 23.5)}}|{{2013-07-15 08:15:23.5}}| > https://www.postgresql.org/docs/11/functions-datetime.html -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24907) Migrate JDBC data source to DataSource API v2
[ https://issues.apache.org/jira/browse/SPARK-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shiv Prashant Sood updated SPARK-24907: --- Affects Version/s: 3.0.0 > Migrate JDBC data source to DataSource API v2 > - > > Key: SPARK-24907 > URL: https://issues.apache.org/jira/browse/SPARK-24907 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.3.0, 3.0.0 >Reporter: Teng Peng >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28243) remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams
[ https://issues.apache.org/jira/browse/SPARK-28243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28243: - Assignee: Huaxin Gao > remove setFeatureSubsetStrategy and setSubsamplingRate from Python > TreeEnsembleParams > - > > Key: SPARK-28243 > URL: https://issues.apache.org/jira/browse/SPARK-28243 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > Remove the deprecated setFeatureSubsetStrategy and setSubsamplingRate from > Python TreeEnsembleParams -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28243) remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams
[ https://issues.apache.org/jira/browse/SPARK-28243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28243. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25046 [https://github.com/apache/spark/pull/25046] > remove setFeatureSubsetStrategy and setSubsamplingRate from Python > TreeEnsembleParams > - > > Key: SPARK-28243 > URL: https://issues.apache.org/jira/browse/SPARK-28243 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.0.0 > > > Remove the deprecated setFeatureSubsetStrategy and setSubsamplingRate from > Python TreeEnsembleParams -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28225) Unexpected behavior for Window functions
[ https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889518#comment-16889518 ] Marco Gaido edited comment on SPARK-28225 at 7/20/19 2:43 PM: -- Let me cite the PostgreSQL documentation to explain the behavior: ??When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY and the default window frame definition produces a "running sum" type of behavior, which may or may not be what's wanted. To obtain aggregation over the whole partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Other frame specifications can be used to obtain other effects.?? So the returned values seem correct to me. was (Author: mgaido): Let me cite the PostgreSQL documentation to explain the behavior: {noformat} When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY and the default window frame definition produces a "running sum" type of behavior, which may or may not be what's wanted. To obtain aggregation over the whole partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Other frame specifications can be used to obtain other effects. {noformat} So the returned values seem correct to me. > Unexpected behavior for Window functions > > > Key: SPARK-28225 > URL: https://issues.apache.org/jira/browse/SPARK-28225 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Andrew Leverentz >Priority: Major > > I've noticed some odd behavior when combining the "first" aggregate function > with an ordered Window. 
> In particular, I'm working with columns created using the syntax > {code} > first($"y", ignoreNulls = true).over(Window.orderBy($"x")) > {code} > Below, I'm including some code which reproduces this issue in a Databricks > notebook. > *Code:* > {code:java} > import org.apache.spark.sql.functions.first > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.{StructType,StructField,IntegerType} > val schema = StructType(Seq( > StructField("x", IntegerType, false), > StructField("y", IntegerType, true), > StructField("z", IntegerType, true) > )) > val input = > spark.createDataFrame(sc.parallelize(Seq( > Row(101, null, 11), > Row(102, null, 12), > Row(103, null, 13), > Row(203, 24, null), > Row(201, 26, null), > Row(202, 25, null) > )), schema = schema) > input.show > val output = input > .withColumn("u1", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u2", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u3", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u4", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > .withColumn("u5", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u6", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u7", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u8", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > output.show > {code} > *Expectation:* > Based on my understanding of how ordered-Window and aggregate functions work, > the results I expected to see were: > * u1 = u2 = constant value of 26 > * u3 = u4 = constant value of 24 > * u5 = u6 = constant value of 11 > * u7 = u8 = constant value of 13 > However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:* > {code:java} > +---+++++---+---+---+---+++ > | x| y| z| u1| u2| u3| u4| u5| u6| u7| u8| > +---+++++---+---+---+---+++ > |203| 24|null| 26| 26| 24| 24| 11| 11|null|null| > |202| 25|null| 26| 26| 24| 24| 11| 11|null|null| > |201| 26|null| 26| 26| 24| 24| 11| 11|null|null| > |103|null| 13|null|null| 24| 24| 11| 11| 13| 13| > |102|null| 12|null|null| 24| 24| 11| 11| 13| 13| > |101|null| 11|null|null| 24| 24| 11| 11| 13| 13| > +---+++++---+---+---+---+++ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28225) Unexpected behavior for Window functions
[ https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889518#comment-16889518 ] Marco Gaido commented on SPARK-28225: - Let me cite the PostgreSQL documentation to explain the behavior: {noformat} When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY and the default window frame definition produces a "running sum" type of behavior, which may or may not be what's wanted. To obtain aggregation over the whole partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Other frame specifications can be used to obtain other effects. {noformat} So the returned values seem correct to me. > Unexpected behavior for Window functions > > > Key: SPARK-28225 > URL: https://issues.apache.org/jira/browse/SPARK-28225 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Andrew Leverentz >Priority: Major > > I've noticed some odd behavior when combining the "first" aggregate function > with an ordered Window. > In particular, I'm working with columns created using the syntax > {code} > first($"y", ignoreNulls = true).over(Window.orderBy($"x")) > {code} > Below, I'm including some code which reproduces this issue in a Databricks > notebook. 
> *Code:* > {code:java} > import org.apache.spark.sql.functions.first > import org.apache.spark.sql.expressions.Window > import org.apache.spark.sql.Row > import org.apache.spark.sql.types.{StructType,StructField,IntegerType} > val schema = StructType(Seq( > StructField("x", IntegerType, false), > StructField("y", IntegerType, true), > StructField("z", IntegerType, true) > )) > val input = > spark.createDataFrame(sc.parallelize(Seq( > Row(101, null, 11), > Row(102, null, 12), > Row(103, null, 13), > Row(203, 24, null), > Row(201, 26, null), > Row(202, 25, null) > )), schema = schema) > input.show > val output = input > .withColumn("u1", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u2", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u3", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u4", first($"y", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > .withColumn("u5", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc_nulls_last))) > .withColumn("u6", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".asc))) > .withColumn("u7", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc_nulls_last))) > .withColumn("u8", first($"z", ignoreNulls = > true).over(Window.orderBy($"x".desc))) > output.show > {code} > *Expectation:* > Based on my understanding of how ordered-Window and aggregate functions work, > the results I expected to see were: > * u1 = u2 = constant value of 26 > * u3 = u4 = constant value of 24 > * u5 = u6 = constant value of 11 > * u7 = u8 = constant value of 13 > However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:* > {code:java} > +---+++++---+---+---+---+++ > | x| y| z| u1| u2| u3| u4| u5| u6| u7| u8| > +---+++++---+---+---+---+++ > |203| 24|null| 26| 26| 24| 24| 11| 11|null|null| > |202| 25|null| 26| 26| 24| 24| 11| 11|null|null| > |201| 26|null| 26| 26| 24| 24| 11| 11|null|null| > |103|null| 13|null|null| 24| 24| 11| 11| 13| 13| > |102|null| 12|null|null| 24| 24| 11| 11| 13| 13| > |101|null| 11|null|null| 24| 24| 11| 11| 13| 13| > +---+++++---+---+---+---+++ > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
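The behavior the comment describes — the default frame running from the start of the partition only up to the current row — can be reproduced with a small Python simulation (an illustrative model, not Spark's implementation):

```python
def first_ignore_nulls_running(rows, key, val, desc=False):
    """Emulate first(val, ignoreNulls=true) OVER (ORDER BY key): with the
    default frame (UNBOUNDED PRECEDING .. CURRENT ROW), each row only
    sees values at or before itself, so leading rows can get null."""
    out = {}
    seen = None
    for r in sorted(rows, key=lambda r: r[key], reverse=desc):
        if seen is None and r[val] is not None:
            seen = r[val]
        out[r[key]] = seen
    return out

rows = [{"x": 101, "y": None}, {"x": 102, "y": None}, {"x": 103, "y": None},
        {"x": 201, "y": 26}, {"x": 202, "y": 25}, {"x": 203, "y": 24}]

u2 = first_ignore_nulls_running(rows, "x", "y")             # ascending, like u2
u4 = first_ignore_nulls_running(rows, "x", "y", desc=True)  # descending, like u4
print(u2)  # x=101..103 -> None (no non-null y yet), x>=201 -> 26
print(u4)  # every row -> 24, since x=203 comes first and its y is non-null
```

This matches the table in the report: the "unexpected" nulls appear exactly where no non-null value has entered the running frame yet.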
[jira] [Commented] (SPARK-28304) FileFormatWriter introduces an unconditional sort, even when all attributes are constants
[ https://issues.apache.org/jira/browse/SPARK-28304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889436#comment-16889436 ] Eyal Farago commented on SPARK-28304: - [~joshrosen], thanks for your comment. I think this is a broader subject than just FileFormatWriter, as any sort operator can either simplify its ordering or be completely eliminated when some/all of the sort columns are known to be constant. Furthermore, in some cases one ordering can be used to satisfy several orderings (or, on the other hand, ordering requirements of downstream operators can be relaxed) - so I believe this is best handled by the optimizer/planner, essentially by making EnsureRequirements aware of these kinds of cases. As a short-term fix, the SortExec operator can filter away constant ordering columns in the execute() method, and in case it's left with no ordering columns, simply bypass the sort altogether. BTW, is there any reason the FileFormatWriter doesn't take the regular optimizing/planning path? > FileFormatWriter introduces an unconditional sort, even when all attributes > are constants > > > Key: SPARK-28304 > URL: https://issues.apache.org/jira/browse/SPARK-28304 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.2 >Reporter: Eyal Farago >Priority: Major > Labels: performance > > FileFormatWriter derives a required sort order based on the partition > columns, bucketing columns and explicitly required ordering. However, in some > use cases some (or even all) of these fields are constant; in those cases the > sort can be skipped. > i.e. in my use case, we add a GUUID column identifying a specific > (incremental) load; this can be thought of as a batch id. Since we run one > batch at a time, this column is always a constant, which means there's no need > to sort on it. Since we don't use bucketing or require an explicit > ordering, the entire sort can be skipped in our case. 
> > I suggest: > # filter away constant columns from the required ordering calculated by > FileFormatWriter > # generalizing this to any Sort operator in a spark plan. > # introduce optimizer rules to remove constants from sort ordering, > potentially eliminating the sort operator altogether. > # modify EnsureRequirements to be aware of constant field when deciding > whether to introduce a sort or not. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
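Suggestions 1 and 3 above boil down to the same transformation: drop ordering columns that are known constants and skip the sort if none remain. A minimal sketch under that assumption (hypothetical names, not Spark's actual FileFormatWriter/EnsureRequirements code):

```python
def prune_sort_order(sort_cols, constant_cols):
    """Drop ordering columns known to be constant, and report whether
    the whole sort can be bypassed because nothing is left to order by."""
    remaining = [c for c in sort_cols if c not in constant_cols]
    return remaining, not remaining

# e.g. a batch-id partition column that is the same for every row
print(prune_sort_order(["load_guid", "event_date"], {"load_guid"}))
# (['event_date'], False) -> sort on the remaining column only
print(prune_sort_order(["load_guid"], {"load_guid"}))
# ([], True) -> the sort operator can be skipped entirely
```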
[jira] [Commented] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING
[ https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889422#comment-16889422 ] Marco Gaido commented on SPARK-28386: - I think this is a duplicate of SPARK-26741. I have a PR for it but it seems a bit stuck. Any help in reviewing it would be very appreciated. Thanks. > Cannot resolve ORDER BY columns with GROUP BY and HAVING > > > Key: SPARK-28386 > URL: https://issues.apache.org/jira/browse/SPARK-28386 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > {code:sql} > CREATE TABLE test_having (a int, b int, c string, d string) USING parquet; > INSERT INTO test_having VALUES (0, 1, '', 'A'); > INSERT INTO test_having VALUES (1, 2, '', 'b'); > INSERT INTO test_having VALUES (2, 2, '', 'c'); > INSERT INTO test_having VALUES (3, 3, '', 'D'); > INSERT INTO test_having VALUES (4, 3, '', 'e'); > INSERT INTO test_having VALUES (5, 3, '', 'F'); > INSERT INTO test_having VALUES (6, 4, '', 'g'); > INSERT INTO test_having VALUES (7, 4, '', 'h'); > INSERT INTO test_having VALUES (8, 4, '', 'I'); > INSERT INTO test_having VALUES (9, 4, '', 'j'); > SELECT lower(c), count(c) FROM test_having > GROUP BY lower(c) HAVING count(*) > 2 > ORDER BY lower(c); > {code} > {noformat} > spark-sql> SELECT lower(c), count(c) FROM test_having > > GROUP BY lower(c) HAVING count(*) > 2 > > ORDER BY lower(c); > Error in query: cannot resolve '`c`' given input columns: [lower(c), > count(c)]; line 3 pos 19; > 'Sort ['lower('c) ASC NULLS FIRST], true > +- Project [lower(c)#158, count(c)#159L] >+- Filter (count(1)#161L > cast(2 as bigint)) > +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS > count(c)#159L, count(1) AS count(1)#161L] > +- SubqueryAlias test_having > +- Relation[a#5,b#6,c#7,d#8] parquet > {noformat} > But it works when setting an alias: > {noformat} > spark-sql> SELECT lower(c) withAias, count(c) FROM 
test_having > > GROUP BY lower(c) HAVING count(*) > 2 > > ORDER BY withAias; > 3 > 4 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
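For comparison, the unaliased ORDER BY lower(c) form is accepted by other engines. The same reproduction run on SQLite via Python's stdlib (note the sample data uses an empty string for every c, so everything collapses into a single group):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_having (a int, b int, c text, d text)")
conn.executemany(
    "INSERT INTO test_having VALUES (?, ?, ?, ?)",
    [(0, 1, '', 'A'), (1, 2, '', 'b'), (2, 2, '', 'c'), (3, 3, '', 'D'),
     (4, 3, '', 'e'), (5, 3, '', 'F'), (6, 4, '', 'g'), (7, 4, '', 'h'),
     (8, 4, '', 'I'), (9, 4, '', 'j')])

# ORDER BY repeats the grouping expression instead of an alias --
# the exact form Spark fails to resolve
rows = conn.execute("""
    SELECT lower(c), count(c) FROM test_having
    GROUP BY lower(c) HAVING count(*) > 2
    ORDER BY lower(c)
""").fetchall()
print(rows)  # [('', 10)]
```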
[jira] [Commented] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889405#comment-16889405 ] Hyukjin Kwon commented on SPARK-27815: -- There was a followup at https://github.com/apache/spark/pull/25207 > Improve SQL optimizer's predicate pushdown performance for cascading joins > --- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
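The single-pass idea can be illustrated with a toy plan rewriter: a rule that pushes a filter through one join per application forces the batch to re-run once per join level, while a recursive rule reaches the same fixpoint in one pass. This is an illustrative model only, not Catalyst's actual PushDownPredicate/PushThroughJoin rules:

```python
def push_step(plan):
    """One rewrite per call: move a Filter into the left child of the
    Join directly beneath it (assuming the predicate only references
    the left side), searching down the left spine."""
    if plan[0] == "filter" and plan[2][0] == "join":
        _, pred, (_, left, right) = plan
        return ("join", ("filter", pred, left), right)
    if plan[0] == "join":
        return ("join", push_step(plan[1]), plan[2])
    return plan

def push_all(plan):
    """Fused rule: push the Filter through the whole join chain at once."""
    if plan[0] == "filter" and plan[2][0] == "join":
        _, pred, (_, left, right) = plan
        return ("join", push_all(("filter", pred, left)), right)
    return plan

scan = lambda name: ("scan", name)
plan = ("filter", "p",
        ("join", ("join", ("join", scan("a"), scan("b")), scan("c")),
         scan("d")))

# run the one-step rule to a fixpoint, counting batch executions
passes, cur = 0, plan
while True:
    nxt = push_step(cur)
    passes += 1
    if nxt == cur:
        break
    cur = nxt

print(passes)                 # 4: one per join level, plus the no-op check
print(cur == push_all(plan))  # True: the recursive rule gets there in one pass
```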
[jira] [Commented] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889404#comment-16889404 ] Hyukjin Kwon commented on SPARK-28155: -- SPARK-28155 was switched to SPARK-27815 due to a commit-log mix-up. > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to > file source v2. This should be removed and file source v2 should not accept > SaveMode. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28458) CLONE - do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889403#comment-16889403 ] Hyukjin Kwon commented on SPARK-28458: -- I cloned this to keep a copy for a while during the JIRA switch between SPARK-28155 and SPARK-27815 > CLONE - do not leak SaveMode to file source v2 > -- > > Key: SPARK-28458 > URL: https://issues.apache.org/jira/browse/SPARK-28458 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to > file source v2. This should be removed and file source v2 should not accept > SaveMode. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28458) CLONE - do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28458. -- Resolution: Invalid > CLONE - do not leak SaveMode to file source v2 > -- > > Key: SPARK-28458 > URL: https://issues.apache.org/jira/browse/SPARK-28458 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to > file source v2. This should be removed and file source v2 should not accept > SaveMode. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Reporter: Wenchen Fan (was: Yesheng Ma) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Priority: Blocker (was: Major) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Blocker > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Description: Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to file source v2. This should be removed and file source v2 should not accept SaveMode. (was: The current catalyst optimizer's predicate pushdown is divided into two separate rules: PushDownPredicate and PushThroughJoin. This is not efficient for optimizing cascading joins such as TPC-DS q64, where a whole default batch is re-executed just due to this. We need a more efficient approach to pushdown predicate as much as possible in a single pass.) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to > file source v2. This should be removed and file source v2 should not accept > SaveMode. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Comment: was deleted (was: [https://github.com/apache/spark/pull/24956]) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Blocker > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-28155: -- Assignee: (was: Yesheng Ma) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Priority: Blocker > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Issue Type: Bug (was: Improvement) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889402#comment-16889402 ] Hyukjin Kwon commented on SPARK-27815: -- There was a mix-up between the JIRA ID and the commit. This JIRA has now been switched to SPARK-28155 > Improve SQL optimizer's predicate pushdown performance for cascading joins > --- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Target Version/s: 3.0.0 Fix Version/s: (was: 3.0.0) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Summary: do not leak SaveMode to file source v2 (was: Improve SQL optimizer's predicate pushdown performance for cascading joins) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Comment: was deleted (was: [~cloud_fan] Hi Wenchen, do you think it is viable solution mentioned below? Create a new V2WriteCommand case class and its Exec named maybe _OverwriteByQueryId_ to replace WriteToDataSourceV2, which accepts a QueryId so that tests can pass. Or should we keep WriteToDataSourceV2?) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Summary: Improve SQL optimizer's predicate pushdown performance for cascading joins (was: do not leak SaveMode to file source v2) > Improve SQL optimizer's predicate pushdown performance for cascading joins > --- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27815. -- Resolution: Fixed > Improve SQL optimizer's predicate pushdown performance for cascading joins > --- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889401#comment-16889401 ] Hyukjin Kwon commented on SPARK-27815: -- Fixed in https://github.com/apache/spark/pull/24956 > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Comment: was deleted (was: User 'yeshengm' has created a pull request for this issue: https://github.com/apache/spark/pull/24956) > Improve SQL optimizer's predicate pushdown performance for cascading joins > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28155: - Comment: was deleted (was: This is committed with a wrong JIRA ID, `SPARK-28155`.) > Improve SQL optimizer's predicate pushdown performance for cascading joins > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins
[ https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889400#comment-16889400 ] Hyukjin Kwon commented on SPARK-28155: -- [~Tonix517] said: Hi Wenchen, do you think it is viable solution mentioned below? Create a new V2WriteCommand case class and its Exec named maybe OverwriteByQueryId to replace WriteToDataSourceV2, which accepts a QueryId so that tests can pass. Or should we keep WriteToDataSourceV2? > Improve SQL optimizer's predicate pushdown performance for cascading joins > -- > > Key: SPARK-28155 > URL: https://issues.apache.org/jira/browse/SPARK-28155 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
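The quoted comment proposes a new V2WriteCommand that accepts a query id in place of WriteToDataSourceV2. A minimal structural sketch of that proposal (the names OverwriteByQueryId and QueryId come from the comment; the fields and method here are invented purely for illustration, not Spark's API):

```python
# Hypothetical shape of the proposed command: the query id travels with
# the write command itself rather than being threaded in separately.
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryId:
    value: str

@dataclass(frozen=True)
class OverwriteByQueryId:
    query_id: QueryId
    destination: str  # hypothetical: where the overwrite is directed

    def describe(self):
        return f"overwrite {self.destination} for query {self.query_id.value}"
```

Making the command carry the id explicitly is what would let tests construct and inspect it without a running query.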
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Issue Type: Improvement (was: Bug) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Reporter: Yesheng Ma (was: Wenchen Fan) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yesheng Ma >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Priority: Major (was: Blocker) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-27815: Assignee: Yesheng Ma > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Yesheng Ma >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Target Version/s: (was: 3.0.0) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Description: The current catalyst optimizer's predicate pushdown is divided into two separate rules: PushDownPredicate and PushThroughJoin. This is not efficient for optimizing cascading joins such as TPC-DS q64, where a whole default batch is re-executed just due to this. We need a more efficient approach to pushdown predicate as much as possible in a single pass. (was: Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to file source v2. This should be removed and file source v2 should not accept SaveMode.) > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Blocker > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2
[ https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-27815: - Fix Version/s: 3.0.0 > do not leak SaveMode to file source v2 > -- > > Key: SPARK-27815 > URL: https://issues.apache.org/jira/browse/SPARK-27815 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Priority: Major > Fix For: 3.0.0 > > > The current catalyst optimizer's predicate pushdown is divided into two > separate rules: PushDownPredicate and PushThroughJoin. This is not efficient > for optimizing cascading joins such as TPC-DS q64, where a whole default > batch is re-executed just due to this. We need a more efficient approach to > pushdown predicate as much as possible in a single pass. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28458) CLONE - do not leak SaveMode to file source v2
Hyukjin Kwon created SPARK-28458: Summary: CLONE - do not leak SaveMode to file source v2 Key: SPARK-28458 URL: https://issues.apache.org/jira/browse/SPARK-28458 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to file source v2. This should be removed and file source v2 should not accept SaveMode. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
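The description above says the SaveMode hack should be removed so that file source v2 never sees a SaveMode. A minimal sketch of that boundary (not Spark's actual code; `FileSourceV2` and `write` are invented stand-ins) showing the writer translating SaveMode into explicit operations before the source is touched:

```python
# Resolve SaveMode on the DataFrameWriter side; the v2 source exposes
# only explicit truncate/append operations and never accepts a SaveMode.
from enum import Enum

class SaveMode(Enum):
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"

class FileSourceV2:
    """Stand-in v2 source: knows truncate/append, nothing about SaveMode."""
    def __init__(self):
        self.rows = []
    def truncate(self):
        self.rows.clear()
    def append(self, batch):
        self.rows.extend(batch)

def write(source, batch, mode):
    """Writer-side translation of SaveMode into explicit v2 operations."""
    exists = bool(source.rows)
    if mode is SaveMode.IGNORE and exists:
        return
    if mode is SaveMode.ERROR_IF_EXISTS and exists:
        raise ValueError("output already exists")
    if mode is SaveMode.OVERWRITE:
        source.truncate()
    source.append(batch)
```

The point of the sketch: all four modes are decided before any source method runs, so the source API surface stays mode-free.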
[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889392#comment-16889392 ] Xiao Li edited comment on SPARK-28457 at 7/20/19 6:46 AM: -- [~shaneknapp], Could you help take a look at this? was (Author: smilegator): [~shaneknapp] > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). 
> If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. > gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889392#comment-16889392 ] Xiao Li commented on SPARK-28457: - [~shaneknapp] > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > https://curl.haxx.se/docs/sslcerts.html > curl performs SSL certificate verification by default, using a "bundle" > of Certificate Authority (CA) public keys (CA certs). If the default > bundle file isn't adequate, you can specify an alternate file > using the --cacert option. > If this HTTPS server uses a certificate signed by a CA represented in > the bundle, the certificate verification probably failed due to a > problem with the certificate (it might be expired, or the name might > not match the domain name in the URL). > If you'd like to turn off curl's verification of the certificate, use > the -k (or --insecure) option. 
> gzip: stdin: unexpected end of file > tar: Child returned status 1 > tar: Error is not recoverable: exiting now > Using `mvn` from path: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn > build/mvn: line 163: > /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: > No such file or directory > Build step 'Execute shell' marked build as failure > Finished: FAILURE > {code} > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28457: Description: Build broke since this afternoon. [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] [spark-master-compile-maven-hadoop-3.2 #171 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] [spark-master-lint #10599 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] {code:java} https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz curl: (60) SSL certificate problem: unable to get local issuer certificate More details here: https://curl.haxx.se/docs/sslcerts.html curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. 
gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory Build step 'Execute shell' marked build as failure Finished: FAILURE {code} was: [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] [spark-master-compile-maven-hadoop-3.2 #171 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] [spark-master-lint #10599 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] [https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz] curl: (60) SSL certificate problem: unable to get local issuer certificate More details here: [https://curl.haxx.se/docs/sslcerts.html] curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. 
gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory Build step 'Execute shell' marked build as failure Finished: FAILURE > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > > Build broke since this afternoon. > [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] > [spark-master-compile-maven-hadoop-3.2 #171 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] > [spark-master-lint #10599 (broken since this > build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] > > {code:java} > > >
[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28457: Description: [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/] [spark-master-compile-maven-hadoop-3.2 #171 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/] [spark-master-lint #10599 (broken since this build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/] [https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz] curl: (60) SSL certificate problem: unable to get local issuer certificate More details here: [https://curl.haxx.se/docs/sslcerts.html] curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. 
gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory Build step 'Execute shell' marked build as failure Finished: FAILURE was: + build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver -Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile exec: curl --progress-bar -L https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz exec: curl --progress-bar -L https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz curl: (60) SSL certificate problem: unable to get local issuer certificate More details here: https://curl.haxx.se/docs/sslcerts.html curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. 
gzip: stdin: unexpected end of file tar: Child returned status 1 tar: Error is not recoverable: exiting now Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory Build step 'Execute shell' marked build as failure Finished: FAILURE > curl: (60) SSL certificate problem: unable to get local issuer certificate > More details here: > -- > > Key: SPARK-28457 > URL: https://issues.apache.org/jira/browse/SPARK-28457 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Blocker > > [spark-master-compile-maven-hadoop-2.7 #10224
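The curl message quoted in the log lists the standard remedies: point curl at an explicit CA bundle with --cacert, or, as a last resort, disable verification with -k. A minimal sketch of that decision, assuming hypothetical CA_BUNDLE and ALLOW_INSECURE environment variables (these are illustrative knobs, not flags of Spark's build/mvn script):

```shell
#!/usr/bin/env sh
# Sketch: download a URL, preferring an explicit CA bundle over disabling
# certificate verification. CA_BUNDLE and ALLOW_INSECURE are assumed,
# illustrative knobs -- not part of Spark's build/mvn.
fetch() {
  url="$1"
  out="$2"
  if [ -n "${CA_BUNDLE:-}" ] && [ -f "${CA_BUNDLE:-}" ]; then
    # Verify the server against the given CA bundle (curl --cacert).
    curl --fail --silent --show-error -L --cacert "$CA_BUNDLE" -o "$out" "$url"
  elif [ "${ALLOW_INSECURE:-no}" = "yes" ]; then
    # Last resort from the curl message: skip verification (curl -k).
    curl --fail --silent --show-error -L -k -o "$out" "$url"
  else
    # Default: rely on the system CA store.
    curl --fail --silent --show-error -L -o "$out" "$url"
  fi
}
```

Exporting CA_BUNDLE to the host's CA file (the path varies by distribution) would exercise the first branch; leaving both variables unset behaves like a plain curl call.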
[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28457:
Priority: Blocker (was: Major)

> curl: (60) SSL certificate problem: unable to get local issuer certificate
> More details here:
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 3.0.0
> Reporter: Xiao Li
> Priority: Blocker

-- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
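The gzip/tar errors in the log are a knock-on effect: the failed curl download left a truncated or empty archive, tar aborted mid-extraction, and the subsequent mvn invocation then hit a missing path. A hedged sketch of guarding the extraction step, so a bad download fails loudly up front (extract_checked is an illustrative helper, not part of build/mvn):

```shell
#!/usr/bin/env sh
# Sketch: test a downloaded .tgz with gzip -t before extracting, so a
# truncated download fails with a clear message instead of producing
# "gzip: stdin: unexpected end of file" mid-extraction.
# extract_checked is a hypothetical helper, not part of build/mvn.
extract_checked() {
  archive="$1"
  if gzip -t "$archive" 2>/dev/null; then
    # Archive is a complete, valid gzip stream; safe to unpack.
    tar -xzf "$archive"
  else
    echo "corrupt or incomplete download: $archive" >&2
    return 1
  fi
}
```

Failing before extraction also avoids the follow-on "No such file or directory" error, since a partially extracted maven directory is never left behind.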
[jira] [Created] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
Xiao Li created SPARK-28457:
---
Summary: curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
Key: SPARK-28457
URL: https://issues.apache.org/jira/browse/SPARK-28457
Project: Spark
Issue Type: Bug
Components: Project Infra
Affects Versions: 3.0.0
Reporter: Xiao Li

+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver -Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L [https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz]
exec: curl --progress-bar -L [https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz]
exec: curl --progress-bar -L [https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: [https://curl.haxx.se/docs/sslcerts.html]
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE
[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:
[ https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-28457:
Description:

+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver -Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz
exec: curl --progress-bar -L https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz
exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn: No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE
[jira] [Resolved] (SPARK-28282) Convert and port 'inline-table.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28282.
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 25124
[https://github.com/apache/spark/pull/25124]

> Convert and port 'inline-table.sql' into UDF test base
> --
>
> Key: SPARK-28282
> URL: https://issues.apache.org/jira/browse/SPARK-28282
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Terry Kim
> Priority: Major
> Fix For: 3.0.0
[jira] [Assigned] (SPARK-28282) Convert and port 'inline-table.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28282:
Assignee: Terry Kim

> Convert and port 'inline-table.sql' into UDF test base
> --
>
> Key: SPARK-28282
> URL: https://issues.apache.org/jira/browse/SPARK-28282
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Terry Kim
> Priority: Major
[jira] [Assigned] (SPARK-28279) Convert and port 'group-analytics.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28279:
Assignee: Stavros Kontopoulos

> Convert and port 'group-analytics.sql' into UDF test base
> -
>
> Key: SPARK-28279
> URL: https://issues.apache.org/jira/browse/SPARK-28279
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Stavros Kontopoulos
> Priority: Major
[jira] [Resolved] (SPARK-28279) Convert and port 'group-analytics.sql' into UDF test base
[ https://issues.apache.org/jira/browse/SPARK-28279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28279.
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 25196
[https://github.com/apache/spark/pull/25196]

> Convert and port 'group-analytics.sql' into UDF test base
> -
>
> Key: SPARK-28279
> URL: https://issues.apache.org/jira/browse/SPARK-28279
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark, SQL, Tests
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Stavros Kontopoulos
> Priority: Major
> Fix For: 3.0.0