[jira] [Commented] (SPARK-28464) Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide"

2019-07-20 Thread Arun Pandian (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889641#comment-16889641
 ] 

Arun Pandian commented on SPARK-28464:
--

Opened a pull request [https://github.com/apache/spark/pull/25219]
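For reference, a minimal sketch of how the option is used (the option name comes 
from SPARK-23541; the broker, topic, and value below are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-minPartitions").getOrCreate()

// Ask Spark to split Kafka topic-partitions into at least 10 input
// partitions: a lower bound on read parallelism, not an exact count.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("minPartitions", "10")
  .load()
{code}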

> Document kafka minPartitions option in "Structured Streaming + Kafka 
> Integration Guide" 
> 
>
> Key: SPARK-28464
> URL: https://issues.apache.org/jira/browse/SPARK-28464
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: Arun Pandian
>Priority: Minor
>
> SPARK-23541 added the option "minPartitions" to the Kafka source. "minPartitions" 
> is missing from the "Structured Streaming + Kafka Integration Guide" and needs 
> to be documented.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28464) Document kafka minPartitions option in "Structured Streaming + Kafka Integration Guide"

2019-07-20 Thread Arunpandian Ganesan (JIRA)
Arunpandian Ganesan created SPARK-28464:
---

 Summary: Document kafka minPartitions option in "Structured 
Streaming + Kafka Integration Guide" 
 Key: SPARK-28464
 URL: https://issues.apache.org/jira/browse/SPARK-28464
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.4.3
Reporter: Arunpandian Ganesan


SPARK-23541 added the option "minPartitions" to the Kafka source. "minPartitions" 
is missing from the "Structured Streaming + Kafka Integration Guide" and needs to 
be documented.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28463) Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal

2019-07-20 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28463:
---

 Summary: Thriftserver throws java.math.BigDecimal incompatible 
with org.apache.hadoop.hive.common.type.HiveDecimal
 Key: SPARK-28463
 URL: https://issues.apache.org/jira/browse/SPARK-28463
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


How to reproduce this issue:
{code:sh}
build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2
export SPARK_PREPEND_CLASSES=true
sbin/start-thriftserver.sh

[root@spark-3267648 spark]# bin/beeline -u jdbc:hive2://localhost:1/default 
-e "select cast(1 as decimal(38, 18));"
Connecting to jdbc:hive2://localhost:1/default
Connected to: Spark SQL (version 3.0.0-SNAPSHOT)
Driver: Hive JDBC (version 2.3.5)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with 
org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0)
Closing: 0: jdbc:hive2://localhost:1/default
{code}
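The failure suggests a {{java.math.BigDecimal}} reaching code that expects 
Hive's {{HiveDecimal}}. A hedged sketch of the distinction ({{HiveDecimal.create}} 
is Hive's factory method; the Thrift server code path itself is not shown):

{code:scala}
import java.math.BigDecimal
import org.apache.hadoop.hive.common.type.HiveDecimal

val jd = new BigDecimal("1.000000000000000000")
// A blind cast such as jd.asInstanceOf[HiveDecimal] throws the
// ClassCastException reported above; an explicit conversion does not.
val hd: HiveDecimal = HiveDecimal.create(jd)
{code}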



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28462) Add a prefix '*' to non-nullable attribute names in PlanTestBase.comparePlans failures

2019-07-20 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-28462:


 Summary: Add a prefix '*' to non-nullable attribute names in 
PlanTestBase.comparePlans failures
 Key: SPARK-28462
 URL: https://issues.apache.org/jira/browse/SPARK-28462
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


This ticket proposes to add a prefix '*' to non-nullable attribute names in 
PlanTestBase.comparePlans failures. In the current master, nullability 
mismatches can produce identical error messages for the left and right logical 
plans, like this:
{code}
// This failure message was extracted from #24765
- constraints should be inferred from aliased literals *** FAILED ***
 == FAIL: Plans do not match ===
 !'Join Inner, (two#0 = a#0)                    'Join Inner, (two#0 = a#0)
 :- Filter (isnotnull(a#0) AND (2 <=> a#0))     :- Filter (isnotnull(a#0) AND (2 <=> a#0))
 :  +- LocalRelation <empty>, [a#0, b#0, c#0]   :  +- LocalRelation <empty>, [a#0, b#0, c#0]
 +- Project [2 AS two#0]                        +- Project [2 AS two#0]
    +- LocalRelation <empty>, [a#0, b#0, c#0]      +- LocalRelation <empty>, [a#0, b#0, c#0] (PlanTest.scala:145)
{code}
This ticket intends to change the error message to the one below:
{code}
- constraints should be inferred from aliased literals *** FAILED ***
 == FAIL: Plans do not match ===
 !'Join Inner, (*two#0 = a#0)                   'Join Inner, (*two#0 = *a#0)
 :- Filter (isnotnull(a#0) AND (2 <=> a#0))     :- Filter (isnotnull(a#0) AND (2 <=> a#0))
 :  +- LocalRelation <empty>, [a#0, b#0, c#0]   :  +- LocalRelation <empty>, [a#0, b#0, c#0]
 +- Project [2 AS two#0]                        +- Project [2 AS two#0]
    +- LocalRelation <empty>, [a#0, b#0, c#0]      +- LocalRelation <empty>, [a#0, b#0, c#0] (PlanTest.scala:145)
{code}
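A minimal sketch of the proposed formatting rule (the helper name is 
hypothetical; the actual change would live in the plan-string builder):

{code:scala}
// Prefix non-nullable attribute names with '*' so that nullability
// mismatches become visible in comparePlans failure output.
def formatAttrName(name: String, nullable: Boolean): String =
  if (nullable) name else s"*$name"

assert(formatAttrName("two#0", nullable = false) == "*two#0")
assert(formatAttrName("a#0", nullable = true) == "a#0")
{code}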



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column

2019-07-20 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889627#comment-16889627
 ] 

Dongjoon Hyun edited comment on SPARK-28461 at 7/21/19 1:22 AM:


Hi, [~yumwang]. Please link the Hive issue to `Issue Links` instead of 
embedding in the JIRA description.


was (Author: dongjoon):
Hi, [~yumwang]. Please link the issue to `Issue Links` instead of embedding in 
the JIRA description.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: SPARK-28461
> URL: https://issues.apache.org/jira/browse/SPARK-28461
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {code:sql}
> postgres=# select cast(1 as decimal(38, 18));
>numeric
> --
>  1.00
> (1 row)
> {code}
> Spark SQL:
> {code:sql}
> spark-sql> select cast(1 as decimal(38, 18));
> 1
> spark-sql>
> {code}
> Hive fixed this issue in HIVE-12063.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column

2019-07-20 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889627#comment-16889627
 ] 

Dongjoon Hyun commented on SPARK-28461:
---

Hi, [~yumwang]. Please link the issue to `Issue Links` instead of embedding in 
the JIRA description.

> Pad Decimal numbers with trailing zeros to the scale of the column
> --
>
> Key: SPARK-28461
> URL: https://issues.apache.org/jira/browse/SPARK-28461
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {code:sql}
> postgres=# select cast(1 as decimal(38, 18));
>numeric
> --
>  1.00
> (1 row)
> {code}
> Spark SQL:
> {code:sql}
> spark-sql> select cast(1 as decimal(38, 18));
> 1
> spark-sql>
> {code}
> Hive fixed this issue in HIVE-12063.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28461) Pad Decimal numbers with trailing zeros to the scale of the column

2019-07-20 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28461:
---

 Summary: Pad Decimal numbers with trailing zeros to the scale of 
the column
 Key: SPARK-28461
 URL: https://issues.apache.org/jira/browse/SPARK-28461
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


PostgreSQL:
{code:sql}
postgres=# select cast(1 as decimal(38, 18));
   numeric
--
 1.00
(1 row)
{code}

Spark SQL:
{code:sql}
spark-sql> select cast(1 as decimal(38, 18));
1
spark-sql>
{code}

Hive fixed this issue in HIVE-12063.
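For comparison, plain {{java.math.BigDecimal}} reproduces PostgreSQL's padded 
display (a sketch of the desired output, not Spark's formatting code):

{code:scala}
import java.math.{BigDecimal => JBigDecimal}
import java.math.RoundingMode

// Pad 1 to scale 18; prints 1.000000000000000000, matching PostgreSQL.
val padded = new JBigDecimal("1").setScale(18, RoundingMode.UNNECESSARY)
println(padded.toPlainString)
{code}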



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27441) Add read/write tests to Hive serde tables

2019-07-20 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889620#comment-16889620
 ] 

Yuming Wang commented on SPARK-27441:
-

[~dongjoon] Could you update the status?

> Add read/write tests to Hive serde tables
> -
>
> Key: SPARK-27441
> URL: https://issues.apache.org/jira/browse/SPARK-27441
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> The ORC and Parquet versions used by Spark and Hive, before and after the 
> built-in Hive upgrade to 2.3.4:
> built-in Hive is 1.2.1:
> || ||ORC||Parquet||
> |Spark datasource table|1.5.5|1.10.1|
> |Spark hive table|Hive built-in|1.6.0|
> |Hive 1.2.1|Hive built-in|1.6.0|
> built-in Hive is 2.3.4:
> || ||ORC||Parquet||
> |Spark datasource table|1.5.5|1.10.1|
> |Spark hive table|1.5.5|1.8.1|
> |Hive 2.3.4|1.3.3|1.8.1|
>  We should add read/write tests for Hive serde tables.
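A round-trip test of the kind proposed might look like this (a sketch assuming 
a Hive-enabled SparkSession; the table name is illustrative):

{code:scala}
// Write to and read back from a Hive serde (ORC) table.
spark.sql("CREATE TABLE hive_orc_t (id INT) STORED AS ORC")
spark.sql("INSERT INTO hive_orc_t VALUES (1), (2), (3)")
assert(spark.table("hive_orc_t").count() == 3)
spark.sql("DROP TABLE hive_orc_t")
{code}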



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28446) Document Kafka Headers support

2019-07-20 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28446.
---
Resolution: Duplicate

> Document Kafka Headers support
> --
>
> Key: SPARK-28446
> URL: https://issues.apache.org/jira/browse/SPARK-28446
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Lee Dongjin
>Priority: Minor
>
> This issue is a follow-up to SPARK-23539.
> After completing SPARK-23539, the following information about the headers 
> functionality should be noted in the Structured Streaming + Kafka Integration 
> Guide:
>  * The requirements for using the headers functionality (i.e., the Kafka 
> version).
>  * How to turn the headers functionality on (see the sketch below).
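A hedged sketch of the kind of snippet the guide could include (assuming the 
{{includeHeaders}} option introduced by SPARK-23539; the broker and topic names 
are placeholders):

{code:scala}
// Kafka headers require the 0.11.0.0+ message format and must be
// requested explicitly when reading:
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("includeHeaders", "true")
  .load()
// The loaded DataFrame then carries a "headers" column of key/value pairs.
{code}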



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Michael Heuer (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889573#comment-16889573
 ] 

Michael Heuer edited comment on SPARK-28457 at 7/20/19 7:12 PM:


We're also seeing this in [our CI 
builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console]
 on older Spark versions:

{noformat}
+ curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
curl: (60) SSL certificate problem: unable to get local issuer certificate
{noformat}


was (Author: heuermh):
We're also seeing this in [our CI 
builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console]
 on older Spark versions:

{noformat}
+ curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
curl: (60) SSL certificate problem: unable to get local issuer certificate
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
Build step 'Execute shell' marked build as failure
{noformat}

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
>  curl: (60) SSL certificate problem: unable to get local issuer certificate
>  More details here: 
>  https://curl.haxx.se/docs/sslcerts.html
>  curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
>  If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
>  If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> gzip: stdin: unexpected end of file
>  tar: Child returned status 1
>  tar: Error is not recoverable: exiting now
>  Using `mvn` from path: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
>  build/mvn: line 163: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
>  No such file or directory
>  Build step 'Execute shell' marked build as failure
>  Finished: FAILURE
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: 

[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Michael Heuer (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889573#comment-16889573
 ] 

Michael Heuer edited comment on SPARK-28457 at 7/20/19 7:12 PM:


We're also seeing this in [our CI 
builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console]
 on older Spark versions:

{noformat}
+ curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
curl: (60) SSL certificate problem: unable to get local issuer certificate
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
Build step 'Execute shell' marked build as failure
{noformat}


was (Author: heuermh):
We're also seeing this in [our CI 
builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console]
 on older Spark versions:

{noformat}
+ curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: unable to get local issuer certificate

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
Build step 'Execute shell' marked build as failure
{noformat}

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
>  curl: (60) SSL certificate problem: unable to get local issuer certificate
>  More details here: 
>  

[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Michael Heuer (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889573#comment-16889573
 ] 

Michael Heuer commented on SPARK-28457:
---

We're also seeing this in [our CI 
builds|https://amplab.cs.berkeley.edu/jenkins/job/ADAM/HADOOP_VERSION=2.7.5,SCALAVER=2.12,SPARK_VERSION=2.4.3,label=ubuntu/4351/console]
 on older Spark versions:

{noformat}
+ curl -L 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz' -o spark-2.4.3-bin-without-hadoop-scala-2.12.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: unable to get local issuer certificate

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
Build step 'Execute shell' marked build as failure
{noformat}

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
>  curl: (60) SSL certificate problem: unable to get local issuer certificate
>  More details here: 
>  https://curl.haxx.se/docs/sslcerts.html
>  curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
>  If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
>  If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> gzip: stdin: unexpected end of file
>  tar: Child returned status 1
>  tar: Error is not recoverable: exiting now
>  Using `mvn` from path: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
>  build/mvn: line 163: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
>  No such file or directory
>  Build step 'Execute shell' marked build as failure
>  Finished: FAILURE
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28460) Port test from HIVE-11835

2019-07-20 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28460:
---

 Summary: Port test from HIVE-11835
 Key: SPARK-28460
 URL: https://issues.apache.org/jira/browse/SPARK-28460
 Project: Spark
  Issue Type: Sub-task
  Components: SQL, Tests
Affects Versions: 3.0.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27500) Add tests for built-in Hive 2.3

2019-07-20 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27500:

Summary: Add tests for built-in Hive 2.3  (was: Add tests for the built-in 
Hive 2.3)

> Add tests for built-in Hive 2.3
> ---
>
> Key: SPARK-27500
> URL: https://issues.apache.org/jira/browse/SPARK-27500
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Spark will use some of the new features and bug fixes in Hive 2.3, and we 
> should add tests for them. This is an umbrella JIRA for tracking this work.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27441) Add read/write tests to Hive serde tables

2019-07-20 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27441:

Summary: Add read/write tests to Hive serde tables  (was: Add read/write 
tests to Hive serde tables(include Parquet vectorized reader))

> Add read/write tests to Hive serde tables
> -
>
> Key: SPARK-27441
> URL: https://issues.apache.org/jira/browse/SPARK-27441
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> The ORC and Parquet versions used by Spark and Hive, before and after the 
> built-in Hive upgrade to 2.3.4:
> built-in Hive is 1.2.1:
> || ||ORC||Parquet||
> |Spark datasource table|1.5.5|1.10.1|
> |Spark hive table|Hive built-in|1.6.0|
> |Hive 1.2.1|Hive built-in|1.6.0|
> built-in Hive is 2.3.4:
> || ||ORC||Parquet||
> |Spark datasource table|1.5.5|1.10.1|
> |Spark hive table|1.5.5|1.8.1|
> |Hive 2.3.4|1.3.3|1.8.1|
>  We should add read/write tests for Hive serde tables.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-20 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889559#comment-16889559
 ] 

Maxim Gekk commented on SPARK-28459:


[~shivuson...@gmail.com] I already have draft changes for this, similar to 
https://github.com/apache/spark/pull/25210

> Date/Time Functions: make_timestamp
> ---
>
> Key: SPARK-28459
> URL: https://issues.apache.org/jira/browse/SPARK-28459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
> _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
> precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
> minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
> 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-20 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889556#comment-16889556
 ] 

Shivu Sondur commented on SPARK-28459:
--

I will check this.

> Date/Time Functions: make_timestamp
> ---
>
> Key: SPARK-28459
> URL: https://issues.apache.org/jira/browse/SPARK-28459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
> _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
> precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
> minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
> 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24907) Migrate JDBC data source to DataSource API v2

2019-07-20 Thread Shiv Prashant Sood (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889552#comment-16889552
 ] 

Shiv Prashant Sood commented on SPARK-24907:


Pushed a PR with scaffolding for the read/write paths and a first-draft 
implementation of the write (append) path, based on the current APIs:
https://github.com/apache/spark/pull/25211 


> Migrate JDBC data source to DataSource API v2
> -
>
> Key: SPARK-24907
> URL: https://issues.apache.org/jira/browse/SPARK-24907
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Teng Peng
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-20 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28459:
--
Description: 
||Function||Return Type||Description||Example||Result||
|{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
_hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
23.5)}}|{{2013-07-15 08:15:23.5}}|

https://www.postgresql.org/docs/11/functions-datetime.html

  was:
||Function||Return Type||Description||Example||Result||
|{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
}}{{int}}{{)}}|{{date}}|Create date from year, month and day 
fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|
|{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
_hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
23.5)}}|{{2013-07-15 08:15:23.5}}|

https://www.postgresql.org/docs/11/functions-datetime.html


> Date/Time Functions: make_timestamp
> ---
>
> Key: SPARK-28459
> URL: https://issues.apache.org/jira/browse/SPARK-28459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
> _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
> precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
> minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
> 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28432) Date/Time Functions: make_date

2019-07-20 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28432:
--
Description: 
||Function||Return Type||Description||Example||Result||
|{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
}}{{int}}{{)}}|{{date}}|Create date from year, month and day 
fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|

https://www.postgresql.org/docs/11/functions-datetime.html

  was:
||Function||Return Type||Description||Example||Result||
|{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
}}{{int}}{{)}}|{{date}}|Create date from year, month and day 
fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|
|{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
_hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
23.5)}}|{{2013-07-15 08:15:23.5}}|

https://www.postgresql.org/docs/11/functions-datetime.html


> Date/Time Functions: make_date
> --
>
> Key: SPARK-28432
> URL: https://issues.apache.org/jira/browse/SPARK-28432
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
> }}{{int}}{{)}}|{{date}}|Create date from year, month and day 
> fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|
> https://www.postgresql.org/docs/11/functions-datetime.html
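Expected behavior per the cited PostgreSQL spec (a sketch; the Spark function 
was still being implemented when this ticket was updated):

{code:scala}
// make_date(2013, 7, 15) should yield the DATE value 2013-07-15.
spark.sql("SELECT make_date(2013, 7, 15)").show()
{code}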



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-20 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28459:
-

 Summary: Date/Time Functions: make_timestamp
 Key: SPARK-28459
 URL: https://issues.apache.org/jira/browse/SPARK-28459
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


||Function||Return Type||Description||Example||Result||
|{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
}}{{int}}{{)}}|{{date}}|Create date from year, month and day 
fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|
|{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
_hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
23.5)}}|{{2013-07-15 08:15:23.5}}|

https://www.postgresql.org/docs/11/functions-datetime.html
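Expected behavior per the cited PostgreSQL spec (a sketch; the Spark 
implementation was in progress when this ticket was filed):

{code:scala}
// make_timestamp(2013, 7, 15, 8, 15, 23.5) should yield the
// TIMESTAMP value 2013-07-15 08:15:23.5.
spark.sql("SELECT make_timestamp(2013, 7, 15, 8, 15, 23.5)").show()
{code}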



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28432) Date/Time Functions: make_date

2019-07-20 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28432:
--
Summary: Date/Time Functions: make_date  (was: Date/Time Functions: 
make_date/make_timestamp)

> Date/Time Functions: make_date
> --
>
> Key: SPARK-28432
> URL: https://issues.apache.org/jira/browse/SPARK-28432
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_date(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ 
> }}{{int}}{{)}}|{{date}}|Create date from year, month and day 
> fields|{{make_date(2013, 7, 15)}}|{{2013-07-15}}|
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, 
> _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double 
> precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, 
> minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 
> 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24907) Migrate JDBC data source to DataSource API v2

2019-07-20 Thread Shiv Prashant Sood (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiv Prashant Sood updated SPARK-24907:
---
Affects Version/s: 3.0.0

> Migrate JDBC data source to DataSource API v2
> -
>
> Key: SPARK-24907
> URL: https://issues.apache.org/jira/browse/SPARK-24907
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Teng Peng
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28243) remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams

2019-07-20 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28243:
-

Assignee: Huaxin Gao

> remove setFeatureSubsetStrategy and setSubsamplingRate from Python 
> TreeEnsembleParams
> -
>
> Key: SPARK-28243
> URL: https://issues.apache.org/jira/browse/SPARK-28243
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
>
> Remove the deprecated setFeatureSubsetStrategy and setSubsamplingRate from 
> Python TreeEnsembleParams



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28243) remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams

2019-07-20 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28243.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25046
[https://github.com/apache/spark/pull/25046]

> remove setFeatureSubsetStrategy and setSubsamplingRate from Python 
> TreeEnsembleParams
> -
>
> Key: SPARK-28243
> URL: https://issues.apache.org/jira/browse/SPARK-28243
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.0.0
>
>
> Remove the deprecated setFeatureSubsetStrategy and setSubsamplingRate from 
> Python TreeEnsembleParams



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28225) Unexpected behavior for Window functions

2019-07-20 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889518#comment-16889518
 ] 

Marco Gaido edited comment on SPARK-28225 at 7/20/19 2:43 PM:
--

Let me cite the PostgreSQL documentation to explain the behavior:

??When an aggregate function is used as a window function, it aggregates over 
the rows within the current row's window frame. An aggregate used with ORDER BY 
and the default window frame definition produces a "running sum" type of 
behavior, which may or may not be what's wanted. To obtain aggregation over the 
whole partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING. Other frame specifications can be used to obtain other 
effects.??

So the returned values seem correct to me.


was (Author: mgaido):
Let me cite the PostgreSQL documentation to explain the behavior:

{noformat}
When an aggregate function is used as a window function, it aggregates over the 
rows within the current row's window frame. An aggregate used with ORDER BY and 
the default window frame definition produces a "running sum" type of behavior, 
which may or may not be what's wanted. To obtain aggregation over the whole 
partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING. Other frame specifications can be used to obtain other effects.
{noformat}

So the returned values seem correct to me.


> Unexpected behavior for Window functions
> 
>
> Key: SPARK-28225
> URL: https://issues.apache.org/jira/browse/SPARK-28225
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Andrew Leverentz
>Priority: Major
>
> I've noticed some odd behavior when combining the "first" aggregate function 
> with an ordered Window.
> In particular, I'm working with columns created using the syntax
> {code}
> first($"y", ignoreNulls = true).over(Window.orderBy($"x"))
> {code}
> Below, I'm including some code which reproduces this issue in a Databricks 
> notebook.
> *Code:*
> {code:java}
> import org.apache.spark.sql.functions.first
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{StructType,StructField,IntegerType}
> val schema = StructType(Seq(
>   StructField("x", IntegerType, false),
>   StructField("y", IntegerType, true),
>   StructField("z", IntegerType, true)
> ))
> val input =
>   spark.createDataFrame(sc.parallelize(Seq(
> Row(101, null, 11),
> Row(102, null, 12),
> Row(103, null, 13),
> Row(203, 24, null),
> Row(201, 26, null),
> Row(202, 25, null)
>   )), schema = schema)
> input.show
> val output = input
>   .withColumn("u1", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".asc_nulls_last)))
>   .withColumn("u2", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".asc)))
>   .withColumn("u3", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".desc_nulls_last)))
>   .withColumn("u4", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".desc)))
>   .withColumn("u5", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".asc_nulls_last)))
>   .withColumn("u6", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".asc)))
>   .withColumn("u7", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".desc_nulls_last)))
>   .withColumn("u8", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".desc)))
> output.show
> {code}
> *Expectation:*
> Based on my understanding of how ordered-Window and aggregate functions work, 
> the results I expected to see were:
>  * u1 = u2 = constant value of 26
>  * u3 = u4 = constant value of 24
>  * u5 = u6 = constant value of 11
>  * u7 = u8 = constant value of 13
> However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:*
> {code:java}
> +---+++++---+---+---+---+++
> |  x|   y|   z|  u1|  u2| u3| u4| u5| u6|  u7|  u8|
> +---+++++---+---+---+---+++
> |203|  24|null|  26|  26| 24| 24| 11| 11|null|null|
> |202|  25|null|  26|  26| 24| 24| 11| 11|null|null|
> |201|  26|null|  26|  26| 24| 24| 11| 11|null|null|
> |103|null|  13|null|null| 24| 24| 11| 11|  13|  13|
> |102|null|  12|null|null| 24| 24| 11| 11|  13|  13|
> |101|null|  11|null|null| 24| 24| 11| 11|  13|  13|
> +---+++++---+---+---+---+++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28225) Unexpected behavior for Window functions

2019-07-20 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889518#comment-16889518
 ] 

Marco Gaido commented on SPARK-28225:
-

Let me cite the PostgreSQL documentation to explain the behavior:

{noformat}
When an aggregate function is used as a window function, it aggregates over the 
rows within the current row's window frame. An aggregate used with ORDER BY and 
the default window frame definition produces a "running sum" type of behavior, 
which may or may not be what's wanted. To obtain aggregation over the whole 
partition, omit ORDER BY or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING. Other frame specifications can be used to obtain other effects.
{noformat}

So the returned values seem correct to me.
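Following the quoted guidance, an explicit frame makes the aggregate cover the 
whole partition (a sketch reusing the reporter's {{input}} DataFrame):

{code:scala}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.first
import spark.implicits._

// Widen the frame beyond the default (UNBOUNDED PRECEDING .. CURRENT ROW)
// so that first() sees every row of the ordered partition.
val fullFrame = Window
  .orderBy($"x".asc_nulls_last)
  .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

val fixed = input.withColumn("u1", first($"y", ignoreNulls = true).over(fullFrame))
{code}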


> Unexpected behavior for Window functions
> 
>
> Key: SPARK-28225
> URL: https://issues.apache.org/jira/browse/SPARK-28225
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Andrew Leverentz
>Priority: Major
>
> I've noticed some odd behavior when combining the "first" aggregate function 
> with an ordered Window.
> In particular, I'm working with columns created using the syntax
> {code}
> first($"y", ignoreNulls = true).over(Window.orderBy($"x"))
> {code}
> Below, I'm including some code which reproduces this issue in a Databricks 
> notebook.
> *Code:*
> {code:java}
> import org.apache.spark.sql.functions.first
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types.{StructType,StructField,IntegerType}
> val schema = StructType(Seq(
>   StructField("x", IntegerType, false),
>   StructField("y", IntegerType, true),
>   StructField("z", IntegerType, true)
> ))
> val input =
>   spark.createDataFrame(sc.parallelize(Seq(
> Row(101, null, 11),
> Row(102, null, 12),
> Row(103, null, 13),
> Row(203, 24, null),
> Row(201, 26, null),
> Row(202, 25, null)
>   )), schema = schema)
> input.show
> val output = input
>   .withColumn("u1", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".asc_nulls_last)))
>   .withColumn("u2", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".asc)))
>   .withColumn("u3", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".desc_nulls_last)))
>   .withColumn("u4", first($"y", ignoreNulls = 
> true).over(Window.orderBy($"x".desc)))
>   .withColumn("u5", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".asc_nulls_last)))
>   .withColumn("u6", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".asc)))
>   .withColumn("u7", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".desc_nulls_last)))
>   .withColumn("u8", first($"z", ignoreNulls = 
> true).over(Window.orderBy($"x".desc)))
> output.show
> {code}
> *Expectation:*
> Based on my understanding of how ordered-Window and aggregate functions work, 
> the results I expected to see were:
>  * u1 = u2 = constant value of 26
>  * u3 = u4 = constant value of 24
>  * u5 = u6 = constant value of 11
>  * u7 = u8 = constant value of 13
> However, columns u1, u2, u7, and u8 contain some unexpected nulls. 
> *Results:*
> {code:java}
> +---+++++---+---+---+---+++
> |  x|   y|   z|  u1|  u2| u3| u4| u5| u6|  u7|  u8|
> +---+++++---+---+---+---+++
> |203|  24|null|  26|  26| 24| 24| 11| 11|null|null|
> |202|  25|null|  26|  26| 24| 24| 11| 11|null|null|
> |201|  26|null|  26|  26| 24| 24| 11| 11|null|null|
> |103|null|  13|null|null| 24| 24| 11| 11|  13|  13|
> |102|null|  12|null|null| 24| 24| 11| 11|  13|  13|
> |101|null|  11|null|null| 24| 24| 11| 11|  13|  13|
> +---+++++---+---+---+---+++
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28304) FileFormatWriter introduces an unconditional sort, even when all attributes are constants

2019-07-20 Thread Eyal Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889436#comment-16889436
 ] 

Eyal Farago commented on SPARK-28304:
-

[~joshrosen], thanks for your comment.

I think this is a broader subject than just FileFormatWriter, as any sort 
operator can either simplify its ordering or be eliminated entirely when some 
or all of the sort columns are known to be constant.

Furthermore, in some cases one ordering can be used to satisfy several 
orderings (or, conversely, the ordering requirements of downstream operators 
can be relaxed), so I believe this is best handled by the optimizer/planner, 
essentially by making EnsureRequirements aware of these kinds of cases. As a 
short-term fix, the SortExec operator could filter away constant ordering 
columns in its execute() method and, if no ordering columns remain, bypass 
the sort altogether.

BTW, is there any reason the FileFormatWriter doesn't take the regular 
optimizing/planning path?
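A sketch of that short-term idea in Catalyst terms (these are internal APIs, 
so treat it as illustrative only):

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Literal, SortOrder}

// Drop ordering columns that are constant literals; if none remain,
// the sort is a no-op and can be bypassed entirely.
def pruneConstantOrdering(ordering: Seq[SortOrder]): Seq[SortOrder] =
  ordering.filterNot(_.child.isInstanceOf[Literal])
{code}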

 

> FileFormatWriter introduces an unconditional sort, even when all attributes 
> are constants
> 
>
> Key: SPARK-28304
> URL: https://issues.apache.org/jira/browse/SPARK-28304
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Eyal Farago
>Priority: Major
>  Labels: performance
>
> FileFormatWriter derives a required sort order from the partition columns, 
> bucketing columns, and any explicitly required ordering. However, in some use 
> cases some (or even all) of these fields are constant, and in those cases the 
> sort can be skipped.
> For example, in my use case we add a GUID column identifying a specific 
> (incremental) load; this can be thought of as a batch id. Since we run one 
> batch at a time, this column is always constant, so there's no need to sort 
> on it; and since we don't use bucketing or require an explicit ordering, the 
> entire sort can be skipped in our case.
>  
> I suggest:
>  # filtering away constant columns from the required ordering calculated by 
> FileFormatWriter,
>  # generalizing this to any Sort operator in a Spark plan,
>  # introducing optimizer rules to remove constants from sort orderings, 
> potentially eliminating the sort operator altogether, and
>  # modifying EnsureRequirements to be aware of constant fields when deciding 
> whether to introduce a sort or not.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28386) Cannot resolve ORDER BY columns with GROUP BY and HAVING

2019-07-20 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889422#comment-16889422
 ] 

Marco Gaido commented on SPARK-28386:
-

I think this is a duplicate of SPARK-26741. I have a PR for it, but it seems a 
bit stuck. Any help reviewing it would be much appreciated. Thanks.

> Cannot resolve ORDER BY columns with GROUP BY and HAVING
> 
>
> Key: SPARK-28386
> URL: https://issues.apache.org/jira/browse/SPARK-28386
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:sql}
> CREATE TABLE test_having (a int, b int, c string, d string) USING parquet;
> INSERT INTO test_having VALUES (0, 1, '', 'A');
> INSERT INTO test_having VALUES (1, 2, '', 'b');
> INSERT INTO test_having VALUES (2, 2, '', 'c');
> INSERT INTO test_having VALUES (3, 3, '', 'D');
> INSERT INTO test_having VALUES (4, 3, '', 'e');
> INSERT INTO test_having VALUES (5, 3, '', 'F');
> INSERT INTO test_having VALUES (6, 4, '', 'g');
> INSERT INTO test_having VALUES (7, 4, '', 'h');
> INSERT INTO test_having VALUES (8, 4, '', 'I');
> INSERT INTO test_having VALUES (9, 4, '', 'j');
> SELECT lower(c), count(c) FROM test_having
>   GROUP BY lower(c) HAVING count(*) > 2
>   ORDER BY lower(c);
> {code}
> {noformat}
> spark-sql> SELECT lower(c), count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY lower(c);
> Error in query: cannot resolve '`c`' given input columns: [lower(c), 
> count(c)]; line 3 pos 19;
> 'Sort ['lower('c) ASC NULLS FIRST], true
> +- Project [lower(c)#158, count(c)#159L]
>+- Filter (count(1)#161L > cast(2 as bigint))
>   +- Aggregate [lower(c#7)], [lower(c#7) AS lower(c)#158, count(c#7) AS 
> count(c)#159L, count(1) AS count(1)#161L]
>  +- SubqueryAlias test_having
> +- Relation[a#5,b#6,c#7,d#8] parquet
> {noformat}
> But it works when setting an alias:
> {noformat}
> spark-sql> SELECT lower(c) withAlias, count(c) FROM test_having
>  > GROUP BY lower(c) HAVING count(*) > 2
>  > ORDER BY withAlias;
> 3
>   4
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889405#comment-16889405
 ] 

Hyukjin Kwon commented on SPARK-27815:
--

There was a followup at https://github.com/apache/spark/pull/25207

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> ---
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.
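The cascading-join shape in question, as a runnable sketch (the tables are 
synthetic; with single-pass pushdown the filter on {{a}} should reach the scan 
of {{t1}} without re-running a whole optimizer batch):

{code:scala}
import spark.implicits._

val t1 = spark.range(10).toDF("a")
val t2 = spark.range(10).toDF("b")
val t3 = spark.range(10).toDF("c")

// The filter sits above two joins; the optimizer should push it
// down below both joins to the scan of t1.
val q = t1.join(t2, $"a" === $"b").join(t3, $"b" === $"c").filter($"a" > 5)
q.explain(true)
{code}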



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889404#comment-16889404
 ] 

Hyukjin Kwon commented on SPARK-28155:
--

SPARK-28155 was switched with SPARK-27815 due to an accident in a commit log.

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to 
> file source v2. This should be removed and file source v2 should not accept 
> SaveMode.
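For context, SaveMode is the user-facing knob on DataFrameWriter (a usage 
sketch assuming an existing DataFrame {{df}}; the ticket is about not 
forwarding this value into file source v2 internals):

{code:scala}
import org.apache.spark.sql.SaveMode

// User code selects the mode here; per this ticket, v2 file source
// implementations should not receive the SaveMode itself.
df.write.mode(SaveMode.Append).parquet("/tmp/out")
{code}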



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28458) CLONE - do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889403#comment-16889403
 ] 

Hyukjin Kwon commented on SPARK-28458:
--

I cloned this to keep a copy for a while during the JIRA switch between 
SPARK-28155 and SPARK-27815.

> CLONE - do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28458
> URL: https://issues.apache.org/jira/browse/SPARK-28458
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to 
> file source v2. This should be removed and file source v2 should not accept 
> SaveMode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28458) CLONE - do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28458.
--
Resolution: Invalid

> CLONE - do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28458
> URL: https://issues.apache.org/jira/browse/SPARK-28458
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to 
> file source v2. This should be removed and file source v2 should not accept 
> SaveMode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Reporter: Wenchen Fan  (was: Yesheng Ma)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Priority: Blocker  (was: Major)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Blocker
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Description: Currently there is a hack in `DataFrameWriter`, which passes 
`SaveMode` to file source v2. This should be removed and file source v2 should 
not accept SaveMode.  (was: The current catalyst optimizer's predicate pushdown 
is divided into two separate rules: PushDownPredicate and PushThroughJoin. This 
is not efficient for optimizing cascading joins such as TPC-DS q64, where a 
whole default batch is re-executed just due to this. We need a more efficient 
approach to pushdown predicate as much as possible in a single pass.)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to 
> file source v2. This should be removed and file source v2 should not accept 
> SaveMode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Comment: was deleted

(was: [https://github.com/apache/spark/pull/24956])

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Blocker
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-28155:
--
  Assignee: (was: Yesheng Ma)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Blocker
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Issue Type: Bug  (was: Improvement)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889402#comment-16889402
 ] 

Hyukjin Kwon commented on SPARK-27815:
--

There was a mistake with the JIRA ID in a commit. The JIRA is now switched to 
SPARK-28155.

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> ---
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Target Version/s: 3.0.0
   Fix Version/s: (was: 3.0.0)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28155) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Summary: do not leak SaveMode to file source v2  (was: Improve SQL 
optimizer's predicate pushdown performance for cascading joins)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Comment: was deleted

(was: [~cloud_fan] Hi Wenchen, do you think the solution mentioned below is 
viable? 

Create a new V2WriteCommand case class and its Exec named maybe 
_OverwriteByQueryId_ to replace WriteToDataSourceV2, which accepts a QueryId so 
that tests can pass.

Or should we keep WriteToDataSourceV2?)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Summary: Improve SQL optimizer's predicate pushdown performance for 
cascading joins   (was: do not leak SaveMode to file source v2)

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> ---
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27815) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-27815.
--
Resolution: Fixed

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> ---
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889401#comment-16889401
 ] 

Hyukjin Kwon commented on SPARK-27815:
--

Fixed in https://github.com/apache/spark/pull/24956

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Comment: was deleted

(was: User 'yeshengm' has created a pull request for this issue:
https://github.com/apache/spark/pull/24956)

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28155:
-
Comment: was deleted

(was: This was committed with a wrong JIRA ID, `SPARK-28155`.)

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-20 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889400#comment-16889400
 ] 

Hyukjin Kwon commented on SPARK-28155:
--

[~Tonix517] said:

Hi Wenchen, do you think the solution mentioned below is viable? 

Create a new V2WriteCommand case class and its Exec named maybe 
OverwriteByQueryId to replace WriteToDataSourceV2, which accepts a QueryId so 
that tests can pass.

Or should we keep WriteToDataSourceV2?
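
A very rough, self-contained sketch of the shape that proposal describes. The OverwriteByQueryId name and its fields are the commenter's hypothetical suggestion, and the stand-in traits below exist only so the sketch compiles; they do not match Spark's real catalyst types:

{code:scala}
// Stand-ins so the sketch is self-contained; Spark's internal types are richer.
trait LogicalPlan
trait NamedRelation
trait V2WriteCommand { def query: LogicalPlan }

// Proposed replacement for WriteToDataSourceV2: carry the query id so that
// streaming tests can key their assertions on it.
case class OverwriteByQueryId(
    table: NamedRelation,
    query: LogicalPlan,
    queryId: String) extends V2WriteCommand
{code}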

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Issue Type: Improvement  (was: Bug)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Reporter: Yesheng Ma  (was: Wenchen Fan)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Priority: Major  (was: Blocker)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-27815:


Assignee: Yesheng Ma

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Target Version/s:   (was: 3.0.0)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Description: The current catalyst optimizer's predicate pushdown is divided 
into two separate rules: PushDownPredicate and PushThroughJoin. This is not 
efficient for optimizing cascading joins such as TPC-DS q64, where a whole 
default batch is re-executed just due to this. We need a more efficient 
approach to pushdown predicate as much as possible in a single pass.  (was: 
Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to file 
source v2. This should be removed and file source v2 should not accept 
SaveMode.)

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27815) do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27815:
-
Fix Version/s: 3.0.0

> do not leak SaveMode to file source v2
> --
>
> Key: SPARK-27815
> URL: https://issues.apache.org/jira/browse/SPARK-27815
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>
> The current catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushThroughJoin. This is not efficient 
> for optimizing cascading joins such as TPC-DS q64, where a whole default 
> batch is re-executed just due to this. We need a more efficient approach to 
> pushdown predicate as much as possible in a single pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28458) CLONE - do not leak SaveMode to file source v2

2019-07-20 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-28458:


 Summary: CLONE - do not leak SaveMode to file source v2
 Key: SPARK-28458
 URL: https://issues.apache.org/jira/browse/SPARK-28458
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan


Currently there is a hack in `DataFrameWriter`, which passes `SaveMode` to file 
source v2. This should be removed and file source v2 should not accept SaveMode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889392#comment-16889392
 ] 

Xiao Li edited comment on SPARK-28457 at 7/20/19 6:46 AM:
--

[~shaneknapp], could you help take a look at this?


was (Author: smilegator):
[~shaneknapp]

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
>  curl: (60) SSL certificate problem: unable to get local issuer certificate
>  More details here: 
>  https://curl.haxx.se/docs/sslcerts.html
>  curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
>  If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
>  If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> gzip: stdin: unexpected end of file
>  tar: Child returned status 1
>  tar: Error is not recoverable: exiting now
>  Using `mvn` from path: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
>  build/mvn: line 163: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
>  No such file or directory
>  Build step 'Execute shell' marked build as failure
>  Finished: FAILURE
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889392#comment-16889392
 ] 

Xiao Li commented on SPARK-28457:
-

[~shaneknapp]

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
>  curl: (60) SSL certificate problem: unable to get local issuer certificate
>  More details here: 
>  https://curl.haxx.se/docs/sslcerts.html
>  curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
>  If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
>  If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> gzip: stdin: unexpected end of file
>  tar: Child returned status 1
>  tar: Error is not recoverable: exiting now
>  Using `mvn` from path: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
>  build/mvn: line 163: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
>  No such file or directory
>  Build step 'Execute shell' marked build as failure
>  Finished: FAILURE
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28457:

Description: 
 

The build has been broken since this afternoon.

[spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
 [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
 [spark-master-lint #10599 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
  
{code:java}

  
 
https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz
 curl: (60) SSL certificate problem: unable to get local issuer certificate
 More details here: 
 https://curl.haxx.se/docs/sslcerts.html
 curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
 If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
 If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.
gzip: stdin: unexpected end of file
 tar: Child returned status 1
 tar: Error is not recoverable: exiting now
 Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
 build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
 Build step 'Execute shell' marked build as failure
 Finished: FAILURE
{code}
 

  was:
[spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
[spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
[spark-master-lint #10599 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
 
 
[https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: 
[https://curl.haxx.se/docs/sslcerts.html]
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE


> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
>  
> The build has been broken since this afternoon.
> [spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
>  [spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
>  [spark-master-lint #10599 (broken since this 
> build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
>   
> {code:java}
>   
>  
> 

[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28457:

Description: 
[spark-master-compile-maven-hadoop-2.7 #10224 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10224/]
[spark-master-compile-maven-hadoop-3.2 #171 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/171/]
[spark-master-lint #10599 (broken since this 
build)|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/10599/]
 
 
[https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: 
[https://curl.haxx.se/docs/sslcerts.html]
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE

  was:
+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver 
-Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L 
https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz

##15.3%
# 29.8%
  51.2%
# 69.3%
  78.3%
 100.0%
exec: curl --progress-bar -L 
https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz

##31.5%
###   49.6%
##81.3%
 100.0%
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE


> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> [spark-master-compile-maven-hadoop-2.7 #10224 

[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28457:

Priority: Blocker  (was: Major)

> curl: (60) SSL certificate problem: unable to get local issuer certificate 
> More details here: 
> --
>
> Key: SPARK-28457
> URL: https://issues.apache.org/jira/browse/SPARK-28457
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Blocker
>
> + build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver 
> -Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
> exec: curl --progress-bar -L 
> [https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz]
> ##15.3%
> # 29.8%
>   51.2%
> # 69.3%
>   78.3%
>  100.0%
> exec: curl --progress-bar -L 
> [https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz]
> ##31.5%
> ###   49.6%
> ##81.3%
>  100.0%
> exec: curl --progress-bar -L 
> [https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
> curl: (60) SSL certificate problem: unable to get local issuer certificate
> More details here: 
> [https://curl.haxx.se/docs/sslcerts.html]
> curl performs SSL certificate verification by default, using a "bundle"
>  of Certificate Authority (CA) public keys (CA certs). If the default
>  bundle file isn't adequate, you can specify an alternate file
>  using the --cacert option.
> If this HTTPS server uses a certificate signed by a CA represented in
>  the bundle, the certificate verification probably failed due to a
>  problem with the certificate (it might be expired, or the name might
>  not match the domain name in the URL).
> If you'd like to turn off curl's verification of the certificate, use
>  the -k (or --insecure) option.
> gzip: stdin: unexpected end of file
> tar: Child returned status 1
> tar: Error is not recoverable: exiting now
> Using `mvn` from path: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
> build/mvn: line 163: 
> /home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
>  No such file or directory
> Build step 'Execute shell' marked build as failure
> Finished: FAILURE



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)
Xiao Li created SPARK-28457:
---

 Summary: curl: (60) SSL certificate problem: unable to get local 
issuer certificate More details here: 
 Key: SPARK-28457
 URL: https://issues.apache.org/jira/browse/SPARK-28457
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Xiao Li


+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver 
-Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L 
[https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz]
##15.3%
# 29.8%
  51.2%
# 69.3%
  78.3%
 100.0%
exec: curl --progress-bar -L 
[https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz]
##31.5%
###   49.6%
##81.3%
 100.0%
exec: curl --progress-bar -L 
[https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: 
[https://curl.haxx.se/docs/sslcerts.html]
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28457) curl: (60) SSL certificate problem: unable to get local issuer certificate More details here:

2019-07-20 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-28457:

Description: 
+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver 
-Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L 
https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz

##15.3%
# 29.8%
  51.2%
# 69.3%
  78.3%
 100.0%
exec: curl --progress-bar -L 
https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz

##31.5%
###   49.6%
##81.3%
 100.0%
exec: curl --progress-bar -L 
https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz

curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn:
 No such file or directory
Build step 'Execute shell' marked build as failure
Finished: FAILURE

  was:
+ build/mvn -DzincPort=3215 -DskipTests -Phadoop-2.7 -Phive-thriftserver 
-Pkinesis-asl -Pspark-ganglia-lgpl -Pmesos -Pyarn clean compile test-compile
exec: curl --progress-bar -L 
[https://downloads.lightbend.com/zinc/0.3.15/zinc-0.3.15.tgz]
##15.3%
# 29.8%
  51.2%
# 69.3%
  78.3%
 100.0%
exec: curl --progress-bar -L 
[https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.tgz]
##31.5%
###   49.6%
##81.3%
 100.0%
exec: curl --progress-bar -L 
[https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz]
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: 
[https://curl.haxx.se/docs/sslcerts.html]
curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Using `mvn` from path: 
/home/jenkins/workspace/spark-master-compile-maven-hadoop-2.7/build/apache-maven-3.6.1/bin/mvn
build/mvn: line 163: 

[jira] [Resolved] (SPARK-28282) Convert and port 'inline-table.sql' into UDF test base

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28282.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25124
[https://github.com/apache/spark/pull/25124]

> Convert and port 'inline-table.sql' into UDF test base
> --
>
> Key: SPARK-28282
> URL: https://issues.apache.org/jira/browse/SPARK-28282
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28282) Convert and port 'inline-table.sql' into UDF test base

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28282:


Assignee: Terry Kim

> Convert and port 'inline-table.sql' into UDF test base
> --
>
> Key: SPARK-28282
> URL: https://issues.apache.org/jira/browse/SPARK-28282
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Terry Kim
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28279) Convert and port 'group-analytics.sql' into UDF test base

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28279:


Assignee: Stavros Kontopoulos

> Convert and port 'group-analytics.sql' into UDF test base
> -
>
> Key: SPARK-28279
> URL: https://issues.apache.org/jira/browse/SPARK-28279
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Stavros Kontopoulos
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28279) Convert and port 'group-analytics.sql' into UDF test base

2019-07-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28279.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25196
[https://github.com/apache/spark/pull/25196]

> Convert and port 'group-analytics.sql' into UDF test base
> -
>
> Key: SPARK-28279
> URL: https://issues.apache.org/jira/browse/SPARK-28279
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org