[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877501#comment-16877501
 ] 

Dongjoon Hyun commented on SPARK-24152:
---

Thank you so much!

> SparkR CRAN feasibility check server problem
> 
>
> Key: SPARK-24152
> URL: https://issues.apache.org/jira/browse/SPARK-24152
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Critical
>
> The PR builder and master branch tests fail with the following SparkR error 
> for an unknown reason:
> {code}
> * this is package 'SparkR' version '2.4.0'
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 24] do not match the length of object [0]
> Execution halted
> {code}
> *PR BUILDER*
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/
> *MASTER BRANCH*
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/
>  (Fail with no failures)
> This is critical because we have already started to merge PRs while ignoring 
> this **known unknown** SparkR failure.
> - https://github.com/apache/spark/pull/21175






[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877497#comment-16877497
 ] 

Liang-Chi Hsieh edited comment on SPARK-24152 at 7/3/19 5:19 AM:
-

Received a reply that it has been cleaned up. So I think it is fine now.


was (Author: viirya):
Received reply that is cleaned up.

> SparkR CRAN feasibility check server problem
> 
>
> Key: SPARK-24152
> URL: https://issues.apache.org/jira/browse/SPARK-24152
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Critical
>
> The PR builder and master branch tests fail with the following SparkR error 
> for an unknown reason:
> {code}
> * this is package 'SparkR' version '2.4.0'
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 24] do not match the length of object [0]
> Execution halted
> {code}
> *PR BUILDER*
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/
> *MASTER BRANCH*
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/
>  (Fail with no failures)
> This is critical because we have already started to merge PRs while ignoring 
> this **known unknown** SparkR failure.
> - https://github.com/apache/spark/pull/21175






[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877497#comment-16877497
 ] 

Liang-Chi Hsieh commented on SPARK-24152:
-

Received a reply that it has been cleaned up.

> SparkR CRAN feasibility check server problem
> 
>
> Key: SPARK-24152
> URL: https://issues.apache.org/jira/browse/SPARK-24152
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Critical
>
> The PR builder and master branch tests fail with the following SparkR error 
> for an unknown reason:
> {code}
> * this is package 'SparkR' version '2.4.0'
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 24] do not match the length of object [0]
> Execution halted
> {code}
> *PR BUILDER*
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/
> *MASTER BRANCH*
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/
>  (Fail with no failures)
> This is critical because we have already started to merge PRs while ignoring 
> this **known unknown** SparkR failure.
> - https://github.com/apache/spark/pull/21175






[jira] [Commented] (SPARK-28133) Hyperbolic Functions

2019-07-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877489#comment-16877489
 ] 

Apache Spark commented on SPARK-28133:
--

User 'Tonix517' has created a pull request for this issue:
https://github.com/apache/spark/pull/25041

> Hyperbolic Functions
> 
>
> Key: SPARK-28133
> URL: https://issues.apache.org/jira/browse/SPARK-28133
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Description||Example||Result||
> |{{sinh(_x_)}}|hyperbolic sine|{{sinh(0)}}|{{0}}|
> |{{cosh(_x_)}}|hyperbolic cosine|{{cosh(0)}}|{{1}}|
> |{{tanh(_x_)}}|hyperbolic tangent|{{tanh(0)}}|{{0}}|
> |{{asinh(_x_)}}|inverse hyperbolic sine|{{asinh(0)}}|{{0}}|
> |{{acosh(_x_)}}|inverse hyperbolic cosine|{{acosh(1)}}|{{0}}|
> |{{atanh(_x_)}}|inverse hyperbolic tangent|{{atanh(0)}}|{{0}}|
>  
>  
> [https://www.postgresql.org/docs/12/functions-math.html#FUNCTIONS-MATH-HYP-TABLE]
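As a cross-check of the expected results in the table above, here is a hedged 
sketch (mine, not the PR's code) using java.lang.Math, with the inverse 
functions expressed through the standard logarithmic identities:

{code:scala}
// Math has no asinh/acosh/atanh, so the usual logarithmic identities are used.
def asinh(x: Double): Double = math.log(x + math.sqrt(x * x + 1))
def acosh(x: Double): Double = math.log(x + math.sqrt(x * x - 1))
def atanh(x: Double): Double = 0.5 * math.log((1 + x) / (1 - x))

// Matches the table: sinh(0)=0, cosh(0)=1, tanh(0)=0, asinh(0)=0, acosh(1)=0, atanh(0)=0.
assert(math.sinh(0) == 0.0 && math.cosh(0) == 1.0 && math.tanh(0) == 0.0)
assert(asinh(0) == 0.0 && acosh(1) == 0.0 && atanh(0) == 0.0)
{code}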






[jira] [Updated] (SPARK-28083) Support LIKE ... ESCAPE syntax

2019-07-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28083:
--
Summary: Support LIKE ... ESCAPE syntax  (was: ANSI SQL: LIKE predicate: 
ESCAPE clause)

> Support LIKE ... ESCAPE syntax
> --
>
> Key: SPARK-28083
> URL: https://issues.apache.org/jira/browse/SPARK-28083
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Format:
> {noformat}
>  ::=
> 
>   | 
>  ::=
>
>  ::=
>   [ NOT ] LIKE  [ ESCAPE  ]
>  ::=
>   
>  ::=
>   
>  ::=
>
>  ::=
>   [ NOT ] LIKE  [ ESCAPE  ]
>  ::=
>   
>  ::=
>   
> {noformat}
>  
> [https://www.postgresql.org/docs/11/functions-matching.html]
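A hedged illustration of the intended semantics once the syntax lands (the 
results follow the linked PostgreSQL behavior; this is not yet valid Spark SQL 
at the time of this ticket):

{code:scala}
// '!' is declared as the escape character, demoting the following wildcard
// to a literal. Sketch of the intended results, per the PostgreSQL docs above.
spark.sql("SELECT 'a_c' LIKE 'a!_c' ESCAPE '!'").show()  // true:  '_' taken literally
spark.sql("SELECT 'abc' LIKE 'a!_c' ESCAPE '!'").show()  // false: 'b' is not a literal '_'
spark.sql("SELECT 'abc' LIKE 'a_c'").show()              // true:  unescaped '_' matches any char
{code}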






[jira] [Assigned] (SPARK-28238) DESCRIBE TABLE for Data Source V2 tables

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28238:


Assignee: (was: Apache Spark)

> DESCRIBE TABLE for Data Source V2 tables
> 
>
> Key: SPARK-28238
> URL: https://issues.apache.org/jira/browse/SPARK-28238
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> Implement the {{DESCRIBE TABLE}} DDL command for tables that are loaded by 
> V2 catalogs.






[jira] [Assigned] (SPARK-28238) DESCRIBE TABLE for Data Source V2 tables

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28238:


Assignee: Apache Spark

> DESCRIBE TABLE for Data Source V2 tables
> 
>
> Key: SPARK-28238
> URL: https://issues.apache.org/jira/browse/SPARK-28238
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Apache Spark
>Priority: Major
>
> Implement the {{DESCRIBE TABLE}} DDL command for tables that are loaded by 
> V2 catalogs.






[jira] [Created] (SPARK-28238) DESCRIBE TABLE for Data Source V2 tables

2019-07-02 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28238:
--

 Summary: DESCRIBE TABLE for Data Source V2 tables
 Key: SPARK-28238
 URL: https://issues.apache.org/jira/browse/SPARK-28238
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.0.0
Reporter: Matt Cheah


Implement the {{DESCRIBE TABLE}} DDL command for tables that are loaded by V2 
catalogs.
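
A hedged sketch of the target behavior; the catalog name below is a 
placeholder for anything registered through the V2 catalog plugin conf:

{code:scala}
// Sketch only: `testcat` stands for a catalog registered via
// spark.sql.catalog.testcat=<CatalogPlugin implementation>.
spark.sql("DESCRIBE TABLE testcat.db.tbl").show(truncate = false)
{code}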






[jira] [Commented] (SPARK-14067) spark-shell WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

2019-07-02 Thread gloCalHelp.com (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877450#comment-16877450
 ] 

gloCalHelp.com commented on SPARK-14067:


Thank you for your guidance; sorry, I have not had time to confirm.

> spark-shell WARN ObjectStore: Failed to get database default, returning 
> NoSuchObjectException
> 
>
> Key: SPARK-14067
> URL: https://issues.apache.org/jira/browse/SPARK-14067
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 1.6.0, 1.6.1
> Environment: OS: Ubuntu 14.04 LTS, 3 VMs.
> 1. Started a Hadoop cluster with 3 nodes.
> 2. Started Spark's standalone deploy mode cluster with 3 nodes
> via sbin/start-all.sh.
> 3. Then, before entering the Scala console, an exception happened as below,
> but :help commands can still be typed.
>Reporter: gloCalHelp.com
>Priority: Critical
>
> OS: Ubuntu 14.04 LTS, 3 VMs.
> 1. Started a Hadoop cluster with 3 nodes.
> 2. Started Spark's standalone deploy mode cluster with 3 nodes
> via sbin/start-all.sh.
> 3. Then, before entering the Scala console, an exception happened as in the 
> title and the log below,
> but :help commands can still be typed.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
>       /_/
> Using Scala version 2.10.5 (Java HotSpot(TM) Client VM, Java 1.7.0_45)
> Type in expressions to have them evaluated.
> Type :help for more information.
> 16/03/22 16:03:19 WARN Utils: Your hostname, master resolves to a loopback 
> address: 127.0.0.1; using 192.168.185.168 instead (on interface eth0)
> 16/03/22 16:03:19 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> Spark context available as sc.
> 16/03/22 16:04:06 WARN Connection: BoneCP specified but not present in 
> CLASSPATH (or one of dependencies)
> 16/03/22 16:04:08 WARN Connection: BoneCP specified but not present in 
> CLASSPATH (or one of dependencies)
> 16/03/22 16:04:28 WARN ObjectStore: Version information not found in 
> metastore. hive.metastore.schema.verification is not enabled so recording the 
> schema version 1.2.0
> 16/03/22 16:04:29 WARN ObjectStore: Failed to get database default, returning 
> NoSuchObjectException
> Java HotSpot(TM) Client VM warning: You have loaded library 
> /tmp/libnetty-transport-native-epoll7745204847881537447.so which might have 
> disabled stack guard. The VM will try to fix the stack guard now.
> It's highly recommended that you fix the library with 'execstack -c 
> <libfile>', or link it with '-z noexecstack'.
> 16/03/22 16:04:42 WARN Connection: BoneCP specified but not present in 
> CLASSPATH (or one of dependencies)
> 16/03/22 16:04:44 WARN Connection: BoneCP specified but not present in 
> CLASSPATH (or one of dependencies)
> SQL context available as sqlContext.






[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877405#comment-16877405
 ] 

Hyukjin Kwon commented on SPARK-24152:
--

Thanks for the follow-up!

> SparkR CRAN feasibility check server problem
> 
>
> Key: SPARK-24152
> URL: https://issues.apache.org/jira/browse/SPARK-24152
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Critical
>
> The PR builder and master branch tests fail with the following SparkR error 
> for an unknown reason:
> {code}
> * this is package 'SparkR' version '2.4.0'
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 24] do not match the length of object [0]
> Execution halted
> {code}
> *PR BUILDER*
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/
> *MASTER BRANCH*
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/
>  (Fail with no failures)
> This is critical because we have already started to merge PRs while ignoring 
> this **known unknown** SparkR failure.
> - https://github.com/apache/spark/pull/21175






[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2019-07-02 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877401#comment-16877401
 ] 

Liang-Chi Hsieh commented on SPARK-24152:
-

I noticed that this issue is happening again. I contacted the CRAN admin and 
asked for help. I will update when they reply.

> SparkR CRAN feasibility check server problem
> 
>
> Key: SPARK-24152
> URL: https://issues.apache.org/jira/browse/SPARK-24152
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Critical
>
> The PR builder and master branch tests fail with the following SparkR error 
> for an unknown reason:
> {code}
> * this is package 'SparkR' version '2.4.0'
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 24] do not match the length of object [0]
> Execution halted
> {code}
> *PR BUILDER*
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90039/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89983/
> - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89998/
> *MASTER BRANCH*
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4458/
>  (Fail with no failures)
> This is critical because we have already started to merge PRs while ignoring 
> this **known unknown** SparkR failure.
> - https://github.com/apache/spark/pull/21175






[jira] [Commented] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem

2019-07-02 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877374#comment-16877374
 ] 

Erik Erlandson commented on SPARK-27296:


The basic approach described above appears to be working (see the linked 
PR). To obtain the desired behavior I had to create a new API, which is fairly 
similar to UDAF but inherits from TypedImperativeAggregate. This new API 
supports UDT and Column instantiation, so I believe it offers feature 
parity with the original UDAF, with substantial performance improvements.
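
For context, a minimal sketch of the existing Row-based API whose per-row 
buffer conversion is at issue (a plain sum rather than the gist's ser/de 
counter):

{code:scala}
// Minimal UDAF on the existing API. Each update() call goes through the
// Row-based MutableAggregationBuffer, which is the per-row conversion
// overhead this ticket is about.
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class DoubleSum extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    buffer(0) = buffer.getDouble(0) + input.getDouble(0)   // invoked once per row
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
  def evaluate(buffer: Row): Double = buffer.getDouble(0)
}
{code}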

> User Defined Aggregating Functions (UDAFs) have a major efficiency problem
> --
>
> Key: SPARK-27296
> URL: https://issues.apache.org/jira/browse/SPARK-27296
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL, Structured Streaming
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: Erik Erlandson
>Priority: Major
>  Labels: performance, usability
>
> Spark's UDAFs appear to be serializing and de-serializing to/from the 
> MutableAggregationBuffer for each row.  This gist shows a small reproducing 
> UDAF and a spark shell session:
> [https://gist.github.com/erikerlandson/3c4d8c6345d1521d89e0d894a423046f]
> The UDAF and its companion UDT are designed to count the number of times 
> that ser/de is invoked for the aggregator.  The spark shell session 
> demonstrates that it is executing ser/de on every row of the data frame.
> Note, Spark's pre-defined aggregators do not have this problem, as they are 
> based on an internal aggregating trait that does the correct thing and only 
> calls ser/de at points such as partition boundaries, presenting final 
> results, etc.
> This is a major problem for UDAFs, as it means that every UDAF is doing a 
> massive amount of unnecessary work per row, including but not limited to Row 
> object allocations. For a more realistic UDAF with its own non-trivial 
> internal structure, it is obviously that much worse.






[jira] [Assigned] (SPARK-28237) Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28237:


Assignee: (was: Apache Spark)

> Add a new batch strategy called Idempotent to catch potential bugs in 
> corresponding rules
> -
>
> Key: SPARK-28237
> URL: https://issues.apache.org/jira/browse/SPARK-28237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The current {{RuleExecutor}} system contains two kinds of strategies: 
> {{Once}} and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. 
> However, particular rules (e.g. PullOutNondeterministic) are designed to be 
> idempotent, but Spark currently lacks a corresponding mechanism to catch 
> non-idempotent behavior.






[jira] [Assigned] (SPARK-28237) Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28237:


Assignee: Apache Spark

> Add a new batch strategy called Idempotent to catch potential bugs in 
> corresponding rules
> -
>
> Key: SPARK-28237
> URL: https://issues.apache.org/jira/browse/SPARK-28237
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Apache Spark
>Priority: Major
>
> The current {{RuleExecutor}} system contains two kinds of strategies: 
> {{Once}} and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. 
> However, particular rules (e.g. PullOutNondeterministic) are designed to be 
> idempotent, but Spark currently lacks a corresponding mechanism to catch 
> non-idempotent behavior.






[jira] [Created] (SPARK-28237) Add a new batch strategy called Idempotent to catch potential bugs in corresponding rules

2019-07-02 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28237:
--

 Summary: Add a new batch strategy called Idempotent to catch 
potential bugs in corresponding rules
 Key: SPARK-28237
 URL: https://issues.apache.org/jira/browse/SPARK-28237
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The current {{RuleExecutor}} system contains two kinds of strategies: {{Once}} 
and {{FixedPoint}}. The {{Once}} strategy is supposed to run once. However, 
particular rules (e.g. PullOutNondeterministic) are designed to be idempotent, 
but Spark currently lacks a corresponding mechanism to catch non-idempotent 
behavior.
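
A hedged sketch of what such a strategy could check; the names here are 
illustrative, not the eventual implementation: run the batch once more and 
fail if the plan keeps changing.

{code:scala}
// Illustrative idempotence check (not Spark's actual implementation):
// applying an idempotent batch a second time must be a no-op.
import org.apache.spark.sql.catalyst.trees.TreeNode

def assertIdempotent[T <: TreeNode[T]](name: String, applyBatch: T => T)(plan: T): T = {
  val once  = applyBatch(plan)
  val twice = applyBatch(once)
  require(twice.fastEquals(once), s"Batch '$name' is not idempotent")
  once
}
{code}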






[jira] [Assigned] (SPARK-27489) UI updates to show executor resource information

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27489:


Assignee: Apache Spark  (was: Thomas Graves)

> UI updates to show executor resource information
> 
>
> Key: SPARK-27489
> URL: https://issues.apache.org/jira/browse/SPARK-27489
> Project: Spark
>  Issue Type: Story
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
>
> We are adding other resource type support to the executors and Spark. We 
> should show the resource information for each executor on the UI Executors 
> page.






[jira] [Assigned] (SPARK-27489) UI updates to show executor resource information

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27489:


Assignee: Thomas Graves  (was: Apache Spark)

> UI updates to show executor resource information
> 
>
> Key: SPARK-27489
> URL: https://issues.apache.org/jira/browse/SPARK-27489
> Project: Spark
>  Issue Type: Story
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> We are adding other resource type support to the executors and Spark. We 
> should show the resource information for each executor on the UI Executors 
> page.






[jira] [Assigned] (SPARK-28236) Fix PullOutNondeterministic Analyzer rule to enforce idempotence

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28236:


Assignee: Apache Spark

> Fix PullOutNondeterministic Analyzer rule to enforce idempotence
> 
>
> Key: SPARK-28236
> URL: https://issues.apache.org/jira/browse/SPARK-28236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Apache Spark
>Priority: Major
>
> The {{PullOutNonDeterministic}} rule transforms aggregates when the 
> aggregating expression has sub-expressions whose {{deterministic}} field is 
> set to false. However, this might break {{PullOutNonDeterministic}}'s 
> idempotence property, since the actual aggregation rewriting will only 
> transform those with the {{NonDeterministic}} trait.






[jira] [Assigned] (SPARK-28236) Fix PullOutNondeterministic Analyzer rule to enforce idempotence

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28236:


Assignee: (was: Apache Spark)

> Fix PullOutNondeterministic Analyzer rule to enforce idempotence
> 
>
> Key: SPARK-28236
> URL: https://issues.apache.org/jira/browse/SPARK-28236
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Priority: Major
>
> The {{PullOutNonDeterministic}} rule transforms aggregates when the 
> aggregating expression has sub-expressions whose {{deterministic}} field is 
> set to false. However, this might break {{PullOutNonDeterministic}}'s 
> idempotence property, since the actual aggregation rewriting will only 
> transform those with the {{NonDeterministic}} trait.






[jira] [Assigned] (SPARK-28235) Decimal sum return type

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28235:


Assignee: Apache Spark

> Decimal sum return type
> ---
>
> Key: SPARK-28235
> URL: https://issues.apache.org/jira/browse/SPARK-28235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Assignee: Apache Spark
>Priority: Major
>
> Our implementation of decimal operations follows SQLServer behavior. As per 
> https://docs.microsoft.com/it-it/sql/t-sql/functions/sum-transact-sql?view=sql-server-2017,
>  the result of the sum operation should be `DECIMAL(38, s)`, while currently we 
> are setting it to `DECIMAL(10 + p, s)`. This means that with large datasets, 
> we may run into overflow, even though the value could have been represented 
> with higher precision; SQLServer returns correct results in that 
> case.






[jira] [Assigned] (SPARK-28235) Decimal sum return type

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28235:


Assignee: (was: Apache Spark)

> Decimal sum return type
> ---
>
> Key: SPARK-28235
> URL: https://issues.apache.org/jira/browse/SPARK-28235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Priority: Major
>
> Our implementation of decimal operations follows SQLServer behavior. As per 
> https://docs.microsoft.com/it-it/sql/t-sql/functions/sum-transact-sql?view=sql-server-2017,
>  the result of the sum operation should be `DECIMAL(38, s)`, while currently we 
> are setting it to `DECIMAL(10 + p, s)`. This means that with large datasets, 
> we may run into overflow, even though the value could have been represented 
> with higher precision; SQLServer returns correct results in that 
> case.






[jira] [Commented] (SPARK-28015) Invalid date formats should throw an exception

2019-07-02 Thread Iskender Unlu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877275#comment-16877275
 ] 

Iskender Unlu commented on SPARK-28015:
---

I will try to work on this issue as my first contribution.

> Invalid date formats should throw an exception
> --
>
> Key: SPARK-28015
> URL: https://issues.apache.org/jira/browse/SPARK-28015
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Yuming Wang
>Priority: Major
>
> Invalid date formats should throw an exception:
> {code:sql}
> SELECT date '1999 08 01'
> 1999-01-01
> {code}
> Supported date formats:
> https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L365-L374






[jira] [Created] (SPARK-28236) Fix PullOutNondeterministic Analyzer rule to enforce idempotence

2019-07-02 Thread Yesheng Ma (JIRA)
Yesheng Ma created SPARK-28236:
--

 Summary: Fix PullOutNondeterministic Analyzer rule to enforce 
idempotence
 Key: SPARK-28236
 URL: https://issues.apache.org/jira/browse/SPARK-28236
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yesheng Ma


The {{PullOutNonDeterministic}} rule transforms aggregates when the 
aggregating expression has sub-expressions whose {{deterministic}} field is set 
to false. However, this might break {{PullOutNonDeterministic}}'s idempotence 
property, since the actual aggregation rewriting will only transform those 
with the {{NonDeterministic}} trait.






[jira] [Created] (SPARK-28235) Decimal sum return type

2019-07-02 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-28235:
---

 Summary: Decimal sum return type
 Key: SPARK-28235
 URL: https://issues.apache.org/jira/browse/SPARK-28235
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Marco Gaido


Our implementation of decimal operations follows SQLServer behavior. As per 
https://docs.microsoft.com/it-it/sql/t-sql/functions/sum-transact-sql?view=sql-server-2017,
 the result of the sum operation should be `DECIMAL(38, s)`, while currently we are 
setting it to `DECIMAL(10 + p, s)`. This means that with large datasets, we may 
run into overflow, even though the value could have been represented with 
higher precision; SQLServer returns correct results in that case.
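
The two precision rules, stated compactly (my sketch of the arithmetic, not 
Spark code):

{code:scala}
// Result type of sum over DECIMAL(p, s), capped at the 38-digit maximum.
def currentRule(p: Int, s: Int): (Int, Int)   = (math.min(38, p + 10), s) // Spark today
def sqlServerRule(p: Int, s: Int): (Int, Int) = (38, s)                   // proposed
// e.g. p = 32 already gives DECIMAL(38, s) under both rules, but p = 20 gives
// DECIMAL(30, s) today, which can overflow on large datasets even though
// DECIMAL(38, s) would have held the value.
{code}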






[jira] [Commented] (SPARK-28222) Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version

2019-07-02 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877267#comment-16877267
 ] 

Marco Gaido commented on SPARK-28222:
-

Mmmmh, there has been a bug fix for it (see SPARK-26721), but it should be in 
3.0 only AFAIK. The question is: which is the right value? Can you compare it 
with other libs like sklearn?
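
A hedged sketch of such a comparison; `train` is a placeholder DataFrame, and 
the idea is to run it unchanged under 2.3.3 and 2.4.x, then against sklearn's 
feature_importances_ on the same data:

{code:scala}
// Placeholder sketch: identical data and seed under both Spark versions,
// then diff the resulting importance vectors.
import org.apache.spark.ml.classification.RandomForestClassifier

val rf = new RandomForestClassifier()
  .setImpurity("gini")
  .setSeed(42L)
  .setLabelCol("label")
  .setFeaturesCol("features")
val model = rf.fit(train)          // `train` is a placeholder DataFrame
println(model.featureImportances)  // compare this vector across versions
{code}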

> Feature importance outputs different values in GBT and Random Forest in 2.3.3 
> and 2.4 pyspark version
> -
>
> Key: SPARK-28222
> URL: https://issues.apache.org/jira/browse/SPARK-28222
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>Reporter: eneriwrt
>Priority: Minor
>
> Feature importance values obtained in a binary classification project differ 
> between versions 2.3.3 and 2.4.0. It happens in Random Forest 
> and GBT.
> As an example:
> *SPARK 2.4*
> MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 
> 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 
> 0.06324418636918638]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
> 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 
> 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7556, 
> 0.24438, 0.0, 1.4602196686471875e-17, 0.0]
> *SPARK 2.3.3*
> MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 
> 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 
> 0.05991085303585305]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 
> 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 
> 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7555, 
> 0.24438, 0.0, 2.4326753518951276e-17, 0.0]






[jira] [Commented] (SPARK-28219) Data source v2 user guide

2019-07-02 Thread Ryan Blue (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877172#comment-16877172
 ] 

Ryan Blue commented on SPARK-28219:
---

I'm closing this as a duplicate. Please use SPARK-27708.

If you want to note specific docs to write, please add them to that issue.

> Data source v2 user guide
> -
>
> Key: SPARK-28219
> URL: https://issues.apache.org/jira/browse/SPARK-28219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>







[jira] [Resolved] (SPARK-28219) Data source v2 user guide

2019-07-02 Thread Ryan Blue (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved SPARK-28219.
---
Resolution: Duplicate

> Data source v2 user guide
> -
>
> Key: SPARK-28219
> URL: https://issues.apache.org/jira/browse/SPARK-28219
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Priority: Blocker
>







[jira] [Assigned] (SPARK-27560) HashPartitioner uses Object.hashCode which is not seeded

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27560:


Assignee: Apache Spark

> HashPartitioner uses Object.hashCode which is not seeded
> 
>
> Key: SPARK-27560
> URL: https://issues.apache.org/jira/browse/SPARK-27560
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.4.0
> Environment: Notebook is running spark v2.4.0 local[*]
> Python 3.6.6 (default, Sep  6 2018, 13:10:03)
> [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
> I imagine this would reproduce on all operating systems and most versions of 
> spark though.
>Reporter: Andrew McHarg
>Assignee: Apache Spark
>Priority: Minor
>
> Forgive the quality of the bug report here, I am a pyspark user and not super 
> familiar with the internals of spark, yet it seems I have a strange corner 
> case with the HashPartitioner.
> This may already be known, but repartition with HashPartitioner seems to 
> assign everything to the same partition if data that was partitioned by the same 
> column is only partially read (say one partition). I suppose it is an obvious 
> consequence of Object.hashCode being deterministic, but it took a while to 
> track down. 
> Steps to repro:
>  # Get dataframe with a bunch of uuids say 1
>  # repartition(100, 'uuid_column')
>  # save to parquet
>  # read from parquet
>  # collect()[:100] then filter using pyspark.sql.functions isin (yes I know 
> this is bad and sampleBy should probably be used here)
>  # repartition(10, 'uuid_column')
>  # Resulting dataframe will have all of its data in one single partition
> Jupyter notebook for the above: 
> https://gist.github.com/robo-hamburger/4752a40cb643318464e58ab66cf7d23e
> I think an easy fix would be to seed the HashPartitioner like many hashtable 
> libraries do to avoid denial-of-service attacks. It might also be the case 
> that this is obvious behavior for more experienced Spark users :)
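
One step worth spelling out (my note, not the reporter's): after 
repartition(100, 'uuid_column'), every row in a given output partition 
satisfies hash % 100 == k, and because 10 divides 100, those rows also all 
satisfy hash % 10 == k % 10, so a later repartition(10, ...) over a single 
re-read partition collapses into one bucket. A hedged sketch:

{code:scala}
// Congruence mod 100 implies congruence mod 10 (since 10 divides 100),
// regardless of which hash function is used.
val hashes   = (1 to 100000).map(i => s"uuid-$i".hashCode)
val bucket42 = hashes.filter(h => Math.floorMod(h, 100) == 42) // one of 100 partitions
assert(bucket42.forall(h => Math.floorMod(h, 10) == 2))        // all map to 42 % 10
{code}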






[jira] [Assigned] (SPARK-27560) HashPartitioner uses Object.hashCode which is not seeded

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27560:


Assignee: (was: Apache Spark)

> HashPartitioner uses Object.hashCode which is not seeded
> 
>
> Key: SPARK-27560
> URL: https://issues.apache.org/jira/browse/SPARK-27560
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.4.0
> Environment: Notebook is running spark v2.4.0 local[*]
> Python 3.6.6 (default, Sep  6 2018, 13:10:03)
> [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
> I imagine this would reproduce on all operating systems and most versions of 
> spark though.
>Reporter: Andrew McHarg
>Priority: Minor
>
> Forgive the quality of the bug report here, I am a pyspark user and not super 
> familiar with the internals of spark, yet it seems I have a strange corner 
> case with the HashPartitioner.
> This may already be known, but repartition with HashPartitioner seems to 
> assign everything to the same partition if data that was partitioned by the same 
> column is only partially read (say one partition). I suppose it is an obvious 
> consequence of Object.hashCode being deterministic, but it took a while to 
> track down. 
> Steps to repro:
>  # Get dataframe with a bunch of uuids say 1
>  # repartition(100, 'uuid_column')
>  # save to parquet
>  # read from parquet
>  # collect()[:100] then filter using pyspark.sql.functions isin (yes I know 
> this is bad and sampleBy should probably be used here)
>  # repartition(10, 'uuid_column')
>  # Resulting dataframe will have all of its data in one single partition
> Jupyter notebook for the above: 
> https://gist.github.com/robo-hamburger/4752a40cb643318464e58ab66cf7d23e
> I think an easy fix would be to seed the HashPartitioner like many hashtable 
> libraries do to avoid denial-of-service attacks. It might also be the case 
> that this is obvious behavior for more experienced Spark users :)






[jira] [Resolved] (SPARK-28223) stream-stream joins should fail unsupported checker in update mode

2019-07-02 Thread Jose Torres (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Torres resolved SPARK-28223.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25023
[https://github.com/apache/spark/pull/25023]

> stream-stream joins should fail unsupported checker in update mode
> --
>
> Key: SPARK-28223
> URL: https://issues.apache.org/jira/browse/SPARK-28223
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.3
>Reporter: Jose Torres
>Priority: Major
> Fix For: 3.0.0
>
>
> Right now they fail only for inner joins, because we implemented the check 
> when that was the only supported type.






[jira] [Created] (SPARK-28234) Spark Resources - add python support to get resources

2019-07-02 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-28234:
-

 Summary: Spark Resources - add python support to get resources
 Key: SPARK-28234
 URL: https://issues.apache.org/jira/browse/SPARK-28234
 Project: Spark
  Issue Type: Story
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Thomas Graves


Add the equivalent Python API for sc.resources and TaskContext.resources.
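
For context, a hedged sketch of the Scala calls the new Python API would 
mirror (the resource API itself was still landing in 3.0 at this point):

{code:scala}
// Sketch; `sc` and `rdd` are placeholders for an existing context and RDD.
import org.apache.spark.TaskContext

val executorGpus = sc.resources.get("gpu").map(_.addresses)  // driver/executor side
rdd.foreachPartition { _ =>
  val taskGpus = TaskContext.get().resources().get("gpu")    // task side
}
{code}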






[jira] [Assigned] (SPARK-28224) Sum aggregation returns null on overflow decimals

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28224:


Assignee: Apache Spark

> Sum aggregation returns null on overflow decimals
> -
>
> Key: SPARK-28224
> URL: https://issues.apache.org/jira/browse/SPARK-28224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mick Jermsurawong
>Assignee: Apache Spark
>Priority: Major
>
> With the option to throw an exception on overflow turned on, sum aggregation of 
> an overflowing BigDecimal still remains null. {{DecimalAggregates}} is only 
> invoked when the expression of the sum (not the elements operated on) has 
> sufficiently small precision. The fix seems to belong in the Sum expression itself. 
>  
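
The shape of the problem, as a hedged sketch: every addend fits DECIMAL(38, 0), 
but the exact sum needs 39 digits, and the aggregate still comes back null 
instead of throwing:

{code:scala}
// Each addend fits in 38 digits; the exact sum needs 39 and overflows DECIMAL(38, 0).
val max38 = BigDecimal("9" * 38)  // largest 38-digit value
val sum   = max38 + max38
assert(sum.precision == 39)       // no longer representable as DECIMAL(38, 0)
{code}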






[jira] [Assigned] (SPARK-28224) Sum aggregation returns null on overflow decimals

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28224:


Assignee: (was: Apache Spark)

> Sum aggregation returns null on overflow decimals
> -
>
> Key: SPARK-28224
> URL: https://issues.apache.org/jira/browse/SPARK-28224
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mick Jermsurawong
>Priority: Major
>
> With the option to throw an exception on overflow turned on, sum aggregation of 
> an overflowing BigDecimal still remains null. {{DecimalAggregates}} is only 
> invoked when the expression of the sum (not the elements operated on) has 
> sufficiently small precision. The fix seems to belong in the Sum expression itself. 
>  






[jira] [Commented] (SPARK-27360) Standalone cluster mode support for GPU-aware scheduling

2019-07-02 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877086#comment-16877086
 ] 

Thomas Graves commented on SPARK-27360:
---

Are you going to handle updating the master/worker UI to include resources?

> Standalone cluster mode support for GPU-aware scheduling
> 
>
> Key: SPARK-27360
> URL: https://issues.apache.org/jira/browse/SPARK-27360
> Project: Spark
>  Issue Type: Story
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>
> Design and implement standalone manager support for GPU-aware scheduling:
> 1. static conf to describe resources
> 2. spark-submit to request resources 
> 3. auto discovery of GPUs
> 4. executor process isolation
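
A hedged sketch of the static conf side of this list (key names follow the 
design discussion and may differ in the final implementation):

{code:scala}
// Illustrative conf shape only; exact key names were still under design here.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.worker.resource.gpu.amount", "2")                        // what a worker offers
  .set("spark.worker.resource.gpu.discoveryScript", "/opt/getGpus.sh") // auto discovery
  .set("spark.executor.resource.gpu.amount", "1")                      // per-executor request
  .set("spark.task.resource.gpu.amount", "1")                          // per-task request
{code}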






[jira] [Updated] (SPARK-28233) Upgrade maven-jar-plugin and maven-source-plugin

2019-07-02 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28233:

Description: 
Upgrade {{maven-jar-plugin}} to 3.1.2 and {{maven-source-plugin}} to 3.1.0 to 
avoid:
 * MJAR-259 – Archiving to jar is very slow
 * MSOURCES-119 – Archiving to jar is very slow

Release notes:
[https://blogs.apache.org/maven/entry/apache-maven-source-plugin-version]
[https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version2]
[https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version1]

> Upgrade maven-jar-plugin and maven-source-plugin
> 
>
> Key: SPARK-28233
> URL: https://issues.apache.org/jira/browse/SPARK-28233
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Upgrade {{maven-jar-plugin}} to 3.1.2 and {{maven-source-plugin}} to 3.1.0 to 
> avoid:
>  * MJAR-259 – Archiving to jar is very slow
>  * MSOURCES-119 – Archiving to jar is very slow
> Release notes:
> [https://blogs.apache.org/maven/entry/apache-maven-source-plugin-version]
> [https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version2]
> [https://blogs.apache.org/maven/entry/apache-maven-jar-plugin-version1]






[jira] [Assigned] (SPARK-28233) Upgrade maven-jar-plugin and maven-source-plugin

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28233:


Assignee: (was: Apache Spark)

> Upgrade maven-jar-plugin and maven-source-plugin
> 
>
> Key: SPARK-28233
> URL: https://issues.apache.org/jira/browse/SPARK-28233
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Created] (SPARK-28233) Upgrade maven-jar-plugin and maven-source-plugin

2019-07-02 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28233:
---

 Summary: Upgrade maven-jar-plugin and maven-source-plugin
 Key: SPARK-28233
 URL: https://issues.apache.org/jira/browse/SPARK-28233
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.0
Reporter: Yuming Wang









[jira] [Assigned] (SPARK-28233) Upgrade maven-jar-plugin and maven-source-plugin

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28233:


Assignee: Apache Spark

> Upgrade maven-jar-plugin and maven-source-plugin
> 
>
> Key: SPARK-28233
> URL: https://issues.apache.org/jira/browse/SPARK-28233
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-25353) executeTake in SparkPlan could decode rows more than necessary.

2019-07-02 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-25353:
---

Assignee: Dooyoung Hwang

> executeTake in SparkPlan could decode rows more than necessary.
> ---
>
> Key: SPARK-25353
> URL: https://issues.apache.org/jira/browse/SPARK-25353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dooyoung Hwang
>Assignee: Dooyoung Hwang
>Priority: Major
> Fix For: 3.0.0
>
>
> In some cases, executeTake in SparkPlan could decode more than necessary.
> For example, df.limit(1000).collect() is executed.
>   +- executeTake in SparkPlan is called with arg 1000.
>   +- If the total row count from partitions is 2000, executeTake decodes them 
> and creates an array of InternalRow whose size is 2000.
>   +- Slice the first 1000 rows and return them; the 1000 rows at the rear 
> are not used.
>  
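
A hedged sketch of the direction of the fix (not the PR itself): decode 
lazily so rows beyond the limit are never materialized:

{code:scala}
// Sketch only: flatMap over an Iterator is lazy, so take(n) stops decoding
// as soon as n rows have been produced.
import org.apache.spark.sql.catalyst.InternalRow

def takeDecoded(
    n: Int,
    fetched: Iterator[Array[Byte]],                    // compressed partition results
    decode: Array[Byte] => Iterator[InternalRow]): Array[InternalRow] =
  fetched.flatMap(decode).take(n).toArray              // at most n rows are decoded
{code}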






[jira] [Commented] (SPARK-28230) Fix Spark Streaming graceful shutdown on Hadoop 2.8.x

2019-07-02 Thread Burak KÖSE (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876941#comment-16876941
 ] 

Burak KÖSE commented on SPARK-28230:


This is a problem for teams using cloud solutions such as EMR, which ships 
Hadoop 2.8.5 with the latest version of Spark. Has 2.8.6 ever been released?

> Fix Spark Streaming graceful shutdown on Hadoop 2.8.x
> ---
>
> Key: SPARK-28230
> URL: https://issues.apache.org/jira/browse/SPARK-28230
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Structured Streaming
>Affects Versions: 2.4.3
>Reporter: Burak KÖSE
>Priority: Minor
>
> Graceful shutdown is not working properly on Hadoop 2.8.x. Hadoop 
> introduces a 10-second timeout by default. This is hardcoded and not something 
> that we can configure with Hadoop settings.
>  
> [https://github.com/apache/hadoop/blob/branch-2.8.5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java#L51]
>  
> However, if we use 
>  
> {code:java}
> public void addShutdownHook(Runnable shutdownHook, int priority, long 
> timeout, TimeUnit unit){code}
>  
> in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala#L180
> the problem might be solved.
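
A hedged sketch of that suggestion; `ssc` is a placeholder StreamingContext 
and the priority/timeout values are illustrative:

{code:scala}
// Illustrative use of the four-arg Hadoop API quoted above; values are placeholders.
import java.util.concurrent.TimeUnit
import org.apache.hadoop.util.ShutdownHookManager

ShutdownHookManager.get().addShutdownHook(
  new Runnable {
    def run(): Unit = ssc.stop(stopSparkContext = true, stopGracefully = true)
  },
  51,                      // hook priority
  600L, TimeUnit.SECONDS)  // explicit timeout instead of the hardcoded 10s default
{code}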






[jira] [Resolved] (SPARK-25353) executeTake in SparkPlan could decode rows more than necessary.

2019-07-02 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25353.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22347
[https://github.com/apache/spark/pull/22347]

> executeTake in SparkPlan could decode rows more than necessary.
> ---
>
> Key: SPARK-25353
> URL: https://issues.apache.org/jira/browse/SPARK-25353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dooyoung Hwang
>Priority: Major
> Fix For: 3.0.0
>
>
> In some cases, executeTake in SparkPlan could decode more than necessary.
> For example, df.limit(1000).collect() is executed.
>   +- executeTake in SparkPlan is called with arg 1000.
>   +- If the total row count from partitions is 2000, executeTake decodes them 
> and creates an array of InternalRow whose size is 2000.
>   +- Slice the first 1000 rows and return them; the 1000 rows at the rear 
> are not used.
>  






[jira] [Resolved] (SPARK-28232) Add groupIdPrefix for Kafka batch connector

2019-07-02 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28232.
-
   Resolution: Fixed
 Assignee: Gabor Somogyi
Fix Version/s: 3.0.0

> Add groupIdPrefix for Kafka batch connector
> ---
>
> Key: SPARK-28232
> URL: https://issues.apache.org/jira/browse/SPARK-28232
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> According to the documentation, groupIdPrefix should be available for 
> both streaming and batch.
> This is not the case, because the batch part is missing.
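
For reference, the batch path where the option was being dropped (a sketch; 
servers and topic are placeholders):

{code:scala}
// Batch Kafka read; before this fix the documented groupIdPrefix option
// was honored on the streaming path only.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")  // placeholder
  .option("subscribe", "topic1")                    // placeholder
  .option("groupIdPrefix", "my-prefix")
  .load()
{code}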






[jira] [Commented] (SPARK-27877) Implement SQL-standard LATERAL subqueries

2019-07-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876909#comment-16876909
 ] 

Yuming Wang commented on SPARK-27877:
-

[~lishuming] tickets are assigned only once the PR is merged. Please go ahead 
and submit the PR: https://github.com/apache/spark/pulls

> Implement SQL-standard LATERAL subqueries
> -
>
> Key: SPARK-27877
> URL: https://issues.apache.org/jira/browse/SPARK-27877
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Subqueries appearing in {{FROM}} can be preceded by the keyword {{LATERAL}}. 
> This allows them to reference columns provided by preceding {{FROM}} items. A 
> trivial example of {{LATERAL}} is:
> {code:sql}
> SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;
> {code}
> More details:
>  
> [https://www.postgresql.org/docs/9.3/queries-table-expressions.html#QUERIES-LATERAL]
>  
> [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430]
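
For this trivial case the same result can already be written as an ordinary 
join (a hedged equivalence; LATERAL only becomes essential for references that 
a plain join cannot express, per the linked docs):

{code:scala}
// Equivalent rewrite of the trivial LATERAL example above, runnable today.
spark.sql("SELECT * FROM foo JOIN bar ON bar.id = foo.bar_id").show()
{code}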






[jira] [Commented] (SPARK-28230) Fix Spark Streaming graceful shutdown on Hadoop 2.8.x

2019-07-02 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876880#comment-16876880
 ] 

Yuming Wang commented on SPARK-28230:
-

Hadoop 2.8.6 should support it. See HADOOP-15679 for more details.

> Fix Spark Streaming graceful shutdown on Hadoop 2.8.x
> ---
>
> Key: SPARK-28230
> URL: https://issues.apache.org/jira/browse/SPARK-28230
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Structured Streaming
>Affects Versions: 2.4.3
>Reporter: Burak KÖSE
>Priority: Minor
>
> Graceful shutdown is not working properly on Hadoop 2.8.x. Hadoop 
> introduces a 10-second timeout by default. This is hardcoded and not something 
> that we can configure with Hadoop settings.
>  
> [https://github.com/apache/hadoop/blob/branch-2.8.5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java#L51]
>  
> However, if we use 
>  
> {code:java}
> public void addShutdownHook(Runnable shutdownHook, int priority, long 
> timeout, TimeUnit unit){code}
>  
> in 
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala#L180|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala#L180,]
> problem might be solved.
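A sketch of the change the description proposes, assuming a Hadoop version that contains HADOOP-15679 (per the comment above, 2.8.6 should); the 30-second timeout and the stop logic are illustrative, not Spark's actual patch:

{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.hadoop.util.ShutdownHookManager

// Register the hook through the overload that takes an explicit timeout,
// so Hadoop's hardcoded 10-second default no longer cuts the hook short.
val hook = new Runnable {
  override def run(): Unit = {
    // stop the streaming context gracefully here (placeholder)
  }
}
ShutdownHookManager.get().addShutdownHook(hook, 40 /* priority */, 30L, TimeUnit.SECONDS)
{code}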



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27877) Implement SQL-standard LATERAL subqueries

2019-07-02 Thread ShuMing Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876865#comment-16876865
 ] 

ShuMing Li commented on SPARK-27877:


Can this issue be assigned to me? I'd like to work on it.

> Implement SQL-standard LATERAL subqueries
> -
>
> Key: SPARK-27877
> URL: https://issues.apache.org/jira/browse/SPARK-27877
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Subqueries appearing in {{FROM}} can be preceded by the key word {{LATERAL}}. 
> This allows them to reference columns provided by preceding {{FROM}} items. A 
> trivial example of {{LATERAL}} is:
> {code:sql}
> SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;
> {code}
> More details:
>  
> [https://www.postgresql.org/docs/9.3/queries-table-expressions.html#QUERIES-LATERAL]
>  
> [https://github.com/postgres/postgres/commit/5ebaaa49445eb1ba7b299bbea3a477d4e4c0430]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28232) Add groupIdPrefix for Kafka batch connector

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28232:


Assignee: Apache Spark

> Add groupIdPrefix for Kafka batch connector
> ---
>
> Key: SPARK-28232
> URL: https://issues.apache.org/jira/browse/SPARK-28232
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Apache Spark
>Priority: Major
>
> According to the documentation, groupIdPrefix should be available for both 
> streaming and batch queries.
> This is not the case, because the batch part of the implementation is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28232) Add groupIdPrefix for Kafka batch connector

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28232:


Assignee: (was: Apache Spark)

> Add groupIdPrefix for Kafka batch connector
> ---
>
> Key: SPARK-28232
> URL: https://issues.apache.org/jira/browse/SPARK-28232
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> According to the documentation, groupIdPrefix should be available for both 
> streaming and batch queries.
> This is not the case, because the batch part of the implementation is missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28231) Adaptive execution should ignore RepartitionByExpression

2019-07-02 Thread Jrex Ge (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jrex Ge updated SPARK-28231:

Description: Adaptive execution modifies the partitioning explicitly 
requested by Dataset.repartition (RepartitionByExpression)

> Adaptive execution should ignore RepartitionByExpression
> 
>
> Key: SPARK-28231
> URL: https://issues.apache.org/jira/browse/SPARK-28231
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.1
>Reporter: Jrex Ge
>Priority: Critical
>
> Adaptive execution modifies the partitioning explicitly requested by 
> Dataset.repartition (RepartitionByExpression)
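A hedged illustration of the reported behaviour; {{df}} and the {{key}} column are placeholders, and the exact partition count chosen by adaptive execution depends on the data:

{code:scala}
// With adaptive execution enabled, an explicitly requested partitioning...
spark.conf.set("spark.sql.adaptive.enabled", "true")
val repartitioned = df.repartition(100, df("key"))
// ...may be adjusted away from the requested 100 post-shuffle partitions,
// which is what this ticket argues should not happen for
// RepartitionByExpression.
println(repartitioned.rdd.getNumPartitions)
{code}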



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28232) Add groupIdPrefix for Kafka batch connector

2019-07-02 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-28232:
-

 Summary: Add groupIdPrefix for Kafka batch connector
 Key: SPARK-28232
 URL: https://issues.apache.org/jira/browse/SPARK-28232
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Gabor Somogyi


According to the documentation, groupIdPrefix should be available for both 
streaming and batch queries.
This is not the case, because the batch part of the implementation is missing.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28231) Adaptive execution should ignore RepartitionByExpression

2019-07-02 Thread Jrex Ge (JIRA)
Jrex Ge created SPARK-28231:
---

 Summary: Adaptive execution should ignore RepartitionByExpression
 Key: SPARK-28231
 URL: https://issues.apache.org/jira/browse/SPARK-28231
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.1
Reporter: Jrex Ge






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28230) Fix Spark Streaming graceful shutdown on Hadoop 2.8.x

2019-07-02 Thread Burak KÖSE (JIRA)
Burak KÖSE created SPARK-28230:
--

 Summary: Fix Spark Streaming graceful shutdown on Hadoop 2.8.x
 Key: SPARK-28230
 URL: https://issues.apache.org/jira/browse/SPARK-28230
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Structured Streaming
Affects Versions: 2.4.3
Reporter: Burak KÖSE


Graceful shutdown does not work properly on Hadoop 2.8.x. Hadoop introduces a 
10-second timeout by default. This is hardcoded and not something that we can 
configure with Hadoop settings.

[https://github.com/apache/hadoop/blob/branch-2.8.5/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ShutdownHookManager.java#L51]

However, if we use
{code:java}
public void addShutdownHook(Runnable shutdownHook, int priority, long timeout, 
TimeUnit unit){code}
in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala#L180]
the problem might be solved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28229) How to implement the same functionality as Presto's TRY(expr)?

2019-07-02 Thread U Shaw (JIRA)
U Shaw created SPARK-28229:
--

 Summary: How to implement the same functionality as Presto's 
TRY(expr)?
 Key: SPARK-28229
 URL: https://issues.apache.org/jira/browse/SPARK-28229
 Project: Spark
  Issue Type: Question
  Components: SQL
Affects Versions: 2.4.3
Reporter: U Shaw



How can we implement the same functionality as Presto's TRY(expr)? 
Is there already a similar function?

--
From the Presto documentation:

TRY
try(expression)
Evaluate an expression and handle certain types of errors by returning NULL.

In cases where it is preferable that queries produce NULL or default values 
instead of failing when corrupt or invalid data is encountered, the TRY 
function may be useful. To specify default values, the TRY function can be 
used in conjunction with the COALESCE function.

The following errors are handled by TRY:
* Division by zero
* Invalid cast or function argument
* Numeric value out of range

Examples
Source table with some invalid data:
{noformat}
SELECT * FROM shipping;
 origin_state | origin_zip | packages | total_cost
--------------+------------+----------+------------
 California   |      94131 |       25 |        100
 California   |      P332a |        5 |         72
 California   |      94025 |        0 |        155
 New Jersey   |      08544 |      225 |        490
(4 rows)
{noformat}
Query failure without TRY:
{noformat}
SELECT CAST(origin_zip AS BIGINT) FROM shipping;
Query failed: Can not cast 'P332a' to BIGINT
{noformat}
NULL values with TRY:
{noformat}
SELECT TRY(CAST(origin_zip AS BIGINT)) FROM shipping;
 origin_zip
------------
      94131
 NULL
      94025
      08544
(4 rows)
{noformat}
Query failure without TRY:
{noformat}
SELECT total_cost / packages AS per_package FROM shipping;
Query failed: / by zero
{noformat}
Default values with TRY and COALESCE:
{noformat}
SELECT COALESCE(TRY(total_cost / packages), 0) AS per_package FROM shipping;
 per_package
-------------
           4
          14
           0
          19
(4 rows)
{noformat}
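A hedged sketch of approximate equivalents in Spark SQL (2.4, default non-ANSI behaviour), not a TRY implementation: invalid casts and division by zero already evaluate to NULL instead of failing, so CAST plus COALESCE covers the examples above. It assumes an existing SparkSession named {{spark}} and the {{shipping}} table from the quoted docs:

{code:scala}
// 'P332a' casts to NULL rather than failing the query.
spark.sql("SELECT CAST(origin_zip AS BIGINT) FROM shipping").show()
// total_cost / packages is NULL when packages is 0; COALESCE supplies 0.
spark.sql("SELECT COALESCE(total_cost / packages, 0) AS per_package FROM shipping").show()
{code}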



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28228) Fix substitution order of nested WITH clauses

2019-07-02 Thread Peter Toth (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-28228:
---
Summary: Fix substitution order of nested WITH clauses  (was: Better 
support for WITH clause)

> Fix substitution order of nested WITH clauses
> -
>
> Key: SPARK-28228
> URL: https://issues.apache.org/jira/browse/SPARK-28228
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Priority: Major
>
> PostgreSQL handles nested WITH clauses differently than Spark currently does. 
> These queries return 1 in Spark, while they return 2 in PostgreSQL:
> {noformat}
> WITH
>   t AS (SELECT 1),
>   t2 AS (
> WITH t AS (SELECT 2)
> SELECT * FROM t
>   )
> SELECT * FROM t2
> {noformat}
> {noformat}
> WITH t AS (SELECT 1)
> SELECT (
>   WITH t AS (SELECT 2)
>   SELECT * FROM t
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28228) Better support for WITH clause

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28228:


Assignee: Apache Spark

> Better support for WITH clause
> --
>
> Key: SPARK-28228
> URL: https://issues.apache.org/jira/browse/SPARK-28228
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>
> PostgreSQL handles nested WITH clauses differently than Spark currently does. 
> These queries return 1 in Spark, while they return 2 in PostgreSQL:
> {noformat}
> WITH
>   t AS (SELECT 1),
>   t2 AS (
> WITH t AS (SELECT 2)
> SELECT * FROM t
>   )
> SELECT * FROM t2
> {noformat}
> {noformat}
> WITH t AS (SELECT 1)
> SELECT (
>   WITH t AS (SELECT 2)
>   SELECT * FROM t
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28228) Better support for WITH clause

2019-07-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28228:


Assignee: (was: Apache Spark)

> Better support for WITH clause
> --
>
> Key: SPARK-28228
> URL: https://issues.apache.org/jira/browse/SPARK-28228
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Priority: Major
>
> PostgreSQL handles nested WITH clauses differently than Spark currently does. 
> These queries return 1 in Spark, while they return 2 in PostgreSQL:
> {noformat}
> WITH
>   t AS (SELECT 1),
>   t2 AS (
> WITH t AS (SELECT 2)
> SELECT * FROM t
>   )
> SELECT * FROM t2
> {noformat}
> {noformat}
> WITH t AS (SELECT 1)
> SELECT (
>   WITH t AS (SELECT 2)
>   SELECT * FROM t
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28228) Better support for WITH clause

2019-07-02 Thread Peter Toth (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-28228:
---
Description: 
PostgreSQL handles nested WITH clauses differently than Spark currently does. 
These queries return 1 in Spark, while they return 2 in PostgreSQL:

{noformat}
WITH
  t AS (SELECT 1),
  t2 AS (
WITH t AS (SELECT 2)
SELECT * FROM t
  )
SELECT * FROM t2
{noformat}

{noformat}
WITH t AS (SELECT 1)
SELECT (
  WITH t AS (SELECT 2)
  SELECT * FROM t
)
{noformat}

  was:
Because of SPARK-17590 it should be relatively easy to support the WITH 
clause in subqueries as well as nested CTE definitions.

Here is an example of a query that does not run on Spark:
{code:sql}
CREATE TABLE test (seqno int, k string, v int) USING parquet;

INSERT INTO TABLE test VALUES
  (1, 'a', 99), (2, 'b', 88), (3, 'a', 77), (4, 'b', 66),
  (5, 'c', 55), (6, 'a', 44), (7, 'b', 33);

SELECT percentile(b, 0.5) FROM (
  WITH mavg AS (
    SELECT k, AVG(v) OVER (PARTITION BY k ORDER BY seqno
                           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS b
    FROM test ORDER BY seqno)
  SELECT k, MAX(b) AS b FROM mavg GROUP BY k);
{code}


> Better support for WITH clause
> --
>
> Key: SPARK-28228
> URL: https://issues.apache.org/jira/browse/SPARK-28228
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Priority: Major
>
> PostgreSQL handles nested WITH clauses differently than Spark currently does. 
> These queries return 1 in Spark, while they return 2 in PostgreSQL:
> {noformat}
> WITH
>   t AS (SELECT 1),
>   t2 AS (
> WITH t AS (SELECT 2)
> SELECT * FROM t
>   )
> SELECT * FROM t2
> {noformat}
> {noformat}
> WITH t AS (SELECT 1)
> SELECT (
>   WITH t AS (SELECT 2)
>   SELECT * FROM t
> )
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28228) Better support for WITH clause

2019-07-02 Thread Peter Toth (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-28228:
---
Affects Version/s: (was: 2.2.0)
   3.0.0

> Better support for WITH clause
> --
>
> Key: SPARK-28228
> URL: https://issues.apache.org/jira/browse/SPARK-28228
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Priority: Major
>
> Because of SPARK-17590 it should be relatively easy to support the WITH 
> clause in subqueries as well as nested CTE definitions.
> Here is an example of a query that does not run on Spark:
> {code:sql}
> CREATE TABLE test (seqno int, k string, v int) USING parquet;
> INSERT INTO TABLE test VALUES
>   (1, 'a', 99), (2, 'b', 88), (3, 'a', 77), (4, 'b', 66),
>   (5, 'c', 55), (6, 'a', 44), (7, 'b', 33);
> SELECT percentile(b, 0.5) FROM (
>   WITH mavg AS (
>     SELECT k, AVG(v) OVER (PARTITION BY k ORDER BY seqno
>                            ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS b
>     FROM test ORDER BY seqno)
>   SELECT k, MAX(b) AS b FROM mavg GROUP BY k);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28228) Better support for WITH clause

2019-07-02 Thread Peter Toth (JIRA)
Peter Toth created SPARK-28228:
--

 Summary: Better support for WITH clause
 Key: SPARK-28228
 URL: https://issues.apache.org/jira/browse/SPARK-28228
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.2.0
Reporter: Peter Toth


Because of SPARK-17590 it should be relatively easy to support the WITH 
clause in subqueries as well as nested CTE definitions.

Here is an example of a query that does not run on Spark:
{code:sql}
CREATE TABLE test (seqno int, k string, v int) USING parquet;

INSERT INTO TABLE test VALUES
  (1, 'a', 99), (2, 'b', 88), (3, 'a', 77), (4, 'b', 66),
  (5, 'c', 55), (6, 'a', 44), (7, 'b', 33);

SELECT percentile(b, 0.5) FROM (
  WITH mavg AS (
    SELECT k, AVG(v) OVER (PARTITION BY k ORDER BY seqno
                           ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS b
    FROM test ORDER BY seqno)
  SELECT k, MAX(b) AS b FROM mavg GROUP BY k);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27802) SparkUI throws NoSuchElementException when inconsistency appears between `ExecutorStageSummaryWrapper`s and `ExecutorSummaryWrapper`s

2019-07-02 Thread liupengcheng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875945#comment-16875945
 ] 

liupengcheng edited comment on SPARK-27802 at 7/2/19 6:23 AM:
--

[~shahid] Yes, but when I checked the master branch I found that this logic 
was removed in 3.0.0 and replaced with some JavaScript, so I'm not sure 
whether we can fix it only in versions prior to 3.0.0. I haven't looked into 
the 3.0.0 code, so I am not sure whether this issue still exists there; 
that's why I haven't opened a PR for it.

You can follow these steps to reproduce the issue (see the configuration 
sketch below):
 # set spark.ui.retainedDeadExecutors=0 and spark.ui.retainedStages=1000
 # set spark.dynamicAllocation.enabled=true
 # run a Spark app, wait for it to complete, and let the executors go idle
 # check the stage UI
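The reproduction settings above, expressed as a SparkSession configuration 
sketch (illustrative only; any app that leaves executors idle under dynamic 
allocation should do):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SPARK-27802-repro") // hypothetical app name
  .config("spark.ui.retainedDeadExecutors", "0")
  .config("spark.ui.retainedStages", "1000")
  .config("spark.dynamicAllocation.enabled", "true")
  .getOrCreate()
{code}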


was (Author: liupengcheng):
[~shahid] Yes, but when I checked the master branch I found that this logic 
was removed in 3.0.0, so I'm not sure whether we can fix it only in 2.3; 
that's why I haven't opened a PR for it.

You can follow these steps to reproduce the issue:
 # set spark.ui.retainedDeadExecutors=0 and spark.ui.retainedStages=1000
 # set spark.dynamicAllocation.enabled=true
 # run a Spark app, wait for it to complete, and let the executors go idle
 # check the stage UI

> SparkUI throws NoSuchElementException when inconsistency appears between 
> `ExecutorStageSummaryWrapper`s and `ExecutorSummaryWrapper`s
> -
>
> Key: SPARK-27802
> URL: https://issues.apache.org/jira/browse/SPARK-27802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: liupengcheng
>Priority: Major
>
> Recently, we hit this issue when testing Spark 2.3. It reports the following 
> error message when clicking on the stage UI link.
> We added more logs to print the executorId (here, 10) while debugging, and 
> finally found that it is caused by an inconsistency between the list of 
> `ExecutorStageSummaryWrapper`s and the `ExecutorSummaryWrapper`s in the 
> KVStore. The number of dead executors may exceed the threshold, so an 
> executor can be removed from the list of `ExecutorSummaryWrapper`s while 
> still being kept in the list of `ExecutorStageSummaryWrapper`s in the store.
> {code:java}
> HTTP ERROR 500
> Problem accessing /stages/stage/. Reason:
> Server Error
> Caused by:
> java.util.NoSuchElementException: 10
>   at 
> org.apache.spark.util.kvstore.InMemoryStore.read(InMemoryStore.java:83)
>   at 
> org.apache.spark.status.ElementTrackingStore.read(ElementTrackingStore.scala:95)
>   at 
> org.apache.spark.status.AppStatusStore.executorSummary(AppStatusStore.scala:70)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:99)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:92)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable.createExecutorTable(ExecutorTable.scala:92)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable.toNodeSeq(ExecutorTable.scala:75)
>   at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:478)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>   at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>   at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at