[jira] [Assigned] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31684: Assignee: (was: Apache Spark) > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
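The check described in the ticket boils down to comparing the Hadoop FileSystem that the table location resolves to with the one the partition location resolves to. A minimal sketch of that comparison, assuming a Hadoop Configuration is at hand; the helper name sameFileSystem and its arguments are illustrative only, not the actual Spark patch:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative sketch, not the actual fix: check whether the table location and
// the partition location resolve to the same Hadoop FileSystem.
def sameFileSystem(tableLoc: String, partitionLoc: String, hadoopConf: Configuration): Boolean = {
  val tableFs = new Path(tableLoc).getFileSystem(hadoopConf)
  val partitionFs = new Path(partitionLoc).getFileSystem(hadoopConf)
  // Compare the resolved FileSystem URIs (scheme + authority). If they differ,
  // Hive's FileSystem#checkPath rejects the copy with a "Wrong FS" error.
  tableFs.getUri == partitionFs.getUri
}
{code}

Only when both locations resolve to the same FileSystem is it safe to keep the "delete in Spark, copy in Hive" path; otherwise the plain Hive overwrite has to be used so that FileSystem#checkPath does not reject the copy.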
[jira] [Commented] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105142#comment-17105142 ] Apache Spark commented on SPARK-31684: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28511 > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31684: Assignee: Apache Spark > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31684: - Description: With https://issues.apache.org/jira/browse/SPARK-18107, we disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 However, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 was: With https://issues.apache.org/jira/browse/SPARK-18107, we conditionally disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 Additionally, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
Kent Yao created SPARK-31684: Summary: Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table Key: SPARK-31684 URL: https://issues.apache.org/jira/browse/SPARK-31684 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5, 2.3.4, 2.2.3, 2.1.3, 3.0.0, 3.1.0 Reporter: Kent Yao With https://issues.apache.org/jira/browse/SPARK-18107, we conditionally disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 Additionally, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30699) GMM blockify input vectors
[ https://issues.apache.org/jira/browse/SPARK-30699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-30699. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 27473 [https://github.com/apache/spark/pull/27473] > GMM blockify input vectors > -- > > Key: SPARK-30699 > URL: https://issues.apache.org/jira/browse/SPARK-30699 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105058#comment-17105058 ] Apache Spark commented on SPARK-31683: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28510 > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105056#comment-17105056 ] Apache Spark commented on SPARK-31683: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28510 > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31683: Assignee: Apache Spark > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31683: Assignee: (was: Apache Spark) > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
Dongjoon Hyun created SPARK-31683: - Summary: Make Prometheus output consistent with DropWizard 4.1 result Key: SPARK-31683 URL: https://issues.apache.org/jira/browse/SPARK-31683 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: Dongjoon Hyun SPARK-29032 adds Prometheus support. After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes difference in output labels and number of keys. This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105052#comment-17105052 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28509 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105049#comment-17105049 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28508 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105043#comment-17105043 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28507 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105042#comment-17105042 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28507 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105026#comment-17105026 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28506 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105025#comment-17105025 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28506 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31682: - Parent: (was: SPARK-30098) Issue Type: Improvement (was: Sub-task) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31682: - Parent: SPARK-31085 Issue Type: Sub-task (was: Improvement) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31682: Assignee: (was: Apache Spark) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31682: Assignee: Apache Spark > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105016#comment-17105016 ] Apache Spark commented on SPARK-31682: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
wuyi created SPARK-31682: Summary: Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default Key: SPARK-31682 URL: https://issues.apache.org/jira/browse/SPARK-31682 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: wuyi According to the latest status of [[DISCUSS] Resolve ambiguous parser rule between two "create table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], there might be a choice to turn this conf on by default to unblock the Spark 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
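For context, the conf decides which kind of table a bare CREATE TABLE statement produces. A rough sketch of how it is exercised, assuming a SparkSession named spark and that the conf can be set at the session level (both are assumptions here, not something stated in the ticket):

{code:scala}
// Illustrative only; whether the default should change is exactly what the
// linked [DISCUSS] thread debates.
spark.sql("SET spark.sql.legacy.createHiveTableByDefault.enabled=true")

// With the legacy flag on, a CREATE TABLE without a USING / STORED AS clause is
// treated as a Hive serde table; with it off, it becomes a native data source table.
spark.sql("CREATE TABLE t (id INT, name STRING)")
spark.sql("DESCRIBE EXTENDED t").show(truncate = false)
{code}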
[jira] [Resolved] (SPARK-31393) Show the correct alias in schema for expression
[ https://issues.apache.org/jira/browse/SPARK-31393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31393. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28164 [https://github.com/apache/spark/pull/28164] > Show the correct alias in schema for expression > --- > > Key: SPARK-31393 > URL: https://issues.apache.org/jira/browse/SPARK-31393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > Some Spark SQL functions implement their alias in an inelegant way. > For example, BitwiseCount overrides the sql method: > {code:java} > override def sql: String = s"bit_count(${child.sql})" > {code} > I don't think this is elegant enough, because `Expression` already gives the following definition: > {code:java} > def sql: String = { > val childrenSQL = children.map(_.sql).mkString(", ") > s"$prettyName($childrenSQL)" > } > {code} > By this definition, BitwiseCount should override the `prettyName` method instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
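The point of the description is that a concrete expression should reuse Expression.sql's generic formatting and only override prettyName. A self-contained mini-model of that pattern; Expr, Col and BitwiseCountLike are made-up stand-ins for the Catalyst classes, just to keep the sketch runnable outside Spark:

{code:scala}
// The base trait builds sql from prettyName, so a concrete function only needs
// to override prettyName to get the right alias.
trait Expr {
  def children: Seq[Expr]
  def prettyName: String = getClass.getSimpleName.toLowerCase
  def sql: String = {
    val childrenSQL = children.map(_.sql).mkString(", ")
    s"$prettyName($childrenSQL)"
  }
}

case class Col(name: String) extends Expr {
  override def children: Seq[Expr] = Nil
  override def sql: String = name
}

// Overriding prettyName (rather than sql) keeps the generic formatting logic.
case class BitwiseCountLike(child: Expr) extends Expr {
  override def children: Seq[Expr] = Seq(child)
  override def prettyName: String = "bit_count"
}

// BitwiseCountLike(Col("a")).sql == "bit_count(a)"
{code}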
[jira] [Assigned] (SPARK-31393) Show the correct alias in schema for expression
[ https://issues.apache.org/jira/browse/SPARK-31393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31393: Assignee: jiaan.geng > Show the correct alias in schema for expression > --- > > Key: SPARK-31393 > URL: https://issues.apache.org/jira/browse/SPARK-31393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Some Spark SQL functions implement their alias in an inelegant way. > For example, BitwiseCount overrides the sql method: > {code:java} > override def sql: String = s"bit_count(${child.sql})" > {code} > I don't think this is elegant enough, because `Expression` already gives the following definition: > {code:java} > def sql: String = { > val childrenSQL = children.map(_.sql).mkString(", ") > s"$prettyName($childrenSQL)" > } > {code} > By this definition, BitwiseCount should override the `prettyName` method instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31559) AM starts with initial fetched tokens in any attempt
[ https://issues.apache.org/jira/browse/SPARK-31559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin resolved SPARK-31559. Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > AM starts with initial fetched tokens in any attempt > > > Key: SPARK-31559 > URL: https://issues.apache.org/jira/browse/SPARK-31559 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > The issue only occurs in yarn-cluster mode. > The submitter obtains delegation tokens for yarn-cluster mode and adds these > credentials to the launch context. The AM will be launched with these > credentials, and the AM and the driver are able to leverage these tokens. > In YARN cluster mode, the driver is launched in the AM, which in turn initializes the > token manager (while initializing SparkContext) and obtains delegation tokens > (and schedules their renewal) if both principal and keytab are available. > That said, even if we provide a principal and keytab to run an application in > yarn-cluster mode, the AM always starts with the initial tokens from the launch context > until the token manager runs and obtains delegation tokens. > So there's a "gap": if user code (the driver) accesses an external system that requires > delegation tokens (e.g. HDFS) before initializing SparkContext, it cannot > leverage the tokens the token manager will obtain. This will make the application > fail if the AM is killed and relaunched "after" the initial tokens have expired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
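One hedged workaround sketch for the "gap" described above is for driver code that must touch HDFS before SparkContext exists to authenticate from the keytab itself, rather than relying on the tokens shipped in the AM launch context. This only illustrates the Hadoop UGI API, not the change that resolved this ticket; the principal, keytab and HDFS path are placeholders:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val hadoopConf = new Configuration()
UserGroupInformation.setConfiguration(hadoopConf)
// Authenticate from the keytab directly instead of relying on the (possibly
// expired) delegation tokens from the AM launch context.
UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")

val fs = FileSystem.get(hadoopConf)
val exists = fs.exists(new Path("/some/input/path"))
println(s"input exists: $exists")
{code}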
[jira] [Resolved] (SPARK-31671) Wrong error message in VectorAssembler when column lengths can not be inferred
[ https://issues.apache.org/jira/browse/SPARK-31671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31671. -- Fix Version/s: 2.4.7 3.0.0 Assignee: YijieFan Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28487 > Wrong error message in VectorAssembler when column lengths can not be > inferred > --- > > Key: SPARK-31671 > URL: https://issues.apache.org/jira/browse/SPARK-31671 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.4 > Environment: Mac OS catalina >Reporter: YijieFan >Assignee: YijieFan >Priority: Minor > Fix For: 3.0.0, 2.4.7 > > Original Estimate: 72h > Remaining Estimate: 72h > > In VectorAssembler when input column lengths can not be inferred and > handleInvalid = "keep", it will throw a runtime exception with message like > below > _Can not infer column lengths with handleInvalid = "keep". *Consider using > VectorSizeHint*_ > *_|to add metadata for columns: [column1, column2]_* > However, even if you set vector size hint for *column1*, the message remains, > and will not change to *[column2]* only. This is not consistent with the > description in the error message. > This introduce difficulties when I try to resolve this exception, for I do > not know which column required vectorSizeHint. This is especially troublesome > when you have a large number of columns to deal with. > Here is a simple example: > > {code:java} > // create a df without vector size > val df = Seq( > (Vectors.dense(1.0), Vectors.dense(2.0)) > ).toDF("n1", "n2") > // only set vector size hint for n1 column > val hintedDf = new VectorSizeHint() > .setInputCol("n1") > .setSize(1) > .transform(df) > // assemble n1, n2 > val output = new VectorAssembler() > .setInputCols(Array("n1", "n2")) > .setOutputCol("features") > .setHandleInvalid("keep") > .transform(hintedDf) > // because only n1 has vector size, the error message should tell us to set > vector size for n2 too > output.show() > {code} > Expected error message: > > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n2]. > {code} > Actual error message: > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n1, n2]. > {code} > I change one line in VectorAssembler.scala, so that it can work properly as > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
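The reported one-line fix amounts to listing only the columns whose vector length really is unknown. A standalone sketch of that filtering using the ML attribute metadata; the helper name columnsMissingSize is illustrative, not the actual patch:

{code:scala}
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructField

// Report only the vector columns whose length cannot be read from the column
// metadata, instead of every input column.
def columnsMissingSize(df: DataFrame, inputCols: Seq[String]): Seq[String] =
  inputCols.filter { c =>
    val field: StructField = df.schema(c)
    AttributeGroup.fromStructField(field).size < 0 // -1 means "size unknown"
  }

// In the example from the report, columnsMissingSize(hintedDf, Seq("n1", "n2"))
// would return Seq("n2"), matching the expected error message.
{code}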
[jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
[ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104969#comment-17104969 ] Apache Spark commented on SPARK-20007: -- User 'liangz1' has created a pull request for this issue: https://github.com/apache/spark/pull/28504 > Make SparkR apply() functions robust to workers that return empty data.frame > > > Key: SPARK-20007 > URL: https://issues.apache.org/jira/browse/SPARK-20007 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki >Priority: Major > Labels: bulk-closed > > When using {{gapply()}} (or other members of the {{apply()}} family) with a > schema, Spark will try to parse the data returned from the R process on each > worker as Spark DataFrame Rows based on the schema. In this case our provided > schema suggests that we have six columns. When an R worker returns results to the > JVM, SparkSQL will try to access its columns one by one and cast them to the > proper types. If the R worker returns nothing, the JVM will throw an > {{ArrayIndexOutOfBoundsException}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31671) Wrong error message in VectorAssembler when column lengths can not be inferred
[ https://issues.apache.org/jira/browse/SPARK-31671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31671: - Affects Version/s: (was: 3.0.1) Labels: (was: pull-request-available) > Wrong error message in VectorAssembler when column lengths can not be > inferred > --- > > Key: SPARK-31671 > URL: https://issues.apache.org/jira/browse/SPARK-31671 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.4 > Environment: Mac OS catalina >Reporter: YijieFan >Priority: Minor > Original Estimate: 72h > Remaining Estimate: 72h > > In VectorAssembler when input column lengths can not be inferred and > handleInvalid = "keep", it will throw a runtime exception with message like > below > _Can not infer column lengths with handleInvalid = "keep". *Consider using > VectorSizeHint*_ > *_|to add metadata for columns: [column1, column2]_* > However, even if you set vector size hint for *column1*, the message remains, > and will not change to *[column2]* only. This is not consistent with the > description in the error message. > This introduce difficulties when I try to resolve this exception, for I do > not know which column required vectorSizeHint. This is especially troublesome > when you have a large number of columns to deal with. > Here is a simple example: > > {code:java} > // create a df without vector size > val df = Seq( > (Vectors.dense(1.0), Vectors.dense(2.0)) > ).toDF("n1", "n2") > // only set vector size hint for n1 column > val hintedDf = new VectorSizeHint() > .setInputCol("n1") > .setSize(1) > .transform(df) > // assemble n1, n2 > val output = new VectorAssembler() > .setInputCols(Array("n1", "n2")) > .setOutputCol("features") > .setHandleInvalid("keep") > .transform(hintedDf) > // because only n1 has vector size, the error message should tell us to set > vector size for n2 too > output.show() > {code} > Expected error message: > > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n2]. > {code} > Actual error message: > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n1, n2]. > {code} > I change one line in VectorAssembler.scala, so that it can work properly as > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
[ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104967#comment-17104967 ] Apache Spark commented on SPARK-20007: -- User 'liangz1' has created a pull request for this issue: https://github.com/apache/spark/pull/28504 > Make SparkR apply() functions robust to workers that return empty data.frame > > > Key: SPARK-20007 > URL: https://issues.apache.org/jira/browse/SPARK-20007 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki >Priority: Major > Labels: bulk-closed > > When using {{gapply()}} (or other members of the {{apply()}} family) with a > schema, Spark will try to parse the data returned from the R process on each > worker as Spark DataFrame Rows based on the schema. In this case our provided > schema suggests that we have six columns. When an R worker returns results to the > JVM, SparkSQL will try to access its columns one by one and cast them to the > proper types. If the R worker returns nothing, the JVM will throw an > {{ArrayIndexOutOfBoundsException}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104947#comment-17104947 ] Stephen Hopper commented on SPARK-31666: I was able to fix the issue by patching Spark. As noted in this issue: [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/828] I applied versions of these two PRs (for Spark 3.0) (with some minor tweaks to make them compatible with 2.4.5): [https://github.com/apache/spark/pull/22323] [https://github.com/apache/spark/pull/24879] I then rebuilt Spark (as well as spark-submit and spark-operator) and it's working now. However, this is still going to be an issue for anyone on 2.4.5 as the docs state that hostPath directories should be useable. Should I open a PR to backport this fix for Spark 2.4.6? When is Spark 3.0 coming out? > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. 
Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104936#comment-17104936 ] Apache Spark commented on SPARK-31681: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/28503 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31681: Assignee: Apache Spark > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31681: Assignee: (was: Apache Spark) > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104935#comment-17104935 ] Apache Spark commented on SPARK-31681: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/28503 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
Huaxin Gao created SPARK-31681: -- Summary: Python multiclass logistic regression evaluate should return LogisticRegressionSummary Key: SPARK-31681 URL: https://issues.apache.org/jira/browse/SPARK-31681 Project: Spark Issue Type: Bug Components: ML, PySpark Affects Versions: 3.1.0 Reporter: Huaxin Gao {code:java} def evaluate(self, dataset): .. java_blr_summary = self._call_java("evaluate", dataset) return BinaryLogisticRegressionSummary(java_blr_summary) {code} We should return LogisticRegressionSummary instead of BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31680: Assignee: Apache Spark > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104884#comment-17104884 ] Apache Spark commented on SPARK-31680: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28502 > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31680: Assignee: (was: Apache Spark) > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31680) Support Java 8 datetime types by Random data generator
Maxim Gekk created SPARK-31680: -- Summary: Support Java 8 datetime types by Random data generator Key: SPARK-31680 URL: https://issues.apache.org/jira/browse/SPARK-31680 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Currently, RandomDataGenerator.forType can generate: * java.sql.Date values for DateType * java.sql.Timestamp values for TimestampType The ticket aims to support java.time.Instant for TimestampType and java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
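A rough sketch of what generating the two new external types could look like; the real RandomDataGenerator draws from Catalyst's valid ranges and also mixes in special values, so the bounds and seed below are arbitrary placeholders:

{code:scala}
import java.time.{Instant, LocalDate}
import scala.util.Random

val rng = new Random(42)

// Roughly the years 1970-2100, at millisecond precision to keep the sketch simple
def randomInstant(): Instant =
  Instant.ofEpochMilli((rng.nextDouble() * 4102444800000L).toLong)

// Roughly the years 1970-2170
def randomLocalDate(): LocalDate =
  LocalDate.ofEpochDay(rng.nextInt(365 * 200).toLong)

println(randomInstant())
println(randomLocalDate())
{code}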
[jira] [Resolved] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31456. --- Fix Version/s: 3.0.0 2.4.6 Resolution: Fixed Issue resolved by pull request 28494 [https://github.com/apache/spark/pull/28494] > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > Fix For: 2.4.6, 3.0.0 > > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
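The arithmetic problem is easy to see in isolation: 25 - Integer.MIN_VALUE overflows to a negative Int, so a hook with the lowest possible priority compares as if it had a high one. A standalone sketch, where Hook is a stand-in for SparkShutdownHook, with the usual overflow-safe comparison:

{code:scala}
case class Hook(priority: Int) extends Comparable[Hook] {
  // Buggy form from the report: `other.priority - priority` overflows, e.g.
  // 25 - Integer.MIN_VALUE wraps around to a negative Int.
  // override def compareTo(other: Hook): Int = other.priority - priority

  // Overflow-safe form: higher priority sorts first.
  override def compareTo(other: Hook): Int =
    java.lang.Integer.compare(other.priority, priority)
}

val hooks = Seq(Hook(25), Hook(Int.MinValue), Hook(0)).sorted
// With the safe comparison: List(Hook(25), Hook(0), Hook(-2147483648))
println(hooks)
{code}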
[jira] [Updated] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31456: -- Fix Version/s: (was: 2.4.6) 2.4.7 > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > Fix For: 3.0.0, 2.4.7 > > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31456: - Assignee: Oleg Kuznetsov > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31456: -- Affects Version/s: 1.6.3 2.0.2 2.1.3 2.2.3 2.3.4 > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Priority: Major > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since ShutdownHookManager uses the method below for the comparison: > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > the Int subtraction overflows, so > (Int)(25 - Integer.MIN_VALUE) < 0 > and the shutdown hook added with priority Integer.MIN_VALUE is not called last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
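A minimal sketch of the overflow described in SPARK-31456, using a stand-in class; the overflow-safe comparison shown here is only an illustration, not necessarily the exact change made in pull request 28494.

{code:scala}
// Stand-in for the class in core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala.
class SparkShutdownHook(val priority: Int) extends Comparable[SparkShutdownHook] {
  // Buggy form quoted in the report: the Int subtraction overflows for extreme priorities.
  //   override def compareTo(other: SparkShutdownHook): Int = other.priority - priority

  // Overflow-safe alternative: compare without subtracting (descending by priority).
  override def compareTo(other: SparkShutdownHook): Int =
    java.lang.Integer.compare(other.priority, priority)
}

// Why the buggy form misorders a MIN_VALUE hook: the subtraction wraps around to a
// negative Int, so a hook registered with Integer.MIN_VALUE sorts before positive ones.
println(25 - Integer.MIN_VALUE)  // prints a negative number because of Int overflow
{code}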
[jira] [Commented] (SPARK-27249) Developers API for Transformers beyond UnaryTransformer
[ https://issues.apache.org/jira/browse/SPARK-27249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104661#comment-17104661 ] Nick Afshartous commented on SPARK-27249: - [~enrush] Hi Everett, can you please chime in on the thread in the PR? There's a question about whether or not the need is covered by existing APIs. > Developers API for Transformers beyond UnaryTransformer > --- > > Key: SPARK-27249 > URL: https://issues.apache.org/jira/browse/SPARK-27249 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.1.0 >Reporter: Everett Rush >Priority: Minor > Labels: starter > Attachments: Screen Shot 2020-01-17 at 4.20.57 PM.png > > Original Estimate: 96h > Remaining Estimate: 96h > > It would be nice to have a developers' API for dataset transformations that > need more than one column from a row (i.e. UnaryTransformer takes one column in > and outputs one column) or that contain objects too expensive to initialize > repeatedly in a UDF, such as a database connection. > > Design: > Abstract class PartitionTransformer extends Transformer and defines the > partition transformation function as Iterator[Row] => Iterator[Row] > NB: This parallels the UnaryTransformer createTransformFunc method > > When developers subclass this transformer, they can provide their own schema > for the output Row, in which case the PartitionTransformer creates a row > encoder and executes the transformation. Alternatively, the developer can set > the output DataType and output column name; the PartitionTransformer class will then > create a new schema and a row encoder, and execute the transformation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
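A rough sketch of the developers' API proposed above, assuming the Iterator[Row] => Iterator[Row] design from the ticket; the class and method names are illustrative only and do not exist in Spark.

{code:scala}
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.StructType

// Hypothetical base class: subclasses transform a whole partition at a time, so
// expensive resources (e.g. a database connection) can be set up once per partition.
abstract class PartitionTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("partitionTransformer"))

  // Parallels UnaryTransformer.createTransformFunc, but over partitions instead of values.
  protected def createPartitionFunc: Iterator[Row] => Iterator[Row]

  // Schema of the rows produced by createPartitionFunc, supplied by the subclass.
  protected def outputSchema(inputSchema: StructType): StructType

  override def transformSchema(schema: StructType): StructType = outputSchema(schema)

  override def transform(dataset: Dataset[_]): DataFrame = {
    val schema = outputSchema(dataset.schema)
    // Build a row encoder for the declared output schema and run the partition function.
    dataset.toDF().mapPartitions(createPartitionFunc)(RowEncoder(schema))
  }

  override def copy(extra: ParamMap): Transformer = defaultCopy(extra)
}
{code}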
[jira] [Assigned] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31663: Assignee: (was: Apache Spark) > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Priority: Major > Labels: correctness > > Grouping sets with having clause returns the wrong result when the condition > of having contained conflicting naming. See the below example: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as > Filter(..., Agg(...)) and resolved these two parts in different rules. The > CUBE and ROLLUP have the same issue. > Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31663: Assignee: Apache Spark > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Assignee: Apache Spark >Priority: Major > Labels: correctness > > Grouping sets with having clause returns the wrong result when the condition > of having contained conflicting naming. See the below example: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as > Filter(..., Agg(...)) and resolved these two parts in different rules. The > CUBE and ROLLUP have the same issue. > Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104528#comment-17104528 ] Apache Spark commented on SPARK-31663: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/28501 > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Priority: Major > Labels: correctness > > Grouping sets with a HAVING clause return the wrong result when the HAVING > condition contains a conflicting name. See the example below: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b`, not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519: we parse HAVING as > Filter(..., Agg(...)) and resolve these two parts in different rules. > CUBE and ROLLUP have the same issue. > Other systems work as expected; I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query was tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
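The report's reproduction, restated as a spark-shell snippet for convenience; the qualified-column variant in the trailing comment is only a hypothetical way to make the intended binding explicit, not a verified workaround.

{code:scala}
// Runs the query from the description; on affected versions it returns an empty
// result, while the correct answer is two rows with b = 2.
val result = spark.sql("""
  SELECT sum(a) AS b
  FROM VALUES (1, 10), (2, 20) AS T(a, b)
  GROUP BY GROUPING SETS ((b), (a, b))
  HAVING b > 10
""")
result.show()
// Hypothetical: writing HAVING T.b > 10 would make the reference unambiguous, so the
// filter could not be resolved against the sum(a) alias.
{code}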
[jira] [Updated] (SPARK-31575) Synchronise global JVM security configuration modification
[ https://issues.apache.org/jira/browse/SPARK-31575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31575: - Priority: Minor (was: Major) > Synchronise global JVM security configuration modification > -- > > Key: SPARK-31575 > URL: https://issues.apache.org/jira/browse/SPARK-31575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31575) Synchronise global JVM security configuration modification
[ https://issues.apache.org/jira/browse/SPARK-31575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31575. -- Fix Version/s: 3.1.0 Assignee: Gabor Somogyi Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28368 > Synchronise global JVM security configuration modification > -- > > Key: SPARK-31575 > URL: https://issues.apache.org/jira/browse/SPARK-31575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
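A minimal sketch of the technique named in the title, i.e. serialising modifications of the process-wide JVM security configuration behind a single lock; the lock object and helper are hypothetical and not necessarily how pull request 28368 implements it.

{code:scala}
import javax.security.auth.login.Configuration

// Hypothetical lock object; Configuration.setConfiguration mutates global JVM state,
// so concurrent callers must agree on one monitor to avoid racing each other.
object GlobalSecurityConfLock

def withJaasConfiguration[T](conf: Configuration)(body: => T): T =
  GlobalSecurityConfLock.synchronized {
    val previous = Configuration.getConfiguration
    Configuration.setConfiguration(conf)
    try body
    finally Configuration.setConfiguration(previous)  // restore the global state
  }
{code}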
[jira] [Resolved] (SPARK-31667) Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
[ https://issues.apache.org/jira/browse/SPARK-31667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31667. -- Fix Version/s: 3.1.0 Assignee: Huaxin Gao Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28483 > Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest > -- > > Key: SPARK-31667 > URL: https://issues.apache.org/jira/browse/SPARK-31667 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.1.0 > > > Add Python version of > {code:java} > @Since("3.1.0") > def test( > dataset: DataFrame, > featuresCol: String, > labelCol: String, > flatten: Boolean): DataFrame > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
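For reference, a spark-shell usage of the Scala API whose Python counterpart this ticket adds, assuming Spark 3.1.0+ where the `flatten` overload from the description exists; the sample data is made up.

{code:scala}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import spark.implicits._

val df = Seq(
  (0.0, Vectors.dense(0.5, 10.0)),
  (1.0, Vectors.dense(1.5, 20.0)),
  (1.0, Vectors.dense(1.5, 30.0))
).toDF("label", "features")

// flatten = true returns one row per feature instead of a single row of vector-valued
// columns, which is what the Python side is expected to mirror.
ChiSquareTest.test(df, "features", "label", flatten = true).show(truncate = false)
{code}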
[jira] [Commented] (SPARK-30098) Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104479#comment-17104479 ] Apache Spark commented on SPARK-30098: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Use default datasource as provider for CREATE TABLE syntax > -- > > Key: SPARK-30098 > URL: https://issues.apache.org/jira/browse/SPARK-30098 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Change the default provider from `hive` to the value of > `spark.sql.sources.default` for the "CREATE TABLE" syntax, to make it > consistent with the DataFrameWriter.saveAsTable API. > Also, this is friendlier to end users, since Spark is well known for using > parquet (the default value of `spark.sql.sources.default`) as its default I/O > format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30098) Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104478#comment-17104478 ] Apache Spark commented on SPARK-30098: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Use default datasource as provider for CREATE TABLE syntax > -- > > Key: SPARK-30098 > URL: https://issues.apache.org/jira/browse/SPARK-30098 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Change the default provider from `hive` to the value of > `spark.sql.sources.default` for the "CREATE TABLE" syntax, to make it > consistent with the DataFrameWriter.saveAsTable API. > Also, this is friendlier to end users, since Spark is well known for using > parquet (the default value of `spark.sql.sources.default`) as its default I/O > format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
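A small spark-shell sketch of the behaviour change described above, assuming a Spark build that includes this change; the table names are illustrative.

{code:scala}
// With this change, a CREATE TABLE statement without USING / STORED AS picks up the
// provider from spark.sql.sources.default (parquet by default) instead of the Hive serde.
spark.sql("CREATE TABLE t_default (id INT, name STRING)")
spark.sql("CREATE TABLE t_orc (id INT) USING orc")  // an explicit provider is unaffected

// The Provider field of DESCRIBE EXTENDED should reflect the default datasource.
spark.sql("DESCRIBE EXTENDED t_default").show(truncate = false)
{code}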
[jira] [Assigned] (SPARK-31665) Test parquet dictionary encoding of random dates/timestamps
[ https://issues.apache.org/jira/browse/SPARK-31665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31665: --- Assignee: Maxim Gekk > Test parquet dictionary encoding of random dates/timestamps > --- > > Key: SPARK-31665 > URL: https://issues.apache.org/jira/browse/SPARK-31665 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, dictionary encoding is not tested in ParquetHadoopFsRelationSuite > test "test all data types" because dates and timestamps are uniformly > distributed, and dictionary encoding is not applied for the types in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31665) Test parquet dictionary encoding of random dates/timestamps
[ https://issues.apache.org/jira/browse/SPARK-31665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31665. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28481 [https://github.com/apache/spark/pull/28481] > Test parquet dictionary encoding of random dates/timestamps > --- > > Key: SPARK-31665 > URL: https://issues.apache.org/jira/browse/SPARK-31665 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, dictionary encoding is not tested in ParquetHadoopFsRelationSuite > test "test all data types" because dates and timestamps are uniformly > distributed, and dictionary encoding is not applied for the types in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104391#comment-17104391 ] Hyukjin Kwon commented on SPARK-31666: -- No idea. Something must have been wrong. > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hopper reopened SPARK-31666: [~hyukjin.kwon] why was this closed and marked as invalid? > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104352#comment-17104352 ] Apache Spark commented on SPARK-31678: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28499 > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31678: Assignee: Apache Spark > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31678: Assignee: (was: Apache Spark) > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104322#comment-17104322 ] Gabor Somogyi commented on SPARK-31679: --- Started to work on this. > Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed > to create new KafkaAdminClient > -- > > Key: SPARK-31679 > URL: https://issues.apache.org/jira/browse/SPARK-31679 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/ > {code:java} > Failed > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it > is a sbt.testing.SuiteSelector) > Failing for the past 1 build (Since Failed#122389 ) > Took 34 sec. > Error Message > org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient > Stacktrace > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479) > at org.apache.kafka.clients.admin.Admin.create(Admin.java:61) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Client not found in Kerberos > database (6) - Client not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454) > ... 
17 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Client not found in Kerberos database (6) - Client not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.ja
[jira] [Created] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient
Gabor Somogyi created SPARK-31679: - Summary: Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient Key: SPARK-31679 URL: https://issues.apache.org/jira/browse/SPARK-31679 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/ {code:java} Failed org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector) Failing for the past 1 build (Since Failed#122389 ) Took 34 sec. Error Message org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient Stacktrace sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479) at org.apache.kafka.clients.admin.Admin.create(Admin.java:61) at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) at org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172) at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157) at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73) at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105) at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454) ... 
17 more Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) at org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:62) at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:105) at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:158) ... 21 more Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Client not
[jira] [Created] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
Kent Yao created SPARK-31678: Summary: PrintStackTrace for Spark SQL CLI when error occurs Key: SPARK-31678 URL: https://issues.apache.org/jira/browse/SPARK-31678 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.5, 3.0.0, 3.1.0 Reporter: Kent Yao When I was finding the root cause of https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very difficult for me to see what was actually going on, since it output nothing else but {code:java} Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1 {code} It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31676: Assignee: Apache Spark > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Assignee: Apache Spark >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104269#comment-17104269 ] Apache Spark commented on SPARK-31676: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/28498 > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31676: Assignee: (was: Apache Spark) > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-31676: --- Description: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. was: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. 
> QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code: scala} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-31676: --- Description: Reproduce code {code} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. was: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. 
> QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104265#comment-17104265 ] Apache Spark commented on SPARK-31677: -- User 'uncleGen' has created a pull request for this issue: https://github.com/apache/spark/pull/28497 > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31677: Assignee: Apache Spark > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Assignee: Apache Spark >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31677: Assignee: (was: Apache Spark) > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104264#comment-17104264 ] Apache Spark commented on SPARK-31677: -- User 'uncleGen' has created a pull request for this issue: https://github.com/apache/spark/pull/28497 > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-31677: -- Environment: (was: 1. Streaming query progress information are cached twice in *StreamExecution* and *StreamingQueryStatusListener*. It is memory-wasting. We can make this two usage unified. 2. Use *KVStore* instead to cache streaming query progress information.) > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-31677: -- Description: 1. Streaming query progress information is cached twice, in *StreamExecution* and *StreamingQueryStatusListener*, which wastes memory. We can unify these two usages. 2. Use *KVStore* instead to cache streaming query progress information. > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information is cached twice, in *StreamExecution* > and *StreamingQueryStatusListener*, which wastes memory. We can unify these > two usages. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31677) Use KVStore to cache stream query progress
Genmao Yu created SPARK-31677: - Summary: Use KVStore to cache stream query progress Key: SPARK-31677 URL: https://issues.apache.org/jira/browse/SPARK-31677 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.4.5, 3.0.0 Environment: 1. Streaming query progress information is cached twice, in *StreamExecution* and *StreamingQueryStatusListener*, which wastes memory. We can unify these two usages. 2. Use *KVStore* instead to cache streaming query progress information. Reporter: Genmao Yu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
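For readers skimming the thread, the sketch below shows roughly what caching progress events in a single in-memory KVStore could look like. It is an illustration only, not the SPARK-31677 change itself: the ProgressEntry wrapper and its key format are hypothetical, and the store API is assumed to match org.apache.spark.util.kvstore as shipped with Spark.
{code:scala}
// Hedged sketch: keep one bounded copy of streaming progress updates in an
// in-memory KVStore instead of duplicating them in StreamExecution and
// StreamingQueryStatusListener. ProgressEntry is a made-up wrapper class.
import org.apache.spark.util.kvstore.{InMemoryStore, KVIndex}

class ProgressEntry(val queryId: String, val batchId: Long, val json: String) {
  // Natural key the KVStore uses to index and look up entries.
  @KVIndex def id: String = s"$queryId-$batchId"
}

val store = new InMemoryStore()

// Write a progress update; json would normally come from StreamingQueryProgress.json.
store.write(new ProgressEntry("query-1", 42L, """{"batchId": 42}"""))

// Read a single entry back by its natural key, or count what is currently cached.
val entry = store.read(classOf[ProgressEntry], "query-1-42")
val cached = store.count(classOf[ProgressEntry])
{code}
Sharing one store like this between the execution and the listener is the memory saving the ticket is after; capping and evicting old entries would still need to be handled by the caller.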
[jira] [Created] (SPARK-31676) QuantileDiscretizer raises error: parameter splits given invalid value (splits array includes -0.0 and 0.0)
Weichen Xu created SPARK-31676: -- Summary: QuantileDiscretizer raises error: parameter splits given invalid value (splits array includes -0.0 and 0.0) Key: SPARK-31676 URL: https://issues.apache.org/jira/browse/SPARK-31676 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.4.5, 3.0.0 Reporter: Weichen Xu Reproduce code:
{code:scala}
import scala.util.Random
val rng = new Random(3)
val a1 = Array.tabulate(200)(_ => rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)

import spark.implicits._
val df1 = sc.parallelize(a1, 2).toDF("id")

import org.apache.spark.ml.feature.QuantileDiscretizer
val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)
val model = qd.fit(df1)
{code}
This raises an error like:
{noformat}
java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity]
  at org.apache.spark.ml.param.Param.validate(params.scala:76)
  at org.apache.spark.ml.param.ParamPair.<init>(params.scala:634)
  at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85)
  at org.apache.spark.ml.param.Params.set(params.scala:713)
  at org.apache.spark.ml.param.Params.set$(params.scala:712)
  at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41)
  at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77)
  at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231)
  ... 49 elided
{noformat}
0.0 > -0.0 is false, which breaks the parameter validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
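The last sentence of the report can be demonstrated without Spark at all: IEEE 754 treats -0.0 and 0.0 as equal, so any strictly-increasing check over a splits array containing both values has to fail. The snippet below is only an illustration of that check and of one possible normalization; it is not QuantileDiscretizer's actual validation code.
{code:scala}
// Plain Scala illustration of the failing check (not Spark's validator).
val splits = Array(Double.NegativeInfinity, -0.5, -0.0, 0.0, 0.5, Double.PositiveInfinity)

// A strictly-increasing check, as a splits validator would apply it.
def strictlyIncreasing(xs: Array[Double]): Boolean =
  xs.sliding(2).forall { case Array(a, b) => a < b }

println(-0.0 == 0.0)                // true: IEEE 754 equality
println(0.0 > -0.0)                 // false: the pair is not strictly increasing
println(strictlyIncreasing(splits)) // false: such splits would be rejected

// One possible caller-side fix: collapse -0.0 to 0.0 before validating.
val normalized = splits.map(v => if (v == 0.0) 0.0 else v).distinct
println(strictlyIncreasing(normalized)) // true
{code}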
[jira] [Updated] (SPARK-31675) Fail to insert data into a table with a remote location, caused by the Hive encryption check
[ https://issues.apache.org/jira/browse/SPARK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31675: - Description: Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1
{code}

was:
Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1

// Some comments here
public String getFoo() {
  return foo;
}
{code}
> Fail to inse
[jira] [Created] (SPARK-31675) Fail to insert data into a table with a remote location, caused by the Hive encryption check
Kent Yao created SPARK-31675: Summary: Fail to insert data into a table with a remote location, caused by the Hive encryption check Key: SPARK-31675 URL: https://issues.apache.org/jira/browse/SPARK-31675 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.6, 3.0.0, 3.1.0 Reporter: Kent Yao Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1

// Some comments here
public String getFoo() {
  return foo;
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
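The 'Wrong FS' failure above comes down to resolving a path against the default FileSystem instead of the FileSystem the path actually lives on. The sketch below only illustrates that distinction with the Hadoop FileSystem API, reusing the cluster names from the example; it is not the SPARK-31675 patch.
{code:scala}
// Illustration of the mismatch, assuming fs.defaultFS points at hdfs://cluster1.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val remotePath = new Path("hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc")

// The default FileSystem is bound to hdfs://cluster1, so handing it a
// hdfs://cluster2 path trips FileSystem#checkPath with "Wrong FS: ...".
val defaultFs = FileSystem.get(conf)
// defaultFs.exists(remotePath)   // would throw IllegalArgumentException: Wrong FS

// Resolving the FileSystem from the path itself picks the right cluster.
val pathFs = remotePath.getFileSystem(conf)
pathFs.exists(remotePath)
{code}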
[jira] [Commented] (SPARK-31634) "show tables like" support for SQL wildcard characters (% and _)
[ https://issues.apache.org/jira/browse/SPARK-31634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104223#comment-17104223 ] pavithra ramachandran commented on SPARK-31634: --- [~yumwang] I see that SHOW TABLES uses the catalog, and there is an open Jira on the Hive side. Once that gets fixed, it will work in Spark. Or do you want us to handle it separately in Spark? > "show tables like" support for SQL wildcard characters (% and _) > > > Key: SPARK-31634 > URL: https://issues.apache.org/jira/browse/SPARK-31634 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > https://docs.snowflake.com/en/sql-reference/sql/show-tables.html > https://clickhouse.tech/docs/en/sql-reference/statements/show/ > https://www.mysqltutorial.org/mysql-show-tables/ > https://issues.apache.org/jira/browse/HIVE-23359 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
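For context, the snippet below sketches the behaviour being requested: SQL-style wildcards ('%' for any sequence of characters, '_' for exactly one) in SHOW TABLES LIKE. The table names are made up, and at the time of this ticket Spark's pattern syntax accepts '*' and '|' rather than the SQL wildcards, so this is illustrative only.
{code:scala}
// Requested behaviour (illustrative only): SQL wildcards in SHOW TABLES LIKE.
spark.sql("CREATE TABLE sales_2019 (id INT) USING parquet")
spark.sql("CREATE TABLE sales_2020 (id INT) USING parquet")

// '%' should match any character sequence: both tables are listed.
spark.sql("SHOW TABLES LIKE 'sales%'").show()

// '_' should match exactly one character: only sales_2019 is listed.
spark.sql("SHOW TABLES LIKE 'sales_201_'").show()
{code}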
[jira] [Updated] (SPARK-30331) The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true`
[ https://issues.apache.org/jira/browse/SPARK-30331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-30331: --- Parent: SPARK-31412 Issue Type: Sub-task (was: Bug) > The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true` > --- > > Key: SPARK-30331 > URL: https://issues.apache.org/jira/browse/SPARK-30331 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > > This is because the final AdaptiveSparkPlan event is sent out before the > {{isFinalPlan}} variable is set to `true`. It would fail any listener attempting > to catch the final event by pattern matching `isFinalPlan=true`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
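The kind of listener the description refers to would look roughly like the sketch below. It assumes the SQL UI event class org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate and matches 'isFinalPlan=true' in the plan description; with the bug, the last update is emitted while the flag still reads false, so the match never fires.
{code:scala}
// Hedged sketch of a listener trying to catch the final adaptive plan.
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate

class FinalPlanListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLAdaptiveExecutionUpdate
        if e.physicalPlanDescription.contains("isFinalPlan=true") =>
      // Before the fix this branch is never reached: the last update is sent
      // out before isFinalPlan flips to true.
      println(s"Final plan for execution ${e.executionId}")
    case _ => // ignore other events
  }
}

// Registration: spark.sparkContext.addSparkListener(new FinalPlanListener())
{code}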
[jira] [Updated] (SPARK-31658) SQL UI doesn't show write commands of AQE plan
[ https://issues.apache.org/jira/browse/SPARK-31658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-31658: --- Parent: SPARK-31412 Issue Type: Sub-task (was: Improvement) > SQL UI doesn't show write commands of AQE plan > -- > > Key: SPARK-31658 > URL: https://issues.apache.org/jira/browse/SPARK-31658 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104151#comment-17104151 ] Apache Spark commented on SPARK-31620: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28496 > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.executio
[jira] [Assigned] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31620: Assignee: Apache Spark > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithoutKeys(HashAggregateExec.scala:347) > at >
[jira] [Commented] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104148#comment-17104148 ] Apache Spark commented on SPARK-31620: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28496 > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.executio
[jira] [Assigned] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31620: Assignee: (was: Apache Spark) > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithoutKeys(HashAggregateExec.scala:347) > at > org.apache.spark.sql.exe