[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421134#comment-17421134
 ] 

Apache Spark commented on SPARK-32855:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/34124

> Improve DPP for some join type do not support broadcast filtering side
> --
>
> Key: SPARK-32855
> URL: https://issues.apache.org/jira/browse/SPARK-32855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> When the filtering side cannot be broadcast because of the join type but is 
> small enough to broadcast by size, we should not consider only reusing the 
> broadcast exchange. For example: a left outer join where the left side is 
> very small.
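
For illustration, a minimal PySpark sketch of the scenario above (table and 
column names are made up; dynamic partition pruning is controlled by 
spark.sql.optimizer.dynamicPartitionPruning.enabled and is on by default):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dpp-left-outer-sketch").getOrCreate()

# Large fact table, partitioned on the join key.
spark.range(0, 100000).selectExpr("id AS value", "id % 100 AS part") \
    .write.mode("overwrite").partitionBy("part").saveAsTable("fact")

# Very small filtering side.
spark.range(0, 5).selectExpr("id AS part") \
    .write.mode("overwrite").saveAsTable("dim_small")

# A left outer join with the small table on the left: the left side cannot be
# broadcast for this join type, so reusing a broadcast exchange is not an option,
# yet it is cheap enough to broadcast by size just for pruning fact partitions.
spark.sql("""
    SELECT *
    FROM dim_small d
    LEFT OUTER JOIN fact f ON d.part = f.part
""").explain()
{code}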



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421133#comment-17421133
 ] 

Apache Spark commented on SPARK-32855:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/34124

> Improve DPP for some join type do not support broadcast filtering side
> --
>
> Key: SPARK-32855
> URL: https://issues.apache.org/jira/browse/SPARK-32855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> When the filtering side cannot be broadcast because of the join type but is 
> small enough to broadcast by size, we should not consider only reusing the 
> broadcast exchange. For example: a left outer join where the left side is 
> very small.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36872) Decommissioning executors get killed before transferring their data because of the hardcoded timeout of 60 secs

2021-09-27 Thread Shekhar Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shekhar Gupta updated SPARK-36872:
--
Description: During the graceful decommissioning phase, executors need to 
transfer all of their shuffle and cache data to the peer executors. However, 
they get killed before transferring all the data because of the hardcoded 
timeout value of 60 secs in the decommissioning script. As a result of 
executors dying prematurely, the spark tasks on other executors fail which 
causes application failures, and it is hard to debug those failures. To fix the 
issue, we ended up writing a custom script with a different timeout and rebuilt 
the spark image but we would prefer an easier solution that does not require 
rebuilding the image.   (was: During the graceful decommissioning phase, 
executors need to transfer all of their shuffle and cache data to the peer 
executors. However, they get killed before could transfer all the data because 
of the hardcoded timeout value of 60 secs in the decommissioning script. As a 
result of executors dying prematurely, the spark tasks on other executors fail 
which causes application failures, and it is hard to debug those failures. To 
fix the issue, we ended up writing a custom script with a different timeout and 
rebuilt the spark image but we would prefer an easier solution that does not 
require rebuilding the image. )

> Decommissioning executors get killed before transferring their data because 
> of the hardcoded timeout of 60 secs
> ---
>
> Key: SPARK-36872
> URL: https://issues.apache.org/jira/browse/SPARK-36872
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.1, 3.1.2, 3.2.0
>Reporter: Shekhar Gupta
>Priority: Trivial
>
> During the graceful decommissioning phase, executors need to transfer all of 
> their shuffle and cache data to the peer executors. However, they get killed 
> before transferring all the data because of the hardcoded timeout value of 60 
> secs in the decommissioning script. As a result of executors dying 
> prematurely, the spark tasks on other executors fail which causes application 
> failures, and it is hard to debug those failures. To fix the issue, we ended 
> up writing a custom script with a different timeout and rebuilt the spark 
> image but we would prefer an easier solution that does not require rebuilding 
> the image. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36872) Decommissioning executors get killed before transferring their data because of the hardcoded timeout of 60 secs

2021-09-27 Thread Shekhar Gupta (Jira)
Shekhar Gupta created SPARK-36872:
-

 Summary: Decommissioning executors get killed before transferring 
their data because of the hardcoded timeout of 60 secs
 Key: SPARK-36872
 URL: https://issues.apache.org/jira/browse/SPARK-36872
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.2, 3.1.1, 3.2.0
Reporter: Shekhar Gupta


During the graceful decommissioning phase, executors need to transfer all of 
their shuffle and cache data to the peer executors. However, they get killed 
before could transfer all the data because of the hardcoded timeout value of 60 
secs in the decommissioning script. As a result of executors dying prematurely, 
the spark tasks on other executors fail which causes application failures, and 
it is hard to debug those failures. To fix the issue, we ended up writing a 
custom script with a different timeout and rebuilt the spark image but we would 
prefer an easier solution that does not require rebuilding the image. 
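
For context, a minimal sketch of the decommissioning-related configuration this 
report touches (these config names exist as of Spark 3.1; note that none of them 
change the hardcoded 60-second timeout inside the bundled decommissioning script, 
which is exactly the problem described above):

{code:python}
from pyspark import SparkConf

# Graceful decommissioning knobs available in Spark 3.1+. The timeout baked into
# the image's decommissioning script is not one of them, hence the custom script.
conf = (
    SparkConf()
    .set("spark.decommission.enabled", "true")
    .set("spark.storage.decommission.enabled", "true")
    .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
    .set("spark.storage.decommission.rddBlocks.enabled", "true")
)
{code}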



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36678) Migrate SHOW TABLES to use V2 command by default

2021-09-27 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421120#comment-17421120
 ] 

Terry Kim commented on SPARK-36678:
---

Got it, thanks.

> Migrate SHOW TABLES to use V2 command by default
> 
>
> Key: SPARK-36678
> URL: https://issues.apache.org/jira/browse/SPARK-36678
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Migrate SHOW TABLES to use V2 command by default.
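
For reference, the command being migrated, exercised from PySpark (behavior 
should be unchanged by the V1-to-V2 migration; the table name below is made up):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("show-tables-sketch").getOrCreate()
spark.sql("CREATE TABLE IF NOT EXISTS demo_tbl (id INT) USING parquet")
spark.sql("SHOW TABLES").show()
spark.sql("SHOW TABLES IN default LIKE 'demo*'").show()
{code}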



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36586) Migrate all ParsedStatement to the new v2 command framework

2021-09-27 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421106#comment-17421106
 ] 

Terry Kim edited comment on SPARK-36586 at 9/28/21, 1:51 AM:
-

[~cloud_fan] totally missed this ping (never got an email notification). :) 
Looks like the work has started and I will also chip in. Thanks!


was (Author: imback82):
[~cloud_fan] totally missed this ping. :) Looks like the work has started and I 
will also chip in. Thanks!

> Migrate all ParsedStatement to the new v2 command framework
> ---
>
> Key: SPARK-36586
> URL: https://issues.apache.org/jira/browse/SPARK-36586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The ParsedStatement needs to be pattern matched in two analyzer rules and 
> results to a lot of duplicated code.
> The new v2 command framework defines a few basic logical plan nodes such as 
> UnresolvedTable, and we only need to resolve these basic nodes, and pattern 
> match v2 commands only once in the rule `ResolveSessionCatalog` for v1 
> command fallback.
> We should migrate all the ParsedStatement to v2 command framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36586) Migrate all ParsedStatement to the new v2 command framework

2021-09-27 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421106#comment-17421106
 ] 

Terry Kim commented on SPARK-36586:
---

[~cloud_fan] totally missed this ping. :) Looks like the work has started and I 
will also chip in. Thanks!

> Migrate all ParsedStatement to the new v2 command framework
> ---
>
> Key: SPARK-36586
> URL: https://issues.apache.org/jira/browse/SPARK-36586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The ParsedStatement needs to be pattern matched in two analyzer rules and 
> results to a lot of duplicated code.
> The new v2 command framework defines a few basic logical plan nodes such as 
> UnresolvedTable, and we only need to resolve these basic nodes, and pattern 
> match v2 commands only once in the rule `ResolveSessionCatalog` for v1 
> command fallback.
> We should migrate all the ParsedStatement to v2 command framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API

2021-09-27 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568
 ] 

Yikun Jiang edited comment on SPARK-36058 at 9/28/21, 1:11 AM:
---

After this change the executor pod allocator becomes pluggable, so we could add a 
VolcanoJobAllocator and enable the Volcano abilities (VolcanoJob with queue, 
podgroup support...) on the executor side. How about the driver side? Do you 
have any suggestions on it? [~holden]

BTW, I just want to make sure that the [1][2] abilities target the driver/executor 
sides, right? That means we need to add PodGroup/Queue abilities to each allocator 
and to the driver?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061


was (Author: yikunkero):
After this change the executor pod allocator becomes pluggable, so we could add a 
VolcanoJobAllocator and enable the Volcano ability (with queue, podgroup 
and so on...) on the executor side. How about the driver side? Do you 
have any suggestions on it? [~holden]

BTW, I just want to make sure that the [1][2] abilities target the driver/executor 
sides, right? That means we need to add PodGroup/Queue abilities to each allocator 
and to the driver?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher-level abstractions similar to the vanilla Kube ReplicaSets, 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-09-27 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421095#comment-17421095
 ] 

Huaxin Gao commented on SPARK-36871:


I am working on this

> Migrate CreateViewStatement to v2 command
> -
>
> Key: SPARK-36871
> URL: https://issues.apache.org/jira/browse/SPARK-36871
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36871) Migrate CreateViewStatement to v2 command

2021-09-27 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-36871:
--

 Summary: Migrate CreateViewStatement to v2 command
 Key: SPARK-36871
 URL: https://issues.apache.org/jira/browse/SPARK-36871
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36870) Introduce INTERNAL_ERROR error class

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421087#comment-17421087
 ] 

Apache Spark commented on SPARK-36870:
--

User 'karenfeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/34123

> Introduce INTERNAL_ERROR error class
> 
>
> Key: SPARK-36870
> URL: https://issues.apache.org/jira/browse/SPARK-36870
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Karen Feng
>Priority: Major
>
> Introduces the INTERNAL_ERROR error class; this will be used to determine if 
> an exception is an internal error and is useful for end-users and developers 
> to diagnose whether an issue should be reported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36870) Introduce INTERNAL_ERROR error class

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36870:


Assignee: Apache Spark

> Introduce INTERNAL_ERROR error class
> 
>
> Key: SPARK-36870
> URL: https://issues.apache.org/jira/browse/SPARK-36870
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Karen Feng
>Assignee: Apache Spark
>Priority: Major
>
> Introduces the INTERNAL_ERROR error class; this will be used to determine if 
> an exception is an internal error and is useful for end-users and developers 
> to diagnose whether an issue should be reported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36870) Introduce INTERNAL_ERROR error class

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36870:


Assignee: (was: Apache Spark)

> Introduce INTERNAL_ERROR error class
> 
>
> Key: SPARK-36870
> URL: https://issues.apache.org/jira/browse/SPARK-36870
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Karen Feng
>Priority: Major
>
> Introduces the INTERNAL_ERROR error class; this will be used to determine if 
> an exception is an internal error and is useful for end-users and developers 
> to diagnose whether an issue should be reported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36870) Introduce INTERNAL_ERROR error class

2021-09-27 Thread Karen Feng (Jira)
Karen Feng created SPARK-36870:
--

 Summary: Introduce INTERNAL_ERROR error class
 Key: SPARK-36870
 URL: https://issues.apache.org/jira/browse/SPARK-36870
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Karen Feng


Introduces the INTERNAL_ERROR error class; this will be used to determine if an 
exception is an internal error and is useful for end-users and developers to 
diagnose whether an issue should be reported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421077#comment-17421077
 ] 

Apache Spark commented on SPARK-34826:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/34122

> Adaptive fetch of shuffle mergers for Push based shuffle
> 
>
> Key: SPARK-34826
> URL: https://issues.apache.org/jira/browse/SPARK-34826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Currently the shuffle mergers are set during the creation of ShuffleMapStage. 
> In the initial set of stages, there won't be enough executors added which can 
> cause not enough shuffle mergers to be set during the creation of the shuffle 
> map stage. This task is to handle the issue of low merge ratio for initial 
> stages.
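
For context, a minimal sketch of enabling push-based shuffle, which is where the 
shuffle mergers come from (only the two flags shown are asserted here; the 
merger-threshold ratios have their own configs, not listed to avoid guessing names):

{code:python}
from pyspark import SparkConf

# Push-based shuffle (Spark 3.2+, on YARN with the external shuffle service).
conf = (
    SparkConf()
    .set("spark.shuffle.service.enabled", "true")
    .set("spark.shuffle.push.enabled", "true")
)
{code}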



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34826:


Assignee: (was: Apache Spark)

> Adaptive fetch of shuffle mergers for Push based shuffle
> 
>
> Key: SPARK-34826
> URL: https://issues.apache.org/jira/browse/SPARK-34826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Currently the shuffle mergers are set during the creation of ShuffleMapStage. 
> In the initial set of stages, there won't be enough executors added which can 
> cause not enough shuffle mergers to be set during the creation of the shuffle 
> map stage. This task is to handle the issue of low merge ratio for initial 
> stages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34826:


Assignee: Apache Spark

> Adaptive fetch of shuffle mergers for Push based shuffle
> 
>
> Key: SPARK-34826
> URL: https://issues.apache.org/jira/browse/SPARK-34826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Venkata krishnan Sowrirajan
>Assignee: Apache Spark
>Priority: Major
>
> Currently the shuffle mergers are set during the creation of ShuffleMapStage. 
> In the initial set of stages, there won't be enough executors added which can 
> cause not enough shuffle mergers to be set during the creation of the shuffle 
> map stage. This task is to handle the issue of low merge ratio for initial 
> stages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421075#comment-17421075
 ] 

Apache Spark commented on SPARK-34826:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/34122

> Adaptive fetch of shuffle mergers for Push based shuffle
> 
>
> Key: SPARK-34826
> URL: https://issues.apache.org/jira/browse/SPARK-34826
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Venkata krishnan Sowrirajan
>Priority: Major
>
> Currently the shuffle mergers are set during the creation of ShuffleMapStage. 
> In the initial set of stages, there won't be enough executors added which can 
> cause not enough shuffle mergers to be set during the creation of the shuffle 
> map stage. This task is to handle the issue of low merge ratio for initial 
> stages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible

2021-09-27 Thread Hamid EL MAAZOUZ (Jira)
Hamid EL MAAZOUZ created SPARK-36869:


 Summary: Spark job fails due to java.io.InvalidClassException: 
scala.collection.mutable.WrappedArray$ofRef; local class incompatible
 Key: SPARK-36869
 URL: https://issues.apache.org/jira/browse/SPARK-36869
 Project: Spark
  Issue Type: Bug
  Components: Input/Output
Affects Versions: 3.1.2
 Environment: * RHEL 8.4
 * Java 11.0.12
 * Spark 3.1.2 (only prebuilt with Scala *2.12.10*)
 * Scala *2.12.14* for the application code
Reporter: Hamid EL MAAZOUZ


This is a Scala problem. It has already been reported here 
[https://github.com/scala/bug/issues/5046] and a fix has been merged here 
[https://github.com/scala/scala/pull/9166].

According to [https://github.com/scala/bug/issues/5046#issuecomment-928108088], 
the *fix* is available in *Scala 2.12.14*, but *Spark 3.0+* is only pre-built 
with Scala *2.12.10*.

 
 * Stacktrace of the failure: (Taken from stderr of a worker process)

{code:java}
Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" 
"/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*"
 "-Xmx1024M" "-Dspark.driver.port=45887" 
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" 
"spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" 
"--hostname" "192.168.0.191" "--cores" "12" "--app-id" 
"app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261"
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with 
process name: 18957@localhost
21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM
21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP
21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT
21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback 
address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0)
21/09/27 23:10:36 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another 
address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
(file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) 
to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz
21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: hamidelmaazouz
21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 
21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 
21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(hamidelmaazouz); 
groups with view permissions: Set(); users  with modify permissions: 
Set(hamidelmaazouz); groups with modify permissions: Set()
21/09/27 23:10:37 INFO TransportClientFactory: Successfully created connection 
to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps)
21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from 
/192.168.0.191:45887
java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; 
local class incompatible: stream classdesc serialVersionUID = 
3456489343829468865, local class serialVersionUID = 1028182004549731694
at 
java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689)
at 
java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012)
at 
java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862)
at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169)
at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
at 
java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2464)
at 
java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2358)
at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679)
at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
at 
java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
{code}

[jira] [Commented] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421067#comment-17421067
 ] 

Apache Spark commented on SPARK-36868:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34121

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36868:


Assignee: Apache Spark

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421066#comment-17421066
 ] 

Apache Spark commented on SPARK-36868:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34121

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36868:


Assignee: (was: Apache Spark)

> Migrate CreateFunctionStatement to v2 command framework
> ---
>
> Key: SPARK-36868
> URL: https://issues.apache.org/jira/browse/SPARK-36868
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework

2021-09-27 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-36868:
--

 Summary: Migrate CreateFunctionStatement to v2 command framework
 Key: SPARK-36868
 URL: https://issues.apache.org/jira/browse/SPARK-36868
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421017#comment-17421017
 ] 

Apache Spark commented on SPARK-35672:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34120

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.
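
To make the size arithmetic concrete, a rough back-of-the-envelope sketch (the 
JAR count and URI length are illustrative assumptions, not measurements from 
this report):

{code:python}
# Each user JAR contributes its own "--user-class-path <fully-qualified URI>"
# argument to the executor launch command.
n_jars = 1500            # assumed number of user JARs
avg_uri_len = 120        # assumed length of one fully-qualified URI
per_jar = len("--user-class-path ") + avg_uri_len + 1
total_bytes = n_jars * per_jar
print(f"user-classpath arguments alone: ~{total_bytes // 1024} KiB")  # ~205 KiB
# Linux bounds both the total argv+envp size (ARG_MAX) and each individual
# argument string (MAX_ARG_STRLEN, 128 KiB on common kernels), so launch commands
# of this size can fail with "/bin/bash: Argument list too long".
{code}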



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-09-27 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421016#comment-17421016
 ] 

Erik Krogen commented on SPARK-35672:
-

Re-submitted at [PR #34120|https://github.com/apache/spark/pull/34120]

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421018#comment-17421018
 ] 

Apache Spark commented on SPARK-35672:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/34120

> Spark fails to launch executors with very large user classpath lists on YARN
> 
>
> Key: SPARK-35672
> URL: https://issues.apache.org/jira/browse/SPARK-35672
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.1.2
> Environment: Linux RHEL7
> Spark 3.1.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> When running Spark on YARN, the {{user-class-path}} argument to 
> {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to 
> executor processes. The argument is specified once for each JAR, and the URIs 
> are fully-qualified, so the paths can be quite long. With large user JAR 
> lists (say 1000+), this can result in system-level argument length limits 
> being exceeded, typically manifesting as the error message:
> {code}
> /bin/bash: Argument list too long
> {code}
> A [Google 
> search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22]
>  indicates that this is not a theoretical problem and afflicts real users, 
> including ours. This issue was originally observed on Spark 2.3, but has been 
> confirmed to exist in the master branch as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36867) Misleading Error Message with Invalid Column and Group By

2021-09-27 Thread Alan Jackoway (Jira)
Alan Jackoway created SPARK-36867:
-

 Summary: Misleading Error Message with Invalid Column and Group By
 Key: SPARK-36867
 URL: https://issues.apache.org/jira/browse/SPARK-36867
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Alan Jackoway


When you run a query with an invalid column that also does a group by on a 
constructed column, the error message you get back references a missing column 
for the group by rather than the invalid column.

You can reproduce this in pyspark in 3.1.2 with the following code:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Group By Issue").getOrCreate()
data = spark.createDataFrame(
    [("2021-09-15", 1), ("2021-09-16", 2), ("2021-09-17", 10), ("2021-09-18", 25),
     ("2021-09-19", 500), ("2021-09-20", 50), ("2021-09-21", 100)],
    schema=["d", "v"]
)
data.createOrReplaceTempView("data")
# This is valid
spark.sql("select sum(v) as value, date(date_trunc('week', d)) as week from data group by week").show()
# This is invalid because val is the wrong variable
spark.sql("select sum(val) as value, date(date_trunc('week', d)) as week from data group by week").show()
{code}

The error message for the second spark.sql line is
{quote}
pyspark.sql.utils.AnalysisException: cannot resolve '`week`' given input 
columns: [data.d, data.v]; line 1 pos 81;
'Aggregate ['week], ['sum('val) AS value#21, cast(date_trunc(week, cast(d#0 as 
timestamp), Some(America/New_York)) as date) AS week#22]
+- SubqueryAlias data
   +- LogicalRDD [d#0, v#1L], false
{quote}
but the actual problem is that I used the wrong variable name in a different 
part of the query. Nothing is wrong with {{week}} in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17722) YarnScheduler: Initial job has not accepted any resources

2021-09-27 Thread Davide Benedetto (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420999#comment-17420999
 ] 

Davide Benedetto commented on SPARK-17722:
--

Hi Partha,
I have the same issue. Which YARN configurations have you set?



> YarnScheduler: Initial job has not accepted any resources
> -
>
> Key: SPARK-17722
> URL: https://issues.apache.org/jira/browse/SPARK-17722
> Project: Spark
>  Issue Type: Bug
>Reporter: Partha Pratim Ghosh
>Priority: Major
>
> Connected to Spark in YARN mode from Eclipse (Java). On trying to run a task it 
> gives the following - 
> YarnScheduler: Initial job has not accepted any resources; check your cluster 
> UI to ensure that workers are registered and have sufficient resources. The 
> request goes to the Hadoop cluster scheduler and from there we can see the 
> job in the Spark UI, but there it says that no task has been assigned to 
> it.
> The same code runs from spark-submit, where we need to remove the following 
> lines - 
> System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
>   
>   org.apache.hadoop.conf.Configuration conf = new 
>   org.apache.hadoop.conf.Configuration();
>   conf.set("hadoop.security.authentication", "kerberos");
>   UserGroupInformation.setConfiguration(conf);
> Following is the configuration - 
> import org.apache.hadoop.security.UserGroupInformation;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
> public class TestConnectivity {
>   /**
>* @param args
>*/
>   public static void main(String[] args) {
>   System.setProperty("java.security.krb5.conf", 
> "C:\\xxx\\krb5.conf");
>   
>   org.apache.hadoop.conf.Configuration conf = new 
>   org.apache.hadoop.conf.Configuration();
>   conf.set("hadoop.security.authentication", "kerberos");
>   UserGroupInformation.setConfiguration(conf);
>SparkConf config = new SparkConf().setAppName("Test Spark ");
>config = config.setMaster("yarn-client");
>config .set("spark.dynamicAllocation.enabled", "false");
>config.set("spark.executor.memory", "2g");
>config.set("spark.executor.instances", "1");
>config.set("spark.executor.cores", "2");
>//config.set("spark.driver.memory", "2g");
>//config.set("spark.driver.cores", "1");
>/*config.set("spark.executor.am.memory", "2g");
>config.set("spark.executor.am.cores", "2");*/
>config.set("spark.cores.max", "4");
>config.set("yarn.nodemanager.resource.cpu-vcores","4");
>config.set("spark.yarn.queue","root.root");
>/*config.set("spark.deploy.defaultCores", "2");
>config.set("spark.task.cpus", "2");*/
>config.set("spark.yarn.jar", 
> "file:/C:/xxx/spark-assembly_2.10-1.6.0-cdh5.7.1.jar");
>   JavaSparkContext sc = new JavaSparkContext(config);
>   SQLContext sqlcontext = new SQLContext(sc);
>   JavaRDD<String> logData = sc.textFile("sparkexamples/Employee.json").cache();
>   DataFrame df = sqlcontext.jsonRDD(logData);
>  
>   df.show();
>   df.printSchema();
>   
>   //UserGroupInformation.setConfiguration(conf);
>   
>   }
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36863) Update dependency manifests for all released artifacts

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36863:


Assignee: Apache Spark

> Update dependency manifests for all released artifacts
> --
>
> Key: SPARK-36863
> URL: https://issues.apache.org/jira/browse/SPARK-36863
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Minor
>
> We should update dependency manifests for all released artifacts. Currently 
> we don't do so for modules such as {{hadoop-cloud}}, {{kinesis-asl}}, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36863) Update dependency manifests for all released artifacts

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36863:


Assignee: (was: Apache Spark)

> Update dependency manifests for all released artifacts
> --
>
> Key: SPARK-36863
> URL: https://issues.apache.org/jira/browse/SPARK-36863
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Priority: Minor
>
> We should update dependency manifests for all released artifacts. Currently 
> we don't do so for modules such as {{hadoop-cloud}}, {{kinesis-asl}}, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36863) Update dependency manifests for all released artifacts

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420994#comment-17420994
 ] 

Apache Spark commented on SPARK-36863:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34119

> Update dependency manifests for all released artifacts
> --
>
> Key: SPARK-36863
> URL: https://issues.apache.org/jira/browse/SPARK-36863
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Priority: Minor
>
> We should update dependency manifests for all released artifacts. Currently 
> we don't do so for modules such as {{hadoop-cloud}}, {{kinesis-asl}}, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36863) Update dependency manifests for all released artifacts

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420995#comment-17420995
 ] 

Apache Spark commented on SPARK-36863:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34119

> Update dependency manifests for all released artifacts
> --
>
> Key: SPARK-36863
> URL: https://issues.apache.org/jira/browse/SPARK-36863
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Minor
>
> We should update dependency manifests for all released artifacts. Currently 
> we don't do so for modules such as {{hadoop-cloud}}, {{kinesis-asl}}, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36847) Explicitly specify error codes when ignoring type hint errors

2021-09-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36847.
---
Fix Version/s: 3.3.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 34102
https://github.com/apache/spark/pull/34102

> Explicitly specify error codes when ignoring type hint errors
> -
>
> Key: SPARK-36847
> URL: https://issues.apache.org/jira/browse/SPARK-36847
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> We use a lot of {{type: ignore}} annotation to ignore type hint errors in 
> pandas-on-Spark.
> We should explicitly specify the error codes to make it clear what kind of 
> error is being ignored, then the type hint checker can check more cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36866) Pushdown filters with ANSI interval values to parquet

2021-09-27 Thread Max Gekk (Jira)
Max Gekk created SPARK-36866:


 Summary: Pushdown filters with ANSI interval values to parquet
 Key: SPARK-36866
 URL: https://issues.apache.org/jira/browse/SPARK-36866
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk
Assignee: Max Gekk






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36865) Add PySpark API document of session_window

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36865:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Add PySpark API document of session_window
> --
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> There is no PySpark API document of session_window.
> The docstring of the function also doesn't comply with the numpydoc format.
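
For reference, a minimal usage sketch of the function the missing document would 
cover (session_window(timeColumn, gapDuration) exists as of Spark 3.2.0; the data 
below is made up):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("session-window-sketch").getOrCreate()

events = spark.createDataFrame(
    [("u1", "2021-09-27 10:00:00"), ("u1", "2021-09-27 10:03:00"),
     ("u2", "2021-09-27 10:30:00")],
    ["user", "event_time"],
).withColumn("event_time", F.to_timestamp("event_time"))

# Group events into per-user sessions separated by gaps of at least 5 minutes.
sessions = events.groupBy(F.session_window("event_time", "5 minutes"), "user").count()
sessions.show(truncate=False)
{code}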



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36865) Add PySpark API document of session_window

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420959#comment-17420959
 ] 

Apache Spark commented on SPARK-36865:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34118

> Add PySpark API document of session_window
> --
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> There is no PySpark API document of session_window.
> The docstring of the function also doesn't comply with the numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36865) Add PySpark API document of session_window

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36865:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Add PySpark API document of session_window
> --
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> There is no PySpark API document of session_window.
> The docstring of the function also doesn't comply with the numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36821) Create a test to extend ColumnarBatch

2021-09-27 Thread DB Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai reassigned SPARK-36821:
---

Assignee: Yufei Gu

> Create a test to extend ColumnarBatch
> -
>
> Key: SPARK-36821
> URL: https://issues.apache.org/jira/browse/SPARK-36821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Major
>
> As a follow-up of SPARK-36814, create a test to extend ColumnarBatch to 
> prevent future changes from breaking it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36821) Create a test to extend ColumnarBatch

2021-09-27 Thread DB Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai resolved SPARK-36821.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34087
[https://github.com/apache/spark/pull/34087]

> Create a test to extend ColumnarBatch
> -
>
> Key: SPARK-36821
> URL: https://issues.apache.org/jira/browse/SPARK-36821
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Major
> Fix For: 3.3.0
>
>
> As a follow-up of SPARK-36814, create a test to extend ColumnarBatch to 
> prevent future changes from breaking it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36865) Add PySpark API document of session_window

2021-09-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36865:
---
Summary: Add PySpark API document of session_window  (was: Add PySpark API 
document for session_window)

> Add PySpark API document of session_window
> --
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> There is no PySpark API document for session_window.
> The docstring of the function also doesn't comply with the numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36865) Add PySpark API document of session_window

2021-09-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36865:
---
Description: 
There is no PySpark API document of session_window.
The docstring of the function also doesn't comply with the numpydoc format.

  was:
There is no PySpark API document for session_window.
The docstring of the function also doesn't comply with the numpydoc format.


> Add PySpark API document of session_window
> --
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> There is no PySpark API document of session_window.
> The docstring of the function also doesn't comply with the numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36865) Add PySpark API document for session_window

2021-09-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36865:
---
Description: 
There is no PySpark API document for session_window.
The docstring of the function also doesn't comply with the numpydoc format.

  was:The layout of PySpark API document for session_window is broken because 
the corresponding docstring doesn't comply with numpydoc format.


> Add PySpark API document for session_window
> ---
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> There is no PySpark API document for session_window.
> The docstring of the function also doesn't comply with the numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36865) Add PySpark API document for session_window

2021-09-27 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-36865:
---
Summary: Add PySpark API document for session_window  (was: The layout of 
PySpark API document for session_window is broken)

> Add PySpark API document for session_window
> ---
>
> Key: SPARK-36865
> URL: https://issues.apache.org/jira/browse/SPARK-36865
> Project: Spark
>  Issue Type: Bug
>  Components: docs, PySpark
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> The layout of PySpark API document for session_window is broken because the 
> corresponding docstring doesn't comply with numpydoc format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36865) The layout of PySpark API document for session_window is broken

2021-09-27 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-36865:
--

 Summary: The layout of PySpark API document for session_window is 
broken
 Key: SPARK-36865
 URL: https://issues.apache.org/jira/browse/SPARK-36865
 Project: Spark
  Issue Type: Bug
  Components: docs, PySpark
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


The layout of PySpark API document for session_window is broken because the 
corresponding docstring doesn't comply with numpydoc format.
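As background for the function being documented, here is a minimal PySpark sketch of
session_window usage (the column names and sample rows are hypothetical; Spark 3.2.0,
where the function is available, is assumed):

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical event data: one row per (user, event timestamp).
events = spark.createDataFrame(
    [("u1", "2021-09-27 10:00:00"), ("u1", "2021-09-27 10:03:00")],
    ["user_id", "event_time"],
).withColumn("event_time", F.to_timestamp("event_time"))

# Group events into sessions that close after 5 minutes of inactivity.
sessions = events.groupBy(
    F.session_window("event_time", "5 minutes"),
    "user_id",
).count()

sessions.show(truncate=False)
{code}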



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36864) guava version mismatch with hadoop-aws

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36864:


Assignee: (was: Apache Spark)

> guava version mismatch with hadoop-aws
> --
>
> Key: SPARK-36864
> URL: https://issues.apache.org/jira/browse/SPARK-36864
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When using hadoop-aws 3.2 with Spark 3.0, the error below occurs. This is caused by 
> a Guava version mismatch: Hadoop uses Guava 27.0-jre while Spark uses 14.0.1.
> Exception in thread "main" java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742)
>  at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712)
>  at 
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559)
>  at 
> org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter.<init>(EventLogFileWriters.scala:60)
>  at 
> org.apache.spark.deploy.history.SingleEventLogFileWriter.<init>(EventLogFileWriters.scala:213)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181)
>  at 
> org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
>  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
>  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>  at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>  at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36864) guava version mismatch with hadoop-aws

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36864:


Assignee: Apache Spark

> guava version mismatch with hadoop-aws
> --
>
> Key: SPARK-36864
> URL: https://issues.apache.org/jira/browse/SPARK-36864
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Assignee: Apache Spark
>Priority: Minor
>
> When using hadoop-aws 3.2 with Spark 3.0, the error below occurs. This is caused by 
> a Guava version mismatch: Hadoop uses Guava 27.0-jre while Spark uses 14.0.1.
> Exception in thread "main" java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742)
>  at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712)
>  at 
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559)
>  at 
> org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter.<init>(EventLogFileWriters.scala:60)
>  at 
> org.apache.spark.deploy.history.SingleEventLogFileWriter.<init>(EventLogFileWriters.scala:213)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181)
>  at 
> org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
>  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
>  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>  at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>  at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36864) guava version mismatch with hadoop-aws

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420935#comment-17420935
 ] 

Apache Spark commented on SPARK-36864:
--

User 'warrenzhu25' has created a pull request for this issue:
https://github.com/apache/spark/pull/34117

> guava version mismatch with hadoop-aws
> --
>
> Key: SPARK-36864
> URL: https://issues.apache.org/jira/browse/SPARK-36864
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When using hadoop-aws 3.2 with Spark 3.0, the error below occurs. This is caused by 
> a Guava version mismatch: Hadoop uses Guava 27.0-jre while Spark uses 14.0.1.
> Exception in thread "main" java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742)
>  at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712)
>  at 
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559)
>  at 
> org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter.<init>(EventLogFileWriters.scala:60)
>  at 
> org.apache.spark.deploy.history.SingleEventLogFileWriter.<init>(EventLogFileWriters.scala:213)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181)
>  at 
> org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
>  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
>  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>  at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>  at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36864) guava version mismatch with hadoop-aws

2021-09-27 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-36864:
-
Summary: guava version mismatch with hadoop-aws  (was: guava version 
mismatch between hadoop-aws and spark)

> guava version mismatch with hadoop-aws
> --
>
> Key: SPARK-36864
> URL: https://issues.apache.org/jira/browse/SPARK-36864
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.2
>Reporter: Zhongwei Zhu
>Priority: Minor
>
> When using hadoop-aws 3.2 with Spark 3.0, the error below occurs. This is caused by 
> a Guava version mismatch: Hadoop uses Guava 27.0-jre while Spark uses 14.0.1.
> Exception in thread "main" java.lang.NoSuchMethodError: 
> com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
>  at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742)
>  at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712)
>  at 
> org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559)
>  at 
> org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter.<init>(EventLogFileWriters.scala:60)
>  at 
> org.apache.spark.deploy.history.SingleEventLogFileWriter.<init>(EventLogFileWriters.scala:213)
>  at 
> org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181)
>  at 
> org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588)
>  at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
>  at scala.Option.getOrElse(Option.scala:189)
>  at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
>  at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
>  at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>  at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944)
>  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>  at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023)
>  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032)
>  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36848) Migrate ShowCurrentNamespaceStatement to v2 command framework

2021-09-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh reassigned SPARK-36848:
---

Assignee: Huaxin Gao

> Migrate ShowCurrentNamespaceStatement to v2 command framework
> -
>
> Key: SPARK-36848
> URL: https://issues.apache.org/jira/browse/SPARK-36848
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36848) Migrate ShowCurrentNamespaceStatement to v2 command framework

2021-09-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-36848.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34104
[https://github.com/apache/spark/pull/34104]

> Migrate ShowCurrentNamespaceStatement to v2 command framework
> -
>
> Key: SPARK-36848
> URL: https://issues.apache.org/jira/browse/SPARK-36848
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36864) guava version mismatch between hadoop-aws and spark

2021-09-27 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-36864:


 Summary: guava version mismatch between hadoop-aws and spark
 Key: SPARK-36864
 URL: https://issues.apache.org/jira/browse/SPARK-36864
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.2
Reporter: Zhongwei Zhu


When using hadoop-aws 3.2 with Spark 3.0, the error below occurs. This is caused by 
a Guava version mismatch: Hadoop uses Guava 27.0-jre while Spark uses 14.0.1.

Exception in thread "main" java.lang.NoSuchMethodError: 
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
 at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742)
 at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712)
 at 
org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559)
 at 
org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
 at 
org.apache.spark.deploy.history.EventLogFileWriter.<init>(EventLogFileWriters.scala:60)
 at 
org.apache.spark.deploy.history.SingleEventLogFileWriter.<init>(EventLogFileWriters.scala:213)
 at 
org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181)
 at 
org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
 at org.apache.spark.SparkContext.<init>(SparkContext.scala:584)
 at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588)
 at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
 at scala.Option.getOrElse(Option.scala:189)
 at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
 at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
 at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
 at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944)
 at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
 at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
 at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
 at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
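The NoSuchMethodError comes from the Preconditions.checkArgument overload that hadoop-aws
expects but Guava 14.0.1 does not provide. As a possible workaround sketch (an assumption,
not something proposed in this thread), a Guava build that matches hadoop-aws can be supplied
explicitly and preferred over Spark's bundled version; the jar path below is hypothetical:

{code:python}
from pyspark.sql import SparkSession

# Workaround sketch (assumption, not from this thread): ship a Guava jar that
# matches what hadoop-aws expects (e.g. 27.0-jre) and ask Spark to prefer
# user-supplied jars over its bundled Guava 14.0.1.
spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/guava-27.0-jre.jar")
    .config("spark.driver.userClassPathFirst", "true")
    .config("spark.executor.userClassPathFirst", "true")
    .getOrCreate()
)
{code}

Both userClassPathFirst settings are documented as experimental, and for the driver they
generally need to be passed as --conf flags to spark-submit so they apply before the driver
JVM starts, so this is best validated on a non-critical job first.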



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36863) Update dependency manifests for all released artifacts

2021-09-27 Thread Chao Sun (Jira)
Chao Sun created SPARK-36863:


 Summary: Update dependency manifests for all released artifacts
 Key: SPARK-36863
 URL: https://issues.apache.org/jira/browse/SPARK-36863
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.3.0
Reporter: Chao Sun


We should update dependency manifests for all released artifacts. Currently we 
don't do so for modules such as {{hadoop-cloud}}, {{kinesis-asl}}, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics

2021-09-27 Thread Yongjun Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420870#comment-17420870
 ] 

Yongjun Zhang commented on SPARK-31646:
---

Thanks [~mauzhang].

> Remove unused registeredConnections counter from ShuffleMetrics
> ---
>
> Key: SPARK-31646
> URL: https://issues.apache.org/jira/browse/SPARK-31646
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Shuffle, Spark Core
>Affects Versions: 3.0.0
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420835#comment-17420835
 ] 

Apache Spark commented on SPARK-32855:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/34116

> Improve DPP for some join type do not support broadcast filtering side
> --
>
> Key: SPARK-32855
> URL: https://issues.apache.org/jira/browse/SPARK-32855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> If the filtering side cannot be broadcast due to the join type but could be 
> broadcast based on its size,
> then we should not consider only reusing an existing broadcast. For example:
> a left outer join where the left side is very small.
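To make that example concrete, a hypothetical sketch of such a query (the tables dim and
fact, their columns, and the data are assumptions, not from the ticket; in practice fact
would be a large table partitioned by part_key):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tiny stand-ins for a very small dimension table and a partitioned fact table.
spark.createDataFrame([(1, "on"), (2, "off")], ["id", "flag"]) \
    .createOrReplaceTempView("dim")
spark.createDataFrame([(1, 10.0), (3, 20.0)], ["part_key", "value"]) \
    .createOrReplaceTempView("fact")

# With a LEFT OUTER JOIN the preserved (left) side cannot be the broadcast
# build side, so even though `dim` is small enough to broadcast by size,
# pruning `fact` cannot simply reuse a broadcast of `dim`.
pruned = spark.sql("""
    SELECT d.id, f.value
    FROM dim d
    LEFT OUTER JOIN fact f ON d.id = f.part_key
    WHERE d.flag = 'on'
""")
pruned.show()
{code}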



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420834#comment-17420834
 ] 

Apache Spark commented on SPARK-32855:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/34116

> Improve DPP for some join type do not support broadcast filtering side
> --
>
> Key: SPARK-32855
> URL: https://issues.apache.org/jira/browse/SPARK-32855
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> If the filtering side cannot be broadcast due to the join type but could be 
> broadcast based on its size,
> then we should not consider only reusing an existing broadcast. For example:
> a left outer join where the left side is very small.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-36861:

Target Version/s: 3.3.0
Priority: Blocker  (was: Major)

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Blocker
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC it 
> is parsed as a date type and the hour part is lost.
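Not from the ticket, but a possible stopgap while the behavior change is discussed: partition
column type inference can be disabled so values like hour=2021-01-01T00 stay strings (whether
that is acceptable for the affected jobs is an assumption). A sketch:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Keep partition values such as hour=2021-01-01T00 as strings instead of
# letting Spark infer a date (or other) type for the partition column.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

# Hypothetical input path containing the hour=... subdirectories.
df = spark.read.parquet("/data/events")
df.printSchema()  # `hour` should now show up as a string column
{code}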



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API

2021-09-27 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568
 ] 

Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 2:32 PM:
---

After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side. How about the driver side? Do you have any 
suggestion on it? [~holden]

BTW, I just want to be sure that the [1][2] abilities target the driver/executor 
side, right? That means we need to add PodGroup/Queue abilities on each allocator 
and the driver?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061


was (Author: yikunkero):
After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side. How about the driver side? Do you have any 
suggestion on it? [~holden]

BTW, I just want to be sure that the [1][2] abilities target the driver/executor 
side, right? That means we need to add PodGroup/Queue abilities on each allocator?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher level abstractions similar to the vanilla Kube replicasets 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API

2021-09-27 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568
 ] 

Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 2:31 PM:
---

After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side. How about the driver side? Do you have any 
suggestion on it? [~holden]

BTW, I just want to be sure that the [1][2] abilities target the driver/executor 
side, right? That means we need to add PodGroup/Queue abilities on each allocator?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061


was (Author: yikunkero):
After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side.

How about the driver side? Do you have any suggestion on it? [~holden]

BTW, I just want to be sure that the [1][2] abilities target the driver/executor 
side, right?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher level abstractions similar to the vanilla Kube replicasets 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36829) Refactor collectionOperation related Null check related code

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36829:
---

Assignee: angerszhu

> Refactor collectionOperation related Null check related code
> 
>
> Key: SPARK-36829
> URL: https://issues.apache.org/jira/browse/SPARK-36829
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36829) Refactor collectionOperation related Null check related code

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36829.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34077
[https://github.com/apache/spark/pull/34077]

> Refactor collectionOperation related Null check related code
> 
>
> Key: SPARK-36829
> URL: https://issues.apache.org/jira/browse/SPARK-36829
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.2, 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420746#comment-17420746
 ] 

Jungtaek Lim commented on SPARK-36862:
--

I guess you'd have the generated code in the log. Please attach the snippet. If 
there's no generated code in the log, please raise the log level for CodeGenerator 
in the log4j config and try running the query again.

org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG
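One way to apply that logger setting from a running PySpark session, as a sketch (it assumes
the log4j 1.x that ships with Spark 3.1; adding the line above to log4j.properties works just
as well):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raise only the CodeGenerator logger to DEBUG so the generated Java source
# is written to the driver log and can be attached to this ticket.
log4j = spark.sparkContext._jvm.org.apache.log4j
log4j.LogManager.getLogger(
    "org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator"
).setLevel(log4j.Level.DEBUG)
{code}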

> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-36862
> URL: https://issues.apache.org/jira/browse/SPARK-36862
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1 and Spark 3.1.2
> hadoop 3.2.1
>Reporter: Magdalena Pilawska
>Priority: Major
>
> Hi,
> I am getting the following error running spark-submit command:
> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 321, Column 103: ')' expected instead of '['
>  
> It fails running the spark sql command on delta lake: 
> spark.sql(sqlTransformation)
> The template of sqlTransformation is as follows:
> MERGE INTO target_table AS d
>  USING source_table AS s 
>  on s.id = d.id
>  WHEN MATCHED AND d.hash_value <> s.hash_value
>  THEN UPDATE SET d.name =s.name, d.address = s.address
>  
> It is a permanent error for both *Spark 3.1.1* and *3.1.2*.
>  
> The same works fine with spark 3.0.0.
>  
> Here is the full log:
> 2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 55, Column 103: ')' expected instead of '['2021-09-22 16:43:22,110 ERROR 
> CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 55, Column 103: ')' expected instead of 
> '['org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 55, Column 103: ')' expected instead of '[' at 
> org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362)
>  at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150) at 
> org.codehaus.janino.Parser.read(Parser.java:3703) at 
> org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622) at 
> org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518) at 
> org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028) at 
> org.codehaus.janino.Parser.parseClassBody(Parser.java:841) at 
> org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736) at 
> org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941) at 
> org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234) at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205) at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181) at 
> 

[jira] [Commented] (SPARK-36817) Does Apache Spark 3 support GPU usage for Spark RDDs?

2021-09-27 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420737#comment-17420737
 ] 

Thomas Graves commented on SPARK-36817:
---

please refer to https://github.com/NVIDIA/spark-rapids/issues/35791

> Does Apache Spark 3 support GPU usage for Spark RDDs?
> -
>
> Key: SPARK-36817
> URL: https://issues.apache.org/jira/browse/SPARK-36817
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Abhishek Shakya
>Priority: Major
>
> I am currently trying to run genomic analysis pipelines using 
> [Hail|https://hail.is/] (a library for genomic analyses written in Python and 
> Scala). Recently, Apache Spark 3 was released with support for GPU usage.
> I tried the [spark-rapids|https://nvidia.github.io/spark-rapids/] library to start 
> an on-premise Slurm cluster with GPU nodes. I was able to initialise the 
> cluster. However, when I tried running Hail tasks, the executors kept getting 
> killed.
> On asking in the Hail forum, I got the response that
> {quote}That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any 
> Spark-SQL interfaces, only the RDD interfaces.
> {quote}
> So, does Spark 3 not support GPU usage for the RDD interfaces?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Magdalena Pilawska (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420730#comment-17420730
 ] 

Magdalena Pilawska commented on SPARK-36862:


Hi [~kabhwan],

I updated the description with triggered operation and log details, thanks.

> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-36862
> URL: https://issues.apache.org/jira/browse/SPARK-36862
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1 and Spark 3.1.2
> hadoop 3.2.1
>Reporter: Magdalena Pilawska
>Priority: Major
>
> Hi,
> I am getting the following error running spark-submit command:
> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 321, Column 103: ')' expected instead of '['
>  
> It fails running the spark sql command on delta lake: 
> spark.sql(sqlTransformation)
> The template of sqlTransformation is as follows:
> MERGE INTO target_table AS d
>  USING source_table AS s 
>  on s.id = d.id
>  WHEN MATCHED AND d.hash_value <> s.hash_value
>  THEN UPDATE SET d.name =s.name, d.address = s.address
>  
> It is a permanent error for both *Spark 3.1.1* and *3.1.2*.
>  
> The same works fine with spark 3.0.0.
>  
> Here is the full log:
> 2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 55, Column 103: ')' expected instead of '['2021-09-22 16:43:22,110 ERROR 
> CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 55, Column 103: ')' expected instead of 
> '['org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 55, Column 103: ')' expected instead of '[' at 
> org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362)
>  at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150) at 
> org.codehaus.janino.Parser.read(Parser.java:3703) at 
> org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622) at 
> org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518) at 
> org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028) at 
> org.codehaus.janino.Parser.parseClassBody(Parser.java:841) at 
> org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736) at 
> org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941) at 
> org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234) at 
> org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205) at 
> org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521)
>  at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>  at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>  at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
> at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
> org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>  at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
>  at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181) at 
> org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:160)
>  at 
> 

[jira] [Updated] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Magdalena Pilawska (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magdalena Pilawska updated SPARK-36862:
---
Description: 
Hi,

I am getting the following error running spark-submit command:

ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
321, Column 103: ')' expected instead of '['

 

It fails running the spark sql command on delta lake: 
spark.sql(sqlTransformation)

The template of sqlTransformation is as follows:

MERGE INTO target_table AS d
 USING source_table AS s 
 on s.id = d.id
 WHEN MATCHED AND d.hash_value <> s.hash_value
 THEN UPDATE SET d.name =s.name, d.address = s.address

 

It is a permanent error for both *Spark 3.1.1* and *3.1.2*.

 

The same works fine with spark 3.0.0.

 

Here is the full log:

2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 55, 
Column 103: ')' expected instead of '['2021-09-22 16:43:22,110 ERROR 
CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 55, 
Column 103: ')' expected instead of 
'['org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
55, Column 103: ')' expected instead of '[' at 
org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362) 
at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150) at 
org.codehaus.janino.Parser.read(Parser.java:3703) at 
org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622) at 
org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518) at 
org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028) at 
org.codehaus.janino.Parser.parseClassBody(Parser.java:841) at 
org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736) at 
org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941) at 
org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234) at 
org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205) at 
org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521)
 at 
org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
 at 
org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) 
at 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
 at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) 
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at 
org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at 
org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
 at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721)
 at 
org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
 at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
 at 
org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181) at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:160)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:160)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:164)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:163)
 at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$2(ShuffleExchangeExec.scala:100)
 at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) 
at 
org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:100)
 at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41) at 
org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:68)
 at 

[jira] [Updated] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Magdalena Pilawska (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Magdalena Pilawska updated SPARK-36862:
---
Description: 
Hi,

I am getting the following error running spark-submit command:

ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
321, Column 103: ')' expected instead of '['

 

It fails running the spark sql command on delta lake: 
spark.sql(sqlTransformation)

The template of sqlTransformation is as follows:

MERGE INTO target_table AS d
USING source_table AS s 
on s.id = d.id
WHEN MATCHED AND d.hash_value <> s.hash_value
THEN UPDATE SET d.name =s.name, d.address = s.address

 

It is a permanent error for both *Spark 3.1.1* and *3.1.2*.

 

The same works fine with spark 3.0.0.

  was:
Hi,

I am getting the following error running spark-submit command:


ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
321, Column 103: ')' expected instead of '['

 

It is a permanent error for both Spark 3.1.1 and 3.1.2.

 

The same works fine with spark 3.0.0.


> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-36862
> URL: https://issues.apache.org/jira/browse/SPARK-36862
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1 and Spark 3.1.2
> hadoop 3.2.1
>Reporter: Magdalena Pilawska
>Priority: Major
>
> Hi,
> I am getting the following error running spark-submit command:
> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 321, Column 103: ')' expected instead of '['
>  
> It fails running the spark sql command on delta lake: 
> spark.sql(sqlTransformation)
> The template of sqlTransformation is as follows:
> MERGE INTO target_table AS d
> USING source_table AS s 
> on s.id = d.id
> WHEN MATCHED AND d.hash_value <> s.hash_value
> THEN UPDATE SET d.name =s.name, d.address = s.address
>  
> It is a permanent error for both *Spark 3.1.1* and *3.1.2*.
>  
> The same works fine with spark 3.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics

2021-09-27 Thread Manu Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420688#comment-17420688
 ] 

Manu Zhang commented on SPARK-31646:


{quote}So I will try to derive numBackLoggedConnections outside at the metrics 
monitoring system. Any better suggestion?
{quote}
This looks good.

> Remove unused registeredConnections counter from ShuffleMetrics
> ---
>
> Key: SPARK-31646
> URL: https://issues.apache.org/jira/browse/SPARK-31646
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Shuffle, Spark Core
>Affects Versions: 3.0.0
>Reporter: Manu Zhang
>Assignee: Manu Zhang
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API

2021-09-27 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568
 ] 

Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 11:22 AM:


After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side.

How about the driver side? Do you have any suggestion on it? [~holden]

BTW, I just want to be sure that the [1][2] abilities target the driver/executor 
side, right?

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061


was (Author: yikunkero):
After this, it makes the executor pod allocator pluggable; we could add the 
VolcanoJobAllocator, and it enables the Volcano abilities (with queue, PodGroup 
and so on...) on the executor side.

And it looks like the [1][2] abilities also target the executor side, right?

How about the driver side? Do you have any suggestion on it? [~holden]

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher level abstractions similar to the vanilla Kube replicasets 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420671#comment-17420671
 ] 

Jungtaek Lim commented on SPARK-36862:
--

Could you please provide more information? I guess the log would contain the 
generated code, which we can investigate. Otherwise I could give some 
instructions on enabling DEBUG logging to retain the generated code.

Also, could you please share which operation(s) your query executes, if 
they're OK to be exposed to the public?

> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-36862
> URL: https://issues.apache.org/jira/browse/SPARK-36862
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit, SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1 and Spark 3.1.2
> hadoop 3.2.1
>Reporter: Magdalena Pilawska
>Priority: Major
>
> Hi,
> I am getting the following error running spark-submit command:
> ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 321, Column 103: ')' expected instead of '['
>  
> It is a permanent error for both Spark 3.1.1 and 3.1.2.
>  
> The same works fine with spark 3.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2021-09-27 Thread Magdalena Pilawska (Jira)
Magdalena Pilawska created SPARK-36862:
--

 Summary: ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java'
 Key: SPARK-36862
 URL: https://issues.apache.org/jira/browse/SPARK-36862
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit, SQL
Affects Versions: 3.1.2, 3.1.1
 Environment: Spark 3.1.1 and Spark 3.1.2

hadoop 3.2.1
Reporter: Magdalena Pilawska


Hi,

I am getting the following error when running a spark-submit command:


ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
321, Column 103: ')' expected instead of '['

 

It is a permanent error for both Spark 3.1.1 and 3.1.2.

 

The same works fine with spark 3.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36438) Support list-like Python objects for Series comparison

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420656#comment-17420656
 ] 

Apache Spark commented on SPARK-36438:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/34114

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36438) Support list-like Python objects for Series comparison

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36438:


Assignee: Apache Spark

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36438) Support list-like Python objects for Series comparison

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420655#comment-17420655
 ] 

Apache Spark commented on SPARK-36438:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/34114

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36438) Support list-like Python objects for Series comparison

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36438:


Assignee: (was: Apache Spark)

> Support list-like Python objects for Series comparison
> --
>
> Key: SPARK-36438
> URL: https://issues.apache.org/jira/browse/SPARK-36438
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36816) Introduce a config variable for the incrementalCollects row batch size

2021-09-27 Thread Ole (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420578#comment-17420578
 ] 

Ole commented on SPARK-36816:
-

I am running a Thrift Server {{/spark/sbin/start-thriftserver.sh}} with 
{{--conf spark.sql.thriftServer.incrementalCollect=true}} to prevent 
OutOfMemory Exceptions. Querying data results in batched result sets (as 
intended) with log messages like this:
{code:bash}
21/09/27 08:25:33 INFO SparkExecuteStatementOperation: Returning result set 
with 1000 rows from offsets [932000, 933000) with 
50f346c0-02d4-40a2-a73c-30d326d2aae{code}
I'd like to be able to configure the value of {{1000 rows}} so that I can 
adjust it to our server capacity. The result would then look like this:
{code:java}
21/09/27 08:25:33 INFO SparkExecuteStatementOperation: Returning result set 
with 10000 rows from offsets [932000, 942000) with 
50f346c0-02d4-40a2-a73c-30d326d2aae{code}

> Introduce a config variable for the incrementalCollects row batch size
> --
>
> Key: SPARK-36816
> URL: https://issues.apache.org/jira/browse/SPARK-36816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Ole
>Priority: Minor
>
> After enabling *_spark.sql.thriftServer.incrementalCollects_* Thrift will 
> execute queries in batches (as intended). Unfortunately the batch size cannot 
> be configured as it seems to be hardcoded 
> [here|https://github.com/apache/spark/blob/6699f76fe2afa7f154b4ba424f3fe048fcee46df/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java#L404].
>  It would be useful to configure that value to be able to adjust it to your 
> environment.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36058) Support replicasets/job API

2021-09-27 Thread Yikun Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568
 ] 

Yikun Jiang commented on SPARK-36058:
-

After this, it makes the executor pod allocator pluggable, so we could add a 
VolcanoJobAllocator, which enables the Volcano abilities (queue, PodGroup and 
so on...) on the executor side.

And it looks like the [1][2] abilities also target the executor side, right?

How about the driver side? Do you have any suggestions on it? [~holden] 

[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup  https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.3.0
>
>
> Volcano & Yunikorn both support scheduling individual pods, but they also 
> support higher level abstractions similar to the vanilla Kube replicasets 
> which we can use to improve scheduling performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420553#comment-17420553
 ] 

Tanel Kiis commented on SPARK-36861:


Sorry, indeed I ran the test on master. Never mind then; it does not impact 
the 3.2 release.

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420549#comment-17420549
 ] 

Gengliang Wang edited comment on SPARK-36861 at 9/27/21, 8:06 AM:
--

Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on 3.2.0 RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}

The issue can be reproduced on Spark master though.



was (Author: gengliang.wang):
Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}


> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-36861:
---
Affects Version/s: (was: 3.2.0)
   3.3.0

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420549#comment-17420549
 ] 

Gengliang Wang commented on SPARK-36861:


Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I 
can't reproduce your case on RC4 with:

{code:scala}
> val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", 
> 2)).toDF("hour", "i")
> df.write.partitionBy("hour").parquet("/tmp/t1")
> spark.read.parquet("/tmp/t1").schema
res2: org.apache.spark.sql.types.StructType = 
StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true))
{code}


> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32712) Support writing Hive non-ORC/Parquet bucketed table

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32712.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34103
[https://github.com/apache/spark/pull/34103]

> Support writing Hive non-ORC/Parquet bucketed table 
> 
>
> Key: SPARK-32712
> URL: https://issues.apache.org/jira/browse/SPARK-32712
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
>
> Hive non-ORC/Parquet write code path is original Hive table write path 
> (InsertIntoHiveTable). This JIRA is to support write hivehash bucketed table 
> (for Hive 1.x.y and 2.x.y), and hive murmur3hash bucketed table (for Hive 
> 3.x.y), for these non-ORC/Parquet-serde Hive bucketed table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32712) Support writing Hive non-ORC/Parquet bucketed table

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32712:
---

Assignee: Cheng Su

> Support writing Hive non-ORC/Parquet bucketed table 
> 
>
> Key: SPARK-32712
> URL: https://issues.apache.org/jira/browse/SPARK-32712
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
>
> Hive non-ORC/Parquet write code path is original Hive table write path 
> (InsertIntoHiveTable). This JIRA is to support write hivehash bucketed table 
> (for Hive 1.x.y and 2.x.y), and hive murmur3hash bucketed table (for Hive 
> 3.x.y), for these non-ORC/Parquet-serde Hive bucketed table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420538#comment-17420538
 ] 

Gengliang Wang commented on SPARK-36861:


[~tanelk] This is a new behavior introduced by 
https://github.com/apache/spark/pull/33709
However, turning the value into a date and losing the hour part seems wrong. cc 
[~maxgekk] [~cloud_fan]

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36797) Union should resolve nested columns as top-level columns

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36797:
---

Assignee: L. C. Hsieh

> Union should resolve nested columns as top-level columns
> 
>
> Key: SPARK-36797
> URL: https://issues.apache.org/jira/browse/SPARK-36797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Union, by definition, resolves columns by position. Currently we only follow 
> this behavior for top-level columns, but not for nested columns.
> As we are making nested columns first-class citizens, this limitation and the 
> difference in behavior between top-level and nested columns no longer make 
> sense.
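
A minimal spark-shell sketch (not from the ticket) of what position-based 
resolution means for top-level columns:

{code:scala}
// Two DataFrames with identical column types but swapped column names.
val a = Seq((1, 2)).toDF("x", "y")
val b = Seq((30, 40)).toDF("y", "x")

a.union(b).show()
// +---+---+
// |  x|  y|
// +---+---+
// |  1|  2|
// | 30| 40|   <- matched by position: b's first column ("y") ends up under "x"
// +---+---+

a.unionByName(b).show()   // matches by name instead: 40 under "x", 30 under "y"
{code}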



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36797) Union should resolve nested columns as top-level columns

2021-09-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36797.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34038
[https://github.com/apache/spark/pull/34038]

> Union should resolve nested columns as top-level columns
> 
>
> Key: SPARK-36797
> URL: https://issues.apache.org/jira/browse/SPARK-36797
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.3.0
>
>
> Union, by definition, resolves columns by position. Currently we only follow 
> this behavior for top-level columns, but not for nested columns.
> As we are making nested columns first-class citizens, this limitation and the 
> difference in behavior between top-level and nested columns no longer make 
> sense.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36435) Implement MultIndex.equal_levels

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420534#comment-17420534
 ] 

Apache Spark commented on SPARK-36435:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/34113

> Implement MultIndex.equal_levels
> 
>
> Key: SPARK-36435
> URL: https://issues.apache.org/jira/browse/SPARK-36435
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36435) Implement MultIndex.equal_levels

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36435:


Assignee: (was: Apache Spark)

> Implement MultIndex.equal_levels
> 
>
> Key: SPARK-36435
> URL: https://issues.apache.org/jira/browse/SPARK-36435
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36435) Implement MultIndex.equal_levels

2021-09-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36435:


Assignee: Apache Spark

> Implement MultIndex.equal_levels
> 
>
> Key: SPARK-36435
> URL: https://issues.apache.org/jira/browse/SPARK-36435
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420533#comment-17420533
 ] 

Tanel Kiis commented on SPARK-36861:


If this is expected behaviour, then I would expect there to be a simple way to 
turn it off. Currently the only one I can think of is manually specifying the 
schema (see the sketch below).
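
A minimal sketch of that workaround in spark-shell, assuming the /tmp/t1 layout 
and column names from the reproducer elsewhere in this ticket:

{code:scala}
// Pin the partition column type up front so the raw directory value
// ("2021-01-01T00") is kept as a string instead of being inferred as a date.
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("i", IntegerType),
  StructField("hour", StringType)   // partition column declared explicitly as string
))

spark.read.schema(schema).parquet("/tmp/t1").printSchema()
// root
//  |-- i: integer (nullable = true)
//  |-- hour: string (nullable = true)

// Alternatively, setting spark.sql.sources.partitionColumnTypeInference.enabled=false
// disables type inference for all partition columns (they are then all read as strings).
{code}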

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420532#comment-17420532
 ] 

Tanel Kiis edited comment on SPARK-36861 at 9/27/21, 7:45 AM:
--

[~Gengliang.Wang] I think that this should be considered a blocker for the 
3.2 release
 


was (Author: tanelk):
[~Gengliang.Wang] I think that this should be considered a blocker.
 

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420532#comment-17420532
 ] 

Tanel Kiis commented on SPARK-36861:


[~Gengliang.Wang] I think that this should be considered a blocker.
 

> Partition columns are overly eagerly parsed as dates
> 
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it 
> is parsed as date type and the hour part is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-36861) Partition columns are overly eagerly parsed as dates

2021-09-27 Thread Tanel Kiis (Jira)
Tanel Kiis created SPARK-36861:
--

 Summary: Partition columns are overly eagerly parsed as dates
 Key: SPARK-36861
 URL: https://issues.apache.org/jira/browse/SPARK-36861
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Tanel Kiis


I have an input directory with subdirs:
* hour=2021-01-01T00
* hour=2021-01-01T01
* hour=2021-01-01T02
* ...

in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it is 
parsed as date type and the hour part is lost.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36711) Support multi-index in new syntax

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420522#comment-17420522
 ] 

Apache Spark commented on SPARK-36711:
--

User 'thangnd197' has created a pull request for this issue:
https://github.com/apache/spark/pull/34112

> Support multi-index in new syntax
> -
>
> Key: SPARK-36711
> URL: https://issues.apache.org/jira/browse/SPARK-36711
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Support multi-index in the new syntax SPARK-36709



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36711) Support multi-index in new syntax

2021-09-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420519#comment-17420519
 ] 

Apache Spark commented on SPARK-36711:
--

User 'thangnd197' has created a pull request for this issue:
https://github.com/apache/spark/pull/34112

> Support multi-index in new syntax
> -
>
> Key: SPARK-36711
> URL: https://issues.apache.org/jira/browse/SPARK-36711
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Support multi-index in the new syntax SPARK-36709



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS

2021-09-27 Thread copperybean (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

copperybean updated SPARK-36856:

Issue Type: Bug  (was: Improvement)

> Building by "./build/mvn" may be stuck on MacOS
> ---
>
> Key: SPARK-36856
> URL: https://issues.apache.org/jira/browse/SPARK-36856
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0, 3.3.0
> Environment: MacOS 11.4
>Reporter: copperybean
>Priority: Major
>
> Command "./build/mvn" will be stuck on my MacOS 11.4. Because it is using 
> error java home. On my mac, "/usr/bin/java" is a real file instead of a 
> symbolic link, so the java home is set to path "/usr", and lead the launched 
> maven process stuck with this error java home.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36860) Create the external hive table for HBase failed

2021-09-27 Thread wineternity (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wineternity updated SPARK-36860:

Attachment: image-2021-09-27-14-25-28-900.png

> Create the external hive table for HBase failed 
> 
>
> Key: SPARK-36860
> URL: https://issues.apache.org/jira/browse/SPARK-36860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: wineternity
>Priority: Major
> Attachments: image-2021-09-27-14-18-10-910.png, 
> image-2021-09-27-14-25-28-900.png
>
>
> We use the following SQL to create a Hive external table, which reads from HBase:
> {code:java}
> CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material(
>rowkey string COMMENT 'HBase主键',
>content string COMMENT '图文正文')
> USING HIVE   
> ROW FORMAT SERDE
>'org.apache.hadoop.hive.hbase.HBaseSerDe'
>  STORED BY
>'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES (
>'hbase.columns.mapping'=':key, cf1:content'
> )
>  TBLPROPERTIES (
>'hbase.table.name'='spotlight_headline_material'
>  );
> {code}
> But the SQL failed in Spark 3.1.2, which throws this exception:
> {code:java}
> 21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group 
> 26d7459f-7b58-4c18-9939-5f2737525ff2
> 21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState 
> RUNNING,
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: Unexpected combination of ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key,
>  cf1:content')(line 5, pos 0)
> {code}
> This check was introduced by this change: 
> [https://github.com/apache/spark/pull/28026]
>  
> Could anyone explain how to create an external table for HBase in Spark 3 now?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36860) Create the external hive table for HBase failed

2021-09-27 Thread wineternity (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420497#comment-17420497
 ] 

wineternity commented on SPARK-36860:
-

Thanks, [~sarutak]. May I ask why Spark doesn't support creating Hive tables 
using storage handlers?

It seems Spark does parse the STORED BY syntax; the data is already in 
CreateFileFormatContext.

!image-2021-09-27-14-18-10-910.png!

But validateRowFormatFileFormat in AstBuilder later only checks the file format 
provided by the STORED AS syntax, and since the file format is null here, it 
throws an exception. Maybe the STORED BY clause could be supported by fixing 
this check?

!image-2021-09-27-14-25-28-900.png!

> Create the external hive table for HBase failed 
> 
>
> Key: SPARK-36860
> URL: https://issues.apache.org/jira/browse/SPARK-36860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: wineternity
>Priority: Major
> Attachments: image-2021-09-27-14-18-10-910.png
>
>
> We use the following SQL to create a Hive external table, which reads from HBase:
> {code:java}
> CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material(
>rowkey string COMMENT 'HBase主键',
>content string COMMENT '图文正文')
> USING HIVE   
> ROW FORMAT SERDE
>'org.apache.hadoop.hive.hbase.HBaseSerDe'
>  STORED BY
>'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES (
>'hbase.columns.mapping'=':key, cf1:content'
> )
>  TBLPROPERTIES (
>'hbase.table.name'='spotlight_headline_material'
>  );
> {code}
> But the SQL failed in Spark 3.1.2, which throws this exception:
> {code:java}
> 21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group 
> 26d7459f-7b58-4c18-9939-5f2737525ff2
> 21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState 
> RUNNING,
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: Unexpected combination of ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key,
>  cf1:content')(line 5, pos 0)
> {code}
> This check was introduced by this change: 
> [https://github.com/apache/spark/pull/28026]
>  
> Could anyone explain how to create an external table for HBase in Spark 3 now?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36860) Create the external hive table for HBase failed

2021-09-27 Thread wineternity (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wineternity updated SPARK-36860:

Attachment: image-2021-09-27-14-18-10-910.png

> Create the external hive table for HBase failed 
> 
>
> Key: SPARK-36860
> URL: https://issues.apache.org/jira/browse/SPARK-36860
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: wineternity
>Priority: Major
> Attachments: image-2021-09-27-14-18-10-910.png
>
>
> We use the following SQL to create a Hive external table, which reads from HBase:
> {code:java}
> CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material(
>rowkey string COMMENT 'HBase主键',
>content string COMMENT '图文正文')
> USING HIVE   
> ROW FORMAT SERDE
>'org.apache.hadoop.hive.hbase.HBaseSerDe'
>  STORED BY
>'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>  WITH SERDEPROPERTIES (
>'hbase.columns.mapping'=':key, cf1:content'
> )
>  TBLPROPERTIES (
>'hbase.table.name'='spotlight_headline_material'
>  );
> {code}
> But the SQL failed in Spark 3.1.2, which throws this exception:
> {code:java}
> 21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group 
> 26d7459f-7b58-4c18-9939-5f2737525ff2
> 21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState 
> RUNNING,
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: Unexpected combination of ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key,
>  cf1:content')(line 5, pos 0)
> {code}
> This check was introduced by this change: 
> [https://github.com/apache/spark/pull/28026]
>  
> Could anyone explain how to create an external table for HBase in Spark 3 now?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org