[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side
[ https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421134#comment-17421134 ] Apache Spark commented on SPARK-32855: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/34124 > Improve DPP for some join type do not support broadcast filtering side > -- > > Key: SPARK-32855 > URL: https://issues.apache.org/jira/browse/SPARK-32855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > When the filtering side cannot be broadcast because of its join type but is small enough to broadcast by size, we should not consider only reusing an existing broadcast exchange. For example: a left outer join where the left side is very small. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36872) Decommissioning executors get killed before transferring their data because of the hardcoded timeout of 60 secs
[ https://issues.apache.org/jira/browse/SPARK-36872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shekhar Gupta updated SPARK-36872: -- Description: During the graceful decommissioning phase, executors need to transfer all of their shuffle and cache data to the peer executors. However, they get killed before transferring all the data because of the hardcoded timeout value of 60 secs in the decommissioning script. As a result of executors dying prematurely, the spark tasks on other executors fail which causes application failures, and it is hard to debug those failures. To fix the issue, we ended up writing a custom script with a different timeout and rebuilt the spark image but we would prefer an easier solution that does not require rebuilding the image. (was: During the graceful decommissioning phase, executors need to transfer all of their shuffle and cache data to the peer executors. However, they get killed before could transfer all the data because of the hardcoded timeout value of 60 secs in the decommissioning script. As a result of executors dying prematurely, the spark tasks on other executors fail which causes application failures, and it is hard to debug those failures. To fix the issue, we ended up writing a custom script with a different timeout and rebuilt the spark image but we would prefer an easier solution that does not require rebuilding the image. ) > Decommissioning executors get killed before transferring their data because > of the hardcoded timeout of 60 secs > --- > > Key: SPARK-36872 > URL: https://issues.apache.org/jira/browse/SPARK-36872 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Shekhar Gupta >Priority: Trivial > > During the graceful decommissioning phase, executors need to transfer all of > their shuffle and cache data to the peer executors. 
However, they get killed > before transferring all the data because of the hardcoded timeout value of 60 > secs in the decommissioning script. As a result of executors dying > prematurely, the spark tasks on other executors fail which causes application > failures, and it is hard to debug those failures. To fix the issue, we ended > up writing a custom script with a different timeout and rebuilt the spark > image but we would prefer an easier solution that does not require rebuilding > the image.
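An easier path than rebuilding the image would be a decommissioning script that reads its grace period from the environment instead of hardcoding 60. The sketch below shows that idea only; `SPARK_DECOMMISSION_TIMEOUT` is a hypothetical variable name, not an existing Spark setting:

```python
import os

# Sketch of a decommission script that takes its grace period from the
# environment, falling back to the currently hardcoded default of 60 secs.
# SPARK_DECOMMISSION_TIMEOUT is a hypothetical name, not a real Spark knob.
DEFAULT_TIMEOUT_SECS = 60

def decommission_timeout() -> int:
    raw = os.environ.get("SPARK_DECOMMISSION_TIMEOUT", "")
    try:
        return int(raw) if raw else DEFAULT_TIMEOUT_SECS
    except ValueError:
        # Ignore malformed values rather than fail during shutdown.
        return DEFAULT_TIMEOUT_SECS

print(decommission_timeout())
```

A pod template or spark-submit configuration could then set the variable per job, so workloads with slow block migration get a longer window without a custom image.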
[jira] [Created] (SPARK-36872) Decommissioning executors get killed before transferring their data because of the hardcoded timeout of 60 secs
Shekhar Gupta created SPARK-36872: - Summary: Decommissioning executors get killed before transferring their data because of the hardcoded timeout of 60 secs Key: SPARK-36872 URL: https://issues.apache.org/jira/browse/SPARK-36872 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.1.2, 3.1.1, 3.2.0 Reporter: Shekhar Gupta During the graceful decommissioning phase, executors need to transfer all of their shuffle and cache data to the peer executors. However, they get killed before they could transfer all the data because of the hardcoded timeout value of 60 secs in the decommissioning script. As a result of executors dying prematurely, the Spark tasks on other executors fail, which causes application failures, and it is hard to debug those failures. To fix the issue, we ended up writing a custom script with a different timeout and rebuilt the Spark image, but we would prefer an easier solution that does not require rebuilding the image.
[jira] [Commented] (SPARK-36678) Migrate SHOW TABLES to use V2 command by default
[ https://issues.apache.org/jira/browse/SPARK-36678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421120#comment-17421120 ] Terry Kim commented on SPARK-36678: --- Got it, thanks. > Migrate SHOW TABLES to use V2 command by default > > > Key: SPARK-36678 > URL: https://issues.apache.org/jira/browse/SPARK-36678 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate SHOW TABLES to use V2 command by default.
[jira] [Comment Edited] (SPARK-36586) Migrate all ParsedStatement to the new v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421106#comment-17421106 ] Terry Kim edited comment on SPARK-36586 at 9/28/21, 1:51 AM: - [~cloud_fan] totally missed this ping (never got an email notification). :) Looks like the work has started and I will also chip in. Thanks! was (Author: imback82): [~cloud_fan] totally missed this ping. :) Looks like the work has started and I will also chip in. Thanks! > Migrate all ParsedStatement to the new v2 command framework > --- > > Key: SPARK-36586 > URL: https://issues.apache.org/jira/browse/SPARK-36586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > > The ParsedStatement needs to be pattern matched in two analyzer rules and > results in a lot of duplicated code. > The new v2 command framework defines a few basic logical plan nodes such as > UnresolvedTable, and we only need to resolve these basic nodes, and pattern > match v2 commands only once in the rule `ResolveSessionCatalog` for v1 > command fallback. > We should migrate all the ParsedStatement to the v2 command framework.
[jira] [Commented] (SPARK-36586) Migrate all ParsedStatement to the new v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421106#comment-17421106 ] Terry Kim commented on SPARK-36586: --- [~cloud_fan] totally missed this ping. :) Looks like the work has started and I will also chip in. Thanks! > Migrate all ParsedStatement to the new v2 command framework > --- > > Key: SPARK-36586 > URL: https://issues.apache.org/jira/browse/SPARK-36586 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > > The ParsedStatement needs to be pattern matched in two analyzer rules and > results in a lot of duplicated code. > The new v2 command framework defines a few basic logical plan nodes such as > UnresolvedTable, and we only need to resolve these basic nodes, and pattern > match v2 commands only once in the rule `ResolveSessionCatalog` for v1 > command fallback. > We should migrate all the ParsedStatement to the v2 command framework.
[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568 ] Yikun Jiang edited comment on SPARK-36058 at 9/28/21, 1:11 AM: --- After this, the executor pod allocator becomes pluggable, so we could add a VolcanoJobAllocator, which enables the Volcano abilities (VolcanoJob with queue, podgroup support...) on the executor side. How about the driver side? Do you have any suggestions on it? [~holden] BTW, I just want to make sure that the [1][2] abilities target the driver/executor sides, right? That means we need to add PodGroup/Queue abilities to each allocator and the driver? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 was (Author: yikunkero): After this, the executor pod allocator becomes pluggable, so we could add a VolcanoJobAllocator, which enables the Volcano abilities (with queue, podgroup and so on...) on the executor side. How about the driver side? Do you have any suggestions on it? [~holden] BTW, I just want to make sure that the [1][2] abilities target the driver/executor sides, right? That means we need to add PodGroup/Queue abilities to each allocator and the driver? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Fix For: 3.3.0 > > > Volcano & Yunikorn both support scheduling individual pods, but they also > support higher-level abstractions similar to the vanilla Kube replicasets, > which we can use to improve scheduling performance.
[jira] [Commented] (SPARK-36871) Migrate CreateViewStatement to v2 command
[ https://issues.apache.org/jira/browse/SPARK-36871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421095#comment-17421095 ] Huaxin Gao commented on SPARK-36871: I am working on this > Migrate CreateViewStatement to v2 command > - > > Key: SPARK-36871 > URL: https://issues.apache.org/jira/browse/SPARK-36871 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major >
[jira] [Created] (SPARK-36871) Migrate CreateViewStatement to v2 command
Huaxin Gao created SPARK-36871: -- Summary: Migrate CreateViewStatement to v2 command Key: SPARK-36871 URL: https://issues.apache.org/jira/browse/SPARK-36871 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao
[jira] [Commented] (SPARK-36870) Introduce INTERNAL_ERROR error class
[ https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421087#comment-17421087 ] Apache Spark commented on SPARK-36870: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/34123 > Introduce INTERNAL_ERROR error class > > > Key: SPARK-36870 > URL: https://issues.apache.org/jira/browse/SPARK-36870 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Karen Feng >Priority: Major > > Introduces the INTERNAL_ERROR error class; this will be used to determine if > an exception is an internal error and is useful for end-users and developers > to diagnose whether an issue should be reported.
[jira] [Assigned] (SPARK-36870) Introduce INTERNAL_ERROR error class
[ https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36870: Assignee: Apache Spark > Introduce INTERNAL_ERROR error class > > > Key: SPARK-36870 > URL: https://issues.apache.org/jira/browse/SPARK-36870 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Karen Feng >Assignee: Apache Spark >Priority: Major > > Introduces the INTERNAL_ERROR error class; this will be used to determine if > an exception is an internal error and is useful for end-users and developers > to diagnose whether an issue should be reported.
[jira] [Assigned] (SPARK-36870) Introduce INTERNAL_ERROR error class
[ https://issues.apache.org/jira/browse/SPARK-36870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36870: Assignee: (was: Apache Spark) > Introduce INTERNAL_ERROR error class > > > Key: SPARK-36870 > URL: https://issues.apache.org/jira/browse/SPARK-36870 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Karen Feng >Priority: Major > > Introduces the INTERNAL_ERROR error class; this will be used to determine if > an exception is an internal error and is useful for end-users and developers > to diagnose whether an issue should be reported.
[jira] [Created] (SPARK-36870) Introduce INTERNAL_ERROR error class
Karen Feng created SPARK-36870: -- Summary: Introduce INTERNAL_ERROR error class Key: SPARK-36870 URL: https://issues.apache.org/jira/browse/SPARK-36870 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Karen Feng Introduces the INTERNAL_ERROR error class; this will be used to determine if an exception is an internal error and is useful for end-users and developers to diagnose whether an issue should be reported.
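For context on what an error class buys: a stable, machine-checkable tag on exceptions. The sketch below is a minimal Python illustration of the pattern only; `SparkError` and its `error_class` field are invented here and are not Spark's actual API:

```python
# Minimal illustration of the error-class pattern -- NOT Spark's actual API.
# `SparkError` and its `error_class` field are invented for this sketch.
class SparkError(Exception):
    def __init__(self, error_class: str, message: str):
        super().__init__(message)
        self.error_class = error_class

def is_internal_error(exc: BaseException) -> bool:
    # Internal errors indicate a bug worth reporting upstream, as opposed
    # to user-facing errors such as a malformed query.
    return getattr(exc, "error_class", None) == "INTERNAL_ERROR"

bug = SparkError("INTERNAL_ERROR", "unreachable code path reached")
user_mistake = SparkError("PARSE_SYNTAX_ERROR", "syntax error near 'SELCT'")
print(is_internal_error(bug), is_internal_error(user_mistake))
```

Tooling and users can then branch on the tag instead of parsing free-form messages.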
[jira] [Commented] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle
[ https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421077#comment-17421077 ] Apache Spark commented on SPARK-34826: -- User 'venkata91' has created a pull request for this issue: https://github.com/apache/spark/pull/34122 > Adaptive fetch of shuffle mergers for Push based shuffle > > > Key: SPARK-34826 > URL: https://issues.apache.org/jira/browse/SPARK-34826 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > > Currently the shuffle mergers are set during the creation of ShuffleMapStage. > In the initial set of stages, there won't be enough executors added which can > cause not enough shuffle mergers to be set during the creation of the shuffle > map stage. This task is to handle the issue of low merge ratio for initial > stages.
[jira] [Assigned] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle
[ https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34826: Assignee: (was: Apache Spark) > Adaptive fetch of shuffle mergers for Push based shuffle > > > Key: SPARK-34826 > URL: https://issues.apache.org/jira/browse/SPARK-34826 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Venkata krishnan Sowrirajan >Priority: Major > > Currently the shuffle mergers are set during the creation of ShuffleMapStage. > In the initial set of stages, there won't be enough executors added which can > cause not enough shuffle mergers to be set during the creation of the shuffle > map stage. This task is to handle the issue of low merge ratio for initial > stages.
[jira] [Assigned] (SPARK-34826) Adaptive fetch of shuffle mergers for Push based shuffle
[ https://issues.apache.org/jira/browse/SPARK-34826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34826: Assignee: Apache Spark > Adaptive fetch of shuffle mergers for Push based shuffle > > > Key: SPARK-34826 > URL: https://issues.apache.org/jira/browse/SPARK-34826 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Venkata krishnan Sowrirajan >Assignee: Apache Spark >Priority: Major > > Currently the shuffle mergers are set during the creation of ShuffleMapStage. > In the initial set of stages, there won't be enough executors added which can > cause not enough shuffle mergers to be set during the creation of the shuffle > map stage. This task is to handle the issue of low merge ratio for initial > stages.
[jira] [Created] (SPARK-36869) Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible
Hamid EL MAAZOUZ created SPARK-36869: Summary: Spark job fails due to java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible Key: SPARK-36869 URL: https://issues.apache.org/jira/browse/SPARK-36869 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 3.1.2 Environment: * RHEL 8.4 * Java 11.0.12 * Spark 3.1.2 (prebuilt only with Scala *2.12.10*) * Scala *2.12.14* for the application code Reporter: Hamid EL MAAZOUZ This is a Scala problem. It has already been reported here [https://github.com/scala/bug/issues/5046] and a fix has been merged here [https://github.com/scala/scala/pull/9166]. According to [https://github.com/scala/bug/issues/5046#issuecomment-928108088], the *fix* is available on *Scala 2.12.14*, but *Spark 3.0+* is only pre-built with Scala *2.12.10*. * Stacktrace of the failure: (Taken from stderr of a worker process) {code:java} Spark Executor Command: "/usr/java/jdk-11.0.12/bin/java" "-cp" "/opt/apache/spark-3.1.2-bin-hadoop3.2/conf/:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/*" "-Xmx1024M" "-Dspark.driver.port=45887" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.0.191:45887" "--executor-id" "0" "--hostname" "192.168.0.191" "--cores" "12" "--app-id" "app-20210927231035-" "--worker-url" "spark://Worker@192.168.0.191:35261" Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 21/09/27 23:10:36 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 18957@localhost 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for TERM 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for HUP 21/09/27 23:10:36 INFO SignalUtils: Registering signal handler for INT 21/09/27 23:10:36 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 192.168.0.191 instead (on interface wlp82s0) 21/09/27 23:10:36 WARN Utils: 
Set SPARK_LOCAL_IP if you need to bind to another address WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/apache/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int) WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 21/09/27 23:10:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 21/09/27 23:10:36 INFO SecurityManager: Changing view acls to: hamidelmaazouz 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls to: hamidelmaazouz 21/09/27 23:10:36 INFO SecurityManager: Changing view acls groups to: 21/09/27 23:10:36 INFO SecurityManager: Changing modify acls groups to: 21/09/27 23:10:36 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hamidelmaazouz); groups with view permissions: Set(); users with modify permissions: Set(hamidelmaazouz); groups with modify permissions: Set() 21/09/27 23:10:37 INFO TransportClientFactory: Successfully created connection to /192.168.0.191:45887 after 44 ms (0 ms spent in bootstraps) 21/09/27 23:10:37 WARN TransportChannelHandler: Exception in connection from /192.168.0.191:45887 java.io.InvalidClassException: scala.collection.mutable.WrappedArray$ofRef; local class incompatible: stream classdesc serialVersionUID = 3456489343829468865, local class serialVersionUID = 1028182004549731694 at java.base/java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:689) at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2012) at 
java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1862) at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2169) at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2464) at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2358) at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196) at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1679) at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493) at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
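The InvalidClassException above comes from Java serialization comparing the serialVersionUID embedded in the stream against the one computed for the local class (the comparison happens in `ObjectStreamClass.initNonProxy`, per the stacktrace). A toy Python model of that check follows; this is not JDK or Spark code, the two UID constants are the real values from the log, and which JVM wrote versus read the stream is an assumption:

```python
# Toy model of Java serialization's class-descriptor check -- NOT JDK code.
# The two UIDs below are the real values from the stacktrace above;
# everything else is illustrative.
STREAM_UID = 3456489343829468865  # embedded by the JVM that wrote the stream
LOCAL_UID = 1028182004549731694   # computed from the reading JVM's classes

class InvalidClassError(Exception):
    """Stands in for java.io.InvalidClassException."""

def read_class_descriptor(stream_uid: int, local_uid: int) -> None:
    # Mirrors the UID comparison Java performs while reading a descriptor.
    if stream_uid != local_uid:
        raise InvalidClassError(
            "local class incompatible: stream classdesc serialVersionUID = "
            f"{stream_uid}, local class serialVersionUID = {local_uid}")

# Matching UIDs deserialize fine; mixing 2.12.10 and 2.12.14 jars does not.
read_class_descriptor(STREAM_UID, STREAM_UID)
```

Keeping the driver, executors, and application code on the same Scala patch line (or moving to a Spark build whose Scala includes the fix) avoids the mismatch.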
[jira] [Commented] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421067#comment-17421067 ] Apache Spark commented on SPARK-36868: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34121 > Migrate CreateFunctionStatement to v2 command framework > --- > > Key: SPARK-36868 > URL: https://issues.apache.org/jira/browse/SPARK-36868 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major >
[jira] [Assigned] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36868: Assignee: Apache Spark > Migrate CreateFunctionStatement to v2 command framework > --- > > Key: SPARK-36868 > URL: https://issues.apache.org/jira/browse/SPARK-36868 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36868: Assignee: (was: Apache Spark) > Migrate CreateFunctionStatement to v2 command framework > --- > > Key: SPARK-36868 > URL: https://issues.apache.org/jira/browse/SPARK-36868 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Major >
[jira] [Created] (SPARK-36868) Migrate CreateFunctionStatement to v2 command framework
Huaxin Gao created SPARK-36868: -- Summary: Migrate CreateFunctionStatement to v2 command framework Key: SPARK-36868 URL: https://issues.apache.org/jira/browse/SPARK-36868 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao
[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421017#comment-17421017 ] Apache Spark commented on SPARK-35672: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/34120 > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well.
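A back-of-envelope check makes the failure mode concrete: with 1000 jars and one `--user-class-path <uri>` pair each, the command line alone can outgrow a typical 128 KiB Linux argument limit. The path prefix and jar names below are invented for illustration, and the real limit (ARG_MAX) varies by system:

```python
# Invented but realistically long fully-qualified URIs, one per user jar.
prefix = ("file:/data/disk1/yarn/usercache/appuser/appcache/"
          "application_1632700000000_0001/"
          "container_e05_1632700000000_0001_01_000002/app-libs/")
jars = [f"{prefix}dependency-{i:04d}-SNAPSHOT.jar" for i in range(1000)]

# CoarseGrainedExecutorBackend receives one argument pair per jar.
argv = []
for uri in jars:
    argv += ["--user-class-path", uri]

# Each argument costs its length plus a terminating byte.
total_bytes = sum(len(a) + 1 for a in argv)
print(total_bytes, total_bytes > 128 * 1024)
```

Writing the classpath entries to a file and passing the file name is a common way around per-exec argv limits; whether the linked PR takes exactly that approach is not shown here.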
[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421016#comment-17421016 ] Erik Krogen commented on SPARK-35672: - Re-submitted at [PR #34120|https://github.com/apache/spark/pull/34120] > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well.
[jira] [Commented] (SPARK-35672) Spark fails to launch executors with very large user classpath lists on YARN
[ https://issues.apache.org/jira/browse/SPARK-35672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421018#comment-17421018 ] Apache Spark commented on SPARK-35672: -- User 'xkrogen' has created a pull request for this issue: https://github.com/apache/spark/pull/34120 > Spark fails to launch executors with very large user classpath lists on YARN > > > Key: SPARK-35672 > URL: https://issues.apache.org/jira/browse/SPARK-35672 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 3.1.2 > Environment: Linux RHEL7 > Spark 3.1.1 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > When running Spark on YARN, the {{user-class-path}} argument to > {{CoarseGrainedExecutorBackend}} is used to pass a list of user JAR URIs to > executor processes. The argument is specified once for each JAR, and the URIs > are fully-qualified, so the paths can be quite long. With large user JAR > lists (say 1000+), this can result in system-level argument length limits > being exceeded, typically manifesting as the error message: > {code} > /bin/bash: Argument list too long > {code} > A [Google > search|https://www.google.com/search?q=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22=spark%20%22%2Fbin%2Fbash%3A%20argument%20list%20too%20long%22] > indicates that this is not a theoretical problem and afflicts real users, > including ours. This issue was originally observed on Spark 2.3, but has been > confirmed to exist in the master branch as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
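The mechanics behind the `/bin/bash: Argument list too long` failure described above can be sketched outside of Spark. The snippet below (a hypothetical illustration — the JAR URI and counts are made up, not taken from the ticket) estimates the argv size that repeating a `--user-class-path <URI>` pair contributes, and compares it against the platform's `ARG_MAX` limit. Whether the limit is actually exceeded depends on the URI lengths, the environment size, and the OS.

```python
import os

# Hypothetical example: each user JAR contributes one
# "--user-class-path <fully-qualified URI>" argument pair.
jar_uri = "hdfs://namenode.example.com:8020/user/someone/app/libs/some-dependency-1.2.3.jar"
n_jars = 1000

# Rough size of the exec() argument list: each string plus its NUL terminator.
args = ["--user-class-path", jar_uri] * n_jars
total_bytes = sum(len(a) + 1 for a in args)

arg_max = os.sysconf("SC_ARG_MAX")  # kernel limit on combined argv + environ size
print(f"~{total_bytes} bytes of argv for {n_jars} JARs (ARG_MAX = {arg_max})")
```

The environment block also counts against `ARG_MAX` on Linux, so launches can fail well before `total_bytes` alone reaches the limit.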
[jira] [Created] (SPARK-36867) Misleading Error Message with Invalid Column and Group By
Alan Jackoway created SPARK-36867: - Summary: Misleading Error Message with Invalid Column and Group By Key: SPARK-36867 URL: https://issues.apache.org/jira/browse/SPARK-36867 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Alan Jackoway When you run a query with an invalid column that also does a group by on a constructed column, the error message you get back references a missing column for the group by rather than the invalid column. You can reproduce this in pyspark in 3.1.2 with the following code: {code:python} from pyspark.sql import SparkSession spark = SparkSession.builder.appName("Group By Issue").getOrCreate() data = spark.createDataFrame( [("2021-09-15", 1), ("2021-09-16", 2), ("2021-09-17", 10), ("2021-09-18", 25), ("2021-09-19", 500), ("2021-09-20", 50), ("2021-09-21", 100)], schema=["d", "v"] ) data.createOrReplaceTempView("data") # This is valid spark.sql("select sum(v) as value, date(date_trunc('week', d)) as week from data group by week").show() # This is invalid because val is the wrong variable spark.sql("select sum(val) as value, date(date_trunc('week', d)) as week from data group by week").show() {code} The error message for the second spark.sql line is {quote} pyspark.sql.utils.AnalysisException: cannot resolve '`week`' given input columns: [data.d, data.v]; line 1 pos 81; 'Aggregate ['week], ['sum('val) AS value#21, cast(date_trunc(week, cast(d#0 as timestamp), Some(America/New_York)) as date) AS week#22] +- SubqueryAlias data +- LogicalRDD [d#0, v#1L], false {quote} but the actual problem is that I used the wrong variable name in a different part of the query. Nothing is wrong with {{week}} in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17722) YarnScheduler: Initial job has not accepted any resources
[ https://issues.apache.org/jira/browse/SPARK-17722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420999#comment-17420999 ] Davide Benedetto commented on SPARK-17722: -- Hi Partha, I have the same issue. Which YARN configuration values did you set? > YarnScheduler: Initial job has not accepted any resources > - > > Key: SPARK-17722 > URL: https://issues.apache.org/jira/browse/SPARK-17722 > Project: Spark > Issue Type: Bug >Reporter: Partha Pratim Ghosh >Priority: Major > > Connected to Spark in YARN mode from Eclipse (Java). On trying to run a task it gives the following - > YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. The request goes to the Hadoop cluster scheduler, and from there we can see the job in the Spark UI. But there it says that no task has been assigned to it. > The same code runs from spark-submit, where we need to remove the following lines -
> System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
> org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
> conf.set("hadoop.security.authentication", "kerberos");
> UserGroupInformation.setConfiguration(conf);
> The configuration is as follows -
> import org.apache.hadoop.security.UserGroupInformation;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.SQLContext;
>
> public class TestConnectivity {
>     public static void main(String[] args) {
>         System.setProperty("java.security.krb5.conf", "C:\\xxx\\krb5.conf");
>         org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
>         conf.set("hadoop.security.authentication", "kerberos");
>         UserGroupInformation.setConfiguration(conf);
>
>         SparkConf config = new SparkConf().setAppName("Test Spark");
>         config = config.setMaster("yarn-client");
>         config.set("spark.dynamicAllocation.enabled", "false");
>         config.set("spark.executor.memory", "2g");
>         config.set("spark.executor.instances", "1");
>         config.set("spark.executor.cores", "2");
>         config.set("spark.cores.max", "4");
>         config.set("yarn.nodemanager.resource.cpu-vcores", "4");
>         config.set("spark.yarn.queue", "root.root");
>         config.set("spark.yarn.jar", "file:/C:/xxx/spark-assembly_2.10-1.6.0-cdh5.7.1.jar");
>
>         JavaSparkContext sc = new JavaSparkContext(config);
>         SQLContext sqlcontext = new SQLContext(sc);
>         JavaRDD<String> logData = sc.textFile("sparkexamples/Employee.json").cache();
>         DataFrame df = sqlcontext.jsonRDD(logData);
>         df.show();
>         df.printSchema();
>     }
> }
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36863) Update dependency manifests for all released artifacts
[ https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36863: Assignee: Apache Spark > Update dependency manifests for all released artifacts > -- > > Key: SPARK-36863 > URL: https://issues.apache.org/jira/browse/SPARK-36863 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Minor > > We should update dependency manifests for all released artifacts. Currently > we don't do for modules such as {{hadoop-cloud}}, {{kinesis-asl}} etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36863) Update dependency manifests for all released artifacts
[ https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36863: Assignee: (was: Apache Spark) > Update dependency manifests for all released artifacts > -- > > Key: SPARK-36863 > URL: https://issues.apache.org/jira/browse/SPARK-36863 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Chao Sun >Priority: Minor > > We should update dependency manifests for all released artifacts. Currently > we don't do for modules such as {{hadoop-cloud}}, {{kinesis-asl}} etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36863) Update dependency manifests for all released artifacts
[ https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420994#comment-17420994 ] Apache Spark commented on SPARK-36863: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/34119 > Update dependency manifests for all released artifacts > -- > > Key: SPARK-36863 > URL: https://issues.apache.org/jira/browse/SPARK-36863 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Chao Sun >Priority: Minor > > We should update dependency manifests for all released artifacts. Currently > we don't do for modules such as {{hadoop-cloud}}, {{kinesis-asl}} etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36863) Update dependency manifests for all released artifacts
[ https://issues.apache.org/jira/browse/SPARK-36863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420995#comment-17420995 ] Apache Spark commented on SPARK-36863: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/34119 > Update dependency manifests for all released artifacts > -- > > Key: SPARK-36863 > URL: https://issues.apache.org/jira/browse/SPARK-36863 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.3.0 >Reporter: Chao Sun >Assignee: Apache Spark >Priority: Minor > > We should update dependency manifests for all released artifacts. Currently > we don't do for modules such as {{hadoop-cloud}}, {{kinesis-asl}} etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36847) Explicitly specify error codes when ignoring type hint errors
[ https://issues.apache.org/jira/browse/SPARK-36847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36847. --- Fix Version/s: 3.3.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 34102 https://github.com/apache/spark/pull/34102 > Explicitly specify error codes when ignoring type hint errors > - > > Key: SPARK-36847 > URL: https://issues.apache.org/jira/browse/SPARK-36847 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > We use a lot of {{type: ignore}} annotation to ignore type hint errors in > pandas-on-Spark. > We should explicitly specify the error codes to make it clear what kind of > error is being ignored, then the type hint checker can check more cases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
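The distinction the SPARK-36847 ticket draws can be illustrated with a small mypy-style snippet (hypothetical function and names, not taken from the pandas-on-Spark codebase): a bare `# type: ignore` silences every error on its line, while `# type: ignore[assignment]` suppresses only that one error code, so the checker can still flag anything else that later appears on the same line.

```python
from typing import List

def parse_ids(raw: str) -> List[int]:
    # Bare ignore: suppresses *all* type errors on this line, so new,
    # unrelated mistakes introduced later would also go unnoticed.
    ids: List[int] = raw.split(",")  # type: ignore

    # Code-specific ignore: only the [assignment] error is suppressed;
    # mypy still reports any other error category on this line.
    ids2: List[int] = raw.split(",")  # type: ignore[assignment]
    return [int(x) for x in ids2]

print(parse_ids("1,2,3"))  # -> [1, 2, 3]
```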
[jira] [Created] (SPARK-36866) Pushdown filters with ANSI interval values to parquet
Max Gekk created SPARK-36866: Summary: Pushdown filters with ANSI interval values to parquet Key: SPARK-36866 URL: https://issues.apache.org/jira/browse/SPARK-36866 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36865) Add PySpark API document of session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36865: Assignee: Kousuke Saruta (was: Apache Spark) > Add PySpark API document of session_window > -- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > There is no PySpark API document of session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36865) Add PySpark API document of session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420959#comment-17420959 ] Apache Spark commented on SPARK-36865: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/34118 > Add PySpark API document of session_window > -- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > There is no PySpark API document of session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36865) Add PySpark API document of session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36865: Assignee: Apache Spark (was: Kousuke Saruta) > Add PySpark API document of session_window > -- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > There is no PySpark API document of session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36821) Create a test to extend ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-36821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-36821: --- Assignee: Yufei Gu > Create a test to extend ColumnarBatch > - > > Key: SPARK-36821 > URL: https://issues.apache.org/jira/browse/SPARK-36821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Major > > As a followup of Spark-36814, to create a test to extend ColumnarBatch to > prevent future changes to break it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36821) Create a test to extend ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-36821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-36821. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34087 [https://github.com/apache/spark/pull/34087] > Create a test to extend ColumnarBatch > - > > Key: SPARK-36821 > URL: https://issues.apache.org/jira/browse/SPARK-36821 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yufei Gu >Assignee: Yufei Gu >Priority: Major > Fix For: 3.3.0 > > > As a followup of Spark-36814, to create a test to extend ColumnarBatch to > prevent future changes to break it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36865) Add PySpark API document of session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36865: --- Summary: Add PySpark API document of session_window (was: Add PySpark API document for session_window) > Add PySpark API document of session_window > -- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > There is no PySpark API document for session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36865) Add PySpark API document of session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36865: --- Description: There is no PySpark API document of session_window. The docstring of the function also doesn't comply with the numpydoc format. was: There is no PySpark API document for session_window. The docstring of the function also doesn't comply with the numpydoc format. > Add PySpark API document of session_window > -- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > There is no PySpark API document of session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36865) Add PySpark API document for session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36865: --- Description: There is no PySpark API document for session_window. The docstring of the function also doesn't comply with the numpydoc format. was:The layout of PySpark API document for session_window is broken because the corresponding docstring doesn't comply with numpydoc format. > Add PySpark API document for session_window > --- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > There is no PySpark API document for session_window. > The docstring of the function also doesn't comply with the numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36865) Add PySpark API document for session_window
[ https://issues.apache.org/jira/browse/SPARK-36865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36865: --- Summary: Add PySpark API document for session_window (was: The layout of PySpark API document for session_window is broken) > Add PySpark API document for session_window > --- > > Key: SPARK-36865 > URL: https://issues.apache.org/jira/browse/SPARK-36865 > Project: Spark > Issue Type: Bug > Components: docs, PySpark >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > The layout of PySpark API document for session_window is broken because the > corresponding docstring doesn't comply with numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36865) The layout of PySpark API document for session_window is broken
Kousuke Saruta created SPARK-36865: -- Summary: The layout of PySpark API document for session_window is broken Key: SPARK-36865 URL: https://issues.apache.org/jira/browse/SPARK-36865 Project: Spark Issue Type: Bug Components: docs, PySpark Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The layout of PySpark API document for session_window is broken because the corresponding docstring doesn't comply with numpydoc format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
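For context on what "comply with the numpydoc format" means here, a docstring in that style uses underlined `Parameters`, `Returns`, and `Examples` sections. The sketch below is a generic illustration only — the parameter names and description are assumptions, not the actual `session_window` docstring:

```python
def session_window(time_column, gap_duration):
    """Generate a session window column from a timestamp column.

    Parameters
    ----------
    time_column : Column or str
        The column to use as the timestamp for windowing.
    gap_duration : str
        Session gap, e.g. ``'10 minutes'``; the session closes after
        this much inactivity.

    Returns
    -------
    Column
        A struct column representing the session window.

    Examples
    --------
    >>> df.groupBy(session_window("ts", "5 seconds")).count()  # doctest: +SKIP
    """
    ...
```

Sphinx's numpydoc extension parses these section headers to lay out the API page, which is why a non-compliant docstring renders with a broken layout.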
[jira] [Assigned] (SPARK-36864) guava version mismatch with hadoop-aws
[ https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36864: Assignee: (was: Apache Spark) > guava version mismatch with hadoop-aws > -- > > Key: SPARK-36864 > URL: https://issues.apache.org/jira/browse/SPARK-36864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.2 >Reporter: Zhongwei Zhu >Priority: Minor > > When use hadoop-aws 3.2 with spark 3.0, got below error. This is caused by > guava version mismatch as hadoop used guava 27.0-jre while spark used 14.0.1. > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V > at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742) > at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559) > at > org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52) > at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) > at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853) > at > org.apache.spark.deploy.history.EventLogFileWriter.(EventLogFileWriters.scala:60) > at > org.apache.spark.deploy.history.SingleEventLogFileWriter.(EventLogFileWriters.scala:213) > at > org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181) > at > org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:66) > at 
org.apache.spark.SparkContext.(SparkContext.scala:584) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36864) guava version mismatch with hadoop-aws
[ https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36864: Assignee: Apache Spark > guava version mismatch with hadoop-aws > -- > > Key: SPARK-36864 > URL: https://issues.apache.org/jira/browse/SPARK-36864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.2 >Reporter: Zhongwei Zhu >Assignee: Apache Spark >Priority: Minor > > When use hadoop-aws 3.2 with spark 3.0, got below error. This is caused by > guava version mismatch as hadoop used guava 27.0-jre while spark used 14.0.1. > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V > at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742) > at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559) > at > org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52) > at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) > at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853) > at > org.apache.spark.deploy.history.EventLogFileWriter.(EventLogFileWriters.scala:60) > at > org.apache.spark.deploy.history.SingleEventLogFileWriter.(EventLogFileWriters.scala:213) > at > org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181) > at > org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:66) > 
at org.apache.spark.SparkContext.(SparkContext.scala:584) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36864) guava version mismatch with hadoop-aws
[ https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420935#comment-17420935 ] Apache Spark commented on SPARK-36864: -- User 'warrenzhu25' has created a pull request for this issue: https://github.com/apache/spark/pull/34117 > guava version mismatch with hadoop-aws > -- > > Key: SPARK-36864 > URL: https://issues.apache.org/jira/browse/SPARK-36864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.2 >Reporter: Zhongwei Zhu >Priority: Minor > > When use hadoop-aws 3.2 with spark 3.0, got below error. This is caused by > guava version mismatch as hadoop used guava 27.0-jre while spark used 14.0.1. > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V > at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742) > at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559) > at > org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52) > at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) > at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853) > at > org.apache.spark.deploy.history.EventLogFileWriter.(EventLogFileWriters.scala:60) > at > org.apache.spark.deploy.history.SingleEventLogFileWriter.(EventLogFileWriters.scala:213) > at > 
org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181) > at > org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:66) > at org.apache.spark.SparkContext.(SparkContext.scala:584) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36864) guava version mismatch with hadoop-aws
[ https://issues.apache.org/jira/browse/SPARK-36864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-36864: - Summary: guava version mismatch with hadoop-aws (was: guava version mismatch between hadoop-aws and spark) > guava version mismatch with hadoop-aws > -- > > Key: SPARK-36864 > URL: https://issues.apache.org/jira/browse/SPARK-36864 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.1.2 >Reporter: Zhongwei Zhu >Priority: Minor > > When use hadoop-aws 3.2 with spark 3.0, got below error. This is caused by > guava version mismatch as hadoop used guava 27.0-jre while spark used 14.0.1. > Exception in thread "main" java.lang.NoSuchMethodError: > com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V > at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742) > at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559) > at > org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52) > at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) > at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853) > at > org.apache.spark.deploy.history.EventLogFileWriter.(EventLogFileWriters.scala:60) > at > org.apache.spark.deploy.history.SingleEventLogFileWriter.(EventLogFileWriters.scala:213) > at > org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181) > at > 
org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:66) > at org.apache.spark.SparkContext.(SparkContext.scala:584) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931) > at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) > at org.apache.spark.examples.SparkPi.main(SparkPi.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36848) Migrate ShowCurrentNamespaceStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh reassigned SPARK-36848: --- Assignee: Huaxin Gao > Migrate ShowCurrentNamespaceStatement to v2 command framework > - > > Key: SPARK-36848 > URL: https://issues.apache.org/jira/browse/SPARK-36848 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36848) Migrate ShowCurrentNamespaceStatement to v2 command framework
[ https://issues.apache.org/jira/browse/SPARK-36848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] L. C. Hsieh resolved SPARK-36848. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34104 [https://github.com/apache/spark/pull/34104] > Migrate ShowCurrentNamespaceStatement to v2 command framework > - > > Key: SPARK-36848 > URL: https://issues.apache.org/jira/browse/SPARK-36848 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36864) guava version mismatch between hadoop-aws and spark
Zhongwei Zhu created SPARK-36864: Summary: guava version mismatch between hadoop-aws and spark Key: SPARK-36864 URL: https://issues.apache.org/jira/browse/SPARK-36864 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.2 Reporter: Zhongwei Zhu When use hadoop-aws 3.2 with spark 3.0, got below error. This is caused by guava version mismatch as hadoop used guava 27.0-jre while spark used 14.0.1. Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V at org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword(S3AUtils.java:742) at org.apache.hadoop.fs.s3a.S3AUtils.getAWSAccessKeys(S3AUtils.java:712) at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:559) at org.apache.hadoop.fs.s3a.DefaultS3ClientFactory.createS3Client(DefaultS3ClientFactory.java:52) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:264) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479) at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853) at org.apache.spark.deploy.history.EventLogFileWriter.(EventLogFileWriters.scala:60) at org.apache.spark.deploy.history.SingleEventLogFileWriter.(EventLogFileWriters.scala:213) at org.apache.spark.deploy.history.EventLogFileWriter$.apply(EventLogFileWriters.scala:181) at org.apache.spark.scheduler.EventLoggingListener.(EventLoggingListener.scala:66) at org.apache.spark.SparkContext.(SparkContext.scala:584) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2588) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937) at 
scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:944) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1023) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1032) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36863) Update dependency manifests for all released artifacts
Chao Sun created SPARK-36863: Summary: Update dependency manifests for all released artifacts Key: SPARK-36863 URL: https://issues.apache.org/jira/browse/SPARK-36863 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.3.0 Reporter: Chao Sun We should update dependency manifests for all released artifacts. Currently we don't do for modules such as {{hadoop-cloud}}, {{kinesis-asl}} etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420870#comment-17420870 ] Yongjun Zhang commented on SPARK-31646: --- Thanks [~mauzhang]. > Remove unused registeredConnections counter from ShuffleMetrics > --- > > Key: SPARK-31646 > URL: https://issues.apache.org/jira/browse/SPARK-31646 > Project: Spark > Issue Type: Improvement > Components: Deploy, Shuffle, Spark Core >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side
[ https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420835#comment-17420835 ] Apache Spark commented on SPARK-32855: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/34116 > Improve DPP for some join type do not support broadcast filtering side > -- > > Key: SPARK-32855 > URL: https://issues.apache.org/jira/browse/SPARK-32855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > For some filtering side can not broadcast by join type but can broadcast by > size, > then we should not consider reuse broadcast only, for example: > Left outer join and left side very small. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32855) Improve DPP for some join type do not support broadcast filtering side
[ https://issues.apache.org/jira/browse/SPARK-32855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420834#comment-17420834 ] Apache Spark commented on SPARK-32855: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/34116 > Improve DPP for some join type do not support broadcast filtering side > -- > > Key: SPARK-32855 > URL: https://issues.apache.org/jira/browse/SPARK-32855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > For some filtering side can not broadcast by join type but can broadcast by > size, > then we should not consider reuse broadcast only, for example: > Left outer join and left side very small. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-36861: Target Version/s: 3.3.0 Priority: Blocker (was: Major) > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Blocker > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
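The loss described in the report is easy to see in miniature: once a partition value such as `hour=2021-01-01T00` is coerced to a date, the distinct hour suffixes collapse onto the same day. A stdlib sketch (illustrative only, not Spark's actual inference code):

```python
from datetime import datetime

# Partition directory values as they appear on disk.
values = ["2021-01-01T00", "2021-01-01T01", "2021-01-01T02"]

# Coercing each value to a date, as the 3.2 RC inference reportedly does,
# collapses all three distinct hours onto a single day.
as_dates = [datetime.strptime(v, "%Y-%m-%dT%H").date() for v in values]
assert len(set(as_dates)) == 1  # the hour component is gone
```

Possible workarounds (assumptions, not confirmed in this ticket) are to supply an explicit schema so `hour` stays a string, or to set `spark.sql.sources.partitionColumnTypeInference.enabled=false` to disable partition column type inference altogether.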
[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568 ] Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 2:32 PM: --- After this, it makes the executor pod allocator pluggable, we could add the VolcanoJobAllocator, and it enables the volcano ability (with queue, podgoup and so on...) on excutors side. How about driver side? Do you have any suggestion on it? [~holden] BTW, I just wanna sure that [1][2] abilities targets on driver/excutor side, right? That means we need add PodGroup/Queue abilities on each allocator and driver? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 was (Author: yikunkero): After this, it makes the executor pod allocator pluggable, we could add the VolcanoJobAllocator, and it enables the volcano ability (with queue, podgoup and so on...) on excutors side. How about driver side? Do you have any suggestion on it? [~holden] BTW, I just wanna sure that [1][2] abilities targets on driver/excutor side, right? That means we need add PodGroup/Queue abilities on each allocator? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Fix For: 3.3.0 > > > Volcano & Yunikorn both support scheduling invidual pods, but they also > support higher level abstractions similar to the vanilla Kube replicasets > which we can use to improve scheduling performance. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568 ] Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 2:31 PM: --- After this, it makes the executor pod allocator pluggable, we could add the VolcanoJobAllocator, and it enables the volcano ability (with queue, podgoup and so on...) on excutors side. How about driver side? Do you have any suggestion on it? [~holden] BTW, I just wanna sure that [1][2] abilities targets on driver/excutor side, right? That means we need add PodGroup/Queue abilities on each allocator? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 was (Author: yikunkero): After this, it makes the executor pod allocator pluggable, we could add the VolcanoJobAllocator, and it enables the volcano ability (with queue, podgoup and so on...) on excutors side. How about driver side? Do you have any suggestion on it? [~holden] BTW, I just wanna sure that [1][2] abilities targets on driver/excutor side, right? [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061 > Support replicasets/job API > --- > > Key: SPARK-36058 > URL: https://issues.apache.org/jira/browse/SPARK-36058 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0, 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Fix For: 3.3.0 > > > Volcano & Yunikorn both support scheduling invidual pods, but they also > support higher level abstractions similar to the vanilla Kube replicasets > which we can use to improve scheduling performance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36829) Refactor collectionOperation related Null check related code
[ https://issues.apache.org/jira/browse/SPARK-36829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36829: --- Assignee: angerszhu > Refactor collectionOperation related Null check related code > > > Key: SPARK-36829 > URL: https://issues.apache.org/jira/browse/SPARK-36829 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.2, 3.1.2, 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36829) Refactor collectionOperation related Null check related code
[ https://issues.apache.org/jira/browse/SPARK-36829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36829. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34077 [https://github.com/apache/spark/pull/34077] > Refactor collectionOperation related Null check related code > > > Key: SPARK-36829 > URL: https://issues.apache.org/jira/browse/SPARK-36829 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.2, 3.1.2, 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420746#comment-17420746 ] Jungtaek Lim commented on SPARK-36862: -- I guess you'd have the generated code in the log. Please attach the snippet. If there's no log for generated code, please raise a log level for CodeGenerator in log4j config and try running the query again. org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java' > - > > Key: SPARK-36862 > URL: https://issues.apache.org/jira/browse/SPARK-36862 > Project: Spark > Issue Type: Bug > Components: Spark Submit, SQL >Affects Versions: 3.1.1, 3.1.2 > Environment: Spark 3.1.1 and Spark 3.1.2 > hadoop 3.2.1 >Reporter: Magdalena Pilawska >Priority: Major > > Hi, > I am getting the following error running spark-submit command: > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 321, Column 103: ')' expected instead of '[' > > It fails running the spark sql command on delta lake: > spark.sql(sqlTransformation) > The template of sqlTransformation is as follows: > MERGE INTO target_table AS d > USING source_table AS s > on s.id = d.id > WHEN MATCHED AND d.hash_value <> s.hash_value > THEN UPDATE SET d.name =s.name, d.address = s.address > > It is permanent error both for *spark 3.1.1* and *3.1.2* versions. > > The same works fine with spark 3.0.0. 
> > Here is the full log: > 2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 55, Column 103: ')' expected instead of '['2021-09-22 16:43:22,110 ERROR > CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 55, Column 103: ')' expected instead of > '['org.codehaus.commons.compiler.CompileException: File 'generated.java', > Line 55, Column 103: ')' expected instead of '[' at > org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362) > at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150) at > org.codehaus.janino.Parser.read(Parser.java:3703) at > org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622) at > org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518) at > org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028) at > org.codehaus.janino.Parser.parseClassBody(Parser.java:841) at > org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736) at > org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941) at > org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234) at > org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205) at > org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at > org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181) at >
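The log-level suggestion in the comment above could be applied, for Spark 3.1's default log4j 1.x setup, with a single line in `conf/log4j.properties` (a sketch; the path assumes a standard distribution layout):

```properties
# Dump the generated Java source so the offending generated line can be inspected.
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator=DEBUG
```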
[jira] [Commented] (SPARK-36817) Does Apache Spark 3 support GPU usage for Spark RDDs?
[ https://issues.apache.org/jira/browse/SPARK-36817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420737#comment-17420737 ] Thomas Graves commented on SPARK-36817: --- please refer to https://github.com/NVIDIA/spark-rapids/issues/35791 > Does Apache Spark 3 support GPU usage for Spark RDDs? > - > > Key: SPARK-36817 > URL: https://issues.apache.org/jira/browse/SPARK-36817 > Project: Spark > Issue Type: Question > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Abhishek Shakya >Priority: Major > > I am currently trying to run genomic analyses pipelines using > [Hail|https://hail.is/](library for genomics analyses written in python and > Scala). Recently, Apache Spark 3 was released and it supported GPU usage. > I tried [spark-rapids|https://nvidia.github.io/spark-rapids/] library start > an on-premise slurm cluster with gpu nodes. I was able to initialise the > cluster. However, when I tried running hail tasks, the executors keep getting > killed. > On querying in Hail forum, I got the response that > {quote}That’s a GPU code generator for Spark-SQL, and Hail doesn’t use any > Spark-SQL interfaces, only the RDD interfaces. > {quote} > So, does Spark3 not support GPU usage for RDD interfaces? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420730#comment-17420730 ] Magdalena Pilawska commented on SPARK-36862: Hi [~kabhwan], I updated the description with triggered operation and log details, thanks. > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java' > - > > Key: SPARK-36862 > URL: https://issues.apache.org/jira/browse/SPARK-36862 > Project: Spark > Issue Type: Bug > Components: Spark Submit, SQL >Affects Versions: 3.1.1, 3.1.2 > Environment: Spark 3.1.1 and Spark 3.1.2 > hadoop 3.2.1 >Reporter: Magdalena Pilawska >Priority: Major > > Hi, > I am getting the following error running spark-submit command: > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 321, Column 103: ')' expected instead of '[' > > It fails running the spark sql command on delta lake: > spark.sql(sqlTransformation) > The template of sqlTransformation is as follows: > MERGE INTO target_table AS d > USING source_table AS s > on s.id = d.id > WHEN MATCHED AND d.hash_value <> s.hash_value > THEN UPDATE SET d.name =s.name, d.address = s.address > > It is permanent error both for *spark 3.1.1* and *3.1.2* versions. > > The same works fine with spark 3.0.0. 
> > Here is the full log: > 2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 55, Column 103: ')' expected instead of '['2021-09-22 16:43:22,110 ERROR > CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 55, Column 103: ')' expected instead of > '['org.codehaus.commons.compiler.CompileException: File 'generated.java', > Line 55, Column 103: ')' expected instead of '[' at > org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362) > at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150) at > org.codehaus.janino.Parser.read(Parser.java:3703) at > org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622) at > org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518) at > org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028) at > org.codehaus.janino.Parser.parseClassBody(Parser.java:841) at > org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736) at > org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941) at > org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234) at > org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205) at > org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80) at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > 
org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000) at > org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181) at > org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:160) > at >
[jira] [Updated] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magdalena Pilawska updated SPARK-36862:
---
Description:
Hi,
I am getting the following error running the spark-submit command:

ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 321, Column 103: ')' expected instead of '['

It fails running a Spark SQL command on Delta Lake: spark.sql(sqlTransformation)

The template of sqlTransformation is as follows:

{code:sql}
MERGE INTO target_table AS d
USING source_table AS s
ON s.id = d.id
WHEN MATCHED AND d.hash_value <> s.hash_value
THEN UPDATE SET d.name = s.name, d.address = s.address
{code}

It is a permanent error for both *Spark 3.1.1* and *3.1.2*. The same works fine with Spark 3.0.0.

Here is the full log:

{code:java}
2021-09-22 16:43:22,110 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 55, Column 103: ')' expected instead of '['
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 55, Column 103: ')' expected instead of '['
	at org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362)
	at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:150)
	at org.codehaus.janino.Parser.read(Parser.java:3703)
	at org.codehaus.janino.Parser.parseFormalParameters(Parser.java:1622)
	at org.codehaus.janino.Parser.parseMethodDeclarationRest(Parser.java:1518)
	at org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:1028)
	at org.codehaus.janino.Parser.parseClassBody(Parser.java:841)
	at org.codehaus.janino.Parser.parseClassDeclarationRest(Parser.java:736)
	at org.codehaus.janino.Parser.parseClassBodyDeclaration(Parser.java:941)
	at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:234)
	at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:205)
	at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1427)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1524)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1521)
	at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
	at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
	at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
	at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1375)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:721)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:720)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:160)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:160)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:164)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:163)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$2(ShuffleExchangeExec.scala:100)
	at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
	at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:100)
	at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41)
	at org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:68)
	at
{code}
[jira] [Updated] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magdalena Pilawska updated SPARK-36862:
---
Description:
Hi,
I am getting the following error running the spark-submit command:

ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 321, Column 103: ')' expected instead of '['

It fails running a Spark SQL command on Delta Lake: spark.sql(sqlTransformation)

The template of sqlTransformation is as follows:

MERGE INTO target_table AS d
USING source_table AS s
ON s.id = d.id
WHEN MATCHED AND d.hash_value <> s.hash_value
THEN UPDATE SET d.name = s.name, d.address = s.address

It is a permanent error for both *Spark 3.1.1* and *3.1.2*. The same works fine with Spark 3.0.0.

was:
Hi,
I am getting the following error running spark-submit command:
ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 321, Column 103: ')' expected instead of '['
It is permanent error both for spark 3.1.1 and 3.1.2 versions.
The same works fine with spark 3.0.0.
> ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java' > - > > Key: SPARK-36862 > URL: https://issues.apache.org/jira/browse/SPARK-36862 > Project: Spark > Issue Type: Bug > Components: Spark Submit, SQL >Affects Versions: 3.1.1, 3.1.2 > Environment: Spark 3.1.1 and Spark 3.1.2 > hadoop 3.2.1 >Reporter: Magdalena Pilawska >Priority: Major > > Hi, > I am getting the following error running spark-submit command: > ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 321, Column 103: ')' expected instead of '[' > > It fails running the spark sql command on delta lake: > spark.sql(sqlTransformation) > The template of sqlTransformation is as follows: > MERGE INTO target_table AS d > USING source_table AS s > on s.id = d.id > WHEN MATCHED AND d.hash_value <> s.hash_value > THEN UPDATE SET d.name =s.name, d.address = s.address > > It is permanent error both for *spark 3.1.1* and *3.1.2* versions. > > The same works fine with spark 3.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31646) Remove unused registeredConnections counter from ShuffleMetrics
[ https://issues.apache.org/jira/browse/SPARK-31646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420688#comment-17420688 ] Manu Zhang commented on SPARK-31646: {quote}So I will try to derive numBackLoggedConnections outside at the metrics monitoring system. Any better suggestion? {quote} This looks good. > Remove unused registeredConnections counter from ShuffleMetrics > --- > > Key: SPARK-31646 > URL: https://issues.apache.org/jira/browse/SPARK-31646 > Project: Spark > Issue Type: Improvement > Components: Deploy, Shuffle, Spark Core >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568 ] Yikun Jiang edited comment on SPARK-36058 at 9/27/21, 11:22 AM:
--
This makes the executor pod allocator pluggable, so we could add a VolcanoJobAllocator and enable the Volcano abilities (queue, PodGroup, and so on) on the executor side. How about the driver side? Do you have any suggestions? [~holden]
BTW, I just want to make sure that the [1][2] abilities target the driver/executor side, right?
[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup: https://issues.apache.org/jira/browse/SPARK-36061

was (Author: yikunkero): After this, it makes the executor pod allocator pluggable, we could add the VolcanoJobAllocator, and it enables the volcano ability (with queue, podgoup and so on...) on excutors side. and looks like[1][2] abilities also targets on excutor side, right? How about driver side? Do you have any suggestion on it? [~holden] [1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059 [2] PodGroup https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Holden Karau
> Assignee: Holden Karau
> Priority: Major
> Fix For: 3.3.0
>
> Volcano & Yunikorn both support scheduling individual pods, but they also support higher-level abstractions similar to the vanilla Kube replicasets, which we can use to improve scheduling performance.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-36862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420671#comment-17420671 ] Jungtaek Lim commented on SPARK-36862:
--
Could you please provide more information? I guess the log would contain the generated code, which we can investigate. Otherwise I can give some instructions on enabling the DEBUG log to retain the generated code.
Also, could you please share which operation(s) your query executes, if they are OK to be exposed to the public?

> ERROR CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-36862
> URL: https://issues.apache.org/jira/browse/SPARK-36862
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit, SQL
> Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1 and Spark 3.1.2
> hadoop 3.2.1
> Reporter: Magdalena Pilawska
> Priority: Major
>
> Hi,
> I am getting the following error running spark-submit command:
> ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 321, Column 103: ')' expected instead of '['
>
> It is permanent error both for spark 3.1.1 and 3.1.2 versions.
>
> The same works fine with spark 3.0.0.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
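As a sketch of the DEBUG-log approach mentioned above: in a default Spark 3.1.x deployment (log4j 1.x properties file), raising the log level of the codegen package keeps the generated code in the logs. The exact logger name below is an assumption to verify against your Spark version's CodeGenerator class:

{code}
# conf/log4j.properties -- keep generated code in the driver/executor logs.
# Logger name assumed from the CodeGenerator package in the stack trace above.
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen=DEBUG
{code}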
[jira] [Created] (SPARK-36862) ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
Magdalena Pilawska created SPARK-36862: -- Summary: ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java' Key: SPARK-36862 URL: https://issues.apache.org/jira/browse/SPARK-36862 Project: Spark Issue Type: Bug Components: Spark Submit, SQL Affects Versions: 3.1.2, 3.1.1 Environment: Spark 3.1.1 and Spark 3.1.2 hadoop 3.2.1 Reporter: Magdalena Pilawska Hi, I am getting the following error running spark-submit command: ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 321, Column 103: ')' expected instead of '[' It is permanent error both for spark 3.1.1 and 3.1.2 versions. The same works fine with spark 3.0.0. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36438) Support list-like Python objects for Series comparison
[ https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420656#comment-17420656 ] Apache Spark commented on SPARK-36438: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/34114 > Support list-like Python objects for Series comparison > -- > > Key: SPARK-36438 > URL: https://issues.apache.org/jira/browse/SPARK-36438 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
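For context, the comparison being added is elementwise, following pandas semantics, where comparing against a list-like of mismatched length is an error. A plain-Python sketch of the intended behavior (illustrative only, not the actual pyspark implementation):

```python
def series_eq(values, other):
    """Elementwise equality between a series' values and a list-like object,
    mirroring pandas semantics: lengths must match to compare."""
    if len(values) != len(other):
        raise ValueError("Lengths must match to compare")
    return [a == b for a, b in zip(values, other)]

print(series_eq([1, 2, 3], [1, 0, 3]))  # [True, False, True]
```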
[jira] [Assigned] (SPARK-36438) Support list-like Python objects for Series comparison
[ https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36438: Assignee: Apache Spark > Support list-like Python objects for Series comparison > -- > > Key: SPARK-36438 > URL: https://issues.apache.org/jira/browse/SPARK-36438 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36438) Support list-like Python objects for Series comparison
[ https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420655#comment-17420655 ] Apache Spark commented on SPARK-36438: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/34114 > Support list-like Python objects for Series comparison > -- > > Key: SPARK-36438 > URL: https://issues.apache.org/jira/browse/SPARK-36438 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36438) Support list-like Python objects for Series comparison
[ https://issues.apache.org/jira/browse/SPARK-36438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36438: Assignee: (was: Apache Spark) > Support list-like Python objects for Series comparison > -- > > Key: SPARK-36438 > URL: https://issues.apache.org/jira/browse/SPARK-36438 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36816) Introduce a config variable for the incrementalCollects row batch size
[ https://issues.apache.org/jira/browse/SPARK-36816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420578#comment-17420578 ] Ole commented on SPARK-36816:
-
I am running a Thrift Server ({{/spark/sbin/start-thriftserver.sh}}) with {{--conf spark.sql.thriftServer.incrementalCollect=true}} to prevent OutOfMemory exceptions. Querying data returns batched result sets (as intended) with log messages like this:
{code:bash}
21/09/27 08:25:33 INFO SparkExecuteStatementOperation: Returning result set with 1000 rows from offsets [932000, 933000) with 50f346c0-02d4-40a2-a73c-30d326d2aae{code}
I'd like the batch size of {{1000}} rows to be configurable, so I can adjust it to our server capacity. With a batch size of 10000, the result would look like this:
{code:java}
21/09/27 08:25:33 INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [932000, 942000) with 50f346c0-02d4-40a2-a73c-30d326d2aae{code}

> Introduce a config variable for the incrementalCollects row batch size
> --
>
> Key: SPARK-36816
> URL: https://issues.apache.org/jira/browse/SPARK-36816
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Ole
> Priority: Minor
>
> After enabling *_spark.sql.thriftServer.incrementalCollects_*, Thrift will execute queries in batches (as intended). Unfortunately the batch size cannot be configured, as it seems to be hardcoded [here|https://github.com/apache/spark/blob/6699f76fe2afa7f154b4ba424f3fe048fcee46df/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java#L404]. It would be useful to make that value configurable so it can be adjusted to your environment.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
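The batching shown in the log lines above can be sketched in plain Python; the only change the ticket asks for is making the batch size configurable instead of hardcoded at 1000. The function below is illustrative, not Spark's actual Thrift server code:

```python
def batch_offsets(total_rows, batch_size=1000):
    """Yield [start, end) row-offset pairs, mirroring how the Thrift server
    returns incremental result batches of at most batch_size rows."""
    start = 0
    while start < total_rows:
        end = min(start + batch_size, total_rows)
        yield start, end
        start = end

# Hardcoded 1000-row batches vs. a tuned 10000-row batch size:
print(list(batch_offsets(2500)))         # [(0, 1000), (1000, 2000), (2000, 2500)]
print(list(batch_offsets(2500, 10000)))  # [(0, 2500)]
```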
[jira] [Commented] (SPARK-36058) Support replicasets/job API
[ https://issues.apache.org/jira/browse/SPARK-36058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420568#comment-17420568 ] Yikun Jiang commented on SPARK-36058:
-
This makes the executor pod allocator pluggable, so we could add a VolcanoJobAllocator and enable the Volcano abilities (queue, PodGroup, and so on) on the executor side. And it looks like the [1][2] abilities also target the executor side, right? How about the driver side? Do you have any suggestions? [~holden]
[1] scheduler & queue: https://issues.apache.org/jira/browse/SPARK-36059
[2] PodGroup: https://issues.apache.org/jira/browse/SPARK-36061

> Support replicasets/job API
> ---
>
> Key: SPARK-36058
> URL: https://issues.apache.org/jira/browse/SPARK-36058
> Project: Spark
> Issue Type: Sub-task
> Components: Kubernetes
> Affects Versions: 3.2.0, 3.3.0
> Reporter: Holden Karau
> Assignee: Holden Karau
> Priority: Major
> Fix For: 3.3.0
>
> Volcano & Yunikorn both support scheduling individual pods, but they also support higher-level abstractions similar to the vanilla Kube replicasets, which we can use to improve scheduling performance.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420553#comment-17420553 ] Tanel Kiis commented on SPARK-36861:
-
Sorry, indeed I ran the test on master. Never mind then; it does not impact the 3.2 release.

> Partition columns are overly eagerly parsed as dates
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Tanel Kiis
> Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it is parsed as date type and the hour part is lost.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420549#comment-17420549 ] Gengliang Wang edited comment on SPARK-36861 at 9/27/21, 8:06 AM: -- Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I can't reproduce your case on 3.2.0 RC4 with: {code:scala} > val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", > 2)).toDF("hour", "i") > df.write.partitionBy("hour").parquet("/tmp/t1") > spark.read.parquet("/tmp/t1").schema res2: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true)) {code} The issue can be reproduced on Spark master though. was (Author: gengliang.wang): Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I can't reproduce your case on RC4 with: {code:scala} > val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", > 2)).toDF("hour", "i") > df.write.partitionBy("hour").parquet("/tmp/t1") > spark.read.parquet("/tmp/t1").schema res2: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true)) {code} > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Priority: Major > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-36861: --- Affects Version/s: (was: 3.2.0) 3.3.0 > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Tanel Kiis >Priority: Major > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420549#comment-17420549 ] Gengliang Wang commented on SPARK-36861: Hmm, the PR https://github.com/apache/spark/pull/33709 is only on master. I can't reproduce your case on RC4 with: {code:scala} > val df = Seq(("2021-01-01T00", 0), ("2021-01-01T01", 1), ("2021-01-01T02", > 2)).toDF("hour", "i") > df.write.partitionBy("hour").parquet("/tmp/t1") > spark.read.parquet("/tmp/t1").schema res2: org.apache.spark.sql.types.StructType = StructType(StructField(i,IntegerType,true), StructField(hour,StringType,true)) {code} > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Priority: Major > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
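The surprising part of the reported behavior is the lenient parse: a partition value like {{2021-01-01T00}} contains a valid date prefix, so type inference can claim it as a date and the hour suffix is silently dropped. A plain-Python analogue of such a lenient parse (illustrative only, not Spark's actual cast code):

```python
import re
from datetime import date

def lenient_date_parse(s):
    """Accept any string starting with a valid yyyy-MM-dd prefix and discard
    the rest -- a rough analogue of a lenient date cast during partition
    value inference."""
    m = re.match(r"(\d{4})-(\d{2})-(\d{2})", s)
    if m is None:
        return None
    return date(*map(int, m.groups()))

# The hour suffix of the partition value is silently lost:
print(lenient_date_parse("2021-01-01T00"))  # 2021-01-01
```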
[jira] [Resolved] (SPARK-32712) Support writing Hive non-ORC/Parquet bucketed table
[ https://issues.apache.org/jira/browse/SPARK-32712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-32712. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34103 [https://github.com/apache/spark/pull/34103] > Support writing Hive non-ORC/Parquet bucketed table > > > Key: SPARK-32712 > URL: https://issues.apache.org/jira/browse/SPARK-32712 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > > Hive non-ORC/Parquet write code path is original Hive table write path > (InsertIntoHiveTable). This JIRA is to support write hivehash bucketed table > (for Hive 1.x.y and 2.x.y), and hive murmur3hash bucketed table (for Hive > 3.x.y), for these non-ORC/Parquet-serde Hive bucketed table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32712) Support writing Hive non-ORC/Parquet bucketed table
[ https://issues.apache.org/jira/browse/SPARK-32712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-32712: --- Assignee: Cheng Su > Support writing Hive non-ORC/Parquet bucketed table > > > Key: SPARK-32712 > URL: https://issues.apache.org/jira/browse/SPARK-32712 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > > Hive non-ORC/Parquet write code path is original Hive table write path > (InsertIntoHiveTable). This JIRA is to support write hivehash bucketed table > (for Hive 1.x.y and 2.x.y), and hive murmur3hash bucketed table (for Hive > 3.x.y), for these non-ORC/Parquet-serde Hive bucketed table. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420538#comment-17420538 ] Gengliang Wang commented on SPARK-36861: [~tanelk] This is a new behavior introduced from https://github.com/apache/spark/pull/33709 However, turning into date and losing the hour part seems wrong. cc [~maxgekk] [~cloud_fan] > Partition columns are overly eagerly parsed as dates > > > Key: SPARK-36861 > URL: https://issues.apache.org/jira/browse/SPARK-36861 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Priority: Major > > I have an input directory with subdirs: > * hour=2021-01-01T00 > * hour=2021-01-01T01 > * hour=2021-01-01T02 > * ... > in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it > is parsed as date type and the hour part is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36797) Union should resolve nested columns as top-level columns
[ https://issues.apache.org/jira/browse/SPARK-36797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36797: --- Assignee: L. C. Hsieh > Union should resolve nested columns as top-level columns > > > Key: SPARK-36797 > URL: https://issues.apache.org/jira/browse/SPARK-36797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Union, by definition, resolves columns by position. Currently we only follow > this behavior at top-level columns, but not nested columns. > As we are making nested columns as first-class citizen, the > nested-column-only limitation and the difference between top-level column and > nested column do not make sense. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36797) Union should resolve nested columns as top-level columns
[ https://issues.apache.org/jira/browse/SPARK-36797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36797. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34038 [https://github.com/apache/spark/pull/34038] > Union should resolve nested columns as top-level columns > > > Key: SPARK-36797 > URL: https://issues.apache.org/jira/browse/SPARK-36797 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.3.0 > > > Union, by definition, resolves columns by position. Currently we only follow > this behavior at top-level columns, but not nested columns. > As we are making nested columns as first-class citizen, the > nested-column-only limitation and the difference between top-level column and > nested column do not make sense. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36435) Implement MultIndex.equal_levels
[ https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420534#comment-17420534 ] Apache Spark commented on SPARK-36435: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/34113 > Implement MultIndex.equal_levels > > > Key: SPARK-36435 > URL: https://issues.apache.org/jira/browse/SPARK-36435 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36435) Implement MultIndex.equal_levels
[ https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36435: Assignee: (was: Apache Spark) > Implement MultIndex.equal_levels > > > Key: SPARK-36435 > URL: https://issues.apache.org/jira/browse/SPARK-36435 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36435) Implement MultIndex.equal_levels
[ https://issues.apache.org/jira/browse/SPARK-36435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36435: Assignee: Apache Spark > Implement MultIndex.equal_levels > > > Key: SPARK-36435 > URL: https://issues.apache.org/jira/browse/SPARK-36435 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420533#comment-17420533 ] Tanel Kiis commented on SPARK-36861:
-
If this is the expected behaviour, then I would expect there to be a simple way to turn it off. Currently the only one I can think of is manually specifying the schema.

> Partition columns are overly eagerly parsed as dates
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Tanel Kiis
> Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it is parsed as date type and the hour part is lost.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420532#comment-17420532 ] Tanel Kiis edited comment on SPARK-36861 at 9/27/21, 7:45 AM:
--
[~Gengliang.Wang] I think this should be considered a blocker for the 3.2 release.

was (Author: tanelk): [~Gengliang.Wang] I think, that this should be considered as a blocker.

> Partition columns are overly eagerly parsed as dates
>
> Key: SPARK-36861
> URL: https://issues.apache.org/jira/browse/SPARK-36861
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Tanel Kiis
> Priority: Major
>
> I have an input directory with subdirs:
> * hour=2021-01-01T00
> * hour=2021-01-01T01
> * hour=2021-01-01T02
> * ...
> in spark 3.1 the 'hour' column is parsed as a string type, but in 3.2 RC it is parsed as date type and the hour part is lost.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36861) Partition columns are overly eagerly parsed as dates
[ https://issues.apache.org/jira/browse/SPARK-36861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420532#comment-17420532 ] Tanel Kiis commented on SPARK-36861: [~Gengliang.Wang] I think that this should be considered a blocker.
[jira] [Created] (SPARK-36861) Partition columns are overly eagerly parsed as dates
Tanel Kiis created SPARK-36861: -- Summary: Partition columns are overly eagerly parsed as dates Key: SPARK-36861 URL: https://issues.apache.org/jira/browse/SPARK-36861 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Tanel Kiis I have an input directory with subdirs: * hour=2021-01-01T00 * hour=2021-01-01T01 * hour=2021-01-01T02 * ... In Spark 3.1 the 'hour' column is parsed as a string type, but in the 3.2 RC it is parsed as a date type and the hour part is lost.
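The reported behaviour can be illustrated with a plain-Python sketch (hypothetical, not Spark's actual inference code): a lenient string-to-date cast that tolerates a trailing `T...` segment will accept `2021-01-01T00` as a date, so the hour suffix is silently dropped and every hourly partition collapses to the same value.

```python
from datetime import date

def infer_partition_value(raw: str):
    """Hypothetical sketch of eager partition-type inference (not Spark's
    actual code): try to read the value as a date, tolerating a trailing
    'T...' segment; fall back to keeping it as a string."""
    head = raw.split("T", 1)[0]            # "2021-01-01T00" -> "2021-01-01"
    try:
        return date.fromisoformat(head)    # parses as a date: hour is lost
    except ValueError:
        return raw                         # not date-like: stays a string

print(infer_partition_value("2021-01-01T00"))   # 2021-01-01
print(infer_partition_value("2021-01-01T01"))   # 2021-01-01 (same value!)
```

This is why manually specifying the schema (declaring 'hour' as a string) sidesteps the problem: the inference step above is never applied.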
[jira] [Commented] (SPARK-36711) Support multi-index in new syntax
[ https://issues.apache.org/jira/browse/SPARK-36711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420522#comment-17420522 ] Apache Spark commented on SPARK-36711: -- User 'thangnd197' has created a pull request for this issue: https://github.com/apache/spark/pull/34112 > Support multi-index in new syntax > - > > Key: SPARK-36711 > URL: https://issues.apache.org/jira/browse/SPARK-36711 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > Support multi-index in the new syntax SPARK-36709
[jira] [Commented] (SPARK-36711) Support multi-index in new syntax
[ https://issues.apache.org/jira/browse/SPARK-36711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420519#comment-17420519 ] Apache Spark commented on SPARK-36711: -- User 'thangnd197' has created a pull request for this issue: https://github.com/apache/spark/pull/34112
[jira] [Updated] (SPARK-36856) Building by "./build/mvn" may be stuck on MacOS
[ https://issues.apache.org/jira/browse/SPARK-36856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] copperybean updated SPARK-36856: Issue Type: Bug (was: Improvement) > Building by "./build/mvn" may be stuck on MacOS > --- > > Key: SPARK-36856 > URL: https://issues.apache.org/jira/browse/SPARK-36856 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0, 3.3.0 > Environment: MacOS 11.4 >Reporter: copperybean >Priority: Major > > The command "./build/mvn" gets stuck on my macOS 11.4 because it uses the wrong Java home. On my Mac, "/usr/bin/java" is a real file instead of a symbolic link, so the Java home is set to the path "/usr", and the launched Maven process gets stuck with this wrong Java home.
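The failure mode described above can be reproduced in a few lines. This is a hypothetical re-derivation, not the exact logic in build/mvn: wrapper scripts commonly locate the java binary and strip a trailing "bin/java" to guess JAVA_HOME, which yields "/usr" when /usr/bin/java is a real file rather than a symlink into a JDK.

```python
import os.path

def derive_java_home(java_bin: str) -> str:
    """Guess JAVA_HOME by stripping a trailing 'bin/java' from the java
    binary's path (hypothetical sketch, not the exact build/mvn code)."""
    return os.path.dirname(os.path.dirname(java_bin))

# When /usr/bin/java is a real file, the derived home is "/usr",
# which is not a valid JAVA_HOME:
print(derive_java_home("/usr/bin/java"))        # /usr
# When the binary lives (or resolves via symlink) inside a real JDK
# (an illustrative path), the result is sensible:
print(derive_java_home("/opt/jdk-11/bin/java"))  # /opt/jdk-11
```

Resolving the symlink first (e.g. with `os.path.realpath`) before stripping the suffix is the usual fix for this class of bug.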
[jira] [Updated] (SPARK-36860) Create the external hive table for HBase failed
[ https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wineternity updated SPARK-36860: Attachment: image-2021-09-27-14-25-28-900.png > Create the external hive table for HBase failed > > > Key: SPARK-36860 > URL: https://issues.apache.org/jira/browse/SPARK-36860 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: wineternity >Priority: Major > Attachments: image-2021-09-27-14-18-10-910.png, > image-2021-09-27-14-25-28-900.png > > > We use the following SQL to create a Hive external table, which reads from HBase: > {code:java} > CREATE EXTERNAL TABLE if not exists dev.sanyu_spotlight_headline_material( >rowkey string COMMENT 'HBase主键', >content string COMMENT '图文正文') > USING HIVE > ROW FORMAT SERDE >'org.apache.hadoop.hive.hbase.HBaseSerDe' > STORED BY >'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ( >'hbase.columns.mapping'=':key, cf1:content' > ) > TBLPROPERTIES ( >'hbase.table.name'='spotlight_headline_material' > ); > {code} > But the SQL fails in Spark 3.1.2, which throws this exception: > {code:java} > 21/09/27 11:44:24 INFO scheduler.DAGScheduler: Asked to cancel job group > 26d7459f-7b58-4c18-9939-5f2737525ff2 > 21/09/27 11:44:24 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query with 26d7459f-7b58-4c18-9939-5f2737525ff2, currentState > RUNNING, > org.apache.spark.sql.catalyst.parser.ParseException: > Operation not allowed: Unexpected combination of ROW FORMAT SERDE > 'org.apache.hadoop.hive.hbase.HBaseSerDe' and STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITHSERDEPROPERTIES('hbase.columns.mapping'=':key, > cf1:content')(line 5, pos 0) > {code} > This check was introduced by this change: > [https://github.com/apache/spark/pull/28026] > > Could anyone give instructions on how to create an external table for HBase > in Spark 3 now?
[jira] [Commented] (SPARK-36860) Create the external hive table for HBase failed
[ https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420497#comment-17420497 ] wineternity commented on SPARK-36860: - Thanks, [~sarutak]. May I ask why Spark doesn't support creating a Hive table using storage handlers? It seems Spark parses the STORED BY syntax, and the data is already in CreateFileFormatContext. !image-2021-09-27-14-18-10-910.png! But validateRowFormatFileFormat in AstBuilder later only checks the file format provided by the STORED AS syntax, and since the file format is null here, it throws an exception. Maybe the STORED BY clause could be supported by fixing this check? !image-2021-09-27-14-25-28-900.png!
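The mechanism the commenter describes can be sketched as follows. This is a hypothetical Python rendering of the check's shape, not Spark's actual Scala code in AstBuilder, and the set of allowed formats is an assumption: a validator that only recognizes STORED AS file formats sees no file format at all for a STORED BY handler, so the combination with ROW FORMAT SERDE is rejected.

```python
def validate_row_format_file_format(row_format_serde, stored_as_format):
    """Hypothetical sketch (not Spark's actual AstBuilder code) of a
    validator that only knows STORED AS file formats. A STORED BY storage
    handler leaves stored_as_format as None, so the ROW FORMAT SERDE
    combination is always rejected, even though it is valid in Hive."""
    allowed_with_serde = {"TEXTFILE", "SEQUENCEFILE", "RCFILE"}  # assumed subset
    if row_format_serde and stored_as_format not in allowed_with_serde:
        raise ValueError(
            "Operation not allowed: unexpected combination of "
            "ROW FORMAT SERDE and file format %r" % (stored_as_format,))
    return True

# STORED AS with a known format passes the check:
assert validate_row_format_file_format("HBaseSerDe", "SEQUENCEFILE")
```

Under this reading, teaching the check about the STORED BY case (rather than treating a missing file format as invalid) is what the commenter is proposing.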
[jira] [Updated] (SPARK-36860) Create the external hive table for HBase failed
[ https://issues.apache.org/jira/browse/SPARK-36860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wineternity updated SPARK-36860: Attachment: image-2021-09-27-14-18-10-910.png