[jira] [Commented] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258736#comment-17258736 ]

Apache Spark commented on SPARK-34007:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31031

> Downgrade scala-maven-plugin to 4.3.0
> -------------------------------------
>
>                 Key: SPARK-34007
>                 URL: https://issues.apache.org/jira/browse/SPARK-34007
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the docker
> release script fails as below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> ...
> [ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
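[Editor's note] The SecurityException in the stack trace above comes from the JVM's per-package signed-JAR consistency check: once a package contains classes from a signed jar, defining another class in that package with different signer certificates is rejected by `java.lang.ClassLoader.checkCerts`. The toy model below sketches that rule only for illustration; the class and method names are hypothetical and do not reflect the real JDK internals.

```java
import java.util.*;

// Toy model of the JVM's per-package signer consistency check, the rule
// behind "signer information does not match signer information of other
// classes in the same package". Hypothetical names for illustration.
class SignerCheck {
    private final Map<String, Set<String>> packageSigners = new HashMap<>();

    // Remember the signers of the first class seen in each package and
    // reject later classes whose signers differ.
    void defineClass(String packageName, Set<String> signers) {
        Set<String> existing = packageSigners.putIfAbsent(packageName, signers);
        if (existing != null && !existing.equals(signers)) {
            throw new SecurityException(
                "class signer information does not match signer information "
                + "of other classes in the same package: " + packageName);
        }
    }

    public static void main(String[] args) {
        SignerCheck loader = new SignerCheck();
        // First jar providing javax.servlet classes, signed by one party.
        loader.defineClass("javax.servlet", Set.of("signerA"));
        try {
            // Second jar providing the same package with a different signer,
            // analogous to two servlet-api jars on the compile classpath.
            loader.defineClass("javax.servlet", Set.of("signerB"));
        } catch (SecurityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the report above, the upgrade to scala-maven-plugin 4.4.0 presumably changed which servlet-api jars end up on the same compile classpath, triggering exactly this check.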
[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34007:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34007:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258734#comment-17258734 ]

Wenchen Fan commented on SPARK-33948:
-------------------------------------

SPARK-33619 improved the codegen test coverage of Spark expression tests; this might be the reason for these test failures.

> branch-3.1 jenkins test failed in Scala 2.13
> --------------------------------------------
>
>                 Key: SPARK-33948
>                 URL: https://issues.apache.org/jira/browse/SPARK-33948
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.0
>            Reporter: Yang Jie
>            Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink
>
> * [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
> * [org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]
[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document
[ https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258732#comment-17258732 ]

Apache Spark commented on SPARK-34006:
--------------------------------------

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34006
>                 URL: https://issues.apache.org/jira/browse/SPARK-34006
>             Project: Spark
>          Issue Type: Bug
>          Components: docs
>    Affects Versions: 3.0.1
>            Reporter: hao
>            Priority: Major
>
> This parameter can resolve the failure when an ORC-format table is read and written by the same INSERT OVERWRITE; it should be stated in the documentation.
[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document
[ https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34006:
------------------------------------

    Assignee: Apache Spark
[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document
[ https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258733#comment-17258733 ]

Apache Spark commented on SPARK-34006:
--------------------------------------

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030
[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document
[ https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34006:
------------------------------------

    Assignee: (was: Apache Spark)
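[Editor's note] For context, `spark.sql.hive.convertMetastoreOrc` is an existing Spark SQL configuration that makes Spark use its native ORC reader/writer for Hive metastore ORC tables instead of the Hive SerDe path. A minimal sketch of enabling it (whether it resolves the reporter's exact scenario is the claim under discussion, not something confirmed here):

```
# spark-defaults.conf (config sketch)
# Use Spark's built-in ORC support for Hive metastore ORC tables.
spark.sql.hive.convertMetastoreOrc  true
```

The same setting can also be toggled per session with `SET spark.sql.hive.convertMetastoreOrc=true` in Spark SQL.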
[jira] [Updated] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
[ https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-34007:
---------------------------------
    Target Version/s: 3.1.0
[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema
[ https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-33980:
---------------------------------
    Fix Version/s:     (was: 3.1.1)
                       3.1.0

> invalidate char/varchar in spark.readStream.schema
> --------------------------------------------------
>
>                 Key: SPARK-33980
>                 URL: https://issues.apache.org/jira/browse/SPARK-33980
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 3.1.0
>
> Invalidate char/varchar in spark.readStream.schema, just like what we do for spark.read.schema.
[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.
[ https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34005:
------------------------------------

    Assignee: Kousuke Saruta  (was: Apache Spark)

> Update peak memory metrics for each Executor on task end.
> ---------------------------------------------------------
>
>                 Key: SPARK-34005
>                 URL: https://issues.apache.org/jira/browse/SPARK-34005
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Kousuke Saruta
>            Assignee: Kousuke Saruta
>            Priority: Minor
>
> Like other peak memory metrics (e.g., stages, executors in a stage), it's better to update the peak memory metrics for each Executor.
[jira] [Commented] (SPARK-34005) Update peak memory metrics for each Executor on task end.
[ https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258727#comment-17258727 ]

Apache Spark commented on SPARK-34005:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31029
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258726#comment-17258726 ]

L. C. Hsieh commented on SPARK-33833:
-------------------------------------

Yea, but this can be easily overcome here. We just need a user-provided group ID for committing offsets. Since users must specify it when they want to commit offsets and track progress, it would be used with caution. Even with a static group ID given by users, I do not think that is really a reason to reject the offset-committing idea: once users decide to commit offsets and track progress, they should be cautious about the risk. Anyway, this seems not to be the reason the previous PR was closed.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> ----------------------------------------------------------------
>
>                 Key: SPARK-33833
>                 URL: https://issues.apache.org/jira/browse/SPARK-33833
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.1
>            Reporter: Sam Davarnia
>            Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, it is not possible to track total Kafka lag using Burrow, as is possible with DStreams.
> We have used stream hooks as mentioned [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37].
> It would be great if Spark supported this feature out of the box.
[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.
[ https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34005:
------------------------------------

    Assignee: Apache Spark  (was: Kousuke Saruta)
[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-33992:
---------------------------------
    Fix Version/s:     (was: 3.1.1)
                       3.1.0

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-33992
>                 URL: https://issues.apache.org/jira/browse/SPARK-33992
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Minor
>             Fix For: 3.1.0
>
> PaddingAndLengthCheckForCharVarchar could fail a query when resolveOperatorsUpWithNewOutput is invoked, with:
> {code:java}
> [info] - char/varchar resolution in sub query *** FAILED *** (367 milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the analyzer
> [info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}
[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-33894:
---------------------------------
    Fix Version/s:     (was: 3.1.1)
                       3.1.0

> Word2VecSuite failed for Scala 2.13
> -----------------------------------
>
>                 Key: SPARK-33894
>                 URL: https://issues.apache.org/jira/browse/SPARK-33894
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 3.2.0
>            Reporter: Darcy Shen
>            Assignee: koert kuipers
>            Priority: Major
>             Fix For: 3.1.0
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
>
> h2. Possible Work Around Fix
> Move
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
>
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
>
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2Ve
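[Editor's note] The suggested workaround above, moving `case class Data(word: String, vector: Array[Float])` out of the class `Word2VecModel`, matters because a type nested inside a class instance carries a hidden reference to its enclosing object, which interferes with serialization. The hedged Java analogue below (all class names hypothetical, chosen to mirror the Word2VecModel/Data situation) shows the same failure mode and the fix.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A non-static inner class holds an implicit reference to its enclosing
// instance, so serializing it drags the outer object along and fails when
// the outer class is not serializable. A static nested (or top-level)
// type carries no such reference, which is the essence of moving Data
// out of Word2VecModel.
class Model { // deliberately NOT Serializable, like a heavyweight model class
    class InnerData implements Serializable { // captures Model.this
        final String word = "spark";
    }

    static class NestedData implements Serializable { // self-contained
        final String word = "spark";
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Model m = new Model();
        try {
            serialize(m.new InnerData());
            System.out.println("inner: serialized");
        } catch (NotSerializableException e) {
            // The hidden Model.this field is not serializable.
            System.out.println("inner: NotSerializableException");
        }
        serialize(new NestedData());
        System.out.println("nested: serialized");
    }
}
```

Spark's Dataset encoders do their own reflection rather than plain Java serialization, so this is an analogy for the shape of the problem, not a reproduction of the Scala 2.13 failure itself.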
[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34000: - Fix Version/s: (was: 3.1.1) 3.1.0 > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
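The stack trace above points at `mutable.HashMap.apply` inside `ExecutorAllocationListener.onTaskEnd`: a task-end event arrives for a stage attempt that has already been removed from the listener's map. A minimal sketch of that race (hypothetical Python model, not Spark's actual code; all names are illustrative):

```python
# Hypothetical model of the race (NOT Spark's actual code): the listener keeps
# a map keyed by (stage, attempt); a late onTaskEnd for a stage that was
# already cleaned up hits HashMap.apply and throws NoSuchElementException.
class ExecutorAllocationListenerSketch:
    def __init__(self):
        # Analogue of a per-stage-attempt running-task counter map.
        self.running_tasks = {}

    def on_stage_submitted(self, stage, attempt):
        self.running_tasks[(stage, attempt)] = 0

    def on_stage_completed(self, stage, attempt):
        # Removal here races with task-end events still in the event queue.
        self.running_tasks.pop((stage, attempt), None)

    def on_task_end_unsafe(self, stage, attempt):
        # dict[...] mirrors Scala's mutable.HashMap.apply: it raises when the
        # key is gone, which is the failure reported in this ticket.
        self.running_tasks[(stage, attempt)] -= 1

    def on_task_end_safe(self, stage, attempt):
        # Defensive variant: ignore events for stages no longer tracked.
        if (stage, attempt) in self.running_tasks:
            self.running_tasks[(stage, attempt)] -= 1
```

The defensive variant mirrors the usual fix for this class of listener bug: guard the lookup instead of assuming the key is still present.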
[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache
[ https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258725#comment-17258725 ] Hyukjin Kwon commented on SPARK-33950: -- I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue, SPARK-34007. I am correcting the fix version to 3.1.0 > ALTER TABLE .. DROP PARTITION doesn't refresh cache > --- > > Key: SPARK-33950 > URL: https://issues.apache.org/jira/browse/SPARK-33950 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Labels: correctness > Fix For: 3.0.2, 3.1.0, 3.2.0 > > > Here is the example to reproduce the issue: > {code:sql} > spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED > BY (part0); > spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0; > spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1; > spark-sql> CACHE TABLE tbl1; > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0); > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > {code}
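The repro above shows the cached result still returning the dropped partition's row. As a toy model (not Spark code; names are illustrative), the bug amounts to a DDL path that mutates partitions without invalidating a cached snapshot:

```python
# Toy model of the bug (NOT Spark code; names are illustrative): a cached
# snapshot of a partitioned table that DROP PARTITION forgets to invalidate.
class PartitionedTableSketch:
    def __init__(self):
        self.partitions = {}   # partition value -> list of rows
        self.cached = None     # cached snapshot of all rows, or None

    def insert(self, part, rows):
        self.partitions.setdefault(part, []).extend(rows)

    def cache_table(self):
        # CACHE TABLE analogue: snapshot every partition's rows.
        self.cached = [r for rows in self.partitions.values() for r in rows]

    def drop_partition(self, part, refresh_cache):
        self.partitions.pop(part, None)
        if refresh_cache:
            # The fix: refresh (or invalidate) the cache after the DDL.
            self.cache_table()

    def select_all(self):
        # Reads are served from the cache when one exists.
        if self.cached is not None:
            return self.cached
        return [r for rows in self.partitions.values() for r in rows]
```

Until a fixed build is in use, re-running `CACHE TABLE tbl1` after the DDL is presumably the manual workaround; that is what the `refresh_cache=True` branch of this sketch corresponds to.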
[jira] [Comment Edited] (SPARK-33980) invalidate char/varchar in spark.readStream.schema
[ https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258724#comment-17258724 ] Hyukjin Kwon edited comment on SPARK-33980 at 1/5/21, 7:43 AM: --- I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue, SPARK-34007. I am correcting the fix version to 3.1.0 was (Author: hyukjin.kwon): I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue, SPARK-34007. > invalidate char/varchar in spark.readStream.schema > -- > > Key: SPARK-33980 > URL: https://issues.apache.org/jira/browse/SPARK-33980 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.1 > > > invalidate char/varchar in spark.readStream.schema just like what we do for > spark.read.schema
[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache
[ https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33950: - Fix Version/s: (was: 3.1.1) 3.1.0 > ALTER TABLE .. DROP PARTITION doesn't refresh cache > --- > > Key: SPARK-33950 > URL: https://issues.apache.org/jira/browse/SPARK-33950 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Labels: correctness > Fix For: 3.0.2, 3.1.0, 3.2.0 > > > Here is the example to reproduce the issue: > {code:sql} > spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED > BY (part0); > spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0; > spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1; > spark-sql> CACHE TABLE tbl1; > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0); > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > {code}
[jira] [Commented] (SPARK-33980) invalidate char/varchar in spark.readStream.schema
[ https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258724#comment-17258724 ] Hyukjin Kwon commented on SPARK-33980: -- I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue, SPARK-34007. > invalidate char/varchar in spark.readStream.schema > -- > > Key: SPARK-33980 > URL: https://issues.apache.org/jira/browse/SPARK-33980 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.1 > > > invalidate char/varchar in spark.readStream.schema just like what we do for > spark.read.schema
[jira] [Created] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0
Hyukjin Kwon created SPARK-34007: Summary: Downgrade scala-maven-plugin to 4.3.0 Key: SPARK-34007 URL: https://issues.apache.org/jira/browse/SPARK-34007 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.1.0 Reporter: Hyukjin Kwon After we upgraded scala-maven-plugin to 4.4.0 at SPARK-33512, the docker release script fails as below: {code} [INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes ... [ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package java.lang.ClassLoader.checkCerts(ClassLoader.java:891) java.lang.ClassLoader.preDefineClass(ClassLoader.java:661) java.lang.ClassLoader.defineClass(ClassLoader.java:754) java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) java.net.URLClassLoader.defineClass(URLClassLoader.java:468) java.net.URLClassLoader.access$100(URLClassLoader.java:74) java.net.URLClassLoader$1.run(URLClassLoader.java:369) java.net.URLClassLoader$1.run(URLClassLoader.java:363) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:362) java.lang.ClassLoader.loadClass(ClassLoader.java:418) java.lang.ClassLoader.loadClass(ClassLoader.java:351) java.lang.Class.getDeclaredMethods0(Native Method) java.lang.Class.privateGetDeclaredMethods(Class.java:2701) java.lang.Class.privateGetPublicMethods(Class.java:2902) java.lang.Class.getMethods(Class.java:1615) sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170) sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123) scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86) 
sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123) sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33) scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) {code}
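For reference, the downgrade amounts to pinning the plugin version in the build. A hypothetical `pom.xml` fragment of what that pin looks like (the authoritative change is in the pull request linked from this ticket):

```xml
<!-- Sketch only: pin scala-maven-plugin back to 4.3.0 in the parent POM. -->
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>4.3.0</version>
</plugin>
```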
[jira] [Created] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc] This parameter can solve the problem of INSERT OVERWRITE into an ORC-format table that is read in the same query; it should be stated in the documentation
hao created SPARK-34006: --- Summary: [spark.sql.hive.convertMetastoreOrc] This parameter can solve the problem of INSERT OVERWRITE into an ORC-format table that is read in the same query; it should be stated in the documentation Key: SPARK-34006 URL: https://issues.apache.org/jira/browse/SPARK-34006 Project: Spark Issue Type: Bug Components: docs Affects Versions: 3.0.1 Reporter: hao This parameter can solve the problem of INSERT OVERWRITE into an ORC-format table that is read in the same query; it should be stated in the documentation
[jira] [Updated] (SPARK-33989) Strip auto-generated cast when using Cast.sql
[ https://issues.apache.org/jira/browse/SPARK-33989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ulysses you updated SPARK-33989: Summary: Strip auto-generated cast when using Cast.sql (was: Strip auto-generated cast when resolving UnresolvedAlias) > Strip auto-generated cast when using Cast.sql > - > > Key: SPARK-33989 > URL: https://issues.apache.org/jira/browse/SPARK-33989 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: ulysses you >Priority: Minor > > During analysis we may implicitly introduce a Cast when a type cast is needed. > That makes the assigned name unclear. > Say we have the SQL `select id == null`, where id is of int type; the > output field name will then be `(id = CAST(null as int))`.
[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
[ https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34003: Assignee: (was: Apache Spark) > Rule conflicts between PaddingAndLengthCheckForCharVarchar and > ResolveAggregateFunctions > > > Key: SPARK-34003 > URL: https://issues.apache.org/jira/browse/SPARK-34003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Critical > > ResolveAggregateFunctions is a hacky rule and it calls `executeSameContext` > to generate a `resolved agg` to determine which unresolved sort attribute > should be pushed into the agg. However, after we add the > PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, > thus, the `resolved agg` cannot match original attributes anymore. > It causes some dissociative sort attribute to be pushed in and fails the query > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > expression 'testcat.t1.`v`' is neither present in the group by, nor is it an > aggregate function. 
Add to group by or wrap in first() (or first_value) if > you don't care which value you get.; > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > [info] > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
[ https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34003: Assignee: Apache Spark > Rule conflicts between PaddingAndLengthCheckForCharVarchar and > ResolveAggregateFunctions > > > Key: SPARK-34003 > URL: https://issues.apache.org/jira/browse/SPARK-34003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Critical > > ResolveAggregateFunctions is a hacky rule and it calls `executeSameContext` > to generate a `resolved agg` to determine which unresolved sort attribute > should be pushed into the agg. However, after we add the > PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, > thus, the `resolved agg` cannot match original attributes anymore. > It causes some dissociative sort attribute to be pushed in and fails the query > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > expression 'testcat.t1.`v`' is neither present in the group by, nor is it an > aggregate function. 
Add to group by or wrap in first() (or first_value) if > you don't care which value you get.; > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > [info] > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction to a sealed abstract class
[ https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34004: Assignee: Apache Spark > Change FrameLessOffsetWindowFunction to a sealed abstract class > - > > Key: SPARK-34004 > URL: https://issues.apache.org/jira/browse/SPARK-34004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Change FrameLessOffsetWindowFunction to a sealed abstract class to > simplify pattern matching.
[jira] [Commented] (SPARK-34004) Change FrameLessOffsetWindowFunction to a sealed abstract class
[ https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258722#comment-17258722 ] Apache Spark commented on SPARK-34004: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/31026 > Change FrameLessOffsetWindowFunction to a sealed abstract class > - > > Key: SPARK-34004 > URL: https://issues.apache.org/jira/browse/SPARK-34004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Change FrameLessOffsetWindowFunction to a sealed abstract class to > simplify pattern matching.
[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction to a sealed abstract class
[ https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34004: Assignee: (was: Apache Spark) > Change FrameLessOffsetWindowFunction to a sealed abstract class > - > > Key: SPARK-34004 > URL: https://issues.apache.org/jira/browse/SPARK-34004 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Change FrameLessOffsetWindowFunction to a sealed abstract class to > simplify pattern matching.
[jira] [Commented] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
[ https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258723#comment-17258723 ] Apache Spark commented on SPARK-34003: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/31027 > Rule conflicts between PaddingAndLengthCheckForCharVarchar and > ResolveAggregateFunctions > > > Key: SPARK-34003 > URL: https://issues.apache.org/jira/browse/SPARK-34003 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Priority: Critical > > ResolveAggregateFunctions is a hacky rule and it calls `executeSameContext` > to generate a `resolved agg` to determine which unresolved sort attribute > should be pushed into the agg. However, after we add the > PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, > thus, the `resolved agg` cannot match original attributes anymore. > It causes some dissociative sort attribute to be pushed in and fails the query > {code:java} > [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: > expression 'testcat.t1.`v`' is neither present in the group by, nor is it an > aggregate function. 
Add to group by or wrap in first() (or first_value) if > you don't care which value you get.; > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > [info] > [info] Project [v#14, sum(i)#11L] > [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true > [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS > sum(i)#11L, v#13 AS aggOrder#12] > [info] +- SubqueryAlias testcat.t1 > [info]+- Project [if ((length(v#6) <= 3)) v#6 else if > ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of > length , cast(length(v#6) as string), exceeds varchar type length > limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] > [info] +- RelationV2[v#6, i#7, index#15, _partition#16] > testcat.t1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32017) Make Pyspark Hadoop 3.2+ Variant available in PyPI
[ https://issues.apache.org/jira/browse/SPARK-32017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258721#comment-17258721 ] Apache Spark commented on SPARK-32017: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31028 > Make Pyspark Hadoop 3.2+ Variant available in PyPI > -- > > Key: SPARK-32017 > URL: https://issues.apache.org/jira/browse/SPARK-32017 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: George Pongracz >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > The version of PySpark 3.0.0 currently available in PyPI uses > Hadoop 2.7.4. > Could a variant (or the default) have its version of Hadoop aligned to 3.2.0, > as per the downloadable Spark binaries? > This would enable the PyPI version to be compatible with session token > authorisations and assist in accessing data residing in object stores with > stronger encryption methods. > If not PyPI, then as a tar file in the Apache download archives at the least, > please.
[jira] [Created] (SPARK-34005) Update peak memory metrics for each Executor on task end.
Kousuke Saruta created SPARK-34005: -- Summary: Update peak memory metrics for each Executor on task end. Key: SPARK-34005 URL: https://issues.apache.org/jira/browse/SPARK-34005 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.0, 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Like other peak memory metrics (e.g., stage, executors in a stage), it's better to update the peak memory metrics for each Executor.
[jira] [Updated] (SPARK-34005) Update peak memory metrics for each Executor on task end.
[ https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-34005: --- Issue Type: Improvement (was: Bug) > Update peak memory metrics for each Executor on task end. > - > > Key: SPARK-34005 > URL: https://issues.apache.org/jira/browse/SPARK-34005 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > Like other peak memory metrics (e.g., stage, executors in a stage), it's > better to update the peak memory metrics for each Executor.
[jira] [Resolved] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33919. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30937 [https://github.com/apache/spark/pull/30937] > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs.
[jira] [Assigned] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33919: --- Assignee: Maxim Gekk > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs.
[jira] [Commented] (SPARK-33995) Make datetime addition easier for years, weeks, hours, minutes, and seconds
[ https://issues.apache.org/jira/browse/SPARK-33995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258718#comment-17258718 ] Maxim Gekk commented on SPARK-33995: > Option 1: Single make_interval function that takes 7 arguments Small clarification: make_interval could have default values for all 7 arguments, as Postgres does, see [https://www.postgresql.org/docs/9.4/functions-datetime.html] > As a user, Option 3 would be my preference. >col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember >and type. I like this approach too. > Make datetime addition easier for years, weeks, hours, minutes, and seconds > --- > > Key: SPARK-33995 > URL: https://issues.apache.org/jira/browse/SPARK-33995 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Matthew Powers >Priority: Minor > > There are add_months and date_add functions that make it easy to perform > datetime addition with months and days, but there isn't an easy way to > perform datetime addition with years, weeks, hours, minutes, or seconds with > the Scala/Python/R APIs. > Users need to write code like expr("first_datetime + INTERVAL 2 hours") to > add two hours to a timestamp with the Scala API, which isn't desirable. We > don't want to make Scala users manipulate SQL strings. > We can expose the [make_interval SQL > function|https://github.com/apache/spark/pull/26446/files] to make any > combination of datetime addition possible. That'll make tons of different > datetime addition operations possible and will be valuable for a wide array > of users. > make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and > secs.
> There are different ways to expose the make_interval functionality to > Scala/Python/R users: > * Option 1: Single make_interval function that takes 7 arguments > * Option 2: expose a few interval functions > ** make_date_interval function that takes years, months, days > ** make_time_interval function that takes hours, minutes, seconds > ** make_datetime_interval function that takes years, months, days, hours, > minutes, seconds > * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, > add_minutes, and add_seconds as Column methods. > * Option 4: Expose the add_years, add_hours, etc. as column functions. > add_weeks and date_add have already been exposed in this manner. > Option 1 is nice from a maintenance perspective because it's a single function, > but it's not standard from a user perspective. Most languages support > datetime instantiation with these arguments: years, months, days, hours, > minutes, seconds. Mixing weeks into the equation is not standard. > As a user, Option 3 would be my preference. > col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember > and type. col("first_datetime") + make_time_interval(lit(2), lit(0), > lit(30)) isn't as nice. col("first_datetime") + make_interval(lit(0), > lit(0), lit(0), lit(0), lit(2), lit(0), lit(30)) is harder still. > Any of these options is an improvement to the status quo. Let me know what > option you think is best and then I'll make a PR to implement it, building > off of Max's foundational work of course ;)
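To make Option 1's ergonomics concrete, here is a hedged sketch (a hypothetical helper named `make_interval_sql`, not a Spark API): with Postgres-style defaulted arguments, callers pass only the units they need. The helper merely assembles the INTERVAL SQL string so the idea can be shown without a running Spark session; a real implementation would return a Column.

```python
# Hypothetical sketch of Option 1 with defaulted arguments (not a Spark API):
# builds the INTERVAL SQL string that expr(...) would consume.
def make_interval_sql(years=0, months=0, weeks=0, days=0,
                      hours=0, mins=0, secs=0):
    units = [(years, "YEARS"), (months, "MONTHS"), (weeks, "WEEKS"),
             (days, "DAYS"), (hours, "HOURS"), (mins, "MINUTES"),
             (secs, "SECONDS")]
    # Zero-valued units are omitted, so only the needed ones appear.
    parts = [f"{n} {u}" for n, u in units if n]
    return "INTERVAL " + " ".join(parts) if parts else "INTERVAL 0 SECONDS"

# Usage sketch (assumes a SparkSession and pyspark.sql.functions.expr):
# df.withColumn("later",
#     expr(f"first_datetime + {make_interval_sql(hours=2, secs=30)}"))
```

With defaults, the 7-argument call from the ticket shrinks to keyword arguments only, which narrows the ergonomic gap between Option 1 and Option 3.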
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258717#comment-17258717 ] Jungtaek Lim commented on SPARK-33833: -- That's possible, but it requires serious caution. Spark has to have full control of offset management, and it shouldn't be touched from outside in any way. Creating a unique group ID is a defensive approach here, preventing end users from messing things up by accident. Once end users set a static group ID, that guard is no longer valid. > Allow Spark Structured Streaming report Kafka Lag through Burrow > > > Key: SPARK-33833 > URL: https://issues.apache.org/jira/browse/SPARK-33833 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.1 >Reporter: Sam Davarnia >Priority: Major > > Because Structured Streaming tracks Kafka offset consumption by itself, > it is not possible to track total Kafka lag using Burrow as with DStreams. > We have used stream hooks as mentioned > [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37] > > It would be great if Spark supported this feature out of the box.
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258716#comment-17258716 ] L. C. Hsieh commented on SPARK-33833: - I read through the comments in the previous PR. The approach is pretty similar to what I did locally. So I guess that, if nothing changes, it won't be accepted into the Spark codebase either.
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258707#comment-17258707 ] L. C. Hsieh commented on SPARK-33833: - Btw, thanks for providing the useful link to the previous ticket/PR. [~kabhwan]
[jira] [Created] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class
jiaan.geng created SPARK-34004: -- Summary: Change FrameLessOffsetWindowFunction as sealed abstract class Key: SPARK-34004 URL: https://issues.apache.org/jira/browse/SPARK-34004 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: jiaan.geng Change FrameLessOffsetWindowFunction to a sealed abstract class to simplify pattern matching.
[jira] [Resolved] (SPARK-33935) Fix CBOs cost function
[ https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-33935. -- Fix Version/s: 3.2.0 3.1.0 Assignee: Tanel Kiis Resolution: Fixed Resolved by https://github.com/apache/spark/pull/30965 > Fix CBOs cost function > --- > > Key: SPARK-33935 > URL: https://issues.apache.org/jira/browse/SPARK-33935 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Major > Fix For: 3.1.0, 3.2.0 > > > The parameter spark.sql.cbo.joinReorder.card.weight is documented as: > {code:title=spark.sql.cbo.joinReorder.card.weight} > The weight of cardinality (number of rows) for plan cost comparison in join > reorder: rows * weight + size * (1 - weight). > {code} > But in the implementation the formula is a bit different: > {code:title=Current implementation} > def betterThan(other: JoinPlan, conf: SQLConf): Boolean = { > if (other.planCost.card == 0 || other.planCost.size == 0) { > false > } else { > val relativeRows = BigDecimal(this.planCost.card) / > BigDecimal(other.planCost.card) > val relativeSize = BigDecimal(this.planCost.size) / > BigDecimal(other.planCost.size) > relativeRows * conf.joinReorderCardWeight + > relativeSize * (1 - conf.joinReorderCardWeight) < 1 > } > } > {code} > This change has an unfortunate consequence: > given two plans A and B, both A betterThan B and B betterThan A can return > false. This happens when one plan has many rows with small sizes and the > other has few rows with large sizes. > Example values that exhibit this phenomenon with the default weight value (0.7): > A.card = 500, B.card = 300 > A.size = 30, B.size = 80 > Both A betterThan B and B betterThan A would have a score above 1 and would > return false. 
> A new implementation is proposed that matches the documentation: > {code:title=Proposed implementation} > def betterThan(other: JoinPlan, conf: SQLConf): Boolean = { > val oldCost = BigDecimal(this.planCost.card) * > conf.joinReorderCardWeight + > BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight) > val newCost = BigDecimal(other.planCost.card) * > conf.joinReorderCardWeight + > BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight) > newCost < oldCost > } > {code}
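To make the asymmetry concrete, the reported values can be plugged into the current formula in plain Scala (a standalone numeric check, not Spark code):

```scala
// Reproduce the reported anomaly with the current betterThan formula:
// score = relativeRows * weight + relativeSize * (1 - weight); "better" iff score < 1.
val weight = BigDecimal("0.7")

def score(card: Int, size: Int, otherCard: Int, otherSize: Int): BigDecimal =
  (BigDecimal(card) / BigDecimal(otherCard)) * weight +
    (BigDecimal(size) / BigDecimal(otherSize)) * (1 - weight)

// A.card = 500, A.size = 30; B.card = 300, B.size = 80
val aBetterThanB = score(500, 30, 300, 80) < 1 // score ~ 1.167 + 0.113 = 1.279
val bBetterThanA = score(300, 80, 500, 30) < 1 // score = 0.42 + 0.8 = 1.22
println(s"A betterThan B: $aBetterThanB, B betterThan A: $bBetterThanA")
```

Both comparisons score above 1, so neither plan is ever preferred, which is exactly the inconsistency the proposed implementation removes.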
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258705#comment-17258705 ] L. C. Hsieh commented on SPARK-33833: - I think SS allows users to specify a custom group id, doesn't it?
[jira] [Created] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
Kent Yao created SPARK-34003: Summary: Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions Key: SPARK-34003 URL: https://issues.apache.org/jira/browse/SPARK-34003 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Kent Yao ResolveAggregateFunctions is a hacky rule and it calls `executeSameContext` to generate a `resolved agg` to determine which unresolved sort attribute should be pushed into the agg. However, after we add the PaddingAndLengthCheckForCharVarchar rule which will rewrite the query output, thus, the `resolved agg` cannot match original attributes anymore. It causes some dissociative sort attribute to be pushed in and fails the query {code:java} [info] Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'testcat.t1.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.; [info] Project [v#14, sum(i)#11L] [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12] [info] +- SubqueryAlias testcat.t1 [info]+- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string), exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] [info] +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1 [info] [info] Project [v#14, sum(i)#11L] [info] +- Sort [aggOrder#12 ASC NULLS FIRST], true [info] +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS sum(i)#11L, v#13 AS aggOrder#12] [info] +- SubqueryAlias testcat.t1 [info]+- Project [if ((length(v#6) <= 3)) v#6 else if ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#6) as string), exceeds varchar type length limitation: 3)) as string) else 
rpad(rtrim(v#6, None), 3, ) AS v#14, i#7] [info] +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1 {code}
[jira] [Resolved] (SPARK-33100) Support parse the sql statements with c-style comments
[ https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-33100. -- Fix Version/s: 3.2.0 3.1.0 Assignee: feiwang (was: Apache Spark) Resolution: Fixed Resolved by https://github.com/apache/spark/pull/29982 > Support parse the sql statements with c-style comments > -- > > Key: SPARK-33100 > URL: https://issues.apache.org/jira/browse/SPARK-33100 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: feiwang >Assignee: feiwang >Priority: Minor > Fix For: 3.1.0, 3.2.0 > > > Currently, spark-sql does not support parsing SQL statements with C-style > comments. > For the sql statements: > {code:java} > /* SELECT 'test'; */ > SELECT 'test'; > {code} > These would be split into two statements: > The first: "/* SELECT 'test'" > The second: "*/ SELECT 'test'" > Then it would throw an exception because the first one is illegal.
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258695#comment-17258695 ] Jungtaek Lim commented on SPARK-33833: -- For SS, the consumer group id is randomly generated by intention, which is the actual obstacle to leveraging the offset information with the Kafka ecosystem. SPARK-27549 was the attempt to address this, but it was unfortunately soft-rejected for inclusion in the Spark repository. Instead of pushing this further, I've just crafted the project on my repository - https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer
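For reference, the listener-based workaround discussed above boils down to committing each micro-batch's end offsets from a StreamingQueryListener. This is only an illustrative sketch of that idea, not the linked project's code; it assumes a single Kafka source, and parseEndOffsets is a hypothetical stub for parsing the JSON offsets carried by the progress event.

```scala
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Commits each micro-batch's end offsets back to Kafka so that tools reading
// __consumer_offsets (e.g. Burrow) can compute lag for a stable group id.
class OffsetCommitListener(consumer: KafkaConsumer[_, _]) extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // endOffset is a JSON string like {"topic":{"0":123,"1":456}}.
    val offsets: Map[TopicPartition, OffsetAndMetadata] =
      parseEndOffsets(event.progress.sources.head.endOffset)
    consumer.commitSync(offsets.asJava)
  }

  // Hypothetical helper: parse the JSON end offsets; body omitted here.
  private def parseEndOffsets(json: String): Map[TopicPartition, OffsetAndMetadata] = ???
}

// Registered via spark.streams.addListener(new OffsetCommitListener(consumer))
```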
[jira] [Comment Edited] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258692#comment-17258692 ] L. C. Hsieh edited comment on SPARK-33833 at 1/5/21, 6:27 AM: -- Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset progress back to Kafka? I added some code to commit offset progress to Kafka. After I checked the "__consumer_offsets" topic of Kafka, I found that whether Spark commits the progress to Kafka or not, the record of the consumer group of the Spark SS query is always in "__consumer_offsets". Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer group info from this "__consumer_offsets" topic. So whether Spark commits or not, there will be a record about the consumer group; does that mean Burrow still works without Spark committing offset progress to Kafka? If so, then Spark doesn't need any change for this ticket. was (Author: viirya): Hmm, I did a few test locally. Does Burrow work only if Spark commits offset progress back to Kafka? I added some code to commit offset progress to Kafka. After I checked "__consumer_offsets" topic of Kafka, I found that no matter Spark commits the progress to Kafka or not, the record of the consumer group of the Spark SS query is always in "__consumer_offsets". Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer groups info from this "__consumer_offsets" topic. So if either Spark commits or not, there will be a record about the consumer group, does it mean Burrow still works without Spark committing offset progress to Kafka? 
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258693#comment-17258693 ] L. C. Hsieh commented on SPARK-33833: - [~samdvr] Could you elaborate on the question above? Thanks.
[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow
[ https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258692#comment-17258692 ] L. C. Hsieh commented on SPARK-33833: - Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset progress back to Kafka? I added some code to commit offset progress to Kafka. After I checked the "__consumer_offsets" topic of Kafka, I found that whether Spark commits the progress to Kafka or not, the record of the consumer group of the Spark SS query is always in "__consumer_offsets". Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer group info from this "__consumer_offsets" topic. So whether Spark commits or not, there will be a record about the consumer group; does that mean Burrow still works without Spark committing offset progress to Kafka?
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-34002: -- Description: UDFs can behave differently depending on whether a dataframe is cached, despite the dataframe being identical Repro: {code:java} import org.apache.spark.sql.expressions.UserDefinedFunction import org.apache.spark.sql.functions.{col, udf} case class Bar(a: Int) import spark.implicits._ def f1(bar: Bar): Option[Bar] = { None } def f2(bar: Bar): Option[Bar] = { Option(bar) } val udf1: UserDefinedFunction = udf(f1 _) val udf2: UserDefinedFunction = udf(f2 _) // Commenting in the cache will make this example work val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache() val newDf = df .withColumn("c1", udf1(col("c0"))) .withColumn("c2", udf2(col("c1"))) newDf.show() {code} Error: Testing started at 12:58 AM ...
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-34002: -- Description: UDFs can behave differently depending on if a dataframe is cached, despite the dataframe being identical Repro: {code:java} import org.apache.spark.sql.expressions.UserDefinedFunction import org.apache.spark.sql.functions.{col, udf} case class Bar(a: Int) import spark.implicits._ def f1(bar: Bar): Option[Bar] = { None } def f2(bar: Bar): Option[Bar] = { Option(bar) } val udf1: UserDefinedFunction = udf(f1 _) val udf2: UserDefinedFunction = udf(f2 _) // Commenting in the cache will make this example work val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache() val newDf = df .withColumn("c1", udf1(col("c0"))) .withColumn("c2", udf2(col("c1"))) newDf.show() {code} Error: {code:java} Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties21/01/05 00:52:57 INFO SparkContext: Running Spark version 3.0.121/01/05 00:52:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable21/01/05 00:52:57 INFO ResourceUtils: ==21/01/05 00:52:57 INFO ResourceUtils: Resources for spark.driver: 21/01/05 00:52:57 INFO ResourceUtils: ==21/01/05 00:52:57 INFO SparkContext: Submitted application: JsonOutputParserSuite21/01/05 00:52:57 INFO SparkContext: Spark configuration:spark.app.name=JsonOutputParserSuitespark.driver.maxResultSize=6gspark.logConf=truespark.master=local[*]spark.sql.crossJoin.enabled=truespark.sql.shuffle.partitions=20spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse21/01/05 00:52:58 INFO SecurityManager: Changing view acls to: marhamil21/01/05 00:52:58 INFO SecurityManager: Changing modify acls to: marhamil21/01/05 00:52:58 INFO SecurityManager: Changing view acls groups to: 21/01/05 00:52:58 INFO SecurityManager: Changing modify acls groups to: 21/01/05 00:52:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(marhamil); groups with view permissions: Set(); users with modify permissions: Set(marhamil); groups with modify permissions: Set()21/01/05 00:52:58 INFO Utils: Successfully started service 'sparkDriver' on port 52315.21/01/05 00:52:58 INFO SparkEnv: Registering MapOutputTracker21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMaster21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMasterHeartbeat21/01/05 00:52:58 INFO DiskBlockManager: Created local directory at C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c29171921/01/05 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB21/01/05 00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator21/01/05 00:52:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.21/01/05 00:52:59 
INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://host.docker.internal:404021/01/05 00:52:59 INFO Executor: Starting executor ID driver on host host.docker.internal21/01/05 00:52:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52359.21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on host.docker.internal:5235921/01/05 00:52:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMasterEndpoint: Registering block manager host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:53:00 WARN SharedState: Not allowing to set spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's options, it should be set statically for cross-session usagesFailed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)org.apache.spark.SparkException: Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130) at org.apache.spark.sql.catalyst.expressions.Alias.eval(na
[jira] [Updated] (SPARK-34002) Broken UDF Encoding
[ https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-34002: -- Description: UDFs can behave differently depending on if a dataframe is cached, despite the dataframe being identical Repro: {code:java} case class Bar(a: Int) import spark.implicits._ def f1(bar: Bar): Option[Bar] = { None } def f2(bar: Bar): Option[Bar] = { Option(bar) } val udf1: UserDefinedFunction = udf(f1 _) val udf2: UserDefinedFunction = udf(f2 _) // Commenting in the cache will make this example work val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache() val newDf = df .withColumn("c1", udf1(col("c0"))) .withColumn("c2", udf2(col("c1"))) newDf.show() {code} Error: {code:java} Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties21/01/05 00:52:57 INFO SparkContext: Running Spark version 3.0.121/01/05 00:52:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable21/01/05 00:52:57 INFO ResourceUtils: ==21/01/05 00:52:57 INFO ResourceUtils: Resources for spark.driver: 21/01/05 00:52:57 INFO ResourceUtils: ==21/01/05 00:52:57 INFO SparkContext: Submitted application: JsonOutputParserSuite21/01/05 00:52:57 INFO SparkContext: Spark configuration:spark.app.name=JsonOutputParserSuitespark.driver.maxResultSize=6gspark.logConf=truespark.master=local[*]spark.sql.crossJoin.enabled=truespark.sql.shuffle.partitions=20spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse21/01/05 00:52:58 INFO SecurityManager: Changing view acls to: marhamil21/01/05 00:52:58 INFO SecurityManager: Changing modify acls to: marhamil21/01/05 00:52:58 INFO SecurityManager: Changing view acls groups to: 21/01/05 00:52:58 INFO SecurityManager: Changing modify acls groups to: 21/01/05 00:52:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(marhamil); groups with view permissions: Set(); users with modify permissions: Set(marhamil); groups with modify permissions: Set()21/01/05 00:52:58 INFO Utils: Successfully started service 'sparkDriver' on port 52315.21/01/05 00:52:58 INFO SparkEnv: Registering MapOutputTracker21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMaster21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up21/01/05 00:52:58 INFO SparkEnv: Registering BlockManagerMasterHeartbeat21/01/05 00:52:58 INFO DiskBlockManager: Created local directory at C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c29171921/01/05 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB21/01/05 00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator21/01/05 00:52:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.21/01/05 00:52:59 
INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://host.docker.internal:404021/01/05 00:52:59 INFO Executor: Starting executor ID driver on host host.docker.internal21/01/05 00:52:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52359.21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on host.docker.internal:5235921/01/05 00:52:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMasterEndpoint: Registering block manager host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:53:00 WARN SharedState: Not allowing to set spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's options, it should be set statically for cross-session usagesFailed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct)org.apache.spark.SparkException: Failed to execute user defined function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => struct) at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Int
[jira] [Updated] (SPARK-32085) Migrate to NumPy documentation style
[ https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32085: - Fix Version/s: 3.1.0 > Migrate to NumPy documentation style > > > Key: SPARK-32085 > URL: https://issues.apache.org/jira/browse/SPARK-32085 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > https://github.com/numpy/numpydoc > For example, > Before: > https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318 > After: > https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176 > We can incrementally start to switch. > NOTE that this JIRA targets only to switch the style. It does not target to > add additional information or fixes together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
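The migration above replaces reST-style field lists with numpydoc sections. A minimal sketch of the two styles side by side (a hypothetical `repartition` docstring for illustration, not the actual Spark source linked above):

```python
def repartition_rest(num_partitions):
    """Returns a new DataFrame partitioned by the given expressions.

    :param num_partitions: the target number of partitions
    :return: a repartitioned DataFrame
    """


def repartition_numpydoc(num_partitions):
    """Returns a new DataFrame partitioned by the given expressions.

    Parameters
    ----------
    num_partitions : int
        The target number of partitions.

    Returns
    -------
    DataFrame
        A repartitioned DataFrame.
    """
```

The numpydoc form is what Sphinx's numpydoc extension parses into structured "Parameters"/"Returns" tables, which is why the style switch can proceed file by file.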
[jira] [Resolved] (SPARK-32085) Migrate to NumPy documentation style
[ https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32085. -- Resolution: Done > Migrate to NumPy documentation style > > > Key: SPARK-32085 > URL: https://issues.apache.org/jira/browse/SPARK-32085 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > https://github.com/numpy/numpydoc > For example, > Before: > https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318 > After: > https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176 > We can incrementally start to switch. > NOTE that this JIRA targets only to switch the style. It does not target to > add additional information or fixes together. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34002) Broken UDF behavior
Mark Hamilton created SPARK-34002:
-------------------------------------

             Summary: Broken UDF behavior
                 Key: SPARK-34002
                 URL: https://issues.apache.org/jira/browse/SPARK-34002
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1
            Reporter: Mark Hamilton

UDFs can behave differently depending on whether a DataFrame is cached, even though the DataFrame is otherwise identical.

Repro:

{code:java}
case class Bar(a: Int)

import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
  None
}

def f2(bar: Bar): Option[Bar] = {
  Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0") //.cache()
val newDf = df
  .withColumn("c1", udf1(col("c0")))
  .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
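The repro hinges on `udf1` producing a null struct in `c1` that `udf2` must then consume; whether that null is handled depends on which evaluation path (interpreted vs. cached/codegen) runs the projection. The failure class can be sketched in plain Python, outside Spark entirely (all names here are illustrative, not Spark API): composing two functions where the first returns None.

```python
from typing import Optional


class Bar:
    """Stand-in for the Scala case class Bar(a: Int)."""
    def __init__(self, a: int):
        self.a = a


def f1(bar: Bar) -> Optional[Bar]:
    # Always drops the value, like udf1 returning None.
    return None


def f2_unguarded(bar: Bar) -> Optional[Bar]:
    # Assumes a non-null input; blows up when fed f1's output.
    return Bar(bar.a)


def f2_guarded(bar: Optional[Bar]) -> Optional[Bar]:
    # Null-safe variant: propagate None instead of dereferencing it.
    return None if bar is None else Bar(bar.a)


try:
    f2_unguarded(f1(Bar(1)))
    failed = False
except AttributeError:
    failed = True
```

The sketch only shows why a downstream function must be prepared for a null from upstream; the Spark bug is that the two execution paths disagree on that handling, not that the user code is wrong.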
[jira] [Updated] (SPARK-33242) Install numpydoc in Jenkins machines
[ https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33242: - Parent: (was: SPARK-32085) Issue Type: Test (was: Sub-task) > Install numpydoc in Jenkins machines > > > Key: SPARK-33242 > URL: https://issues.apache.org/jira/browse/SPARK-33242 > Project: Spark > Issue Type: Test > Components: Project Infra, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Shane Knapp >Priority: Major > > To switch to reST style to numpydoc style, we should install numpydoc as > well. This is being used in Sphinx. See the parent JIRA as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
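For reference, wiring numpydoc into a Sphinx build is a build-machine dependency (`pip install numpydoc`) plus a `conf.py` entry. A hedged sketch of the relevant fragment (extension list abbreviated; this is not Spark's actual `conf.py`):

```python
# conf.py (sketch): enable the numpydoc extension so Sphinx can parse
# NumPy-style "Parameters"/"Returns" sections in docstrings.
extensions = [
    "sphinx.ext.autodoc",
    "numpydoc",  # requires `pip install numpydoc` on the build machine
]

# Common tweak: avoid numpydoc auto-generating duplicated member tables.
numpydoc_show_class_members = False
```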
[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33992: -- Fix Version/s: (was: 3.1.0) 3.1.1 > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.1.1 > > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
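The failure above comes from `transformDown` asserting that it is not invoked from inside an analyzer rule unless the call site is wrapped in `allowInvokingTransformsInAnalyzer`. The shape of that guard-plus-wrapper pattern can be sketched in plain Python (names and structure are illustrative, not Spark's implementation):

```python
_in_analyzer = True          # pretend we are inside an analyzer rule
_transforms_allowed = False  # cleared unless explicitly opted in


class AnalysisError(RuntimeError):
    pass


def transform_down(plan, rule):
    # Mirrors AnalysisHelper.assertNotAnalysisRule: transforms are illegal
    # inside the analyzer unless explicitly allowed.
    if _in_analyzer and not _transforms_allowed:
        raise AnalysisError("This method should not be called in the analyzer")
    return [rule(node) for node in plan]


def allow_invoking_transforms_in_analyzer(fn):
    # Temporarily lifts the guard for the duration of fn, which is the
    # kind of wrapping this issue asks resolveOperatorsUpWithNewOutput to do.
    global _transforms_allowed
    _transforms_allowed = True
    try:
        return fn()
    finally:
        _transforms_allowed = False


plan = [1, 2, 3]
result = allow_invoking_transforms_in_analyzer(
    lambda: transform_down(plan, lambda n: n * 2))
```

An unwrapped `transform_down` call in this sketch raises `AnalysisError`, matching the "This method should not be called in the analyzer" failure quoted above.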
[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34000: -- Fix Version/s: (was: 3.1.0) 3.1.1 > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > Fix For: 3.0.2, 3.1.1 > > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
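The crash is Scala's `mutable.HashMap.apply` falling through to `default`, which throws `NoSuchElementException` when the stage key has already been removed by a racing event. The lookup pattern and its defensive counterpart can be sketched in Python, where a `dict` raises `KeyError` the same way (names are illustrative, not the listener's actual fields):

```python
# Listener state keyed by (stage, attempt), as in ExecutorAllocationListener.
stage_attempt_to_tasks = {}


def on_task_end_unsafe(stage, attempt):
    # Direct indexing: raises if the stage entry was already removed,
    # mirroring HashMap.apply -> NoSuchElementException.
    return stage_attempt_to_tasks[(stage, attempt)]


def on_task_end_safe(stage, attempt):
    # Defensive lookup: tolerate an already-removed stage.
    return stage_attempt_to_tasks.get((stage, attempt), set())


try:
    on_task_end_unsafe(600, 0)
    raised = False
except KeyError:
    raised = True
```

The safe variant is the general shape of fix for this class of listener race: look the key up with a fallback instead of assuming the event ordering guarantees its presence.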
[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34000: - Assignee: Lantao Jin > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
[jira] [Resolved] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34000. --- Fix Version/s: 3.0.2 3.1.0 Resolution: Fixed Issue resolved by pull request 31025 [https://github.com/apache/spark/pull/31025] > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Assignee: Lantao Jin >Priority: Major > Fix For: 3.1.0, 3.0.2 > > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33992: --- Assignee: Kent Yao > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
[ https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258676#comment-17258676 ] Apache Spark commented on SPARK-34001: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/31022 > Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala > -- > > Key: SPARK-34001 > URL: https://issues.apache.org/jira/browse/SPARK-34001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.2.0 > > > runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be > removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
[ https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33992. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 31013 [https://github.com/apache/spark/pull/31013] > resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer > - > > Key: SPARK-33992 > URL: https://issues.apache.org/jira/browse/SPARK-33992 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.1.0 > > > PaddingAndLengthCheckForCharVarchar could fail query when > resolveOperatorsUpWithNewOutput > with > {code:java} > [info] - char/varchar resolution in sub query *** FAILED *** (367 > milliseconds) > [info] java.lang.RuntimeException: This method should not be called in the > analyzer > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161) > [info] at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29) > [info] at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
[ https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258675#comment-17258675 ] Apache Spark commented on SPARK-34001: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/31022 > Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala > -- > > Key: SPARK-34001 > URL: https://issues.apache.org/jira/browse/SPARK-34001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.2.0 > > > runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be > removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
[ https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34001: - Assignee: Terry Kim > Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala > -- > > Key: SPARK-34001 > URL: https://issues.apache.org/jira/browse/SPARK-34001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > > runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be > removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
[ https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34001. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31022 [https://github.com/apache/spark/pull/31022] > Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala > -- > > Key: SPARK-34001 > URL: https://issues.apache.org/jira/browse/SPARK-34001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.2.0 > > > runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be > removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33998. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31020 [https://github.com/apache/spark/pull/31020] > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.2.0 > > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
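The refactor described above pulls the repeated "create a serializer, build an InternalRow" code out of each command node into a shared helper on the base exec node. A minimal sketch of that pattern (hypothetical class and method names, not Spark's actual classes):

```python
class V2CommandExec:
    # Base class owns the row-construction boilerplate once, so each
    # command no longer re-creates its own serializer.
    def to_row(self, *values):
        # Stand-in for serializing values into an InternalRow.
        return tuple(values)


class ShowTablesExec(V2CommandExec):
    def run(self, tables):
        return [self.to_row("default", t) for t in tables]


class DescribeTableExec(V2CommandExec):
    def run(self, schema):
        return [self.to_row(name, dtype) for name, dtype in schema]


rows = ShowTablesExec().run(["t1", "t2"])
```

Each subclass calls the shared `to_row` helper instead of duplicating the construction logic, which is the deduplication the issue describes.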
[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33998: --- Assignee: Terry Kim > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lantao Jin updated SPARK-34000: --- Affects Version/s: 3.0.1 > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
[jira] [Created] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
Terry Kim created SPARK-34001: - Summary: Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala Key: SPARK-34001 URL: https://issues.apache.org/jira/browse/SPARK-34001 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Terry Kim runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lantao Jin updated SPARK-34000: --- Affects Version/s: (was: 3.0.1) > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: Lantao Jin >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
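The failure mode in the trace above is a map lookup on a (stage, attempt) key that a concurrent stage-completion event has already removed; Scala's `mutable.HashMap.apply` then throws `NoSuchElementException`. A minimal Python sketch of that race and of a defensive-lookup fix pattern (the class and method names are illustrative, not Spark's actual internals):

```python
# Sketch of the reported race: onTaskEnd looks up a (stage, attempt) key
# that a concurrent stage-completion event may already have removed.
class ExecutorAllocationListenerSketch:
    def __init__(self):
        self.stage_attempt_to_num_tasks = {}

    def on_stage_submitted(self, stage, attempt, num_tasks):
        self.stage_attempt_to_num_tasks[(stage, attempt)] = num_tasks

    def on_stage_completed(self, stage, attempt):
        # Cleanup can race with late task-end events.
        self.stage_attempt_to_num_tasks.pop((stage, attempt), None)

    def on_task_end_buggy(self, stage, attempt):
        # dict[...] raises KeyError here, the Python analogue of
        # Scala's NoSuchElementException from mutable.HashMap.apply.
        return self.stage_attempt_to_num_tasks[(stage, attempt)]

    def on_task_end_fixed(self, stage, attempt):
        # Defensive lookup: treat a missing key as "stage already cleaned up".
        return self.stage_attempt_to_num_tasks.get((stage, attempt), 0)
```

The fix pattern is simply to tolerate the missing key instead of assuming the stage entry still exists when a late task-end event arrives.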
[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258670#comment-17258670 ] Dongjoon Hyun edited comment on SPARK-31786 at 1/5/21, 5:24 AM: Yes, you are correct. # `export` is only required for your machine. # `--conf` should be used for `driverEnv`. Yes, Spark 3.0 is better for the K8s environment and Spark 3.1 is much better because of SPARK-33005 (`Kubernetes GA Preparation`). FYI, Apache Spark 3.1.0 RC1 is already created. - [https://github.com/apache/spark/tree/v3.1.0-rc1] Apache Spark 3.1.0 will arrive this month. was (Author: dongjoon): Yes, you are correct. # `export` is only required for your machine. # `--conf` should be used for `driverEnv`. Yes, Spark 3.0 is better for the K8s environment and Spark 3.1 is much better because of SPARK-33005 . FYI, Apache Spark 3.1.0 RC1 is already created. - https://github.com/apache/spark/tree/v3.1.0-rc1 Apache Spark 3.1.0 will arrive this month. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). 
> log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at s
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258670#comment-17258670 ] Dongjoon Hyun commented on SPARK-31786: --- Yes, you are correct. # `export` is only required for your machine. # `--conf` should be used for `driverEnv`. Yes, Spark 3.0 is better for the K8s environment and Spark 3.1 is much better because of SPARK-33005 . FYI, Apache Spark 3.1.0 RC1 is already created. - https://github.com/apache/spark/tree/v3.1.0-rc1 Apache Spark 3.1.0 will arrive this month. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 3.0.0 > > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at 
java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.
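As the comment above advises, driver environment variables should be passed via `--conf` using Spark's `spark.kubernetes.driverEnv.<NAME>` properties, rather than relying on `export` on the submitting machine. A sketch of such an invocation, based on the reporter's command; `MY_SETTING` is a placeholder variable, and the command is only printed here instead of executed:

```shell
# Build the submit command as a string (dry run): the driverEnv conf entry
# is the piece the comment above recommends. Master URL, image, and
# MY_SETTING are placeholders taken from or added to the reporter's example.
SPARK_SUBMIT_CMD="./bin/spark-submit \
  --master k8s://https://172.31.23.60:8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --conf spark.kubernetes.container.image=spark-py:2.4.5 \
  --conf spark.kubernetes.driverEnv.MY_SETTING=value \
  local:///opt/spark/examples/src/main/python/pi.py"
# Print rather than run, since this is only an illustration:
echo "$SPARK_SUBMIT_CMD"
```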
[jira] [Resolved] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33794. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30807 [https://github.com/apache/spark/pull/30807] > next_day function should throw runtime exception when receiving invalid input > under ANSI mode > - > > Key: SPARK-33794 > URL: https://issues.apache.org/jira/browse/SPARK-33794 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Chongguang LIU >Assignee: Chongguang LIU >Priority: Major > Fix For: 3.2.0 > > > Hello all, > According to [ANSI > compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance], > the [next_day > function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095] > should throw a runtime exception when receiving an invalid value for > dayOfWeek, for example receiving "xx" instead of "SUNDAY". > > A similar improvement has been done on the element_at function: > https://issues.apache.org/jira/browse/SPARK-33386 > > If you agree with this proposition, I can submit a pull request with > the necessary changes. > > Kind regards,
[jira] [Assigned] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33794: --- Assignee: Chongguang LIU > next_day function should throw runtime exception when receiving invalid input > under ANSI mode > - > > Key: SPARK-33794 > URL: https://issues.apache.org/jira/browse/SPARK-33794 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Chongguang LIU >Assignee: Chongguang LIU >Priority: Major > > Hello all, > According to [ANSI > compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance], > the [next_day > function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095] > should throw a runtime exception when receiving an invalid value for > dayOfWeek, for example receiving "xx" instead of "SUNDAY". > > A similar improvement has been done on the element_at function: > https://issues.apache.org/jira/browse/SPARK-33386 > > If you agree with this proposition, I can submit a pull request with > the necessary changes. > > Kind regards,
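A minimal Python sketch of the semantics requested in the issue above. This is an illustration, not Spark's Scala implementation: the `ansi_mode` flag stands in for `spark.sql.ansi.enabled`, and the legacy behavior of returning NULL for bad input is modeled as `None`:

```python
import datetime

_DAYS = {"MONDAY": 0, "TUESDAY": 1, "WEDNESDAY": 2, "THURSDAY": 3,
         "FRIDAY": 4, "SATURDAY": 5, "SUNDAY": 6}

def next_day(start: datetime.date, day_of_week: str, ansi_mode: bool = False):
    """Return the first date strictly later than `start` on `day_of_week`."""
    target = _DAYS.get(day_of_week.strip().upper())
    if target is None:
        if ansi_mode:
            # ANSI mode: surface the invalid input as a runtime error
            raise ValueError(f"Illegal input for day of week: {day_of_week}")
        return None  # legacy behavior: invalid input yields NULL
    # weekday(): Monday=0 .. Sunday=6; `or 7` keeps the result strictly later
    delta = (target - start.weekday()) % 7 or 7
    return start + datetime.timedelta(days=delta)
```

With this sketch, `next_day(date(2015, 7, 27), "xx")` returns `None`, while the same call with `ansi_mode=True` raises, mirroring the change made for element_at in SPARK-33386.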
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258667#comment-17258667 ] Dongjoon Hyun commented on SPARK-25075: --- [~smarter]. Sorry, but unfortunately, from my assessment, the current status is a little different. 1. Apache Spark community is not able to publish Scala 2.13-based Maven artifacts yet. 2. Apache Spark community is not able to provide Scala 2.13-based binary distribution yet. 3. As you see at this JIRA, the target version of this Jira is 3.2.0, not 3.1.0. 4. For Apache Spark 3.1.0, we already created RC1 without SPARK-33894 and SPARK-33894 is marked as Spark 3.1.1. * [https://github.com/apache/spark/releases/tag/v3.1.0-rc1] Due to (1)~(4), Apache Spark 3.1.0 RC1 will have only Scala 2.12 libraries and binaries during the vote period. Of course, I guess we will roll more RCs with more improvements; at least SPARK-33894 will be a part of 3.1.0. However, I don't think we can say Scala 2.13 is supported without the official Scala 2.13 binaries and Scala 2.13 Maven artifacts. I guess you also agree that those are mandatory. cc [~hyukjin.kwon] and [~srowen] > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone.
[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache
[ https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33950: -- Fix Version/s: (was: 3.1.0) 3.1.1 > ALTER TABLE .. DROP PARTITION doesn't refresh cache > --- > > Key: SPARK-33950 > URL: https://issues.apache.org/jira/browse/SPARK-33950 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Labels: correctness > Fix For: 3.0.2, 3.2.0, 3.1.1 > > > Here is the example to reproduce the issue: > {code:sql} > spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED > BY (part0); > spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0; > spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1; > spark-sql> CACHE TABLE tbl1; > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0); > spark-sql> SELECT * FROM tbl1; > 0 0 > 1 1 > {code} >
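The reproduction above can be modeled outside Spark: a table cache that is not refreshed when a partition is dropped keeps serving rows from the dropped partition. A hedged Python analogy of the bug and the fix (this is not Spark code; the class and flag names are invented for illustration):

```python
# Toy model of a partitioned table with a materialized full-scan cache.
class PartitionedTable:
    def __init__(self):
        self.partitions = {}   # part0 -> list of rows
        self._cache = None     # cached full scan, None = not cached

    def insert(self, part, rows):
        self.partitions.setdefault(part, []).extend(rows)

    def cache(self):
        self._cache = [r for rows in self.partitions.values() for r in rows]

    def scan(self):
        # Like a cached table, serve from the cache when it is populated.
        if self._cache is not None:
            return self._cache
        return [r for rows in self.partitions.values() for r in rows]

    def drop_partition(self, part, refresh_cache=True):
        self.partitions.pop(part, None)
        if refresh_cache and self._cache is not None:
            # The fix: recache (or invalidate) after the drop so the
            # cached data no longer contains the dropped partition.
            self.cache()
```

With `refresh_cache=False` the toy table reproduces the report: the dropped partition's rows are still returned from the stale cache.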
[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33894: -- Fix Version/s: (was: 3.1.0) 3.1.1 > Word2VecSuite failed for Scala 2.13 > --- > > Key: SPARK-33894 > URL: https://issues.apache.org/jira/browse/SPARK-33894 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 3.2.0 >Reporter: Darcy Shen >Assignee: koert kuipers >Priority: Major > Fix For: 3.1.1 > > > This may be the first failed build: > https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/ > h2. Possible Work Around Fix > Move > case class Data(word: String, vector: Array[Float]) > out of the class Word2VecModel > h2. Attempts to git bisect > master branch git "bisect" > cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail > 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643 fail > 9d9d4a8e122cf1137edeca857e925f7e76c1ace2 fail > f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01 > h2. Attached Stack Trace > To reproduce it in master: > ./dev/change-scala-version.sh 2.13 > sbt -Pscala-2.13 > > project mllib > > testOnly org.apache.spark.ml.feature.Word2VecSuite > [info] Word2VecSuite: > [info] - params (45 milliseconds) > [info] - Word2Vec (5 seconds, 768 milliseconds) > [info] - getVectors (549 milliseconds) > [info] - findSynonyms (222 milliseconds) > [info] - window size (382 milliseconds) > [info] - Word2Vec read/write numPartitions calculation (1 millisecond) > [info] - Word2Vec read/write (669 milliseconds) > [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds) > [info] org.apache.spark.SparkException: Job aborted. 
> [info] at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > [info] at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > [info] at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > [info] at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > [info] at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > [info] at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > [info] at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > [info] at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > [info] at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > [info] at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > [info] at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > [info] at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > [info] at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > [info] at > 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > [info] at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > [info] at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > [info] at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874) > [info] at > org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368) > [info] at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168) > [info] at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287) > [info] at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287) > [info] at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51) > [info] at > org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42) > [info] at > org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2
[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema
[ https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33980: -- Fix Version/s: (was: 3.1.0) 3.1.1 > invalidate char/varchar in spark.readStream.schema > -- > > Key: SPARK-33980 > URL: https://issues.apache.org/jira/browse/SPARK-33980 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.1 > > > invalidate char/varchar in spark.readStream.schema just like what we do for > spark.read.schema
[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34000: Assignee: (was: Apache Spark) > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34000: Assignee: Apache Spark > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Assignee: Apache Spark >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
[jira] [Commented] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
[ https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258640#comment-17258640 ] Apache Spark commented on SPARK-34000: -- User 'LantaoJin' has created a pull request for this issue: https://github.com/apache/spark/pull/31025 > ExecutorAllocationListener threw an exception java.util.NoSuchElementException > -- > > Key: SPARK-34000 > URL: https://issues.apache.org/jira/browse/SPARK-34000 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.1, 3.1.0, 3.2.0 >Reporter: Lantao Jin >Priority: Major > > 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 > : Lost task 306.1 in stage 600.0 (TID 283610, > hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): > TaskKilled (another attempt succeeded) > 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 > : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be > re-executed (either because the task failed with a shuffle data fetch > failure, so the > previous stage needs to be re-run, or because a different copy of the task > has already succeeded). 
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] > cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all > completed, from pool default > 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] > thriftserver.SparkExecuteStatementOperation:190 : Returning result set with > 50 rows from offsets [5378600, 5378650) with > 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 > 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] > scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an > exception > java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) > at scala.collection.MapLike.default(MapLike.scala:235) > at scala.collection.MapLike.default$(MapLike.scala:234) > at scala.collection.AbstractMap.default(Map.scala:63) > at scala.collection.mutable.HashMap.apply(HashMap.scala:69) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) > at > org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at > org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) > at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) > at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) > at > org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) > at > org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) > at > org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) > at > 
org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) > at > org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
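The `java.util.NoSuchElementException: key not found` above is thrown by Scala's `mutable.HashMap.apply` when `ExecutorAllocationListener.onTaskEnd` looks up a stage whose entry has already been removed, i.e. a late task-end event racing with stage cleanup. A minimal, hypothetical Java sketch of that failure mode and of a lookup that tolerates the missing key (the map and names are illustrative, not Spark's actual fields):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class ListenerLookupSketch {
    // Illustrative stand-in for the listener's per-stage bookkeeping map.
    static final Map<String, Integer> stageAttemptToNumTasks = new HashMap<>();

    // Mirrors Scala's HashMap.apply: throws when the key is absent.
    static int strictLookup(String stageAttempt) {
        Integer n = stageAttemptToNumTasks.get(stageAttempt);
        if (n == null) {
            throw new NoSuchElementException("key not found: " + stageAttempt);
        }
        return n;
    }

    // Tolerant variant: a late TaskEnd for a cleaned-up stage is simply ignored.
    static int safeLookup(String stageAttempt) {
        return stageAttemptToNumTasks.getOrDefault(stageAttempt, 0);
    }

    public static void main(String[] args) {
        stageAttemptToNumTasks.put("Stage 600 (Attempt 0)", 128);
        System.out.println(safeLookup("Stage 600 (Attempt 0)")); // 128
        System.out.println(safeLookup("Stage 601 (Attempt 0)")); // 0, instead of throwing
    }
}
```

Because the listener runs on the shared `AsyncEventQueue` dispatch thread, an uncaught exception here is logged by the queue (line `AsyncEventQueue.scala:94` in the trace) rather than crashing the application outright.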
[jira] [Assigned] (SPARK-33979) Filter predicate reorder
[ https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33979: Assignee: Apache Spark > Filter predicate reorder > > > Key: SPARK-33979 > URL: https://issues.apache.org/jira/browse/SPARK-33979 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > Reorder filter predicate to improve query performance: > {noformat} > others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33979) Filter predicate reorder
[ https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33979: Assignee: (was: Apache Spark) > Filter predicate reorder > > > Key: SPARK-33979 > URL: https://issues.apache.org/jira/browse/SPARK-33979 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Reorder filter predicate to improve query performance: > {noformat} > others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33979) Filter predicate reorder
[ https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258639#comment-17258639 ] Apache Spark commented on SPARK-33979: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/31024 > Filter predicate reorder > > > Key: SPARK-33979 > URL: https://issues.apache.org/jira/browse/SPARK-33979 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Reorder filter predicate to improve query performance: > {noformat} > others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
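The proposed ordering reads as a cost ranking: cheaper predicates are evaluated first so that expensive ones (UDFs, `LIKE ANY`/`LIKE ALL`) see as few rows as possible. A hypothetical sketch of such a stable, rank-based reordering (the enum and ranks are illustrative; the actual rule would operate on Catalyst expressions):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class PredicateReorderSketch {
    // Illustrative predicate kinds, declared in the proposed cost order:
    // others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
    enum Kind { OTHER, IN, LIKE, UDF_CASEWHEN_IF, INSET, LIKE_ANY_ALL }

    static int rank(Kind k) {
        return k.ordinal(); // enum declaration order encodes the cost rank
    }

    // Stable sort: predicates of equal cost keep their original, written order.
    static List<Kind> reorder(List<Kind> predicates) {
        return predicates.stream()
                .sorted(Comparator.comparingInt(PredicateReorderSketch::rank))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Kind> conjuncts = Arrays.asList(Kind.LIKE_ANY_ALL, Kind.IN, Kind.OTHER);
        System.out.println(reorder(conjuncts)); // [OTHER, IN, LIKE_ANY_ALL]
    }
}
```

The stability matters: since `Stream.sorted` is stable on ordered streams, conjuncts with the same cost class are not shuffled relative to one another.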
[jira] [Created] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException
Lantao Jin created SPARK-34000: -- Summary: ExecutorAllocationListener threw an exception java.util.NoSuchElementException Key: SPARK-34000 URL: https://issues.apache.org/jira/browse/SPARK-34000 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.1, 3.1.0, 3.2.0 Reporter: Lantao Jin 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 : Lost task 306.1 in stage 600.0 (TID 283610, hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): TaskKilled (another attempt succeeded) 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the previous stage needs to be re-run, or because a different copy of the task has already succeeded). 21/01/04 03:00:32,259 INFO [task-result-getter-2] cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all completed, from pool default 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 50 rows from offsets [5378600, 5378650) with 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an exception java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0) at scala.collection.MapLike.default(MapLike.scala:235) at scala.collection.MapLike.default$(MapLike.scala:234) at scala.collection.AbstractMap.default(Map.scala:63) at scala.collection.mutable.HashMap.apply(HashMap.scala:69) at org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621) at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45) at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38) at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115) at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99) at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116) at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102) at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320) at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33999) Make sbt unidoc success with JDK11
[ https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258634#comment-17258634 ] Apache Spark commented on SPARK-33999: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31023 > Make sbt unidoc success with JDK11 > -- > > Key: SPARK-33999 > URL: https://issues.apache.org/jira/browse/SPARK-33999 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > With the current master, sbt unidoc fails because the generated Java sources > cause syntax error. > As of JDK11, the default doclet seems to refuse such syntax error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33999) Make sbt unidoc success with JDK11
[ https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33999: Assignee: Kousuke Saruta (was: Apache Spark) > Make sbt unidoc success with JDK11 > -- > > Key: SPARK-33999 > URL: https://issues.apache.org/jira/browse/SPARK-33999 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > With the current master, sbt unidoc fails because the generated Java sources > cause syntax error. > As of JDK11, the default doclet seems to refuse such syntax error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33999) Make sbt unidoc success with JDK11
[ https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33999: Assignee: Apache Spark (was: Kousuke Saruta) > Make sbt unidoc success with JDK11 > -- > > Key: SPARK-33999 > URL: https://issues.apache.org/jira/browse/SPARK-33999 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > With the current master, sbt unidoc fails because the generated Java sources > cause syntax error. > As of JDK11, the default doclet seems to refuse such syntax error. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33999) Make sbt unidoc success with JDK11
Kousuke Saruta created SPARK-33999: -- Summary: Make sbt unidoc success with JDK11 Key: SPARK-33999 URL: https://issues.apache.org/jira/browse/SPARK-33999 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta With the current master, sbt unidoc fails because the generated Java sources contain syntax errors. As of JDK 11, the default doclet appears to reject such syntax errors.
[jira] [Updated] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Russo updated SPARK-16087: -- Affects Version/s: 3.0.1

> Spark Hangs When Using Union With Persisted Hadoop RDD
> --
>
> Key: SPARK-16087
> URL: https://issues.apache.org/jira/browse/SPARK-16087
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.4.1, 1.6.1, 2.0.1, 3.0.1
> Reporter: Kevin Conaway
> Priority: Critical
> Labels: bulk-closed
> Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, part-0, part-1, spark-16087.tar.gz
>
> Spark hangs when materializing a persisted RDD that was built from a Hadoop sequence file and then union-ed with a similar RDD.
> Below is a small file that exhibits the issue:
> {code:java}
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.serializer.KryoSerializer;
> import org.apache.spark.storage.StorageLevel;
> import scala.Tuple2;
>
> public class SparkBug {
>     public static void main(String[] args) throws Exception {
>         JavaSparkContext sc = new JavaSparkContext(
>             new SparkConf()
>                 .set("spark.serializer", KryoSerializer.class.getName())
>                 .set("spark.master", "local[*]")
>                 .setAppName(SparkBug.class.getName())
>         );
>         JavaPairRDD<LongWritable, BytesWritable> rdd1 = sc.sequenceFile(
>             "hdfs://localhost:9000/part-0",
>             LongWritable.class,
>             BytesWritable.class
>         ).mapToPair(new PairFunction<Tuple2<LongWritable, BytesWritable>, LongWritable, BytesWritable>() {
>             @Override
>             public Tuple2<LongWritable, BytesWritable> call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
>                 return new Tuple2<>(
>                     new LongWritable(tuple._1.get()),
>                     new BytesWritable(tuple._2.copyBytes())
>                 );
>             }
>         }).persist(
>             StorageLevel.MEMORY_ONLY()
>         );
>         System.out.println("Before union: " + rdd1.count());
>         JavaPairRDD<LongWritable, BytesWritable> rdd2 = sc.sequenceFile(
>             "hdfs://localhost:9000/part-1",
>             LongWritable.class,
>             BytesWritable.class
>         );
>         JavaPairRDD<LongWritable, BytesWritable> joined = rdd1.union(rdd2);
>         System.out.println("After union: " + joined.count());
>     }
> }
> {code}
> You'll need to upload the attached part-0 and part-1 to a local hdfs instance (I'm just using a dummy [Single Node Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html] locally).
> Some things to note:
> - It does not hang if rdd1 is not persisted
> - It does not hang if rdd1 is not materialized (via calling rdd1.count()) before the union-ed RDD is materialized
> - It does not hang if the mapToPair() transformation is removed.
[jira] [Commented] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258627#comment-17258627 ] Rob Russo commented on SPARK-16087: --- I know this ticket is old now, but Spark 3 seems to have resurfaced the issue. I had a suite of tests that worked fine on Spark 2.x, and I spent more than a month intermittently debugging why a number of my tests hung only on Spark 3. As [~kevinconaway] said in his comment, the bug may be one refactor away from resurfacing, and that seems to be what happened. For anyone running into this issue, here is the resolution I finally discovered from this ticket: based on [~kevinconaway]'s comment that setting _spark.driver.host=localhost_ forces the problem, I found that setting _spark.driver.host=127.0.0.1_ completely fixes it. Hopefully this helps anyone else who runs into this. Since the issue has resurfaced, I'm going to reopen the ticket and mark Spark 3 as an affected version. > Spark Hangs When Using Union With Persisted Hadoop RDD > -- > > Key: SPARK-16087 > URL: https://issues.apache.org/jira/browse/SPARK-16087 > Project: Spark > Issue Type: Bug >Affects Versions: 1.4.1, 1.6.1, 2.0.1 >Reporter: Kevin Conaway >Priority: Critical > Labels: bulk-closed > Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot > 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, > part-0, part-1, spark-16087.tar.gz > > > Spark hangs when materializing a persisted RDD that was built from a Hadoop > sequence file and then union-ed with a similar RDD. 
> Below is a small file that exhibits the issue: > {code:java} > import org.apache.hadoop.io.BytesWritable; > import org.apache.hadoop.io.LongWritable; > import org.apache.spark.SparkConf; > import org.apache.spark.api.java.JavaPairRDD; > import org.apache.spark.api.java.JavaSparkContext; > import org.apache.spark.api.java.function.PairFunction; > import org.apache.spark.serializer.KryoSerializer; > import org.apache.spark.storage.StorageLevel; > import scala.Tuple2; > public class SparkBug { > public static void main(String [] args) throws Exception { > JavaSparkContext sc = new JavaSparkContext( > new SparkConf() > .set("spark.serializer", KryoSerializer.class.getName()) > .set("spark.master", "local[*]") > .setAppName(SparkBug.class.getName()) > ); > JavaPairRDD rdd1 = sc.sequenceFile( >"hdfs://localhost:9000/part-0", > LongWritable.class, > BytesWritable.class > ).mapToPair(new PairFunction, > LongWritable, BytesWritable>() { > @Override > public Tuple2 > call(Tuple2 tuple) throws Exception { > return new Tuple2<>( > new LongWritable(tuple._1.get()), > new BytesWritable(tuple._2.copyBytes()) > ); > } > }).persist( > StorageLevel.MEMORY_ONLY() > ); > System.out.println("Before union: " + rdd1.count()); > JavaPairRDD rdd2 = sc.sequenceFile( > "hdfs://localhost:9000/part-1", > LongWritable.class, > BytesWritable.class > ); > JavaPairRDD joined = rdd1.union(rdd2); > System.out.println("After union: " + joined.count()); > } > } > {code} > You'll need to upload the attached part-0 and part-1 to a local hdfs > instance (I'm just using a dummy [Single Node > Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html] > locally). > Some things to note: > - It does not hang if rdd1 is not persisted > - It does not hang is rdd1 is not materialized (via calling rdd1.count()) > before the union-ed RDD is materialized > - It does not hang if the mapToPair() transformation is removed. 
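For reference, the workaround described in the comment above is an ordinary Spark configuration entry; pinning the driver host to the IPv4 loopback address sidesteps the `localhost` name-resolution path that triggers the hang. A minimal sketch (standard `spark-defaults.conf` syntax; whether it is appropriate depends on your deployment's network setup):

```
# In conf/spark-defaults.conf (or pass via --conf to spark-submit):
spark.driver.host    127.0.0.1
```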
[jira] [Reopened] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rob Russo reopened SPARK-16087: --- Reopening as it occurred for us only after upgrading to spark 3.x > Spark Hangs When Using Union With Persisted Hadoop RDD > -- > > Key: SPARK-16087 > URL: https://issues.apache.org/jira/browse/SPARK-16087 > Project: Spark > Issue Type: Bug >Affects Versions: 1.4.1, 1.6.1, 2.0.1 >Reporter: Kevin Conaway >Priority: Critical > Labels: bulk-closed > Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot > 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, > part-0, part-1, spark-16087.tar.gz > > > Spark hangs when materializing a persisted RDD that was built from a Hadoop > sequence file and then union-ed with a similar RDD. > Below is a small file that exhibits the issue: > {code:java} > import org.apache.hadoop.io.BytesWritable; > import org.apache.hadoop.io.LongWritable; > import org.apache.spark.SparkConf; > import org.apache.spark.api.java.JavaPairRDD; > import org.apache.spark.api.java.JavaSparkContext; > import org.apache.spark.api.java.function.PairFunction; > import org.apache.spark.serializer.KryoSerializer; > import org.apache.spark.storage.StorageLevel; > import scala.Tuple2; > public class SparkBug { > public static void main(String [] args) throws Exception { > JavaSparkContext sc = new JavaSparkContext( > new SparkConf() > .set("spark.serializer", KryoSerializer.class.getName()) > .set("spark.master", "local[*]") > .setAppName(SparkBug.class.getName()) > ); > JavaPairRDD rdd1 = sc.sequenceFile( >"hdfs://localhost:9000/part-0", > LongWritable.class, > BytesWritable.class > ).mapToPair(new PairFunction, > LongWritable, BytesWritable>() { > @Override > public Tuple2 > call(Tuple2 tuple) throws Exception { > return new Tuple2<>( > new LongWritable(tuple._1.get()), > new BytesWritable(tuple._2.copyBytes()) > ); > } > }).persist( > StorageLevel.MEMORY_ONLY() > ); > 
System.out.println("Before union: " + rdd1.count()); > JavaPairRDD rdd2 = sc.sequenceFile( > "hdfs://localhost:9000/part-1", > LongWritable.class, > BytesWritable.class > ); > JavaPairRDD joined = rdd1.union(rdd2); > System.out.println("After union: " + joined.count()); > } > } > {code} > You'll need to upload the attached part-0 and part-1 to a local hdfs > instance (I'm just using a dummy [Single Node > Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html] > locally). > Some things to note: > - It does not hang if rdd1 is not persisted > - It does not hang is rdd1 is not materialized (via calling rdd1.count()) > before the union-ed RDD is materialized > - It does not hang if the mapToPair() transformation is removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33964. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30996 [https://github.com/apache/spark/pull/30996] > Combine distinct unions in more cases > - > > Key: SPARK-33964 > URL: https://issues.apache.org/jira/browse/SPARK-33964 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Major > Fix For: 3.2.0 > > > In several TPCDS queries the CombineUnions rule does not manage to combine > unions, because they have noop Projects between them. > The Projects will be removed by RemoveNoopOperators, but by then > ReplaceDistinctWithAggregate has been applied and there are aggregates > between the unions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
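The description points at a rule-ordering problem: a no-op `Project` sitting between two `Union` nodes hides the inner union from `CombineUnions` until `RemoveNoopOperators` has run. A toy model (plain Java trees, not Catalyst) showing why stripping the no-ops first enables flattening:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombineUnionsSketch {
    // Toy plan nodes (not Catalyst): a leaf, a no-op projection, an n-ary union.
    static class Plan {}
    static class Leaf extends Plan {
        final String name;
        Leaf(String name) { this.name = name; }
    }
    static class NoopProject extends Plan {
        final Plan child;
        NoopProject(Plan child) { this.child = child; }
    }
    static class Union extends Plan {
        final List<Plan> children;
        Union(List<Plan> children) { this.children = children; }
    }

    // RemoveNoopOperators analogue: strip no-op projections everywhere.
    static Plan removeNoops(Plan p) {
        if (p instanceof NoopProject) return removeNoops(((NoopProject) p).child);
        if (p instanceof Union) {
            List<Plan> kids = new ArrayList<>();
            for (Plan c : ((Union) p).children) kids.add(removeNoops(c));
            return new Union(kids);
        }
        return p;
    }

    // CombineUnions analogue: splice directly-nested unions into the parent.
    static Plan combineUnions(Plan p) {
        if (!(p instanceof Union)) return p;
        List<Plan> flat = new ArrayList<>();
        for (Plan c : ((Union) p).children) {
            Plan cc = combineUnions(c);
            if (cc instanceof Union) flat.addAll(((Union) cc).children);
            else flat.add(cc);
        }
        return new Union(flat);
    }

    public static void main(String[] args) {
        // Union(a, NoopProject(Union(b, c))): the no-op hides the inner union.
        Plan plan = new Union(Arrays.asList(
                new Leaf("a"),
                new NoopProject(new Union(Arrays.asList(new Leaf("b"), new Leaf("c"))))));
        System.out.println(((Union) combineUnions(plan)).children.size());              // 2: still blocked
        System.out.println(((Union) combineUnions(removeNoops(plan))).children.size()); // 3: flattened
    }
}
```

In the real optimizer the corresponding complication is that `ReplaceDistinctWithAggregate` may fire before the no-ops are removed, leaving aggregates rather than unions adjacent to each other.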
[jira] [Assigned] (SPARK-33964) Combine distinct unions in more cases
[ https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33964: Assignee: Tanel Kiis > Combine distinct unions in more cases > - > > Key: SPARK-33964 > URL: https://issues.apache.org/jira/browse/SPARK-33964 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Tanel Kiis >Assignee: Tanel Kiis >Priority: Major > > In several TPCDS queries the CombineUnions rule does not manage to combine > unions, because they have noop Projects between them. > The Projects will be removed by RemoveNoopOperators, but by then > ReplaceDistinctWithAggregate has been applied and there are aggregates > between the unions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33382) Unify v1 and v2 SHOW TABLES tests
[ https://issues.apache.org/jira/browse/SPARK-33382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258617#comment-17258617 ] Apache Spark commented on SPARK-33382: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/31022 > Unify v1 and v2 SHOW TABLES tests > - > > Key: SPARK-33382 > URL: https://issues.apache.org/jira/browse/SPARK-33382 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Gather common tests for DSv1 and DSv2 SHOW TABLES command to a common test. > Mix this trait to datasource specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError
[ https://issues.apache.org/jira/browse/SPARK-33997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-33997. Resolution: Cannot Reproduce Rebuilt Spark locally and the error was gone. > Running bin/spark-sql gives NoSuchMethodError > - > > Key: SPARK-33997 > URL: https://issues.apache.org/jira/browse/SPARK-33997 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > I ran 'mvn install -Phive -Phive-thriftserver -DskipTests' > Running bin/spark-sql gives the following error: > {code} > 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map; > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934) > {code} > Scala version 2.12.10 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258606#comment-17258606 ] Apache Spark commented on SPARK-33998: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/31020 > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Priority: Minor > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33998: Assignee: (was: Apache Spark) > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Priority: Minor > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258605#comment-17258605 ] Apache Spark commented on SPARK-33998: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/31020 > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Priority: Minor > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow
[ https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33998: Assignee: Apache Spark > Refactor v2CommandExec to provide an API to create an InternalRow > - > > Key: SPARK-33998 > URL: https://issues.apache.org/jira/browse/SPARK-33998 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Minor > > There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that > require creating InternalRow. Creating InternalRow can be refactored into > v2CommandExec to remove duplicate code to create serializer, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org