[jira] [Commented] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258736#comment-17258736
 ] 

Apache Spark commented on SPARK-34007:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31031

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the Docker
> release script fails as shown below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34007:


Assignee: Apache Spark

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the Docker
> release script fails as shown below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Assigned] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34007:


Assignee: (was: Apache Spark)

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the Docker
> release script fails as shown below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2021-01-04 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258734#comment-17258734
 ] 

Wenchen Fan commented on SPARK-33948:
-

SPARK-33619 improved the codegen test coverage of the Spark expression tests, which
might be the reason for these test failures.

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]
>  
> [

[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258732#comment-17258732
 ] 

Apache Spark commented on SPARK-34006:
--

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can solve the problem of INSERT OVERWRITE reading from the same
> ORC-format table; it should be stated in the documentation.






[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34006:


Assignee: Apache Spark

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Assignee: Apache Spark
>Priority: Major
>
> This parameter can solve the problem of INSERT OVERWRITE reading from the same
> ORC-format table; it should be stated in the documentation.






[jira] [Commented] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258733#comment-17258733
 ] 

Apache Spark commented on SPARK-34006:
--

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/31030

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can solve the problem of INSERT OVERWRITE reading from the same
> ORC-format table; it should be stated in the documentation.






[jira] [Assigned] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34006:


Assignee: (was: Apache Spark)

> [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table 
> insert overwrite read table, it should be stated in the document
> --
>
> Key: SPARK-34006
> URL: https://issues.apache.org/jira/browse/SPARK-34006
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.1
>Reporter: hao
>Priority: Major
>
> This parameter can solve the problem of INSERT OVERWRITE reading from the same
> ORC-format table; it should be stated in the documentation.






[jira] [Updated] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34007:
-
Target Version/s: 3.1.0

> Downgrade scala-maven-plugin to 4.3.0
> -
>
> Key: SPARK-34007
> URL: https://issues.apache.org/jira/browse/SPARK-34007
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Blocker
>
> After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the Docker
> release script fails as shown below:
> {code}
> [INFO] Compiling 21 Scala sources and 3 Java sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
>  ...
> [ERROR] ## Exception when compiling 24 sources to 
> /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
> java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s 
> signer information does not match signer information of other classes in the 
> same package
> java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
> java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
> java.lang.ClassLoader.defineClass(ClassLoader.java:754)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> java.security.AccessController.doPrivileged(Native Method)
> java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> java.lang.Class.getDeclaredMethods0(Native Method)
> java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> java.lang.Class.privateGetPublicMethods(Class.java:2902)
> java.lang.Class.getMethods(Class.java:1615)
> sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
> sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
> scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
> sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
> sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
> {code}






[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33980:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema
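For context, here is a minimal sketch of the two call sites this ticket aligns. The
file paths are placeholders, and the exact error raised for CHAR/VARCHAR in a
user-specified schema depends on the char/varchar handling introduced for 3.1, so the
reader calls are left commented out rather than asserting a specific failure message.

{code:scala}
// Both readers accept a DDL-formatted schema string; the point of this ticket is that
// CHAR/VARCHAR in such a schema should be validated the same way on both paths.
val ddlSchema = "name VARCHAR(3), id INT"

// Batch path (already validated):
// spark.read.schema(ddlSchema).csv("/placeholder/batch-input")

// Streaming path (should go through the same validation after this change):
// spark.readStream.schema(ddlSchema).csv("/placeholder/stream-input")
{code}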






[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34005:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's
> better to update the peak memory metrics for each Executor.






[jira] [Commented] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258727#comment-17258727
 ] 

Apache Spark commented on SPARK-34005:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31029

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's
> better to update the peak memory metrics for each Executor.






[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258726#comment-17258726
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Yeah, but this can be easily overcome here. We just need a user-provided group ID
for the purpose of committing offsets. Since users have to specify it when they
want to commit offsets and track progress, it will be used with caution. Even for
committing with a static, user-supplied group ID, I do not think that is really a
reason to reject the offset-committing idea. Once users decide to commit offsets
and track progress, they should be aware of the risk.

Anyway, this does not seem to be the reason the previous PR was closed.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself,
> it is not possible to track the total Kafka lag using Burrow the way it is with DStreams.
> We have used stream hooks as mentioned
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supported this feature out of the box.
>  
>  
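As a rough sketch of the "stream hooks" approach referenced above (not a built-in
Spark feature): a StreamingQueryListener pulls the per-source end offsets out of each
query progress event and hands them to a reporting function. `reportToBurrow` here is
a hypothetical placeholder for whatever ships the offsets to the lag monitor.

{code:scala}
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

class KafkaOffsetReporter extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // For a Kafka source, endOffset is a JSON map of topic -> partition -> offset.
    event.progress.sources.foreach { source =>
      reportToBurrow(source.description, source.endOffset)
    }
  }

  // Hypothetical placeholder for an HTTP call (or similar) to the external monitor.
  private def reportToBurrow(sourceDescription: String, endOffsetJson: String): Unit =
    println(s"$sourceDescription -> $endOffsetJson")
}

// Registration (from a SparkSession):
// spark.streams.addListener(new KafkaOffsetReporter)
{code}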






[jira] [Assigned] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34005:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> Like other peak memory metrics (e.g., stage, executors in a stage), it's
> better to update the peak memory metrics for each Executor.






[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33992:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> PaddingAndLengthCheckForCharVarchar could fail a query when
> resolveOperatorsUpWithNewOutput is used, with:
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}






[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33894:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Assignee: koert kuipers
>Priority: Major
> Fix For: 3.1.0
>
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Workaround Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2Ve
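To illustrate why the suggested move can help, here is a small sketch with
hypothetical names, under the assumption that the enclosing-instance reference is
what trips up serialization in this suite; it is not the actual Word2Vec code.

{code:scala}
// A case class declared inside a class captures its enclosing instance (an `$outer`
// reference), which can interfere with serialization and encoder derivation.
class Model {
  case class Data(word: String, vector: Array[Float]) // carries a reference to `Model`
}

// Workaround shape from the description: declare it outside the enclosing class.
case class Data(word: String, vector: Array[Float])   // no outer reference
{code}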

[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34000:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
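The stack trace bottoms out in mutable.HashMap.apply falling back to MapLike.default.
A small standalone sketch of that failure mode and the defensive lookups (this is not
Spark code, just the Scala collection behavior visible above):

{code:scala}
import scala.collection.mutable

val stageToTaskCount = mutable.HashMap[String, Int]()

// This is the call shape that fails above: apply() falls back to default(), which throws
// java.util.NoSuchElementException: key not found: ...
// stageToTaskCount("Stage 600 (Attempt 0)")

// Defensive lookups avoid the exception:
val maybeCount: Option[Int] = stageToTaskCount.get("Stage 600 (Attempt 0)")          // None
val countOrZero: Int        = stageToTaskCount.getOrElse("Stage 600 (Attempt 0)", 0) // 0
{code}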






[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258725#comment-17258725
 ] 

Hyukjin Kwon commented on SPARK-33950:
--

I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue,
SPARK-34007. I am correcting the fix version to 3.1.0.

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  
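Until the fix lands, one manual stopgap, run from a SparkSession (e.g. spark-shell)
against the same tbl1 as in the reproduction above, is to drop and rebuild the cache
around the DDL:

{code:scala}
// Stopgap only; the fix makes ALTER TABLE .. DROP PARTITION refresh the cache itself.
spark.sql("ALTER TABLE tbl1 DROP PARTITION (part0=0)")
spark.sql("UNCACHE TABLE tbl1")          // discard the stale cached rows
spark.sql("CACHE TABLE tbl1")            // re-cache from the remaining partitions
spark.sql("SELECT * FROM tbl1").show()   // now returns only the part0=1 row
{code}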






[jira] [Comment Edited] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258724#comment-17258724
 ] 

Hyukjin Kwon edited comment on SPARK-33980 at 1/5/21, 7:43 AM:
---

I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue,
SPARK-34007. I am correcting the fix version to 3.1.0.


was (Author: hyukjin.kwon):
I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue,
SPARK-34007.

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema






[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33950:
-
Fix Version/s: (was: 3.1.1)
   3.1.0

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  






[jira] [Commented] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258724#comment-17258724
 ] 

Hyukjin Kwon commented on SPARK-33980:
--

I need to recreate the rc1 tag. I failed to create an RC due to a dependency issue,
SPARK-34007.

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> invalidate char/varchar in spark.readStream.schema just like what we do for 
> spark.read.schema






[jira] [Created] (SPARK-34007) Downgrade scala-maven-plugin to 4.3.0

2021-01-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-34007:


 Summary: Downgrade scala-maven-plugin to 4.3.0
 Key: SPARK-34007
 URL: https://issues.apache.org/jira/browse/SPARK-34007
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


After we upgraded scala-maven-plugin to 4.4.0 in SPARK-33512, the Docker
release script fails as shown below:

{code}
[INFO] Compiling 21 Scala sources and 3 Java sources to 
/opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
 ...
[ERROR] ## Exception when compiling 24 sources to 
/opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer 
information does not match signer information of other classes in the same 
package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
java.lang.ClassLoader.defineClass(ClassLoader.java:754)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
java.lang.Class.getDeclaredMethods0(Native Method)
java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
java.lang.Class.privateGetPublicMethods(Class.java:2902)
java.lang.Class.getMethods(Class.java:1615)
sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:33)
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
{code}






[jira] [Created] (SPARK-34006) [spark.sql.hive.convertMetastoreOrc]This parameter can solve orc format table insert overwrite read table, it should be stated in the document

2021-01-04 Thread hao (Jira)
hao created SPARK-34006:
---

 Summary: [spark.sql.hive.convertMetastoreOrc]This parameter can 
solve orc format table insert overwrite read table, it should be stated in the 
document
 Key: SPARK-34006
 URL: https://issues.apache.org/jira/browse/SPARK-34006
 Project: Spark
  Issue Type: Bug
  Components: docs
Affects Versions: 3.0.1
Reporter: hao


This parameter can solve the problem of INSERT OVERWRITE reading from the same
ORC-format table; it should be stated in the documentation.
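For reference, a heavily hedged sketch of where this parameter is set and of the
scenario described. The table name is hypothetical, and the report does not spell out
whether enabling or disabling the flag resolves the reporter's case; this only shows
the configuration point and the self-referencing INSERT OVERWRITE it concerns.

{code:scala}
// spark.sql.hive.convertMetastoreOrc controls whether Spark uses its native ORC
// reader/writer instead of the Hive SerDe for Hive ORC tables.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("convert-metastore-orc-sketch")
  .enableHiveSupport()
  .config("spark.sql.hive.convertMetastoreOrc", "true") // the parameter in question
  .getOrCreate()

// The scenario named in the title: an INSERT OVERWRITE that reads the same ORC table.
spark.sql("CREATE TABLE IF NOT EXISTS t_orc (id INT) STORED AS ORC") // hypothetical table
spark.sql("INSERT OVERWRITE TABLE t_orc SELECT id + 1 FROM t_orc")
{code}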






[jira] [Updated] (SPARK-33989) Strip auto-generated cast when using Cast.sql

2021-01-04 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33989:

Summary: Strip auto-generated cast when using Cast.sql  (was: Strip 
auto-generated cast when resolving UnresolvedAlias)

> Strip auto-generated cast when using Cast.sql
> -
>
> Key: SPARK-33989
> URL: https://issues.apache.org/jira/browse/SPARK-33989
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> During analysis we may implicitly introduce a Cast when a type cast is needed.
> That makes the assigned name unclear.
> Say we have a SQL query `select id == null` where `id` is of int type; then the
> output field name will be `(id = CAST(null as int))`.
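A quick way to observe the naming described above from a spark-shell session; the
exact cast text varies with the column's type, so the printed name is only indicative.

{code:scala}
// `id` from range() is a BIGINT, so the implicit cast target differs from the int
// example above, but the leaked CAST in the assigned column name is the same symptom.
val df = spark.range(1).selectExpr("id == null")
println(df.columns.mkString(", ")) // prints a name like: (id = CAST(NULL AS BIGINT))
{code}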






[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34003:


Assignee: (was: Apache Spark)

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext`
> to generate a `resolved agg` in order to determine which unresolved sort attribute
> should be pushed into the agg. However, after we added the
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output,
> the `resolved agg` can no longer match the original attributes.
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}






[jira] [Assigned] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34003:


Assignee: Apache Spark

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext`
> to generate a `resolved agg` in order to determine which unresolved sort attribute
> should be pushed into the agg. However, after we added the
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output,
> the `resolved agg` can no longer match the original attributes.
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}






[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34004:


Assignee: Apache Spark

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class so as to
> simplify pattern matching.
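The simplification referred to here is exhaustiveness checking in pattern matches.
A standalone sketch with hypothetical subclasses (this is not Spark's actual class
hierarchy):

{code:scala}
// With a sealed parent the compiler knows every subclass, so it can warn when a
// pattern match over the hierarchy is not exhaustive.
sealed abstract class OffsetWindowFn { def offset: Int }
case class Lead(offset: Int) extends OffsetWindowFn
case class Lag(offset: Int) extends OffsetWindowFn

def describe(fn: OffsetWindowFn): String = fn match {
  case Lead(n) => s"lead by $n"
  case Lag(n)  => s"lag by $n"
  // No catch-all needed; dropping a case would trigger a non-exhaustive-match warning.
}
{code}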






[jira] [Commented] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258722#comment-17258722
 ] 

Apache Spark commented on SPARK-34004:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/31026

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class so as to
> simplify pattern matching.






[jira] [Assigned] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34004:


Assignee: (was: Apache Spark)

> Change FrameLessOffsetWindowFunction as sealed abstract class
> -
>
> Key: SPARK-34004
> URL: https://issues.apache.org/jira/browse/SPARK-34004
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> Change FrameLessOffsetWindowFunction to a sealed abstract class so as to
> simplify pattern matching.






[jira] [Commented] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258723#comment-17258723
 ] 

Apache Spark commented on SPARK-34003:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31027

> Rule conflicts between PaddingAndLengthCheckForCharVarchar and 
> ResolveAggregateFunctions
> 
>
> Key: SPARK-34003
> URL: https://issues.apache.org/jira/browse/SPARK-34003
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Critical
>
> ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext`
> to generate a `resolved agg` in order to determine which unresolved sort attribute
> should be pushed into the agg. However, after we added the
> PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output,
> the `resolved agg` can no longer match the original attributes.
> This causes a dissociated sort attribute to be pushed in and fails the query:
> {code:java}
> [info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
> expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
> aggregate function. Add to group by or wrap in first() (or first_value) if 
> you don't care which value you get.;
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> [info]
> [info]   Project [v#14, sum(i)#11L]
> [info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
> [info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
> sum(i)#11L, v#13 AS aggOrder#12]
> [info] +- SubqueryAlias testcat.t1
> [info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
> ((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of 
> length , cast(length(v#6) as string),  exceeds varchar type length 
> limitation: 3)) as string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
> [info]   +- RelationV2[v#6, i#7, index#15, _partition#16] 
> testcat.t1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32017) Make Pyspark Hadoop 3.2+ Variant available in PyPI

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258721#comment-17258721
 ] 

Apache Spark commented on SPARK-32017:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/31028

> Make Pyspark Hadoop 3.2+ Variant available in PyPI
> --
>
> Key: SPARK-32017
> URL: https://issues.apache.org/jira/browse/SPARK-32017
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: George Pongracz
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.0
>
>
> The version of PySpark 3.0.0 currently available in PyPI uses Hadoop 2.7.4.
> Could a variant (or the default) have its version of Hadoop aligned to 3.2.0, 
> as per the downloadable Spark binaries?
> This would enable the PyPI version to be compatible with session token 
> authorisations and assist in accessing data residing in object stores with 
> stronger encryption methods.
> If not PyPI, then at least as a tar file in the Apache download archives, 
> please.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-34005:
--

 Summary: Update peak memory metrics for each Executor on task end.
 Key: SPARK-34005
 URL: https://issues.apache.org/jira/browse/SPARK-34005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.0, 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Like other peak memory metrics (e.g., per stage, or per executor within a stage), 
it's better to also update the peak memory metrics for each Executor.
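For context, here is a minimal, illustrative sketch (not the change made for this ticket) of how per-executor peaks can be tracked on task end from the listener side, assuming the public SparkListener API and the executor metrics attached to SparkListenerTaskEnd in Spark 3.x; the class name and the choice of "JVMHeapMemory" are just examples.

{code:java}
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Tracks the highest JVM heap value reported per executor as tasks finish.
class PeakHeapPerExecutorListener extends SparkListener {
  private val peaks = mutable.Map.empty[String, Long]

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val execId = taskEnd.taskInfo.executorId
    // "JVMHeapMemory" is one of the executor metric names exposed in Spark 3.x.
    val heap = taskEnd.taskExecutorMetrics.getMetricValue("JVMHeapMemory")
    peaks.synchronized {
      peaks(execId) = math.max(peaks.getOrElse(execId, 0L), heap)
    }
  }
}

// Registration (sketch):
// spark.sparkContext.addSparkListener(new PeakHeapPerExecutorListener)
{code}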



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34005) Update peak memory metrics for each Executor on task end.

2021-01-04 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-34005:
---
Issue Type: Improvement  (was: Bug)

> Update peak memory metrics for each Executor on task end.
> -
>
> Key: SPARK-34005
> URL: https://issues.apache.org/jira/browse/SPARK-34005
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> Like other peak memory metrics (e.g., per stage, or per executor within a 
> stage), it's better to also update the peak memory metrics for each Executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33919.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30937
[https://github.com/apache/spark/pull/30937]

> Unify v1 and v2 SHOW NAMESPACES tests
> -
>
> Key: SPARK-33919
> URL: https://issues.apache.org/jira/browse/SPARK-33919
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run 
> for v1 and v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33919:
---

Assignee: Maxim Gekk

> Unify v1 and v2 SHOW NAMESPACES tests
> -
>
> Key: SPARK-33919
> URL: https://issues.apache.org/jira/browse/SPARK-33919
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run 
> for v1 and v2 catalogs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33995) Make datetime addition easier for years, weeks, hours, minutes, and seconds

2021-01-04 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258718#comment-17258718
 ] 

Maxim Gekk commented on SPARK-33995:


> Option 1: Single make_interval function that takes 7 arguments

Small clarification: make_interval could have default values for all 7 
arguments, like PostgreSQL has; see 
[https://www.postgresql.org/docs/9.4/functions-datetime.html]

> As a user, Option 3 would be my preference.  
>col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember 
>and type.

I like this approach too
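For illustration only, a hedged sketch of what the Option 3 shape could look like as Column helpers. The method names addHours/addSeconds come from this thread and are not an existing Spark API; the interval-literal bodies are merely placeholders for whatever implementation ends up being chosen.

{code:java}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.expr

// Hypothetical helpers matching the col("first_datetime").addHours(2).addSeconds(30)
// style discussed above; not part of Spark. (In real code this would live inside
// a helper object.)
implicit class DatetimeColumnOps(c: Column) {
  def addHours(n: Int): Column   = c + expr(s"INTERVAL $n HOURS")
  def addSeconds(n: Int): Column = c + expr(s"INTERVAL $n SECONDS")
}

// Imagined usage:
// df.withColumn("later", col("first_datetime").addHours(2).addSeconds(30))
{code}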

> Make datetime addition easier for years, weeks, hours, minutes, and seconds
> ---
>
> Key: SPARK-33995
> URL: https://issues.apache.org/jira/browse/SPARK-33995
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Matthew Powers
>Priority: Minor
>
> There are add_months and date_add functions that make it easy to perform 
> datetime addition with months and days, but there isn't an easy way to 
> perform datetime addition with years, weeks, hours, minutes, or seconds with 
> the Scala/Python/R APIs.
> Users need to write code like expr("first_datetime + INTERVAL 2 hours") to 
> add two hours to a timestamp with the Scala API, which isn't desirable.  We 
> don't want to make Scala users manipulate SQL strings.
> We can expose the [make_interval SQL 
> function|https://github.com/apache/spark/pull/26446/files] to make any 
> combination of datetime addition possible.  That'll make tons of different 
> datetime addition operations possible and will be valuable for a wide array 
> of users.
> make_interval takes 7 arguments: years, months, weeks, days, hours, mins, and 
> secs.
> There are different ways to expose the make_interval functionality to 
> Scala/Python/R users:
>  * Option 1: Single make_interval function that takes 7 arguments
>  * Option 2: expose a few interval functions
>  ** make_date_interval function that takes years, months, days
>  ** make_time_interval function that takes hours, minutes, seconds
>  ** make_datetime_interval function that takes years, months, days, hours, 
> minutes, seconds
>  * Option 3: expose add_years, add_months, add_days, add_weeks, add_hours, 
> add_minutes, and add_seconds as Column methods.  
>  * Option 4: Expose the add_years, add_hours, etc. as column functions.  
> add_weeks and date_add have already been exposed in this manner.  
> Option 1 is nice from a maintenance perspective because it's a single function, 
> but it's not standard from a user perspective.  Most languages support 
> datetime instantiation with these arguments: years, months, days, hours, 
> minutes, seconds.  Mixing weeks into the equation is not standard.
> As a user, Option 3 would be my preference.  
> col("first_datetime").addHours(2).addSeconds(30) is easy for me to remember 
> and type.  col("first_datetime") + make_time_interval(lit(2), lit(0), 
> lit(30)) isn't as nice.  col("first_datetime") + make_interval(lit(0), 
> lit(0), lit(0), lit(0), lit(2), lit(0), lit(30)) is harder still.
> Any of these options is an improvement to the status quo.  Let me know what 
> option you think is best and then I'll make a PR to implement it, building 
> off of Max's foundational work of course ;)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258717#comment-17258717
 ] 

Jungtaek Lim commented on SPARK-33833:
--

That’s available, but with serious caution. Spark has to have full control of offset 
management and it shouldn’t be touched from outside in any way. Creating a unique 
group ID is a defensive approach here, preventing end users from messing things up by 
accident. Once end users set a static group ID, that guard is no longer in place.
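For reference, a minimal sketch of the configuration being discussed: pinning a static consumer group id on the Kafka source via the `kafka.group.id` option available since Spark 3.0 (broker address, topic, and group name below are placeholders). As noted above, Spark still owns offset management, so this should be used with care.

{code:java}
// Sketch only: a static group id bypasses the auto-generated, unique group id.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder address
  .option("subscribe", "events")                     // placeholder topic
  .option("kafka.group.id", "my-static-group")       // use with caution
  .load()
{code}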

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258716#comment-17258716
 ] 

L. C. Hsieh commented on SPARK-33833:
-

I read through the comments in the previous PR. The approach is pretty similar 
to what I did locally. So I guess that, if nothing changes, it won't be 
considered for the Spark codebase either.



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258707#comment-17258707
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Btw, thanks for providing the useful link to the previous ticket/PR, 
[~kabhwan].

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34004) Change FrameLessOffsetWindowFunction as sealed abstract class

2021-01-04 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-34004:
--

 Summary: Change FrameLessOffsetWindowFunction as sealed abstract 
class
 Key: SPARK-34004
 URL: https://issues.apache.org/jira/browse/SPARK-34004
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: jiaan.geng


Change FrameLessOffsetWindowFunction to a sealed abstract class in order to 
simplify pattern matching.
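As a generic illustration of the motivation (the names below are placeholders, not Spark's actual class hierarchy): a sealed abstract parent lets the compiler check pattern matches over its subclasses for exhaustiveness, so no catch-all case is needed.

{code:java}
sealed abstract class OffsetFn
case object LeadLike extends OffsetFn
case object LagLike  extends OffsetFn

def describe(f: OffsetFn): String = f match {
  case LeadLike => "offsets forwards"
  case LagLike  => "offsets backwards"
  // With "sealed", the compiler warns if a subclass is missed, which is what
  // simplifies pattern matching over these functions.
}
{code}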



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33935) Fix CBOs cost function

2021-01-04 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-33935.
--
Fix Version/s: 3.2.0
   3.1.0
 Assignee: Tanel Kiis
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/30965

> Fix CBOs cost function 
> ---
>
> Key: SPARK-33935
> URL: https://issues.apache.org/jira/browse/SPARK-33935
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
> Fix For: 3.1.0, 3.2.0
>
>
> The parameter spark.sql.cbo.joinReorder.card.weight is documented as:
> {code:title=spark.sql.cbo.joinReorder.card.weight}
> The weight of cardinality (number of rows) for plan cost comparison in join 
> reorder: rows * weight + size * (1 - weight).
> {code}
> But in the implementation the formula is a bit different:
> {code:title=Current implementation}
> def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>   if (other.planCost.card == 0 || other.planCost.size == 0) {
> false
>   } else {
> val relativeRows = BigDecimal(this.planCost.card) / 
> BigDecimal(other.planCost.card)
> val relativeSize = BigDecimal(this.planCost.size) / 
> BigDecimal(other.planCost.size)
> relativeRows * conf.joinReorderCardWeight +
>   relativeSize * (1 - conf.joinReorderCardWeight) < 1
>   }
> }
> {code}
> This change has an unfortunate consequence: 
> given two plans A and B, both A betterThan B and B betterThan A might give 
> the same result. This happens when one plan has many rows with small sizes and 
> the other has few rows with large sizes.
> Example values that exhibit this phenomenon with the default weight value (0.7):
> A.card = 500, B.card = 300
> A.size = 30, B.size = 80
> Both A betterThan B and B betterThan A would have a score above 1 and would 
> return false.
> A new implementation is proposed, that matches the documentation:
> {code:title=Proposed implementation}
> def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>   val oldCost = BigDecimal(this.planCost.card) * 
> conf.joinReorderCardWeight +
> BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight)
>   val newCost = BigDecimal(other.planCost.card) * 
> conf.joinReorderCardWeight +
> BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight)
>   newCost < oldCost
> }
> {code}
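To make the asymmetry concrete, here is a small standalone check (a sketch, not Spark code) that plugs the example numbers above into the current formula with weight = 0.7; both comparison scores land above 1, so both directions of betterThan return false.

{code:java}
val w = BigDecimal(0.7)
val (cardA, sizeA) = (BigDecimal(500), BigDecimal(30))
val (cardB, sizeB) = (BigDecimal(300), BigDecimal(80))

// A compared to B, then B compared to A, using the current formula.
val aVsB = (cardA / cardB) * w + (sizeA / sizeB) * (BigDecimal(1) - w)  // ~1.28
val bVsA = (cardB / cardA) * w + (sizeB / sizeA) * (BigDecimal(1) - w)  // ~1.22

println(aVsB < 1)  // false
println(bVsA < 1)  // false
{code}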



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258705#comment-17258705
 ] 

L. C. Hsieh commented on SPARK-33833:
-

I think SS allows users to specify a custom group id, doesn't it?

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34003) Rule conflicts between PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions

2021-01-04 Thread Kent Yao (Jira)
Kent Yao created SPARK-34003:


 Summary: Rule conflicts between 
PaddingAndLengthCheckForCharVarchar and ResolveAggregateFunctions
 Key: SPARK-34003
 URL: https://issues.apache.org/jira/browse/SPARK-34003
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao


ResolveAggregateFunctions is a hacky rule: it calls `executeSameContext` to 
generate a `resolved agg` and uses it to determine which unresolved sort attribute 
should be pushed into the agg. However, after adding the 
PaddingAndLengthCheckForCharVarchar rule, which rewrites the query output, 
the `resolved agg` can no longer match the original attributes.

This causes an unrelated sort attribute to be pushed in, and the query fails:


{code:java}
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: 
expression 'testcat.t1.`v`' is neither present in the group by, nor is it an 
aggregate function. Add to group by or wrap in first() (or first_value) if you 
don't care which value you get.;
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
sum(i)#11L, v#13 AS aggOrder#12]
[info] +- SubqueryAlias testcat.t1
[info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length 
, cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as 
string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]   +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
[info]
[info]   Project [v#14, sum(i)#11L]
[info]   +- Sort [aggOrder#12 ASC NULLS FIRST], true
[info]  +- !Aggregate [v#14], [v#14, sum(cast(i#7 as bigint)) AS 
sum(i)#11L, v#13 AS aggOrder#12]
[info] +- SubqueryAlias testcat.t1
[info]+- Project [if ((length(v#6) <= 3)) v#6 else if 
((length(rtrim(v#6, None)) > 3)) cast(raise_error(concat(input string of length 
, cast(length(v#6) as string),  exceeds varchar type length limitation: 3)) as 
string) else rpad(rtrim(v#6, None), 3,  ) AS v#14, i#7]
[info]   +- RelationV2[v#6, i#7, index#15, _partition#16] testcat.t1
{code}
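The plan above corresponds to an aggregate whose sort column is a char/varchar column that gets rewritten by the padding/length-check rule. A hedged repro sketch of that query shape follows (table and column names are taken from the plan; the catalog/provider setup for "testcat" and the exact DDL are assumptions made for illustration).

{code:java}
// Assumes a v2 catalog registered as "testcat"; DDL is illustrative only.
spark.sql("CREATE TABLE testcat.t1 (v VARCHAR(3), i INT)")
spark.sql("SELECT v, sum(i) FROM testcat.t1 GROUP BY v ORDER BY v").show()
{code}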




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33100) Support parse the sql statements with c-style comments

2021-01-04 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-33100.
--
Fix Version/s: 3.2.0
   3.1.0
 Assignee: feiwang  (was: Apache Spark)
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29982

> Support parse the sql statements with c-style comments
> --
>
> Key: SPARK-33100
> URL: https://issues.apache.org/jira/browse/SPARK-33100
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: feiwang
>Assignee: feiwang
>Priority: Minor
> Fix For: 3.1.0, 3.2.0
>
>
> Currently, spark-sql does not support parsing SQL statements that contain C-style 
> comments.
> For example, the SQL statements:
> {code:java}
> /* SELECT 'test'; */
> SELECT 'test';
> {code}
> would be split into two statements:
> The first: "/* SELECT 'test'"
> The second: "*/ SELECT 'test'"
> It would then throw an exception because the first one is illegal.
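For contrast, a sketch under the assumption that the DataFrame-side parser already accepts bracketed comments; the failure described here is specific to how the spark-sql CLI splits a script into statements on ';' inside /* ... */.

{code:java}
// Goes through spark.sql() as a single statement (no CLI statement splitting).
spark.sql("/* SELECT 'test'; */ SELECT 'test'").show()
{code}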



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258695#comment-17258695
 ] 

Jungtaek Lim commented on SPARK-33833:
--

For SS, the consumer group is randomly generated by intention, which is the actual 
obstacle to leveraging the offset information with the Kafka ecosystem.

SPARK-27549 was meant to address this, but it was unfortunately 
soft-rejected for inclusion in the Spark repository. Instead of pushing it further, I've 
just crafted the project in my own repository: 
https://github.com/HeartSaVioR/spark-sql-kafka-offset-committer

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258692#comment-17258692
 ] 

L. C. Hsieh edited comment on SPARK-33833 at 1/5/21, 6:27 AM:
--

Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After I checked the 
"__consumer_offsets" topic of Kafka, I found that whether or not Spark commits the 
progress to Kafka, the record of the consumer group of the Spark SS 
query is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer group 
info from this "__consumer_offsets" topic. So whether Spark commits or not, 
there will be a record for the consumer group; does that mean Burrow still 
works without Spark committing offset progress to Kafka?

If so, then Spark doesn't need any change for this ticket.




was (Author: viirya):
Hmm, I did a few test locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After I checked 
"__consumer_offsets" topic of Kafka, I found that no matter Spark commits the 
progress to Kafka or not, the record of the consumer group of the Spark SS 
query is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer groups 
info from this "__consumer_offsets" topic. So if either Spark commits or not, 
there will be a record about the consumer group, does it mean Burrow still 
works without Spark committing offset progress to Kafka?



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258693#comment-17258693
 ] 

L. C. Hsieh commented on SPARK-33833:
-

[~samdvr] Can you help elaborate on the question above? Thanks.

> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33833) Allow Spark Structured Streaming report Kafka Lag through Burrow

2021-01-04 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258692#comment-17258692
 ] 

L. C. Hsieh commented on SPARK-33833:
-

Hmm, I did a few tests locally. Does Burrow work only if Spark commits offset 
progress back to Kafka?

I added some code to commit offset progress to Kafka. After I checked the 
"__consumer_offsets" topic of Kafka, I found that whether or not Spark commits the 
progress to Kafka, the record of the consumer group of the Spark SS 
query is always in "__consumer_offsets".

Based on https://github.com/linkedin/Burrow/wiki, Burrow checks consumer group 
info from this "__consumer_offsets" topic. So whether Spark commits or not, 
there will be a record for the consumer group; does that mean Burrow still 
works without Spark committing offset progress to Kafka?



> Allow Spark Structured Streaming report Kafka Lag through Burrow
> 
>
> Key: SPARK-33833
> URL: https://issues.apache.org/jira/browse/SPARK-33833
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.1
>Reporter: Sam Davarnia
>Priority: Major
>
> Because Structured Streaming tracks Kafka offset consumption by itself, 
> it is not possible to track total Kafka lag using Burrow as can be done for 
> DStreams.
> We have used Stream hooks as mentioned 
> [here|https://medium.com/@ronbarabash/how-to-measure-consumer-lag-in-spark-structured-streaming-6c3645e45a37]
>  
> It would be great if Spark supports this feature out of the box.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction 
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:

Testing started at 12:58 AM ...Testing started at 12:58 AM ..."C:\Program 
Files\Java\jdk1.8.0_271\bin\java.exe" "-javaagent:C:\Program 
Files\JetBrains\IntelliJ IDEA 2020.2.3\lib\idea_rt.jar=56657:C:\Program 
Files\JetBrains\IntelliJ IDEA 2020.2.3\bin" -Dfile.encoding=UTF-8 -classpath 
"C:\Users\marhamil\AppData\Roaming\JetBrains\IntelliJIdea2020.2\plugins\Scala\lib\runners.jar;C:\Program
 Files\Java\jdk1.8.0_271\jre\lib\charsets.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\deploy.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\access-bridge-64.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\cldrdata.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\dnsns.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\jaccess.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\jfxrt.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\localedata.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\nashorn.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\sunec.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\sunjce_provider.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\sunmscapi.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\sunpkcs11.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\ext\zipfs.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\javaws.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\jce.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\jfr.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\jfxswt.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\jsse.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\management-agent.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\plugin.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\resources.jar;C:\Program 
Files\Java\jdk1.8.0_271\jre\lib\rt.jar;C:\code\mmlspark\target\scala-2.12\test-classes;C:\code\mmlspark\target\scala-2.12\classes;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\aopalliance\aopalliance\1.0\aopalliance-1.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\beust\jcommander\1.27\jcommander-1.27.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\chuusai\shapeless_2.12\2.3.3\shapeless_2.12-2.3.3.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\clearspring\analytics\stream\2.9.6\stream-2.9.6.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\esotericsoftware\kryo-shaded\4.0.2\kryo-shaded-4.0.2.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\fasterxml\jackson\core\jackson-annotations\2.10.0\jackson-annotations-2.10.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\fasterxml\jackson\core\jackson-core\2.10.0\jackson-core-2.10.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\fasterxml\jackson\core\jackson-databind\2.10.0\jackson-databind-2.10.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\fasterxml\jackson\module\jackson-module-paranamer\2.10.0\jackson-module-paranamer-2.10.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\fasterxml\jackson\module\jackson-module-scala_2.12\2.10.0\jackson-module-scala_2.12-2.10.0.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\github\fommil\netlib\core\1.1.2\core-1.1.2.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\github\luben\zstd-jni\1.4.4-3\zstd-jni-1.4.4-3.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\github\spotbugs\spotbugs-annotations\3.1.9\spotbugs-annotations-3.1.9.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\com\github\vowpalwabbit\vw-jni\8.8.1\vw-jni-8.8.1.jar;C:\Users\marhamil\AppData\Local\Coursier\cache\v1\https\repo1.maven.org\maven2\

[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction 
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:
{code:java}
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties21/01/05 00:52:57 INFO SparkContext: 
Running Spark version 3.0.121/01/05 00:52:57 WARN NativeCodeLoader: Unable to 
load native-hadoop library for your platform... using builtin-java classes 
where applicable21/01/05 00:52:57 INFO ResourceUtils: 
==21/01/05 00:52:57 
INFO ResourceUtils: Resources for spark.driver:
21/01/05 00:52:57 INFO ResourceUtils: 
==21/01/05 00:52:57 
INFO SparkContext: Submitted application: JsonOutputParserSuite21/01/05 
00:52:57 INFO SparkContext: Spark 
configuration:spark.app.name=JsonOutputParserSuitespark.driver.maxResultSize=6gspark.logConf=truespark.master=local[*]spark.sql.crossJoin.enabled=truespark.sql.shuffle.partitions=20spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse21/01/05
 00:52:58 INFO SecurityManager: Changing view acls to: marhamil21/01/05 
00:52:58 INFO SecurityManager: Changing modify acls to: marhamil21/01/05 
00:52:58 INFO SecurityManager: Changing view acls groups to: 21/01/05 00:52:58 
INFO SecurityManager: Changing modify acls groups to: 21/01/05 00:52:58 INFO 
SecurityManager: SecurityManager: authentication disabled; ui acls disabled; 
users  with view permissions: Set(marhamil); groups with view permissions: 
Set(); users  with modify permissions: Set(marhamil); groups with modify 
permissions: Set()21/01/05 00:52:58 INFO Utils: Successfully started service 
'sparkDriver' on port 52315.21/01/05 00:52:58 INFO SparkEnv: Registering 
MapOutputTracker21/01/05 00:52:58 INFO SparkEnv: Registering 
BlockManagerMaster21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology 
information21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: 
BlockManagerMasterEndpoint up21/01/05 00:52:58 INFO SparkEnv: Registering 
BlockManagerMasterHeartbeat21/01/05 00:52:58 INFO DiskBlockManager: Created 
local directory at 
C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c29171921/01/05
 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB21/01/05 
00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator21/01/05 00:52:59 
INFO Utils: Successfully started service 'SparkUI' on port 4040.21/01/05 
00:52:59 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://host.docker.internal:404021/01/05 00:52:59 INFO Executor: Starting 
executor ID driver on host host.docker.internal21/01/05 00:52:59 INFO Utils: 
Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 
52359.21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on 
host.docker.internal:5235921/01/05 00:52:59 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO 
BlockManagerMasterEndpoint: Registering block manager 
host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, 
host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMaster: 
Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, 
None)21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:53:00 WARN 
SharedState: Not allowing to set spark.sql.warehouse.dir or 
hive.metastore.warehouse.dir in SparkSession's options, it should be set 
statically for cross-session usagesFailed to execute user defined 
function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => 
struct)org.apache.spark.SparkException: Failed to execute user defined 
function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => 
struct) at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130) at 
org.apache.spark.sql.catalyst.expressions.Alias.eval(na

[jira] [Updated] (SPARK-34002) Broken UDF Encoding

2021-01-04 Thread Mark Hamilton (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Hamilton updated SPARK-34002:
--
Description: 
UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 

Error:
{code:java}
Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties21/01/05 00:52:57 INFO SparkContext: 
Running Spark version 3.0.121/01/05 00:52:57 WARN NativeCodeLoader: Unable to 
load native-hadoop library for your platform... using builtin-java classes 
where applicable21/01/05 00:52:57 INFO ResourceUtils: 
==21/01/05 00:52:57 
INFO ResourceUtils: Resources for spark.driver:
21/01/05 00:52:57 INFO ResourceUtils: 
==21/01/05 00:52:57 
INFO SparkContext: Submitted application: JsonOutputParserSuite21/01/05 
00:52:57 INFO SparkContext: Spark 
configuration:spark.app.name=JsonOutputParserSuitespark.driver.maxResultSize=6gspark.logConf=truespark.master=local[*]spark.sql.crossJoin.enabled=truespark.sql.shuffle.partitions=20spark.sql.warehouse.dir=file:/code/mmlspark/spark-warehouse21/01/05
 00:52:58 INFO SecurityManager: Changing view acls to: marhamil21/01/05 
00:52:58 INFO SecurityManager: Changing modify acls to: marhamil21/01/05 
00:52:58 INFO SecurityManager: Changing view acls groups to: 21/01/05 00:52:58 
INFO SecurityManager: Changing modify acls groups to: 21/01/05 00:52:58 INFO 
SecurityManager: SecurityManager: authentication disabled; ui acls disabled; 
users  with view permissions: Set(marhamil); groups with view permissions: 
Set(); users  with modify permissions: Set(marhamil); groups with modify 
permissions: Set()21/01/05 00:52:58 INFO Utils: Successfully started service 
'sparkDriver' on port 52315.21/01/05 00:52:58 INFO SparkEnv: Registering 
MapOutputTracker21/01/05 00:52:58 INFO SparkEnv: Registering 
BlockManagerMaster21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology 
information21/01/05 00:52:58 INFO BlockManagerMasterEndpoint: 
BlockManagerMasterEndpoint up21/01/05 00:52:58 INFO SparkEnv: Registering 
BlockManagerMasterHeartbeat21/01/05 00:52:58 INFO DiskBlockManager: Created 
local directory at 
C:\Users\marhamil\AppData\Local\Temp\blockmgr-9a5c80ef-ade6-41ac-9933-a26f6c29171921/01/05
 00:52:58 INFO MemoryStore: MemoryStore started with capacity 4.0 GiB21/01/05 
00:52:59 INFO SparkEnv: Registering OutputCommitCoordinator21/01/05 00:52:59 
INFO Utils: Successfully started service 'SparkUI' on port 4040.21/01/05 
00:52:59 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://host.docker.internal:404021/01/05 00:52:59 INFO Executor: Starting 
executor ID driver on host host.docker.internal21/01/05 00:52:59 INFO Utils: 
Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 
52359.21/01/05 00:52:59 INFO NettyBlockTransferService: Server created on 
host.docker.internal:5235921/01/05 00:52:59 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy21/01/05 00:52:59 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:52:59 INFO 
BlockManagerMasterEndpoint: Registering block manager 
host.docker.internal:52359 with 4.0 GiB RAM, BlockManagerId(driver, 
host.docker.internal, 52359, None)21/01/05 00:52:59 INFO BlockManagerMaster: 
Registered BlockManager BlockManagerId(driver, host.docker.internal, 52359, 
None)21/01/05 00:52:59 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, host.docker.internal, 52359, None)21/01/05 00:53:00 WARN 
SharedState: Not allowing to set spark.sql.warehouse.dir or 
hive.metastore.warehouse.dir in SparkSession's options, it should be set 
statically for cross-session usagesFailed to execute user defined 
function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => 
struct)org.apache.spark.SparkException: Failed to execute user defined 
function(JsonOutputParserSuite$$Lambda$574/51376124: (struct) => 
struct) at 
org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1130) at 
org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:156)
 at 
org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Int

[jira] [Updated] (SPARK-32085) Migrate to NumPy documentation style

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32085:
-
Fix Version/s: 3.1.0

> Migrate to NumPy documentation style
> 
>
> Key: SPARK-32085
> URL: https://issues.apache.org/jira/browse/SPARK-32085
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/numpy/numpydoc
> For example,
> Before: 
> https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318
>  After: 
> https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176
> We can start to switch incrementally.
> NOTE that this JIRA only targets switching the style. It does not aim to 
> add additional information or fixes at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32085) Migrate to NumPy documentation style

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32085.
--
Resolution: Done

> Migrate to NumPy documentation style
> 
>
> Key: SPARK-32085
> URL: https://issues.apache.org/jira/browse/SPARK-32085
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/numpy/numpydoc
> For example,
> Before: 
> https://github.com/apache/spark/blob/f0e6d0ec13d9cdadf341d1b976623345bcdb1028/python/pyspark/sql/dataframe.py#L276-L318
>  After: 
> https://github.com/databricks/koalas/blob/6711e9c0f50c79dd57eeedb530da6c4ea3298de2/databricks/koalas/frame.py#L1122-L1176
> We can start to switch incrementally.
> NOTE that this JIRA only targets switching the style. It does not aim to 
> add additional information or fixes at the same time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34002) Broken UDF behavior

2021-01-04 Thread Mark Hamilton (Jira)
Mark Hamilton created SPARK-34002:
-

 Summary: Broken UDF behavior
 Key: SPARK-34002
 URL: https://issues.apache.org/jira/browse/SPARK-34002
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
Reporter: Mark Hamilton


UDFs can behave differently depending on whether a dataframe is cached, despite the 
dataframe being otherwise identical.

 

Repro:

 
{code:java}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

case class Bar(a: Int)
 
import spark.implicits._

def f1(bar: Bar): Option[Bar] = {
 None
}

def f2(bar: Bar): Option[Bar] = {
 Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Commenting in the cache will make this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
 .withColumn("c1", udf1(col("c0")))
 .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33242) Install numpydoc in Jenkins machines

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33242:
-
Parent: (was: SPARK-32085)
Issue Type: Test  (was: Sub-task)

> Install numpydoc in Jenkins machines
> 
>
> Key: SPARK-33242
> URL: https://issues.apache.org/jira/browse/SPARK-33242
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Major
>
> To switch from reST style to numpydoc style, we should install numpydoc as 
> well. It is used by Sphinx. See the parent JIRA as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33992:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.1
>
>
> PaddingAndLengthCheckForCharVarchar could fail a query when going through 
> resolveOperatorsUpWithNewOutput, 
> failing with: 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34000:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.0.2, 3.1.1
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
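The trace above ends in mutable.HashMap.apply on a stage key that has already been cleaned up. As a generic illustration (not the actual patch), a defensive lookup avoids killing the listener when the key is gone:

{code:java}
import scala.collection.mutable

val stageAttemptToNumTasks = mutable.HashMap.empty[(Int, Int), Int]
val key = (600, 0)  // placeholder (stageId, stageAttemptId) taken from the log above

// get() returns an Option instead of throwing NoSuchElementException like apply().
stageAttemptToNumTasks.get(key) match {
  case Some(n) => println(s"tasks tracked for stage: $n")
  case None    => ()  // stage already removed; ignore instead of failing the listener
}
{code}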



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34000:
-

Assignee: Lantao Jin

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
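The failure mode is `mutable.HashMap.apply` being called for a stage/attempt key that the listener has already cleaned up. A minimal Scala sketch of that behaviour and of a tolerant lookup; the map name below is a stand-in, not the actual field in `ExecutorAllocationManager`:

{code:scala}
import scala.collection.mutable

// Stand-in for the listener's per-stage bookkeeping (illustrative name only).
val stageAttemptToNumTasks = mutable.HashMap[(Int, Int), Int]()

val key = (600, 0) // stage 600, attempt 0 -- already removed from the map

// mutable.HashMap.apply throws NoSuchElementException for an absent key,
// which is exactly what the trace above shows inside onTaskEnd:
// stageAttemptToNumTasks(key)   // java.util.NoSuchElementException: key not found

// A guarded lookup tolerates TaskEnd events that race with stage cleanup.
stageAttemptToNumTasks.get(key) match {
  case Some(n) => println(s"tasks tracked for $key: $n")
  case None    => () // stage already removed; ignore the late event
}
{code}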



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34000.
---
Fix Version/s: 3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 31025
[https://github.com/apache/spark/pull/31025]

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Lantao Jin
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33992:
---

Assignee: Kent Yao

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> PaddingAndLengthCheckForCharVarchar could fail a query when used with 
> resolveOperatorsUpWithNewOutput: 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}
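A minimal sketch of the wrapping the summary asks for, assuming the Catalyst helpers named in the trace (`AnalysisHelper.allowInvokingTransformsInAnalyzer`, `transformUp`) keep their current names and visibility:

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.{AnalysisHelper, LogicalPlan}

// Illustrative wrapper only, not the merged patch: running the rewrite inside
// allowInvokingTransformsInAnalyzer lifts the "should not be called in the
// analyzer" assertion that transformDown/transformUp raise from analysis rules.
def rewriteWithNewOutput(plan: LogicalPlan)(
    rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = {
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    plan.transformUp(rule)
  }
}
{code}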



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258676#comment-17258676
 ] 

Apache Spark commented on SPARK-34001:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31022

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33992) resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33992.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 31013
[https://github.com/apache/spark/pull/31013]

> resolveOperatorsUpWithNewOutput should wrap allowInvokingTransformsInAnalyzer
> -
>
> Key: SPARK-33992
> URL: https://issues.apache.org/jira/browse/SPARK-33992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> PaddingAndLengthCheckForCharVarchar could fail a query when used with 
> resolveOperatorsUpWithNewOutput: 
> {code:java}
> [info] - char/varchar resolution in sub query  *** FAILED *** (367 
> milliseconds)
> [info]   java.lang.RuntimeException: This method should not be called in the 
> analyzer
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
> [info]   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258675#comment-17258675
 ] 

Apache Spark commented on SPARK-34001:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31022

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-34001:
-

Assignee: Terry Kim

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-34001.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31022
[https://github.com/apache/spark/pull/31022]

> Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala
> --
>
> Key: SPARK-34001
> URL: https://issues.apache.org/jira/browse/SPARK-34001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
> removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33998.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31020
[https://github.com/apache/spark/pull/31020]

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> There are many v2 commands, such as "SHOW TABLES" and "DESCRIBE TABLE", that 
> require creating an InternalRow. That logic can be refactored into 
> v2CommandExec to remove the duplicated code for creating the serializer, etc.
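A minimal sketch of the kind of shared helper the summary describes; the method name and its placement are illustrative, not the actual v2CommandExec API:

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.unsafe.types.UTF8String

// Builds an InternalRow from plain Scala values; strings must be converted to
// UTF8String before they can be stored in an InternalRow.
def toCatalystRow(values: Any*): InternalRow =
  InternalRow.fromSeq(values.map {
    case s: String => UTF8String.fromString(s)
    case other     => other
  })

// e.g. one SHOW TABLES output row: (namespace, tableName, isTemporary)
val row = toCatalystRow("default", "people", false)
{code}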



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33998:
---

Assignee: Terry Kim

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> There are many v2 commands, such as "SHOW TABLES" and "DESCRIBE TABLE", that 
> require creating an InternalRow. That logic can be refactored into 
> v2CommandExec to remove the duplicated code for creating the serializer, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-34000:
---
Affects Version/s: 3.0.1

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34001) Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

2021-01-04 Thread Terry Kim (Jira)
Terry Kim created SPARK-34001:
-

 Summary: Remove unused runShowTablesSql() in 
DataSourceV2SQLSuite.scala
 Key: SPARK-34001
 URL: https://issues.apache.org/jira/browse/SPARK-34001
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Terry Kim


runShowTablesSql() in DataSourceV2SQLSuite.scala is no longer used and can be 
removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-34000:
---
Affects Version/s: (was: 3.0.1)

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258670#comment-17258670
 ] 

Dongjoon Hyun edited comment on SPARK-31786 at 1/5/21, 5:24 AM:


Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.

Yes, Spark 3.0 is better for the K8s environment, and Spark 3.1 is much better 
because of SPARK-33005 (`Kubernetes GA Preparation`). FYI, Apache Spark 3.1.0 
RC1 has already been created.
 - [https://github.com/apache/spark/tree/v3.1.0-rc1]

Apache Spark 3.1.0 will arrive this month.


was (Author: dongjoon):
Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.

Yes, Spark 3.0 is better for K8s environment and Spark 3.1 is much better 
because of SPARK-33005 . FYI, Apache Spark 3.1.0 RC1 is already created.

- https://github.com/apache/spark/tree/v3.1.0-rc1

Apache Spark 3.1.0 will arrive this month.

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at s

[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258670#comment-17258670
 ] 

Dongjoon Hyun commented on SPARK-31786:
---

Yes, you are correct.
 # `export` is only required for your machine.
 # `--conf` should be used for `driverEnv`.

Yes, Spark 3.0 is better for the K8s environment, and Spark 3.1 is much better 
because of SPARK-33005. FYI, Apache Spark 3.1.0 RC1 has already been created.

- https://github.com/apache/spark/tree/v3.1.0-rc1

Apache Spark 3.1.0 will arrive this month.
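For the second point, a minimal sketch of setting a driver-side environment variable through configuration instead of a local `export`; the variable name `FOO` and its value are illustrative:

{code:scala}
import org.apache.spark.SparkConf

// spark.kubernetes.driverEnv.[Name] injects Name into the driver pod's
// environment; spark.executorEnv.[Name] does the same for executor pods.
val conf = new SparkConf()
  .set("spark.kubernetes.driverEnv.FOO", "bar")
  .set("spark.executorEnv.FOO", "bar")
{code}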

> Exception on submitting Spark-Pi to Kubernetes 1.17.3
> -
>
> Key: SPARK-31786
> URL: https://issues.apache.org/jira/browse/SPARK-31786
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Hi,
> I'm getting an exception when submitting the Spark-Pi app to a Kubernetes cluster.
> Kubernetes version: 1.17.3
> JDK version: openjdk version "1.8.0_252"
> Exception:
> {code}
>  ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode 
> cluster --name spark-pi --conf 
> spark.kubernetes.container.image=spark-py:2.4.5 --conf 
> spark.kubernetes.executor.request.cores=0.1 --conf 
> spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf 
> spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py
> log4j:WARN No appenders could be found for logger 
> (io.fabric8.kubernetes.client.Config).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create]  
> for kind: [Pod]  with name: [null]  in namespace: [default]  failed.
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
> at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337)
> at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
> at 
> org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
> at 
> org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.net.SocketException: Broken pipe (Write failed)
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
> at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
> at 
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894)
> at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865)
> at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
> at okio.Okio$1.write(Okio.java:79)
> at okio.AsyncTimeout$1.

[jira] [Resolved] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33794.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30807
[https://github.com/apache/spark/pull/30807]

> next_day function should throw runtime exception when receiving invalid input 
> under ANSI mode
> -
>
> Key: SPARK-33794
> URL: https://issues.apache.org/jira/browse/SPARK-33794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Chongguang LIU
>Assignee: Chongguang LIU
>Priority: Major
> Fix For: 3.2.0
>
>
> Hello all,
> According to [ANSI 
> compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance],
>  the [next_day 
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095]
>  should throw a runtime exception when receiving an invalid value for 
> dayOfWeek, for example receiving "xx" instead of "SUNDAY".
>  
> A similar improvement has been done on the element_at function: 
> https://issues.apache.org/jira/browse/SPARK-33386
>  
> If you agree with this proposition, I can submit a pull request with 
> the necessary change.
>  
> Kind regards,
>  
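A minimal sketch of the behaviour being proposed, using only the public `next_day` function and the `spark.sql.ansi.enabled` flag; the exact error raised is up to the implementation:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.next_day

val spark = SparkSession.builder().master("local[1]").appName("next-day-ansi").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")
import spark.implicits._

val df = Seq("2021-01-04").toDF("d").selectExpr("CAST(d AS DATE) AS d")

// Valid day-of-week: returns the next Sunday after each date.
df.select(next_day($"d", "SUNDAY")).show()

// Invalid day-of-week: with ANSI mode on, this call should fail at runtime
// instead of silently producing null (the pre-SPARK-33794 behaviour).
// df.select(next_day($"d", "xx")).show()
{code}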



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33794) next_day function should throw runtime exception when receiving invalid input under ANSI mode

2021-01-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33794:
---

Assignee: Chongguang LIU

> next_day function should throw runtime exception when receiving invalid input 
> under ANSI mode
> -
>
> Key: SPARK-33794
> URL: https://issues.apache.org/jira/browse/SPARK-33794
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Chongguang LIU
>Assignee: Chongguang LIU
>Priority: Major
>
> Hello all,
> According to [ANSI 
> compliance|https://spark.apache.org/docs/3.0.0/sql-ref-ansi-compliance.html#ansi-compliance],
>  the [next_day 
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3095]
>  should throw a runtime exception when receiving an invalid value for 
> dayOfWeek, for example receiving "xx" instead of "SUNDAY".
>  
> A similar improvement has been done on the element_at function: 
> https://issues.apache.org/jira/browse/SPARK-33386
>  
> If you agree with this proposition, I can submit a pull request with 
> the necessary change.
>  
> Kind regards,
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13

2021-01-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258667#comment-17258667
 ] 

Dongjoon Hyun commented on SPARK-25075:
---

[~smarter]. Sorry, but unfortunately, from my assessment, the current status is 
a little different.

1. The Apache Spark community is not able to publish Scala 2.13-based Maven 
artifacts yet.

2. The Apache Spark community is not able to provide a Scala 2.13-based binary 
distribution yet.

3. As you can see on this JIRA, the target version is 3.2.0, not 3.1.0.

4. For Apache Spark 3.1.0, we already created RC1 without SPARK-33894, and 
SPARK-33894 is marked for Spark 3.1.1.
 * [https://github.com/apache/spark/releases/tag/v3.1.0-rc1]

Due to (1)-(4), Apache Spark 3.1.0 RC1 will have only Scala 2.12 libraries and 
binaries during the vote period.

Of course, I guess we will roll more RCs with more improvements; at least 
SPARK-33894 will be a part of 3.1.0. However, I don't think we can say Scala 
2.13 is supported without the official Scala 2.13 binaries and Scala 2.13 Maven 
artifacts. I guess you also agree that those are mandatory.

 

cc [~hyukjin.kwon] and [~srowen]

> Build and test Spark against Scala 2.13
> ---
>
> Key: SPARK-25075
> URL: https://issues.apache.org/jira/browse/SPARK-25075
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, MLlib, Project Infra, Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Guillaume Massé
>Priority: Major
>
> This umbrella JIRA tracks the requirements for building and testing Spark 
> against the current Scala 2.13 milestone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33950:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: correctness
> Fix For: 3.0.2, 3.2.0, 3.1.1
>
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33894) Word2VecSuite failed for Scala 2.13

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33894:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> Word2VecSuite failed for Scala 2.13
> ---
>
> Key: SPARK-33894
> URL: https://issues.apache.org/jira/browse/SPARK-33894
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 3.2.0
>Reporter: Darcy Shen
>Assignee: koert kuipers
>Priority: Major
> Fix For: 3.1.1
>
>
> This may be the first failed build:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-scala-2.13/52/
> h2. Possible Work Around Fix
> Move 
> case class Data(word: String, vector: Array[Float])
> out of the class Word2VecModel
> h2. Attempts to git bisect
> master branch git "bisect"
> cc23581e2645c91fa8d6e6c81dc87b4221718bb1 fail
> 3d0323401f7a3e4369a3d3f4ff98f15d19e8a643  fail
> 9d9d4a8e122cf1137edeca857e925f7e76c1ace2   fail
> f5d2165c95fe83f24be9841807613950c1d5d6d0 fail 2020-12-01
> h2. Attached Stack Trace
> To reproduce it in master:
> ./dev/change-scala-version.sh 2.13
> sbt -Pscala-2.13
> > project mllib
> > testOnly org.apache.spark.ml.feature.Word2VecSuite
> [info] Word2VecSuite:
> [info] - params (45 milliseconds)
> [info] - Word2Vec (5 seconds, 768 milliseconds)
> [info] - getVectors (549 milliseconds)
> [info] - findSynonyms (222 milliseconds)
> [info] - window size (382 milliseconds)
> [info] - Word2Vec read/write numPartitions calculation (1 millisecond)
> [info] - Word2Vec read/write (669 milliseconds)
> [info] - Word2VecModel read/write *** FAILED *** (423 milliseconds)
> [info]   org.apache.spark.SparkException: Job aborted.
> [info]   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
> [info]   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
> [info]   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
> [info]   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
> [info]   at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
> [info]   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> [info]   at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
> [info]   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
> [info]   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:874)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecModel$Word2VecModelWriter.saveImpl(Word2Vec.scala:368)
> [info]   at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
> [info]   at org.apache.spark.ml.util.MLWritable.save(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.util.MLWritable.save$(ReadWrite.scala:287)
> [info]   at org.apache.spark.ml.feature.Word2VecModel.save(Word2Vec.scala:207)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite(DefaultReadWriteTest.scala:51)
> [info]   at 
> org.apache.spark.ml.util.DefaultReadWriteTest.testDefaultReadWrite$(DefaultReadWriteTest.scala:42)
> [info]   at 
> org.apache.spark.ml.feature.Word2VecSuite.testDefaultReadWrite(Word2

[jira] [Updated] (SPARK-33980) invalidate char/varchar in spark.readStream.schema

2021-01-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33980:
--
Fix Version/s: (was: 3.1.0)
   3.1.1

> invalidate char/varchar in spark.readStream.schema
> --
>
> Key: SPARK-33980
> URL: https://issues.apache.org/jira/browse/SPARK-33980
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.1
>
>
> Invalidate char/varchar in spark.readStream.schema, just as we do for 
> spark.read.schema.
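A minimal sketch of the streaming call the summary wants rejected; the source format and path are illustrative only:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("varchar-check").getOrCreate()

// spark.read.schema already rejects char/varchar in a user-supplied schema;
// the same validation is being asked for on the streaming reader below.
val stream = spark.readStream
  .schema("id INT, name VARCHAR(10)") // should be invalidated, per this issue
  .format("parquet")
  .load("/tmp/events") // illustrative path
{code}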



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34000:


Assignee: (was: Apache Spark)

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-34000:


Assignee: Apache Spark

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Assignee: Apache Spark
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258640#comment-17258640
 ] 

Apache Spark commented on SPARK-34000:
--

User 'LantaoJin' has created a pull request for this issue:
https://github.com/apache/spark/pull/31025

> ExecutorAllocationListener threw an exception java.util.NoSuchElementException
> --
>
> Key: SPARK-34000
> URL: https://issues.apache.org/jira/browse/SPARK-34000
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1, 3.1.0, 3.2.0
>Reporter: Lantao Jin
>Priority: Major
>
> 21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 
> : Lost task 306.1 in stage 600.0 (TID 283610, 
> hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
> TaskKilled (another attempt succeeded)
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 
> : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
> re-executed (either because the task failed with a shuffle data fetch 
> failure, so the
> previous stage needs to be re-run, or because a different copy of the task 
> has already succeeded).
> 21/01/04 03:00:32,259 INFO [task-result-getter-2] 
> cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
> completed, from pool default
> 21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
> thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 
> 50 rows from offsets [5378600, 5378650) with 
> 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
> 21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
> scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
> exception
> java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
> at scala.collection.MapLike.default(MapLike.scala:235)
> at scala.collection.MapLike.default$(MapLike.scala:234)
> at scala.collection.AbstractMap.default(Map.scala:63)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
> at 
> org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
> at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
> at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
> at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
> at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
> at 
> org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
> at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33979:


Assignee: Apache Spark

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
> {noformat}
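As a rough illustration of the ordering quoted above, a rule of this kind can 
rank each conjunct of a filter condition and stable-sort the conjuncts so that 
cheaper predicates run first. The sketch below is hedged: it is written against 
Catalyst's expression classes but is not the actual Spark rule, the object and 
helper names are made up, and "Inset" is read as the InSet expression.

{code:scala}
import org.apache.spark.sql.catalyst.expressions._

object ReorderFilterPredicatesSketch {
  // Lower rank = evaluated earlier, mirroring
  // others < In < Like < UDF/CaseWhen/If < InSet < LikeAny/LikeAll.
  private def rank(e: Expression): Int = e match {
    case _: LikeAll | _: LikeAny           => 5
    case _: InSet                          => 4
    case _: ScalaUDF | _: CaseWhen | _: If => 3
    case _: Like                           => 2
    case _: In                             => 1
    case _                                 => 0
  }

  // Split an AND tree into its conjuncts.
  private def splitConjuncts(e: Expression): Seq[Expression] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  def reorder(condition: Expression): Expression = {
    // sortBy is stable, so predicates with the same rank keep their order.
    splitConjuncts(condition).sortBy(rank).reduce(And(_, _))
  }
}
{code}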



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33979:


Assignee: (was: Apache Spark)

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33979) Filter predicate reorder

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258639#comment-17258639
 ] 

Apache Spark commented on SPARK-33979:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/31024

> Filter predicate reorder
> 
>
> Key: SPARK-33979
> URL: https://issues.apache.org/jira/browse/SPARK-33979
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Reorder filter predicates to improve query performance:
> {noformat}
> others < In < Like < UDF/CaseWhen/If < Inset < LikeAny/LikeAll
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34000) ExecutorAllocationListener threw an exception java.util.NoSuchElementException

2021-01-04 Thread Lantao Jin (Jira)
Lantao Jin created SPARK-34000:
--

 Summary: ExecutorAllocationListener threw an exception 
java.util.NoSuchElementException
 Key: SPARK-34000
 URL: https://issues.apache.org/jira/browse/SPARK-34000
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1, 3.1.0, 3.2.0
Reporter: Lantao Jin


21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 : 
Lost task 306.1 in stage 600.0 (TID 283610, 
hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): 
TaskKilled (another attempt succeeded)
21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 : 
Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be 
re-executed (either because the task failed with a shuffle data fetch failure, 
so the
previous stage needs to be re-run, or because a different copy of the task has 
already succeeded).
21/01/04 03:00:32,259 INFO [task-result-getter-2] 
cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all 
completed, from pool default
21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] 
thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 50 
rows from offsets [5378600, 5378650) with 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] 
scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an 
exception
java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
at scala.collection.MapLike.default(MapLike.scala:235)
at scala.collection.MapLike.default$(MapLike.scala:234)
at scala.collection.AbstractMap.default(Map.scala:63)
at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
at 
org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
at 
org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
at 
org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
at 
org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
at 
org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at 
org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
at 
org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
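The stack trace above boils down to a bare HashMap.apply inside 
ExecutorAllocationListener.onTaskEnd: once the stage-attempt entry has already 
been removed (for example because the stage completed before a late task-end 
event for a killed speculative attempt was processed), the lookup throws. Below 
is a hedged sketch of the defensive pattern such a listener can use instead; 
the names are illustrative and this is not the actual Spark fix.

{code:scala}
import scala.collection.mutable

// Illustrative only: a listener that tolerates task-end events arriving after
// the stage entry has been cleaned up, instead of calling HashMap.apply.
case class StageAttempt(stageId: Int, attempt: Int)

class TolerantAllocationListener {
  private val runningTasksPerStage = mutable.HashMap.empty[StageAttempt, Int]

  def onTaskEnd(stageId: Int, attempt: Int): Unit = synchronized {
    val key = StageAttempt(stageId, attempt)
    runningTasksPerStage.get(key) match {
      case Some(n) if n > 1 => runningTasksPerStage(key) = n - 1
      case Some(_)          => runningTasksPerStage.remove(key)
      case None             => // stage already cleaned up (e.g. late TaskKilled); ignore
    }
  }
}
{code}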



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33999) Make sbt unidoc success with JDK11

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258634#comment-17258634
 ] 

Apache Spark commented on SPARK-33999:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/31023

> Make sbt unidoc success with JDK11
> --
>
> Key: SPARK-33999
> URL: https://issues.apache.org/jira/browse/SPARK-33999
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> With the current master, sbt unidoc fails because the generated Java sources 
> cause syntax errors.
> As of JDK 11, the default doclet seems to reject such syntax errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33999) Make sbt unidoc success with JDK11

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33999:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Make sbt unidoc success with JDK11
> --
>
> Key: SPARK-33999
> URL: https://issues.apache.org/jira/browse/SPARK-33999
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> With the current master, sbt unidoc fails because the generated Java sources 
> cause syntax errors.
> As of JDK 11, the default doclet seems to reject such syntax errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33999) Make sbt unidoc success with JDK11

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33999:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Make sbt unidoc success with JDK11
> --
>
> Key: SPARK-33999
> URL: https://issues.apache.org/jira/browse/SPARK-33999
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> With the current master, sbt unidoc fails because the generated Java sources 
> cause syntax errors.
> As of JDK 11, the default doclet seems to reject such syntax errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33999) Make sbt unidoc success with JDK11

2021-01-04 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33999:
--

 Summary: Make sbt unidoc success with JDK11
 Key: SPARK-33999
 URL: https://issues.apache.org/jira/browse/SPARK-33999
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


With the current master, sbt unidoc fails because the generated Java sources 
cause syntax errors.

As of JDK 11, the default doclet seems to reject such syntax errors.
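For readers hitting the same failure, the usual knobs live in the sbt-unidoc 
settings. The fragment below is only a hedged sketch and is not the change made 
for this ticket: it assumes sbt-unidoc 0.4.x plugin and key names, and the path 
used in the filter is purely illustrative.

{code:scala}
// build.sbt fragment; requires the sbt-unidoc plugin on the build classpath.
enablePlugins(ScalaUnidocPlugin, JavaUnidocPlugin)

// Illustrative filter: drop whichever generated Java sources the JDK 11
// doclet rejects from the unidoc input set (path below is made up).
val badJavadocSources = Seq("/target/java/org/apache/spark/some/generated/")

JavaUnidoc / unidoc / unidocAllSources := {
  (JavaUnidoc / unidoc / unidocAllSources).value.map { perProjectSources =>
    perProjectSources.filterNot(f =>
      badJavadocSources.exists(f.getCanonicalPath.contains))
  }
}

// Relax doclint so comment-level issues do not fail the build either.
JavaUnidoc / unidoc / javacOptions := Seq("-Xdoclint:none", "-quiet")
{code}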



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2021-01-04 Thread Rob Russo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Russo updated SPARK-16087:
--
Affects Version/s: 3.0.1

> Spark Hangs When Using Union With Persisted Hadoop RDD
> --
>
> Key: SPARK-16087
> URL: https://issues.apache.org/jira/browse/SPARK-16087
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.4.1, 1.6.1, 2.0.1, 3.0.1
>Reporter: Kevin Conaway
>Priority: Critical
>  Labels: bulk-closed
> Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot 
> 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, 
> part-0, part-1, spark-16087.tar.gz
>
>
> Spark hangs when materializing a persisted RDD that was built from a Hadoop 
> sequence file and then union-ed with a similar RDD.
> Below is a small file that exhibits the issue:
> {code:java}
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.serializer.KryoSerializer;
> import org.apache.spark.storage.StorageLevel;
> import scala.Tuple2;
> public class SparkBug {
> public static void main(String [] args) throws Exception {
> JavaSparkContext sc = new JavaSparkContext(
> new SparkConf()
> .set("spark.serializer", KryoSerializer.class.getName())
> .set("spark.master", "local[*]")
> .setAppName(SparkBug.class.getName())
> );
> JavaPairRDD<LongWritable, BytesWritable> rdd1 = sc.sequenceFile(
> "hdfs://localhost:9000/part-0",
> LongWritable.class,
> BytesWritable.class
> ).mapToPair(new PairFunction<Tuple2<LongWritable, BytesWritable>, 
> LongWritable, BytesWritable>() {
> @Override
> public Tuple2<LongWritable, BytesWritable> 
> call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
> return new Tuple2<>(
> new LongWritable(tuple._1.get()),
> new BytesWritable(tuple._2.copyBytes())
> );
> }
> }).persist(
> StorageLevel.MEMORY_ONLY()
> );
> System.out.println("Before union: " + rdd1.count());
> JavaPairRDD<LongWritable, BytesWritable> rdd2 = sc.sequenceFile(
> "hdfs://localhost:9000/part-1",
> LongWritable.class,
> BytesWritable.class
> );
> JavaPairRDD<LongWritable, BytesWritable> joined = rdd1.union(rdd2);
> System.out.println("After union: " + joined.count());
> }
> }
> {code}
> You'll need to upload the attached part-0 and part-1 to a local hdfs 
> instance (I'm just using a dummy [Single Node 
> Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html]
>  locally).
> Some things to note:
> - It does not hang if rdd1 is not persisted
> - It does not hang if rdd1 is not materialized (via calling rdd1.count()) 
> before the union-ed RDD is materialized
> - It does not hang if the mapToPair() transformation is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2021-01-04 Thread Rob Russo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258627#comment-17258627
 ] 

Rob Russo commented on SPARK-16087:
---

I know this ticket is old now, but Spark 3 seems to have resurfaced the issue. 
I had a suite of tests that worked fine on Spark 2.x, and I spent more than a 
month intermittently debugging why a number of them hung only on Spark 3. As 
[~kevinconaway] said in his comment, the bug may be one refactor away from 
resurfacing, and that appears to be what happened.

For anyone running into this issue, here is the resolution I finally pieced 
together from this ticket:

Based on [~kevinconaway]'s comment that setting _spark.driver.host=localhost_ 
forces the problem, I found that setting _spark.driver.host=127.0.0.1_ 
completely fixes it. Hopefully this helps anyone else who hits this.

Since the issue has resurfaced, I'm going to reopen the ticket and mark Spark 3 
as an affected version.
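A minimal sketch of that workaround, assuming the configuration is set 
programmatically (the same property can equally be passed on spark-submit with 
--conf spark.driver.host=127.0.0.1):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Pin the driver host to the loopback IP rather than the "localhost" name
// before the context is created, per the comment above.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("spark-16087-workaround")
  .set("spark.driver.host", "127.0.0.1")

val sc = new SparkContext(conf)
{code}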

> Spark Hangs When Using Union With Persisted Hadoop RDD
> --
>
> Key: SPARK-16087
> URL: https://issues.apache.org/jira/browse/SPARK-16087
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.4.1, 1.6.1, 2.0.1
>Reporter: Kevin Conaway
>Priority: Critical
>  Labels: bulk-closed
> Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot 
> 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, 
> part-0, part-1, spark-16087.tar.gz
>
>
> Spark hangs when materializing a persisted RDD that was built from a Hadoop 
> sequence file and then union-ed with a similar RDD.
> Below is a small file that exhibits the issue:
> {code:java}
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.serializer.KryoSerializer;
> import org.apache.spark.storage.StorageLevel;
> import scala.Tuple2;
> public class SparkBug {
> public static void main(String [] args) throws Exception {
> JavaSparkContext sc = new JavaSparkContext(
> new SparkConf()
> .set("spark.serializer", KryoSerializer.class.getName())
> .set("spark.master", "local[*]")
> .setAppName(SparkBug.class.getName())
> );
> JavaPairRDD<LongWritable, BytesWritable> rdd1 = sc.sequenceFile(
> "hdfs://localhost:9000/part-0",
> LongWritable.class,
> BytesWritable.class
> ).mapToPair(new PairFunction<Tuple2<LongWritable, BytesWritable>, 
> LongWritable, BytesWritable>() {
> @Override
> public Tuple2<LongWritable, BytesWritable> 
> call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
> return new Tuple2<>(
> new LongWritable(tuple._1.get()),
> new BytesWritable(tuple._2.copyBytes())
> );
> }
> }).persist(
> StorageLevel.MEMORY_ONLY()
> );
> System.out.println("Before union: " + rdd1.count());
> JavaPairRDD<LongWritable, BytesWritable> rdd2 = sc.sequenceFile(
> "hdfs://localhost:9000/part-1",
> LongWritable.class,
> BytesWritable.class
> );
> JavaPairRDD<LongWritable, BytesWritable> joined = rdd1.union(rdd2);
> System.out.println("After union: " + joined.count());
> }
> }
> {code}
> You'll need to upload the attached part-0 and part-1 to a local hdfs 
> instance (I'm just using a dummy [Single Node 
> Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html]
>  locally).
> Some things to note:
> - It does not hang if rdd1 is not persisted
> - It does not hang if rdd1 is not materialized (via calling rdd1.count()) 
> before the union-ed RDD is materialized
> - It does not hang if the mapToPair() transformation is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2021-01-04 Thread Rob Russo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Russo reopened SPARK-16087:
---

Reopening, as the hang occurred for us only after upgrading to Spark 3.x.

> Spark Hangs When Using Union With Persisted Hadoop RDD
> --
>
> Key: SPARK-16087
> URL: https://issues.apache.org/jira/browse/SPARK-16087
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.4.1, 1.6.1, 2.0.1
>Reporter: Kevin Conaway
>Priority: Critical
>  Labels: bulk-closed
> Attachments: SPARK-16087.dump.log, SPARK-16087.log, Screen Shot 
> 2016-06-21 at 4.27.26 PM.png, Screen Shot 2016-06-21 at 4.27.35 PM.png, 
> part-0, part-1, spark-16087.tar.gz
>
>
> Spark hangs when materializing a persisted RDD that was built from a Hadoop 
> sequence file and then union-ed with a similar RDD.
> Below is a small file that exhibits the issue:
> {code:java}
> import org.apache.hadoop.io.BytesWritable;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaPairRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.api.java.function.PairFunction;
> import org.apache.spark.serializer.KryoSerializer;
> import org.apache.spark.storage.StorageLevel;
> import scala.Tuple2;
> public class SparkBug {
> public static void main(String [] args) throws Exception {
> JavaSparkContext sc = new JavaSparkContext(
> new SparkConf()
> .set("spark.serializer", KryoSerializer.class.getName())
> .set("spark.master", "local[*]")
> .setAppName(SparkBug.class.getName())
> );
> JavaPairRDD<LongWritable, BytesWritable> rdd1 = sc.sequenceFile(
> "hdfs://localhost:9000/part-0",
> LongWritable.class,
> BytesWritable.class
> ).mapToPair(new PairFunction<Tuple2<LongWritable, BytesWritable>, 
> LongWritable, BytesWritable>() {
> @Override
> public Tuple2<LongWritable, BytesWritable> 
> call(Tuple2<LongWritable, BytesWritable> tuple) throws Exception {
> return new Tuple2<>(
> new LongWritable(tuple._1.get()),
> new BytesWritable(tuple._2.copyBytes())
> );
> }
> }).persist(
> StorageLevel.MEMORY_ONLY()
> );
> System.out.println("Before union: " + rdd1.count());
> JavaPairRDD<LongWritable, BytesWritable> rdd2 = sc.sequenceFile(
> "hdfs://localhost:9000/part-1",
> LongWritable.class,
> BytesWritable.class
> );
> JavaPairRDD<LongWritable, BytesWritable> joined = rdd1.union(rdd2);
> System.out.println("After union: " + joined.count());
> }
> }
> {code}
> You'll need to upload the attached part-0 and part-1 to a local hdfs 
> instance (I'm just using a dummy [Single Node 
> Cluster|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html]
>  locally).
> Some things to note:
> - It does not hang if rdd1 is not persisted
> - It does not hang if rdd1 is not materialized (via calling rdd1.count()) 
> before the union-ed RDD is materialized
> - It does not hang if the mapToPair() transformation is removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33964) Combine distinct unions in more cases

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33964.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30996
[https://github.com/apache/spark/pull/30996]

> Combine distinct unions in more cases
> -
>
> Key: SPARK-33964
> URL: https://issues.apache.org/jira/browse/SPARK-33964
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
> Fix For: 3.2.0
>
>
> In several TPCDS queries the CombineUnions rule does not manage to combine 
> unions, because they have noop Projects between them.
> The Projects will be removed by RemoveNoopOperators, but by then 
> ReplaceDistinctWithAggregate has been applied and there are aggregates 
> between the unions.
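For a concrete picture of the query shape involved: a chained distinct UNION is 
planned as Distinct(Union(...)), and Distinct is later rewritten into an 
Aggregate by ReplaceDistinctWithAggregate. The snippet below is only an 
illustration, not a query taken from the ticket; it assumes an existing 
SparkSession named spark and tables t1, t2, t3.

{code:scala}
// Chained distinct unions: if a no-op Project sits between the two Unions,
// CombineUnions may not flatten them into a single Union before each Distinct
// is turned into an Aggregate.
val df = spark.sql(
  """
    |SELECT id FROM t1
    |UNION
    |SELECT id FROM t2
    |UNION
    |SELECT id FROM t3
  """.stripMargin)

// Inspect the optimized plan to see whether a single Union remains.
df.explain(true)
{code}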



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33964) Combine distinct unions in more cases

2021-01-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33964:


Assignee: Tanel Kiis

> Combine distinct unions in more cases
> -
>
> Key: SPARK-33964
> URL: https://issues.apache.org/jira/browse/SPARK-33964
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tanel Kiis
>Assignee: Tanel Kiis
>Priority: Major
>
> In several TPCDS queries the CombineUnions rule does not manage to combine 
> unions, because they have noop Projects between them.
> The Projects will be removed by RemoveNoopOperators, but by then 
> ReplaceDistinctWithAggregate has been applied and there are aggregates 
> between the unions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33382) Unify v1 and v2 SHOW TABLES tests

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258617#comment-17258617
 ] 

Apache Spark commented on SPARK-33382:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31022

> Unify v1 and v2 SHOW TABLES tests
> -
>
> Key: SPARK-33382
> URL: https://issues.apache.org/jira/browse/SPARK-33382
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Gather the tests that are common to the DSv1 and DSv2 SHOW TABLES command 
> into a common test trait. Mix this trait into datasource-specific test suites.
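A hedged sketch of that layout, with made-up trait and suite names (the real 
suites added to Spark differ), just to show the "common trait mixed into 
per-datasource suites" shape:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

// Checks shared by every SHOW TABLES suite; the concrete suite supplies a
// session already pointed at the catalog under test.
trait ShowTablesSuiteBase extends AnyFunSuite {
  protected def spark: SparkSession

  test("SHOW TABLES lists a freshly created table") {
    spark.sql("CREATE TABLE show_tables_tbl (id INT) USING parquet")
    try {
      val names = spark.sql("SHOW TABLES")
        .select("tableName").collect().map(_.getString(0))
      assert(names.contains("show_tables_tbl"))
    } finally {
      spark.sql("DROP TABLE IF EXISTS show_tables_tbl")
    }
  }
}

// One concrete suite per datasource/catalog; only the session setup differs.
class SessionCatalogShowTablesSuite extends ShowTablesSuiteBase {
  override protected lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").getOrCreate()
}
{code}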



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33997) Running bin/spark-sql gives NoSuchMethodError

2021-01-04 Thread Ted Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-33997.

Resolution: Cannot Reproduce

Rebuilt Spark locally and the error was gone.

> Running bin/spark-sql gives NoSuchMethodError
> -
>
> Key: SPARK-33997
> URL: https://issues.apache.org/jira/browse/SPARK-33997
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> I ran 'mvn install -Phive -Phive-thriftserver -DskipTests'
> Running bin/spark-sql gives the following error:
> {code}
> 21/01/05 00:06:06 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.spark.sql.internal.SharedState$.loadHiveConfFile$default$3()Lscala/collection/Map;
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:136)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
> {code}
> Scala version 2.12.10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258606#comment-17258606
 ] 

Apache Spark commented on SPARK-33998:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31020

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Minor
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> need to create InternalRow instances. This creation can be refactored into 
> v2CommandExec to remove the duplicated code for creating serializers, etc.
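As a rough sketch of the kind of shared helper the description suggests (the 
trait and method names below are hypothetical, not the API the PR adds), the 
duplicated pattern is essentially "turn a handful of values into an 
InternalRow":

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper of the kind the description suggests hoisting into the
// base exec node: every row-returning command (SHOW TABLES, DESCRIBE TABLE,
// ...) builds InternalRows the same way, so the conversion can live in one
// place instead of being repeated in each command.
trait RowBuildingCommand {
  protected def toInternalRow(values: Any*): InternalRow =
    InternalRow.fromSeq(values.map {
      case s: String => UTF8String.fromString(s) // strings are UTF8String inside InternalRow
      case other     => other
    })
}

// e.g. a SHOW TABLES-like command would emit:
//   toInternalRow("default", "my_table", false)
{code}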



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33998:


Assignee: (was: Apache Spark)

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Minor
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> need to create InternalRow instances. This creation can be refactored into 
> v2CommandExec to remove the duplicated code for creating serializers, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258605#comment-17258605
 ] 

Apache Spark commented on SPARK-33998:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/31020

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Priority: Minor
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> need to create InternalRow instances. This creation can be refactored into 
> v2CommandExec to remove the duplicated code for creating serializers, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33998) Refactor v2CommandExec to provide an API to create an InternalRow

2021-01-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33998:


Assignee: Apache Spark

> Refactor v2CommandExec to provide an API to create an InternalRow
> -
>
> Key: SPARK-33998
> URL: https://issues.apache.org/jira/browse/SPARK-33998
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Minor
>
> There are many v2 commands such as "SHOW TABLES", "DESCRIBE TABLE", etc. that 
> need to create InternalRow instances. This creation can be refactored into 
> v2CommandExec to remove the duplicated code for creating serializers, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


