[jira] [Assigned] (SPARK-29082) Spark driver cannot start with only delegation tokens

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-29082:
--

Assignee: Marcelo Vanzin

> Spark driver cannot start with only delegation tokens
> -
>
> Key: SPARK-29082
> URL: https://issues.apache.org/jira/browse/SPARK-29082
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
>
> If you start a Spark application with just delegation tokens, it fails. For 
> example, from an Oozie launch, you see things like this (line numbers may be 
> different):
> {noformat}
> No child hadoop job is executed.
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:410)
> at 
> org.apache.oozie.action.hadoop.LauncherAM.access$300(LauncherAM.java:55)
> at 
> org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:223)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
> at 
> org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
> Caused by: org.apache.hadoop.security.KerberosAuthException: failure to 
> login: for principal: hrt_qa javax.security.auth.login.LoginException: Unable 
> to obtain password from user
> at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1847)
> at 
> org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:616)
> at 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager.doLogin(HadoopDelegationTokenManager.scala:276)
> at 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:140)
> at 
> org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:305)
> at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1057)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1178)
> at 
> org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1584)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:860)
> {noformat}
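For readers unfamiliar with this failure mode: the stack shows HadoopDelegationTokenManager.doLogin attempting a Kerberos ticket-cache login even though the process only carries delegation tokens, so the login fails with "Unable to obtain password from user". The sketch below is illustrative only and is not the actual patch from the linked PR; it uses the standard Hadoop UserGroupInformation API to show the idea of reusing the current user's credentials when no Kerberos TGT is available. The helper names are hypothetical.

{code:scala}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical helper: only attempt a fresh Kerberos login when real Kerberos
// credentials (keytab or ticket cache) are present; otherwise keep the current
// UGI, which already carries the delegation tokens shipped by the launcher.
def credentialsUser(): UserGroupInformation = {
  val current = UserGroupInformation.getCurrentUser
  if (current.hasKerberosCredentials) {
    UserGroupInformation.getLoginUser
  } else {
    current
  }
}

// Hypothetical usage: merge that user's tokens into the container credentials
// instead of unconditionally logging in from the ticket cache.
def addTokensTo(target: Credentials): Unit = {
  target.addAll(credentialsUser().getCredentials)
}
{code}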



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29082) Spark driver cannot start with only delegation tokens

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-29082.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25805
[https://github.com/apache/spark/pull/25805]

> Spark driver cannot start with only delegation tokens
> -
>
> Key: SPARK-29082
> URL: https://issues.apache.org/jira/browse/SPARK-29082
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> If you start a Spark application with just delegation tokens, it fails. For 
> example, from an Oozie launch, you see things like this (line numbers may be 
> different):
> {noformat}
> No child hadoop job is executed.
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:410)
> at 
> org.apache.oozie.action.hadoop.LauncherAM.access$300(LauncherAM.java:55)
> at 
> org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:223)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
> at 
> org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
> Caused by: org.apache.hadoop.security.KerberosAuthException: failure to 
> login: for principal: hrt_qa javax.security.auth.login.LoginException: Unable 
> to obtain password from user
> at 
> org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1847)
> at 
> org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:616)
> at 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager.doLogin(HadoopDelegationTokenManager.scala:276)
> at 
> org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:140)
> at 
> org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:305)
> at 
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1057)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
> at org.apache.spark.deploy.yarn.Client.run(Client.scala:1178)
> at 
> org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1584)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:860)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28091) Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28091:
--

Assignee: Luca Canali

> Extend Spark metrics system with user-defined metrics using executor plugins
> 
>
> Key: SPARK-28091
> URL: https://issues.apache.org/jira/browse/SPARK-28091
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>
> This proposes to improve Spark instrumentation by adding a hook for
> user-defined metrics, extending Spark’s Dropwizard/Codahale metrics system.
> The original motivation for this work was to add instrumentation for S3
> filesystem access metrics from Spark jobs. Currently, [[ExecutorSource]]
> instruments HDFS and local filesystem metrics. Rather than extending the code
> there, this JIRA proposes to add a metrics plugin system that is more
> flexible and of more general use.
> Context: The Spark metrics system provides a large variety of metrics useful
> to monitor and troubleshoot Spark workloads. A typical workflow is to sink
> the metrics to a storage system and build dashboards on top of that.
> Highlights:
>  * The metrics plugin system makes it easy to implement instrumentation for
> S3 access by Spark jobs.
>  * The metrics plugin system allows easy extension of how Spark collects
> HDFS-related workload metrics. This is currently done using the Hadoop
> FileSystem GetAllStatistics method, which is deprecated in recent versions of
> Hadoop. Recent versions of Hadoop FileSystem recommend using
> GetGlobalStorageStatistics instead, which also provides several additional
> metrics. GetGlobalStorageStatistics is not available in Hadoop 2.7 (it was
> introduced in Hadoop 2.8). A metrics plugin for Spark would offer an easy way
> to “opt in” to such new API calls for those deploying suitable Hadoop
> versions.
>  * We also have the use case of adding Hadoop filesystem monitoring for a
> custom Hadoop-compliant filesystem in use in our organization (EOS using the
> XRootD protocol). The metrics plugin infrastructure makes this easy to do.
> Others may have similar use cases.
>  * More generally, this method makes it straightforward to plug Filesystem
> and other metrics into the Spark monitoring system. Future work on plugin
> implementations can extend monitoring to measure usage of external resources
> (OS, filesystem, network, accelerator cards, etc.) that might not normally be
> considered general enough for inclusion in the Apache Spark code base, but
> that can nevertheless be useful for specialized use cases, tests, or
> troubleshooting.
> Implementation:
> The proposed implementation builds on top of the Executor Plugin work of
> SPARK-24918 and on recent work extending Spark executor metrics, such as
> SPARK-25228.
> Tests and examples:
> So far this has been manually tested by running Spark on YARN and K8S
> clusters, in particular for monitoring S3 and for extending HDFS
> instrumentation with the Hadoop FileSystem “GetGlobalStorageStatistics”
> metrics. An executor metrics plugin example and the code used for testing are
> available.
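To make the idea concrete, here is a minimal sketch of the kind of metric such a plugin could expose. Only the Dropwizard/Codahale calls (MetricRegistry, Gauge) are real API; the plugin class, the registration hook that hands it a MetricRegistry, and the recordRead call site are assumptions for illustration, not the interface added by this ticket.

{code:scala}
import java.util.concurrent.atomic.AtomicLong
import com.codahale.metrics.{Gauge, MetricRegistry}

// Hypothetical executor-side plugin exposing an S3 read-bytes gauge.
class S3AccessMetricsPlugin {
  private val bytesRead = new AtomicLong(0L)

  // Assumed hook: the executor plugin machinery passes in a MetricRegistry.
  def registerMetrics(registry: MetricRegistry): Unit = {
    registry.register("filesystem.s3.read_bytes", new Gauge[Long] {
      override def getValue: Long = bytesRead.get()
    })
  }

  // Assumed call site: invoked by an instrumented filesystem wrapper.
  def recordRead(nBytes: Long): Unit = { bytesRead.addAndGet(nBytes); () }
}
{code}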



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28091) Extend Spark metrics system with user-defined metrics using executor plugins

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28091.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 24901
[https://github.com/apache/spark/pull/24901]

> Extend Spark metrics system with user-defined metrics using executor plugins
> 
>
> Key: SPARK-28091
> URL: https://issues.apache.org/jira/browse/SPARK-28091
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.0.0
>
>
> This proposes to improve Spark instrumentation by adding a hook for
> user-defined metrics, extending Spark’s Dropwizard/Codahale metrics system.
> The original motivation for this work was to add instrumentation for S3
> filesystem access metrics from Spark jobs. Currently, [[ExecutorSource]]
> instruments HDFS and local filesystem metrics. Rather than extending the code
> there, this JIRA proposes to add a metrics plugin system that is more
> flexible and of more general use.
> Context: The Spark metrics system provides a large variety of metrics useful
> to monitor and troubleshoot Spark workloads. A typical workflow is to sink
> the metrics to a storage system and build dashboards on top of that.
> Highlights:
>  * The metrics plugin system makes it easy to implement instrumentation for
> S3 access by Spark jobs.
>  * The metrics plugin system allows easy extension of how Spark collects
> HDFS-related workload metrics. This is currently done using the Hadoop
> FileSystem GetAllStatistics method, which is deprecated in recent versions of
> Hadoop. Recent versions of Hadoop FileSystem recommend using
> GetGlobalStorageStatistics instead, which also provides several additional
> metrics. GetGlobalStorageStatistics is not available in Hadoop 2.7 (it was
> introduced in Hadoop 2.8). A metrics plugin for Spark would offer an easy way
> to “opt in” to such new API calls for those deploying suitable Hadoop
> versions.
>  * We also have the use case of adding Hadoop filesystem monitoring for a
> custom Hadoop-compliant filesystem in use in our organization (EOS using the
> XRootD protocol). The metrics plugin infrastructure makes this easy to do.
> Others may have similar use cases.
>  * More generally, this method makes it straightforward to plug Filesystem
> and other metrics into the Spark monitoring system. Future work on plugin
> implementations can extend monitoring to measure usage of external resources
> (OS, filesystem, network, accelerator cards, etc.) that might not normally be
> considered general enough for inclusion in the Apache Spark code base, but
> that can nevertheless be useful for specialized use cases, tests, or
> troubleshooting.
> Implementation:
> The proposed implementation builds on top of the Executor Plugin work of
> SPARK-24918 and on recent work extending Spark executor metrics, such as
> SPARK-25228.
> Tests and examples:
> So far this has been manually tested by running Spark on YARN and K8S
> clusters, in particular for monitoring S3 and for extending HDFS
> instrumentation with the Hadoop FileSystem “GetGlobalStorageStatistics”
> metrics. An executor metrics plugin example and the code used for testing are
> available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29105) SHS may delete driver log file of in progress application

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-29105:
--

Assignee: Marcelo Vanzin

> SHS may delete driver log file of in progress application
> -
>
> Key: SPARK-29105
> URL: https://issues.apache.org/jira/browse/SPARK-29105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
>
> There's an issue with how the SHS cleans driver logs that is similar to the 
> problem of event logs: because the file size is not updated when you write to 
> it, the SHS fails to detect activity and thus may delete the file while it's 
> still being written to.
> SPARK-24787 added a workaround in the SHS so that it can detect that 
> situation for in-progress apps, replacing the previous solution which was too 
> slow for event logs.
> But that doesn't work for driver logs because they do not follow the same 
> pattern (different file names for in-progress files), and thus would require 
> the SHS to open the driver log files on every scan, which is expensive.
> The old approach (using the {{hsync}} API) seems to be a good match for the 
> driver logs, though, which don't slow down the listener bus like event logs 
> do.
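As background on the {{hsync}} approach mentioned above, here is an illustrative sketch of a writer that syncs after each line; the path and per-line sync cadence are assumptions, not the actual driver-log writer. The point is that syncing pushes the written data out so a scanner has a better signal of activity than a stale file size alone.

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative only: path, filesystem, and sync frequency are assumptions.
val fs = FileSystem.get(new Configuration())
val out = fs.create(new Path("/spark-driver-logs/app-1234.log"))

def appendLine(line: String): Unit = {
  out.write((line + "\n").getBytes("UTF-8"))
  // hsync flushes the data out to the datanodes; on HDFS, syncing (optionally
  // with an UPDATE_LENGTH-style flag) is what lets readers such as the SHS
  // cleaner observe that the file is still growing.
  out.hsync()
}
{code}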



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29105) SHS may delete driver log file of in progress application

2019-09-18 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-29105.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25819
[https://github.com/apache/spark/pull/25819]

> SHS may delete driver log file of in progress application
> -
>
> Key: SPARK-29105
> URL: https://issues.apache.org/jira/browse/SPARK-29105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> There's an issue with how the SHS cleans driver logs that is similar to the 
> problem of event logs: because the file size is not updated when you write to 
> it, the SHS fails to detect activity and thus may delete the file while it's 
> still being written to.
> SPARK-24787 added a workaround in the SHS so that it can detect that 
> situation for in-progress apps, replacing the previous solution which was too 
> slow for event logs.
> But that doesn't work for driver logs because they do not follow the same 
> pattern (different file names for in-progress files), and thus would require 
> the SHS to open the driver log files on every scan, which is expensive.
> The old approach (using the {{hsync}} API) seems to be a good match for the 
> driver logs, though, which don't slow down the listener bus like event logs 
> do.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-17 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-29027.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25803
[https://github.com/apache/spark/pull/25803]

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 3.0.0
>
>
> I am seeing consistent failures of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB 

[jira] [Assigned] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-17 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-29027:
--

Assignee: Gabor Somogyi

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Assignee: Gabor Somogyi
>Priority: Minor
>
> I am seeing consistent failures of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  5.456 
> s]
> [INFO] Spark Project Networking ... SUCCESS [ 49.819 
> s]
> 

[jira] [Created] (SPARK-29105) SHS may delete driver log file of in progress application

2019-09-16 Thread Marcelo Vanzin (Jira)
Marcelo Vanzin created SPARK-29105:
--

 Summary: SHS may delete driver log file of in progress application
 Key: SPARK-29105
 URL: https://issues.apache.org/jira/browse/SPARK-29105
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


There's an issue with how the SHS cleans driver logs that is similar to the 
problem of event logs: because the file size is not updated when you write to 
it, the SHS fails to detect activity and thus may delete the file while it's 
still being written to.

SPARK-24787 added a workaround in the SHS so that it can detect that situation 
for in-progress apps, replacing the previous solution which was too slow for 
event logs.

But that doesn't work for driver logs because they do not follow the same 
pattern (different file names for in-progress files), and thus would require 
the SHS to open the driver log files on every scan, which is expensive.

The old approach (using the {{hsync}} API) seems to be a good match for the 
driver logs, though, which don't slow down the listener bus like event logs do.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26929) table owner should use user instead of principal when use kerberos

2019-09-16 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26929:
--

Assignee: hong dongdong

> table owner should use user instead of principal when use kerberos
> --
>
> Key: SPARK-26929
> URL: https://issues.apache.org/jira/browse/SPARK-26929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
>
> In a Kerberos cluster, when spark-sql or beeline is used to create a table,
> the owner is recorded as the full principal. The issue was fixed in
> SPARK-19970 but reintroduced by SPARK-22846, so it occurs again. It causes
> problems when using roles, and this time both issues should be resolved
> together.
> Use org.apache.hadoop.hive.shims.Utils.getUGI directly and take
> ugi.getShortUserName, instead of using conf.getUser, which returns the full
> principal.
> Code change:
> {code:java}
> import java.io.IOException
> import javax.security.auth.login.LoginException
> import org.apache.hadoop.hive.shims.Utils
>
> // Record the short user name (e.g. "spark") rather than the full
> // principal (e.g. "user@REALM") as the table owner.
> private val userName: String = try {
>   Utils.getUGI.getShortUserName
> } catch {
>   case e: LoginException => throw new IOException(e)
> }
> {code}
> Before:
> {code}
> scala> sql("create table t(a int)").show
> scala> sql("desc formatted t").show(false)
> ...
> |Owner:|sp...@example.com| |
> {code}
> After:
> {code}
> scala> sql("create table t(a int)").show
> scala> sql("desc formatted t").show(false)
> ...
> |Owner:|spark| |
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26929) table owner should use user instead of principal when use kerberos

2019-09-16 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26929.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 23952
[https://github.com/apache/spark/pull/23952]

> table owner should use user instead of principal when use kerberos
> --
>
> Key: SPARK-26929
> URL: https://issues.apache.org/jira/browse/SPARK-26929
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0
>Reporter: hong dongdong
>Assignee: hong dongdong
>Priority: Major
> Fix For: 3.0.0
>
>
> In a Kerberos cluster, when spark-sql or beeline is used to create a table,
> the owner is recorded as the full principal. The issue was fixed in
> SPARK-19970 but reintroduced by SPARK-22846, so it occurs again. It causes
> problems when using roles, and this time both issues should be resolved
> together.
> Use org.apache.hadoop.hive.shims.Utils.getUGI directly and take
> ugi.getShortUserName, instead of using conf.getUser, which returns the full
> principal.
> Code change:
> {code:java}
> import java.io.IOException
> import javax.security.auth.login.LoginException
> import org.apache.hadoop.hive.shims.Utils
>
> // Record the short user name (e.g. "spark") rather than the full
> // principal (e.g. "user@REALM") as the table owner.
> private val userName: String = try {
>   Utils.getUGI.getShortUserName
> } catch {
>   case e: LoginException => throw new IOException(e)
> }
> {code}
> Before:
> {code}
> scala> sql("create table t(a int)").show
> scala> sql("desc formatted t").show(false)
> ...
> |Owner:|sp...@example.com| |
> {code}
> After:
> {code}
> scala> sql("create table t(a int)").show
> scala> sql("desc formatted t").show(false)
> ...
> |Owner:|spark| |
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29082) Spark driver cannot start with only delegation tokens

2019-09-13 Thread Marcelo Vanzin (Jira)
Marcelo Vanzin created SPARK-29082:
--

 Summary: Spark driver cannot start with only delegation tokens
 Key: SPARK-29082
 URL: https://issues.apache.org/jira/browse/SPARK-29082
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


If you start a Spark application with just delegation tokens, it fails. For 
example, from an Oozie launch, you see things like this (line numbers may be 
different):

{noformat}
No child hadoop job is executed.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:410)
at 
org.apache.oozie.action.hadoop.LauncherAM.access$300(LauncherAM.java:55)
at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:223)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:217)
at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:141)
Caused by: org.apache.hadoop.security.KerberosAuthException: failure to login: 
for principal: hrt_qa javax.security.auth.login.LoginException: Unable to 
obtain password from user

at 
org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1847)
at 
org.apache.hadoop.security.UserGroupInformation.getUGIFromTicketCache(UserGroupInformation.java:616)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.doLogin(HadoopDelegationTokenManager.scala:276)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:140)
at 
org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:305)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1057)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:179)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1178)
at 
org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1584)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:860)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-24663:
--

Assignee: Jungtaek Lim

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-24663.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25725
[https://github.com/apache/spark/pull/25725]

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29007) Possible leak of SparkContext in tests / test suites initializing StreamingContext

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-29007.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Possible leak of SparkContext in tests / test suites initializing 
> StreamingContext
> --
>
> Key: SPARK-29007
> URL: https://issues.apache.org/jira/browse/SPARK-29007
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, MLlib, Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> There are lots of tests that create a StreamingContext, which creates a new
> SparkContext in its constructor, and we don't have enough guards to prevent
> leakage of the SparkContext in test suites. Ideally we should ensure the
> SparkContext is not leaked between test suites, and even between tests if
> each test creates a StreamingContext.
>  
> One example of the leakage is below:
> {noformat}
> [info] *** 4 SUITES ABORTED ***
> [info] *** 131 TESTS FAILED ***
> [error] Error: Total 418, Failed 131, Errors 4, Passed 283, Ignored 1
> [error] Failed tests:
> [error]   org.apache.spark.streaming.scheduler.JobGeneratorSuite
> [error]   org.apache.spark.streaming.ReceiverInputDStreamSuite
> [error]   org.apache.spark.streaming.WindowOperationsSuite
> [error]   org.apache.spark.streaming.StreamingContextSuite
> [error]   org.apache.spark.streaming.scheduler.ReceiverTrackerSuite
> [error]   org.apache.spark.streaming.CheckpointSuite
> [error]   org.apache.spark.streaming.UISeleniumSuite
> [error]   
> org.apache.spark.streaming.scheduler.ExecutorAllocationManagerSuite
> [error]   org.apache.spark.streaming.ReceiverSuite
> [error]   org.apache.spark.streaming.BasicOperationsSuite
> [error]   org.apache.spark.streaming.InputStreamsSuite
> [error] Error during tests:
> [error]   org.apache.spark.streaming.MapWithStateSuite
> [error]   org.apache.spark.streaming.DStreamScopeSuite
> [error]   org.apache.spark.streaming.rdd.MapWithStateRDDSuite
> [error]   org.apache.spark.streaming.scheduler.InputInfoTrackerSuite
>  {noformat}
> {noformat}
> [info] JobGeneratorSuite:
> [info] - SPARK-6222: Do not clear received block data too soon *** FAILED *** 
> (2 milliseconds)
> [info]   org.apache.spark.SparkException: Only one SparkContext should be 
> running in this JVM (see SPARK-2243).The currently running SparkContext was 
> created at:
> [info] org.apache.spark.SparkContext.(SparkContext.scala:82)
> [info] 
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851)
> [info] 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85)
> [info] 
> org.apache.spark.streaming.TestSuiteBase.setupStreams(TestSuiteBase.scala:317)
> [info] 
> org.apache.spark.streaming.TestSuiteBase.setupStreams$(TestSuiteBase.scala:311)
> [info] 
> org.apache.spark.streaming.CheckpointSuite.setupStreams(CheckpointSuite.scala:209)
> [info] 
> org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:258)
> [info] scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info] org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
> [info] org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> [info] org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info]   at 
> org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2512)
> [info]   at scala.Option.foreach(Option.scala:274)
> [info]   at 
> org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2509)
> [info]   at 
> org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2586)
> [info]   at org.apache.spark.SparkContext.(SparkContext.scala:87)
> [info]   at 
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851)
> [info]   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85)
> [info]   at 
> 

[jira] [Resolved] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26989.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25706
[https://github.com/apache/spark/pull/25706]

> Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage 
> attempt don't trigger multiple stage retries
> ---
>
> Key: SPARK-26989
> URL: https://issues.apache.org/jira/browse/SPARK-26989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/
> {noformat}
> org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the 
> same stage attempt don't trigger multiple stage retries
> Error Message
> org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal 
> List(0)
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> ArrayBuffer() did not equal List(0)
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122)
> {noformat}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull
> {code}
> - Barrier task failures from the same stage attempt don't trigger multiple 
> stage retries *** FAILED ***
>   ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26989:
--

Assignee: Jungtaek Lim

> Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage 
> attempt don't trigger multiple stage retries
> ---
>
> Key: SPARK-26989
> URL: https://issues.apache.org/jira/browse/SPARK-26989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/
> {noformat}
> org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the 
> same stage attempt don't trigger multiple stage retries
> Error Message
> org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal 
> List(0)
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> ArrayBuffer() did not equal List(0)
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122)
> {noformat}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull
> {code}
> - Barrier task failures from the same stage attempt don't trigger multiple 
> stage retries *** FAILED ***
>   ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28570) Shuffle Storage API: Use writer API in UnsafeShuffleWriter

2019-09-10 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28570.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25304
[https://github.com/apache/spark/pull/25304]

> Shuffle Storage API: Use writer API in UnsafeShuffleWriter
> --
>
> Key: SPARK-28570
> URL: https://issues.apache.org/jira/browse/SPARK-28570
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Use the APIs introduced in SPARK-28209 in the UnsafeShuffleWriter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28570) Shuffle Storage API: Use writer API in UnsafeShuffleWriter

2019-09-10 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28570:
--

Assignee: Matt Cheah

> Shuffle Storage API: Use writer API in UnsafeShuffleWriter
> --
>
> Key: SPARK-28570
> URL: https://issues.apache.org/jira/browse/SPARK-28570
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
>
> Use the APIs introduced in SPARK-28209 in the UnsafeShuffleWriter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28928) Use Kafka delegation token protocol on sources/sinks

2019-09-09 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28928:
--

Assignee: Gabor Somogyi

> Use Kafka delegation token protocol on sources/sinks
> 
>
> Key: SPARK-28928
> URL: https://issues.apache.org/jira/browse/SPARK-28928
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> At the moment there are three places where the communication protocol with a
> Kafka cluster has to be configured:
>  * On the delegation token
>  * On the source
>  * On the sink
> Most of the time users use the same protocol in all of these places (within
> one Kafka cluster). It would be better to declare it in one place (the
> delegation token side) and let Kafka sources/sinks take that config over.
>  
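For context, here is a sketch of the three configuration points listed above as they look when spelled out explicitly. The cluster identifier, broker address, topics, and checkpoint path are placeholders; the spark.kafka.clusters.* keys are the per-cluster delegation-token settings documented for Spark 3.0, and the kafka.-prefixed options are the usual source/sink options, so treat the exact values as illustrative.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-protocol-example")
  // 1) Delegation token side: per-cluster token provider configuration.
  .config("spark.kafka.clusters.cluster1.auth.bootstrap.servers", "broker:9093")
  .config("spark.kafka.clusters.cluster1.security.protocol", "SASL_SSL")
  .getOrCreate()

// 2) Source side: the same protocol repeated as a reader option.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("subscribe", "input-topic")
  .load()

// 3) Sink side: repeated once more on the writer.
val query = input.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/tmp/kafka-example-checkpoint")
  .start()
{code}

With the change described in this ticket, the protocol declared on the delegation token side can be taken over by the sources and sinks, removing the duplicated kafka.security.protocol options.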



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28928) Use Kafka delegation token protocol on sources/sinks

2019-09-09 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28928.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25631
[https://github.com/apache/spark/pull/25631]

> Use Kafka delegation token protocol on sources/sinks
> 
>
> Key: SPARK-28928
> URL: https://issues.apache.org/jira/browse/SPARK-28928
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> At the moment there are three places where the communication protocol with a
> Kafka cluster has to be configured:
>  * On the delegation token
>  * On the source
>  * On the sink
> Most of the time users use the same protocol in all of these places (within
> one Kafka cluster). It would be better to declare it in one place (the
> delegation token side) and let Kafka sources/sinks take that config over.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28214) Flaky test: org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints + dstream graph checkpoint recovery

2019-09-09 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28214.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25731
[https://github.com/apache/spark/pull/25731]

> Flaky test: org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints 
> + dstream graph checkpoint recovery
> 
>
> Key: SPARK-28214
> URL: https://issues.apache.org/jira/browse/SPARK-28214
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This test has failed a few times in some PRs. Example of a failure:
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: Map() was empty No checkpointed 
> RDDs in state stream before first failure
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: Map() 
> was empty No checkpointed RDDs in state stream before first failure
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:266)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at 
> org.apache.spark.streaming.CheckpointSuite.org$scalatest$BeforeAndAfter$$super$runTest(CheckpointSuite.scala:209)
> {noformat}
> On top of that, when this failure happens, the test leaves a running 
> {{SparkContext}} behind, which makes every single unit test run after it on 
> that project fail.
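
As a side note on the leaked {{SparkContext}}: a common way to guard against 
this in a ScalaTest suite is to stop the context in {{afterEach}}, so a failed 
assertion cannot leave it running. A minimal sketch, with illustrative class and 
test names rather than the actual CheckpointSuite code:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class CheckpointRecoverySketch extends FunSuite with BeforeAndAfterEach {
  private var sc: SparkContext = _

  override def afterEach(): Unit = {
    try {
      // Stop the context even if the test body threw, so later suites
      // don't inherit a leaked SparkContext.
      if (sc != null) {
        sc.stop()
        sc = null
      }
    } finally {
      super.afterEach()
    }
  }

  test("basic rdd checkpoint recovery (sketch)") {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ckpt"))
    assert(sc.parallelize(1 to 10).count() == 10)
  }
}
{code}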



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28214) Flaky test: org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints + dstream graph checkpoint recovery

2019-09-09 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28214:
--

Assignee: Jungtaek Lim

> Flaky test: org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints 
> + dstream graph checkpoint recovery
> 
>
> Key: SPARK-28214
> URL: https://issues.apache.org/jira/browse/SPARK-28214
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
>
> This test has failed a few times in some PRs. Example of a failure:
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: Map() was empty No checkpointed 
> RDDs in state stream before first failure
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: Map() 
> was empty No checkpointed RDDs in state stream before first failure
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:266)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at 
> org.apache.spark.streaming.CheckpointSuite.org$scalatest$BeforeAndAfter$$super$runTest(CheckpointSuite.scala:209)
> {noformat}
> On top of that, when this failure happens, the test leaves a running 
> {{SparkContext}} behind, which makes every single unit test run after it on 
> that project fail.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2019-09-04 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-25151:
--

Assignee: Jungtaek Lim

> Apply Apache Commons Pool to KafkaDataConsumer
> --
>
> Key: SPARK-25151
> URL: https://issues.apache.org/jira/browse/SPARK-25151
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> KafkaDataConsumer contains its own logic for caching InternalKafkaConsumer, 
> which looks like it can be simplified by applying Apache Commons Pool. The 
> benefits of applying Apache Commons Pool are the following:
>  * We can get rid of the synchronization of the KafkaDataConsumer object while 
> acquiring and returning an InternalKafkaConsumer.
>  * We can extract the object pool logic out of the class, so that the behavior 
> of the pool can be tested easily. Currently it doesn't have detailed tests and 
> only covers reported issues.
>  * We can get various statistics for the object pool, and can also enable JMX 
> for the pool.
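
A minimal, self-contained sketch of what "apply Apache Commons Pool" could look 
like, using commons-pool2 with a stand-in consumer class (the real 
InternalKafkaConsumer and its cache keys are richer than this):

{code}
import org.apache.commons.pool2.{BasePooledObjectFactory, PooledObject}
import org.apache.commons.pool2.impl.{DefaultPooledObject, GenericObjectPool}

// Stand-in for InternalKafkaConsumer, just for illustration.
class FakeConsumer(val topicPartition: String) {
  def close(): Unit = ()
}

class FakeConsumerFactory(tp: String) extends BasePooledObjectFactory[FakeConsumer] {
  override def create(): FakeConsumer = new FakeConsumer(tp)
  override def wrap(c: FakeConsumer): PooledObject[FakeConsumer] = new DefaultPooledObject(c)
  override def destroyObject(p: PooledObject[FakeConsumer]): Unit = p.getObject.close()
}

object ConsumerPoolSketch {
  def main(args: Array[String]): Unit = {
    // The pool takes over caching and synchronization; JMX and eviction can be
    // enabled through GenericObjectPoolConfig if needed.
    val pool = new GenericObjectPool(new FakeConsumerFactory("topic-0"))
    pool.setMaxTotal(16)

    val consumer = pool.borrowObject()  // thread-safe, no manual locking
    try {
      // ... fetch records with the borrowed consumer ...
    } finally {
      pool.returnObject(consumer)
    }
    println(s"created=${pool.getCreatedCount}, borrowed=${pool.getBorrowedCount}")
    pool.close()
  }
}
{code}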



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2019-09-04 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25151.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 22138
[https://github.com/apache/spark/pull/22138]

> Apply Apache Commons Pool to KafkaDataConsumer
> --
>
> Key: SPARK-25151
> URL: https://issues.apache.org/jira/browse/SPARK-25151
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> KafkaDataConsumer contains its own logic for caching InternalKafkaConsumer, 
> which looks like it can be simplified by applying Apache Commons Pool. The 
> benefits of applying Apache Commons Pool are the following:
>  * We can get rid of the synchronization of the KafkaDataConsumer object while 
> acquiring and returning an InternalKafkaConsumer.
>  * We can extract the object pool logic out of the class, so that the behavior 
> of the pool can be tested easily. Currently it doesn't have detailed tests and 
> only covers reported issues.
>  * We can get various statistics for the object pool, and can also enable JMX 
> for the pool.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28931) Fix couple of bugs in FsHistoryProviderSuite

2019-09-04 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28931.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Fix couple of bugs in FsHistoryProviderSuite
> 
>
> Key: SPARK-28931
> URL: https://issues.apache.org/jira/browse/SPARK-28931
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> There are some bugs residing in FsHistoryProviderSuite itself.
>  # When creating a log file via {{newLogFile}}, the codec is ignored, leading 
> to a wrong file name. (No one tends to write tests for test code, and the bug 
> doesn't affect existing tests, so it is not easy to catch.)
>  # When writing events to a log file via {{writeFile}}, the metadata (in the 
> case of the new format) gets written to the file regardless of its codec, and 
> the content is overwritten by another stream, hence no Spark version 
> information is available. This affects an existing test, hence we had a wrong 
> expected value to work around the bug.
> Note that these are bugs in test code; non-test code works fine.
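
To make the second bug concrete, a minimal sketch of the corrected write path, 
where the metadata line and the events go through one codec-wrapped stream. Gzip 
stands in here for Spark's event-log compression codecs, and the helper name is 
made up:

{code}
import java.io.{File, FileOutputStream, OutputStream, PrintWriter}
import java.util.zip.GZIPOutputStream

object EventLogWriteSketch {
  // Wrap the file stream with the codec exactly once, then write the metadata
  // line and the events through that same stream, instead of letting a second,
  // uncompressed stream clobber the metadata.
  def writeEventLog(file: File, events: Seq[String], compress: Boolean): Unit = {
    val raw: OutputStream = new FileOutputStream(file)
    val out: OutputStream = if (compress) new GZIPOutputStream(raw) else raw
    val writer = new PrintWriter(out)
    try {
      writer.println("""{"Event":"SparkListenerLogStart","Spark Version":"3.0.0"}""")
      events.foreach(e => writer.println(e))
    } finally {
      writer.close() // closes the codec stream and the underlying file stream
    }
  }
}
{code}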



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28571) Shuffle storage API: Use API in SortShuffleWriter

2019-08-30 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28571:
--

Assignee: Matt Cheah

> Shuffle storage API: Use API in SortShuffleWriter
> -
>
> Key: SPARK-28571
> URL: https://issues.apache.org/jira/browse/SPARK-28571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
>
> Use the APIs introduced in SPARK-28209 in the SortShuffleWriter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28571) Shuffle storage API: Use API in SortShuffleWriter

2019-08-30 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28571.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25342
[https://github.com/apache/spark/pull/25342]

> Shuffle storage API: Use API in SortShuffleWriter
> -
>
> Key: SPARK-28571
> URL: https://issues.apache.org/jira/browse/SPARK-28571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Use the APIs introduced in SPARK-28209 in the SortShuffleWriter.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28760) Add end-to-end Kafka delegation token test

2019-08-29 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28760.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25477
[https://github.com/apache/spark/pull/25477]

> Add end-to-end Kafka delegation token test
> --
>
> Key: SPARK-28760
> URL: https://issues.apache.org/jira/browse/SPARK-28760
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> At the moment no end-to-end Kafka delegation token test exists, mainly because 
> of the missing KDC. A KDC is missing from the testing side in general, so I've 
> looked into what possibilities are there. The most obvious choice is the 
> MiniKDC inside the Hadoop library, where Apache Kerby runs in the background. 
> In this jira I would like to add Kerby to the testing area and use it to cover 
> security-related features.
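
A rough sketch of what a Kerby-backed KDC looks like in a test, using Hadoop's 
MiniKdc (which runs Apache Kerby underneath); the principal name, paths, and 
object name are made up for illustration:

{code}
import java.io.File
import org.apache.hadoop.minikdc.MiniKdc

object MiniKdcSketch {
  def main(args: Array[String]): Unit = {
    val workDir = new File(System.getProperty("java.io.tmpdir"), "minikdc-test")
    workDir.mkdirs()

    val kdc = new MiniKdc(MiniKdc.createConf(), workDir)
    kdc.start()
    try {
      // Create a keytab that the test (broker or Spark client) can log in with.
      val keytab = new File(workDir, "client.keytab")
      kdc.createPrincipal(keytab, "client/localhost")
      println(s"KDC realm: ${kdc.getRealm}, keytab: ${keytab.getAbsolutePath}")
    } finally {
      kdc.stop()
    }
  }
}
{code}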



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28760) Add end-to-end Kafka delegation token test

2019-08-29 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28760:
--

Assignee: Gabor Somogyi

> Add end-to-end Kafka delegation token test
> --
>
> Key: SPARK-28760
> URL: https://issues.apache.org/jira/browse/SPARK-28760
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> At the moment no end-to-end Kafka delegation token test exists which was 
> mainly because of missing KDC. KDC is missing in general from the testing 
> side so I've discovered what kind of possibilities are there. The most 
> obvious choice is the MiniKDC inside the Hadoop library where Apache Kerby 
> runs in the background. In this jira I would like to add Kerby to the testing 
> area and use it to cover security related features.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25874) Simplify abstractions in the K8S backend

2019-08-29 Thread Marcelo Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918813#comment-16918813
 ] 

Marcelo Vanzin commented on SPARK-25874:


I'm going to close this one; the only remaining task is adding (internal 
developer) documentation, which is minor.

> Simplify abstractions in the K8S backend
> 
>
> Key: SPARK-25874
> URL: https://issues.apache.org/jira/browse/SPARK-25874
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> I spent some time recently re-familiarizing myself with the k8s backend, and 
> I think there is room for improvement. In the past, SPARK-22839 was added 
> which improved things a lot, but it is still hard to follow the code, and it 
> is still more complicated than it should be to add a new feature.
> I've worked on the main things that were bothering me and came up with these 
> changes:
> https://github.com/vanzin/spark/commits/k8s-simple
> Now, that patch (the first commit of the branch) is a little large, which makes it 
> hard to review and to properly assess what it is doing. So I plan to break it 
> down into a few steps that I will file as sub-tasks and send for review 
> independently.
> The commit message for that patch has a lot of the background of what I 
> changed and why. Since I plan to delete that branch after the work is done, 
> I'll paste it here:
> {noformat}
> There are two main changes happening here.
> (1) Simplify the KubernetesConf abstraction.
> The current code around KubernetesConf has a few drawbacks:
> - it uses composition (with a type parameter) for role-specific configuration
> - it breaks encapsulation of the user configuration, held in SparkConf, by
>   requiring that all the k8s-specific info is extracted from SparkConf before
>   the KubernetesConf object is created.
> - the above is usually done by parsing the SparkConf info into k8s-backend
>   types, which are then transformed into k8s requests.
> This ends up requiring a whole lot of code that is just not necessary.
> The type parameters make parameter and class declarations full of needless
> noise; the breakage of encapsulation makes the code that processes SparkConf
> and the code that builds the k8s descriptors live in different places, and
> the intermediate representation isn't adding much value.
> By using inheritance instead of the current model, role-specific
> specialization of certain config properties works simply by implementing some
> abstract methods of the base class (instead of breaking encapsulation), and
> there's no need anymore for parameterized types.
> By moving config processing to the code that actually transforms the config
> into k8s descriptors, a lot of intermediate boilerplate can be removed.
> This leads to...
> (2) Make all feature logic part of the feature step itself.
> Currently there's code in a lot of places to decide whether a feature
> should be enabled. There's code when parsing the configuration, building
> the custom intermediate representation in a way that is later used by
> different code in a builder class, which then decides whether feature A
> or feature B should be used.
> Instead, it's much cleaner to let a feature decide things for itself.
> If the config to enable feature A exists, it processes the config and
> sets up the necessary k8s descriptors. If it doesn't, the feature is
> a no-op.
> This simplifies the shared code that calls into the existing features
> a lot. And does not make the existing features any more complicated.
> As part of this I merged the different language binding feature steps
> into a single step. They are sort of related, in the sense that if
> one is applied the others shouldn't, and merging them makes the logic
> to implement that cleaner.
> The driver and executor builders are now also much simpler, since they
> have no logic about what steps to apply or not. The tests were removed
> because of that, and some new tests were added to the suites for
> specific features, to verify what the old builder suites were testing.
> On top of the above I made a few minor changes (in comparison):
> - KubernetesVolumeUtils was modified to just throw exceptions. The old
>   code tried to avoid throwing exceptions by collecting results in `Try`
>   objects. That was not achieving anything since all the callers would
>   just call `get` on those objects, and the first one with a failure
>   would just throw the exception. The new code achieves the same
>   behavior and is simpler.
> - A bunch of small things, mainly to bring the code in line with the
>   usual Spark code style. I also removed unnecessary mocking in tests,
>   unused imports, and unused configs and constants.
> - Added some basic tests for KerberosConfDriverFeatureStep.
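
To make the feature-step idea above concrete, here is a minimal sketch of the 
pattern where each feature decides for itself and the builder just folds over 
the steps. The types and the config key are illustrative only, not Spark's 
actual K8s classes:

{code}
// Illustrative stand-ins; Spark's real steps work on SparkPod / fabric8 objects.
case class PodSketch(volumes: List[String] = Nil)

trait FeatureStepSketch {
  def configurePod(pod: PodSketch): PodSketch
}

class ScratchVolumeStep(conf: Map[String, String]) extends FeatureStepSketch {
  override def configurePod(pod: PodSketch): PodSketch =
    conf.get("spark.example.scratch.volume") match {       // hypothetical key
      case Some(name) => pod.copy(volumes = name :: pod.volumes)
      case None       => pod  // feature not enabled: the step is a no-op
    }
}

object PodBuilderSketch {
  // The builder has no per-feature logic; it simply applies every step.
  def build(steps: Seq[FeatureStepSketch]): PodSketch =
    steps.foldLeft(PodSketch())((pod, step) => step.configurePod(pod))
}
{code}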

[jira] [Resolved] (SPARK-25874) Simplify abstractions in the K8S backend

2019-08-29 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25874.

Fix Version/s: 3.0.0
   Resolution: Done

> Simplify abstractions in the K8S backend
> 
>
> Key: SPARK-25874
> URL: https://issues.apache.org/jira/browse/SPARK-25874
> Project: Spark
>  Issue Type: Umbrella
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> I spent some time recently re-familiarizing myself with the k8s backend, and 
> I think there is room for improvement. In the past, SPARK-22839 was added 
> which improved things a lot, but it is still hard to follow the code, and it 
> is still more complicated than it should be to add a new feature.
> I've worked on the main things that were bothering me and came up with these 
> changes:
> https://github.com/vanzin/spark/commits/k8s-simple
> Now, that patch (the first commit of the branch) is a little large, which makes it 
> hard to review and to properly assess what it is doing. So I plan to break it 
> down into a few steps that I will file as sub-tasks and send for review 
> independently.
> The commit message for that patch has a lot of the background of what I 
> changed and why. Since I plan to delete that branch after the work is done, 
> I'll paste it here:
> {noformat}
> There are two main changes happening here.
> (1) Simplify the KubernetesConf abstraction.
> The current code around KubernetesConf has a few drawbacks:
> - it uses composition (with a type parameter) for role-specific configuration
> - it breaks encapsulation of the user configuration, held in SparkConf, by
>   requiring that all the k8s-specific info is extracted from SparkConf before
>   the KubernetesConf object is created.
> - the above is usually done by parsing the SparkConf info into k8s-backend
>   types, which are then transformed into k8s requests.
> This ends up requiring a whole lot of code that is just not necessary.
> The type parameters make parameter and class declarations full of needless
> noise; the breakage of encapsulation makes the code that processes SparkConf
> and the code that builds the k8s descriptors live in different places, and
> the intermediate representation isn't adding much value.
> By using inheritance instead of the current model, role-specific
> specialization of certain config properties works simply by implementing some
> abstract methods of the base class (instead of breaking encapsulation), and
> there's no need anymore for parameterized types.
> By moving config processing to the code that actually transforms the config
> into k8s descriptors, a lot of intermediate boilerplate can be removed.
> This leads to...
> (2) Make all feature logic part of the feature step itself.
> Currently there's code in a lot of places to decide whether a feature
> should be enabled. There's code when parsing the configuration, building
> the custom intermediate representation in a way that is later used by
> different code in a builder class, which then decides whether feature A
> or feature B should be used.
> Instead, it's much cleaner to let a feature decide things for itself.
> If the config to enable feature A exists, it processes the config and
> sets up the necessary k8s descriptors. If it doesn't, the feature is
> a no-op.
> This simplifies the shared code that calls into the existing features
> a lot. And does not make the existing features any more complicated.
> As part of this I merged the different language binding feature steps
> into a single step. They are sort of related, in the sense that if
> one is applied the others shouldn't, and merging them makes the logic
> to implement that cleaner.
> The driver and executor builders are now also much simpler, since they
> have no logic about what steps to apply or not. The tests were removed
> because of that, and some new tests were added to the suites for
> specific features, to verify what the old builder suites were testing.
> On top of the above I made a few minor changes (in comparison):
> - KubernetesVolumeUtils was modified to just throw exceptions. The old
>   code tried to avoid throwing exceptions by collecting results in `Try`
>   objects. That was not achieving anything since all the callers would
>   just call `get` on those objects, and the first one with a failure
>   would just throw the exception. The new code achieves the same
>   behavior and is simpler.
> - A bunch of small things, mainly to bring the code in line with the
>   usual Spark code style. I also removed unnecessary mocking in tests,
>   unused imports, and unused configs and constants.
> - Added some basic tests for KerberosConfDriverFeatureStep.
> Note that there may still be leftover intermediate 

[jira] [Commented] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-08-28 Thread Marcelo Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918204#comment-16918204
 ] 

Marcelo Vanzin commented on SPARK-28906:


The code is fine. This is a problem in the release scripts.

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.4, 2.4.0, 2.4.1, 2.4.2, 
> 3.0.0, 2.4.3
>Reporter: Marcelo Vanzin
>Priority: Minor
> Attachments: image-2019-08-29-05-50-13-526.png
>
>
> Since Spark 2.3.1, `spark-submit` has shown incorrect information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>   /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28875) Cover Task retry scenario with test in Kafka connector

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28875.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25582
[https://github.com/apache/spark/pull/25582]

> Cover Task retry scenario with test in Kafka connector
> --
>
> Key: SPARK-28875
> URL: https://issues.apache.org/jira/browse/SPARK-28875
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 3.0.0
>
>
> When a task retry happens with the Kafka source, it's not known whether the 
> consumer is the issue, so the old consumer is removed from the cache and a new 
> consumer is created. The feature works fine but is not covered with tests.
>  
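
For reference, a minimal sketch of the retry-aware cache behaviour described 
above, using {{TaskContext.attemptNumber}} to detect a retried task. The cache 
key and consumer type are simplified placeholders, not the connector's real 
classes:

{code}
import org.apache.spark.TaskContext
import scala.collection.mutable

object ConsumerCacheSketch {
  final class CachedConsumer(val key: String) { def close(): Unit = () }

  private val cache = mutable.Map.empty[String, CachedConsumer]

  // On a retried task we cannot tell whether the cached consumer caused the
  // failure, so drop it from the cache and create a fresh one.
  def acquire(key: String): CachedConsumer = cache.synchronized {
    val isRetry = Option(TaskContext.get()).exists(_.attemptNumber() > 0)
    if (isRetry) {
      cache.remove(key).foreach(_.close())
    }
    cache.getOrElseUpdate(key, new CachedConsumer(key))
  }
}
{code}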



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28875) Cover Task retry scenario with test in Kafka connector

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28875:
--

Assignee: Gabor Somogyi

> Cover Task retry scenario with test in Kafka connector
> --
>
> Key: SPARK-28875
> URL: https://issues.apache.org/jira/browse/SPARK-28875
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
>
> When a task retry happens with the Kafka source, it's not known whether the 
> consumer is the issue, so the old consumer is removed from the cache and a new 
> consumer is created. The feature works fine but is not covered with tests.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28679) Spark Yarn ResourceRequestHelper shouldn't lookup setResourceInformation if no resources specified

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28679.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25403
[https://github.com/apache/spark/pull/25403]

> Spark Yarn ResourceRequestHelper shouldn't lookup setResourceInformation if 
> no resources specified
> --
>
> Key: SPARK-28679
> URL: https://issues.apache.org/jira/browse/SPARK-28679
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Alessandro Bellina
>Priority: Minor
> Fix For: 3.0.0
>
>
> In the Spark Yarn ResourceRequestHelper, reflection is used to look up 
> setResourceInformation. We should skip that lookup if the resource Map is 
> empty.
> [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala#L154]
>  
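
A self-contained sketch of the guard; {{FakeResource}} stands in for Hadoop's 
Resource class, and the only point is that the reflective lookup is skipped when 
no custom resources were requested:

{code}
class FakeResource {
  def setResourceInformation(name: String, value: String): Unit =
    println(s"resource $name = $value")
}

object ResourceRequestSketch {
  def setResourceInformationIfNeeded(resource: AnyRef, resources: Map[String, String]): Unit = {
    if (resources.isEmpty) {
      return  // nothing requested: skip the reflective lookup entirely
    }
    val method = resource.getClass.getMethod(
      "setResourceInformation", classOf[String], classOf[String])
    resources.foreach { case (name, value) => method.invoke(resource, name, value) }
  }

  def main(args: Array[String]): Unit = {
    setResourceInformationIfNeeded(new FakeResource, Map.empty)          // no reflection
    setResourceInformationIfNeeded(new FakeResource, Map("gpu" -> "2"))  // reflective call
  }
}
{code}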



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28679) Spark Yarn ResourceRequestHelper shouldn't lookup setResourceInformation if no resources specified

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28679:
--

Assignee: Alessandro Bellina

> Spark Yarn ResourceRequestHelper shouldn't lookup setResourceInformation if 
> no resources specified
> --
>
> Key: SPARK-28679
> URL: https://issues.apache.org/jira/browse/SPARK-28679
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Alessandro Bellina
>Priority: Minor
>
> In the Spark Yarn ResourceRequestHelper, reflection is used to look up 
> setResourceInformation. We should skip that lookup if the resource Map is 
> empty.
> [https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala#L154]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28607) Don't hold a reference to two partitionLengths arrays

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28607.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25341
[https://github.com/apache/spark/pull/25341]

> Don't hold a reference to two partitionLengths arrays
> -
>
> Key: SPARK-28607
> URL: https://issues.apache.org/jira/browse/SPARK-28607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-28209 introduced the new shuffle writer API and its usage in 
> BypassMergeSortShuffleWriter. However, the design of the API forces the 
> partition lengths to be tracked both in the implementation of the plugin and 
> also by the higher-level writer. This leads to redundant memory usage. We 
> should only track the lengths of the partitions in the implementation of the 
> plugin and propagate this information back up to the writer as the return 
> value of {{commitAllPartitions}}.
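
A minimal sketch of the shape the API ends up with: the plugin owns the 
per-partition lengths and hands them back from {{commitAllPartitions}}, so the 
caller does not keep a second array. The trait and class names here are 
illustrative, not the actual shuffle storage API:

{code}
trait PartitionWriterSketch {
  def write(partitionId: Int, bytes: Array[Byte]): Unit
  def commitAllPartitions(): Array[Long]  // lengths come back as the return value
}

final class LocalDiskWriterSketch(numPartitions: Int) extends PartitionWriterSketch {
  private val lengths = new Array[Long](numPartitions)
  override def write(partitionId: Int, bytes: Array[Byte]): Unit =
    lengths(partitionId) += bytes.length
  override def commitAllPartitions(): Array[Long] = lengths
}

object BypassWriterSketch {
  def main(args: Array[String]): Unit = {
    val plugin = new LocalDiskWriterSketch(numPartitions = 3)
    plugin.write(0, Array.fill[Byte](10)(1))
    plugin.write(2, Array.fill[Byte](4)(1))
    // The caller reads the lengths from the return value instead of tracking
    // its own copy alongside the plugin's.
    val partitionLengths = plugin.commitAllPartitions()
    println(partitionLengths.mkString(","))
  }
}
{code}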



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28607) Don't hold a reference to two partitionLengths arrays

2019-08-26 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28607:
--

Assignee: Matt Cheah

> Don't hold a reference to two partitionLengths arrays
> -
>
> Key: SPARK-28607
> URL: https://issues.apache.org/jira/browse/SPARK-28607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
>
> SPARK-28209 introduced the new shuffle writer API and its usage in 
> BypassMergeSortShuffleWriter. However, the design of the API forces the 
> partition lengths to be tracked both in the implementation of the plugin and 
> also by the higher-level writer. This leads to redundant memory usage. We 
> should only track the lengths of the partitions in the implementation of the 
> plugin and propagate this information back up to the writer as the return 
> value of {{commitAllPartitions}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28839) ExecutorMonitor$Tracker NullPointerException

2019-08-23 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28839:
--

Assignee: Hyukjin Kwon

> ExecutorMonitor$Tracker NullPointerException
> 
>
> Key: SPARK-28839
> URL: https://issues.apache.org/jira/browse/SPARK-28839
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {noformat}
> 19/08/21 06:44:01 ERROR AsyncEventQueue: Listener ExecutorMonitor threw an 
> exception
> java.lang.NullPointerException
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor$Tracker.removeShuffle(ExecutorMonitor.scala:479)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2(ExecutorMonitor.scala:408)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2$adapted(ExecutorMonitor.scala:407)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.cleanupShuffle(ExecutorMonitor.scala:407)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.onOtherEvent(ExecutorMonitor.scala:351)
>   at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:82)
>   at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:99)
>   at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:84)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:102)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:102)
>   at 
> scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:97)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:93)
>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:93)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28839) ExecutorMonitor$Tracker NullPointerException

2019-08-23 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28839.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25551
[https://github.com/apache/spark/pull/25551]

> ExecutorMonitor$Tracker NullPointerException
> 
>
> Key: SPARK-28839
> URL: https://issues.apache.org/jira/browse/SPARK-28839
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> {noformat}
> 19/08/21 06:44:01 ERROR AsyncEventQueue: Listener ExecutorMonitor threw an 
> exception
> java.lang.NullPointerException
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor$Tracker.removeShuffle(ExecutorMonitor.scala:479)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2(ExecutorMonitor.scala:408)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.$anonfun$cleanupShuffle$2$adapted(ExecutorMonitor.scala:407)
>   at scala.collection.Iterator.foreach(Iterator.scala:941)
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.cleanupShuffle(ExecutorMonitor.scala:407)
>   at 
> org.apache.spark.scheduler.dynalloc.ExecutorMonitor.onOtherEvent(ExecutorMonitor.scala:351)
>   at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:82)
>   at 
> org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
>   at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:99)
>   at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:84)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:102)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:102)
>   at 
> scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:97)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:93)
>   at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1319)
>   at 
> org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:93)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]

2019-08-20 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-27937:
---
Docs Text: In Spark 3.0, the behavior for automatic delegation token 
retrieval for file systems is the same as Spark 2.3. Users need to explicitly 
include the URIs they want to access in the 
spark.kerberos.access.hadoopFileSystems configuration. The automatic discovery 
added in Spark 2.4 (SPARK-24149) was removed.
   Labels: release-notes  (was: )

> Revert changes introduced as a part of Automatic namespace discovery 
> [SPARK-24149]
> --
>
> Key: SPARK-27937
> URL: https://issues.apache.org/jira/browse/SPARK-27937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Dhruve Ashar
>Assignee: Dhruve Ashar
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> Spark fails to launch for a valid deployment of HDFS while trying to get 
> tokens for a logical nameservice instead of an actual namenode (with HDFS 
> federation enabled). 
> On inspecting the source code closely, it is unclear why we were doing this; 
> based on the context from SPARK-24149, it solves a very specific use case of 
> getting tokens only for those namenodes which are configured for HDFS 
> federation in the same cluster. IMHO these are better left to the user to 
> specify explicitly.
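
In practice the explicit configuration looks like this (e.g. from spark-shell or 
the driver's conf); the namenode addresses are placeholders and the key shown is 
the Spark 3.0 name mentioned in the release note above:

{code}
import org.apache.spark.SparkConf

// List every extra file system that delegation tokens should be fetched for.
val conf = new SparkConf()
  .setAppName("federated-hdfs-job")
  .set("spark.kerberos.access.hadoopFileSystems",
    "hdfs://nn1.example.com:8020,hdfs://nn2.example.com:8020")
{code}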



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]

2019-08-20 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27937.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 24785
[https://github.com/apache/spark/pull/24785]

> Revert changes introduced as a part of Automatic namespace discovery 
> [SPARK-24149]
> --
>
> Key: SPARK-27937
> URL: https://issues.apache.org/jira/browse/SPARK-27937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Dhruve Ashar
>Assignee: Dhruve Ashar
>Priority: Major
> Fix For: 3.0.0
>
>
> Spark fails to launch for a valid deployment of HDFS while trying to get 
> tokens for a logical nameservice instead of an actual namenode (with HDFS 
> federation enabled). 
> On inspecting the source code closely, it is unclear why we were doing this; 
> based on the context from SPARK-24149, it solves a very specific use case of 
> getting tokens only for those namenodes which are configured for HDFS 
> federation in the same cluster. IMHO these are better left to the user to 
> specify explicitly.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27937) Revert changes introduced as a part of Automatic namespace discovery [SPARK-24149]

2019-08-20 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27937:
--

Assignee: Dhruve Ashar

> Revert changes introduced as a part of Automatic namespace discovery 
> [SPARK-24149]
> --
>
> Key: SPARK-27937
> URL: https://issues.apache.org/jira/browse/SPARK-27937
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Dhruve Ashar
>Assignee: Dhruve Ashar
>Priority: Major
>
> Spark fails to launch for a valid deployment of HDFS while trying to get 
> tokens for a logical nameservice instead of an actual namenode (with HDFS 
> federation enabled). 
> On inspecting the source code closely, it is unclear why we were doing this; 
> based on the context from SPARK-24149, it solves a very specific use case of 
> getting tokens only for those namenodes which are configured for HDFS 
> federation in the same cluster. IMHO these are better left to the user to 
> specify explicitly.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-19 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28634:
--

Assignee: Marcelo Vanzin

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Marcelo Vanzin
>Priority: Minor
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   

[jira] [Resolved] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-19 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28634.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25467
[https://github.com/apache/spark/pull/25467]

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> 

[jira] [Resolved] (SPARK-25262) Support tmpfs for local dirs in k8s

2019-08-19 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25262.

Fix Version/s: 3.0.0
 Assignee: Rob Vesse
   Resolution: Fixed

> Support tmpfs for local dirs in k8s
> ---
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Assignee: Rob Vesse
>Priority: Major
> Fix For: 3.0.0
>
>
> As discussed during review of the design document for SPARK-24434, while pod 
> templates will provide more in-depth customisation for Spark on Kubernetes, 
> there are some things that cannot be modified because Spark code generates pod 
> specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it
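
The user-facing result of the first change is a single switch, shown here as a 
SparkConf sketch; the key is the one added for Spark 3.0's K8s backend and 
should be double-checked against the docs for your version:

{code}
import org.apache.spark.SparkConf

// Back the emptyDir volumes created for the local dirs with tmpfs (RAM)
// instead of node storage.
val conf = new SparkConf()
  .set("spark.kubernetes.local.dirs.tmpfs", "true")
{code}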



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25262) Support tmpfs for local dirs in k8s

2019-08-19 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-25262:
---
Summary: Support tmpfs for local dirs in k8s  (was: Make Spark local dir 
volumes configurable with Spark on Kubernetes)

> Support tmpfs for local dirs in k8s
> ---
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434, while pod 
> templates will provide more in-depth customisation for Spark on Kubernetes, 
> there are some things that cannot be modified because Spark code generates pod 
> specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2019-08-19 Thread Marcelo Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910616#comment-16910616
 ] 

Marcelo Vanzin commented on SPARK-25262:


Full configurability was actually added in SPARK-28042. There is still a commit 
related to this one particular bug (da6fa38), so I won't dupe this, and will 
fix the title to reflect that feature instead.

> Make Spark local dir volumes configurable with Spark on Kubernetes
> --
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434, while 
> pod templates will provide more in-depth customisation for Spark on 
> Kubernetes, there are some things that cannot be modified because Spark code 
> generates pod specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if none is specified explicitly, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation, 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example, with 
> diskless compute resources the node storage will likely be a non-performant, 
> remote-mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume, per the K8S 
> documentation, to use a {{tmpfs}} volume instead.
> Another, closely related issue is that users might want to use a different 
> volume type to back the local directories, and there is currently no way to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}}-backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with that 
> name and, if so, skip generating a volume definition for it



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-15 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-28634:
---
Priority: Minor  (was: Major)

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Minor
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Reopened] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-15 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reopened SPARK-28634:


> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> 

[jira] [Commented] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-15 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908450#comment-16908450
 ] 

Marcelo Vanzin commented on SPARK-28634:


I think it's still worth it to fix it, so that users with old configuration are 
not surprised by this.

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:315)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   

[jira] [Resolved] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism

2019-08-15 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-23977.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24970
[https://github.com/apache/spark/pull/24970]

> Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
> ---
>
> Key: SPARK-23977
> URL: https://issues.apache.org/jira/browse/SPARK-23977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.0.0
>
>
> Hadoop 3.1 adds a mechanism for job-specific and store-specific committers 
> (MAPREDUCE-6823, MAPREDUCE-6956), and one key implementation, the S3A 
> committers (HADOOP-13786).
> These committers deliver high-performance output of MR and Spark jobs to S3, 
> and offer the key semantics which Spark depends on: no visible output until 
> job commit, and a failure of a task at any stage, including partway through 
> task commit, can be handled by executing and committing another task attempt. 
> In contrast, the FileOutputFormat commit algorithms on S3 have issues:
> * Awful performance, because files are copied by rename
> * FileOutputFormat v1: weak task-commit failure recovery semantics, as the 
> (v1) expectation that "directory renames are atomic" doesn't hold.
> * S3 metadata eventual consistency can cause rename to miss files or fail 
> entirely (SPARK-15849)
> Note also that the FileOutputFormat "v2" commit algorithm doesn't offer any of 
> the commit semantics w.r.t. observability of, or recovery from, task commit 
> failure, on any filesystem.
> The S3A committers address these by uploading all data to the destination 
> through multipart uploads, which are only completed at job commit.
> The new {{PathOutputCommitter}} factory mechanism allows applications to work 
> with the S3A committers and any other, by adding a plugin mechanism into the 
> MRv2 FileOutputFormat class, where job config and filesystem configuration 
> options can dynamically choose the output committer.
> Spark can use these with some binding classes to:
> # Add a subclass of {{HadoopMapReduceCommitProtocol}} which uses the MRv2 
> classes and {{PathOutputCommitterFactory}} to create the committers.
> # Add a {{BindingParquetOutputCommitter extends ParquetOutputCommitter}} 
> to wire up Parquet output even when code requires the committer to be a 
> subclass of {{ParquetOutputCommitter}}
> This patch builds on SPARK-23807 for setting up the dependencies.
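
A hedged sketch of how a Spark job might opt into these binding classes once they 
exist. The config keys and fully qualified class names below are assumptions based 
on the description (and on the later cloud-integration docs), not details fixed by 
this issue.

{code}
import org.apache.spark.sql.SparkSession

// Route SQL/DataFrame writes through the PathOutputCommitter-based protocol and
// give Parquet the binding committer so its subclass check still passes.
val spark = SparkSession.builder()
  .appName("s3a-committer-binding-example")
  .config("spark.sql.sources.commitProtocolClass",
    "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
  .config("spark.sql.parquet.output.committer.class",
    "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
  .getOrCreate()

// Writes then reach S3 via multipart uploads that are completed at job commit.
spark.range(1000).write.parquet("s3a://bucket/output")
{code}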



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23977) Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism

2019-08-15 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-23977:
--

Assignee: Steve Loughran

> Add commit protocol binding to Hadoop 3.1 PathOutputCommitter mechanism
> ---
>
> Key: SPARK-23977
> URL: https://issues.apache.org/jira/browse/SPARK-23977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> Hadoop 3.1 adds a mechanism for job-specific and store-specific committers 
> (MAPREDUCE-6823, MAPREDUCE-6956), and one key implementation, the S3A 
> committers (HADOOP-13786).
> These committers deliver high-performance output of MR and Spark jobs to S3, 
> and offer the key semantics which Spark depends on: no visible output until 
> job commit, and a failure of a task at any stage, including partway through 
> task commit, can be handled by executing and committing another task attempt. 
> In contrast, the FileOutputFormat commit algorithms on S3 have issues:
> * Awful performance, because files are copied by rename
> * FileOutputFormat v1: weak task-commit failure recovery semantics, as the 
> (v1) expectation that "directory renames are atomic" doesn't hold.
> * S3 metadata eventual consistency can cause rename to miss files or fail 
> entirely (SPARK-15849)
> Note also that the FileOutputFormat "v2" commit algorithm doesn't offer any of 
> the commit semantics w.r.t. observability of, or recovery from, task commit 
> failure, on any filesystem.
> The S3A committers address these by uploading all data to the destination 
> through multipart uploads, which are only completed at job commit.
> The new {{PathOutputCommitter}} factory mechanism allows applications to work 
> with the S3A committers and any other, by adding a plugin mechanism into the 
> MRv2 FileOutputFormat class, where job config and filesystem configuration 
> options can dynamically choose the output committer.
> Spark can use these with some binding classes to:
> # Add a subclass of {{HadoopMapReduceCommitProtocol}} which uses the MRv2 
> classes and {{PathOutputCommitterFactory}} to create the committers.
> # Add a {{BindingParquetOutputCommitter extends ParquetOutputCommitter}} 
> to wire up Parquet output even when code requires the committer to be a 
> subclass of {{ParquetOutputCommitter}}
> This patch builds on SPARK-23807 for setting up the dependencies.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28487) K8S pod allocator behaves poorly with dynamic allocation

2019-08-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28487.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25236
[https://github.com/apache/spark/pull/25236]

> K8S pod allocator behaves poorly with dynamic allocation
> 
>
> Key: SPARK-28487
> URL: https://issues.apache.org/jira/browse/SPARK-28487
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> The current pod allocator in Spark is tuned towards the behavior without 
> dynamic allocation; it needs some enhancements so that dynamic allocation 
> behaves better on K8S.
> I'll be submitting an updated and enhanced version of a patch we've used 
> internally.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28487) K8S pod allocator behaves poorly with dynamic allocation

2019-08-13 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28487:
--

Assignee: Marcelo Vanzin

> K8S pod allocator behaves poorly with dynamic allocation
> 
>
> Key: SPARK-28487
> URL: https://issues.apache.org/jira/browse/SPARK-28487
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
>
> The current pod allocator in Spark is tuned towards the behavior without 
> dynamic allocation; it needs some enhancements so that dynamic allocation 
> behaves better on K8S.
> I'll be submitting an updated and enhanced version of a patch we've used 
> internally.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28638) Task summary metrics are wrong when there are running tasks

2019-08-12 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28638:
--

Assignee: Gengliang Wang

> Task summary metrics are wrong when there are running tasks
> ---
>
> Key: SPARK-28638
> URL: https://issues.apache.org/jira/browse/SPARK-28638
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Currently, when summary metrics are requested, cached data is returned if the 
> current number of successful tasks is the same as in the cached data.
> However, the number of successful tasks is wrong: in `AppStatusStore`, the 
> KVStore is an ElementTrackingStore, not an InMemoryStore, so the class 
> matching fails. This PR fixes the class matching.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28638) Task summary metrics are wrong when there are running tasks

2019-08-12 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28638.

   Resolution: Fixed
Fix Version/s: 2.4.4
   3.0.0

Issue resolved by pull request 25369
[https://github.com/apache/spark/pull/25369]

> Task summary metrics are wrong when there are running tasks
> ---
>
> Key: SPARK-28638
> URL: https://issues.apache.org/jira/browse/SPARK-28638
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0, 2.4.4
>
>
> Currently, when summary metrics are requested, cached data is returned if the 
> current number of successful tasks is the same as in the cached data.
> However, the number of successful tasks is wrong: in `AppStatusStore`, the 
> KVStore is an ElementTrackingStore, not an InMemoryStore, so the class 
> matching fails. This PR fixes the class matching.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-07 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902238#comment-16902238
 ] 

Marcelo Vanzin commented on SPARK-28634:


Ah. If you use {{--principal}} and {{--keytab}} this works.

The config names have changed in master and you're using the deprecated ones; the 
YARN client code removes them from the config in client mode, but only the new 
names:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L769

For proper backwards compatibility it needs to remove the old names too. (Or 
make a change in the AM instead to ignore the keytab when running in client 
mode, which avoids the above hack.)
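
A rough sketch of the kind of change described (removing the old names too), with 
both the new and the deprecated entries stripped on the client side. The 
spark.kerberos.* names, the client-mode check, and the helper itself are 
assumptions for illustration; this is not the actual patch.

{code}
import org.apache.spark.SparkConf

// Hypothetical helper: strip keytab/principal settings so the AM in client mode
// does not try to log in from a keytab path that only exists on the gateway host.
def removeKeytabConfigs(sparkConf: SparkConf): Unit = {
  if (sparkConf.get("spark.submit.deployMode", "client") == "client") {
    Seq("spark.kerberos.keytab", "spark.kerberos.principal",   // new names
        "spark.yarn.keytab", "spark.yarn.principal")           // deprecated names
      .foreach(key => sparkConf.remove(key))
  }
}
{code}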

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:95)
>   at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:185)
>   at org.apache.spark.SparkContext.(SparkContext.scala:509)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> 

[jira] [Comment Edited] (SPARK-28634) Failed to start SparkSession with Keytab file

2019-08-07 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902238#comment-16902238
 ] 

Marcelo Vanzin edited comment on SPARK-28634 at 8/7/19 4:48 PM:


Ah. If you use {{\-\-principal}} and {{\-\-keytab}} this works.

The config names have changed in master and you're using the deprecated ones; the 
YARN client code removes them from the config in client mode, but only the new 
names:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L769

For proper backwards compatibility it needs to remove the old names too. (Or 
make a change in the AM instead to ignore the keytab when running in client 
mode, which avoids the above hack.)


was (Author: vanzin):
Ah. If you use {{--principal}} and {{--keytab}} this works.

The config name has changed in master and you're using the deprecated ones; the 
YARN client code removes them from the config in client mode, but only the new 
names:
https://github.com/apache/spark/blob/master/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L769

For proper backwards compatibility it needs to remove the old names too. (Or 
make a change in the AM instead to ignore the keytab when running in client 
mode, which avoids the above hack.)

> Failed to start SparkSession with Keytab file 
> --
>
> Key: SPARK-28634
> URL: https://issues.apache.org/jira/browse/SPARK-28634
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [user-etl@hermesdevour002-700165 spark-3.0.0-SNAPSHOT-bin-2.7.4]$ 
> bin/spark-sql --master yarn --conf 
> spark.yarn.keytab=/apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab --conf 
> spark.yarn.principal=user-...@prod.example.com
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> Exception in thread "main" org.apache.spark.SparkException: Application 
> application_1564558112805_1794 failed 2 times due to AM Container for 
> appattempt_1564558112805_1794_02 exited with  exitCode: 1
> For more detailed output, check the application tracking page: 
> https://0.0.0.0:8190/applicationhistory/app/application_1564558112805_1794 
> Then click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e1987_1564558112805_1794_02_01
> Exit code: 1
> Shell output: main : command provided 1
> main : run as user is user-etl
> main : requested yarn user is user-etl
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /hadoop/2/yarn/local/nmPrivate/application_1564558112805_1794/container_e1987_1564558112805_1794_02_01/container_e1987_1564558112805_1794_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> Container exited with a non-zero exit code 1. Error file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> log4j:WARN No such property [maxFileSize] in 
> org.apache.log4j.rolling.RollingFileAppender.
> log4j:WARN No such property [maxBackupIndex] in 
> org.apache.log4j.rolling.RollingFileAppender.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/2/yarn/local/usercache/user-etl/filecache/58/__spark_libs__4358879230136591830.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hbase-1.1.2.2.6.4.1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/apache/releases/hadoop-2.7.3.2.6.4.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Exception in thread "main" org.apache.spark.SparkException: Keytab file: 
> /apache/spark-2.3.0-bin-2.7.3/conf/user-etl.keytab does not exist
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.loginUserFromKeytab(SparkHadoopUtil.scala:131)
>   at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:846)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:889)
>   at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> Failing this attempt. Failing the application.
>   at 
> 

[jira] [Updated] (SPARK-28584) Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite

2019-08-01 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-28584:
---
Summary: Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite  
(was: Fix thread safety issue in blacklist timer, tests)

Please keep the original title. It describes the problem that led to the fix. 
And makes searching easier.

> Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite
> -
>
> Key: SPARK-28584
> URL: https://issues.apache.org/jira/browse/SPARK-28584
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> This is another of those tests that don't seem to fail in PRs here, but fail 
> more often than we'd like in our build machines. In this case it fails in 
> several different ways, e.g.:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 
> Map(org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@412f9d43
>  -> 1550579875956) did not contain key 
> org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@1945f15f
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:635)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:591)
> {noformat}
> Or:
> {noformat}
> The code passed to eventually never returned normally. Attempted 40 times 
> over 503.217543 milliseconds. Last failure message: tsm.isZombie was false.
> Error message:
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 40 times over 503.217543 
> milliseconds. Last failure message: tsm.isZombie was false.
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:543)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:511)
> {noformat}
> There's a race condition in the test that can cause these different failures.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28584) Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite

2019-08-01 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898225#comment-16898225
 ] 

Marcelo Vanzin edited comment on SPARK-28584 at 8/1/19 5:42 PM:


Please keep the original title. It describes the problem that led to the fix. 
And makes searching easier.


was (Author: vanzin):
Please keep the original title. It describes the problem that lead to the fix. 
And make searching easier.

> Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite
> -
>
> Key: SPARK-28584
> URL: https://issues.apache.org/jira/browse/SPARK-28584
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> This is another of those tests that don't seem to fail in PRs here, but fail 
> more often than we'd like in our build machines. In this case it fails in 
> several different ways, e.g.:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 
> Map(org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@412f9d43
>  -> 1550579875956) did not contain key 
> org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@1945f15f
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:635)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:591)
> {noformat}
> Or:
> {noformat}
> The code passed to eventually never returned normally. Attempted 40 times 
> over 503.217543 milliseconds. Last failure message: tsm.isZombie was false.
> Error message:
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 40 times over 503.217543 
> milliseconds. Last failure message: tsm.isZombie was false.
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:543)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:511)
> {noformat}
> There's a race condition in the test that can cause these different failures.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28564) Access history application defaults to the last attempt id

2019-07-31 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28564.

   Resolution: Fixed
Fix Version/s: 2.4.4
   3.0.0

Issue resolved by pull request 25301
[https://github.com/apache/spark/pull/25301]

> Access history application defaults to the last attempt id
> --
>
> Key: SPARK-28564
> URL: https://issues.apache.org/jira/browse/SPARK-28564
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.0.0, 2.4.4
>
>
> When we set spark.history.ui.maxApplications to a small value, we can't find 
> some apps through the page search.
> If the URL is constructed by hand (http://localhost:18080/history/local-xxx), the 
> app can still be accessed as long as it has no attempt ID.
> But for apps with multiple attempts, such a URL cannot be accessed, 
> and the page displays Not Found.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28564) Access history application defaults to the last attempt id

2019-07-31 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28564:
--

Assignee: dzcxzl

> Access history application defaults to the last attempt id
> --
>
> Key: SPARK-28564
> URL: https://issues.apache.org/jira/browse/SPARK-28564
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
>
> When we set spark.history.ui.maxApplications to a small value, we can't find 
> some apps through the page search.
> If the URL is constructed by hand (http://localhost:18080/history/local-xxx), the 
> app can still be accessed as long as it has no attempt ID.
> But for apps with multiple attempts, such a URL cannot be accessed, 
> and the page displays Not Found.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28584) Flaky test: org.apache.spark.scheduler.TaskSchedulerImplSuite

2019-07-31 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28584:
--

 Summary: Flaky test: 
org.apache.spark.scheduler.TaskSchedulerImplSuite
 Key: SPARK-28584
 URL: https://issues.apache.org/jira/browse/SPARK-28584
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


This is another of those tests that don't seem to fail in PRs here, but fail 
more often than we'd like in our build machines. In this case it fails in 
several different ways, e.g.:

{noformat}
org.scalatest.exceptions.TestFailedException: 
Map(org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@412f9d43
 -> 1550579875956) did not contain key 
org.apache.spark.scheduler.TaskSetManager$$EnhancerByMockitoWithCGLIB$$c676cf51@1945f15f
  at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
  at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:635)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$21.apply(TaskSchedulerImplSuite.scala:591)
{noformat}

Or:

{noformat}
The code passed to eventually never returned normally. Attempted 40 times over 
503.217543 milliseconds. Last failure message: tsm.isZombie was false.

Error message:
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
eventually never returned normally. Attempted 40 times over 503.217543 
milliseconds. Last failure message: tsm.isZombie was false.
  at 
org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
  at 
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
  at 
org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:337)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite.eventually(TaskSchedulerImplSuite.scala:44)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:543)
  at 
org.apache.spark.scheduler.TaskSchedulerImplSuite$$anonfun$18.apply(TaskSchedulerImplSuite.scala:511)
{noformat}

There's a race condition in the test that can cause these different failures.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28209) Shuffle Storage API: Writes

2019-07-30 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28209.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25007
[https://github.com/apache/spark/pull/25007]

> Shuffle Storage API: Writes
> ---
>
> Key: SPARK-28209
> URL: https://issues.apache.org/jira/browse/SPARK-28209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Adds the write-side API for storing shuffle data in arbitrary storage 
> systems. Also refactor the existing shuffle write code so that it uses this 
> API.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28209) Shuffle Storage API: Writes

2019-07-30 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28209:
--

Assignee: Matt Cheah

> Shuffle Storage API: Writes
> ---
>
> Key: SPARK-28209
> URL: https://issues.apache.org/jira/browse/SPARK-28209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
>
> Adds the write-side API for storing shuffle data in arbitrary storage 
> systems. Also refactor the existing shuffle write code so that it uses this 
> API.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28525) Allow Launcher to be applied Java options

2019-07-30 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28525.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25265
[https://github.com/apache/spark/pull/25265]

> Allow Launcher to be applied Java options
> -
>
> Key: SPARK-28525
> URL: https://issues.apache.org/jira/browse/SPARK-28525
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.0.0
>
>
> Launcher is implemented as a Java application and sometimes I'd like to apply 
> Java options to it.
> One situation I have run into is when I try to attach a debugger to the Launcher.
> Launcher is launched from bin/spark-class but there is no room to apply Java 
> options.
> {code}
> build_command() {
>   "$RUNNER" -Xmx128m -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main 
> "$@"
>   printf "%d\0" $?
> }
> {code}
> Considering that applying Java options to the Launcher is not a common need, 
> one compromise would be for users to just modify spark-class as follows.
> {code}
> build_command() {
>   "$RUNNER" -Xmx128m $SPARK_LAUNCHER_OPTS -cp "$LAUNCH_CLASSPATH" 
> org.apache.spark.launcher.Main "$@"
>   printf "%d\0" $?
> }
> {code}
> But it doesn't work when any text related to Java options is output to 
> standard output, because the whole output is used as the command string for 
> spark-shell and spark-submit in the current implementation.
> One example is jdwp. When applying the agentlib option to use jdwp for 
> debugging, we get output like the following.
> {code}
> Listening for transport dt_socket at address: 9876
> {code}
> The output shown above is not a command string, so spark-submit and 
> spark-shell will fail.
> To enable Java options for the Launcher, we need to treat the command string 
> and other output separately.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28042) Support mapping spark.local.dir to hostPath volume

2019-07-29 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28042:
--

Assignee: Junjie Chen

> Support mapping spark.local.dir to hostPath volume
> --
>
> Key: SPARK-28042
> URL: https://issues.apache.org/jira/browse/SPARK-28042
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, the k8s executor builder mounts spark.local.dir as either emptyDir 
> or memory. This should satisfy small workloads, but for heavy workloads like 
> TPCDS both can run into problems, such as pods being evicted due to disk 
> pressure when using emptyDir, or OOM when using tmpfs.
> On cloud environments in particular, users may allocate a cluster with a 
> minimal configuration and add cloud storage when running a workload. In this 
> case, we can specify multiple elastic storage volumes as spark.local.dir to 
> accelerate spilling.
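
A hedged example of the kind of configuration this enables: backing Spark local 
dirs with fast host storage on K8s. The spark.kubernetes.executor.volumes.* key 
pattern and the spark-local-dir-* volume-name convention are assumptions taken 
from the Spark-on-Kubernetes docs, not details spelled out in this issue.

{code}
import org.apache.spark.sql.SparkSession

// Mount a hostPath volume whose name follows the spark-local-dir-* convention,
// so executors use it for spark.local.dir instead of an emptyDir.
val spark = SparkSession.builder()
  .appName("hostpath-local-dir-example")
  .config("spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path",
    "/tmp/spark-local-dir-1")
  .config("spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path",
    "/mnt/fast-disk/spark")
  .getOrCreate()
{code}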



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28042) Support mapping spark.local.dir to hostPath volume

2019-07-29 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28042.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24879
[https://github.com/apache/spark/pull/24879]

> Support mapping spark.local.dir to hostPath volume
> --
>
> Key: SPARK-28042
> URL: https://issues.apache.org/jira/browse/SPARK-28042
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Junjie Chen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, the k8s executor builder mounts spark.local.dir as either emptyDir 
> or memory. This should satisfy small workloads, but for heavy workloads like 
> TPCDS both can run into problems, such as pods being evicted due to disk 
> pressure when using emptyDir, or OOM when using tmpfs.
> On cloud environments in particular, users may allocate a cluster with a 
> minimal configuration and add cloud storage when running a workload. In this 
> case, we can specify multiple elastic storage volumes as spark.local.dir to 
> accelerate spilling.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28535) Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"

2019-07-26 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28535:
--

 Summary: Flaky test: JobCancellationSuite."interruptible iterator 
of shuffle reader"
 Key: SPARK-28535
 URL: https://issues.apache.org/jira/browse/SPARK-28535
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


This is the same flakiness as in SPARK-23881, except the fix there didn't 
really take, at least on our build machines.

{noformat}
org.scalatest.exceptions.TestFailedException: 1 was not less than 1
  at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
  at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
  at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
{noformat}

Since that bug is short on explanations, the issue is that there's a race 
between the thread posting the "stage completed" event to the listener, which 
unblocks the test, and the thread killing the task in the executor. If the event 
arrives first, it will unblock task execution, and there's a chance that all 
elements will actually be processed before the executor has a chance to stop 
the task.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25285) Add executor task metrics to track the number of tasks started and of tasks successfully completed

2019-07-26 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-25285:
--

Assignee: Luca Canali

> Add executor task metrics to track the number of tasks started and of tasks 
> successfully completed
> --
>
> Key: SPARK-25285
> URL: https://issues.apache.org/jira/browse/SPARK-25285
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>
> The motivation for these additional metrics is to help in troubleshooting and 
> monitoring task execution on a cluster. Currently available metrics include 
> executor threadpool metrics for completed tasks and for active tasks. The 
> addition of a threadpool taskStarted metric will allow, for example, 
> collecting info on the (approximate) number of failed tasks by computing the 
> difference: tasks started – (active threads + completed tasks and/or 
> successfully finished tasks).
>  The proposed metric finishedTasks is also intended for this type of 
> troubleshooting. The difference between finishedTasks and 
> threadpool.completeTasks is that the latter is a (dropwizard library) gauge 
> taken from the threadpool, while the former is a (dropwizard) counter 
> computed in the [[Executor]] class when a task successfully finishes, 
> together with several other task metrics counters.
>  Note: there are similarities with some of the metrics introduced in 
> SPARK-24398; however, there are key differences coming from the fact that 
> this PR concerns the executor source: it provides metric values per 
> executor, and the metric values do not need to pass through the listener bus 
> in this case.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25285) Add executor task metrics to track the number of tasks started and of tasks successfully completed

2019-07-26 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25285.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22290
[https://github.com/apache/spark/pull/22290]

> Add executor task metrics to track the number of tasks started and of tasks 
> successfully completed
> --
>
> Key: SPARK-25285
> URL: https://issues.apache.org/jira/browse/SPARK-25285
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.0.0
>
>
> The motivation for these additional metrics is to help in troubleshooting and 
> monitoring task execution on a cluster. Currently available metrics include 
> executor threadpool metrics for completed tasks and for active tasks. The 
> addition of a threadpool taskStarted metric will allow, for example, 
> collecting info on the (approximate) number of failed tasks by computing the 
> difference: tasks started – (active threads + completed tasks and/or 
> successfully finished tasks).
>  The proposed metric finishedTasks is also intended for this type of 
> troubleshooting. The difference between finishedTasks and 
> threadpool.completeTasks is that the latter is a (dropwizard library) gauge 
> taken from the threadpool, while the former is a (dropwizard) counter 
> computed in the [[Executor]] class when a task successfully finishes, 
> together with several other task metrics counters.
>  Note: there are similarities with some of the metrics introduced in 
> SPARK-24398; however, there are key differences coming from the fact that 
> this PR concerns the executor source: it provides metric values per 
> executor, and the metric values do not need to pass through the listener bus 
> in this case.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28465) K8s integration tests fail due to missing ceph-nano image

2019-07-24 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28465.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25222
[https://github.com/apache/spark/pull/25222]

> K8s integration tests fail due to missing ceph-nano image
> -
>
> Key: SPARK-28465
> URL: https://issues.apache.org/jira/browse/SPARK-28465
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Image added here: 
> [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66]
>  needs to be updated to the latest as it was removed from dockerhub.
> {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64
>  Error response from daemon: manifest for 
> ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found
> {quote}
> Also we need to apply this fix: 
> [https://github.com/ceph/cn/issues/115#issuecomment-497384369]
> I will create a PR shortly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28465) K8s integration tests fail due to missing ceph-nano image

2019-07-24 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28465:
--

Assignee: Stavros Kontopoulos

> K8s integration tests fail due to missing ceph-nano image
> -
>
> Key: SPARK-28465
> URL: https://issues.apache.org/jira/browse/SPARK-28465
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
>
> Image added here: 
> [https://github.com/lightbend/spark/blob/72c80ee81ca4c3c9569749b54e2db0ec91b128a5/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala#L66]
>  needs to be updated to the latest as it was removed from dockerhub.
> {quote}docker pull ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64
>  Error response from daemon: manifest for 
> ceph/daemon:v4.0.0-stable-4.0-master-centos-7-x86_64 not found
> {quote}
> Also we need to apply this fix: 
> [https://github.com/ceph/cn/issues/115#issuecomment-497384369]
> I will create a PR shortly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28496) Use branch name instead of tag during dry-run

2019-07-24 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28496.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25240
[https://github.com/apache/spark/pull/25240]

> Use branch name instead of tag during dry-run
> -
>
> Key: SPARK-28496
> URL: https://issues.apache.org/jira/browse/SPARK-28496
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>
> There are two cases when we use `dry run`.
> First, when the tag already exists, we can ask for `confirmation` of the 
> existing tag name.
> {code}
> $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
> Output directory already exists. Overwrite and continue? [y/n] y
> Branch [branch-2.4]:
> Current branch version is 2.4.4-SNAPSHOT.
> Release [2.4.4]: 2.4.3
> RC # [1]:
> v2.4.3-rc1 already exists. Continue anyway [y/n]? y
> This is a dry run. Please confirm the ref that will be built for testing.
> Ref [v2.4.3-rc1]:
> {code}
> Second, when the tag doesn't exist, we should ask for `confirmation` of the 
> branch name.
> {code}
> $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
> Branch [branch-2.4]:
> Current branch version is 2.4.4-SNAPSHOT.
> Release [2.4.4]:
> RC # [1]:
> This is a dry run. Please confirm the ref that will be built for testing.
> Ref [v2.4.4-rc1]:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28496) Use branch name instead of tag during dry-run

2019-07-24 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28496:
--

Assignee: Dongjoon Hyun

> Use branch name instead of tag during dry-run
> -
>
> Key: SPARK-28496
> URL: https://issues.apache.org/jira/browse/SPARK-28496
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>
> There are two cases when we use `dry run`.
> First, when the tag already exists, we can ask for `confirmation` of the 
> existing tag name.
> {code}
> $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
> Output directory already exists. Overwrite and continue? [y/n] y
> Branch [branch-2.4]:
> Current branch version is 2.4.4-SNAPSHOT.
> Release [2.4.4]: 2.4.3
> RC # [1]:
> v2.4.3-rc1 already exists. Continue anyway [y/n]? y
> This is a dry run. Please confirm the ref that will be built for testing.
> Ref [v2.4.3-rc1]:
> {code}
> Second, when the tag doesn't exist, we should ask for `confirmation` of the 
> branch name.
> {code}
> $ dev/create-release/do-release-docker.sh -d /tmp/spark-2.4.4 -n -s docs
> Branch [branch-2.4]:
> Current branch version is 2.4.4-SNAPSHOT.
> Release [2.4.4]:
> RC # [1]:
> This is a dry run. Please confirm the ref that will be built for testing.
> Ref [v2.4.4-rc1]:
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28509) K8S integration tests are failing

2019-07-24 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892079#comment-16892079
 ] 

Marcelo Vanzin commented on SPARK-28509:


[~shaneknapp] in case this is an infra issue.

> K8S integration tests are failing
> -
>
> Key: SPARK-28509
> URL: https://issues.apache.org/jira/browse/SPARK-28509
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> I've been seeing lots of failures in master. e.g. 
> https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13180/console
> {noformat}
> - Start pod creation from template *** FAILED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: 404 page not found
>   at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
>   at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
>   at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
>   at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
>   at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>   ...
> - PVs with local storage *** FAILED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://192.168.39.112:8443/api/v1/persistentvolumes. Message: 
> PersistentVolume "test-local-pv" is invalid: [spec.local: Forbidden: Local 
> volumes are disabled by feature-gate, metadata.annotations: Required value: 
> Local volume requires node affinity]. Received status: Status(apiVersion=v1, 
> code=422, details=StatusDetails(causes=[StatusCause(field=spec.local, 
> message=Forbidden: Local volumes are disabled by feature-gate, 
> reason=FieldValueForbidden, additionalProperties={}), 
> StatusCause(field=metadata.annotations, message=Required value: Local volume 
> requires node affinity, reason=FieldValueRequired, additionalProperties={})], 
> group=null, kind=PersistentVolume, name=test-local-pv, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=PersistentVolume "test-local-pv" is invalid: [spec.local: Forbidden: 
> Local volumes are disabled by feature-gate, metadata.annotations: Required 
> value: Local volume requires node affinity], 
> metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}).
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:417)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
>   at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:227)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:787)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:357)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.setupLocalStorage(PVTestsSuite.scala:87)
>   at 
> org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.$anonfun$$init$$1(PVTestsSuite.scala:137)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ...
> - Launcher client dependencies *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 1 times 
> over 6.67390320003 minutes. Last failure message: assertion failed: 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28509) K8S integration tests are failing

2019-07-24 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28509:
--

 Summary: K8S integration tests are failing
 Key: SPARK-28509
 URL: https://issues.apache.org/jira/browse/SPARK-28509
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Tests
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


I've been seeing lots of failures in master. e.g. 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13180/console

{noformat}
- Start pod creation from template *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: 404 page not found
  at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
  at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
  ...
- PVs with local storage *** FAILED ***
  io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
POST at: https://192.168.39.112:8443/api/v1/persistentvolumes. Message: 
PersistentVolume "test-local-pv" is invalid: [spec.local: Forbidden: Local 
volumes are disabled by feature-gate, metadata.annotations: Required value: 
Local volume requires node affinity]. Received status: Status(apiVersion=v1, 
code=422, details=StatusDetails(causes=[StatusCause(field=spec.local, 
message=Forbidden: Local volumes are disabled by feature-gate, 
reason=FieldValueForbidden, additionalProperties={}), 
StatusCause(field=metadata.annotations, message=Required value: Local volume 
requires node affinity, reason=FieldValueRequired, additionalProperties={})], 
group=null, kind=PersistentVolume, name=test-local-pv, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=PersistentVolume 
"test-local-pv" is invalid: [spec.local: Forbidden: Local volumes are disabled 
by feature-gate, metadata.annotations: Required value: Local volume requires 
node affinity], metadata=ListMeta(_continue=null, resourceVersion=null, 
selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, 
additionalProperties={}).
  at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
  at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:417)
  at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
  at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
  at 
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:227)
  at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:787)
  at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:357)
  at 
org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.setupLocalStorage(PVTestsSuite.scala:87)
  at 
org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.$anonfun$$init$$1(PVTestsSuite.scala:137)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  ...
- Launcher client dependencies *** FAILED ***
  The code passed to eventually never returned normally. Attempted 1 times over 
6.67390320003 minutes. Last failure message: assertion failed: 
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25590) kubernetes-model-2.0.0.jar masks default Spark logging config

2019-07-24 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-25590.

Resolution: Duplicate

> kubernetes-model-2.0.0.jar masks default Spark logging config
> -
>
> Key: SPARK-25590
> URL: https://issues.apache.org/jira/browse/SPARK-25590
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> That jar file, which is packaged when the k8s profile is enabled, has a log4j 
> configuration embedded in it:
> {noformat}
> $ jar tf /path/to/kubernetes-model-2.0.0.jar | grep log4j
> log4j.properties
> {noformat}
> What this causes is that Spark will always use that log4j configuration 
> instead of its own default (log4j-defaults.properties), unless the user 
> overrides it by somehow adding their own in the classpath before the 
> kubernetes one.
> You can see that by running spark-shell. With the k8s jar in:
> {noformat}
> $ ./bin/spark-shell 
> ...
> Setting default log level to "WARN"
> {noformat}
> Removing the k8s jar:
> {noformat}
> $ ./bin/spark-shell 
> ...
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> {noformat}
> The proper fix would be for the k8s jar to not ship that file, and then just 
> upgrade the dependency in Spark, but if there's something easy we can do in 
> the meantime...
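> A possible workaround in the meantime (a sketch, not a fix): point log4j 
> explicitly at a config file so the one embedded in the kubernetes jar is 
> ignored. The path below is a placeholder.
> {code}
> # Force a specific log4j config instead of whatever is found first on the classpath.
> ./bin/spark-shell \
>   --driver-java-options "-Dlog4j.configuration=file:/path/to/log4j.properties"
> {code}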



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28488) Race in k8s scheduler shutdown can lead to misleading exceptions.

2019-07-23 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28488:
--

 Summary: Race in k8s scheduler shutdown can lead to misleading 
exceptions.
 Key: SPARK-28488
 URL: https://issues.apache.org/jira/browse/SPARK-28488
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


There's a race when shutting down the k8s scheduler backend that may cause ugly 
exceptions to show up in the logs:

{noformat}
19/07/22 14:43:46 ERROR Utils: Uncaught exception in thread 
kubernetes-executor-snapshots-subscribers-0
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:162)
at 
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:143)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:193)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:537)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:509)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.doRemoveExecutor(KubernetesClusterSchedulerBackend.scala:63)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.removeExecutorFromSpark(ExecutorPodsLifecycleManager.scala:143)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.$anonfun$onNewSnapshots$2(ExecutorPodsLifecycleManager.scala:64)
at 
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:234)
at 
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:465)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.$anonfun$onNewSnapshots$1(ExecutorPodsLifecycleManager.scala:59)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.$anonfun$onNewSnapshots$1$adapted(ExecutorPodsLifecycleManager.scala:58)
at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.onNewSnapshots(ExecutorPodsLifecycleManager.scala:58)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.$anonfun$start$1(ExecutorPodsLifecycleManager.scala:50)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsLifecycleManager.$anonfun$start$1$adapted(ExecutorPodsLifecycleManager.scala:50)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$callSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:110)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1330)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$$callSubscriber(ExecutorPodsSnapshotsStoreImpl.scala:107)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$$anon$1.run(ExecutorPodsSnapshotsStoreImpl.scala:80)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}

Basically, because the scheduler RPC endpoint is shut down before the 
thread-pool executors used internally by the spark-on-k8s code, those threads 
may still send messages to an endpoint that does not exist anymore.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28487) K8S pod allocator behaves poorly with dynamic allocation

2019-07-23 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28487:
--

 Summary: K8S pod allocator behaves poorly with dynamic allocation
 Key: SPARK-28487
 URL: https://issues.apache.org/jira/browse/SPARK-28487
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


The current pod allocator in Spark is tuned towards the behavior without 
dynamic allocation; it needs some enhancements so that dynamic allocation 
behaves better on K8S.

I'll be submitting an updated and enhanced version of a patch we've used 
internally.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28455) Executor may be timed out too soon because of overflow in tracking code

2019-07-19 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28455:
--

 Summary: Executor may be timed out too soon because of overflow in 
tracking code
 Key: SPARK-28455
 URL: https://issues.apache.org/jira/browse/SPARK-28455
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


This affects the new code added in SPARK-27963 (so normal dynamic allocation is 
fine). There's an overflow issue in that code that may cause executors to be 
timed out early with the default configuration.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28417) Spark Submit does not use Proxy User Credentials to Resolve Path for Resources

2019-07-17 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28417.

Resolution: Duplicate

> Spark Submit does not use Proxy User Credentials to Resolve Path for Resources
> --
>
> Key: SPARK-28417
> URL: https://issues.apache.org/jira/browse/SPARK-28417
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 
> 2.4.0, 2.4.1, 2.4.2, 2.4.3
>Reporter: Abhishek Modi
>Priority: Minor
>
> As of [#SPARK-21012], spark-submit supports wildcard paths (ex: 
> {{hdfs:///user/akmodi/*}}). To support these, spark-submit does a glob 
> resolution on these paths and overwrites the wildcard paths with the resolved 
> paths. This introduced a bug - the change did not use {{proxy-user}} 
> credentials when resolving these paths. As a result, Spark 2.2 and later fail 
> to launch an app as a {{proxy-user}} if the paths are only readable by 
> the {{proxy-user}}.
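> An example of the kind of invocation that hits this (the user name and path 
> below are illustrative):
> {code}
> # Fails on 2.2+ if the globbed path is readable only by the proxy user,
> # because glob resolution runs with the submitter's credentials.
> spark-submit \
>   --proxy-user etl_user \
>   --files 'hdfs:///user/etl_user/conf/*' \
>   ...
> {code}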



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27963) Allow dynamic allocation without an external shuffle service

2019-07-16 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27963.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24817
[https://github.com/apache/spark/pull/24817]

> Allow dynamic allocation without an external shuffle service
> 
>
> Key: SPARK-27963
> URL: https://issues.apache.org/jira/browse/SPARK-27963
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> It would be useful for users to be able to enable dynamic allocation without 
> the need to provision an external shuffle service. One immediate use case is 
> the ability to use dynamic allocation on Kubernetes, which doesn't yet have 
> that service.
> This has been suggested before (e.g. 
> https://github.com/apache/spark/pull/24083, which was attached to the 
> k8s-specific SPARK-24432), and can actually be done without affecting the 
> internals of the Spark scheduler (aside from the dynamic allocation code). 
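> With this in place, dynamic allocation can be turned on without deploying the 
> external shuffle service; a minimal sketch (assuming the shuffle-tracking 
> flag added for this feature):
> {code}
> # The driver tracks shuffle data and keeps executors hosting live shuffle
> # files around, so no external shuffle service is needed.
> spark-submit \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
>   --conf spark.shuffle.service.enabled=false \
>   ...
> {code}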



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27963) Allow dynamic allocation without an external shuffle service

2019-07-16 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27963:
--

Assignee: Marcelo Vanzin

> Allow dynamic allocation without an external shuffle service
> 
>
> Key: SPARK-27963
> URL: https://issues.apache.org/jira/browse/SPARK-27963
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Major
>
> It would be useful for users to be able to enable dynamic allocation without 
> the need to provision an external shuffle service. One immediate use case is 
> the ability to use dynamic allocation on Kubernetes, which doesn't yet have 
> that service.
> This has been suggested before (e.g. 
> https://github.com/apache/spark/pull/24083, which was attached to the 
> k8s-specific SPARK-24432), and can actually be done without affecting the 
> internals of the Spark scheduler (aside from the dynamic allocation code). 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27959) Change YARN resource configs to use .amount

2019-07-16 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27959.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24989
[https://github.com/apache/spark/pull/24989]

> Change YARN resource configs to use .amount
> ---
>
> Key: SPARK-27959
> URL: https://issues.apache.org/jira/browse/SPARK-27959
> Project: Spark
>  Issue Type: Story
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.0.0
>
>
> We are adding generic resource support into Spark, where the configs have a 
> suffix for the amount of the resource so that we can support other configs 
> later. 
> Spark on YARN already added configs to request resources via 
> spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is the 
> value and unit together. We should change those configs to have a .amount 
> suffix on them to match the Spark configs and to allow future configs to be 
> more easily added. YARN itself already supports tags and attributes, so if we 
> want the user to be able to pass those from Spark at some point, having a 
> suffix makes sense.
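> For example, after this change a request for a custom YARN resource would be 
> spelled with the .amount suffix (a sketch; "acceleratorX" is an illustrative 
> resource type name, not a real YARN resource):
> {code}
> spark-submit \
>   --conf spark.yarn.executor.resource.acceleratorX.amount=2 \
>   --conf spark.yarn.driver.resource.acceleratorX.amount=1 \
>   ...
> {code}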



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27959) Change YARN resource configs to use .amount

2019-07-16 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27959:
--

Assignee: Thomas Graves

> Change YARN resource configs to use .amount
> ---
>
> Key: SPARK-27959
> URL: https://issues.apache.org/jira/browse/SPARK-27959
> Project: Spark
>  Issue Type: Story
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> We are adding generic resource support into Spark, where the configs have a 
> suffix for the amount of the resource so that we can support other configs 
> later. 
> Spark on YARN already added configs to request resources via 
> spark.yarn.\{executor/driver/am}.resource=<amount>, where the <amount> is the 
> value and unit together. We should change those configs to have a .amount 
> suffix on them to match the Spark configs and to allow future configs to be 
> more easily added. YARN itself already supports tags and attributes, so if we 
> want the user to be able to pass those from Spark at some point, having a 
> suffix makes sense.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28407) Support mapping spark.local.dir to hostPath volume

2019-07-15 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28407.

Resolution: Duplicate

This had already been cloned elsewhere.

> Support mapping spark.local.dir to hostPath volume
> --
>
> Key: SPARK-28407
> URL: https://issues.apache.org/jira/browse/SPARK-28407
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Junjie Chen
>Priority: Minor
>
> Currently, the k8s executor builder mounts spark.local.dir as an emptyDir or 
> in-memory volume. That should satisfy small workloads, but for heavier 
> workloads like TPCDS both options can cause problems, such as pods being 
> evicted due to disk pressure when using emptyDir, and OOM when using tmpfs.
> In particular, in cloud environments users may allocate a cluster with a 
> minimal configuration and attach cloud storage when running a workload. In 
> this case, we could specify multiple elastic storage volumes as 
> spark.local.dir to accelerate spilling.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27499) Support mapping spark.local.dir to hostPath volume

2019-07-15 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885578#comment-16885578
 ] 

Marcelo Vanzin commented on SPARK-27499:


I can't see an option to reopen this, so I'll clone it instead. This seems like 
a simple fix that can at least help people experiment with different storage.

> Support mapping spark.local.dir to hostPath volume
> --
>
> Key: SPARK-27499
> URL: https://issues.apache.org/jira/browse/SPARK-27499
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Junjie Chen
>Priority: Minor
>
> Currently, the k8s executor builder mounts spark.local.dir as an emptyDir or 
> in-memory volume. That should satisfy small workloads, but for heavier 
> workloads like TPCDS both options can cause problems, such as pods being 
> evicted due to disk pressure when using emptyDir, and OOM when using tmpfs.
> In particular, in cloud environments users may allocate a cluster with a 
> minimal configuration and attach cloud storage when running a workload. In 
> this case, we could specify multiple elastic storage volumes as 
> spark.local.dir to accelerate spilling.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28407) Support mapping spark.local.dir to hostPath volume

2019-07-15 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28407:
--

 Summary: Support mapping spark.local.dir to hostPath volume
 Key: SPARK-28407
 URL: https://issues.apache.org/jira/browse/SPARK-28407
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Junjie Chen


Currently, the k8s executor builder mounts spark.local.dir as an emptyDir or 
in-memory volume. That should satisfy small workloads, but for heavier workloads 
like TPCDS both options can cause problems, such as pods being evicted due to 
disk pressure when using emptyDir, and OOM when using tmpfs.

In particular, in cloud environments users may allocate a cluster with a minimal 
configuration and attach cloud storage when running a workload. In this case, we 
could specify multiple elastic storage volumes as spark.local.dir to accelerate 
spilling.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28371) Parquet "starts with" filter is not null-safe

2019-07-12 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-28371:
---
Description: 
I ran into this when running unit tests with Parquet 1.11. It seems that 1.10 
has the same behavior in a few places but Spark somehow doesn't trigger those 
code paths.

Basically, {{UserDefinedPredicate.keep}} should be null-safe, and Spark's 
implementation is not. This was clarified in Parquet's documentation in 
PARQUET-1489.

Failure I was getting:

{noformat}
Job aborted due to stage failure: Task 0 in stage 1304.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 1304.0 (TID 2528, localhost, executor 
driver): java.lang.NullPointerException
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:544)
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:523)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:152)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
  at 
org.apache.parquet.filter2.predicate.Operators$UserDefined.accept(Operators.java:377)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:181)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
  at 
org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
  at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
  at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
  at 
org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:954)
  at 
org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:759)
  at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:207)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:439)
  ... 
{noformat}

  was:
I ran into this when running unit tests with Parquet 1.11. It seems that 1.10 
has the same behavior in a few places but Spark somehow doesn't trigger those 
code paths.

Basically, {{UserDefinedPredicate.keep}} should be null-safe, and Spark's 
implementation is not. This was clarified in Parquet's documentation in 
PARQUET-1489.

Failure I was getting:

{noformat}
Job aborted due to stage failure: Task 0 in stage 1304.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 1304.0 (TID 2528, localhost, executor 
driver): java.lang.NullPointerException at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:544)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:523)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:152)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
 at 
org.apache.parquet.filter2.predicate.Operators$UserDefined.accept(Operators.java:377)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:181)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
 at 
org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
 at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
 at 
org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:954)
 at 
org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:759)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:207)
 at 

[jira] [Created] (SPARK-28371) Parquet "starts with" filter is not null-safe

2019-07-12 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28371:
--

 Summary: Parquet "starts with" filter is not null-safe
 Key: SPARK-28371
 URL: https://issues.apache.org/jira/browse/SPARK-28371
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


I ran into this when running unit tests with Parquet 1.11. It seems that 1.10 
has the same behavior in a few places but Spark somehow doesn't trigger those 
code paths.

Basically, {{UserDefinedPredicate.keep}} should be null-safe, and Spark's 
implementation is not. This was clarified in Parquet's documentation in 
PARQUET-1489.

Failure I was getting:

{noformat}
Job aborted due to stage failure: Task 0 in stage 1304.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 1304.0 (TID 2528, localhost, executor 
driver): java.lang.NullPointerException at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:544)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFilters$$anonfun$createFilter$16$$anon$1.keep(ParquetFilters.scala:523)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:152)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
 at 
org.apache.parquet.filter2.predicate.Operators$UserDefined.accept(Operators.java:377)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:181)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.visit(ColumnIndexFilter.java:56)
 at 
org.apache.parquet.filter2.predicate.Operators$And.accept(Operators.java:309)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:86)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter$1.visit(ColumnIndexFilter.java:81)
 at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:137)
 at 
org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
 at 
org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:954)
 at 
org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:759)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:207)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
 at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:439)
 at 
...
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23472) Add config properties for administrator JVM options

2019-07-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-23472:
--

Assignee: Gabor Somogyi

> Add config properties for administrator JVM options
> ---
>
> Key: SPARK-23472
> URL: https://issues.apache.org/jira/browse/SPARK-23472
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Gabor Somogyi
>Priority: Major
>
> In our environment, users may need to add JVM options to their Spark 
> applications (e.g. to override log configuration). They typically use 
> {{--driver-java-options}} or {{spark.executor.extraJavaOptions}}. Both set 
> extraJavaOptions properties. We also have a set of administrator JVM options 
> to apply that set the garbage collector (G1GC) and kill the driver JVM on OOM.
> These two use cases both need to set extraJavaOptions properties, but will 
> clobber one another. In the past we've maintained wrapper scripts, but this 
> causes our default properties to be maintained in scripts rather than our 
> spark-defaults.properties.
> I think we should add defaultJavaOptions properties that are added along with 
> extraJavaOptions. Administrators could set defaultJavaOptions and these would 
> always get added to the JVM command line, along with any user options instead 
> of getting overwritten by user options.
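> With such properties, admin defaults would live in spark-defaults.conf while 
> user-supplied extraJavaOptions no longer clobber them. A sketch of what that 
> could look like (property names as proposed here; the JVM flags are just 
> illustrative):
> {code}
> # Administrator-set defaults, always added to the JVM command line.
> spark.driver.defaultJavaOptions    -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
> spark.executor.defaultJavaOptions  -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
>
> # Users can still add their own options, e.g.:
> #   --driver-java-options "-Dlog4j.configuration=..."
> {code}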



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23472) Add config properties for administrator JVM options

2019-07-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-23472.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24804
[https://github.com/apache/spark/pull/24804]

> Add config properties for administrator JVM options
> ---
>
> Key: SPARK-23472
> URL: https://issues.apache.org/jira/browse/SPARK-23472
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> In our environment, users may need to add JVM options to their Spark 
> applications (e.g. to override log configuration). They typically use 
> {{--driver-java-options}} or {{spark.executor.extraJavaOptions}}. Both set 
> extraJavaOptions properties. We also have a set of administrator JVM options 
> to apply that set the garbage collector (G1GC) and kill the driver JVM on OOM.
> These two use cases both need to set extraJavaOptions properties, but will 
> clobber one another. In the past we've maintained wrapper scripts, but this 
> causes our default properties to be maintained in scripts rather than our 
> spark-defaults.properties.
> I think we should add defaultJavaOptions properties that are added along with 
> extraJavaOptions. Administrators could set defaultJavaOptions and these would 
> always get added to the JVM command line, along with any user options instead 
> of getting overwritten by user options.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28055) Add delegation token custom AdminClient configurations.

2019-07-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-28055.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24875
[https://github.com/apache/spark/pull/24875]

> Add delegation token custom AdminClient configurations.
> ---
>
> Key: SPARK-28055
> URL: https://issues.apache.org/jira/browse/SPARK-28055
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28055) Add delegation token custom AdminClient configurations.

2019-07-11 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28055:
--

Assignee: Gabor Somogyi

> Add delegation token custom AdminClient configurations.
> ---
>
> Key: SPARK-28055
> URL: https://issues.apache.org/jira/browse/SPARK-28055
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28214) Flaky test: org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints + dstream graph checkpoint recovery

2019-06-28 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28214:
--

 Summary: Flaky test: 
org.apache.spark.streaming.CheckpointSuite.basic rdd checkpoints + dstream 
graph checkpoint recovery
 Key: SPARK-28214
 URL: https://issues.apache.org/jira/browse/SPARK-28214
 Project: Spark
  Issue Type: Bug
  Components: DStreams, Tests
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


This test has failed a few times in some PRs. Example of a failure:

{noformat}
Error Message
org.scalatest.exceptions.TestFailedException: Map() was empty No checkpointed 
RDDs in state stream before first failure
Stacktrace
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: Map() was 
empty No checkpointed RDDs in state stream before first failure
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
at 
org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:266)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
at 
org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
at 
org.apache.spark.streaming.CheckpointSuite.org$scalatest$BeforeAndAfter$$super$runTest(CheckpointSuite.scala:209)
{noformat}

On top of that, when this failure happens, the test leaves a running 
{{SparkContext}} behind, which makes every single unit test run after it on 
that project fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-28207) https://rtatdotblog.wordpress.com/2019/05/30/rohit-travels-tours-rohit

2019-06-28 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin deleted SPARK-28207:
---


> https://rtatdotblog.wordpress.com/2019/05/30/rohit-travels-tours-rohit
> --
>
> Key: SPARK-28207
> URL: https://issues.apache.org/jira/browse/SPARK-28207
> Project: Spark
>  Issue Type: Bug
>Reporter: Roufique Hossain
>Priority: Minor
>  Labels: http://schemas.xmlsoap.org/ws/2004/09/policy
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28187) Add hadoop-cloud module to PR builders

2019-06-27 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-28187:
--

Assignee: Marcelo Vanzin

> Add hadoop-cloud module to PR builders
> --
>
> Key: SPARK-28187
> URL: https://issues.apache.org/jira/browse/SPARK-28187
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
>
> We currently don't build / test the hadoop-cloud stuff in PRs. See 
> https://github.com/apache/spark/pull/24970 for an example.
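> For reference, a local build that exercises the module just needs the Maven 
> profile enabled (a sketch):
> {code}
> # Build Spark with the hadoop-cloud module included.
> ./build/mvn -Phadoop-cloud -DskipTests package
> {code}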



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


