[jira] [Created] (SPARK-26737) Executor/Task STDERR & STDOUT log urls are not correct in Yarn deployment mode
Devaraj K created SPARK-26737:
-

Summary: Executor/Task STDERR & STDOUT log urls are not correct in Yarn deployment mode
Key: SPARK-26737
URL: https://issues.apache.org/jira/browse/SPARK-26737
Project: Spark
Issue Type: Bug
Components: Web UI, YARN
Affects Versions: 3.0.0
Reporter: Devaraj K

The base of the STDERR & STDOUT log URLs is being generated like the following, which also includes the key:
{code}
http://ip:8042/node/containerlogs/container_1544212645385_0252_01_01/(SPARK_USER, devaraj)
{code}
{code}
http://ip:8042/node/containerlogs/container_1544212645385_0252_01_01/(USER, devaraj)
{code}
instead of
{code}
http://ip:8042/node/containerlogs/container_1544212645385_0251_01_02/devaraj
{code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
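The malformed path suggests a whole (key, value) pair being interpolated into the URL rather than the user value alone. A minimal Scala sketch of that failure mode (the variable name and container id are illustrative, not Spark's actual code):

```scala
// Illustrative only: interpolating the whole (key, value) tuple into the URL
// reproduces a broken form like the one reported; taking just the value (_2)
// gives the expected form.
val user = ("SPARK_USER", "devaraj")

val brokenUrl   = s"http://ip:8042/node/containerlogs/container_X/$user"
val expectedUrl = s"http://ip:8042/node/containerlogs/container_X/${user._2}"
```

Here `brokenUrl` ends in `(SPARK_USER,devaraj)` (Scala's Tuple2 toString), while `expectedUrl` ends in `/devaraj`.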
[jira] [Created] (SPARK-26650) Yarn Client throws 'ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration'
Devaraj K created SPARK-26650:
-

Summary: Yarn Client throws 'ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration'
Key: SPARK-26650
URL: https://issues.apache.org/jira/browse/SPARK-26650
Project: Spark
Issue Type: Bug
Components: Build, YARN
Affects Versions: 3.0.0
Reporter: Devaraj K

{code:xml}
19/01/17 11:33:00 WARN security.HBaseDelegationTokenProvider: Fail to invoke HBaseConfiguration
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.hbaseConf(HBaseDelegationTokenProvider.scala:69)
    at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.delegationTokensRequired(HBaseDelegationTokenProvider.scala:62)
    at org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$1(HadoopDelegationTokenManager.scala:134)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:133)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.obtainDelegationTokens(YARNHadoopDelegationTokenManager.scala:59)
    at org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:305)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1014)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:181)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:184)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
    at scala.Option.getOrElse(Option.scala:138)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:168)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:196)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:87)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:932)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:941)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/01/17 11:33:00 INFO yarn.Client: Submitting application application_1544212645385_0197 to ResourceManager
19/01/17 11:33:00 INFO impl.YarnClientImpl: Submitted application application_1544212645385_0197
{code}
[jira] [Commented] (SPARK-24787) Events being dropped at an alarming rate due to hsync being slow for eventLogging
[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652687#comment-16652687 ]
Devaraj K commented on SPARK-24787:
---

It seems the overhead here comes from the FileChannel.force call in the DataNode, which is part of hsync and writes the data to the storage device. hsync does not make much difference with or without the SyncFlag.UPDATE_LENGTH flag, probably because updating the length is a simple call to the NameNode. I think the hsync change can be reverted; the history server can instead get the latest file length using DFSInputStream.getFileLength(), which includes lastBlockBeingWrittenLength. If the cached length is the same as FileStatus.getLen(), the history server can make an additional call to get the latest length using DFSInputStream.getFileLength() and decide whether to update the history log or not.

> Events being dropped at an alarming rate due to hsync being slow for
> eventLogging
> -
>
> Key: SPARK-24787
> URL: https://issues.apache.org/jira/browse/SPARK-24787
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Web UI
> Affects Versions: 2.3.0, 2.3.1
> Reporter: Sanket Reddy
> Priority: Minor
>
> [https://github.com/apache/spark/pull/16924/files] updates the length of the
> inprogress files allowing history server being responsive.
> Although we have a production job that has 6 tasks per stage and due to
> hsync being slow it starts dropping events and the history server has wrong
> stats due to events being dropped.
> A viable solution is not to make it sync very frequently or make it
> configurable.
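The proposed check can be sketched as follows (a hedged sketch, not Spark code; `cachedLength` and `probeOpenFileLength` are hypothetical names, with the probe standing in for the extra DFSInputStream.getFileLength() call that also counts the last block being written):

```scala
// Only when the cached length equals FileStatus.getLen() do we pay for the
// extra open-file length probe; otherwise the NameNode already reports growth.
def shouldReplay(cachedLength: Long,
                 fileStatusLen: Long,
                 probeOpenFileLength: () => Long): Boolean =
  if (cachedLength != fileStatusLen) true        // NameNode already sees new data
  else probeOpenFileLength() > cachedLength      // probe the in-progress last block
```

This keeps the common path cheap: the expensive probe only runs when the NameNode-reported length is unchanged.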
[jira] [Created] (SPARK-25683) Make AsyncEventQueue.lastReportTimestamp initial value as the currentTime instead of 0
Devaraj K created SPARK-25683:
-

Summary: Make AsyncEventQueue.lastReportTimestamp initial value as the currentTime instead of 0
Key: SPARK-25683
URL: https://issues.apache.org/jira/browse/SPARK-25683
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K

{code:xml}
18/10/08 17:51:40 ERROR AsyncEventQueue: Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since Wed Dec 31 16:00:00 PST 1969.
18/10/08 17:52:40 WARN AsyncEventQueue: Dropped 144853 events from eventLog since Mon Oct 08 17:51:40 PDT 2018.
{code}

Here the first log shows the time as Wed Dec 31 16:00:00 PST 1969; I think it would be better to show the initialization time instead.
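The confusing timestamp is simply the Unix epoch: a field initialized to 0L and formatted as a Date prints Dec 31 1969 / Jan 1 1970 depending on the timezone. A minimal sketch of the two choices (illustrative variable names, not Spark's fields):

```scala
import java.util.Date

// lastReportTimestamp defaulting to 0L makes the first "Dropped N events
// since ..." warning format the Unix epoch rather than a meaningful anchor;
// initializing it to the current time gives a sensible first message.
val defaultAnchor  = new Date(0L)                         // epoch, the confusing value
val proposedAnchor = new Date(System.currentTimeMillis()) // time the queue was set up
```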
[jira] [Resolved] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
[ https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj K resolved SPARK-25645.
---
Resolution: Duplicate

> Add provision to disable EventLoggingListener default flush/hsync/hflush for
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: Devaraj K
> Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false)
> {
> val eventJson = JsonProtocol.sparkEventToJson(event)
> // scalastyle:off println
> writer.foreach(_.println(compact(render(eventJson
> // scalastyle:on println
> if (flushLogger) {
> writer.foreach(_.flush())
> hadoopDataStream.foreach(ds => ds.getWrappedStream match {
> case wrapped: DFSOutputStream =>
> wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
> case _ => ds.hflush()
> })
> }
> {code}
> There are events which come with flushLogger=true and go through the
> underlying stream flush, Here I tried running apps with disabling the
> flush/hsync/hflush for all events and see that there is significant
> improvement in the app completion time and also there are no event drops,
> posting more details in the comments section.
[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
[ https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639079#comment-16639079 ]
Devaraj K commented on SPARK-25645:
---

{code:java|title=with hflush(no hsync)|borderStyle=solid}
18/10/04 17:01:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/04 17:01:13 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 0
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 0
{code}

With hflush (no hsync), it takes slightly more time (about 2 sec) than with no hflush for all events, and I don't see any dropped events here either.
> Add provision to disable EventLoggingListener default flush/hsync/hflush for
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: Devaraj K
> Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false)
> {
> val eventJson = JsonProtocol.sparkEventToJson(event)
> // scalastyle:off println
> writer.foreach(_.println(compact(render(eventJson
> // scalastyle:on println
> if (flushLogger) {
> writer.foreach(_.flush())
> hadoopDataStream.foreach(ds => ds.getWrappedStream match {
> case wrapped: DFSOutputStream =>
> wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
> case _ => ds.hflush()
> })
> }
> {code}
> There are events which come with flushLogger=true and go through the
> underlying stream flush, Here I tried running apps with disabling the
> flush/hsync/hflush for all events and see that there is significant
> improvement in the app completion time and also there are no event drops,
> posting more details in the comments section.
[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
[ https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639046#comment-16639046 ]
Devaraj K commented on SPARK-25645:
---

Thanks [~vanzin] for the JIRA pointer. I haven't tried with just hflush; let me try with hflush and post the result for the same app.

> Add provision to disable EventLoggingListener default flush/hsync/hflush for
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2
> Reporter: Devaraj K
> Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false)
> {
> val eventJson = JsonProtocol.sparkEventToJson(event)
> // scalastyle:off println
> writer.foreach(_.println(compact(render(eventJson
> // scalastyle:on println
> if (flushLogger) {
> writer.foreach(_.flush())
> hadoopDataStream.foreach(ds => ds.getWrappedStream match {
> case wrapped: DFSOutputStream =>
> wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
> case _ => ds.hflush()
> })
> }
> {code}
> There are events which come with flushLogger=true and go through the
> underlying stream flush, Here I tried running apps with disabling the
> flush/hsync/hflush for all events and see that there is significant
> improvement in the app completion time and also there are no event drops,
> posting more details in the comments section.
[jira] [Created] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
Devaraj K created SPARK-25645:
-

Summary: Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
Key: SPARK-25645
URL: https://issues.apache.org/jira/browse/SPARK-25645
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K

{code:java|title=EventLoggingListener.scala|borderStyle=solid}
private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
  val eventJson = JsonProtocol.sparkEventToJson(event)
  // scalastyle:off println
  writer.foreach(_.println(compact(render(eventJson))))
  // scalastyle:on println
  if (flushLogger) {
    writer.foreach(_.flush())
    hadoopDataStream.foreach(ds => ds.getWrappedStream match {
      case wrapped: DFSOutputStream =>
        wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
      case _ => ds.hflush()
    })
  }
}
{code}

Some events come with flushLogger=true and go through the underlying stream flush. I tried running apps with flush/hsync/hflush disabled for all events and saw a significant improvement in the app completion time, with no event drops; I am posting more details in the comments section.
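The "provision to disable" could take the shape of a configuration flag gating the per-event flush. A hedged sketch (the conf name `spark.eventLog.flush.enabled` is assumed for illustration, not an existing Spark setting):

```scala
// Gate the per-event flush/hsync behind a configuration flag: flush happens
// only when the event requests it AND the flag is enabled. Returns whether a
// flush was actually performed.
def maybeFlush(flushLogger: Boolean, flushEnabled: Boolean)(doFlush: () => Unit): Boolean = {
  val willFlush = flushLogger && flushEnabled
  if (willFlush) doFlush()   // flush + hsync/hflush only when both are set
  willFlush
}
```

With `flushEnabled = false`, every event takes the cheap buffered-write path, which is the behavior measured in the comments below.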
[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events
[ https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639005#comment-16639005 ]
Devaraj K commented on SPARK-25645:
---

{code:java|title=Present Behavior(flushLogger=true for some events)|borderStyle=solid}
18/10/04 15:00:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/04 15:00:26 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 15:00:58 ERROR AsyncEventQueue: Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
18/10/04 15:00:58 WARN AsyncEventQueue: Dropped 2 events from eventLog since Wed Dec 31 16:00:00 PST 1969.
18/10/04 15:01:58 WARN AsyncEventQueue: Dropped 216493 events from eventLog since Thu Oct 04 15:00:58 PDT 2018.
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 1
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 1
18/10/04 15:03:39 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 0
{code}

With the present behavior, the application takes 3 min 14 sec to complete, with dropped events, and it takes 55 sec to clear the eventLog queue at the end of the application.

{code:java|title=flush/hsync/hflush disabled for all events|borderStyle=solid}
18/10/04 14:51:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/04 14:51:34 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, eventQueue.size(): 0
{code}

With flush/hsync/hflush disabled for all events, the application takes 2 min 21 sec to complete without any dropped events, and there are no pending events in the eventLog queue at the end of the application.
> Add provision to disable EventLoggingListener default flush/hsync/hflush for > all events > --- > > Key: SPARK-25645 > URL: https://issues.apache.org/jira/browse/SPARK-25645 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2 >Reporter: Devaraj K >Priority: Major > > {code:java|title=EventLoggingListener.scala|borderStyle=solid} > private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) > { > val eventJson = JsonProtocol.sparkEventToJson(event) > // scalastyle:off println > writer.foreach(_.println(compact(render(eventJson > // scalastyle:on println > if (flushLogger) { > writer.foreach(_.flush()) > hadoopDataStream.foreach(ds => ds.getWrappedStream match { > case wrapped: DFSOutputStream => >
[jira] [Created] (SPARK-25637) SparkException: Could not find CoarseGrainedScheduler occurs during the application stop
Devaraj K created SPARK-25637:
-

Summary: SparkException: Could not find CoarseGrainedScheduler occurs during the application stop
Key: SPARK-25637
URL: https://issues.apache.org/jira/browse/SPARK-25637
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K

{code:xml}
2018-10-03 14:51:33 ERROR Inbox:91 - Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:140)
    at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:187)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:528)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:449)
    at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:638)
    at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:201)
    at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:197)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:197)
    at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:120)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}

SPARK-14228 fixed these kinds of errors, but this still occurs while performing reviveOffers.
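One mitigation, in the spirit of SPARK-14228, is to guard the reviveOffers path with a stopped flag. A hedged sketch (the `backendStopped` flag and `safeReviveOffers` helper are hypothetical names, not Spark's actual fields):

```scala
// Skip reviveOffers once the backend has been stopped, so expireDeadHosts
// firing during shutdown cannot message the already-removed
// CoarseGrainedScheduler endpoint. Returns whether the revive was sent.
var backendStopped = false

def safeReviveOffers(revive: () => Unit): Boolean =
  if (backendStopped) false   // shutting down: the endpoint may already be gone
  else { revive(); true }
```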
[jira] [Created] (SPARK-25636) spark-submit swallows the failure reason when there is an error connecting to master
Devaraj K created SPARK-25636:
-

Summary: spark-submit swallows the failure reason when there is an error connecting to master
Key: SPARK-25636
URL: https://issues.apache.org/jira/browse/SPARK-25636
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K

{code:xml}
[apache-spark]$ ./bin/spark-submit --verbose --master spark://
Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
{code}

When spark-submit cannot connect to the master, the underlying cause of the failure is not shown. I think it should display the cause of the problem.
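The message above suggests only the top-level exception message is printed while the nested cause is dropped. A hedged sketch of the fix (the `formatLaunchError` helper is hypothetical, not spark-submit's actual code):

```scala
// Surface the underlying cause alongside the top-level message instead of
// printing only e.getMessage, which for wrapper exceptions like
// "Exception thrown in awaitResult:" carries no useful detail.
def formatLaunchError(e: Throwable): String = {
  val cause = Option(e.getCause)
    .map(c => s" (caused by: ${c.getMessage})")
    .getOrElse("")
  s"Error: ${e.getMessage}$cause"
}
```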
[jira] [Commented] (SPARK-25246) When the spark.eventLog.compress is enabled, the Application is not showing in the History server UI ('incomplete application' page), initially.
[ https://issues.apache.org/jira/browse/SPARK-25246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615398#comment-16615398 ]
Devaraj K commented on SPARK-25246:
---

I think this is not a problem; the behavior may depend on the codec used for compression. Can you try other codecs and observe the behavior?

> When the spark.eventLog.compress is enabled, the Application is not showing
> in the History server UI ('incomplete application' page), initially.
> -
>
> Key: SPARK-25246
> URL: https://issues.apache.org/jira/browse/SPARK-25246
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: shahid
> Priority: Major
>
> 1) bin/spark-shell --master yarn --conf "spark.eventLog.compress=true"
> 2) hdfs dfs -ls /spark-logs
> {code:java}
> -rwxrwx--- 1 root supergroup *0* 2018-08-27 03:26
> /spark-logs/application_1535313809919_0005.lz4.inprogress
> {code}
[jira] [Created] (SPARK-25009) Standalone Cluster mode application submit is not working
Devaraj K created SPARK-25009:
-

Summary: Standalone Cluster mode application submit is not working
Key: SPARK-25009
URL: https://issues.apache.org/jira/browse/SPARK-25009
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.1
Reporter: Devaraj K

It does not show any error while submitting, but the app does not run and does not show up in the web UI either.
[jira] [Created] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh
Devaraj K created SPARK-24129:
-

Summary: Add option to pass --build-arg's to docker-image-tool.sh
Key: SPARK-24129
URL: https://issues.apache.org/jira/browse/SPARK-24129
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Devaraj K

When working behind a firewall, we may need to pass proxy details as docker --build-arg parameters to build the image, but docker-image-tool.sh doesn't provide an option to pass proxy details or --build-arg values to the docker command.
[jira] [Created] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's
Devaraj K created SPARK-24003:
-

Summary: Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's
Key: SPARK-24003
URL: https://issues.apache.org/jira/browse/SPARK-24003
Project: Spark
Issue Type: Improvement
Components: Mesos, Spark Core, YARN
Affects Versions: 2.3.0
Reporter: Devaraj K

Users may want to enable GC logging or heap dumps for the executors, but there is a chance of the output being overwritten by other executors, since the paths cannot be expressed dynamically. This improvement would make it possible to express the spark.executor.extraJavaOptions paths in terms of the App Id and/or Executor Id's, to avoid overwriting by other executors. There was a discussion about this in SPARK-3767, but it was never fixed.
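The idea can be sketched as simple placeholder substitution in the option string. A hedged sketch (the `{{APP_ID}}`/`{{EXECUTOR_ID}}` placeholder syntax and helper name are assumptions for illustration, not what Spark defines):

```scala
// Expand per-executor placeholders in spark.executor.extraJavaOptions so each
// executor writes its GC log / heap dump to a unique path.
def expandJavaOptions(opts: String, appId: String, executorId: String): String =
  opts
    .replace("{{APP_ID}}", appId)
    .replace("{{EXECUTOR_ID}}", executorId)
```

For example, `-Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc` would expand to a distinct file per executor instead of a single path that every executor clobbers.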
[jira] [Commented] (SPARK-22567) spark.mesos.executor.memoryOverhead equivalent for the Driver when running on Mesos
[ https://issues.apache.org/jira/browse/SPARK-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377370#comment-16377370 ]
Devaraj K commented on SPARK-22567:
---

Duplicate of SPARK-17928. [~michaelmoss], can you check the PR available for SPARK-17928?

> spark.mesos.executor.memoryOverhead equivalent for the Driver when running on
> Mesos
> ---
>
> Key: SPARK-22567
> URL: https://issues.apache.org/jira/browse/SPARK-22567
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 2.2.0
> Reporter: Michael Moss
> Priority: Minor
>
> spark.mesos.executor.memoryOverhead is:
> "The amount of additional memory, specified in MB, to be allocated per
> executor. By default, the overhead will be larger of either 384 or 10% of
> spark.executor.memory"
> It is important for every JVM process to have memory available to it, beyond
> its heap (Xmx) for native allocations.
> When using the MesosClusterDispatcher and running the Driver on Mesos
> (https://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode), it
> appears that the Driver's mesos sandbox is allocated with the same amount of
> memory (configured with spark.driver.memory) as the heap (Xmx) itself. This
> increases the prevalence of OOM exceptions.
[jira] [Commented] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308802#comment-16308802 ] Devaraj K commented on SPARK-22404: --- Thanks [~irashid] for the comment. bq. can you provide a little more explanation for the point of this? An unmanagedAM is an AM that is not launched and managed by the RM. The client creates a new application on the RM and negotiates a new attempt id. Then it waits for the RM app state to reach be YarnApplicationState.ACCEPTED after which it spawns the AM in same/another process and passes it the container id via env variable Environment.CONTAINER_ID. The AM(as part of same or different process) can register with the RM using the attempt id obtained from the container id and proceed as normal. In this PR/JIRA, providing a new configuration "spark.yarn.un-managed-am" (defaults to false) to enable the Unmanaged AM Application in Yarn Client mode which starts the Application Master service as part of the Client. It utilizes the existing code for communicating between the Application Master <-> Task Scheduler for the container requests/allocations/launch, and eliminates these, * Allocating and launching the Application Master container * Remote Node/Process communication between Application Master <-> Task Scheduler bq. how much time does this save for you? It removes the AM container scheduling and launching time, and eliminates the AM acting as proxy for requesting, launching and removing executors. I can post the comparison results here with and without unmanaged am. bq. What's the downside of an unmanaged AM? Unmanaged AM service would run as part of the Client, Client can handle if anything goes wrong with the unmanaged AM service unlike relaunching the AM container for failures. bq. 
the idea makes sense, but the yarn interaction and client mode is already pretty complicated so I'd like good justification for this In this PR, most of the existing code for communication between AM <-> Task Scheduler is reused, but it runs in the same process. The Client starts the AM service in the same process when the application's state is ACCEPTED and proceeds as usual without disrupting the existing flow. > Provide an option to use unmanaged AM in yarn-client mode > - > > Key: SPARK-22404 > URL: https://issues.apache.org/jira/browse/SPARK-22404 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.0 >Reporter: Devaraj K > > There was an issue SPARK-1200 to provide an option but was closed without > fixing. > Using an unmanaged AM in yarn-client mode would allow apps to start up > faster, but not requiring the container launcher AM to be launched on the > cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14228) Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped
[ https://issues.apache.org/jira/browse/SPARK-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286282#comment-16286282 ] Devaraj K commented on SPARK-14228: --- [~KaiXinXIaoLei], Thanks for checking this. Is the issue you are mentioning different from the two instances mentioned in the PR? If so, can you create a JIRA with the exception stack trace? > Lost executor of RPC disassociated, and occurs exception: Could not find > CoarseGrainedScheduler or it has been stopped > -- > > Key: SPARK-14228 > URL: https://issues.apache.org/jira/browse/SPARK-14228 > Project: Spark > Issue Type: Bug >Reporter: meiyoula > Fix For: 2.3.0 > > > When I start 1000 executors and then stop the process, it calls > SparkContext.stop to stop all executors. But during this process, the > executors that have been killed lose RPC contact with the driver and try to > reviveOffers, but can't find CoarseGrainedScheduler or it has been stopped. > {quote} > 16/03/29 01:45:45 ERROR YarnScheduler: Lost executor 610 on 51-196-152-8: > remote Rpc client disassociated > 16/03/29 01:45:45 ERROR Inbox: Ignoring error > org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it > has been stopped. 
> at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161) > at > org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131) > at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:173) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:398) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:314) > at > org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:482) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.removeExecutor(CoarseGrainedSchedulerBackend.scala:261) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.onDisconnected(CoarseGrainedSchedulerBackend.scala:207) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:144) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22519) Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()
[ https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-22519: -- Summary: Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir() (was: ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available) > Remove unnecessary stagingDirPath null check in > ApplicationMaster.cleanupStagingDir() > - > > Key: SPARK-22519 > URL: https://issues.apache.org/jira/browse/SPARK-22519 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.0 >Reporter: Devaraj K >Priority: Minor > > In the below, the condition checks whether the stagingDirPath is null but > stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null > then it throws NPE while creating the Path. > {code:title=ApplicationMaster.scala|borderStyle=solid} > stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR")) > if (stagingDirPath == null) { > logError("Staging directory is null") > return > } > {code} > Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is > null or not, not the stagingDirPath. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22519) Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()
[ https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-22519: -- Priority: Trivial (was: Minor) > Remove unnecessary stagingDirPath null check in > ApplicationMaster.cleanupStagingDir() > - > > Key: SPARK-22519 > URL: https://issues.apache.org/jira/browse/SPARK-22519 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.0 >Reporter: Devaraj K >Priority: Trivial > > In the below, the condition checks whether the stagingDirPath is null but > stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null > then it throws NPE while creating the Path. > {code:title=ApplicationMaster.scala|borderStyle=solid} > stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR")) > if (stagingDirPath == null) { > logError("Staging directory is null") > return > } > {code} > Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is > null or not, not the stagingDirPath. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22519) ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available
[ https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251925#comment-16251925 ] Devaraj K commented on SPARK-22519: --- It is not a usual case; I have seen this NPE while working on SPARK-22404, when the SPARK_YARN_STAGING_DIR env var doesn't exist. Even if you don't think the env var needs a null check, at least note that the *if (stagingDirPath == null) {* check is never exercised. > ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR > env var is not available > - > > Key: SPARK-22519 > URL: https://issues.apache.org/jira/browse/SPARK-22519 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.0 >Reporter: Devaraj K >Priority: Minor > > In the below, the condition checks whether the stagingDirPath is null but > stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null > then it throws NPE while creating the Path. > {code:title=ApplicationMaster.scala|borderStyle=solid} > stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR")) > if (stagingDirPath == null) { > logError("Staging directory is null") > return > } > {code} > Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is > null or not, not the stagingDirPath. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22519) ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available
Devaraj K created SPARK-22519: - Summary: ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available Key: SPARK-22519 URL: https://issues.apache.org/jira/browse/SPARK-22519 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.2.0 Reporter: Devaraj K Priority: Minor In the code below, the condition checks whether stagingDirPath is null, but stagingDirPath never becomes null; if the SPARK_YARN_STAGING_DIR env var is null, it throws an NPE while creating the Path. {code:title=ApplicationMaster.scala|borderStyle=solid} stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR")) if (stagingDirPath == null) { logError("Staging directory is null") return } {code} We need to check whether System.getenv("SPARK_YARN_STAGING_DIR") is null, not stagingDirPath. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
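The suggested fix can be sketched as follows. This is an illustrative helper only, not the actual Spark patch; `Path` here is a minimal stand-in for `org.apache.hadoop.fs.Path` (which rejects a null path string) so the example is self-contained. The point is that the guard belongs on the raw env value, because constructing the Path is what fails.

```scala
// Illustrative sketch only; Path is a stand-in for org.apache.hadoop.fs.Path.
object StagingDirCleanup {
  final case class Path(pathString: String) {
    require(pathString != null, "Can not create a Path from a null string")
  }

  // Guard on the env value itself, not on the Path reference afterwards:
  // if the value is absent, return None instead of throwing.
  def stagingDirPath(envValue: String): Option[Path] =
    Option(envValue).map(Path.apply)
}
```

With this shape, the caller simply skips cleanup when the option is empty, and the dead `stagingDirPath == null` branch disappears.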
[jira] [Created] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode
Devaraj K created SPARK-22404: - Summary: Provide an option to use unmanaged AM in yarn-client mode Key: SPARK-22404 URL: https://issues.apache.org/jira/browse/SPARK-22404 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.2.0 Reporter: Devaraj K There was an earlier issue, SPARK-1200, to provide such an option, but it was closed without a fix. Using an unmanaged AM in yarn-client mode would allow apps to start up faster by not requiring the container launcher AM to be launched on the cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226015#comment-16226015 ] Devaraj K commented on SPARK-22404: --- I am working on this, will update this jira with the proposal PR. > Provide an option to use unmanaged AM in yarn-client mode > - > > Key: SPARK-22404 > URL: https://issues.apache.org/jira/browse/SPARK-22404 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.2.0 >Reporter: Devaraj K > > There was an issue SPARK-1200 to provide an option but was closed without > fixing. > Using an unmanaged AM in yarn-client mode would allow apps to start up > faster, but not requiring the container launcher AM to be launched on the > cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-22172) Worker hangs when the external shuffle service port is already in use
Devaraj K created SPARK-22172: - Summary: Worker hangs when the external shuffle service port is already in use Key: SPARK-22172 URL: https://issues.apache.org/jira/browse/SPARK-22172 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Devaraj K When the external shuffle service port is already in use, the Worker throws the below BindException and hangs forever; I think the exception should be handled gracefully. {code:xml} 17/09/29 11:16:30 INFO ExternalShuffleService: Starting shuffle service on port 7337 (auth enabled = false) 17/09/29 11:16:30 ERROR Inbox: Ignoring error java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128) at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:500) at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218) at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495) at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480) at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965) at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:209) at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
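One way to handle the bind gracefully is to fail fast with a clear error instead of hanging. The sketch below is illustrative: the names (`ShuffleServiceStart`, `tryStart`) are hypothetical, and a plain `ServerSocket` stands in for the Netty transport the real ExternalShuffleService uses.

```scala
import java.net.{BindException, InetSocketAddress, ServerSocket}

// Hypothetical sketch: surface "port already in use" as an explicit error
// the Worker can act on (log and shut down), rather than swallowing it.
object ShuffleServiceStart {
  // Returns Right(socket) on success, Left(message) when the port is taken.
  def tryStart(port: Int): Either[String, ServerSocket] =
    try {
      val socket = new ServerSocket()
      socket.bind(new InetSocketAddress("127.0.0.1", port))
      Right(socket)
    } catch {
      case e: BindException =>
        Left(s"Shuffle service port $port already in use: ${e.getMessage}")
    }
}
```

On `Left`, the caller can log the message and exit instead of leaving the Worker in a half-started state.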
[jira] [Commented] (SPARK-19417) spark.files.overwrite is ignored
[ https://issues.apache.org/jira/browse/SPARK-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177253#comment-16177253 ] Devaraj K commented on SPARK-19417: --- Thanks [~ckanich] for the test case. {code:title=SparkContext.scala|borderStyle=solid} def addFile(path: String, recursive: Boolean): Unit = { val timestamp = System.currentTimeMillis if (addedFiles.putIfAbsent(key, timestamp).isEmpty) { logInfo(s"Added file $path at $key with timestamp $timestamp") // Fetch the file locally so that closures which are run on the driver can still use the // SparkFiles API to access files. Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf, env.securityManager, hadoopConfiguration, timestamp, useCache = false) postEnvironmentUpdate() } {code} It does not add the file if it already exists, and this appears to be intentional behavior; please see the discussion at https://github.com/apache/spark/pull/14396. Do you have a real use case for this? > spark.files.overwrite is ignored > > > Key: SPARK-19417 > URL: https://issues.apache.org/jira/browse/SPARK-19417 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Chris Kanich > > I have not been able to get Spark to actually overwrite a file after I have > changed it on the driver node, re-called addFile, and then used it on the > executors again. Here's a failing test. 
> {code} > test("can overwrite files when spark.files.overwrite is true") { > val dir = Utils.createTempDir() > val file = new File(dir, "file") > try { > Files.write("one", file, StandardCharsets.UTF_8) > sc = new SparkContext(new > SparkConf().setAppName("test").setMaster("local-cluster[1,1,1024]") > .set("spark.files.overwrite", "true")) > sc.addFile(file.getAbsolutePath) > def getAddedFileContents(): String = { > sc.parallelize(Seq(0)).map { _ => > scala.io.Source.fromFile(SparkFiles.get("file")).mkString > }.first() > } > assert(getAddedFileContents() === "one") > Files.write("two", file, StandardCharsets.UTF_8) > sc.addFile(file.getAbsolutePath) > assert(getAddedFileContents() === "onetwo") > } finally { > Utils.deleteRecursively(dir) > sc.stop() > } > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
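The no-overwrite behavior follows directly from the `putIfAbsent` guard in the `addFile` excerpt quoted in the comment above: for a key that is already present, the second call is a no-op, so the file is never re-fetched. A minimal illustration of that semantics with a plain concurrent map (the `AddFileSemantics` helper is hypothetical, not SparkContext itself):

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical illustration of the putIfAbsent pattern used by
// SparkContext.addFile: the fetch logic runs only when the key was absent.
object AddFileSemantics {
  private val addedFiles = TrieMap.empty[String, Long]

  // Returns true when the file was newly added (and would be fetched),
  // false when the key already existed (no overwrite, no re-fetch).
  def addFile(key: String, timestamp: Long): Boolean =
    addedFiles.putIfAbsent(key, timestamp).isEmpty
}
```

This is why the failing test above sees "one" rather than the rewritten contents: the second `addFile` never reaches the fetch step.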
[jira] [Commented] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows
[ https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121817#comment-16121817 ] Devaraj K commented on SPARK-18648: --- [~FlamingMike], It has been fixed as part of SPARK-21339; can you check this issue with the SPARK-21339 change if you have a chance? Thanks > spark-shell --jars option does not add jars to classpath on windows > --- > > Key: SPARK-18648 > URL: https://issues.apache.org/jira/browse/SPARK-18648 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Windows >Affects Versions: 2.0.2 > Environment: Windows 7 x64 >Reporter: Michel Lemay > Labels: windows > > I can't import symbols from command line jars when in the shell: > Adding jars via --jars: > {code} > spark-shell --master local[*] --jars path\to\deeplearning4j-core-0.7.0.jar > {code} > Same result if I add it through maven coordinates: > {code}spark-shell --master local[*] --packages > org.deeplearning4j:deeplearning4j-core:0.7.0 > {code} > I end up with: > {code} > scala> import org.deeplearning4j > :23: error: object deeplearning4j is not a member of package org >import org.deeplearning4j > {code} > NOTE: It is working as expected when running on linux. 
> Sample output with --verbose: > {code} > Using properties file: null > Parsed arguments: > master local[*] > deployMode null > executorMemory null > executorCores null > totalExecutorCores null > propertiesFile null > driverMemorynull > driverCores null > driverExtraClassPathnull > driverExtraLibraryPath null > driverExtraJavaOptions null > supervise false > queue null > numExecutorsnull > files null > pyFiles null > archivesnull > mainClass org.apache.spark.repl.Main > primaryResource spark-shell > nameSpark shell > childArgs [] > jars > file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar > packagesnull > packagesExclusions null > repositoriesnull > verbose true > Spark properties used, including those specified through > --conf and those from the properties file null: > Main class: > org.apache.spark.repl.Main > Arguments: > System properties: > SPARK_SUBMIT -> true > spark.app.name -> Spark shell > spark.jars -> > file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar > spark.submit.deployMode -> client > spark.master -> local[*] > Classpath elements: > file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar > 16/11/30 08:30:49 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > 16/11/30 08:30:51 WARN SparkContext: Use an existing SparkContext, some > configuration may not take effect. > Spark context Web UI available at http://192.168.70.164:4040 > Spark context available as 'sc' (master = local[*], app id = > local-1480512651325). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.0.2 > /_/ > Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101) > Type in expressions to have them evaluated. > Type :help for more information. 
> scala> import org.deeplearning4j > :23: error: object deeplearning4j is not a member of package org >import org.deeplearning4j > ^ > scala> > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100079#comment-16100079 ] Devaraj K commented on SPARK-15142: --- bq. That means there is no way to detect the new master while the dispatcher is still alive, it must be restarted when the new master is up, correct? Yes, https://github.com/apache/spark/pull/13143 doesn't handle discovery of a new master while the dispatcher is still alive. You can reopen this JIRA and create a PR if you want to work on it. > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > Attachments: > spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved SPARK-15142. --- Resolution: Duplicate > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > Attachments: > spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099535#comment-16099535 ] Devaraj K commented on SPARK-15142: --- [~skonto] Thanks for showing interest on this. I have already created a PR for SPARK-15359 which fixes this issue. > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > Attachments: > spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21146) Master/Worker should handle and shutdown when any thread gets UncaughtException
[ https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-21146: -- Summary: Master/Worker should handle and shutdown when any thread gets UncaughtException (was: Worker should handle and shutdown when any thread gets UncaughtException) > Master/Worker should handle and shutdown when any thread gets > UncaughtException > --- > > Key: SPARK-21146 > URL: https://issues.apache.org/jira/browse/SPARK-21146 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Devaraj K > > {code:xml} > 17/06/19 11:41:23 INFO Worker: Asked to launch executor > app-20170619114055-0005/228 for ScalaSort > Exception in thread "dispatcher-event-loop-79" java.lang.OutOfMemoryError: > unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:714) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) > at > java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > I see in the logs that Worker's dispatcher-event got the above exception and > the Worker keeps running without performing any functionality. And also > Worker state changed from ALIVE to DEAD in Master's web UI. > {code:xml} > worker-20170619150349-192.168.1.120-56175 192.168.1.120:56175 DEAD > 88 (41 Used)251.2 GB (246.0 GB Used) > {code} > I think Worker should handle and shutdown when any thread gets > UncaughtException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master
[ https://issues.apache.org/jira/browse/SPARK-21148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K closed SPARK-21148. - Resolution: Duplicate > Set SparkUncaughtExceptionHandler to the Master > --- > > Key: SPARK-21148 > URL: https://issues.apache.org/jira/browse/SPARK-21148 > Project: Spark > Issue Type: Improvement > Components: Deploy, Spark Core >Affects Versions: 2.1.1 >Reporter: Devaraj K > > Any one thread of the Master gets any of the UncaughtException then the > thread gets terminate and the Master process keeps running without > functioning properly. > I think we need to handle the UncaughtException and exit the Master > gracefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21170) Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted
Devaraj K created SPARK-21170: - Summary: Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted Key: SPARK-21170 URL: https://issues.apache.org/jira/browse/SPARK-21170 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.1 Reporter: Devaraj K Priority: Minor {code:xml} 17/06/20 22:49:39 ERROR Executor: Exception in task 225.0 in stage 1.0 (TID 27225) java.lang.IllegalArgumentException: Self-suppression not permitted at java.lang.Throwable.addSuppressed(Throwable.java:1043) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1400) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1145) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1125) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:341) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} {code:xml} 17/06/20 22:52:32 INFO scheduler.TaskSetManager: Lost task 427.0 in stage 1.0 (TID 27427) on 192.168.1.121, executor 12: java.lang.IllegalArgumentException (Self-suppression not permitted) [duplicate 1] 17/06/20 22:52:33 INFO scheduler.TaskSetManager: Starting task 427.1 in stage 1.0 (TID 27764, 192.168.1.122, executor 106, partition 427, PROCESS_LOCAL, 4625 bytes) 17/06/20 22:52:33 INFO scheduler.TaskSetManager: Lost task 186.0 in stage 1.0 (TID 27186) on 192.168.1.122, executor 106: java.lang.IllegalArgumentException (Self-suppression not permitted) [duplicate 2] 17/06/20 22:52:38 INFO scheduler.TaskSetManager: Starting task 186.1 in stage 1.0 (TID 27765, 192.168.1.121, executor 9, 
partition 186, PROCESS_LOCAL, 4625 bytes) 17/06/20 22:52:38 WARN scheduler.TaskSetManager: Lost task 392.0 in stage 1.0 (TID 27392, 192.168.1.121, executor 9): java.lang.IllegalArgumentException: Self-suppression not permitted at java.lang.Throwable.addSuppressed(Throwable.java:1043) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1400) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1145) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1125) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:341) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} Here it is trying to suppress the same Throwable instance, which causes the IllegalArgumentException and masks the original exception. I think it should not add the throwable to the suppressed list when it is the same instance. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
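The error comes from `java.lang.Throwable.addSuppressed`, which throws `IllegalArgumentException` when asked to suppress a throwable with itself. The fix suggested above amounts to a reference-equality guard before suppressing; a minimal sketch (the `SuppressGuard` helper is hypothetical, not the actual Utils patch):

```scala
// Hypothetical sketch of the suggested guard: only suppress t on original
// when they are different instances, so the original exception survives.
object SuppressGuard {
  def suppress(original: Throwable, t: Throwable): Throwable = {
    if (original ne t) original.addSuppressed(t)
    original
  }
}
```

With this guard, `tryWithSafeFinallyAndFailureCallbacks` would rethrow the original exception instead of the misleading `Self-suppression not permitted`.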
[jira] [Created] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master
Devaraj K created SPARK-21148: - Summary: Set SparkUncaughtExceptionHandler to the Master Key: SPARK-21148 URL: https://issues.apache.org/jira/browse/SPARK-21148 Project: Spark Issue Type: Improvement Components: Deploy, Spark Core Affects Versions: 2.1.1 Reporter: Devaraj K If any thread of the Master gets an UncaughtException, that thread terminates and the Master process keeps running without functioning properly. I think we need to handle the UncaughtException and exit the Master gracefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException
Devaraj K created SPARK-21146: - Summary: Worker should handle and shutdown when any thread gets UncaughtException Key: SPARK-21146 URL: https://issues.apache.org/jira/browse/SPARK-21146 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.1.1 Reporter: Devaraj K {code:xml} 17/06/19 11:41:23 INFO Worker: Asked to launch executor app-20170619114055-0005/228 for ScalaSort Exception in thread "dispatcher-event-loop-79" java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:714) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950) at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} I see in the logs that the Worker's dispatcher-event-loop thread got the above exception and the Worker keeps running without performing any work. The Worker state also changed from ALIVE to DEAD in the Master's web UI. {code:xml} worker-20170619150349-192.168.1.120-56175 192.168.1.120:56175 DEAD 88 (41 Used)251.2 GB (246.0 GB Used) {code} I think the Worker should handle this and shut down when any thread gets an UncaughtException. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
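The proposed handling can be sketched with an `UncaughtExceptionHandler` installed on the Worker's threads, in the spirit of Spark's SparkUncaughtExceptionHandler. The names below are hypothetical, and the handler only records the failure instead of calling `System.exit` so the example stays runnable:

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical sketch; a real Worker would exit the process from the
// handler instead of just recording the throwable.
object WorkerCrashHandler {
  val lastFatal = new AtomicReference[Throwable](null)

  // Runs body on a thread whose uncaught exceptions are captured by the
  // handler; join makes the recorded value visible to the caller.
  def runWithHandler(body: => Unit): Unit = {
    val t = new Thread(() => body)
    t.setUncaughtExceptionHandler((_, e) => lastFatal.set(e))
    t.start()
    t.join()
  }
}
```

Installing such a handler (or `Thread.setDefaultUncaughtExceptionHandler`) lets the Worker notice a fatal error like the OOM above and shut down instead of lingering in a non-functional state.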
[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working
[ https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15942141#comment-15942141 ] Devaraj K commented on SPARK-15665: --- [~samuel-soubeyran], This issue has been resolved. Please open a new JIRA if you see any other problems. > spark-submit --kill and --status are not working > - > > Key: SPARK-15665 > URL: https://issues.apache.org/jira/browse/SPARK-15665 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Devaraj K >Assignee: Devaraj K > Fix For: 2.0.0 > > > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --kill > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. > at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code} > {code:xml} > [devaraj@server2 spark-master]$ ./bin/spark-submit --status > driver-20160531171222- --master spark://xx.xx.xx.xx:6066 > Exception in thread "main" java.lang.IllegalArgumentException: Missing > application resource. 
> at > org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) > at > org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) > at org.apache.spark.launcher.Main.main(Main.java:86) > {code}
[jira] [Created] (SPARK-19689) Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text properly
Devaraj K created SPARK-19689: - Summary: Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text properly Key: SPARK-19689 URL: https://issues.apache.org/jira/browse/SPARK-19689 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 2.1.0 Reporter: Devaraj K Priority: Minor Attachments: Tasks Progress bar - Job Details Page.png In the Failed Stages table, the 'Tasks: Succeeded/Total' value is not displaying properly when there is a Failure Reason with some multi-line text. Please find the attached screenshot for more details.
[jira] [Updated] (SPARK-19689) Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text properly
[ https://issues.apache.org/jira/browse/SPARK-19689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-19689: -- Attachment: Tasks Progress bar - Job Details Page.png > Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text > properly > - > > Key: SPARK-19689 > URL: https://issues.apache.org/jira/browse/SPARK-19689 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.1.0 >Reporter: Devaraj K >Priority: Minor > Attachments: Tasks Progress bar - Job Details Page.png > > > In the Failed Stages table, the 'Tasks: Succeeded/Total' value is not displaying > properly when there is a Failure Reason with some multi-line text. > Please find the attached screenshot for more details.
[jira] [Updated] (SPARK-19354) Killed tasks are getting marked as FAILED
[ https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-19354: -- Description: When we enable speculation, we can see there are multiple attempts running for the same task when the first task progress is slow. If any of the task attempt succeeds then the other attempts will be killed, during killing the attempts those attempts are getting marked as failed due to the below error. We need to handle this error and mark the attempt as KILLED instead of FAILED. ||93||214 ||1 (speculative) ||FAILED||ANY ||1 / xx.xx.xx.x2 stdout stderr||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400 ||java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: node2/xx.xx.xx.x2; destination host is: node1:9000; +details|| {code:xml} 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in stage 1.0 (TID 214) 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm version is 1 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 214) java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) at org.apache.hadoop.ipc.Client.call(Client.java:1479) at org.apache.hadoop.ipc.Client.call(Client.java:1412) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy17.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy18.create(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123) at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: 
java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712) at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) at
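The fix the issue proposes can be sketched as a small classification rule: if the executor was asked to kill the task, any exception thrown while tearing it down (such as the ClosedByInterruptException above, triggered by the interrupt) should be reported as KILLED rather than FAILED. The enum and flag below are simplified stand-ins, not Spark's actual TaskState or executor code.

```java
// Sketch of the proposed end-reason classification for task attempts.
public class TaskEndReasonDemo {
    public enum TaskState { FINISHED, FAILED, KILLED }

    public static TaskState classify(Throwable error, boolean killRequested) {
        if (error == null) {
            return TaskState.FINISHED;
        }
        // An error observed after a kill was requested is a side effect of
        // the interrupt; keep the real reason (KILLED) instead of masking it.
        return killRequested ? TaskState.KILLED : TaskState.FAILED;
    }
}
```

With this rule, a speculative attempt interrupted mid-write to HDFS would surface as KILLED, keeping the UI and the 'Aggregated Metrics by Executor' section honest about what actually happened.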
[jira] [Updated] (SPARK-19377) Killed tasks should have the status as KILLED
[ https://issues.apache.org/jira/browse/SPARK-19377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-19377: -- Description: |143|10 |0 |SUCCESS|NODE_LOCAL |6 / x.xx.x.x stdout stderr |2017/01/25 07:49:27 |0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| |156|11 |0 |SUCCESS|NODE_LOCAL |5 / x.xx.x.x stdout stderr |2017/01/25 07:49:27 |0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| Killed tasks show the task status as SUCCESS; I think the status should be KILLED for killed tasks. was: |143|10 |0 |SUCCESS|NODE_LOCAL |6 / x.xx.x.x stdout stderr |2017/01/25 07:49:27|0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| |156|11 |0 |SUCCESS|NODE_LOCAL |5 / x.xx.x.x stdout stderr |2017/01/25 07:49:27|0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| Killed tasks show the task status as SUCCESS, I think we should have the status as KILLED for the killed tasks. > Killed tasks should have the status as KILLED > - > > Key: SPARK-19377 > URL: https://issues.apache.org/jira/browse/SPARK-19377 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Reporter: Devaraj K >Priority: Minor > > |143 |10 |0 |SUCCESS|NODE_LOCAL |6 / x.xx.x.x > stdout > stderr |2017/01/25 07:49:27 |0 ms |0.0 B / 0 |0.0 B > / 0 |TaskKilled (killed intentionally)| > |156 |11 |0 |SUCCESS|NODE_LOCAL |5 / x.xx.x.x > stdout > stderr |2017/01/25 07:49:27 |0 ms |0.0 B / 0 |0.0 B > / 0 |TaskKilled (killed intentionally)| > Killed tasks show the task status as SUCCESS, I think we should have the > status as KILLED for the killed tasks.
[jira] [Created] (SPARK-19377) Killed tasks should have the status as KILLED
Devaraj K created SPARK-19377: - Summary: Killed tasks should have the status as KILLED Key: SPARK-19377 URL: https://issues.apache.org/jira/browse/SPARK-19377 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Reporter: Devaraj K Priority: Minor |143|10 |0 |SUCCESS|NODE_LOCAL |6 / x.xx.x.x stdout stderr |2017/01/25 07:49:27|0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| |156|11 |0 |SUCCESS|NODE_LOCAL |5 / x.xx.x.x stdout stderr |2017/01/25 07:49:27|0 ms |0.0 B / 0 |0.0 B / 0 |TaskKilled (killed intentionally)| Killed tasks show the task status as SUCCESS; I think the status should be KILLED for killed tasks.
[jira] [Commented] (SPARK-19354) Killed tasks are getting marked as FAILED
[ https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838396#comment-15838396 ] Devaraj K commented on SPARK-19354: --- bq. The question is, why the error during shutdown? The shutdown is not related to the error here; there were no further tasks to execute, so the driver commanded the executor to shut down. This happens very frequently when speculation is enabled, and I suspect it could lead to executor blacklisting because of this failure during the kill. > Killed tasks are getting marked as FAILED > - > > Key: SPARK-19354 > URL: https://issues.apache.org/jira/browse/SPARK-19354 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core >Reporter: Devaraj K >Priority: Minor > > When we enable speculation, we can see there are multiple attempts running > for the same task when the first task progress is slow. If any of the task > attempt succeeds then the other attempts will be killed, during killing the > attempts those attempts are getting marked as failed due to the below error. > We need to handle this error and mark the attempt as KILLED instead of FAILED. 
> ||93 ||214 ||1 (speculative) ||FAILED||ANY ||1 / > xx.xx.xx.x2 > stdout > stderr > ||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400 > ||java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > node2/xx.xx.xx.x2; destination host is: node1:9000; > +details|| > {code:xml} > 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in > stage 1.0 (TID 214) > 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm > version is 1 > 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID > 214) > java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) > at org.apache.hadoop.ipc.Client.call(Client.java:1479) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy17.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy18.create(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) > at 
org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) > at > org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) > at > org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123) > at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
[jira] [Commented] (SPARK-19354) Killed tasks are getting marked as FAILED
[ https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837346#comment-15837346 ] Devaraj K commented on SPARK-19354: --- Thanks [~uncleGen] for the comment. Here, the error that occurred during the kill process masks the original reason, namely that the task was killed; I think we need to retain the real reason instead of the masked one. Also, when we see a failed task we may suspect something went wrong and start diagnosing the cause, only to find in the end that it happened during the kill. It also shows wrong metrics in the 'Aggregated Metrics by Executor' section. > Killed tasks are getting marked as FAILED > - > > Key: SPARK-19354 > URL: https://issues.apache.org/jira/browse/SPARK-19354 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Reporter: Devaraj K >Priority: Minor > > When we enable speculation, we can see there are multiple attempts running > for the same task when the first task progress is slow. If any of the task > attempt succeeds then the other attempts will be killed, during killing the > attempts those attempts are getting marked as failed due to the below error. > We need to handle this error and mark the attempt as KILLED instead of FAILED. 
> ||93 ||214 ||1 (speculative) ||FAILED||ANY ||1 / > xx.xx.xx.x2 > stdout > stderr > ||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400 > ||java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > node2/xx.xx.xx.x2; destination host is: node1:9000; > +details|| > {code:xml} > 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in > stage 1.0 (TID 214) > 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm > version is 1 > 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID > 214) > java.io.IOException: Failed on local exception: > java.nio.channels.ClosedByInterruptException; Host Details : local host is: > "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) > at org.apache.hadoop.ipc.Client.call(Client.java:1479) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy17.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy18.create(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) > at 
org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) > at > org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) > at > org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) > at > org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) > at > org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123) > at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:114) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) > at >
[jira] [Created] (SPARK-19354) Killed tasks are getting marked as FAILED
Devaraj K created SPARK-19354: - Summary: Killed tasks are getting marked as FAILED Key: SPARK-19354 URL: https://issues.apache.org/jira/browse/SPARK-19354 Project: Spark Issue Type: Bug Components: Scheduler, Spark Core Reporter: Devaraj K Priority: Minor When we enable speculation, we can see multiple attempts running for the same task when the first attempt's progress is slow. If any of the attempts succeeds, the other attempts are killed; while being killed, those attempts get marked as failed due to the error below. We need to handle this error and mark the attempt as KILLED instead of FAILED. ||93||214 ||1 (speculative) ||FAILED||ANY ||1 / xx.xx.xx.x2 stdout stderr ||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400 ||java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: node2/xx.xx.xx.x2; destination host is: node1:9000; +details|| {code:xml} 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in stage 1.0 (TID 214) 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm version is 1 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 214) java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776) at org.apache.hadoop.ipc.Client.call(Client.java:1479) at org.apache.hadoop.ipc.Client.call(Client.java:1412) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy17.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy18.create(Unknown Source) at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804) at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123) at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:114) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495) at
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786878#comment-15786878 ] Devaraj K commented on SPARK-15359: --- Thanks [~yu2003w] for verifying this PR. I forgot to mention that it depends on SPARK-15288 [https://github.com/apache/spark/pull/13072] for handling the UncaughtExceptions; sorry for that. Can you verify this PR together with the SPARK-15288 fix? > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during > the successful registration but if the mesosDriver.run() returns > DRIVER_ABORTED status after the successful register then there is no action > for the status and the thread will be terminated. > I think we need to throw the exception and shutdown the dispatcher.
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783821#comment-15783821 ] Devaraj K commented on SPARK-15359: --- [~yu2003w], it seems you are facing the same issue I mentioned in the description. I have already created a PR for this issue; could you try the PR and let me know your feedback? > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during > the successful registration but if the mesosDriver.run() returns > DRIVER_ABORTED status after the successful register then there is no action > for the status and the thread will be terminated. > I think we need to throw the exception and shutdown the dispatcher.
[jira] [Created] (SPARK-15665) spark-submit --kill and --status are not working
Devaraj K created SPARK-15665: - Summary: spark-submit --kill and --status are not working Key: SPARK-15665 URL: https://issues.apache.org/jira/browse/SPARK-15665 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Devaraj K {code:xml} [devaraj@server2 spark-master]$ ./bin/spark-submit --kill driver-20160531171222- --master spark://xx.xx.xx.xx:6066 Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource. at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) at org.apache.spark.launcher.Main.main(Main.java:86) {code} {code:xml} [devaraj@server2 spark-master]$ ./bin/spark-submit --status driver-20160531171222- --master spark://xx.xx.xx.xx:6066 Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource. at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151) at org.apache.spark.launcher.Main.main(Main.java:86) {code}
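The failure mode above is an argument-validation bug: the launcher demands an application resource even for actions that don't submit anything. A hedged sketch of the fix idea, in plain Java — the `Action` enum and method names are hypothetical, not the real SparkSubmitCommandBuilder API:

```java
// Sketch: only require an application resource when actually submitting;
// --kill and --status operate on an existing driver and need none.
public class SubmitArgsDemo {
    public enum Action { SUBMIT, KILL, STATUS }

    public static void validate(Action action, String appResource) {
        boolean needsResource = (action == Action.SUBMIT);
        if (needsResource && appResource == null) {
            // Mirrors the message thrown by the launcher in the logs above.
            throw new IllegalArgumentException("Missing application resource.");
        }
    }

    public static boolean isValid(Action action, String appResource) {
        try {
            validate(action, appResource);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Under this rule, `--kill <driverId>` and `--status <driverId>` would pass validation with no application jar, while a plain submission without one would still fail fast.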
[jira] [Created] (SPARK-15560) Queued/Supervise drivers waiting for retry drivers disappear for kill command in Mesos mode
Devaraj K created SPARK-15560: - Summary: Queued/Supervise drivers waiting for retry drivers disappear for kill command in Mesos mode Key: SPARK-15560 URL: https://issues.apache.org/jira/browse/SPARK-15560 Project: Spark Issue Type: Bug Components: Mesos Reporter: Devaraj K Priority: Minor When we issue a kill command for drivers that are in the 'Queued Drivers' or 'Supervise drivers waiting for retry' state, the driver disappears from the Mesos dispatcher web UI. I think such drivers should be moved to 'Finished Drivers' and listed there instead of disappearing completely.
[jira] [Created] (SPARK-15555) Driver with --supervise option cannot be killed in Mesos mode
Devaraj K created SPARK-15555: - Summary: Driver with --supervise option cannot be killed in Mesos mode Key: SPARK-15555 URL: https://issues.apache.org/jira/browse/SPARK-15555 Project: Spark Issue Type: Bug Components: Deploy, Mesos Reporter: Devaraj K When we have a launched driver that was submitted with the --supervise option and we try to kill it using the 'spark-submit --kill' command, the Mesos dispatcher adds it back to the 'Supervise drivers waiting for retry' section and restarts it again and again. I don't see any way to kill supervised drivers; I think a driver should not be re-launched after a kill request.
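The relaunch loop can be broken by remembering which drivers the user asked to kill and skipping the supervised retry for them. A minimal sketch of that bookkeeping, with purely illustrative names — MesosClusterScheduler's real state machine is more involved:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: track pending kill requests so a supervised driver's exit is
// treated as final instead of triggering another relaunch.
public class SuperviseKillDemo {
    private final Set<String> killRequested = new HashSet<>();
    private final List<String> retryQueue = new ArrayList<>();

    public void kill(String driverId) {
        killRequested.add(driverId);
    }

    // Called when a supervised driver exits; returns whether it was
    // queued for a supervised retry.
    public boolean onDriverExit(String driverId) {
        if (killRequested.remove(driverId)) {
            return false;  // kill was requested: finish, do not retry
        }
        retryQueue.add(driverId);
        return true;       // normal supervised failure: queue for retry
    }

    // Self-contained check of the two paths above.
    public static boolean demo() {
        SuperviseKillDemo s = new SuperviseKillDemo();
        s.kill("driver-a");
        boolean retriedKilled = s.onDriverExit("driver-a");  // should be false
        boolean retriedOther = s.onDriverExit("driver-b");   // should be true
        return !retriedKilled && retriedOther;
    }
}
```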
[jira] [Created] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
Devaraj K created SPARK-15359: - Summary: Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() Key: SPARK-15359 URL: https://issues.apache.org/jira/browse/SPARK-15359 Project: Spark Issue Type: Bug Components: Deploy, Mesos Reporter: Devaraj K Priority: Minor The Mesos dispatcher handles the DRIVER_ABORTED status from mesosDriver.run() during initial registration, but if mesosDriver.run() returns DRIVER_ABORTED after successful registration, the status is not acted upon and the thread simply terminates. I think we need to throw an exception and shut down the dispatcher.
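The proposed handling can be sketched as: treat DRIVER_ABORTED from the scheduler driver's run() as fatal whenever it occurs, not only during registration. The `Status` enum below is a stand-in for `org.apache.mesos.Protos.Status`, and the method names are illustrative:

```java
// Sketch: surface an aborted Mesos driver as an exception so the
// dispatcher can shut down instead of leaving a dead scheduler thread.
public class DispatcherRunDemo {
    public enum Status { DRIVER_RUNNING, DRIVER_STOPPED, DRIVER_ABORTED }

    public static void checkStatus(Status status) {
        if (status == Status.DRIVER_ABORTED) {
            // In the dispatcher this exception would propagate to an
            // uncaught-exception handler that initiates shutdown.
            throw new IllegalStateException("Mesos driver aborted: " + status);
        }
    }

    public static boolean aborted(Status status) {
        try {
            checkStatus(status);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

Note this complements SPARK-15288: throwing here only helps if the dispatcher also installs an UncaughtExceptionHandler that shuts the process down, as the linked comment points out.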
[jira] [Created] (SPARK-15288) Mesos dispatcher should handle gracefully when any thread gets UncaughtException
Devaraj K created SPARK-15288: - Summary: Mesos dispatcher should handle gracefully when any thread gets UncaughtException Key: SPARK-15288 URL: https://issues.apache.org/jira/browse/SPARK-15288 Project: Spark Issue Type: Improvement Components: Deploy, Mesos Reporter: Devaraj K Priority: Minor If any thread of the Mesos dispatcher hits an uncaught exception, that thread terminates and the dispatcher process keeps running without functioning properly. I think we need to handle the uncaught exception and shut down the Mesos dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
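A minimal sketch of the handling proposed above: install a JVM-wide default uncaught-exception handler so a dying dispatcher thread triggers an explicit shutdown instead of leaving a half-functioning process. The `shutdown` callback and class name are illustrative assumptions, not the real MesosClusterDispatcher API.

```java
// Hedged sketch: route any thread's uncaught exception into an explicit
// shutdown callback, so the dispatcher fails fast rather than limping along.
// The shutdown Runnable stands in for whatever the real dispatcher would do.
public class DispatcherExceptionHandler {
    public static void install(Runnable shutdown) {
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            System.err.println("Uncaught exception in " + thread.getName() + ": " + throwable);
            shutdown.run(); // fail fast instead of running without functioning
        });
    }
}
```

The default handler is consulted for any thread that has no per-thread handler set, so one `install` call covers all dispatcher threads.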
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275978#comment-15275978 ] Devaraj K commented on SPARK-15142: --- bq. Can you include the dispatcher logs? I have attached the dispatcher logs, but I don't see anything useful in them. bq. Does restarting the dispatcher fix the problem? Yes, it works fine after restarting the dispatcher. I suspect the dispatcher is losing its connection with the Mesos master after the master restart and stops receiving resource offers. I think the dispatcher may need to re-register with the Mesos master on connection loss. I will try creating a PR to fix this issue. > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > Attachments: > spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-15142: -- Attachment: spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > Attachments: > spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out > > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272056#comment-15272056 ] Devaraj K commented on SPARK-15142: --- It does not run the queued applications after the Mesos master comes back up; they stay in 'Queued Drivers:' forever, and newly submitted applications also pile up in the queue. The only way I see to launch newly submitted applications is to restart the Spark Mesos dispatcher. > Spark Mesos dispatcher becomes unusable when the Mesos master restarts > -- > > Key: SPARK-15142 > URL: https://issues.apache.org/jira/browse/SPARK-15142 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > While Spark Mesos dispatcher running if the Mesos master gets restarted then > Spark Mesos dispatcher will keep running and queues up all the submitted > applications and will not launch them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
Devaraj K created SPARK-15142: - Summary: Spark Mesos dispatcher becomes unusable when the Mesos master restarts Key: SPARK-15142 URL: https://issues.apache.org/jira/browse/SPARK-15142 Project: Spark Issue Type: Bug Components: Deploy, Mesos Reporter: Devaraj K Priority: Minor If the Mesos master gets restarted while the Spark Mesos dispatcher is running, the dispatcher keeps running but only queues up the submitted applications and never launches them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10713) SPARK_DIST_CLASSPATH ignored on Mesos executors
[ https://issues.apache.org/jira/browse/SPARK-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271875#comment-15271875 ] Devaraj K commented on SPARK-10713: --- bq. However, on Mesos, SPARK_DIST_CLASSPATH is missing from executors and jar is not in the classpath. It is present on YARN. Am I missing something? Do you see different behavior? In my case, the jars/paths provided in SPARK_DIST_CLASSPATH are included in the executors' classpath as well as in the driver's classpath. > SPARK_DIST_CLASSPATH ignored on Mesos executors > --- > > Key: SPARK-10713 > URL: https://issues.apache.org/jira/browse/SPARK-10713 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Affects Versions: 1.5.0 >Reporter: Dara Adib >Priority: Minor > > If I set the environment variable SPARK_DIST_CLASSPATH, the jars are included > on the driver, but not on Mesos executors. Docs: > https://spark.apache.org/docs/latest/hadoop-provided.html > I see SPARK_DIST_CLASSPATH mentioned in these files: > launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java > project/SparkBuild.scala > yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > But not the Mesos executor (or should it be included by the launcher > library?): > spark/core/src/main/scala/org/apache/spark/executor/Executor.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10713) SPARK_DIST_CLASSPATH ignored on Mesos executors
[ https://issues.apache.org/jira/browse/SPARK-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270330#comment-15270330 ] Devaraj K commented on SPARK-10713: --- [~daradib], I tried to reproduce this, but it seems SPARK_DIST_CLASSPATH is included for the driver as well as on the Mesos executors. Do you still see the issue? > SPARK_DIST_CLASSPATH ignored on Mesos executors > --- > > Key: SPARK-10713 > URL: https://issues.apache.org/jira/browse/SPARK-10713 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Affects Versions: 1.5.0 >Reporter: Dara Adib >Priority: Minor > > If I set the environment variable SPARK_DIST_CLASSPATH, the jars are included > on the driver, but not on Mesos executors. Docs: > https://spark.apache.org/docs/latest/hadoop-provided.html > I see SPARK_DIST_CLASSPATH mentioned in these files: > launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java > project/SparkBuild.scala > yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala > But not the Mesos executor (or should it be included by the launcher > library?): > spark/core/src/main/scala/org/apache/spark/executor/Executor.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13532) Spark yarn executor container fails if yarn.nodemanager.local-dirs starts with file://
[ https://issues.apache.org/jira/browse/SPARK-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268670#comment-15268670 ] Devaraj K commented on SPARK-13532: --- [~apivovarov], I tried to reproduce this issue, but it seems to work fine when the 'yarn.nodemanager.local-dirs' value has the file:// prefix and also when the 'spark.local.dir' value has the file:// prefix. The Node Manager container launch failure could also be due to some other reason. It would be great if you could provide some more information for reproducing this issue. > Spark yarn executor container fails if yarn.nodemanager.local-dirs starts > with file:// > -- > > Key: SPARK-13532 > URL: https://issues.apache.org/jira/browse/SPARK-13532 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: Alexander Pivovarov >Priority: Minor > > Spark yarn executor container fails if yarn.nodemanager.local-dirs starts > with file:// > {code} > <property> > <name>yarn.nodemanager.local-dirs</name> > <value>file:///data01/yarn/nm,file:///data02/yarn/nm</value> > </property> > {code} > other application, e.g. Hadoop MR and Hive work normally > Spark works only if yarn.nodemanager.local-dirs does not have file:// prefix > e.g. 
> {code} > /data01/yarn/nm,/data02/yarn/nm > {code} > to reproduce the issue > open spark-shell > run > {code} > $ spark-shell > > sc.parallelize(1 to 10).count > {code} > stack trace in spark-shell is > {code} > scala> sc.parallelize(1 to 10).count > 16/02/28 08:50:37 INFO spark.SparkContext: Starting job: count at :28 > 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Got job 0 (count at > :28) with 2 output partitions > 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 > (count at :28) > 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Parents of final stage: List() > 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Missing parents: List() > 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Submitting ResultStage 0 > (ParallelCollectionRDD[0] at parallelize at :28), which has no > missing parents > 16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0 stored as > values in memory (estimated size 1096.0 B, free 1096.0 B) > 16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0_piece0 stored > as bytes in memory (estimated size 804.0 B, free 1900.0 B) > 16/02/28 08:50:38 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in > memory on 10.101.124.13:39374 (size: 804.0 B, free: 511.5 MB) > 16/02/28 08:50:38 INFO spark.SparkContext: Created broadcast 0 from broadcast > at DAGScheduler.scala:1006 > 16/02/28 08:50:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks > from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :28) > 16/02/28 08:50:38 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks > 16/02/28 08:50:39 INFO spark.ExecutorAllocationManager: Requesting 1 new > executor because tasks are backlogged (new desired total will be 1) > 16/02/28 08:50:40 INFO spark.ExecutorAllocationManager: Requesting 1 new > executor because tasks are backlogged (new desired total will be 2) > 16/02/28 08:50:42 INFO cluster.YarnClientSchedulerBackend: Registered > executor NettyRpcEndpointRef(null) 
(ip-10-101-124-14:34681) with ID 1 > 16/02/28 08:50:42 INFO spark.ExecutorAllocationManager: New executor 1 has > registered (new total is 1) > 16/02/28 08:50:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage > 0.0 (TID 0, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes) > 16/02/28 08:50:42 INFO storage.BlockManagerMasterEndpoint: Registering block > manager ip-10-101-124-14:58315 with 3.8 GB RAM, BlockManagerId(1, > ip-10-101-124-14, 58315) > 16/02/28 08:50:53 INFO cluster.YarnClientSchedulerBackend: Disabling executor > 1. > 16/02/28 08:50:53 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0) > 16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Trying to remove > executor 1 from BlockManagerMaster. > 16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Removing block > manager BlockManagerId(1, ip-10-101-124-14, 58315) > 16/02/28 08:50:53 INFO storage.BlockManagerMaster: Removed 1 successfully in > removeExecutor > 16/02/28 08:50:53 ERROR cluster.YarnScheduler: Lost executor 1 on > ip-10-101-124-14: Container marked as failed: > container_1456648448960_0003_01_02 on host: ip-10-101-124-14. Exit > status: 1. Diagnostics: Exception from container-launch. > Container id: container_1456648448960_0003_01_02 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) > at org.apache.hadoop.util.Shell.run(Shell.java:455) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) > at >
[jira] [Commented] (SPARK-15067) YARN executors are launched with fixed perm gen size
[ https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268659#comment-15268659 ] Devaraj K commented on SPARK-15067: --- [~renatojdk], are you planning to create PR for this? If not please let me know, I can provide PR for this. Thanks. > YARN executors are launched with fixed perm gen size > > > Key: SPARK-15067 > URL: https://issues.apache.org/jira/browse/SPARK-15067 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0, 1.6.1 >Reporter: Renato Falchi Brandão > > It is impossible to change the executors max perm gen size using the property > "spark.executor.extraJavaOptions" when you are running on YARN. > When the JVM option "-XX:MaxPermSize" is set through the property > "spark.executor.extraJavaOptions", Spark put it properly in the shell command > that will start the JVM container but, in the ending of command, it sets > again this option using a fixed value of 256m, as you can see in the log I've > extracted: > 2016-04-30 17:20:12 INFO ExecutorRunnable:58 - > === > YARN executor launch context: > env: > CLASSPATH -> > {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure > SPARK_LOG_URL_STDERR -> > 
http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096 > SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993 > SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166 > SPARK_USER -> h_loadbd > SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC > SPARK_YARN_MODE -> true > SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343 > SPARK_LOG_URL_STDOUT -> > http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096 > SPARK_YARN_CACHE_FILES -> > hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml > command: > {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m > -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' > '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp > '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' > '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' > -Dspark.yarn.app.container.log.dir= -XX:MaxPermSize=256m > org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url > spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname > x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 > --user-class-path file:$PWD/__app__.jar 1> /stdout 2> > /stderr > Analyzing the code is possible to see that all the options set in the > property "spark.executor.extraJavaOptions" are enclosed, one by one, in > single quotes (ExecutorRunnable.scala:151) before the launcher take the > decision if a default value has to be provided or not for the option > "-XX:MaxPermSize" (ExecutorRunnable.scala:202). > This decision is taken examining all the options set and looking for a string > starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). If > that value is not found, the default value is set. 
> A string option starting without single quote will never be found, then, a > default value will always be provided. > A possible solution is change the source code of CommandBuilderUtils.java in > the line 328: > From-> if (arg.startsWith("-XX:MaxPermSize=")) > To-> if (arg.indexOf("-XX:MaxPermSize=") > -1) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
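The quoting problem described above is easy to demonstrate in isolation. A minimal sketch (the quoted token mirrors the launch command shown in the log; the class and method names are illustrative, not Spark's internals):

```java
// ExecutorRunnable wraps each spark.executor.extraJavaOptions token in single
// quotes, so the token CommandBuilderUtils later inspects looks like this.
public class MaxPermSizeCheckDemo {
    static final String QUOTED_OPT = "'-XX:MaxPermSize=1024M'";

    // The prefix check as written never matches the quoted token,
    // so the launcher appends its fixed -XX:MaxPermSize=256m default anyway...
    static boolean foundByStartsWith() {
        return QUOTED_OPT.startsWith("-XX:MaxPermSize=");
    }

    // ...whereas the substring check proposed above does find it.
    static boolean foundByIndexOf() {
        return QUOTED_OPT.indexOf("-XX:MaxPermSize=") > -1;
    }

    public static void main(String[] args) {
        // prints "startsWith: false, indexOf: true"
        System.out.println("startsWith: " + foundByStartsWith() + ", indexOf: " + foundByIndexOf());
    }
}
```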
[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable
[ https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216090#comment-15216090 ] Devaraj K commented on SPARK-13063: --- Thanks [~tgraves] for the confirmation; I will create a PR for this. > Make the SPARK YARN STAGING DIR as configurable > --- > > Key: SPARK-13063 > URL: https://issues.apache.org/jira/browse/SPARK-13063 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Devaraj K >Priority: Minor > > SPARK YARN STAGING DIR is based on the file system home directory. If the > user wants to change this staging directory due to the same used by any other > applications, there is no provision for the user to specify a different > directory for staging dir. > {code:xml} > val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
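A minimal sketch of the direction such a PR could take: resolve the staging base directory from a configuration key, falling back to the file-system home directory as today. The key name "spark.yarn.stagingDir" and the helper are assumptions for illustration, not the final API.

```java
import java.util.Map;

// Resolve the YARN staging base dir from conf, defaulting to the FS home
// directory. "spark.yarn.stagingDir" is a hypothetical key for this sketch.
public class StagingDirResolver {
    public static String resolveStagingDir(Map<String, String> conf, String fsHomeDir) {
        String base = conf.getOrDefault("spark.yarn.stagingDir", fsHomeDir);
        return base + "/.sparkStaging"; // per-application subdirs hang off this path
    }
}
```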
[jira] [Created] (SPARK-14234) Executor crashes for TaskRunner thread interruption
Devaraj K created SPARK-14234: - Summary: Executor crashes for TaskRunner thread interruption Key: SPARK-14234 URL: https://issues.apache.org/jira/browse/SPARK-14234 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Devaraj K If the TaskRunner thread gets interrupted while running, due to a task kill or any other reason, the interrupted thread tries to update the task status as part of the exception handling and fails with the exception below. This happens from the statusUpdate calls in all of these catch blocks; the corresponding exceptions for each catch case follow. {code:title=Executor.scala|borderStyle=solid} case _: TaskKilledException | _: InterruptedException if task.killed => .. case cDE: CommitDeniedException => .. case t: Throwable => .. {code} {code:xml} 16/03/29 17:32:33 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-2,5,main] java.lang.Error: java.nio.channels.ClosedByInterruptException at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460) at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49) at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1204) at org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253) at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192) at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:513) at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:135) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ... 
2 more {code} {code:xml} 16/03/29 08:00:29 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-4,5,main] java.lang.Error: java.nio.channels.ClosedByInterruptException at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460) .. at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192) at
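One possible mitigation can be sketched under the assumption that the terminal status update must not run with the interrupt flag set, since interruptible NIO channel writes in the RPC layer otherwise abort with ClosedByInterruptException: clear the flag before sending, then restore it. The `sendStatus` Runnable is a stand-in for the real CoarseGrainedExecutorBackend.statusUpdate call, so this is illustrative rather than the actual fix.

```java
// Clear the interrupt flag before the terminal status update so NIO writes in
// the RPC path don't fail with ClosedByInterruptException, then restore it.
public class SafeStatusUpdate {
    public static void statusUpdateIgnoringInterrupt(Runnable sendStatus) {
        boolean wasInterrupted = Thread.interrupted(); // reads AND clears the flag
        try {
            sendStatus.run();
        } finally {
            if (wasInterrupted) {
                Thread.currentThread().interrupt(); // restore for upstream callers
            }
        }
    }
}
```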
[jira] [Resolved] (SPARK-13965) TaskSetManager should kill the other running task attempts if any one task attempt succeeds for the same task
[ https://issues.apache.org/jira/browse/SPARK-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved SPARK-13965. --- Resolution: Duplicate > TaskSetManager should kill the other running task attempts if any one task > attempt succeeds for the same task > - > > Key: SPARK-13965 > URL: https://issues.apache.org/jira/browse/SPARK-13965 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Devaraj K > > When we enable speculation, Driver would launch additional attempts for the > same task if it founds that attempt is progressing slow compared to other > tasks average progress and then there will be multiple task attempts in > running state. > At present, if any one attempt gets succeeded others would be keep running > (even they could run till the job completion) and cannot be given these slots > to other tasks in same stage or in next stages. > We can kill these running task attempts when any other attempt gets succeeded > and can be given the slots to run other tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13965) TaskSetManager should kill the other running task attempts if any one task attempt succeeds for the same task
[ https://issues.apache.org/jira/browse/SPARK-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-13965: -- Summary: TaskSetManager should kill the other running task attempts if any one task attempt succeeds for the same task (was: Driver should kill the other running task attempts if any one task attempt succeeds for the same task) > TaskSetManager should kill the other running task attempts if any one task > attempt succeeds for the same task > - > > Key: SPARK-13965 > URL: https://issues.apache.org/jira/browse/SPARK-13965 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1 >Reporter: Devaraj K > > When we enable speculation, Driver would launch additional attempts for the > same task if it founds that attempt is progressing slow compared to other > tasks average progress and then there will be multiple task attempts in > running state. > At present, if any one attempt gets succeeded others would be keep running > (even they could run till the job completion) and cannot be given these slots > to other tasks in same stage or in next stages. > We can kill these running task attempts when any other attempt gets succeeded > and can be given the slots to run other tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13965) Driver should kill the other running task attempts if any one task attempt succeeds for the same task
Devaraj K created SPARK-13965: - Summary: Driver should kill the other running task attempts if any one task attempt succeeds for the same task Key: SPARK-13965 URL: https://issues.apache.org/jira/browse/SPARK-13965 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.1 Reporter: Devaraj K When we enable speculation, the driver launches additional attempts for a task if it finds that an attempt is progressing slowly compared to the average progress of the other tasks, so there can be multiple attempts of the same task in the running state. At present, if any one attempt succeeds, the others keep running (possibly until job completion) and their slots cannot be given to other tasks in the same stage or in later stages. We can kill these running task attempts once any other attempt succeeds and give the slots to other tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
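The proposed behavior can be sketched as follows: once one attempt of a task succeeds, kill the remaining running attempts so their slots can be reused. The classes and the `killTask` callback here are illustrative assumptions, not TaskSetManager's real interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Illustrative-only sketch: kill every still-running duplicate attempt of a
// task once some attempt has succeeded, freeing the slots for other tasks.
public class SpeculationKiller {
    public static class Attempt {
        final long taskId;
        final int attemptNumber;
        boolean running;
        public Attempt(long taskId, int attemptNumber, boolean running) {
            this.taskId = taskId;
            this.attemptNumber = attemptNumber;
            this.running = running;
        }
    }

    // Returns the task ids that were killed.
    public static List<Long> killOtherAttempts(List<Attempt> attempts,
                                               int succeededAttempt,
                                               LongConsumer killTask) {
        List<Long> killed = new ArrayList<>();
        for (Attempt a : attempts) {
            if (a.running && a.attemptNumber != succeededAttempt) {
                killTask.accept(a.taskId); // ask the backend to kill the duplicate
                a.running = false;
                killed.add(a.taskId);
            }
        }
        return killed;
    }
}
```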
[jira] [Created] (SPARK-13621) TestExecutor.scala needs to be moved to test package
Devaraj K created SPARK-13621: - Summary: TestExecutor.scala needs to be moved to test package Key: SPARK-13621 URL: https://issues.apache.org/jira/browse/SPARK-13621 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.6.0, 2.0.0 Reporter: Devaraj K Priority: Minor TestExecutor.scala is in the package core/src/main/scala/org/apache/spark/deploy/client/ but is used only by test classes. It should be moved to the test package, i.e. core/src/test/scala/org/apache/spark/deploy/client/, since its purpose is testing. Also, core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala is not used anywhere in the source; I think it can be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13619) Jobs page UI shows wrong number of failed tasks
Devaraj K created SPARK-13619: - Summary: Jobs page UI shows wrong number of failed tasks Key: SPARK-13619 URL: https://issues.apache.org/jira/browse/SPARK-13619 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.6.0, 2.0.0 Reporter: Devaraj K Priority: Minor In the Master and History Server UIs, the Jobs page shows the wrong number of failed tasks. http://X.X.X.X:8080/history/app-20160303024135-0001/jobs/ h3. Completed Jobs (1) ||Job Id|| Description|| Submitted|| Duration|| Stages: Succeeded/Total|| Tasks (for all stages): Succeeded/Total|| |0 | saveAsTextFile at PipeLineTest.java:52| 2016/03/03 02:41:36 | 16 s | 2/2 | 100/100 (2 failed)| \\ \\ When we go to the Job details page, we see a different number of failed tasks, and that is the correct number. http://x.x.x.x:8080/history/app-20160303024135-0001/jobs/job/?id=0 h3. Completed Stages (2) ||Stage Id||Description|| Submitted|| Duration|| Tasks: Succeeded/Total||Input|| Output||Shuffle Read|| Shuffle Write|| |1| saveAsTextFile at PipeLineTest.java:52 +details|2016/03/03 02:41:51| 1 s|50/50 (6 failed)| |7.6 KB|371.0 KB| | |0| mapToPair at PipeLineTest.java:29 +details|2016/03/03 02:41:36| 15 s| 50/50| 1521.7 MB| | | 371.0 KB| -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0
[ https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173224#comment-15173224 ] Devaraj K commented on SPARK-13117: --- I think we can start the Jetty server with "0.0.0.0" as the default and have it pick up the configured SPARK_PUBLIC_DNS value when it is set. This would change only the Web UI and wouldn't impact anything else. The changes would be something like the following, {code:xml} protected val publicHostName = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse("0.0.0.0") {code} {code:xml} try { serverInfo = Some(startJettyServer(publicHostName, port, sslOptions, handlers, conf, name)) logInfo("Started %s at http://%s:%d".format(className, publicHostName, boundPort)) } catch { {code} [~srowen], any suggestions? Thanks > WebUI should use the local ip not 0.0.0.0 > - > > Key: SPARK-13117 > URL: https://issues.apache.org/jira/browse/SPARK-13117 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: Jeremiah Jordan >Priority: Minor > > When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP > except the WebUI. The WebUI should use the SPARK_LOCAL_IP not always use > 0.0.0.0 > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0
[ https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173218#comment-15173218 ] Devaraj K commented on SPARK-13117: --- [~jaypanicker], the proposed PR does the same, but there is a problem when accessing the web UI via localhost or 127.0.0.1. Please have a look at this comment: https://github.com/apache/spark/pull/11133#issuecomment-188937933. > WebUI should use the local ip not 0.0.0.0 > - > > Key: SPARK-13117 > URL: https://issues.apache.org/jira/browse/SPARK-13117 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: Jeremiah Jordan >Priority: Minor > > When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP > except the WebUI. The WebUI should use the SPARK_LOCAL_IP not always use > 0.0.0.0 > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13445) Selecting "data" with window function does not work unless aliased (using PARTITION BY)
[ https://issues.apache.org/jira/browse/SPARK-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated SPARK-13445: -- Summary: Selecting "data" with window function does not work unless aliased (using PARTITION BY) (was: Seleting "data" with window function does not work unless aliased (using PARTITION BY)) > Selecting "data" with window function does not work unless aliased (using > PARTITION BY) > --- > > Key: SPARK-13445 > URL: https://issues.apache.org/jira/browse/SPARK-13445 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Reynold Xin >Priority: Critical > > The code does not throw an exception if "data" is aliased. Maybe this is a > reserved word or aliases are just required when using PARTITION BY? > {code} > sql(""" > SELECT > data as the_data, > row_number() over (partition BY data.type) AS foo > FROM event_record_sample > """) > {code} > However, this code throws an error: > {code} > sql(""" > SELECT > data, > row_number() over (partition BY data.type) AS foo > FROM event_record_sample > """) > {code} > {code} > org.apache.spark.sql.AnalysisException: resolved attribute(s) type#15246 > missing from > data#15107,par_cat#15112,schemaMajorVersion#15110,source#15108,recordId#15103,features#15106,eventType#15105,ts#15104L,schemaMinorVersion#15111,issues#15109 > in operator !Project [data#15107,type#15246]; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104) > at scala.collection.immutable.List.foreach(List.scala:318) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:133) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:816) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces
[ https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141340#comment-15141340 ] Devaraj K commented on SPARK-13061: --- You have mentioned the id as 'Spark shell' in the issue description; I don't think that is what the API returns. {code:xml} http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ returns: [ { "id" : "Spark shell", "name" : "Spark shell", {code} If we request an HTTP server with a URL containing spaces from any browser or other client, the browser/client encodes the URL (as part of encoding, it replaces spaces with %20) before sending the request to the HTTP server. This is what happens when you pass the id as "Spark shell". {code:xml}/applications/[app-id]/jobs/[job-id] Details for the given job{code} I think you need to pass the job-id if you want details for a specific job, not the name. > Error in spark rest api application info for job names contains spaces > -- > > Key: SPARK-13061 > URL: https://issues.apache.org/jira/browse/SPARK-13061 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Avihoo Mamka >Priority: Trivial > Labels: rest_api, spark > > When accessing spark rest api with application id to get job specific id > status, a job with name containing whitespaces are being encoded to '%20' and > therefore the rest api returns `no such app`. 
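The encoding behavior described in the comment above — the client percent-encodes a space in the path before the request ever reaches the server — can be seen with the JDK's `URI` class (a small stand-alone illustration, not Spark code; the host and path are taken from the report):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlEncodingExample {
    // Build the request URI the way a browser/HTTP client would:
    // characters not allowed in a path (like the space) get percent-encoded.
    static String encodedUrl(String appId) {
        try {
            return new URI("http", "spark.mysite.com",
                    "/api/v1/applications/" + appId, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The space in "Spark shell" travels on the wire as %20.
        System.out.println(encodedUrl("Spark shell"));
    }
}
```

So a server that looks up applications by the literal string received on the wire sees "Spark%20shell", which matches the "unknown app: Spark%20shell" response in the report.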
> For example: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ > returns: > [ { > "id" : "Spark shell", > "name" : "Spark shell", > "attempts" : [ { > "startTime" : "2016-01-28T09:20:58.526GMT", > "endTime" : "1969-12-31T23:59:59.999GMT", > "sparkUser" : "", > "completed" : false > } ] > } ] > and then when accessing: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark > shell/ > the result returned is: > unknown app: Spark%20shell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces
[ https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140601#comment-15140601 ] Devaraj K commented on SPARK-13061: --- [~avihoo], I am trying to reproduce the issue, but I see the id format as below when I submit to a standalone master and to a YARN cluster. {code:xml} [ { "id" : "app-20160210202703-", "name" : "Spark shell", {code} {code:xml} }, { "id" : "application_1452616238844_0041", "name" : "Spark Pi", {code} Can you give more details on how you are getting the id as 'Spark shell' and which process's REST API is returning it like this? > Error in spark rest api application info for job names contains spaces > -- > > Key: SPARK-13061 > URL: https://issues.apache.org/jira/browse/SPARK-13061 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Avihoo Mamka >Priority: Trivial > Labels: rest_api, spark > > When accessing spark rest api with application id to get job specific id > status, a job with name containing whitespaces are being encoded to '%20' and > therefore the rest api returns `no such app`. > For example: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ > returns: > [ { > "id" : "Spark shell", > "name" : "Spark shell", > "attempts" : [ { > "startTime" : "2016-01-28T09:20:58.526GMT", > "endTime" : "1969-12-31T23:59:59.999GMT", > "sparkUser" : "", > "completed" : false > } ] > } ] > and then when accessing: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark > shell/ > the result returned is: > unknown app: Spark%20shell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0
[ https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138251#comment-15138251 ] Devaraj K commented on SPARK-13117: --- Thanks [~jjordan]. > WebUI should use the local ip not 0.0.0.0 > - > > Key: SPARK-13117 > URL: https://issues.apache.org/jira/browse/SPARK-13117 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: Jeremiah Jordan > > When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP > except the WebUI. The WebUI should use the SPARK_LOCAL_IP not always use > 0.0.0.0 > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0
[ https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137320#comment-15137320 ] Devaraj K commented on SPARK-13117: --- Thanks [~jjordan] for reporting. I would like to provide a PR if you are not planning to work on this. Please let me know. Thanks. > WebUI should use the local ip not 0.0.0.0 > - > > Key: SPARK-13117 > URL: https://issues.apache.org/jira/browse/SPARK-13117 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.6.0 >Reporter: Jeremiah Jordan > > When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP > except the WebUI. The WebUI should use the SPARK_LOCAL_IP not always use > 0.0.0.0 > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13016) Replace example code in mllib-dimensionality-reduction.md using include_example
[ https://issues.apache.org/jira/browse/SPARK-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137310#comment-15137310 ] Devaraj K commented on SPARK-13016: --- I am working on this and will provide a PR. Thanks > Replace example code in mllib-dimensionality-reduction.md using > include_example > --- > > Key: SPARK-13016 > URL: https://issues.apache.org/jira/browse/SPARK-13016 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Reporter: Xusen Yin >Priority: Minor > Labels: starter > > See examples in other finished sub-JIRAs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable
[ https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121427#comment-15121427 ] Devaraj K commented on SPARK-13063: --- Yes, the sub-directories under the staging dir are specific to each application. If 'spark.yarn.preserve.staging.files' is enabled, the application will not remove them after completion, and when we want to analyze them it becomes difficult to identify which directories belong to Spark applications. Also, if we want to delete the preserved staging files (perhaps after analysis), we have to be careful not to remove the staging files of other types of apps. I think it would be good to have a logically separate, configurable staging directory for all Spark apps, to avoid these difficulties when they are mixed with other types of apps. I can provide a PR that makes it configurable and keeps the current behavior when the user doesn't specify a staging directory. > Make the SPARK YARN STAGING DIR as configurable > --- > > Key: SPARK-13063 > URL: https://issues.apache.org/jira/browse/SPARK-13063 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Devaraj K >Priority: Minor > > SPARK YARN STAGING DIR is based on the file system home directory. If the > user wants to change this staging directory due to the same used by any other > applications, there is no provision for the user to specify a different > directory for staging dir. > {code:xml} > val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable
Devaraj K created SPARK-13063: - Summary: Make the SPARK YARN STAGING DIR as configurable Key: SPARK-13063 URL: https://issues.apache.org/jira/browse/SPARK-13063 Project: Spark Issue Type: Bug Components: YARN Reporter: Devaraj K SPARK YARN STAGING DIR is based on the file system home directory. If the user wants to change this staging directory because it is also used by other applications, there is no provision to specify a different directory for the staging dir. {code:xml} val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable
[ https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121239#comment-15121239 ] Devaraj K commented on SPARK-13063: --- If this default location is already used by other apps (like MapReduce), users may need to change it to some other directory. Presently there is no provision to change the value; we can make it configurable and keep the current staging dir as the default value. > Make the SPARK YARN STAGING DIR as configurable > --- > > Key: SPARK-13063 > URL: https://issues.apache.org/jira/browse/SPARK-13063 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Devaraj K >Priority: Minor > > SPARK YARN STAGING DIR is based on the file system home directory. If the > user wants to change this staging directory due to the same used by any other > applications, there is no provision for the user to specify a different > directory for staging dir. > {code:xml} > val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
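The proposal above — an optional configuration that falls back to the current home-directory-based location — follows a plain lookup-with-default pattern; a minimal Java sketch, where the key name "spark.yarn.stagingDir", the ".sparkStaging" suffix, and the helper name are illustrative assumptions rather than the actual Spark implementation:

```java
import java.util.Map;

public class StagingDirExample {
    // Illustrative: return the user-configured staging dir when the key is set,
    // otherwise fall back to the home-directory-based default.
    static String stagingDir(Map<String, String> conf, String homeDirectory) {
        return conf.getOrDefault("spark.yarn.stagingDir",
                homeDirectory + "/.sparkStaging");
    }

    public static void main(String[] args) {
        System.out.println(stagingDir(Map.of(), "/user/devaraj"));
        System.out.println(stagingDir(
                Map.of("spark.yarn.stagingDir", "/tmp/spark-staging"), "/user/devaraj"));
    }
}
```

The point of the design is that existing deployments see no change (the default is the current behavior), while users sharing the file system with other frameworks can redirect Spark's staging files.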
[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable
[ https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121523#comment-15121523 ] Devaraj K commented on SPARK-13063: --- I don't think it is good to store user apps' files in a location decided by Spark without giving the user any way to change it. I don't want to compare with MR, but MR does provide the 'yarn.app.mapreduce.am.staging-dir' config for the staging dir. I agree that moving ahead adds another configuration. Thanks. > Make the SPARK YARN STAGING DIR as configurable > --- > > Key: SPARK-13063 > URL: https://issues.apache.org/jira/browse/SPARK-13063 > Project: Spark > Issue Type: Improvement > Components: YARN >Reporter: Devaraj K >Priority: Minor > > SPARK YARN STAGING DIR is based on the file system home directory. If the > user wants to change this staging directory due to the same used by any other > applications, there is no provision for the user to specify a different > directory for staging dir. > {code:xml} > val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1253) Need to load mapred-site.xml for reading mapreduce.application.classpath
[ https://issues.apache.org/jira/browse/SPARK-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069890#comment-15069890 ] Devaraj K commented on SPARK-1253: -- Thanks [~srowen] for letting me know. I tried to reproduce it, but it seems mapred-site.xml is being loaded and the configurations updated in that file are picked up. I think it is not a problem anymore and can be closed, unless there are other expectations here. > Need to load mapred-site.xml for reading mapreduce.application.classpath > > > Key: SPARK-1253 > URL: https://issues.apache.org/jira/browse/SPARK-1253 > Project: Spark > Issue Type: Bug > Components: YARN >Reporter: Sandy Pérez González > > In Spark on YARN, we use mapreduce.application.classpath to discover the > location of the MR jars so that we can add them executor classpaths. > This config comes from mapred-site.xml, which we aren't loading. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1253) Need to load mapred-site.xml for reading mapreduce.application.classpath
[ https://issues.apache.org/jira/browse/SPARK-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067962#comment-15067962 ] Devaraj K commented on SPARK-1253: -- [~sandyr], Do you want to work on this issue? I would like to provide a PR for this if you don't mind, Thanks. > Need to load mapred-site.xml for reading mapreduce.application.classpath > > > Key: SPARK-1253 > URL: https://issues.apache.org/jira/browse/SPARK-1253 > Project: Spark > Issue Type: Bug > Components: YARN >Reporter: Sandy Pérez González >Assignee: Sandy Ryza > > In Spark on YARN, we use mapreduce.application.classpath to discover the > location of the MR jars so that we can add them executor classpaths. > This config comes from mapred-site.xml, which we aren't loading. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059602#comment-15059602 ] Devaraj K commented on SPARK-12316: --- [~carlmartin], can you provide the stack trace for this error? Thanks. > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus > > When application end, AM will clean the staging dir. > But if the driver trigger to update the delegation token, it will can't find > the right token file and then it will endless cycle call the method > 'updateCredentialsIfRequired'. > Then it lead to StackOverflowError. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035721#comment-15035721 ] Devaraj K commented on SPARK-4117: -- Thanks [~tgraves] for the pointer. I will provide a PR to avoid unnecessary retries when it gets ApplicationAttemptNotFoundException. > Spark on Yarn handle AM being told command from RM > -- > > Key: SPARK-4117 > URL: https://issues.apache.org/jira/browse/SPARK-4117 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > In the allocateResponse from the RM it can send commands that the AM should > follow. for instance AM_RESYNC and AM_SHUTDOWN. We should add support for > those. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8276) NPE in YarnClientSchedulerBackend.stop
[ https://issues.apache.org/jira/browse/SPARK-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K resolved SPARK-8276. -- Resolution: Duplicate Resolving it as duplicate of SPARK-8754, Please reopen it if you disagree. > NPE in YarnClientSchedulerBackend.stop > -- > > Key: SPARK-8276 > URL: https://issues.apache.org/jira/browse/SPARK-8276 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.0 >Reporter: Steve Loughran >Priority: Minor > > NPE seen in {{YarnClientSchedulerBackend.stop()}} after problem setting up > job; on the line {{monitorThread.interrupt()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026526#comment-15026526 ] Devaraj K commented on SPARK-4117: -- For *ApplicationAttemptNotFoundException*, there is no explicit handling and it is getting handled as an Uncaught exception and then the ApplicationMaster is shutting down. Below is the code which does this. {code:title=ApplicationMaster.scala|borderStyle=solid} if (isClusterMode) { runDriver(securityMgr) } else { runExecutorLauncher(securityMgr) } } catch { case e: Exception => // catch everything else if not specifically handled logError("Uncaught exception: ", e) finish(FinalApplicationStatus.FAILED, ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION, "Uncaught exception: " + e) } {code} Please find the exception stack trace from the log, {code:xml} 15/11/25 19:51:24 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 15/11/25 19:51:30 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1448461020570_0001_01 doesn't exist in ApplicationMasterService cache. at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60) ………. 
at com.sun.proxy.$Proxy16.allocate(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277) at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:231) at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:292) at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:336) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:185) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1448461020570_0001_01 doesn't exist in ApplicationMasterService cache. at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391) ….. at com.sun.proxy.$Proxy15.allocate(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77) ... 
21 more 15/11/25 19:51:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1448461020570_0001_01 doesn't exist in ApplicationMasterService cache. at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391) ……….. at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213) ) 15/11/25 19:51:30 INFO spark.SparkContext: Invoking stop() from shutdown hook {code} For *ApplicationMasterNotRegisteredException*, YarnAllocator.allocateResources() is invoking amClient.allocate() and AMRMClientImpl.allocate() is internally handling the ApplicationMasterNotRegisteredException by resyncing with the ResourceManager. Please find the below piece of code which handles this. {code:title=YarnAllocator.scala|borderStyle=solid} def allocateResources(): Unit = synchronized { updateResourceRequests() val progressIndicator = 0.1f // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container // requests. val allocateResponse = amClient.allocate(progressIndicator) {code} {code:title=org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl|borderStyle=solid}
[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023785#comment-15023785 ] Devaraj K commented on SPARK-4117: -- In YARN, the AM commands (AM_RESYNC, AM_SHUTDOWN) are now deprecated and the RM no longer sends them to the AM. Instead of sending these commands, the RM throws ApplicationMasterNotRegisteredException to make the AM resync with the ResourceManager and ApplicationAttemptNotFoundException to let the AM shut itself down. I see both of these scenarios are already handled, and the ApplicationMaster does the same. [~tgraves], do you have any other expectations from this Jira, or can we close this ticket? > Spark on Yarn handle AM being told command from RM > -- > > Key: SPARK-4117 > URL: https://issues.apache.org/jira/browse/SPARK-4117 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > In the allocateResponse from the RM it can send commands that the AM should > follow. for instance AM_RESYNC and AM_SHUTDOWN. We should add support for > those. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
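The exception-driven protocol described above (re-register on ApplicationMasterNotRegisteredException, shut down on ApplicationAttemptNotFoundException) can be sketched as a single heartbeat step. Everything in this Java sketch — the exception classes, the `Rm` interface, and the method names — is a self-contained stand-in for illustration, not the actual Hadoop/Spark API:

```java
public class AmHeartbeatSketch {
    // Stand-ins for the YARN exceptions discussed above; illustrative only,
    // not the real org.apache.hadoop.yarn.exceptions classes.
    static class NotRegisteredException extends RuntimeException {}
    static class AttemptNotFoundException extends RuntimeException {}

    // Minimal stand-in for the RM-facing client.
    interface Rm {
        void register();
        void allocate();
    }

    // One heartbeat step: resync on "not registered", give up on "attempt not found".
    // Returns true when the AM should keep running, false when it should shut down.
    static boolean heartbeat(Rm rm) {
        try {
            rm.allocate();
            return true;
        } catch (NotRegisteredException e) {
            rm.register();      // RM asked for a resync: re-register and continue
            return true;
        } catch (AttemptNotFoundException e) {
            return false;       // this attempt no longer exists: shut down
        }
    }

    public static void main(String[] args) {
        Rm dead = new Rm() {
            public void register() {}
            public void allocate() { throw new AttemptNotFoundException(); }
        };
        System.out.println(heartbeat(dead)); // false: the AM would shut itself down
    }
}
```

The design point is the asymmetry: the "not registered" case is recoverable (resync and keep heartbeating), while the "attempt not found" case is terminal, so retrying the allocate call there only wastes cycles.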
[jira] [Commented] (SPARK-8276) NPE in YarnClientSchedulerBackend.stop
[ https://issues.apache.org/jira/browse/SPARK-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023790#comment-15023790 ] Devaraj K commented on SPARK-8276: -- I think it is duplicate of SPARK-8754. > NPE in YarnClientSchedulerBackend.stop > -- > > Key: SPARK-8276 > URL: https://issues.apache.org/jira/browse/SPARK-8276 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.0 >Reporter: Steve Loughran >Priority: Minor > > NPE seen in {{YarnClientSchedulerBackend.stop()}} after problem setting up > job; on the line {{monitorThread.interrupt()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8754) YarnClientSchedulerBackend doesn't stop gracefully in failure conditions
Devaraj K created SPARK-8754: Summary: YarnClientSchedulerBackend doesn't stop gracefully in failure conditions Key: SPARK-8754 URL: https://issues.apache.org/jira/browse/SPARK-8754 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.4.0 Reporter: Devaraj K Priority: Minor {code:xml} java.lang.NullPointerException at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151) at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:421) at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1447) at org.apache.spark.SparkContext.stop(SparkContext.scala:1651) at org.apache.spark.SparkContext.init(SparkContext.scala:572) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:621) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} If the application has FINISHED/FAILED/KILLED or failed to launch the application master, monitorThread is never initialized, but monitorThread.interrupt() is invoked as part of stop() without any check. This throws an NPE and also prevents the client from stopping. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
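The fix implied by the description above is a guard before the interrupt; a minimal Java sketch of the pattern (the class and field names are illustrative, not the actual YarnClientSchedulerBackend code):

```java
public class MonitorStopExample {
    private Thread monitorThread;   // may never be started if the app failed early

    // Guarded stop: interrupt the monitor thread only when it was actually created,
    // so the early-failure path no longer dies with an NPE mid-shutdown.
    public void stop() {
        if (monitorThread != null) {
            monitorThread.interrupt();
        }
        // ...the rest of the shutdown can now proceed...
    }

    public static void main(String[] args) {
        new MonitorStopExample().stop();   // no NPE even though monitorThread is null
        System.out.println("stopped cleanly");
    }
}
```

Without the null check, the NPE thrown here propagates up through TaskSchedulerImpl.stop and SparkContext.stop (as in the stack trace above), aborting the rest of the client shutdown.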
[jira] [Commented] (SPARK-1936) Add apache header and remove author tags
[ https://issues.apache.org/jira/browse/SPARK-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009352#comment-14009352 ] Devaraj K commented on SPARK-1936: -- Pull Request added : https://github.com/apache/spark/pull/890 Add apache header and remove author tags Key: SPARK-1936 URL: https://issues.apache.org/jira/browse/SPARK-1936 Project: Spark Issue Type: Bug Reporter: Devaraj K Priority: Minor These below files don’t have apache header and contain author tags. {code:xml} spark\repl\src\main\scala\org\apache\spark\repl\SparkExprTyper.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkILoop.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkILoopInit.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkIMain.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkImports.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkJLineCompletion.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkJLineReader.scala spark\repl\src\main\scala\org\apache\spark\repl\SparkMemberHandlers.scala {code} -- This message was sent by Atlassian JIRA (v6.2#6252)