[jira] [Created] (SPARK-26737) Executor/Task STDERR & STDOUT log urls are not correct in Yarn deployment mode

2019-01-25 Thread Devaraj K (JIRA)
Devaraj K created SPARK-26737:
-

 Summary: Executor/Task STDERR & STDOUT log urls are not correct in 
Yarn deployment mode
 Key: SPARK-26737
 URL: https://issues.apache.org/jira/browse/SPARK-26737
 Project: Spark
  Issue Type: Bug
  Components: Web UI, YARN
Affects Versions: 3.0.0
Reporter: Devaraj K




The base STDERR & STDOUT log URLs are being generated like the following, which 
also include the key,

{code}
http://ip:8042/node/containerlogs/container_1544212645385_0252_01_01/(SPARK_USER,
 devaraj)
{code}


{code}
http://ip:8042/node/containerlogs/container_1544212645385_0252_01_01/(USER, 
devaraj)
{code}

Instead of
{code}
http://ip:8042/node/containerlogs/container_1544212645385_0251_01_02/devaraj
{code}







[jira] [Created] (SPARK-26650) Yarn Client throws 'ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration'

2019-01-17 Thread Devaraj K (JIRA)
Devaraj K created SPARK-26650:
-

 Summary: Yarn Client throws 'ClassNotFoundException: 
org.apache.hadoop.hbase.HBaseConfiguration'
 Key: SPARK-26650
 URL: https://issues.apache.org/jira/browse/SPARK-26650
 Project: Spark
  Issue Type: Bug
  Components: Build, YARN
Affects Versions: 3.0.0
Reporter: Devaraj K


{code:xml}
19/01/17 11:33:00 WARN security.HBaseDelegationTokenProvider: Fail to invoke 
HBaseConfiguration
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.apache.spark.deploy.security.HBaseDelegationTokenProvider.hbaseConf(HBaseDelegationTokenProvider.scala:69)
at 
org.apache.spark.deploy.security.HBaseDelegationTokenProvider.delegationTokensRequired(HBaseDelegationTokenProvider.scala:62)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.$anonfun$obtainDelegationTokens$1(HadoopDelegationTokenManager.scala:134)
at 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at 
scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:213)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at 
org.apache.spark.deploy.security.HadoopDelegationTokenManager.obtainDelegationTokens(HadoopDelegationTokenManager.scala:133)
at 
org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.obtainDelegationTokens(YARNHadoopDelegationTokenManager.scala:59)
at 
org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:305)
at 
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1014)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:181)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:58)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:184)
at org.apache.spark.SparkContext.(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2466)
at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:948)
at scala.Option.getOrElse(Option.scala:138)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:168)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:196)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:87)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:932)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:941)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/01/17 11:33:00 INFO yarn.Client: Submitting application 
application_1544212645385_0197 to ResourceManager
19/01/17 11:33:00 INFO impl.YarnClientImpl: Submitted application 
application_1544212645385_0197
{code}






[jira] [Commented] (SPARK-24787) Events being dropped at an alarming rate due to hsync being slow for eventLogging

2018-10-16 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652687#comment-16652687
 ] 

Devaraj K commented on SPARK-24787:
---

It seems the overhead here comes from the FileChannel.force call in the DataNode, 
which is part of the hsync that writes the data to the storage device. hsync also 
does not make much difference with or without the SyncFlag.UPDATE_LENGTH flag, 
probably because updating the length is a simple call to the NameNode.

I think the hsync change can be reverted, and the history server can instead get 
the latest file length using DFSInputStream.getFileLength(), which includes 
lastBlockBeingWrittenLength. If the cached length is the same as 
FileStatus.getLen(), the history server can make an additional call to 
DFSInputStream.getFileLength() and decide whether or not to update the history log.
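
For illustration only, a rough sketch of that check against the HDFS client API; 
the helper name and the fs/path/cachedLen parameters are hypothetical, not taken 
from the history server code:

{code:java|title=sketch|borderStyle=solid}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.hdfs.DFSInputStream

// Hypothetical helper: return the freshest known length of an in-progress event log.
def latestEventLogLength(fs: FileSystem, path: Path, cachedLen: Long): Long = {
  val statusLen = fs.getFileStatus(path).getLen
  if (statusLen != cachedLen) {
    // FileStatus already reports progress, no extra call needed.
    statusLen
  } else {
    // FileStatus looks unchanged; ask the DFS client directly, since
    // DFSInputStream.getFileLength() also counts the last block being written.
    val in = fs.open(path)
    try {
      in.getWrappedStream match {
        case dfs: DFSInputStream => dfs.getFileLength
        case _ => statusLen
      }
    } finally {
      in.close()
    }
  }
}
{code}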

> Events being dropped at an alarming rate due to hsync being slow for 
> eventLogging
> -
>
> Key: SPARK-24787
> URL: https://issues.apache.org/jira/browse/SPARK-24787
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Sanket Reddy
>Priority: Minor
>
> [https://github.com/apache/spark/pull/16924/files] updates the length of the 
> in-progress files, allowing the history server to stay responsive.
> However, we have a production job with 6 tasks per stage, and because hsync is 
> slow it starts dropping events, so the history server shows wrong stats due to 
> the dropped events.
> A viable solution is to not sync very frequently, or to make this configurable.






[jira] [Created] (SPARK-25683) Make AsyncEventQueue.lastReportTimestamp initial value the currentTime instead of 0

2018-10-08 Thread Devaraj K (JIRA)
Devaraj K created SPARK-25683:
-

 Summary: Make AsyncEventQueue.lastReportTimestamp initial value the 
currentTime instead of 0
 Key: SPARK-25683
 URL: https://issues.apache.org/jira/browse/SPARK-25683
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K


{code:xml}
18/10/08 17:51:40 ERROR AsyncEventQueue: Dropping event from queue eventLog. 
This likely means one of the listeners is too slow and cannot keep up with the 
rate at which tasks are being started by the scheduler.
18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since 
Wed Dec 31 16:00:00 PST 1969.
18/10/08 17:52:40 WARN AsyncEventQueue: Dropped 144853 events from eventLog 
since Mon Oct 08 17:51:40 PDT 2018.
{code}

Here the first log line shows the time as Wed Dec 31 16:00:00 PST 1969; I think it 
would be better to show the initialization time instead.
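
A minimal sketch of the idea, assuming the field is a plain volatile var (this is 
not the actual AsyncEventQueue code):

{code:java|title=sketch|borderStyle=solid}
// Initialize to the queue's construction time instead of 0, so the first
// "Dropped N events from eventLog since ..." message shows a meaningful time.
@volatile private var lastReportTimestamp: Long = System.currentTimeMillis()
// previously, in effect: @volatile private var lastReportTimestamp: Long = 0L
{code}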






[jira] [Resolved] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events

2018-10-05 Thread Devaraj K (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved SPARK-25645.
---
Resolution: Duplicate

> Add provision to disable EventLoggingListener default flush/hsync/hflush for 
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Devaraj K
>Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
>   val eventJson = JsonProtocol.sparkEventToJson(event)
>   // scalastyle:off println
>   writer.foreach(_.println(compact(render(eventJson))))
>   // scalastyle:on println
>   if (flushLogger) {
>     writer.foreach(_.flush())
>     hadoopDataStream.foreach(ds => ds.getWrappedStream match {
>       case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
>       case _ => ds.hflush()
>     })
>   }
> }
> {code}
> There are events which come with flushLogger=true and go through the underlying 
> stream flush. I tried running apps with flush/hsync/hflush disabled for all 
> events and saw a significant improvement in the app completion time, with no 
> event drops; I am posting more details in the comments section.






[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events

2018-10-04 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639079#comment-16639079
 ] 

Devaraj K commented on SPARK-25645:
---

{code:java|title=with hflush(no hsync)|borderStyle=solid}
18/10/04 17:01:12 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
18/10/04 17:01:13 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 0
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 1
18/10/04 17:03:35 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 0

{code}
With hflush (no hsync), it takes slightly longer (about 2 sec) than with hflush 
disabled for all events, and I don't see any dropped events here either.

> Add provision to disable EventLoggingListener default flush/hsync/hflush for 
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Devaraj K
>Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
>   val eventJson = JsonProtocol.sparkEventToJson(event)
>   // scalastyle:off println
>   writer.foreach(_.println(compact(render(eventJson))))
>   // scalastyle:on println
>   if (flushLogger) {
>     writer.foreach(_.flush())
>     hadoopDataStream.foreach(ds => ds.getWrappedStream match {
>       case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
>       case _ => ds.hflush()
>     })
>   }
> }
> {code}
> There are events which come with flushLogger=true and go through the underlying 
> stream flush. I tried running apps with flush/hsync/hflush disabled for all 
> events and saw a significant improvement in the app completion time, with no 
> event drops; I am posting more details in the comments section.






[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events

2018-10-04 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639046#comment-16639046
 ] 

Devaraj K commented on SPARK-25645:
---

Thanks [~vanzin] for the JIRA pointer. I haven't tried with just hflush; let me 
try it and post the result for the same app.

> Add provision to disable EventLoggingListener default flush/hsync/hflush for 
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Devaraj K
>Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
>   val eventJson = JsonProtocol.sparkEventToJson(event)
>   // scalastyle:off println
>   writer.foreach(_.println(compact(render(eventJson))))
>   // scalastyle:on println
>   if (flushLogger) {
>     writer.foreach(_.flush())
>     hadoopDataStream.foreach(ds => ds.getWrappedStream match {
>       case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
>       case _ => ds.hflush()
>     })
>   }
> }
> {code}
> There are events which come with flushLogger=true and go through the underlying 
> stream flush. I tried running apps with flush/hsync/hflush disabled for all 
> events and saw a significant improvement in the app completion time, with no 
> event drops; I am posting more details in the comments section.






[jira] [Created] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events

2018-10-04 Thread Devaraj K (JIRA)
Devaraj K created SPARK-25645:
-

 Summary: Add provision to disable EventLoggingListener default 
flush/hsync/hflush for all events
 Key: SPARK-25645
 URL: https://issues.apache.org/jira/browse/SPARK-25645
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K


{code:java|title=EventLoggingListener.scala|borderStyle=solid}
private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
  val eventJson = JsonProtocol.sparkEventToJson(event)
  // scalastyle:off println
  writer.foreach(_.println(compact(render(eventJson))))
  // scalastyle:on println
  if (flushLogger) {
    writer.foreach(_.flush())
    hadoopDataStream.foreach(ds => ds.getWrappedStream match {
      case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
      case _ => ds.hflush()
    })
  }
}
{code}
There are events which come with flushLogger=true and go through the underlying 
stream flush. I tried running apps with flush/hsync/hflush disabled for all events 
and saw a significant improvement in the app completion time, with no event drops; 
I am posting more details in the comments section.
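
For illustration, one possible shape of the requested provision, as a rough sketch 
only; spark.eventLog.flush.enabled is a hypothetical name (not an existing Spark 
configuration), and writer/hadoopDataStream/sparkConf are assumed to be the 
existing EventLoggingListener members:

{code:java|title=sketch (hypothetical config)|borderStyle=solid}
// Hypothetical flag; defaults to true so the current behavior is unchanged.
private val perEventFlushEnabled =
  sparkConf.getBoolean("spark.eventLog.flush.enabled", true)

private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
  val eventJson = JsonProtocol.sparkEventToJson(event)
  writer.foreach(_.println(compact(render(eventJson))))
  // Only sync the underlying stream when the caller asks for it AND the
  // (hypothetical) flag is enabled.
  if (flushLogger && perEventFlushEnabled) {
    writer.foreach(_.flush())
    hadoopDataStream.foreach(ds => ds.getWrappedStream match {
      case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
      case _ => ds.hflush()
    })
  }
}
{code}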






[jira] [Commented] (SPARK-25645) Add provision to disable EventLoggingListener default flush/hsync/hflush for all events

2018-10-04 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639005#comment-16639005
 ] 

Devaraj K commented on SPARK-25645:
---

{code:java|title=Present Behavior(flushLogger=true for some 
events)|borderStyle=solid}
18/10/04 15:00:25 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
18/10/04 15:00:26 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 15:00:58 ERROR AsyncEventQueue: Dropping event from queue eventLog. 
This likely means one of the listeners is too slow and cannot keep up with the 
rate at which tasks are being started by the scheduler.
18/10/04 15:00:58 WARN AsyncEventQueue: Dropped 2 events from eventLog since 
Wed Dec 31 16:00:00 PST 1969.
18/10/04 15:01:58 WARN AsyncEventQueue: Dropped 216493 events from eventLog 
since Thu Oct 04 15:00:58 PDT 2018.
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 1
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 15:02:44 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 1
18/10/04 15:03:39 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 0

{code}
With the present behavior, the application takes 3 min 14 sec to complete, with 
dropped events, and it takes 55 sec to clear the eventLog queue at the end of the 
application.
{code:java|title=flush/hsync/hflush disabled for all events|borderStyle=solid}
18/10/04 14:51:33 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
18/10/04 14:51:34 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-appStatus, 
eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, 
dispatchThread:spark-listener-group-executorManagement, eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-BEFORE-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 0
18/10/04 14:53:54 WARN AsyncEventQueue: [ADDED-LOG-AFTER-DISPATCH-THREAD-JOIN] 
Thread.currentThread():main, dispatchThread:spark-listener-group-eventLog, 
eventQueue.size(): 0

{code}
With flush/hsync/hflush disabled for all events, the application takes 2 min 21 sec 
to complete without any dropped events, and there are no pending events in the 
eventLog queue at the end of the application.

> Add provision to disable EventLoggingListener default flush/hsync/hflush for 
> all events
> ---
>
> Key: SPARK-25645
> URL: https://issues.apache.org/jira/browse/SPARK-25645
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Devaraj K
>Priority: Major
>
> {code:java|title=EventLoggingListener.scala|borderStyle=solid}
> private def logEvent(event: SparkListenerEvent, flushLogger: Boolean = false) {
>   val eventJson = JsonProtocol.sparkEventToJson(event)
>   // scalastyle:off println
>   writer.foreach(_.println(compact(render(eventJson))))
>   // scalastyle:on println
>   if (flushLogger) {
>     writer.foreach(_.flush())
>     hadoopDataStream.foreach(ds => ds.getWrappedStream match {
>       case wrapped: DFSOutputStream => wrapped.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH))
>       case _ => ds.hflush()
>     })
>   }
> }
> {code}

[jira] [Created] (SPARK-25637) SparkException: Could not find CoarseGrainedScheduler occurs during the application stop

2018-10-03 Thread Devaraj K (JIRA)
Devaraj K created SPARK-25637:
-

 Summary: SparkException: Could not find CoarseGrainedScheduler 
occurs during the application stop
 Key: SPARK-25637
 URL: https://issues.apache.org/jira/browse/SPARK-25637
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K


{code:xml}
2018-10-03 14:51:33 ERROR Inbox:91 - Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at 
org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
at 
org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:140)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:187)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:528)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:449)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:638)
at 
org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:201)
at 
org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:197)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at 
org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:197)
at 
org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:120)
at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
SPARK-14228 fixed these kinds of errors, but this one still occurs while 
performing reviveOffers.






[jira] [Created] (SPARK-25636) spark-submit swallows the failure reason when there is an error connecting to master

2018-10-03 Thread Devaraj K (JIRA)
Devaraj K created SPARK-25636:
-

 Summary: spark-submit swallows the failure reason when there is an 
error connecting to master
 Key: SPARK-25636
 URL: https://issues.apache.org/jira/browse/SPARK-25636
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: Devaraj K


{code:xml}
[apache-spark]$ ./bin/spark-submit --verbose --master spark://

Error: Exception thrown in awaitResult:
Run with --help for usage help or --verbose for debug output
{code}

When spark-submit cannot connect to the master, the underlying error is not shown. 
I think it should display the cause of the problem.







[jira] [Commented] (SPARK-25246) When the spark.eventLog.compress is enabled, the Application is not showing in the History server UI ('incomplete application' page), initially.

2018-09-14 Thread Devaraj K (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615398#comment-16615398
 ] 

Devaraj K commented on SPARK-25246:
---

I think it is not a problem; the behavior might depend on the codec used for 
compression. Can you try with other codecs and observe the behavior?

> When the spark.eventLog.compress is enabled, the Application is not showing 
> in the History server UI ('incomplete application' page), initially.
> 
>
> Key: SPARK-25246
> URL: https://issues.apache.org/jira/browse/SPARK-25246
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
>
> 1) bin/spark-shell --master yarn --conf "spark.eventLog.compress=true" 
> 2) hdfs dfs -ls /spark-logs 
> {code:java}
> -rwxrwx---   1 root supergroup  *0* 2018-08-27 03:26 
> /spark-logs/application_1535313809919_0005.lz4.inprogress
> {code}






[jira] [Created] (SPARK-25009) Standalone Cluster mode application submit is not working

2018-08-02 Thread Devaraj K (JIRA)
Devaraj K created SPARK-25009:
-

 Summary: Standalone Cluster mode application submit is not working
 Key: SPARK-25009
 URL: https://issues.apache.org/jira/browse/SPARK-25009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: Devaraj K


It does not show any error while submitting, but the app does not run and also 
does not show up in the web UI.







[jira] [Created] (SPARK-24129) Add option to pass --build-arg's to docker-image-tool.sh

2018-04-30 Thread Devaraj K (JIRA)
Devaraj K created SPARK-24129:
-

 Summary: Add option to pass --build-arg's to docker-image-tool.sh
 Key: SPARK-24129
 URL: https://issues.apache.org/jira/browse/SPARK-24129
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Devaraj K


When working behind a firewall, we may need to pass the proxy details as docker 
--build-arg parameters to build the image, but docker-image-tool.sh doesn't provide 
an option to pass the proxy details or --build-arg values through to the docker 
command.






[jira] [Created] (SPARK-24003) Add support to provide spark.executor.extraJavaOptions in terms of App Id and/or Executor Id's

2018-04-17 Thread Devaraj K (JIRA)
Devaraj K created SPARK-24003:
-

 Summary: Add support to provide spark.executor.extraJavaOptions in 
terms of App Id and/or Executor Id's
 Key: SPARK-24003
 URL: https://issues.apache.org/jira/browse/SPARK-24003
 Project: Spark
  Issue Type: Improvement
  Components: Mesos, Spark Core, YARN
Affects Versions: 2.3.0
Reporter: Devaraj K


Users may want to enable GC logging or heap dumps for the executors, but there is a 
chance of the output being overwritten by other executors since the paths cannot be 
expressed dynamically. This improvement would allow expressing the 
spark.executor.extraJavaOptions paths in terms of the App Id and/or Executor Id to 
avoid overwriting by other executors.

There was a discussion about this in SPARK-3767, but it was never fixed.






[jira] [Commented] (SPARK-22567) spark.mesos.executor.memoryOverhead equivalent for the Driver when running on Mesos

2018-02-26 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377370#comment-16377370
 ] 

Devaraj K commented on SPARK-22567:
---

Dup of SPARK-17928.

[~michaelmoss], can you check the PR available for SPARK-17928?

> spark.mesos.executor.memoryOverhead equivalent for the Driver when running on 
> Mesos
> ---
>
> Key: SPARK-22567
> URL: https://issues.apache.org/jira/browse/SPARK-22567
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.2.0
>Reporter: Michael Moss
>Priority: Minor
>
> spark.mesos.executor.memoryOverhead is:
> "The amount of additional memory, specified in MB, to be allocated per 
> executor. By default, the overhead will be larger of either 384 or 10% of 
> spark.executor.memory"
> It is important for every JVM process to have memory available to it, beyond 
> its heap (Xmx) for native allocations.
> When using the MesosClusterDispatcher and running the Driver on Mesos 
> (https://spark.apache.org/docs/latest/running-on-mesos.html#cluster-mode), it 
> appears that the Driver's mesos sandbox is allocated with the same amount of 
> memory (configured with spark.driver.memory) as the heap (Xmx) itself. This 
> increases the prevalence of OOM exceptions.






[jira] [Commented] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode

2018-01-02 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16308802#comment-16308802
 ] 

Devaraj K commented on SPARK-22404:
---

Thanks [~irashid] for the comment.

bq. can you provide a little more explanation for the point of this?

An unmanaged AM is an AM that is not launched and managed by the RM. The client 
creates a new application on the RM and negotiates a new attempt id. Then it waits 
for the RM app state to reach YarnApplicationState.ACCEPTED, after which it spawns 
the AM in the same or another process and passes it the container id via the env 
variable Environment.CONTAINER_ID. The AM (in the same or a different process) can 
register with the RM using the attempt id obtained from the container id and 
proceed as normal.

This PR/JIRA provides a new configuration "spark.yarn.un-managed-am" (defaults to 
false) to enable an unmanaged AM application in YARN client mode, which starts the 
Application Master service as part of the Client. It reuses the existing code for 
communication between the Application Master <-> Task Scheduler for container 
requests/allocations/launch, and eliminates the following:
*   Allocating and launching the Application Master container
*   Remote node/process communication between the Application Master <-> Task 
Scheduler

bq. how much time does this save for you?
It removes the AM container scheduling and launching time, and eliminates the AM 
acting as a proxy for requesting, launching and removing executors. I can post the 
comparison results here with and without the unmanaged AM.

bq. What's the downside of an unmanaged AM?
The unmanaged AM service runs as part of the Client, so the Client has to handle 
anything that goes wrong with the unmanaged AM service itself, unlike the managed 
case where the AM container gets relaunched on failures.

bq. the idea makes sense, but the yarn interaction and client mode is already 
pretty complicated so I'd like good justification for this
This PR reuses most of the existing code for communication between the AM <-> Task 
Scheduler, but the communication happens in the same process. The Client starts the 
AM service in the same process when the application state is ACCEPTED and proceeds 
as usual without disrupting the existing flow.
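
For illustration only, a rough sketch of the unmanaged-AM flow described above, 
written against the plain YARN client API rather than the actual PR; AMRM-token 
wiring, the Environment.CONTAINER_ID handling, and error handling are omitted:

{code:java|title=sketch (plain YARN client API)|borderStyle=solid}
import org.apache.hadoop.yarn.api.records.{FinalApplicationStatus, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.{AMRMClient, YarnClient}
import org.apache.hadoop.yarn.conf.YarnConfiguration

val conf = new YarnConfiguration()
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(conf)
yarnClient.start()

// 1. Create the application and mark it unmanaged, so the RM never launches an AM container.
val app = yarnClient.createApplication()
val ctx = app.getApplicationSubmissionContext
ctx.setUnmanagedAM(true)
yarnClient.submitApplication(ctx)

// 2. Wait for the RM to move the application to ACCEPTED.
val appId = ctx.getApplicationId
while (yarnClient.getApplicationReport(appId).getYarnApplicationState !=
    YarnApplicationState.ACCEPTED) {
  Thread.sleep(100)
}

// 3. Run the AM protocol from this same process: register, request/launch
//    containers as a managed AM would, then unregister on shutdown.
val amrmClient = AMRMClient.createAMRMClient[AMRMClient.ContainerRequest]()
amrmClient.init(conf)
amrmClient.start()
amrmClient.registerApplicationMaster("localhost", 0, "")
// ... container requests / allocations / launches happen here ...
amrmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "")
{code}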


> Provide an option to use unmanaged AM in yarn-client mode
> -
>
> Key: SPARK-22404
> URL: https://issues.apache.org/jira/browse/SPARK-22404
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Devaraj K
>
> There was an issue SPARK-1200 to provide an option but was closed without 
> fixing.
> Using an unmanaged AM in yarn-client mode would allow apps to start up 
> faster, but not requiring the container launcher AM to be launched on the 
> cluster.






[jira] [Commented] (SPARK-14228) Lost executor of RPC disassociated, and occurs exception: Could not find CoarseGrainedScheduler or it has been stopped

2017-12-11 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16286282#comment-16286282
 ] 

Devaraj K commented on SPARK-14228:
---

[~KaiXinXIaoLei], thanks for checking this. Is the issue you are mentioning 
different from the two instances handled in the PR? If so, can you create a JIRA 
with the exception stack trace?

> Lost executor of RPC disassociated, and occurs exception: Could not find 
> CoarseGrainedScheduler or it has been stopped
> --
>
> Key: SPARK-14228
> URL: https://issues.apache.org/jira/browse/SPARK-14228
> Project: Spark
>  Issue Type: Bug
>Reporter: meiyoula
> Fix For: 2.3.0
>
>
> When I start 1000 executors and then stop the process, it will call 
> SparkContext.stop to stop all executors. But during this process, the executors 
> that have already been killed lose the RPC connection with the driver and try to 
> reviveOffers, but can't find the CoarseGrainedScheduler or it has been stopped.
> {quote}
> 16/03/29 01:45:45 ERROR YarnScheduler: Lost executor 610 on 51-196-152-8: 
> remote Rpc client disassociated
> 16/03/29 01:45:45 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it 
> has been stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:173)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:398)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:314)
>   at 
> org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:482)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.removeExecutor(CoarseGrainedSchedulerBackend.scala:261)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.onDisconnected(CoarseGrainedSchedulerBackend.scala:207)
>   at 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:144)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
>   at 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {quote}






[jira] [Updated] (SPARK-22519) Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()

2017-11-14 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-22519:
--
Summary: Remove unnecessary stagingDirPath null check in 
ApplicationMaster.cleanupStagingDir()  (was: 
ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR 
env var is not available)

> Remove unnecessary stagingDirPath null check in 
> ApplicationMaster.cleanupStagingDir()
> -
>
> Key: SPARK-22519
> URL: https://issues.apache.org/jira/browse/SPARK-22519
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Devaraj K
>Priority: Minor
>
> In the below, the condition checks whether the stagingDirPath is null but 
> stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null 
> then it throws NPE while creating the Path.
> {code:title=ApplicationMaster.scala|borderStyle=solid}
> stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR"))
> if (stagingDirPath == null) {
>   logError("Staging directory is null")
>   return
> }
> {code}
> Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is 
> null or not, not the stagingDirPath.






[jira] [Updated] (SPARK-22519) Remove unnecessary stagingDirPath null check in ApplicationMaster.cleanupStagingDir()

2017-11-14 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-22519:
--
Priority: Trivial  (was: Minor)

> Remove unnecessary stagingDirPath null check in 
> ApplicationMaster.cleanupStagingDir()
> -
>
> Key: SPARK-22519
> URL: https://issues.apache.org/jira/browse/SPARK-22519
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Devaraj K
>Priority: Trivial
>
> In the below, the condition checks whether the stagingDirPath is null but 
> stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null 
> then it throws NPE while creating the Path.
> {code:title=ApplicationMaster.scala|borderStyle=solid}
> stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR"))
> if (stagingDirPath == null) {
>   logError("Staging directory is null")
>   return
> }
> {code}
> Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is 
> null or not, not the stagingDirPath.






[jira] [Commented] (SPARK-22519) ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available

2017-11-14 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16251925#comment-16251925
 ] 

Devaraj K commented on SPARK-22519:
---

It is not a usual case; I have seen this NPE while working on SPARK-22404, when the 
SPARK_YARN_STAGING_DIR env var doesn't exist. Even if you don't think a null check 
on the env var is needed, at least note that the *if (stagingDirPath == null) {* 
branch can never be taken.

> ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR 
> env var is not available
> -
>
> Key: SPARK-22519
> URL: https://issues.apache.org/jira/browse/SPARK-22519
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Devaraj K
>Priority: Minor
>
> In the below, the condition checks whether the stagingDirPath is null but 
> stagingDirPath never becomes null. If SPARK_YARN_STAGING_DIR env var is null 
> then it throws NPE while creating the Path.
> {code:title=ApplicationMaster.scala|borderStyle=solid}
> stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR"))
> if (stagingDirPath == null) {
>   logError("Staging directory is null")
>   return
> }
> {code}
> Here we need to check whether the System.getenv("SPARK_YARN_STAGING_DIR") is 
> null or not, not the stagingDirPath.






[jira] [Created] (SPARK-22519) ApplicationMaster.cleanupStagingDir() throws NPE when SPARK_YARN_STAGING_DIR env var is not available

2017-11-14 Thread Devaraj K (JIRA)
Devaraj K created SPARK-22519:
-

 Summary: ApplicationMaster.cleanupStagingDir() throws NPE when 
SPARK_YARN_STAGING_DIR env var is not available
 Key: SPARK-22519
 URL: https://issues.apache.org/jira/browse/SPARK-22519
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 2.2.0
Reporter: Devaraj K
Priority: Minor


In the code below, the condition checks whether stagingDirPath is null, but 
stagingDirPath never becomes null. If the SPARK_YARN_STAGING_DIR env var is null, 
it throws an NPE while creating the Path.

{code:title=ApplicationMaster.scala|borderStyle=solid}
stagingDirPath = new Path(System.getenv("SPARK_YARN_STAGING_DIR"))
if (stagingDirPath == null) {
  logError("Staging directory is null")
  return
}
{code}


Here we need to check whether System.getenv("SPARK_YARN_STAGING_DIR") is null, not 
stagingDirPath.
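
A minimal sketch of the suggested check, for illustration only (not the exact patch):

{code:title=ApplicationMaster.scala (sketch)|borderStyle=solid}
// Check the env var itself before building the Path, so a missing
// SPARK_YARN_STAGING_DIR is reported instead of throwing an NPE.
val stagingDirEnv = System.getenv("SPARK_YARN_STAGING_DIR")
if (stagingDirEnv == null) {
  logError("Staging directory is null")
  return
}
stagingDirPath = new Path(stagingDirEnv)
{code}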






[jira] [Created] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode

2017-10-30 Thread Devaraj K (JIRA)
Devaraj K created SPARK-22404:
-

 Summary: Provide an option to use unmanaged AM in yarn-client mode
 Key: SPARK-22404
 URL: https://issues.apache.org/jira/browse/SPARK-22404
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 2.2.0
Reporter: Devaraj K


There was an issue, SPARK-1200, to provide such an option, but it was closed 
without being fixed.

Using an unmanaged AM in yarn-client mode would allow apps to start up faster by 
not requiring an AM container to be launched on the cluster.






[jira] [Commented] (SPARK-22404) Provide an option to use unmanaged AM in yarn-client mode

2017-10-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226015#comment-16226015
 ] 

Devaraj K commented on SPARK-22404:
---

I am working on this and will update this JIRA with a proposal PR.

> Provide an option to use unmanaged AM in yarn-client mode
> -
>
> Key: SPARK-22404
> URL: https://issues.apache.org/jira/browse/SPARK-22404
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.2.0
>Reporter: Devaraj K
>
> There was an issue SPARK-1200 to provide an option but was closed without 
> fixing.
> Using an unmanaged AM in yarn-client mode would allow apps to start up 
> faster, but not requiring the container launcher AM to be launched on the 
> cluster.






[jira] [Created] (SPARK-22172) Worker hangs when the external shuffle service port is already in use

2017-09-29 Thread Devaraj K (JIRA)
Devaraj K created SPARK-22172:
-

 Summary: Worker hangs when the external shuffle service port is 
already in use
 Key: SPARK-22172
 URL: https://issues.apache.org/jira/browse/SPARK-22172
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Devaraj K


When the external shuffle service port is already in use, the Worker throws the 
BindException below and hangs forever. I think the exception should be handled 
gracefully; a rough sketch of one option follows the stack trace.

{code:xml}
17/09/29 11:16:30 INFO ExternalShuffleService: Starting shuffle service on port 
7337 (auth enabled = false)
17/09/29 11:16:30 ERROR Inbox: Ignoring error
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at 
io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:128)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:500)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:495)
at 
io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:480)
at 
io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:209)
at 
io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:355)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)

{code}
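
For illustration only, a rough sketch of one graceful option, assuming the Worker's 
call into the shuffle service looks roughly like the snippet below (this is not the 
actual fix, and the shuffleService.startIfEnabled() call site is an assumption):

{code:java|title=sketch|borderStyle=solid}
try {
  // Worker-side start of the external shuffle service (assumed call site).
  shuffleService.startIfEnabled()
} catch {
  case e: java.net.BindException =>
    // Fail fast with a clear message instead of leaving the Worker half-alive.
    logError("External shuffle service port is already in use, shutting down Worker", e)
    System.exit(1)
}
{code}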






[jira] [Commented] (SPARK-19417) spark.files.overwrite is ignored

2017-09-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177253#comment-16177253
 ] 

Devaraj K commented on SPARK-19417:
---

Thanks [~ckanich] for the test case.
{code:title=SparkContext.scala|borderStyle=solid}
  def addFile(path: String, recursive: Boolean): Unit = {

    val timestamp = System.currentTimeMillis
    if (addedFiles.putIfAbsent(key, timestamp).isEmpty) {
      logInfo(s"Added file $path at $key with timestamp $timestamp")
      // Fetch the file locally so that closures which are run on the driver
      // can still use the SparkFiles API to access files.
      Utils.fetchFile(uri.toString, new File(SparkFiles.getRootDirectory()), conf,
        env.securityManager, hadoopConfiguration, timestamp, useCache = false)
      postEnvironmentUpdate()
    }
{code}
It does not add the file if it already exists, and that seems to be intentional 
behavior; please see the discussion at https://github.com/apache/spark/pull/14396.

Do you have a real use case that needs this?

> spark.files.overwrite is ignored
> 
>
> Key: SPARK-19417
> URL: https://issues.apache.org/jira/browse/SPARK-19417
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Chris Kanich
>
> I have not been able to get Spark to actually overwrite a file after I have 
> changed it on the driver node, re-called addFile, and then used it on the 
> executors again. Here's a failing test.
> {code}
>   test("can overwrite files when spark.files.overwrite is true") {
> val dir = Utils.createTempDir()
> val file = new File(dir, "file")
> try {
>   Files.write("one", file, StandardCharsets.UTF_8)
>   sc = new SparkContext(new 
> SparkConf().setAppName("test").setMaster("local-cluster[1,1,1024]")
>  .set("spark.files.overwrite", "true"))
>   sc.addFile(file.getAbsolutePath)
>   def getAddedFileContents(): String = {
> sc.parallelize(Seq(0)).map { _ =>
>   scala.io.Source.fromFile(SparkFiles.get("file")).mkString
> }.first()
>   }
>   assert(getAddedFileContents() === "one")
>   Files.write("two", file, StandardCharsets.UTF_8)
>   sc.addFile(file.getAbsolutePath)
>   assert(getAddedFileContents() === "onetwo")
> } finally {
>   Utils.deleteRecursively(dir)
>   sc.stop()
> }
>   }
> {code}






[jira] [Commented] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2017-08-10 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121817#comment-16121817
 ] 

Devaraj K commented on SPARK-18648:
---

[~FlamingMike], this has been fixed as part of SPARK-21339; can you check this 
issue with the SPARK-21339 change if you have a chance? Thanks.

> spark-shell --jars option does not add jars to classpath on windows
> ---
>
> Key: SPARK-18648
> URL: https://issues.apache.org/jira/browse/SPARK-18648
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 2.0.2
> Environment: Windows 7 x64
>Reporter: Michel Lemay
>  Labels: windows
>
> I can't import symbols from command line jars when in the shell:
> Adding jars via --jars:
> {code}
> spark-shell --master local[*] --jars path\to\deeplearning4j-core-0.7.0.jar
> {code}
> Same result if I add it through maven coordinates:
> {code}spark-shell --master local[*] --packages 
> org.deeplearning4j:deeplearning4j-core:0.7.0
> {code}
> I end up with:
> {code}
> scala> import org.deeplearning4j
> :23: error: object deeplearning4j is not a member of package org
>import org.deeplearning4j
> {code}
> NOTE: It is working as expected when running on linux.
> Sample output with --verbose:
> {code}
> Using properties file: null
> Parsed arguments:
>   master  local[*]
>   deployMode  null
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemorynull
>   driverCores null
>   driverExtraClassPathnull
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutorsnull
>   files   null
>   pyFiles null
>   archivesnull
>   mainClass   org.apache.spark.repl.Main
>   primaryResource spark-shell
>   nameSpark shell
>   childArgs   []
>   jars
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
>   packagesnull
>   packagesExclusions  null
>   repositoriesnull
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file null:
> Main class:
> org.apache.spark.repl.Main
> Arguments:
> System properties:
> SPARK_SUBMIT -> true
> spark.app.name -> Spark shell
> spark.jars -> 
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
> spark.submit.deployMode -> client
> spark.master -> local[*]
> Classpath elements:
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
> 16/11/30 08:30:49 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/30 08:30:51 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> Spark context Web UI available at http://192.168.70.164:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1480512651325).
> Spark session available as 'spark'.
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 2.0.2
>   /_/
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.deeplearning4j
> :23: error: object deeplearning4j is not a member of package org
>import org.deeplearning4j
>   ^
> scala>
> {code}






[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2017-07-25 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100079#comment-16100079
 ] 

Devaraj K commented on SPARK-15142:
---

bq. That means there is no way to detect the new master while the dispatcher is 
still alive, it must be restarted when the new master is up, correct?

Yes, https://github.com/apache/spark/pull/13143 doesn't handle discovery of a new 
master while the dispatcher is still alive. You can reopen this JIRA and create a 
PR if you want to work on it.

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets restarted, 
> the dispatcher will keep running and queue up all the submitted applications 
> without launching them.






[jira] [Resolved] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2017-07-24 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved SPARK-15142.
---
Resolution: Duplicate

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets restarted, 
> the dispatcher will keep running and queue up all the submitted applications 
> without launching them.






[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2017-07-24 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099535#comment-16099535
 ] 

Devaraj K commented on SPARK-15142:
---

[~skonto] Thanks for showing interest in this. I have already created a PR for 
SPARK-15359, which fixes this issue.

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> While Spark Mesos dispatcher running if the Mesos master gets restarted then 
> Spark Mesos dispatcher will keep running and queues up all the submitted 
> applications and will not launch them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21146) Master/Worker should handle and shutdown when any thread gets UncaughtException

2017-06-30 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-21146:
--
Summary: Master/Worker should handle and shutdown when any thread gets 
UncaughtException  (was: Worker should handle and shutdown when any thread gets 
UncaughtException)

> Master/Worker should handle and shutdown when any thread gets 
> UncaughtException
> ---
>
> Key: SPARK-21146
> URL: https://issues.apache.org/jira/browse/SPARK-21146
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Devaraj K
>
> {code:xml}
> 17/06/19 11:41:23 INFO Worker: Asked to launch executor 
> app-20170619114055-0005/228 for ScalaSort
> Exception in thread "dispatcher-event-loop-79" java.lang.OutOfMemoryError: 
> unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:714)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>   at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I see in the logs that Worker's dispatcher-event got the above exception and 
> the Worker keeps running without performing any functionality. And also 
> Worker state changed from ALIVE to DEAD in Master's web UI.
> {code:xml}
> worker-20170619150349-192.168.1.120-56175 192.168.1.120:56175 DEAD
> 88 (41 Used)251.2 GB (246.0 GB Used)
> {code}
> I think Worker should handle and shutdown when any thread gets 
> UncaughtException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-29 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K closed SPARK-21148.
-
Resolution: Duplicate

> Set SparkUncaughtExceptionHandler to the Master
> ---
>
> Key: SPARK-21148
> URL: https://issues.apache.org/jira/browse/SPARK-21148
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1
>Reporter: Devaraj K
>
> Any one thread of the Master gets any of the UncaughtException then the 
> thread gets terminate and the Master process keeps running without 
> functioning properly.
> I think we need to handle the UncaughtException and exit the Master 
> gracefully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21170) Utils.tryWithSafeFinallyAndFailureCallbacks throws IllegalArgumentException: Self-suppression not permitted

2017-06-21 Thread Devaraj K (JIRA)
Devaraj K created SPARK-21170:
-

 Summary: Utils.tryWithSafeFinallyAndFailureCallbacks throws 
IllegalArgumentException: Self-suppression not permitted
 Key: SPARK-21170
 URL: https://issues.apache.org/jira/browse/SPARK-21170
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.1
Reporter: Devaraj K
Priority: Minor


{code:xml}
17/06/20 22:49:39 ERROR Executor: Exception in task 225.0 in stage 1.0 (TID 
27225)
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1400)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1145)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1125)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:341)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

{code:xml}
17/06/20 22:52:32 INFO scheduler.TaskSetManager: Lost task 427.0 in stage 1.0 
(TID 27427) on 192.168.1.121, executor 12: java.lang.IllegalArgumentException 
(Self-suppression not permitted) [duplicate 1]
17/06/20 22:52:33 INFO scheduler.TaskSetManager: Starting task 427.1 in stage 
1.0 (TID 27764, 192.168.1.122, executor 106, partition 427, PROCESS_LOCAL, 4625 
bytes)
17/06/20 22:52:33 INFO scheduler.TaskSetManager: Lost task 186.0 in stage 1.0 
(TID 27186) on 192.168.1.122, executor 106: java.lang.IllegalArgumentException 
(Self-suppression not permitted) [duplicate 2]
17/06/20 22:52:38 INFO scheduler.TaskSetManager: Starting task 186.1 in stage 
1.0 (TID 27765, 192.168.1.121, executor 9, partition 186, PROCESS_LOCAL, 4625 
bytes)
17/06/20 22:52:38 WARN scheduler.TaskSetManager: Lost task 392.0 in stage 1.0 
(TID 27392, 192.168.1.121, executor 9): java.lang.IllegalArgumentException: 
Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1400)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1145)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1125)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:341)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Here it is trying to suppress the same Throwable instance, which causes the 
IllegalArgumentException and masks the original exception.

I think it should not add the exception to the suppressed list if it is the same 
instance.
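
A rough, self-contained sketch of the guard being suggested here (illustrative 
only, not the actual Utils.tryWithSafeFinallyAndFailureCallbacks code): the 
secondary exception is recorded as suppressed only when it is a different 
instance, so addSuppressed can no longer throw and the original exception is 
preserved.

{code}
object SelfSuppressionSketch {
  def tryWithSafeFinally[T](block: => T)(finallyBlock: => Unit): T = {
    var originalThrowable: Throwable = null
    try {
      block
    } catch {
      case t: Throwable =>
        originalThrowable = t
        throw t
    } finally {
      try {
        finallyBlock
      } catch {
        case t: Throwable =>
          if (originalThrowable == null) {
            // no earlier failure, let the finally-block exception propagate
            throw t
          } else if (originalThrowable ne t) {
            // a different instance, safe to record it as suppressed
            originalThrowable.addSuppressed(t)
          }
          // same instance: skip addSuppressed, which would otherwise throw
          // IllegalArgumentException: Self-suppression not permitted
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val shared = new RuntimeException("boom")
    try {
      // body and finally block fail with the same Throwable instance, the case
      // that used to end in "Self-suppression not permitted"
      tryWithSafeFinally[Unit] { throw shared } { throw shared }
    } catch {
      case t: Throwable => println(s"original exception preserved: ${t.getMessage}")
    }
  }
}
{code}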



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-19 Thread Devaraj K (JIRA)
Devaraj K created SPARK-21148:
-

 Summary: Set SparkUncaughtExceptionHandler to the Master
 Key: SPARK-21148
 URL: https://issues.apache.org/jira/browse/SPARK-21148
 Project: Spark
  Issue Type: Improvement
  Components: Deploy, Spark Core
Affects Versions: 2.1.1
Reporter: Devaraj K


If any thread of the Master gets an UncaughtException, that thread terminates 
and the Master process keeps running without functioning properly.
I think we need to handle the UncaughtException and exit the Master gracefully.
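
A minimal, self-contained sketch of the idea (the real change would reuse 
Spark's SparkUncaughtExceptionHandler; the names below are illustrative): 
install a default uncaught exception handler before the Master starts, so any 
uncaught exception brings the process down instead of leaving it half-alive.

{code}
object MasterMainSketch {
  def main(args: Array[String]): Unit = {
    // Install the handler first so it covers every thread created afterwards,
    // including the RPC dispatcher threads.
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(thread: Thread, t: Throwable): Unit = {
        System.err.println(s"Uncaught exception in thread ${thread.getName}, shutting down")
        t.printStackTrace()
        System.exit(50)  // non-zero exit so an external supervisor can restart the Master
      }
    })
    // ... start the Master RPC environment and web UI here ...
  }
}
{code}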



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException

2017-06-19 Thread Devaraj K (JIRA)
Devaraj K created SPARK-21146:
-

 Summary: Worker should handle and shutdown when any thread gets 
UncaughtException
 Key: SPARK-21146
 URL: https://issues.apache.org/jira/browse/SPARK-21146
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.1.1
Reporter: Devaraj K


{code:xml}
17/06/19 11:41:23 INFO Worker: Asked to launch executor 
app-20170619114055-0005/228 for ScalaSort
Exception in thread "dispatcher-event-loop-79" java.lang.OutOfMemoryError: 
unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at 
java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1018)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

I see in the logs that the Worker's dispatcher-event-loop thread got the above 
exception and the Worker kept running without performing any functionality. The 
Worker state also changed from ALIVE to DEAD in the Master's web UI.

{code:xml}
worker-20170619150349-192.168.1.120-56175   192.168.1.120:56175 DEAD
88 (41 Used)    251.2 GB (246.0 GB Used)
{code}

I think the Worker should handle this and shut down when any thread gets an 
UncaughtException.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15665) spark-submit --kill and --status are not working

2017-03-25 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942141#comment-15942141
 ] 

Devaraj K commented on SPARK-15665:
---

[~samuel-soubeyran], this issue has been resolved. Please create another JIRA 
if you see any other problems.

> spark-submit --kill and --status are not working 
> -
>
> Key: SPARK-15665
> URL: https://issues.apache.org/jira/browse/SPARK-15665
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Devaraj K
>Assignee: Devaraj K
> Fix For: 2.0.0
>
>
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --kill 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}
> {code:xml}
> [devaraj@server2 spark-master]$ ./bin/spark-submit --status 
> driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
> Exception in thread "main" java.lang.IllegalArgumentException: Missing 
> application resource.
> at 
> org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
> at 
> org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
> at org.apache.spark.launcher.Main.main(Main.java:86)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19689) Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text properly

2017-02-21 Thread Devaraj K (JIRA)
Devaraj K created SPARK-19689:
-

 Summary: Job Details page doesn't show 'Tasks: Succeeded/Total' 
progress bar text properly
 Key: SPARK-19689
 URL: https://issues.apache.org/jira/browse/SPARK-19689
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.1.0
Reporter: Devaraj K
Priority: Minor
 Attachments: Tasks Progress bar - Job Details Page.png

In the Failed Stages table, the 'Tasks: Succeeded/Total' progress bar text is 
not displayed properly when there is a Failure Reason with multi-line text.

Please find the attached screenshot for more details.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19689) Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text properly

2017-02-21 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-19689:
--
Attachment: Tasks Progress bar - Job Details Page.png

> Job Details page doesn't show 'Tasks: Succeeded/Total' progress bar text 
> properly
> -
>
> Key: SPARK-19689
> URL: https://issues.apache.org/jira/browse/SPARK-19689
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.1.0
>Reporter: Devaraj K
>Priority: Minor
> Attachments: Tasks Progress bar - Job Details Page.png
>
>
> In the Failed Stages table, the 'Tasks: Succeeded/Total' progress bar text is 
> not displayed properly when there is a Failure Reason with multi-line text.
> Please find the attached screenshot for more details.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19354) Killed tasks are getting marked as FAILED

2017-01-26 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-19354:
--
Description: 
When we enable speculation, we can see multiple attempts running for the same 
task when the first attempt's progress is slow. If any of the task attempts 
succeeds, the other attempts are killed; while they are being killed, those 
attempts get marked as FAILED due to the below error. We need to handle this 
error and mark the attempt as KILLED instead of FAILED.

||93||214||1 (speculative)||FAILED||ANY||1 / xx.xx.xx.x2 stdout stderr||2017/01/24 10:30:44||0.2 s||0.0 B / 0||8.0 KB / 400||java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: node2/xx.xx.xx.x2; destination host is: node1:9000; +details||

{code:xml}
17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in stage 
1.0 (TID 214)
17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm 
version is 1
17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 214)
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy17.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
at 
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
at org.apache.spark.scheduler.Task.run(Task.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
at 

[jira] [Updated] (SPARK-19377) Killed tasks should have the status as KILLED

2017-01-26 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-19377:
--
Description: 
|143|10|0|SUCCESS|NODE_LOCAL|6 / x.xx.x.x stdout stderr|2017/01/25 07:49:27|0 ms|0.0 B / 0|0.0 B / 0|TaskKilled (killed intentionally)|
|156|11|0|SUCCESS|NODE_LOCAL|5 / x.xx.x.x stdout stderr|2017/01/25 07:49:27|0 ms|0.0 B / 0|0.0 B / 0|TaskKilled (killed intentionally)|

Killed tasks show the task status as SUCCESS; I think we should show the status 
as KILLED for the killed tasks.

  was:
|143|10 |0  |SUCCESS|NODE_LOCAL |6 / x.xx.x.x
stdout
stderr
|2017/01/25 07:49:27|0 ms   |0.0 B / 0  |0.0 B / 0  
|TaskKilled (killed intentionally)|



|156|11 |0  |SUCCESS|NODE_LOCAL |5 / x.xx.x.x
stdout
stderr
|2017/01/25 07:49:27|0 ms   |0.0 B / 0  |0.0 B / 0  
|TaskKilled (killed intentionally)|

Killed tasks show the task status as SUCCESS, I think we should have the status 
as KILLED for the killed tasks.


> Killed tasks should have the status as KILLED
> -
>
> Key: SPARK-19377
> URL: https://issues.apache.org/jira/browse/SPARK-19377
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Reporter: Devaraj K
>Priority: Minor
>
> |143  |10 |0  |SUCCESS|NODE_LOCAL |6 / x.xx.x.x
> stdout
> stderr |2017/01/25 07:49:27   |0 ms   |0.0 B / 0  |0.0 B 
> / 0  |TaskKilled (killed intentionally)|
> |156  |11 |0  |SUCCESS|NODE_LOCAL |5 / x.xx.x.x
> stdout
> stderr |2017/01/25 07:49:27   |0 ms   |0.0 B / 0  |0.0 B 
> / 0  |TaskKilled (killed intentionally)|
> Killed tasks show the task status as SUCCESS, I think we should have the 
> status as KILLED for the killed tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19377) Killed tasks should have the status as KILLED

2017-01-26 Thread Devaraj K (JIRA)
Devaraj K created SPARK-19377:
-

 Summary: Killed tasks should have the status as KILLED
 Key: SPARK-19377
 URL: https://issues.apache.org/jira/browse/SPARK-19377
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Reporter: Devaraj K
Priority: Minor


|143|10|0|SUCCESS|NODE_LOCAL|6 / x.xx.x.x stdout stderr|2017/01/25 07:49:27|0 ms|0.0 B / 0|0.0 B / 0|TaskKilled (killed intentionally)|
|156|11|0|SUCCESS|NODE_LOCAL|5 / x.xx.x.x stdout stderr|2017/01/25 07:49:27|0 ms|0.0 B / 0|0.0 B / 0|TaskKilled (killed intentionally)|

Killed tasks show the task status as SUCCESS; I think we should show the status 
as KILLED for the killed tasks.
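
A minimal, self-contained sketch of what is being asked for (the types below 
are illustrative stand-ins, not the actual listener/UI API): derive the 
displayed status from the task end reason instead of showing SUCCESS whenever a 
task attempt finished.

{code}
object KilledStatusSketch {
  sealed trait TaskEndReason
  case object Success extends TaskEndReason
  final case class TaskKilled(reason: String) extends TaskEndReason
  final case class ExceptionFailure(message: String) extends TaskEndReason

  def uiStatus(reason: TaskEndReason): String = reason match {
    case Success             => "SUCCESS"
    case TaskKilled(_)       => "KILLED"   // what this issue proposes for killed attempts
    case ExceptionFailure(_) => "FAILED"
  }

  def main(args: Array[String]): Unit = {
    println(uiStatus(TaskKilled("killed intentionally")))  // KILLED, not SUCCESS
  }
}
{code}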



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19354) Killed tasks are getting marked as FAILED

2017-01-25 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838396#comment-15838396
 ] 

Devaraj K commented on SPARK-19354:
---

bq. The question is, why the error during shutdown?
The shutdown is not related to the error here; there were no further tasks to 
execute, so the driver commanded the executor to shut down.

It happens frequently when speculation is enabled, and I suspect it could lead 
to executor blacklisting due to these failures during the kill.

> Killed tasks are getting marked as FAILED
> -
>
> Key: SPARK-19354
> URL: https://issues.apache.org/jira/browse/SPARK-19354
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler, Spark Core
>Reporter: Devaraj K
>Priority: Minor
>
> When we enable speculation, we can see there are multiple attempts running 
> for the same task when the first task progress is slow. If any of the task 
> attempt succeeds then the other attempts will be killed, during killing the 
> attempts those attempts are getting marked as failed due to the below error. 
> We need to handle this error and mark the attempt as KILLED instead of FAILED.
> ||93  ||214   ||1 (speculative)   ||FAILED||ANY   ||1 / 
> xx.xx.xx.x2
> stdout
> stderr
> ||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400  
> ||java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> node2/xx.xx.xx.x2; destination host is: node1:9000; 
> +details||
> {code:xml}
> 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in 
> stage 1.0 (TID 214)
> 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm 
> version is 1
> 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 
> 214)
> java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy17.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy18.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
>   at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
>   at org.apache.spark.scheduler.Task.run(Task.scala:114)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Commented] (SPARK-19354) Killed tasks are getting marked as FAILED

2017-01-24 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837346#comment-15837346
 ] 

Devaraj K commented on SPARK-19354:
---

Thanks [~uncleGen] for the comment. Here the error that occurred during the 
kill process masks the original reason, namely that the task has been killed; I 
think we need to retain the real reason instead of the masked one. Also, when 
we see a failed task we may suspect something went wrong and start diagnosing 
the cause, only to find that it happened during the kill. It also shows wrong 
metrics in the 'Aggregated Metrics by Executor' section.
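
A minimal, self-contained sketch of the direction being discussed (names are 
illustrative, not the actual Executor/TaskRunner code): if the task was asked 
to be killed, an exception raised while it is being torn down is reported as a 
kill rather than a failure, so the scheduler records KILLED with the real 
reason.

{code}
object KilledTaskSketch {
  @volatile var taskKilled = false

  def runTask(): Unit = {
    // stands in for the task body; the interrupt from the kill surfaces as an
    // exception, e.g. ClosedByInterruptException wrapped in an IOException
    throw new java.io.IOException("Failed on local exception: ClosedByInterruptException")
  }

  def reportStatus(state: String, reason: String): Unit =
    println(s"task ended with state=$state reason=$reason")

  def main(args: Array[String]): Unit = {
    taskKilled = true  // the driver asked us to kill this speculative attempt
    try {
      runTask()
    } catch {
      case _: Throwable if taskKilled =>
        // keep the real reason instead of the masking I/O error
        reportStatus("KILLED", "killed intentionally")
      case t: Throwable =>
        reportStatus("FAILED", t.toString)
    }
  }
}
{code}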

> Killed tasks are getting marked as FAILED
> -
>
> Key: SPARK-19354
> URL: https://issues.apache.org/jira/browse/SPARK-19354
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Reporter: Devaraj K
>Priority: Minor
>
> When we enable speculation, we can see there are multiple attempts running 
> for the same task when the first task progress is slow. If any of the task 
> attempt succeeds then the other attempts will be killed, during killing the 
> attempts those attempts are getting marked as failed due to the below error. 
> We need to handle this error and mark the attempt as KILLED instead of FAILED.
> ||93  ||214   ||1 (speculative)   ||FAILED||ANY   ||1 / 
> xx.xx.xx.x2
> stdout
> stderr
> ||2017/01/24 10:30:44 ||0.2 s ||0.0 B / 0 ||8.0 KB / 400  
> ||java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> node2/xx.xx.xx.x2; destination host is: node1:9000; 
> +details||
> {code:xml}
> 17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in 
> stage 1.0 (TID 214)
> 17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm 
> version is 1
> 17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 
> 214)
> java.io.IOException: Failed on local exception: 
> java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
> "stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>   at com.sun.proxy.$Proxy17.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy18.create(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
>   at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
>   at 
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
>   at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
>   at org.apache.spark.scheduler.Task.run(Task.scala:114)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
>   at 
> 

[jira] [Created] (SPARK-19354) Killed tasks are getting marked as FAILED

2017-01-24 Thread Devaraj K (JIRA)
Devaraj K created SPARK-19354:
-

 Summary: Killed tasks are getting marked as FAILED
 Key: SPARK-19354
 URL: https://issues.apache.org/jira/browse/SPARK-19354
 Project: Spark
  Issue Type: Bug
  Components: Scheduler, Spark Core
Reporter: Devaraj K
Priority: Minor


When we enable speculation, we can see multiple attempts running for the same 
task when the first attempt's progress is slow. If any of the task attempts 
succeeds, the other attempts are killed; while they are being killed, those 
attempts get marked as FAILED due to the below error. We need to handle this 
error and mark the attempt as KILLED instead of FAILED.

||93||214||1 (speculative)||FAILED||ANY||1 / xx.xx.xx.x2 stdout stderr||2017/01/24 10:30:44||0.2 s||0.0 B / 0||8.0 KB / 400||java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: node2/xx.xx.xx.x2; destination host is: node1:9000; +details||

{code:xml}
17/01/23 23:54:32 INFO Executor: Executor is trying to kill task 93.1 in stage 
1.0 (TID 214)
17/01/23 23:54:32 INFO FileOutputCommitter: File Output Committer Algorithm 
version is 1
17/01/23 23:54:32 ERROR Executor: Exception in task 93.1 in stage 1.0 (TID 214)
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"stobdtserver3/10.224.54.70"; destination host is: "stobdtserver2":9000; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1479)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy17.create(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.create(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1648)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:804)
at 
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1133)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1124)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
at org.apache.spark.scheduler.Task.run(Task.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:659)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at 

[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-12-29 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786878#comment-15786878
 ] 

Devaraj K commented on SPARK-15359:
---

Thanks [~yu2003w] for verifying this PR. I forgot to mention that it depends on 
SPARK-15288 [https://github.com/apache/spark/pull/13072] for handling the 
UncaughtExceptions, sorry for that. Can you verify this PR together with the 
SPARK-15288 fix?

> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during 
> the successful registration but if the mesosDriver.run() returns 
> DRIVER_ABORTED status after the successful register then there is no action 
> for the status and the thread will be terminated. 
> I think we need to throw the exception and shutdown the dispatcher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-12-28 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783821#comment-15783821
 ] 

Devaraj K commented on SPARK-15359:
---

[~yu2003w], it seems you are also facing the same issue I mentioned in the 
description. I have already created a PR for it; could you try with the 
available PR and let me know your feedback?

> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during 
> the successful registration but if the mesosDriver.run() returns 
> DRIVER_ABORTED status after the successful register then there is no action 
> for the status and the thread will be terminated. 
> I think we need to throw the exception and shutdown the dispatcher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15665) spark-submit --kill and --status are not working

2016-05-31 Thread Devaraj K (JIRA)
Devaraj K created SPARK-15665:
-

 Summary: spark-submit --kill and --status are not working 
 Key: SPARK-15665
 URL: https://issues.apache.org/jira/browse/SPARK-15665
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Devaraj K


{code:xml}
[devaraj@server2 spark-master]$ ./bin/spark-submit --kill 
driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
Exception in thread "main" java.lang.IllegalArgumentException: Missing 
application resource.
at 
org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
at org.apache.spark.launcher.Main.main(Main.java:86)
{code}


{code:xml}
[devaraj@server2 spark-master]$ ./bin/spark-submit --status 
driver-20160531171222-  --master spark://xx.xx.xx.xx:6066
Exception in thread "main" java.lang.IllegalArgumentException: Missing 
application resource.
at 
org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
at 
org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
at org.apache.spark.launcher.Main.main(Main.java:86)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15560) Queued/Supervise drivers waiting for retry drivers disappear for kill command in Mesos mode

2016-05-26 Thread Devaraj K (JIRA)
Devaraj K created SPARK-15560:
-

 Summary: Queued/Supervise drivers waiting for retry drivers 
disappear for kill command in Mesos mode
 Key: SPARK-15560
 URL: https://issues.apache.org/jira/browse/SPARK-15560
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Reporter: Devaraj K
Priority: Minor


When we issue a kill command for drivers that are in the 'Queued Drivers' or 
'Supervise drivers waiting for retry' state, the drivers disappear from the 
Mesos dispatcher web UI.

I think they should be moved to 'Finished Drivers' and listed there instead of 
disappearing completely.
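
A minimal, self-contained sketch of the proposed behaviour (the collections are 
illustrative, not the dispatcher's real state): a killed queued driver is moved 
to the finished list rather than silently dropped, so it stays visible in the 
web UI.

{code}
import scala.collection.mutable

object KillQueuedDriverSketch {
  final case class DriverInfo(id: String, status: String)

  val queuedDrivers   = mutable.LinkedHashMap[String, DriverInfo]()
  val finishedDrivers = mutable.ListBuffer[DriverInfo]()

  def kill(id: String): Boolean = queuedDrivers.remove(id) match {
    case Some(d) =>
      // keep the driver around with a terminal status instead of dropping it
      finishedDrivers += d.copy(status = "KILLED")
      true
    case None => false
  }

  def main(args: Array[String]): Unit = {
    queuedDrivers += "driver-20160526-0001" -> DriverInfo("driver-20160526-0001", "QUEUED")
    kill("driver-20160526-0001")
    println(finishedDrivers)  // ListBuffer(DriverInfo(driver-20160526-0001,KILLED))
  }
}
{code}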



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15555) Driver with --supervise option cannot be killed in Mesos mode

2016-05-26 Thread Devaraj K (JIRA)
Devaraj K created SPARK-1:
-

 Summary: Driver with --supervise option cannot be killed in Mesos 
mode
 Key: SPARK-1
 URL: https://issues.apache.org/jira/browse/SPARK-1
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Mesos
Reporter: Devaraj K


When we have a launched driver that was submitted with the --supervise option 
and we try to kill it using the 'spark-submit --kill' command, the Mesos 
dispatcher adds it back to the 'Supervise drivers waiting for retry' section and 
restarts it again and again. I don't see any way to kill the supervised drivers.

I think the driver should not be re-launched after a kill request.
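
A minimal, self-contained sketch of that behaviour (the data structures are 
illustrative, not the real MesosClusterScheduler state): before re-queuing a 
supervised driver for retry, check whether a kill was requested for it; a 
killed driver must not be relaunched.

{code}
object SupervisedKillSketch {
  val killRequested = scala.collection.mutable.Set[String]()

  def requeueForRetry(id: String): Unit = println(s"re-queuing $id for retry")
  def markFinished(id: String): Unit    = println(s"marking $id as finished")

  // called whenever a driver terminates, whether it crashed or was killed
  def onDriverTerminated(id: String, supervise: Boolean): Unit = {
    if (supervise && !killRequested.contains(id)) {
      requeueForRetry(id)  // normal --supervise behaviour after a crash
    } else {
      markFinished(id)     // honour the kill request instead of restarting
    }
  }

  def main(args: Array[String]): Unit = {
    killRequested += "driver-20160526-0002"
    onDriverTerminated("driver-20160526-0002", supervise = true)  // marked as finished
  }
}
{code}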




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-05-17 Thread Devaraj K (JIRA)
Devaraj K created SPARK-15359:
-

 Summary: Mesos dispatcher should handle DRIVER_ABORTED status from 
mesosDriver.run()
 Key: SPARK-15359
 URL: https://issues.apache.org/jira/browse/SPARK-15359
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Mesos
Reporter: Devaraj K
Priority: Minor


The Mesos dispatcher handles the DRIVER_ABORTED status from mesosDriver.run() 
during the successful registration, but if mesosDriver.run() returns 
DRIVER_ABORTED after the successful registration, there is no action for that 
status and the thread simply terminates.

I think we need to throw an exception and shut down the dispatcher.
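
A minimal, self-contained sketch of the proposed handling (the Status values 
mirror the Mesos enum, and the stub below stands in for the real 
org.apache.mesos driver): check what run() returns after registration and fail 
fast on DRIVER_ABORTED instead of letting the scheduler thread exit quietly.

{code}
object DriverAbortedSketch {
  sealed trait Status
  case object DRIVER_STOPPED extends Status
  case object DRIVER_ABORTED extends Status

  // stand-in for mesosDriver.run(), which blocks until the driver terminates
  def runDriver(): Status = DRIVER_ABORTED

  def main(args: Array[String]): Unit = {
    runDriver() match {
      case DRIVER_ABORTED =>
        // surface the failure so the dispatcher shuts down instead of keeping
        // a dead scheduler thread around
        throw new IllegalStateException("Mesos driver aborted after registration")
      case other =>
        println(s"Mesos driver exited with status $other")
    }
  }
}
{code}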




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15288) Mesos dispatcher should handle gracefully when any thread gets UncaughtException

2016-05-12 Thread Devaraj K (JIRA)
Devaraj K created SPARK-15288:
-

 Summary: Mesos dispatcher should handle gracefully when any thread 
gets UncaughtException
 Key: SPARK-15288
 URL: https://issues.apache.org/jira/browse/SPARK-15288
 Project: Spark
  Issue Type: Improvement
  Components: Deploy, Mesos
Reporter: Devaraj K
Priority: Minor


If any thread of the Mesos dispatcher gets an UncaughtException, that thread 
terminates and the dispatcher process keeps running without functioning 
properly.

I think we need to handle the UncaughtException and shut down the Mesos 
dispatcher.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275978#comment-15275978
 ] 

Devaraj K commented on SPARK-15142:
---

bq. Can you include the dispatcher logs?

I have attached the dispatcher logs, but I don't see anything useful in those.

bq. Does restarting the dispatcher fix the problem?
Yes, it works fine after restarting the dispatcher.

I suspect the dispatcher is losing its connection with the Mesos master after 
the Mesos master restart and stops receiving resource offers. I think the 
dispatcher needs to re-register with the Mesos master on connection loss. I 
will try creating a PR to fix this issue.
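
A minimal, self-contained sketch of the re-registration idea (the real 
scheduler implements org.apache.mesos.Scheduler; the callback trait and restart 
hook below are illustrative): react to the disconnected callback by restarting 
the driver instead of waiting forever for offers that will never arrive.

{code}
object ReRegisterSketch {
  // stand-in for the subset of the Mesos Scheduler callbacks relevant here
  trait SchedulerCallbacks {
    def disconnected(): Unit
  }

  class DispatcherScheduler(restartDriver: () => Unit) extends SchedulerCallbacks {
    override def disconnected(): Unit = {
      println("Lost connection to the Mesos master; re-registering")
      restartDriver()  // e.g. stop the current driver and start a new one
    }
  }

  def main(args: Array[String]): Unit = {
    val scheduler = new DispatcherScheduler(() => println("driver restarted"))
    scheduler.disconnected()  // simulate the master restart
  }
}
{code}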

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> While Spark Mesos dispatcher running if the Mesos master gets restarted then 
> Spark Mesos dispatcher will keep running and queues up all the submitted 
> applications and will not launch them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-08 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-15142:
--
Attachment: 
spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> While Spark Mesos dispatcher running if the Mesos master gets restarted then 
> Spark Mesos dispatcher will keep running and queues up all the submitted 
> applications and will not launch them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-05 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272056#comment-15272056
 ] 

Devaraj K commented on SPARK-15142:
---

It does not run the queued applications after the Mesos master comes back up; 
they stay in 'Queued Drivers' forever, and newly submitted applications also 
just add up in the queue. The only way I see here is to restart the Spark Mesos 
dispatcher so that the newly submitted applications get launched.

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> While Spark Mesos dispatcher running if the Mesos master gets restarted then 
> Spark Mesos dispatcher will keep running and queues up all the submitted 
> applications and will not launch them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2016-05-04 Thread Devaraj K (JIRA)
Devaraj K created SPARK-15142:
-

 Summary: Spark Mesos dispatcher becomes unusable when the Mesos 
master restarts
 Key: SPARK-15142
 URL: https://issues.apache.org/jira/browse/SPARK-15142
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Mesos
Reporter: Devaraj K
Priority: Minor


While the Spark Mesos dispatcher is running, if the Mesos master gets 
restarted, the dispatcher keeps running, queues up all the submitted 
applications, and never launches them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10713) SPARK_DIST_CLASSPATH ignored on Mesos executors

2016-05-04 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271875#comment-15271875
 ] 

Devaraj K commented on SPARK-10713:
---

bq. However, on Mesos, SPARK_DIST_CLASSPATH is missing from executors and jar 
is not in the classpath. It is present on YARN. Am I missing something? Do you 
see different behavior?
In my case, I see that the jars/paths provided in SPARK_DIST_CLASSPATH are 
included in the executors' classpath as well as in the driver's classpath.

> SPARK_DIST_CLASSPATH ignored on Mesos executors
> ---
>
> Key: SPARK-10713
> URL: https://issues.apache.org/jira/browse/SPARK-10713
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Affects Versions: 1.5.0
>Reporter: Dara Adib
>Priority: Minor
>
> If I set the environment variable SPARK_DIST_CLASSPATH, the jars are included 
> on the driver, but not on Mesos executors. Docs: 
> https://spark.apache.org/docs/latest/hadoop-provided.html
> I see SPARK_DIST_CLASSPATH mentioned in these files:
> launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
> project/SparkBuild.scala
> yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
> But not the Mesos executor (or should it be included by the launcher 
> library?):
> spark/core/src/main/scala/org/apache/spark/executor/Executor.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10713) SPARK_DIST_CLASSPATH ignored on Mesos executors

2016-05-04 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270330#comment-15270330
 ] 

Devaraj K commented on SPARK-10713:
---

[~daradib], I tried to reproduce it, but it seems SPARK_DIST_CLASSPATH is 
included for the driver as well as on the Mesos executors. Do you still see the 
issue?

> SPARK_DIST_CLASSPATH ignored on Mesos executors
> ---
>
> Key: SPARK-10713
> URL: https://issues.apache.org/jira/browse/SPARK-10713
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Affects Versions: 1.5.0
>Reporter: Dara Adib
>Priority: Minor
>
> If I set the environment variable SPARK_DIST_CLASSPATH, the jars are included 
> on the driver, but not on Mesos executors. Docs: 
> https://spark.apache.org/docs/latest/hadoop-provided.html
> I see SPARK_DIST_CLASSPATH mentioned in these files:
> launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java
> project/SparkBuild.scala
> yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
> But not the Mesos executor (or should it be included by the launcher 
> library?):
> spark/core/src/main/scala/org/apache/spark/executor/Executor.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13532) Spark yarn executor container fails if yarn.nodemanager.local-dirs starts with file://

2016-05-03 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268670#comment-15268670
 ] 

Devaraj K commented on SPARK-13532:
---

[~apivovarov], I tried to reproduce this issue, but it seems to work fine for a 
'yarn.nodemanager.local-dirs' value with the file:// prefix and also for a 
'spark.local.dir' value with the file:// prefix. The NodeManager container 
launch failure could also be due to some other reason. It would be great if you 
could provide more information for reproducing this issue.

> Spark yarn executor container fails if yarn.nodemanager.local-dirs starts 
> with file://
> --
>
> Key: SPARK-13532
> URL: https://issues.apache.org/jira/browse/SPARK-13532
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: Alexander Pivovarov
>Priority: Minor
>
> Spark yarn executor container fails if yarn.nodemanager.local-dirs starts 
> with file://
> {code}
>
>  yarn.nodemanager.local-dirs
>  file:///data01/yarn/nm,file:///data02/yarn/nm
>
> {code}
> other application, e.g. Hadoop MR and Hive work normally
> Spark works only if yarn.nodemanager.local-dirs does not have file:// prefix
> e.g.
> {code}
> /data01/yarn/nm,/data02/yarn/nm
> {code}
> to reproduce the issue
> open spark-shell
> run
> {code}
> $ spark-shell
> > sc.parallelize(1 to 10).count
> {code}
> stack trace in spark-shell is
> {code}
> scala> sc.parallelize(1 to 10).count
> 16/02/28 08:50:37 INFO spark.SparkContext: Starting job: count at :28
> 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Got job 0 (count at 
> :28) with 2 output partitions
> 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 
> (count at :28)
> 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Missing parents: List()
> 16/02/28 08:50:37 INFO scheduler.DAGScheduler: Submitting ResultStage 0 
> (ParallelCollectionRDD[0] at parallelize at :28), which has no 
> missing parents
> 16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0 stored as 
> values in memory (estimated size 1096.0 B, free 1096.0 B)
> 16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0_piece0 stored 
> as bytes in memory (estimated size 804.0 B, free 1900.0 B)
> 16/02/28 08:50:38 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in 
> memory on 10.101.124.13:39374 (size: 804.0 B, free: 511.5 MB)
> 16/02/28 08:50:38 INFO spark.SparkContext: Created broadcast 0 from broadcast 
> at DAGScheduler.scala:1006
> 16/02/28 08:50:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks 
> from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :28)
> 16/02/28 08:50:38 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
> 16/02/28 08:50:39 INFO spark.ExecutorAllocationManager: Requesting 1 new 
> executor because tasks are backlogged (new desired total will be 1)
> 16/02/28 08:50:40 INFO spark.ExecutorAllocationManager: Requesting 1 new 
> executor because tasks are backlogged (new desired total will be 2)
> 16/02/28 08:50:42 INFO cluster.YarnClientSchedulerBackend: Registered 
> executor NettyRpcEndpointRef(null) (ip-10-101-124-14:34681) with ID 1
> 16/02/28 08:50:42 INFO spark.ExecutorAllocationManager: New executor 1 has 
> registered (new total is 1)
> 16/02/28 08:50:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
> 0.0 (TID 0, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes)
> 16/02/28 08:50:42 INFO storage.BlockManagerMasterEndpoint: Registering block 
> manager ip-10-101-124-14:58315 with 3.8 GB RAM, BlockManagerId(1, 
> ip-10-101-124-14, 58315)
> 16/02/28 08:50:53 INFO cluster.YarnClientSchedulerBackend: Disabling executor 
> 1.
> 16/02/28 08:50:53 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)
> 16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Trying to remove 
> executor 1 from BlockManagerMaster.
> 16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Removing block 
> manager BlockManagerId(1, ip-10-101-124-14, 58315)
> 16/02/28 08:50:53 INFO storage.BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> 16/02/28 08:50:53 ERROR cluster.YarnScheduler: Lost executor 1 on 
> ip-10-101-124-14: Container marked as failed: 
> container_1456648448960_0003_01_02 on host: ip-10-101-124-14. Exit 
> status: 1. Diagnostics: Exception from container-launch.
> Container id: container_1456648448960_0003_01_02
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>   at 
> 

[jira] [Commented] (SPARK-15067) YARN executors are launched with fixed perm gen size

2016-05-03 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268659#comment-15268659
 ] 

Devaraj K commented on SPARK-15067:
---

[~renatojdk], are you planning to create a PR for this? If not, please let me 
know and I can provide one. Thanks.

> YARN executors are launched with fixed perm gen size
> 
>
> Key: SPARK-15067
> URL: https://issues.apache.org/jira/browse/SPARK-15067
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0, 1.6.1
>Reporter: Renato Falchi Brandão
>
> It is impossible to change the executors' max perm gen size using the property 
> "spark.executor.extraJavaOptions" when you are running on YARN.
> When the JVM option "-XX:MaxPermSize" is set through the property 
> "spark.executor.extraJavaOptions", Spark puts it properly in the shell command 
> that will start the JVM container but, at the end of the command, it sets this 
> option again using a fixed value of 256m, as you can see in the log I've 
> extracted:
> 2016-04-30 17:20:12 INFO  ExecutorRunnable:58 -
> ===
> YARN executor launch context:
>   env:
> CLASSPATH -> 
> {{PWD}}{{PWD}}/__spark__.jar$HADOOP_CONF_DIR/usr/hdp/current/hadoop-client/*/usr/hdp/current/hadoop-client/lib/*/usr/hdp/current/hadoop-hdfs-client/*/usr/hdp/current/hadoop-hdfs-client/lib/*/usr/hdp/current/hadoop-yarn-client/*/usr/hdp/current/hadoop-yarn-client/lib/*/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/*:/usr/hdp/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/mr-framework/hadoop/share/hadoop/common/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/*:/usr/hdp/mr-framework/hadoop/share/hadoop/yarn/lib/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/*:/usr/hdp/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/current/hadoop/lib/hadoop-lzo-0.6.0.jar:/etc/hadoop/conf/secure
> SPARK_LOG_URL_STDERR -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stderr?start=-4096
> SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1456962126505_329993
> SPARK_YARN_CACHE_FILES_FILE_SIZES -> 191719054,166
> SPARK_USER -> h_loadbd
> SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PUBLIC
> SPARK_YARN_MODE -> true
> SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1459806496093,1459808508343
> SPARK_LOG_URL_STDOUT -> 
> http://x0668sl.x.br:8042/node/containerlogs/container_1456962126505_329993_01_02/h_loadbd/stdout?start=-4096
> SPARK_YARN_CACHE_FILES -> 
> hdfs://x/user/datalab/hdp/spark/lib/spark-assembly-1.6.0.2.3.4.1-10-hadoop2.7.1.2.3.4.1-10.jar#__spark__.jar,hdfs://tlvcluster/user/datalab/hdp/spark/conf/hive-site.xml#hive-site.xml
>   command:
> {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms6144m 
> -Xmx6144m '-XX:+PrintGCDetails' '-XX:MaxPermSize=1024M' 
> '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir={{PWD}}/tmp 
> '-Dspark.akka.timeout=30' '-Dspark.driver.port=62875' 
> '-Dspark.rpc.askTimeout=30' '-Dspark.rpc.lookupTimeout=30' 
> -Dspark.yarn.app.container.log.dir=<LOG_DIR> -XX:MaxPermSize=256m 
> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url 
> spark://CoarseGrainedScheduler@10.125.81.42:62875 --executor-id 1 --hostname 
> x0668sl.x.br --cores 1 --app-id application_1456962126505_329993 
> --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> 
> <LOG_DIR>/stderr
> Analyzing the code, it is possible to see that all the options set in the 
> property "spark.executor.extraJavaOptions" are enclosed, one by one, in 
> single quotes (ExecutorRunnable.scala:151) before the launcher decides 
> whether a default value has to be provided for the option 
> "-XX:MaxPermSize" (ExecutorRunnable.scala:202).
> This decision is taken by examining all the options set and looking for a 
> string starting with the value "-XX:MaxPermSize" (CommandBuilderUtils.java:328). 
> If that value is not found, the default value is set.
> An option wrapped in single quotes no longer starts with that string, so it 
> will never be found and a default value will always be provided.
> A possible solution is to change the source code of CommandBuilderUtils.java at 
> line 328:
> From -> if (arg.startsWith("-XX:MaxPermSize="))
> To -> if (arg.indexOf("-XX:MaxPermSize=") > -1)
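
For illustration, a minimal standalone Scala sketch (not the actual Spark launcher code) of the difference: the prefix check misses the single-quoted option, while a substring check finds it, so the user's value would be kept instead of the appended 256m default.

{code}
object MaxPermSizeCheckDemo {
  // Mirrors the reported behaviour: the options arrive already wrapped in single quotes.
  val quotedOpts = Seq("'-XX:+PrintGCDetails'", "'-XX:MaxPermSize=1024M'")

  def hasMaxPermSizePrefix(opts: Seq[String]): Boolean =
    opts.exists(_.startsWith("-XX:MaxPermSize="))   // misses "'-XX:MaxPermSize=1024M'"

  def hasMaxPermSizeAnywhere(opts: Seq[String]): Boolean =
    opts.exists(_.contains("-XX:MaxPermSize="))     // matches even when quoted

  def main(args: Array[String]): Unit = {
    println(hasMaxPermSizePrefix(quotedOpts))   // false -> default -XX:MaxPermSize=256m gets appended
    println(hasMaxPermSizeAnywhere(quotedOpts)) // true  -> the user's 1024M value would be honored
  }
}
{code}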



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable

2016-03-29 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216090#comment-15216090
 ] 

Devaraj K commented on SPARK-13063:
---

Thanks [~tgraves] for confirmation, I will create PR for this.

> Make the SPARK YARN STAGING DIR as configurable
> ---
>
> Key: SPARK-13063
> URL: https://issues.apache.org/jira/browse/SPARK-13063
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Devaraj K
>Priority: Minor
>
> SPARK YARN STAGING DIR is based on the file system home directory. If the 
> user wants to change this staging directory because the same one is used by 
> other applications, there is no provision for the user to specify a different 
> directory for the staging dir.
> {code:xml}
>  val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14234) Executor crashes for TaskRunner thread interruption

2016-03-29 Thread Devaraj K (JIRA)
Devaraj K created SPARK-14234:
-

 Summary: Executor crashes for TaskRunner thread interruption
 Key: SPARK-14234
 URL: https://issues.apache.org/jira/browse/SPARK-14234
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Devaraj K


If the TaskRunner thread gets interrupted while running, due to a task kill or 
any other reason, the interrupted thread tries to update the task status as part 
of the exception handling and fails with the exceptions shown below. This 
happens from the statusUpdate calls in all of these catch blocks; the 
corresponding exception for each catch case follows.

{code:title=Executor.scala|borderStyle=solid}

case _: TaskKilledException | _: InterruptedException if task.killed =>
 ..

case cDE: CommitDeniedException =>
 ..

case t: Throwable =>
 ..
{code}
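
The failure comes from the JDK's interruptible-channel handling rather than from Spark itself. A minimal standalone Scala reproduction (plain JDK only, not Spark code) of why the status update write fails once the thread's interrupt status is set:

{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, ClosedByInterruptException}

object InterruptedWriteDemo {
  def main(args: Array[String]): Unit = {
    val worker = new Thread(() => {
      val ch = Channels.newChannel(new ByteArrayOutputStream())
      try {
        Thread.sleep(1000) // stand-in for the running task
      } catch {
        case _: InterruptedException =>
          Thread.currentThread().interrupt() // the interrupt status is set again here
          // Any write on an interruptible NIO channel now fails, just like the
          // serialization inside statusUpdate in the stack traces below.
          try ch.write(ByteBuffer.wrap("status".getBytes))
          catch { case e: ClosedByInterruptException => println(s"status write failed: $e") }
      }
    })
    worker.start()
    worker.interrupt() // simulates the task kill
    worker.join()
  }
}
{code}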

{code:xml}
16/03/29 17:32:33 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
thread Thread[Executor task launch worker-2,5,main]
java.lang.Error: java.nio.channels.ClosedByInterruptException
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at 
java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
at 
org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
at 
org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1204)
at 
org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at 
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at 
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at 
org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:513)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:135)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
... 2 more
{code}

{code:xml}
16/03/29 08:00:29 ERROR SparkUncaughtExceptionHandler: Uncaught exception in 
thread Thread[Executor task launch worker-4,5,main]
java.lang.Error: java.nio.channels.ClosedByInterruptException
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1151)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at 
java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
..
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
at 

[jira] [Resolved] (SPARK-13965) TaskSetManager should kill the other running task attempts if any one task attempt succeeds for the same task

2016-03-28 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved SPARK-13965.
---
Resolution: Duplicate

> TaskSetManager should kill the other running task attempts if any one task 
> attempt succeeds for the same task
> -
>
> Key: SPARK-13965
> URL: https://issues.apache.org/jira/browse/SPARK-13965
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Devaraj K
>
> When we enable speculation, the Driver launches additional attempts for the 
> same task if it finds that an attempt is progressing slowly compared to the 
> average progress of the other tasks, so there can be multiple task attempts in 
> the running state.
> At present, if any one attempt succeeds the others keep running (possibly until 
> job completion) and their slots cannot be given to other tasks in the same 
> stage or in later stages. 
> We can kill these running task attempts when any other attempt succeeds and 
> give the slots to other tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13965) TaskSetManager should kill the other running task attempts if any one task attempt succeeds for the same task

2016-03-19 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-13965:
--
Summary: TaskSetManager should kill the other running task attempts if any 
one task attempt succeeds for the same task  (was: Driver should kill the other 
running task attempts if any one task attempt succeeds for the same task)

> TaskSetManager should kill the other running task attempts if any one task 
> attempt succeeds for the same task
> -
>
> Key: SPARK-13965
> URL: https://issues.apache.org/jira/browse/SPARK-13965
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1
>Reporter: Devaraj K
>
> When we enable speculation, the Driver launches additional attempts for the 
> same task if it finds that an attempt is progressing slowly compared to the 
> average progress of the other tasks, so there can be multiple task attempts in 
> the running state.
> At present, if any one attempt succeeds the others keep running (possibly until 
> job completion) and their slots cannot be given to other tasks in the same 
> stage or in later stages. 
> We can kill these running task attempts when any other attempt succeeds and 
> give the slots to other tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13965) Driver should kill the other running task attempts if any one task attempt succeeds for the same task

2016-03-19 Thread Devaraj K (JIRA)
Devaraj K created SPARK-13965:
-

 Summary: Driver should kill the other running task attempts if any 
one task attempt succeeds for the same task
 Key: SPARK-13965
 URL: https://issues.apache.org/jira/browse/SPARK-13965
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.1
Reporter: Devaraj K


When we enable speculation, the Driver launches additional attempts for the 
same task if it finds that an attempt is progressing slowly compared to the 
average progress of the other tasks, so there can be multiple task attempts in 
the running state.

At present, if any one attempt succeeds the others keep running (possibly until 
job completion) and their slots cannot be given to other tasks in the same 
stage or in later stages. 

We can kill these running task attempts when any other attempt succeeds and 
give the slots to other tasks.
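
As a toy illustration of the scheduling idea (plain JVM concurrency, not Spark's TaskSetManager code): launch two attempts of the same work and cancel whichever is still running as soon as one finishes, freeing its slot.

{code}
import java.util.concurrent.{Executors, TimeUnit}
import scala.util.Random

object SpeculativeAttemptsDemo {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2)
    val attempts = (1 to 2).map { id =>
      pool.submit(new Runnable {
        def run(): Unit = {
          Thread.sleep(500 + Random.nextInt(1500)) // simulated task work
          println(s"attempt $id finished")
        }
      })
    }
    // Wait for the first attempt to complete, then cancel the slower one.
    while (!attempts.exists(_.isDone)) Thread.sleep(50)
    attempts.filterNot(_.isDone).foreach(_.cancel(true)) // interrupt the still-running attempt
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
{code}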




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13621) TestExecutor.scala needs to be moved to test package

2016-03-02 Thread Devaraj K (JIRA)
Devaraj K created SPARK-13621:
-

 Summary: TestExecutor.scala needs to be moved to test package
 Key: SPARK-13621
 URL: https://issues.apache.org/jira/browse/SPARK-13621
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0, 2.0.0
Reporter: Devaraj K
Priority: Minor


TestExecutor.scala is in the package 
core\src\main\scala\org\apache\spark\deploy\client\ and is used only by test 
classes. It needs to be moved to the test package, i.e. 
core\src\test\scala\org\apache\spark\deploy\client\, since it exists only for 
tests.

Also, core\src\main\scala\org\apache\spark\deploy\client\TestClient.scala is 
not used anywhere but is still present in src; I think it can be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13619) Jobs page UI shows wrong number of failed tasks

2016-03-02 Thread Devaraj K (JIRA)
Devaraj K created SPARK-13619:
-

 Summary: Jobs page UI shows wrong number of failed tasks
 Key: SPARK-13619
 URL: https://issues.apache.org/jira/browse/SPARK-13619
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.6.0, 2.0.0
Reporter: Devaraj K
Priority: Minor


In the Master and History Server UIs, the Jobs page shows the wrong number of 
failed tasks.

http://X.X.X.X:8080/history/app-20160303024135-0001/jobs/

h3. Completed Jobs (1)

||Job Id||  Description||   Submitted|| Duration||  Stages: 
Succeeded/Total||   Tasks (for all stages): Succeeded/Total||
|0 | saveAsTextFile at PipeLineTest.java:52| 2016/03/03 02:41:36 |  16 s |  
2/2 | 100/100 (2 failed)|
\\
\\
When we go to the Job details page, we see a different number of failed tasks, 
and that is the correct number based on the tasks that actually failed.
http://x.x.x.x:8080/history/app-20160303024135-0001/jobs/job/?id=0

h3. Completed Stages (2)

||Stage Id||Description||   Submitted|| Duration||  Tasks: 
Succeeded/Total||Input|| Output||Shuffle Read||  Shuffle Write||
|1| saveAsTextFile at PipeLineTest.java:52 +details|2016/03/03 02:41:51|
1 s|50/50 (6 failed)|   |7.6 KB|371.0 KB|   |
|0| mapToPair at PipeLineTest.java:29 +details|2016/03/03 02:41:36| 15 s|   
50/50|  1521.7 MB|  |   |   371.0 KB|




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-29 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173224#comment-15173224
 ] 

Devaraj K commented on SPARK-13117:
---

I think we can start the Jetty server with "0.0.0.0" as the default value and 
have it pick up the configured SPARK_PUBLIC_DNS value when it is set. This 
would change only the Web UI and would not impact anything else. The changes 
would be something like the following,

{code:xml}
protected val publicHostName = 
Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse("0.0.0.0")
{code}

{code:xml}
try {
  serverInfo = Some(startJettyServer(publicHostName, port, sslOptions, 
handlers, conf, name))
  logInfo("Started %s at http://%s:%d".format(className, publicHostName, 
boundPort))
} catch {
{code}

[~srowen], any suggestions? Thanks


> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>Priority: Minor
>
> When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP 
> except the WebUI.  The WebUI should use the SPARK_LOCAL_IP not always use 
> 0.0.0.0
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-29 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173218#comment-15173218
 ] 

Devaraj K commented on SPARK-13117:
---

[~jaypanicker], the proposed PR does the same, but there is a problem when 
accessing the web UI using localhost or 127.0.0.1. Please have a look at this 
comment: https://github.com/apache/spark/pull/11133#issuecomment-188937933.

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>Priority: Minor
>
> When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP 
> except the WebUI.  The WebUI should use the SPARK_LOCAL_IP not always use 
> 0.0.0.0
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13445) Selecting "data" with window function does not work unless aliased (using PARTITION BY)

2016-02-25 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated SPARK-13445:
--
Summary: Selecting "data" with window function does not work unless aliased 
(using PARTITION BY)  (was: Seleting "data" with window function does not work 
unless aliased (using PARTITION BY))

> Selecting "data" with window function does not work unless aliased (using 
> PARTITION BY)
> ---
>
> Key: SPARK-13445
> URL: https://issues.apache.org/jira/browse/SPARK-13445
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Reynold Xin
>Priority: Critical
>
> The code does not throw an exception if "data" is aliased.  Maybe this is a 
> reserved word or aliases are just required when using PARTITION BY?
> {code}
> sql("""
>   SELECT 
> data as the_data,
> row_number() over (partition BY data.type) AS foo
>   FROM event_record_sample
> """)
> {code}
> However, this code throws an error:
> {code}
> sql("""
>   SELECT 
> data,
> row_number() over (partition BY data.type) AS foo
>   FROM event_record_sample
> """)
> {code}
> {code}
> org.apache.spark.sql.AnalysisException: resolved attribute(s) type#15246 
> missing from 
> data#15107,par_cat#15112,schemaMajorVersion#15110,source#15108,recordId#15103,features#15106,eventType#15105,ts#15104L,schemaMinorVersion#15111,issues#15109
>  in operator !Project [data#15107,type#15246];
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:104)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.DataFrame.(DataFrame.scala:133)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:816)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces

2016-02-10 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141340#comment-15141340
 ] 

Devaraj K commented on SPARK-13061:
---


You have mentioned the id as 'Spark shell' in the issue description; I don't 
think that is what the API returns.

{code:xml}
http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/
 returns:
[ {
"id" : "Spark shell",
"name" : "Spark shell",
{code}

If we request the HTTP server with a URL that contains spaces, then the 
browser or any other client encodes the URL (replacing the spaces with %20) 
before sending the request to the HTTP server. This is what happens when you 
pass the id as "Spark shell".

{code:xml}/applications/[app-id]/jobs/[job-id]  Details for the given job{code}

I think you need to pass the job-id, not the name, if you want to get details 
for a specific job.
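
For illustration of the encoding step (plain JDK behaviour, not Spark code; the path below is just an example):

{code}
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object EncodeDemo {
  def main(args: Array[String]): Unit = {
    val appId = "Spark shell"
    // URLEncoder produces form encoding ('+' for spaces), so map it to the
    // path-segment form a browser would send.
    val encoded = URLEncoder.encode(appId, StandardCharsets.UTF_8.name()).replace("+", "%20")
    println(s"/api/v1/applications/$encoded/jobs") // /api/v1/applications/Spark%20shell/jobs
  }
}
{code}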

> Error in spark rest api application info for job names contains spaces
> --
>
> Key: SPARK-13061
> URL: https://issues.apache.org/jira/browse/SPARK-13061
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Avihoo Mamka
>Priority: Trivial
>  Labels: rest_api, spark
>
> When accessing the Spark REST API with an application id to get job-specific 
> status, a job name containing whitespace is encoded to '%20' and therefore the 
> REST API returns `no such app`.
> For example:
> http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/
>  returns:
> [ {
>   "id" : "Spark shell",
>   "name" : "Spark shell",
>   "attempts" : [ {
> "startTime" : "2016-01-28T09:20:58.526GMT",
> "endTime" : "1969-12-31T23:59:59.999GMT",
> "sparkUser" : "",
> "completed" : false
>   } ]
> } ]
> and then when accessing:
> http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark
>  shell/
> the result returned is:
> unknown app: Spark%20shell



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces

2016-02-10 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140601#comment-15140601
 ] 

Devaraj K commented on SPARK-13061:
---

[~avihoo], I am trying to reproduce the issue, but I see the id format as below 
when I submit to a standalone master and a YARN cluster.
{code:xml}
[ {
  "id" : "app-20160210202703-",
  "name" : "Spark shell",
{code}

{code:xml}
}, {
  "id" : "application_1452616238844_0041",
  "name" : "Spark Pi",
{code}

Can you give more details, like how you are able to get the id as 'Spark shell' 
and which process's REST API is returning it like this?

> Error in spark rest api application info for job names contains spaces
> --
>
> Key: SPARK-13061
> URL: https://issues.apache.org/jira/browse/SPARK-13061
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2
>Reporter: Avihoo Mamka
>Priority: Trivial
>  Labels: rest_api, spark
>
> When accessing the Spark REST API with an application id to get job-specific 
> status, a job name containing whitespace is encoded to '%20' and therefore the 
> REST API returns `no such app`.
> For example:
> http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/
>  returns:
> [ {
>   "id" : "Spark shell",
>   "name" : "Spark shell",
>   "attempts" : [ {
> "startTime" : "2016-01-28T09:20:58.526GMT",
> "endTime" : "1969-12-31T23:59:59.999GMT",
> "sparkUser" : "",
> "completed" : false
>   } ]
> } ]
> and then when accessing:
> http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark
>  shell/
> the result returned is:
> unknown app: Spark%20shell



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138251#comment-15138251
 ] 

Devaraj K commented on SPARK-13117:
---

Thanks [~jjordan].

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP 
> except the WebUI.  The WebUI should use the SPARK_LOCAL_IP not always use 
> 0.0.0.0
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137320#comment-15137320
 ] 

Devaraj K commented on SPARK-13117:
---

Thanks [~jjordan] for reporting. I would like to provide PR if you are not 
planning to work on this. Please let me know, Thanks.

> WebUI should use the local ip not 0.0.0.0
> -
>
> Key: SPARK-13117
> URL: https://issues.apache.org/jira/browse/SPARK-13117
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.6.0
>Reporter: Jeremiah Jordan
>
> When SPARK_LOCAL_IP is set everything seems to correctly bind and use that IP 
> except the WebUI.  The WebUI should use the SPARK_LOCAL_IP not always use 
> 0.0.0.0
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/WebUI.scala#L137



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13016) Replace example code in mllib-dimensionality-reduction.md using include_example

2016-02-08 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137310#comment-15137310
 ] 

Devaraj K commented on SPARK-13016:
---

I am working on this, I will provide PR for this. Thanks

> Replace example code in mllib-dimensionality-reduction.md using 
> include_example
> ---
>
> Key: SPARK-13016
> URL: https://issues.apache.org/jira/browse/SPARK-13016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> See examples in other finished sub-JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable

2016-01-28 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121427#comment-15121427
 ] 

Devaraj K commented on SPARK-13063:
---

Yes, the sub-directories under the staging dir are specific to each 
application. If 'spark.yarn.preserve.staging.files' is enabled then the 
application will not remove them after completion, and when we want to analyze 
them it becomes difficult to identify which directories belong to Spark 
applications. Also, if we want to delete the preserved staging files (for 
example after analysis), we have to be careful not to remove the staging files 
of other types of apps. I think it would be good to have a logically separate, 
configurable staging directory for all Spark apps to avoid these difficulties 
when they are mixed with other types of apps. I can provide a PR to make it 
configurable while keeping the current behavior when the user does not specify 
a staging directory.
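
A minimal sketch of what a configurable staging dir could look like (the key name spark.yarn.stagingDir is only an assumption here, used for illustration):

{code}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf

object StagingDirSketch {
  // Falls back to the current behaviour (the filesystem home directory) when
  // the user does not configure a staging directory.
  def stagingDirPath(conf: SparkConf, remoteFs: FileSystem, stagingDir: String): Path =
    conf.getOption("spark.yarn.stagingDir")
      .map(base => new Path(base, stagingDir))
      .getOrElse(new Path(remoteFs.getHomeDirectory, stagingDir))
}
{code}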

> Make the SPARK YARN STAGING DIR as configurable
> ---
>
> Key: SPARK-13063
> URL: https://issues.apache.org/jira/browse/SPARK-13063
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Devaraj K
>Priority: Minor
>
> SPARK YARN STAGING DIR is based on the file system home directory. If the 
> user wants to change this staging directory because the same one is used by 
> other applications, there is no provision for the user to specify a different 
> directory for the staging dir.
> {code:xml}
>  val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable

2016-01-28 Thread Devaraj K (JIRA)
Devaraj K created SPARK-13063:
-

 Summary: Make the SPARK YARN STAGING DIR as configurable
 Key: SPARK-13063
 URL: https://issues.apache.org/jira/browse/SPARK-13063
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Devaraj K


SPARK YARN STAGING DIR is based on the file system home directory. If the user 
wants to change this staging directory because the same one is used by other 
applications, there is no provision for the user to specify a different 
directory for the staging dir.

{code:xml}
 val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable

2016-01-28 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121239#comment-15121239
 ] 

Devaraj K commented on SPARK-13063:
---

If this default location is already used by other apps (like MapReduce), users 
may need to change it to some other directory. Presently there is no provision 
to change the value; we can make it configurable and keep the current staging 
dir as the default value.

> Make the SPARK YARN STAGING DIR as configurable
> ---
>
> Key: SPARK-13063
> URL: https://issues.apache.org/jira/browse/SPARK-13063
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Devaraj K
>Priority: Minor
>
> SPARK YARN STAGING DIR is based on the file system home directory. If the 
> user wants to change this staging directory because the same one is used by 
> other applications, there is no provision for the user to specify a different 
> directory for the staging dir.
> {code:xml}
>  val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13063) Make the SPARK YARN STAGING DIR as configurable

2016-01-28 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121523#comment-15121523
 ] 

Devaraj K commented on SPARK-13063:
---

I don't think it is good to store the files for user apps in a location decided 
by Spark without giving the user a provision to change it. I don't want to 
compare with MR, but MR does provide the 'yarn.app.mapreduce.am.staging-dir' 
config for the staging dir. I agree that it adds another configuration if we 
move ahead. Thanks.

> Make the SPARK YARN STAGING DIR as configurable
> ---
>
> Key: SPARK-13063
> URL: https://issues.apache.org/jira/browse/SPARK-13063
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: Devaraj K
>Priority: Minor
>
> SPARK YARN STAGING DIR is based on the file system home directory. If the 
> user wants to change this staging directory because the same one is used by 
> other applications, there is no provision for the user to specify a different 
> directory for the staging dir.
> {code:xml}
>  val stagingDirPath = new Path(remoteFs.getHomeDirectory, stagingDir)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1253) Need to load mapred-site.xml for reading mapreduce.application.classpath

2015-12-23 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069890#comment-15069890
 ] 

Devaraj K commented on SPARK-1253:
--

Thanks [~srowen] for letting me know. I tried to reproduce it, but it seems it 
is loading the mapred-site.xml file and picking up the configurations updated 
in that file. I think it is not a problem anymore and it can be closed unless 
there are other expectations here.

> Need to load mapred-site.xml for reading mapreduce.application.classpath
> 
>
> Key: SPARK-1253
> URL: https://issues.apache.org/jira/browse/SPARK-1253
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Sandy Pérez González
>
> In Spark on YARN, we use mapreduce.application.classpath to discover the 
> location of the MR jars so that we can add them executor classpaths.
> This config comes from mapred-site.xml, which we aren't loading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1253) Need to load mapred-site.xml for reading mapreduce.application.classpath

2015-12-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067962#comment-15067962
 ] 

Devaraj K commented on SPARK-1253:
--

[~sandyr], do you want to work on this issue? I would like to provide a PR for 
this if you don't mind. Thanks.

> Need to load mapred-site.xml for reading mapreduce.application.classpath
> 
>
> Key: SPARK-1253
> URL: https://issues.apache.org/jira/browse/SPARK-1253
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Sandy Pérez González
>Assignee: Sandy Ryza
>
> In Spark on YARN, we use mapreduce.application.classpath to discover the 
> location of the MR jars so that we can add them executor classpaths.
> This config comes from mapred-site.xml, which we aren't loading.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2015-12-15 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059602#comment-15059602
 ] 

Devaraj K commented on SPARK-12316:
---

[~carlmartin], can you provide the stack trace for this error? Thanks.

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>
> When the application ends, the AM cleans the staging dir.
> But if the driver triggers a delegation token update, it cannot find the right 
> token file and then endlessly calls the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM

2015-12-02 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035721#comment-15035721
 ] 

Devaraj K commented on SPARK-4117:
--

Thanks [~tgraves] for the pointer. I will provide a PR for this to avoid 
unnecessary retries when it gets an ApplicationAttemptNotFoundException.

> Spark on Yarn handle AM being told command from RM
> --
>
> Key: SPARK-4117
> URL: https://issues.apache.org/jira/browse/SPARK-4117
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> In the allocateResponse from the RM, it can send commands that the AM should 
> follow, for instance AM_RESYNC and AM_SHUTDOWN.  We should add support for 
> those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-8276) NPE in YarnClientSchedulerBackend.stop

2015-11-25 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved SPARK-8276.
--
Resolution: Duplicate

Resolving it as duplicate of SPARK-8754, Please reopen it if you disagree.

> NPE in YarnClientSchedulerBackend.stop
> --
>
> Key: SPARK-8276
> URL: https://issues.apache.org/jira/browse/SPARK-8276
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: Steve Loughran
>Priority: Minor
>
> NPE seen in {{YarnClientSchedulerBackend.stop()}} after problem setting up 
> job; on the line {{monitorThread.interrupt()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM

2015-11-25 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026526#comment-15026526
 ] 

Devaraj K commented on SPARK-4117:
--

For *ApplicationAttemptNotFoundException*, there is no explicit handling; it is 
handled as an uncaught exception and then the ApplicationMaster shuts down. 
Below is the code that does this.

{code:title=ApplicationMaster.scala|borderStyle=solid}
  if (isClusterMode) {
runDriver(securityMgr)
  } else {
runExecutorLauncher(securityMgr)
  }
} catch {
  case e: Exception =>
// catch everything else if not specifically handled
logError("Uncaught exception: ", e)
finish(FinalApplicationStatus.FAILED,
  ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
  "Uncaught exception: " + e)
}
{code}


Please find the exception stack trace from the log,

{code:xml}
15/11/25 19:51:24 WARN cluster.YarnClusterScheduler: Initial job has not 
accepted any resources; check your cluster UI to ensure that workers are 
registered and have sufficient resources
15/11/25 19:51:30 ERROR yarn.ApplicationMaster: Uncaught exception: 
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: 
Application attempt appattempt_1448461020570_0001_01 doesn't exist in 
ApplicationMasterService cache.
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
……….
at com.sun.proxy.$Proxy16.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
at 
org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:231)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:292)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:336)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:185)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
 Application attempt appattempt_1448461020570_0001_01 doesn't exist in 
ApplicationMasterService cache.
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391)
…..
at com.sun.proxy.$Proxy15.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 21 more
15/11/25 19:51:30 INFO yarn.ApplicationMaster: Final app status: FAILED, 
exitCode: 10, (reason: Uncaught exception: 
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: 
Application attempt appattempt_1448461020570_0001_01 doesn't exist in 
ApplicationMasterService cache.
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:391)
………..
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
)
15/11/25 19:51:30 INFO spark.SparkContext: Invoking stop() from shutdown hook
{code}


For *ApplicationMasterNotRegisteredException*, 
YarnAllocator.allocateResources() invokes amClient.allocate(), and 
AMRMClientImpl.allocate() internally handles the 
ApplicationMasterNotRegisteredException by resyncing with the ResourceManager. 
Please find below the piece of code that handles this.

{code:title=YarnAllocator.scala|borderStyle=solid}
  def allocateResources(): Unit = synchronized {
updateResourceRequests()

val progressIndicator = 0.1f
// Poll the ResourceManager. This doubles as a heartbeat if there are no 
pending container
// requests.
val allocateResponse = amClient.allocate(progressIndicator)
{code}



{code:title=org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl|borderStyle=solid}

  

[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM

2015-11-23 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023785#comment-15023785
 ] 

Devaraj K commented on SPARK-4117:
--

In YARN, the AM commands (AM_RESYNC, AM_SHUTDOWN) are now deprecated and the RM 
no longer sends them to the AM. Instead of sending these commands, the RM 
throws ApplicationMasterNotRegisteredException to make the AM resync with the 
ResourceManager, and ApplicationAttemptNotFoundException to let the AM shut 
itself down. I see that both scenarios are already handled and the 
ApplicationMaster does the same. 

[~tgraves], do you have any other expectations from this Jira, or can we close 
this ticket?


> Spark on Yarn handle AM being told command from RM
> --
>
> Key: SPARK-4117
> URL: https://issues.apache.org/jira/browse/SPARK-4117
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> In the allocateResponse from the RM, it can send commands that the AM should 
> follow, for instance AM_RESYNC and AM_SHUTDOWN.  We should add support for 
> those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8276) NPE in YarnClientSchedulerBackend.stop

2015-11-23 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023790#comment-15023790
 ] 

Devaraj K commented on SPARK-8276:
--

I think it is a duplicate of SPARK-8754.

> NPE in YarnClientSchedulerBackend.stop
> --
>
> Key: SPARK-8276
> URL: https://issues.apache.org/jira/browse/SPARK-8276
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.0
>Reporter: Steve Loughran
>Priority: Minor
>
> NPE seen in {{YarnClientSchedulerBackend.stop()}} after problem setting up 
> job; on the line {{monitorThread.interrupt()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8754) YarnClientSchedulerBackend doesn't stop gracefully in failure conditions

2015-07-01 Thread Devaraj K (JIRA)
Devaraj K created SPARK-8754:


 Summary: YarnClientSchedulerBackend doesn't stop gracefully in 
failure conditions
 Key: SPARK-8754
 URL: https://issues.apache.org/jira/browse/SPARK-8754
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.4.0
Reporter: Devaraj K
Priority: Minor


{code:xml}
java.lang.NullPointerException
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:151)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:421)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1447)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1651)
at org.apache.spark.SparkContext.init(SparkContext.scala:572)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:621)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

If the application has FINISHED/FAILED/KILLED, or failed to launch the 
application master, monitorThread is never initialized, but 
monitorThread.interrupt() is invoked as part of stop() without any check. This 
causes the NPE and also prevents the client from stopping.
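
A small standalone sketch of the kind of guard that avoids the NPE (illustration only, not the actual YarnClientSchedulerBackend code):

{code}
object MonitorThreadGuardDemo extends App {
  class Backend {
    @volatile private var monitorThread: Thread = _ // may never be initialized

    def start(launchSucceeded: Boolean): Unit =
      if (launchSucceeded) {
        monitorThread = new Thread(() => ()) // placeholder for the monitor loop
        monitorThread.start()
      }

    def stop(): Unit = {
      // The null check is what prevents the NPE when start() failed early.
      if (monitorThread != null) monitorThread.interrupt()
      println("client stopped cleanly")
    }
  }

  val backend = new Backend
  backend.start(launchSucceeded = false) // simulate a failed application launch
  backend.stop()                         // no NPE thanks to the guard
}
{code}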




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1936) Add apache header and remove author tags

2014-05-27 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009352#comment-14009352
 ] 

Devaraj K commented on SPARK-1936:
--

Pull Request added: https://github.com/apache/spark/pull/890

 Add apache header and remove author tags
 

 Key: SPARK-1936
 URL: https://issues.apache.org/jira/browse/SPARK-1936
 Project: Spark
  Issue Type: Bug
Reporter: Devaraj K
Priority: Minor

 The files below don't have the Apache header and contain author tags.
 {code:xml}
 spark\repl\src\main\scala\org\apache\spark\repl\SparkExprTyper.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkILoop.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkILoopInit.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkIMain.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkImports.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkJLineCompletion.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkJLineReader.scala
 spark\repl\src\main\scala\org\apache\spark\repl\SparkMemberHandlers.scala
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)