Re: Issue running Spark 1.4 on Yarn

2015-06-23 Thread Matt Kapilevich
Hi Kevin, I never did. I checked for free space in the root partition and don't
think that was the issue. Now that 1.4 is officially out, I'll probably give
it another shot.
On Jun 22, 2015 4:28 PM, Kevin Markey kevin.mar...@oracle.com wrote:

  Matt:  Did you ever resolve this issue?  When running on a cluster or
 pseudocluster with too little space for /tmp or /var files, we've seen this
 sort of behavior.  There's enough memory, and enough HDFS space, but
 there's insufficient space on one or more nodes for other temporary files
 as logs grow and don't get cleared or deleted.  Depends on your
 configuration.  Often restarting will temporarily fix things, but for
 shorter and shorter periods of time until nothing works.

 The fix is to expand the space available for logs, prune them, set up a cron
 job to prune them periodically, and/or tighten the limits on log size.
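 A minimal sketch of that kind of pruning job. The directory path is an
 assumption (check yarn.nodemanager.log-dirs for the real location on your
 nodes); the demo below uses a temp directory so it can run anywhere:

```shell
# Demo of the command a log-pruning cron job could run. LOG_DIR here is a
# throwaway demo directory; on a real node it would be the NodeManager
# container-log dir (e.g. /var/log/hadoop-yarn/containers -- an assumption).
LOG_DIR=$(mktemp -d)
touch -d '10 days ago' "$LOG_DIR/old-container.log"   # simulate a stale log
touch "$LOG_DIR/fresh-container.log"                  # and a recent one

# Delete anything older than 7 days -- the same command a crontab entry
# like "0 3 * * * find <log-dir> -type f -mtime +7 -delete" would run.
find "$LOG_DIR" -type f -mtime +7 -delete

ls "$LOG_DIR"
```

 After the run, only the fresh log remains; tune `-mtime` to whatever
 retention your disk budget allows.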

 Kevin

 On 06/09/2015 04:15 PM, Matt Kapilevich wrote:

 I've tried running a Hadoop app pointing to the same queue. Same thing
 now, the job doesn't get accepted. I've cleared out the queue and killed
 all the pending jobs, the queue is still unusable.

  It seems like an issue with YARN, but it's specifically Spark that
 leaves the queue in this state. I've run a Hadoop job in a for loop 10x,
 while specifying the queue explicitly, just to double-check.

 On Tue, Jun 9, 2015 at 4:45 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 From the RM scheduler, I see 3 applications currently stuck in the
 root.thequeue queue.

  Used Resources: memory:0, vCores:0
 Num Active Applications: 0
 Num Pending Applications: 3
 Min Resources: memory:0, vCores:0
 Max Resources: memory:6655, vCores:4
 Steady Fair Share: memory:1664, vCores:0
 Instantaneous Fair Share: memory:6655, vCores:0

 On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Yes! If I either specify a different queue or don't specify a queue at
 all, it works.

 On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Does it work if you don't specify a queue?

 On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Hi Marcelo,

  Yes, restarting YARN fixes this behavior and it again works the
 first few times. The only thing that's consistent is that once Spark job
 submissions stop working, it's broken for good.

 On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

  Apologies, I see you already posted everything from the RM logs
 that mention your stuck app.

  Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

  I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything
 like this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

  On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich 
 matve...@gmail.com wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine
 even now - this problem is specific to Spark.


  That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id.
 That may give you some insight into what's happening or not.

 --
 Marcelo




  --
 Marcelo





  --
 Marcelo








Re: Issue running Spark 1.4 on Yarn

2015-06-11 Thread matvey14
No, this is just a random queue name I picked when submitting the job; there's
no specific configuration for it. I'm not logged in, so I don't have the
default fair scheduler configuration in front of me, but I don't think
that's the problem. The cluster is completely idle and there aren't any jobs
being executed, so it can't be hitting any of the fair scheduler's limits.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-running-Spark-1-4-on-Yarn-tp23211p23274.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue running Spark 1.4 on Yarn

2015-06-11 Thread nsalian
Hello,

Since the other queues are fine, I reckon there may be a limit on the max
apps or memory for this queue in particular.
I don't suspect fair scheduler limits in general either, but on this queue we
may be hitting a maximum.

Could you try to get the configs for the queue? That should provide more
context.
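One way to pull those settings out, sketched against a made-up
fair-scheduler.xml (the queue name and limits below are invented for
illustration; substitute the path to your actual allocation file, or hit the
RM's /ws/v1/cluster/scheduler REST endpoint for the live values):

```shell
# Write a sample fair-scheduler.xml -- purely illustrative contents.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<allocations>
  <queue name="thequeue">
    <maxRunningApps>5</maxRunningApps>
    <maxAMShare>0.1</maxAMShare>
  </queue>
</allocations>
EOF

# Show the configuration block for the queue in question.
grep -A 3 'name="thequeue"' "$CONF"
```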

Thank you.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-running-Spark-1-4-on-Yarn-tp23211p23285.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue running Spark 1.4 on Yarn

2015-06-10 Thread matvey14
Hi nsalian,

For some reason the rest of this thread isn't showing up here. The
NodeManager isn't busy. I'll copy/paste, the details are in there.



I've tried running a Hadoop app pointing to the same queue. Same thing now,
the job doesn't get accepted. I've cleared out the queue and killed all the
pending jobs, the queue is still unusable.

It seems like an issue with YARN, but it's specifically Spark that leaves
the queue in this state. I've run a Hadoop job in a for loop 10x, while
specifying the queue explicitly, just to double-check.

On Tue, Jun 9, 2015 at 4:45 PM, Matt Kapilevich matve...@gmail.com wrote:
From the RM scheduler, I see 3 applications currently stuck in the
root.thequeue queue.

Used Resources: memory:0, vCores:0
Num Active Applications: 0
Num Pending Applications: 3
Min Resources: memory:0, vCores:0
Max Resources: memory:6655, vCores:4
Steady Fair Share: memory:1664, vCores:0
Instantaneous Fair Share: memory:6655, vCores:0

On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich matve...@gmail.com wrote:
Yes! If I either specify a different queue or don't specify a queue at all,
it works.

On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com wrote:
Does it work if you don't specify a queue?

On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com wrote:
Hi Marcelo,

Yes, restarting YARN fixes this behavior and it again works the first few
times. The only thing that's consistent is that once Spark job submissions
stop working, it's broken for good.

On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com wrote:
Apologies, I see you already posted everything from the RM logs that mention
your stuck app.

Have you tried restarting the YARN cluster to see if that changes anything?
Does it go back to the first few tries work behaviour?

I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
this.


On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote:
 Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now -
this problem is specific to Spark.

That doesn't necessarily mean anything. Spark apps have different resource
requirements than Hadoop apps.
 
Check your RM logs for any line that mentions your Spark app id. That may
give you some insight into what's happening or not.

-- 
Marcelo



-- 
Marcelo




-- 
Marcelo







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-running-Spark-1-4-on-Yarn-tp23211p23258.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue running Spark 1.4 on Yarn

2015-06-10 Thread nsalian
Hi,

Thanks for the added information. Helps add more context.

Is that specific queue different from the others?

FairScheduler.xml should have the information needed, or a separate
allocations.xml if you have one.

Something of this format:

<allocations>
  <queue name="sample_queue">
    <minResources>10000 mb,0vcores</minResources>
    <maxResources>90000 mb,0vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <maxAMShare>0.1</maxAMShare>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <queue name="sample_sub_queue">
      <aclSubmitApps>charlie</aclSubmitApps>
      <minResources>5000 mb,0vcores</minResources>
    </queue>
  </queue>
</allocations>

Thank you.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-running-Spark-1-4-on-Yarn-tp23211p23261.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
If your application is stuck in that state, it generally means your cluster
doesn't have enough resources to start it.

In the RM logs you can see how many vcores / memory the application is
asking for, and then you can check your RM configuration to see if that's
currently available on any single NM.
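For example, filtering the RM log for one application id. The log lines below
are seeded from the two quoted in this thread so the snippet is
self-contained; on a real cluster you'd grep the actual ResourceManager log
file (its path varies by install):

```shell
# RM_LOG is a temp file standing in for the real ResourceManager log.
RM_LOG=$(mktemp)
cat > "$RM_LOG" <<'EOF'
2015-06-08 14:49:57,166 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1433789077942_0004_01 to scheduler from user: root
2015-06-08 14:49:57,166 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1433789077942_0004_01 State change from SUBMITTED to SCHEDULED
2015-06-08 14:50:00,000 INFO some.other.Component: unrelated line
EOF

# Grep on the cluster-timestamp/app-number pair, which appears in both the
# application_* and appattempt_* forms of the id.
grep '1433789077942_0004' "$RM_LOG"
```

Here the app attempt reaches SCHEDULED and then nothing further -- consistent
with the scheduler never finding a node with the requested resources.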

On Tue, Jun 9, 2015 at 7:56 AM, Matt Kapilevich matve...@gmail.com wrote:

 Hi all,

 I'm manually building Spark from source against 1.4 branch and submitting
 the job against Yarn. I am seeing very strange behavior. The first 2 or 3
 times I submit the job, it runs fine, computes Pi, and exits. The next time
 I run it, it gets stuck in the ACCEPTED state.

 I'm kicking off a job using yarn-client mode like this:

 ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
 yarn-client --num-executors 3 --driver-memory 4g --executor-memory
 2g --executor-cores 1 --queue thequeue
 examples/target/scala-2.10/spark-examples*.jar 10

 Here's what ResourceManager shows:[image: Yarn ResourceManager UI]

 In Yarn ResourceManager logs, all I'm seeing is this:

 2015-06-08 14:49:57,166 INFO
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
 Added Application Attempt appattempt_1433789077942_0004_01 to scheduler
 from user: root
 2015-06-08 14:49:57,166 INFO
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
 appattempt_1433789077942_0004_01 State change from SUBMITTED to
 SCHEDULED

 There's nothing in the NodeManager logs (though it's up and running); the
 job isn't getting that far.

 It seems to me that there's an issue somewhere in the Spark 1.4 and YARN
 integration. Hadoop runs without any issues; I've run the below multiple
 times.

 yarn jar
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.2.jar pi
 16 100

 For reference, I'm compiling the source against 1.4 branch, and running it
 on a single-node cluster with CDH5.4 and Hadoop 2.6, distributed mode. I am
 using the following to compile: mvn -Phadoop-2.6 -Dhadoop.version=2.6.0
 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

 Any help appreciated.

 Thanks,
 -Matt




-- 
Marcelo


Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Hi Marcelo,

Thanks. I think something more subtle is happening.

I'm running a single-node cluster, so there's only 1 NM. When I executed
the exact same job the 4th time, the cluster was idle, and there was
nothing else being executed. RM currently reports that I have 6.5GB of
memory and 4 cpus available. However, the job is still stuck in the
ACCEPTED state a day later. Like I mentioned earlier, I'm able to execute
Hadoop jobs fine even now - this problem is specific to Spark.

Thanks,
-Matt

On Tue, Jun 9, 2015 at 12:32 PM, Marcelo Vanzin van...@cloudera.com wrote:

 If your application is stuck in that state, it generally means your
 cluster doesn't have enough resources to start it.

 In the RM logs you can see how many vcores / memory the application is
 asking for, and then you can check your RM configuration to see if that's
 currently available on any single NM.

 On Tue, Jun 9, 2015 at 7:56 AM, Matt Kapilevich matve...@gmail.com
 wrote:

 Hi all,

 I'm manually building Spark from source against 1.4 branch and submitting
 the job against Yarn. I am seeing very strange behavior. The first 2 or 3
 times I submit the job, it runs fine, computes Pi, and exits. The next time
 I run it, it gets stuck in the ACCEPTED state.

 I'm kicking off a job using yarn-client mode like this:

 ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master
 yarn-client --num-executors 3 --driver-memory 4g --executor-memory
 2g --executor-cores 1 --queue thequeue
 examples/target/scala-2.10/spark-examples*.jar 10

 Here's what ResourceManager shows:[image: Yarn ResourceManager UI]

 In Yarn ResourceManager logs, all I'm seeing is this:

 2015-06-08 14:49:57,166 INFO
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
 Added Application Attempt appattempt_1433789077942_0004_01 to scheduler
 from user: root
 2015-06-08 14:49:57,166 INFO
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
 appattempt_1433789077942_0004_01 State change from SUBMITTED to
 SCHEDULED

 There's nothing in the NodeManager logs (though it's up and running); the
 job isn't getting that far.

 It seems to me that there's an issue somewhere in the Spark 1.4 and YARN
 integration. Hadoop runs without any issues; I've run the below multiple
 times.

 yarn jar
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.4.2.jar pi
 16 100

 For reference, I'm compiling the source against 1.4 branch, and running
 it on a single-node cluster with CDH5.4 and Hadoop 2.6, distributed mode. I
 am using the following to compile: mvn -Phadoop-2.6 -Dhadoop.version=2.6.0
 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

 Any help appreciated.

 Thanks,
 -Matt




 --
 Marcelo



Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Yes! If I either specify a different queue or don't specify a queue at all,
it works.

On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Does it work if you don't specify a queue?

 On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Hi Marcelo,

 Yes, restarting YARN fixes this behavior and it again works the first few
 times. The only thing that's consistent is that once Spark job submissions
 stop working, it's broken for good.

 On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Apologies, I see you already posted everything from the RM logs that
 mention your stuck app.

 Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

 I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
 this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even
 now - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That
 may give you some insight into what's happening or not.

 --
 Marcelo




 --
 Marcelo





 --
 Marcelo



Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
From the RM scheduler, I see 3 applications currently stuck in the
root.thequeue queue.

Used Resources: memory:0, vCores:0
Num Active Applications: 0
Num Pending Applications: 3
Min Resources: memory:0, vCores:0
Max Resources: memory:6655, vCores:4
Steady Fair Share: memory:1664, vCores:0
Instantaneous Fair Share: memory:6655, vCores:0

On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich matve...@gmail.com wrote:

 Yes! If I either specify a different queue or don't specify a queue at
 all, it works.

 On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Does it work if you don't specify a queue?

 On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Hi Marcelo,

 Yes, restarting YARN fixes this behavior and it again works the first
 few times. The only thing that's consistent is that once Spark job
 submissions stop working, it's broken for good.

 On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Apologies, I see you already posted everything from the RM logs that
 mention your stuck app.

 Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

 I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
 this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even
 now - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That
 may give you some insight into what's happening or not.

 --
 Marcelo




 --
 Marcelo





 --
 Marcelo





Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
Apologies, I see you already posted everything from the RM logs that
mention your stuck app.

Have you tried restarting the YARN cluster to see if that changes anything?
Does it go back to the "first few tries work" behaviour?

I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
this.


On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now
 - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different resource
 requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That may
 give you some insight into what's happening or not.

 --
 Marcelo




-- 
Marcelo


Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
Does it work if you don't specify a queue?

On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com wrote:

 Hi Marcelo,

 Yes, restarting YARN fixes this behavior and it again works the first few
 times. The only thing that's consistent is that once Spark job submissions
 stop working, it's broken for good.

 On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Apologies, I see you already posted everything from the RM logs that
 mention your stuck app.

 Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

 I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
 this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even
 now - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That
 may give you some insight into what's happening or not.

 --
 Marcelo




 --
 Marcelo





-- 
Marcelo


Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
Hi Marcelo,

Yes, restarting YARN fixes this behavior and it again works the first few
times. The only thing that's consistent is that once Spark job submissions
stop working, it's broken for good.

On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Apologies, I see you already posted everything from the RM logs that
 mention your stuck app.

 Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

 I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything like
 this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now
 - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That may
 give you some insight into what's happening or not.

 --
 Marcelo




 --
 Marcelo



Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Marcelo Vanzin
On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even now -
 this problem is specific to Spark.


That doesn't necessarily mean anything. Spark apps have different resource
requirements than Hadoop apps.

Check your RM logs for any line that mentions your Spark app id. That may
give you some insight into what's happening or not.

-- 
Marcelo


Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread nsalian
I see the other jobs SUCCEEDED without issues.

Could you snapshot the FairScheduler activity as well? 
My guess it, with the single core, it is reaching a NodeManager that is
still busy with other jobs and the job ends up in a waiting state.

Does the job eventually complete?

Could you potentially add another node to the cluster to see if my guess is
right? I just see one Active NM.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Issue-running-Spark-1-4-on-Yarn-tp23211p23236.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Issue running Spark 1.4 on Yarn

2015-06-09 Thread Matt Kapilevich
I've tried running a Hadoop app pointing to the same queue. Same thing now:
the job doesn't get accepted. I've cleared out the queue and killed all the
pending jobs, and the queue is still unusable.

It seems like an issue with YARN, but it's specifically Spark that leaves
the queue in this state. I've run a Hadoop job in a for loop 10x, while
specifying the queue explicitly, just to double-check.

On Tue, Jun 9, 2015 at 4:45 PM, Matt Kapilevich matve...@gmail.com wrote:

 From the RM scheduler, I see 3 applications currently stuck in the
 root.thequeue queue.

 Used Resources: memory:0, vCores:0
 Num Active Applications: 0
 Num Pending Applications: 3
 Min Resources: memory:0, vCores:0
 Max Resources: memory:6655, vCores:4
 Steady Fair Share: memory:1664, vCores:0
 Instantaneous Fair Share: memory:6655, vCores:0

 On Tue, Jun 9, 2015 at 4:30 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Yes! If I either specify a different queue or don't specify a queue at
 all, it works.

 On Tue, Jun 9, 2015 at 4:25 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Does it work if you don't specify a queue?

 On Tue, Jun 9, 2015 at 1:21 PM, Matt Kapilevich matve...@gmail.com
 wrote:

 Hi Marcelo,

 Yes, restarting YARN fixes this behavior and it again works the first
 few times. The only thing that's consistent is that once Spark job
 submissions stop working, it's broken for good.

 On Tue, Jun 9, 2015 at 4:12 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 Apologies, I see you already posted everything from the RM logs that
 mention your stuck app.

 Have you tried restarting the YARN cluster to see if that changes
 anything? Does it go back to the first few tries work behaviour?

 I run 1.4 on top of CDH 5.4 pretty often and haven't seen anything
 like this.


 On Tue, Jun 9, 2015 at 1:01 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

 On Tue, Jun 9, 2015 at 11:31 AM, Matt Kapilevich matve...@gmail.com
 wrote:

  Like I mentioned earlier, I'm able to execute Hadoop jobs fine even
 now - this problem is specific to Spark.


 That doesn't necessarily mean anything. Spark apps have different
 resource requirements than Hadoop apps.

 Check your RM logs for any line that mentions your Spark app id. That
 may give you some insight into what's happening or not.

 --
 Marcelo




 --
 Marcelo





 --
 Marcelo