Hi Patrick,

The fix you need is SPARK-6954: https://github.com/apache/spark/pull/5704. If possible, you may cherry-pick the following commit into your Spark deployment and it should resolve the issue:

https://github.com/apache/spark/commit/98ac39d2f5828fbdad8c9a4e563ad1169e3b9948

Note that this commit is only for the 1.3 branch. If you could upgrade to 1.4.0 then you do not need to apply that commit yourself.

-Andrew

2015-06-13 12:01 GMT-07:00 Patrick Woody <patrick.woo...@gmail.com>:

> Hey Sandy,
>
> I'll test it out on 1.4. Do you have a bug number or PR that I could
> reference as well?
>
> Thanks!
> -Pat
>
> Sent from my iPhone
>
> On Jun 13, 2015, at 11:38 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>
> Hi Patrick,
>
> I'm noticing that you're using Spark 1.3.1. We fixed a bug in dynamic
> allocation in 1.4 that permitted requesting negative numbers of executors.
> Any chance you'd be able to try with the newer version and see if the
> problem persists?
>
> -Sandy
>
> On Fri, Jun 12, 2015 at 7:42 PM, Patrick Woody <patrick.woo...@gmail.com>
> wrote:
>
>> Hey all,
>>
>> I've recently run into an issue where spark dynamicAllocation has asked
>> for -1 executors from YARN. Unfortunately, this raises an exception that
>> kills the executor-allocation thread and the application can't request more
>> resources.
>>
>> Has anyone seen this before? It is spurious and the application usually
>> works, but when this gets hit it becomes unusable when getting stuck at
>> minimum YARN resources.
>>
>> Stacktrace below.
>>
>> Thanks!
>> -Pat
>>
>> ERROR [2015-06-12 16:44:39,724] org.apache.spark.util.Utils: Uncaught exception in thread spark-dynamic-executor-allocation-0
>> ! java.lang.IllegalArgumentException: Attempted to request a negative number of executor(s) -1 from the cluster manager. Please specify a positive number!
>> ! at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:338) ~[spark-core_2.10-1.3.1.jar:1.
>> ! at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1137) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:294) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager.addOrCancelExecutorRequests(ExecutorAllocationManager.scala:263) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:230) ~[spark-core_2.10-1.3.1.j
>> ! at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply$mcV$sp(ExecutorAllocationManager.scala:189) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager$$anon$1$$anonfun$run$1.apply(ExecutorAllocationManager.scala:189) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1618) ~[spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at org.apache.spark.ExecutorAllocationManager$$anon$1.run(ExecutorAllocationManager.scala:189) [spark-core_2.10-1.3.1.jar:1.3.1]
>> ! at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_71]
>> ! at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) [na:1.7.0_71]
>> ! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) [na:1.7.0_71]
>> ! at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.7.0_71]
>> ! at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>> ! at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
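
For readers wondering why one bad request leaves the application stuck: the frames at the bottom of the trace show the allocation loop running as a periodic task on a `ScheduledThreadPoolExecutor`, and the JDK documents that a periodic task which throws has its subsequent executions suppressed. The sketch below reproduces that behaviour in isolation. It is not Spark code and not the SPARK-6954 patch. The class `AllocationLoopDemo`, the stand-in `requestTotalExecutors`, the `targets` array, and the clamp-at-zero option are all hypothetical, chosen only to illustrate the failure mode.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AllocationLoopDemo {

    // Hypothetical stand-in for the cluster-manager call; it rejects negative
    // totals with the same message that appears in the stack trace.
    static void requestTotalExecutors(int total) {
        if (total < 0) {
            throw new IllegalArgumentException(
                "Attempted to request a negative number of executor(s) " + total
                + " from the cluster manager. Please specify a positive number!");
        }
    }

    // Feeds one target per tick into requestTotalExecutors and returns how many
    // requests succeeded before the periodic task stopped or ran out of targets.
    static int countSuccessfulRequests(int[] targets, boolean clampAtZero) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger next = new AtomicInteger();
        AtomicInteger succeeded = new AtomicInteger();
        try {
            ScheduledFuture<?> loop = pool.scheduleAtFixedRate(() -> {
                int i = next.getAndIncrement();
                if (i >= targets.length) {
                    return; // nothing left to request
                }
                // Assumption for illustration only: clamp the target at zero.
                int total = clampAtZero ? Math.max(0, targets[i]) : targets[i];
                requestTotalExecutors(total); // an exception here ends the periodic task
                succeeded.incrementAndGet();
            }, 0, 5, TimeUnit.MILLISECONDS);

            // Wait until the loop has died or every target has been processed.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            while (!loop.isDone() && succeeded.get() < targets.length
                    && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            Thread.sleep(50); // extra time to confirm a dead loop does not resume
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return succeeded.get();
    }

    public static void main(String[] args) {
        int[] targets = {2, 1, -1, 3, 4};
        System.out.println("unclamped: " + countSuccessfulRequests(targets, false));
        System.out.println("clamped:   " + countSuccessfulRequests(targets, true));
    }
}
```

With the targets above, the unclamped loop stops at the `-1` request and never issues the later ones, which matches the "stuck at minimum YARN resources" symptom described in the thread. The clamped variant keeps running. Whether the actual fix clamps, validates earlier, or does something else is best checked in the linked pull request.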