Looks like the jar containing the EsHadoopIllegalArgumentException class wasn't on the classpath. Can you double-check?
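For example, here is a minimal sketch of one way to ship the connector jar with the application (the path, version, and app name below are placeholders, not taken from your setup; passing the jar with --jars on spark-submit has the same effect):

    // Minimal sketch, assuming the elasticsearch-spark connector jar sits at the
    // placeholder path below. SparkConf.setJars ships the listed jars with the
    // application (same as --jars on spark-submit), so classes such as
    // EsHadoopIllegalArgumentException can be resolved outside your fat jar.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("central-log-streaming")                         // placeholder name
      .setJars(Seq("/path/to/elasticsearch-spark_2.10-2.1.2.jar")) // adjust path/version

    val ssc = new StreamingContext(conf, Seconds(5))

Alternatively, building a fat jar that bundles the connector avoids having to ship it separately.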
Which Spark version are you using?

Cheers

On Thu, Jan 21, 2016 at 6:50 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> I'm running a Spark Streaming process and it stops after a while. It does
> some processing and inserts the results into ElasticSearch with its library.
> After a while the process fails.
>
> I have been checking the logs and I have seen this error:
>
> 2016-01-21 14:57:54,388 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.BlockManagerInfo - Added broadcast_2_piece0 in memory on ose11kafkaelk.novalocal:46913 (size: 6.0 KB, free: 530.3 MB)
> 2016-01-21 14:57:54,646 [task-result-getter-1] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 2.0 (TID 7) in 397 ms on ose12kafkaelk.novalocal (1/6)
> 2016-01-21 14:57:54,647 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 2.0 in stage 2.0 (TID 10) in 395 ms on ose12kafkaelk.novalocal (2/6)
> 2016-01-21 14:57:54,731 [task-result-getter-3] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 5.0 in stage 2.0 (TID 9) in 481 ms on ose11kafkaelk.novalocal (3/6)
> 2016-01-21 14:57:54,844 [task-result-getter-1] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 4.0 in stage 2.0 (TID 8) in 595 ms on ose10kafkaelk.novalocal (4/6)
> 2016-01-21 14:57:54,850 [task-result-getter-0] WARN org.apache.spark.ThrowableSerializationWrapper - Task exception could not be deserialized
> java.lang.ClassNotFoundException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>
> I don't know why I'm getting this error, because the class
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException is in the
> elasticsearch-hadoop library.
>
> After this error I get other errors and finally Spark ends.
> 2016-01-21 14:57:55,012 [JobScheduler] INFO org.apache.spark.streaming.scheduler.JobScheduler - Starting job streaming job 1453384640000 ms.0 from job set of time 1453384640000 ms
> 2016-01-21 14:57:55,012 [JobScheduler] ERROR org.apache.spark.streaming.scheduler.JobScheduler - Error running job streaming job 1453384635000 ms.0
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 13, ose11kafkaelk.novalocal): UnknownReason
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1282)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1281)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1281)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1507)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1469)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:1914)
>     at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:67)
>     at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:54)
>     at org.elasticsearch.spark.rdd.EsSpark$.saveJsonToEs(EsSpark.scala:90)
>     at org.elasticsearch.spark.package$SparkJsonRDDFunctions.saveJsonToEs(package.scala:44)
>     at produban.spark.CentralLog$$anonfun$createContext$1.apply(CentralLog.scala:56)
>     at produban.spark.CentralLog$$anonfun$createContext$1.apply(CentralLog.scala:33)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:631)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>     at scala.util.Try$.apply(Try.scala:161)
>     at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:207)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-01-21 14:57:55,015 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 13, ose11kafkaelk.novalocal): UnknownReason
> Driver stacktrace:
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 13, ose11kafkaelk.novalocal): UnknownReason
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1294)
>
> 2016-01-21 14:57:56,736 [JobGenerator] ERROR org.apache.spark.streaming.CheckpointWriter - Could not submit checkpoint task to the thread pool executor
> java.util.concurrent.RejectedExecutionException: Task org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@32a9f9db rejected from java.util.concurrent.ThreadPoolExecutor@28ac8892[Shutting down, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>     at org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253)
>     at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294)
>     at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184)
>     at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87)
>     at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 2016-01-21 14:58:05,090 [Thread-3] INFO org.apache.spark.streaming.CheckpointWriter - CheckpointWriter executor terminated ? false, waited for 10001 ms.
> 2016-01-21 14:58:05,092 [pool-19-thread-1] WARN org.apache.hadoop.ipc.Client - interrupted waiting to send rpc request to server
> java.lang.InterruptedException
>     at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:187)
>
> When I execute Spark in yarn-client mode, the same error happens but Spark
> doesn't stop.
> Although I don't know how to reproduce it, so I'm not 100% sure of this...
> Should I do something differently when I change from yarn-cluster to
> yarn-client? It seems that it doesn't have that class.