Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
I see Andrew, thanks for the explanation.

On Tue, Jul 29, 2014 at 5:29 AM, Andrew Lee wrote:
>
> I was thinking maybe we can suggest that the community enhance the Spark
> HistoryServer to capture the last failure exception from the container logs
> in the last failed stage?
>
This would be helpful. I personally like Yarn-Client mode, as all the running status can be checked directly from the console.

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
Hi Jianshi,

My understanding is 'No', based on how Spark is designed, even with your own log4j.properties in Spark's conf folder. In YARN mode, the Application Master runs inside the cluster, and all logs are part of the container logs, which are governed by a separate log4j.properties from the Hadoop and YARN environment. Spark can't override that unless it can place its own log4j ahead of YARN's in the classpath. So the only way is to log in to the Resource Manager and click on the job itself to read the container logs. (Other people) Please correct me if my understanding is wrong.

You may be wondering why you can't stream the logs to an external service (e.g. Flume, syslogd) with a different appender in log4j. I don't consider this a good practice since:
1. You need two infrastructures to operate the entire cluster.
2. You will need to open up firewall ports between the two services to transfer/stream logs.
3. Traffic is unpredictable: the YARN cluster may bring down the logging service/infra (DDoS) when someone accidentally changes the logging level from WARN to INFO, or worse, DEBUG.

I was thinking maybe we can suggest that the community enhance the Spark HistoryServer to capture the last failure exception from the container logs in the last failed stage? Not sure if this is a good idea since it may complicate the event model. I'm not sure if the Akka model can support this, or whether some other component in Spark could capture these exceptions, pass them back to the AM, and eventually store them somewhere for later troubleshooting. I'm not clear how this path is constructed without reading the source code, so I can't give a better answer.

AL

From: jianshi.hu...@gmail.com
Date: Mon, 28 Jul 2014 13:32:05 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
To: user@spark.apache.org

Hi Andrew,

Thanks for the reply, I figured out the cause of the issue. Some resource files were missing in JARs.
A class initialization depends on the resource files, so it got that exception. I appended the resource files explicitly to the --jars option and it worked fine.

The "Caused by..." messages were actually found in the YARN logs. I think it would be useful if I could see them from the console which runs spark-submit. Would that be possible?

Jianshi

On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee wrote:

Hi Jianshi,

Could you provide which HBase version you're using?

By the way, a quick sanity check on whether the Workers can access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it?

From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014 15:12:18 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
To: user@spark.apache.org

I nailed it down to a union operation, here's my code snippet:

    val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
      vertices.map { ve =>
        val (vertices, dsName) = ve
        val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
        val (_, rvalAsc, rvalType) = rval

        println(s"Table name: $dsName, Rval: $rval")
        println(vertices.toDebugString)

        vertices.map { v =>
          val rk = appendHash(boxId(v.id)).getBytes
          val cf = PROP_BYTES
          val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
          val value = Serializer.serialize(v.properties)

          ((new String(rk), new String(cf), new String(cq)),
           Externalizer(put(rk, cf, cq, value)))
        }
      }.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and they're transformed into KeyValues to be inserted in HBase, so I need to do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].

I cannot see what's wrong in my code.

Jianshi

On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang wrote:

I can successfully run my code in local mode using spark-submit (--master local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.
Any hints on what the problem is? Is it a closure serialization problem? How can I debug it? Your answers would be very helpful.

14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.r
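For illustration, the appender-based log streaming that the message above argues against would look roughly like this log4j.properties fragment. This is only a sketch: the syslog host is a placeholder, and the choice of log4j 1.2's built-in org.apache.log4j.net.SyslogAppender stands in for whatever collector (Flume, syslogd) one might use:

```properties
# Route all container logs to a remote syslog collector (hypothetical host).
log4j.rootLogger=WARN, syslog

log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
log4j.appender.syslog.SyslogHost=log-collector.example.com
log4j.appender.syslog.Facility=LOCAL1
log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
log4j.appender.syslog.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Flipping rootLogger from WARN to INFO or DEBUG in such a setup multiplies the traffic to the collector, which is the accidental-DDoS concern described in the message.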
Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
Hi Andrew,

Thanks for the reply, I figured out the cause of the issue. Some resource files were missing in JARs. A class initialization depends on the resource files, so it got that exception. I appended the resource files explicitly to the --jars option and it worked fine.

The "Caused by..." messages were actually found in the YARN logs. I think it would be useful if I could see them from the console which runs spark-submit. Would that be possible?

Jianshi

On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee wrote:
> Hi Jianshi,
>
> Could you provide which HBase version you're using?
>
> By the way, a quick sanity check on whether the Workers can access HBase?
>
> Were you able to manually write one record to HBase with the serialize
> function? Hardcode and test it?
>
> --
> From: jianshi.hu...@gmail.com
> Date: Fri, 25 Jul 2014 15:12:18 +0800
> Subject: Re: Need help, got java.lang.ExceptionInInitializerError in
> Yarn-Client/Cluster mode
> To: user@spark.apache.org
>
> I nailed it down to a union operation, here's my code snippet:
>
>     val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
>       vertices.map { ve =>
>         val (vertices, dsName) = ve
>         val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
>         val (_, rvalAsc, rvalType) = rval
>
>         println(s"Table name: $dsName, Rval: $rval")
>         println(vertices.toDebugString)
>
>         vertices.map { v =>
>           val rk = appendHash(boxId(v.id)).getBytes
>           val cf = PROP_BYTES
>           val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
>           val value = Serializer.serialize(v.properties)
>
>           ((new String(rk), new String(cf), new String(cq)),
>            Externalizer(put(rk, cf, cq, value)))
>         }
>       }.reduce(_.union(_)).sortByKey(numPartitions = 32)
>
> Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and
> they're transformed into KeyValues to be inserted in HBase, so I need to
> do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].
>
> I cannot see what's wrong in my code.
>
> Jianshi
>
> On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang wrote:
>
> I can successfully run my code in local mode using spark-submit (--master
> local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.
>
> Any hints on what the problem is? Is it a closure serialization problem? How
> can I debug it? Your answers would be very helpful.
>
> 14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ExceptionInInitializerError
> java.lang.ExceptionInInitializerError
>         at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
>         at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
Hi Jianshi,

Could you provide which HBase version you're using?

By the way, a quick sanity check on whether the Workers can access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it?

From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014 15:12:18 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
To: user@spark.apache.org

I nailed it down to a union operation, here's my code snippet:

    val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
      vertices.map { ve =>
        val (vertices, dsName) = ve
        val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
        val (_, rvalAsc, rvalType) = rval

        println(s"Table name: $dsName, Rval: $rval")
        println(vertices.toDebugString)

        vertices.map { v =>
          val rk = appendHash(boxId(v.id)).getBytes
          val cf = PROP_BYTES
          val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
          val value = Serializer.serialize(v.properties)

          ((new String(rk), new String(cf), new String(cq)),
           Externalizer(put(rk, cf, cq, value)))
        }
      }.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and they're transformed into KeyValues to be inserted in HBase, so I need to do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].

I cannot see what's wrong in my code.

Jianshi

On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang wrote:

I can successfully run my code in local mode using spark-submit (--master local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.

Any hints on what the problem is? Is it a closure serialization problem? How can I debug it? Your answers would be very helpful.
14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
I nailed it down to a union operation, here's my code snippet:

    val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
      vertices.map { ve =>
        val (vertices, dsName) = ve
        val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
        val (_, rvalAsc, rvalType) = rval

        println(s"Table name: $dsName, Rval: $rval")
        println(vertices.toDebugString)

        vertices.map { v =>
          val rk = appendHash(boxId(v.id)).getBytes
          val cf = PROP_BYTES
          val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
          val value = Serializer.serialize(v.properties)

          ((new String(rk), new String(cf), new String(cq)),
           Externalizer(put(rk, cf, cq, value)))
        }
      }.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and they're transformed into KeyValues to be inserted in HBase, so I need to do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].

I cannot see what's wrong in my code.

Jianshi

On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang wrote:
> I can successfully run my code in local mode using spark-submit (--master
> local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.
>
> Any hints on what the problem is? Is it a closure serialization problem? How
> can I debug it? Your answers would be very helpful.
>
> 14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ExceptionInInitializerError
> java.lang.ExceptionInInitializerError
>         at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
>         at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>         at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>         at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
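The combine-then-sort shape in the snippet above can be sketched with plain Scala collections standing in for RDDs (the data and names here are made up for illustration): RDD.union concatenates datasets the way ++ concatenates lists, and sortByKey corresponds to sorting by the tuple's key.

```scala
object UnionSketch extends App {
  // Stand-in for Seq[RDD[(key, value)]]: one keyed dataset per source table.
  val tables: Seq[List[(String, Int)]] = Seq(
    List(("a", 1), ("c", 3)),
    List(("b", 2)),
    List(("d", 4))
  )

  // Fold all datasets into one, then sort by key: the collection
  // analogue of .reduce(_.union(_)).sortByKey(numPartitions = 32).
  val combined: List[(String, Int)] = tables.reduce(_ ++ _).sortBy(_._1)

  println(combined) // List((a,1), (b,2), (c,3), (d,4))
}
```

Note that reduce fails on an empty Seq, so if the list of tables can be empty, a fold with an empty starting dataset is the safer spelling.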
Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode
I can successfully run my code in local mode using spark-submit (--master local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.

Any hints on what the problem is? Is it a closure serialization problem? How can I debug it? Your answers would be very helpful.

14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
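For reference, the error class itself can be reproduced with a minimal Scala sketch (the object and resource names here are hypothetical, not from the thread). When a class or object's static initializer throws, for example because a resource file is missing from the submitted JARs, which is what this thread eventually diagnosed, the JVM wraps the failure in java.lang.ExceptionInInitializerError with the real exception attached as the cause:

```scala
object NeedsResource {
  // Simulates a class whose initializer depends on a resource file;
  // the resource is deliberately absent, so initialization fails.
  val config: java.io.InputStream = {
    val in = getClass.getResourceAsStream("/missing-config.properties")
    if (in == null) throw new RuntimeException("resource not found")
    in
  }
}

object Demo extends App {
  // The first access triggers the object's initializer; the JVM surfaces
  // the failure as ExceptionInInitializerError, keeping the original
  // exception in getCause.
  try NeedsResource.config
  catch {
    case e: ExceptionInInitializerError =>
      println("cause: " + e.getCause.getMessage) // cause: resource not found
  }
}
```

This is why the useful "Caused by..." lines end up in the executor's container logs: the root cause travels inside the wrapper, so checking getCause (or the YARN container log) is what reveals the missing resource.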