Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Jianshi Huang
I see, Andrew. Thanks for the explanation.

On Tue, Jul 29, 2014 at 5:29 AM, Andrew Lee  wrote:

>
> I was thinking maybe we could suggest that the community enhance the Spark
> HistoryServer to capture the last failure exception from the container logs
> in the last failed stage?
>

This would be helpful. I personally like Yarn-Client mode, as all the
running status can be checked directly from the console.


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
Hi Jianshi,
My understanding is 'no', based on how Spark is designed, even with your own 
log4j.properties in Spark's conf folder.
In YARN mode, the Application Master runs inside the cluster, and all logs are 
part of the container logs, which are governed by a separate log4j.properties 
file from the Hadoop/YARN environment. Spark can't override that unless it can 
place its own log4j ahead of YARN's on the classpath. So the only way is to 
log in to the ResourceManager UI and click on the job itself to read the 
container logs. Others, please correct me if my understanding is wrong.
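For what it's worth, if YARN log aggregation is enabled, the same container 
logs can also be pulled from the command line once the application finishes 
(the application id below is just a placeholder):

  yarn logs -applicationId application_1406000000000_0001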
You may be thinking: why can't I stream the logs to an external service (e.g. 
Flume, syslogd) with a different appender in log4j? I personally don't 
consider this a good practice, since:
1. You need two infrastructures to operate the entire cluster.
2. You will need to open up firewall ports between the two services to 
transfer/stream the logs.
3. The traffic is unpredictable: the YARN cluster may bring down the logging 
service/infrastructure (DDoS) when someone accidentally changes the logging 
level from WARN to INFO, or worse, DEBUG.
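For concreteness, a minimal log4j.properties sketch of the kind of appender 
being discussed, using log4j 1.2's built-in syslog appender (the host and 
facility are placeholders), with all the caveats above still applying:

  log4j.rootLogger=WARN, syslog
  log4j.appender.syslog=org.apache.log4j.net.SyslogAppender
  log4j.appender.syslog.SyslogHost=loghost.example.com
  log4j.appender.syslog.Facility=LOCAL1
  log4j.appender.syslog.layout=org.apache.log4j.PatternLayout
  log4j.appender.syslog.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n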
I was thinking maybe we could suggest that the community enhance the Spark 
HistoryServer to capture the last failure exception from the container logs in 
the last failed stage? I'm not sure whether this is a good idea, since it may 
complicate the event model. I'm also not sure whether the Akka model can 
support this, or whether some other component in Spark could capture these 
exceptions, pass them back to the AM, and eventually store them somewhere for 
later troubleshooting. I'm not clear on how this path is constructed without 
reading the source code, so I can't give a better answer.
AL

From: jianshi.hu...@gmail.com
Date: Mon, 28 Jul 2014 13:32:05 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in 
Yarn-Client/Cluster mode
To: user@spark.apache.org

Hi Andrew,
Thanks for the reply. I figured out the cause of the issue: some resource files 
were missing from the JARs. A class initializer depends on those resource 
files, so it threw that exception.
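For reference, a minimal sketch of this failure mode (the object and resource 
names are hypothetical): a Scala object's body compiles to a static 
initializer, so an exception thrown while loading a missing classpath resource 
surfaces as java.lang.ExceptionInInitializerError on first use of the object.

  object ResourceConfig {
    // getResourceAsStream returns null when the file is absent from the
    // JARs shipped to the containers; Properties.load(null) then throws
    // an NPE, which the JVM wraps in ExceptionInInitializerError.
    private val stream = getClass.getResourceAsStream("/config.properties")
    val props: java.util.Properties = {
      val p = new java.util.Properties()
      p.load(stream)
      p
    }
  }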


I appended the resource files explicitly via the --jars option and it worked fine.
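For example (a sketch only; the jar, resource, and class names are 
hypothetical), packaging the resource files into a jar and shipping it 
alongside the application:

  jar cf my-resources.jar conf/lookup.properties
  spark-submit --master yarn-client \
    --jars my-resources.jar \
    --class com.example.MyApp my-app.jar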
The "Caused by..." messages were found in yarn logs actually, I think it might 
be useful if I can seem them from the console which runs spark-submit. Would 
that be possible?


Jianshi


On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee  wrote:





Hi Jianshi,
Could you share which HBase version you're using?
By the way, a quick sanity check: can the Workers access HBase?

Were you able to manually write one record to HBase with the serialize 
function? Hardcode and test it?

From: jianshi.hu...@gmail.com


Date: Fri, 25 Jul 2014 15:12:18 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in 
Yarn-Client/Cluster mode
To: user@spark.apache.org



I nailed it down to a union operation, here's my code snippet:

val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
  vertices.map { ve =>
    val (vertices, dsName) = ve
    val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
    val (_, rvalAsc, rvalType) = rval

    println(s"Table name: $dsName, Rval: $rval")
    println(vertices.toDebugString)

    vertices.map { v =>
      val rk = appendHash(boxId(v.id)).getBytes
      val cf = PROP_BYTES
      val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
      val value = Serializer.serialize(v.properties)

      ((new String(rk), new String(cf), new String(cq)),
       Externalizer(put(rk, cf, cq, value)))
    }
  }.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and they're 
transformed into KeyValues to be inserted into HBase, so I need to do a 
.reduce(_.union(_)) to combine them into one RDD[(key, value)].




I cannot see what's wrong in my code.
Jianshi


On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang  wrote:




I can successfully run my code in local mode using spark-submit (--master 
local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.




Any hints as to what the problem is? Is it a closure serialization problem? How 
can I debug it? Your answers would be very helpful.

14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.r

Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-27 Thread Jianshi Huang
Hi Andrew,

Thanks for the reply. I figured out the cause of the issue: some resource
files were missing from the JARs. A class initializer depends on those
resource files, so it threw that exception.

I appended the resource files explicitly via the --jars option and it worked
fine.

The "Caused by..." messages were found in yarn logs actually, I think it
might be useful if I can seem them from the console which runs
spark-submit. Would that be possible?

Jianshi



On Sat, Jul 26, 2014 at 7:08 AM, Andrew Lee  wrote:

> Hi Jianshi,
>
> Could you share which HBase version you're using?
>
> By the way, a quick sanity check: can the Workers access HBase?
>
> Were you able to manually write one record to HBase with the serialize
> function? Hardcode and test it?
>
> --
> From: jianshi.hu...@gmail.com
> Date: Fri, 25 Jul 2014 15:12:18 +0800
> Subject: Re: Need help, got java.lang.ExceptionInInitializerError in
> Yarn-Client/Cluster mode
> To: user@spark.apache.org
>
>
> I nailed it down to a union operation, here's my code snippet:
>
> val properties: RDD[((String, String, String),
> Externalizer[KeyValue])] = vertices.map { ve =>
>   val (vertices, dsName) = ve
>   val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES,
> dsName)
>   val (_, rvalAsc, rvalType) = rval
>
>   println(s"Table name: $dsName, Rval: $rval")
>   println(vertices.toDebugString)
>
>   vertices.map { v =>
> val rk = appendHash(boxId(v.id)).getBytes
> val cf = PROP_BYTES
> val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
> val value = Serializer.serialize(v.properties)
>
> ((new String(rk), new String(cf), new String(cq)),
>   Externalizer(put(rk, cf, cq, value)))
>   }
> }.reduce(_.union(_)).sortByKey(numPartitions = 32)
>
> Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and
> they're transformed into KeyValues to be inserted into HBase, so I need to
> do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].
>
> I cannot see what's wrong in my code.
>
> Jianshi
>
>
>
> On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang 
> wrote:
>
> I can successfully run my code in local mode using spark-submit (--master
> local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.
>
> Any hints as to what the problem is? Is it a closure serialization problem? How
> can I debug it? Your answers would be very helpful.
>
> 14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ExceptionInInitializerError
> java.lang.ExceptionInInitializerError
> at
> com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
> a:40)
> at
> com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
> a:36)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
Hi Jianshi,
Could you share which HBase version you're using?
By the way, a quick sanity check: can the Workers access HBase?
Were you able to manually write one record to HBase with the serialize 
function? Hardcode and test it?
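A minimal sketch of such a hardcoded sanity check, run from a Worker node to 
rule out connectivity (the table, row, and column names are placeholders; 
Serializer is your project's own helper and is assumed here to return a byte 
array, as in the snippet quoted below; HBase client API of that era):

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.{HTable, Put}
  import org.apache.hadoop.hbase.util.Bytes

  val conf = HBaseConfiguration.create()
  val table = new HTable(conf, "sanity_check")
  val put = new Put(Bytes.toBytes("row-1"))
  // substitute a real value from your data here
  val value: Array[Byte] = Serializer.serialize(Map("k" -> "v"))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("cq"), value)
  table.put(put)
  table.close()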

From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014 15:12:18 +0800
Subject: Re: Need help, got java.lang.ExceptionInInitializerError in 
Yarn-Client/Cluster mode
To: user@spark.apache.org

I nailed it down to a union operation, here's my code snippet:

val properties: RDD[((String, String, String), Externalizer[KeyValue])] =
  vertices.map { ve =>
    val (vertices, dsName) = ve
    val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES, dsName)
    val (_, rvalAsc, rvalType) = rval

    println(s"Table name: $dsName, Rval: $rval")
    println(vertices.toDebugString)

    vertices.map { v =>
      val rk = appendHash(boxId(v.id)).getBytes
      val cf = PROP_BYTES
      val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
      val value = Serializer.serialize(v.properties)

      ((new String(rk), new String(cf), new String(cq)),
       Externalizer(put(rk, cf, cq, value)))
    }
  }.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and they're 
transformed into KeyValues to be inserted into HBase, so I need to do a 
.reduce(_.union(_)) to combine them into one RDD[(key, value)].


I cannot see what's wrong in my code.
Jianshi


On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang  wrote:


I can successfully run my code in local mode using spark-submit (--master 
local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.


Any hints as to what the problem is? Is it a closure serialization problem? How 
can I debug it? Your answers would be very helpful.

14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:40)
        at com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scala:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)



-- 
Jianshi Huang

LinkedIn: jianshi

Twitter: @jshuang
Github & Blog: http://huangjs.github.com/




-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/




Re: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Jianshi Huang
I nailed it down to a union operation, here's my code snippet:

val properties: RDD[((String, String, String), Externalizer[KeyValue])]
= vertices.map { ve =>
  val (vertices, dsName) = ve
  val rval = GraphConfig.getRval(datasetConf, Constants.VERTICES,
dsName)
  val (_, rvalAsc, rvalType) = rval

  println(s"Table name: $dsName, Rval: $rval")
  println(vertices.toDebugString)

  vertices.map { v =>
val rk = appendHash(boxId(v.id)).getBytes
val cf = PROP_BYTES
val cq = boxRval(v.rval, rvalAsc, rvalType).getBytes
val value = Serializer.serialize(v.properties)

((new String(rk), new String(cf), new String(cq)),
 Externalizer(put(rk, cf, cq, value)))
  }
}.reduce(_.union(_)).sortByKey(numPartitions = 32)

Basically I read data from multiple tables (Seq[RDD[(key, value)]]) and
they're transformed into KeyValues to be inserted into HBase, so I need to
do a .reduce(_.union(_)) to combine them into one RDD[(key, value)].
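
For concreteness, a minimal sketch of that combining step (the types and names 
here are simplified placeholders): SparkContext.union builds one flat UnionRDD 
and is an alternative to folding the Seq with binary unions.

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._   // pair-RDD functions (sortByKey)
  import org.apache.spark.rdd.RDD

  def combine(sc: SparkContext,
              tables: Seq[RDD[(String, String)]]): RDD[(String, String)] = {
    // equivalent to tables.reduce(_.union(_)), without the nested chain
    sc.union(tables).sortByKey(numPartitions = 32)
  }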

I cannot see what's wrong in my code.

Jianshi



On Fri, Jul 25, 2014 at 12:24 PM, Jianshi Huang 
wrote:

> I can successfully run my code in local mode using spark-submit (--master
> local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.
>
> Any hints as to what the problem is? Is it a closure serialization problem? How
> can I debug it? Your answers would be very helpful.
>
> 14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
> java.lang.ExceptionInInitializerError
> java.lang.ExceptionInInitializerError
> at
> com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
> a:40)
> at
> com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
> a:36)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
> at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>



-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-24 Thread Jianshi Huang
I can successfully run my code in local mode using spark-submit (--master
local[4]), but I got ExceptionInInitializerError errors in Yarn-client mode.

Any hints as to what the problem is? Is it a closure serialization problem? How
can I debug it? Your answers would be very helpful.

14/07/25 11:48:14 WARN scheduler.TaskSetManager: Loss was due to
java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at
com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
a:40)
at
com.paypal.risk.rds.granada.storage.hbase.HBaseStore$$anonfun$1$$anonfun$apply$1.apply(HBaseStore.scal
a:36)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1016)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:847)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/