Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-30 Thread Nirav Patel
Put is a subclass of Mutation, so I'm not sure what you mean by "if I use mutation".

Anyway, I registered all 3 classes with Kryo:

kryo.register(classOf[org.apache.hadoop.hbase.client.Put])

kryo.register(classOf[ImmutableBytesWritable])

kryo.register(classOf[Mutation])


It still fails with the same exception.
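
For illustration, a minimal sketch of a possible workaround (my own suggestion, not something from this thread): keep only plain byte arrays in the cached RDD and build the Put objects inside foreachPartition, so Kryo never has to serialize Put or its familyMap at all. The source RDD `rawRdd` and the column family/qualifier names below are hypothetical.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import scala.collection.JavaConverters._

// Sketch only: cache (rowKey, value) byte pairs instead of Put objects.
def writeRawToHBase(rawRdd: RDD[(Array[Byte], Array[Byte])], tableName: String): Unit = {
  val cached = rawRdd.persist(StorageLevel.MEMORY_ONLY_SER) // only byte arrays go through Kryo
  cached.foreachPartition { itr =>
    val hConf = HBaseConfiguration.create()
    val table = new HTable(hConf, tableName)
    itr.grouped(100).foreach { batch =>
      val puts = batch.map { case (rowKey, value) =>
        val p = new Put(rowKey)
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), value) // hypothetical family/qualifier
        p
      }
      table.put(puts.asJava) // batch write, as in the original snippet
    }
    table.close()
  }
}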



On Sun, May 29, 2016 at 11:26 PM, sjk  wrote:

> org.apache.hadoop.hbase.client.{Mutation, Put}
> org.apache.hadoop.hbase.io.ImmutableBytesWritable
>
> If you use Mutation, register the above classes too.
>
> On May 30, 2016, at 08:11, Nirav Patel  wrote:
>
> Sure, let me try that. But from the looks of it,
> kryo.util.MapReferenceResolver.getReadObject is trying to access an incorrect
> index (100).
>
> On Sun, May 29, 2016 at 5:06 PM, Ted Yu  wrote:
>
>> Can you register Put with Kryo ?
>>
>> Thanks
>>
>> On May 29, 2016, at 4:58 PM, Nirav Patel  wrote:
>>
>> I pasted a code snippet for that method.
>>
>> here's full def:
>>
>>   def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)],
>> tableName: String) {
>>
>>
>> hbaseRdd.values.foreachPartition{ itr =>
>>
>> val hConf = HBaseConfiguration.create()
>>
>> hConf.setInt("hbase.client.write.buffer", 16097152)
>>
>> val table = new HTable(hConf, tableName)
>>
>> //table.setWriteBufferSize(8388608)
>>
>> itr.grouped(100).foreach(table.put(_))   // << Exception happens at
>> this point
>>
>> table.close()
>>
>> }
>>
>>   }
>>
>>
>> I am using hbase 0.98.12 mapr distribution.
>>
>>
>> Thanks
>>
>> Nirav
>>
>> On Sun, May 29, 2016 at 4:46 PM, Ted Yu  wrote:
>>
>>> bq.  at com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$
>>> anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>>>
>>> Can you reveal related code from HbaseUtils.scala ?
>>>
>>> Which hbase version are you using ?
>>>
>>> Thanks
>>>
>>> On Sun, May 29, 2016 at 4:26 PM, Nirav Patel 
>>> wrote:
>>>
 Hi,

 I am getting the following Kryo deserialization error when trying to
 bulk load a cached RDD into HBase. It works if I don't cache the RDD; I cache
 it with MEMORY_ONLY_SER.

 here's the code snippet:


 hbaseRdd.values.foreachPartition{ itr =>
 val hConf = HBaseConfiguration.create()
 hConf.setInt("hbase.client.write.buffer", 16097152)
 val table = new HTable(hConf, tableName)
 itr.grouped(100).foreach(table.put(_))
 table.close()
 }
 hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]


 Here is the exception I am getting. I read on the Kryo JIRA that this may be
 an issue with incorrect use of the serialization library. So could this be an
 issue with the twitter-chill library or Spark core itself?

 Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times,
 most recent failure: Lost task 16.9 in stage 9.0 (TID 28614,
 hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException:
 java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
 Serialization trace:
 familyMap (org.apache.hadoop.hbase.client.Put)
 at
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
 at
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
 at
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
 at
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
 at
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
 at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at
 com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
 at
 com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
 at
 org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
 at
 org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
 at
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
 at
 org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
 at org.apache.spark.schedule

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread sjk
org.apache.hadoop.hbase.client.{Mutation, Put}
org.apache.hadoop.hbase.io.ImmutableBytesWritable

If you use Mutation, register the above classes too.
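
For illustration, a minimal sketch (the registrator class name is made up) of doing this registration through a custom KryoRegistrator wired into the job's SparkConf, rather than scattering kryo.register calls:

import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.hbase.client.{Mutation, Put}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator that registers all three HBase classes named above.
class HBaseKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Put])
    kryo.register(classOf[Mutation])
    kryo.register(classOf[ImmutableBytesWritable])
  }
}

// Wire it up when building the SparkConf.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[HBaseKryoRegistrator].getName)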

> On May 30, 2016, at 08:11, Nirav Patel  wrote:
> 
> Sure, let me try that. But from the looks of it,
> kryo.util.MapReferenceResolver.getReadObject is trying to access an incorrect
> index (100).
> 
> On Sun, May 29, 2016 at 5:06 PM, Ted Yu wrote:
> Can you register Put with Kryo ?
> 
> Thanks
> 
> On May 29, 2016, at 4:58 PM, Nirav Patel wrote:
> 
>> I pasted a code snippet for that method.
>> 
>> here's full def:
>> 
>>   def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)], 
>> tableName: String) {
>> 
>> 
>> 
>> hbaseRdd.values.foreachPartition{ itr =>
>> 
>> val hConf = HBaseConfiguration.create()
>> 
>> hConf.setInt("hbase.client.write.buffer", 16097152)
>> 
>> val table = new HTable(hConf, tableName)
>> 
>> //table.setWriteBufferSize(8388608)
>> 
>> itr.grouped(100).foreach(table.put(_))   // << Exception happens at 
>> this point
>> 
>> table.close()
>> 
>> }
>> 
>>   }
>> 
>> 
>> 
>> I am using hbase 0.98.12 mapr distribution.
>> 
>> 
>> 
>> Thanks
>> 
>> Nirav
>> 
>> 
>> On Sun, May 29, 2016 at 4:46 PM, Ted Yu wrote:
>> bq.  at 
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>> 
>> Can you reveal related code from HbaseUtils.scala ?
>> 
>> Which hbase version are you using ?
>> 
>> Thanks
>> 
>> On Sun, May 29, 2016 at 4:26 PM, Nirav Patel wrote:
>> Hi,
>> 
>> I am getting the following Kryo deserialization error when trying to bulk load
>> a cached RDD into HBase. It works if I don't cache the RDD; I cache it with
>> MEMORY_ONLY_SER.
>> 
>> here's the code snippet:
>> 
>> 
>> hbaseRdd.values.foreachPartition{ itr =>
>> val hConf = HBaseConfiguration.create()
>> hConf.setInt("hbase.client.write.buffer", 16097152)
>> val table = new HTable(hConf, tableName)
>> itr.grouped(100).foreach(table.put(_))
>> table.close()
>> }
>> hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
>> 
>> 
>> Here is the exception I am getting. I read on the Kryo JIRA that this may be
>> an issue with incorrect use of the serialization library. So could this be an
>> issue with the twitter-chill library or Spark core itself?
>> 
>> Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times, most 
>> recent failure: Lost task 16.9 in stage 9.0 (TID 28614, 
>> hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException: 
>> java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>> Serialization trace:
>> familyMap (org.apache.hadoop.hbase.client.Put)
>>  at 
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
>>  at 
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>>  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>  at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
>>  at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
>>  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>  at 
>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
>>  at 
>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
>>  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>  at 
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>  at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
>>  at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>>  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>  at 
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>>  at 
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
>>  at 
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>  at 
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>  at 
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>  at 
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>  at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>  at 
>> java.

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
Sure, let me try that. But from the looks of it,
kryo.util.MapReferenceResolver.getReadObject is trying to access an incorrect
index (100).
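
For illustration: kryo.util.MapReferenceResolver is Kryo's reference-tracking machinery. A hedged experiment (my own suggestion, not something raised in the thread) is to turn reference tracking off in the Spark config and see whether the IndexOutOfBoundsException changes or goes away.

import org.apache.spark.SparkConf

// Sketch only: disable Kryo reference tracking for the job.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.referenceTracking", "false")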

On Sun, May 29, 2016 at 5:06 PM, Ted Yu  wrote:

> Can you register Put with Kryo ?
>
> Thanks
>
> On May 29, 2016, at 4:58 PM, Nirav Patel  wrote:
>
> I pasted a code snippet for that method.
>
> here's full def:
>
>   def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)],
> tableName: String) {
>
>
> hbaseRdd.values.foreachPartition{ itr =>
>
> val hConf = HBaseConfiguration.create()
>
> hConf.setInt("hbase.client.write.buffer", 16097152)
>
> val table = new HTable(hConf, tableName)
>
> //table.setWriteBufferSize(8388608)
>
> itr.grouped(100).foreach(table.put(_))   // << Exception happens at
> this point
>
> table.close()
>
> }
>
>   }
>
>
> I am using hbase 0.98.12 mapr distribution.
>
>
> Thanks
>
> Nirav
>
> On Sun, May 29, 2016 at 4:46 PM, Ted Yu  wrote:
>
>> bq.  at com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$
>> anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>>
>> Can you reveal related code from HbaseUtils.scala ?
>>
>> Which hbase version are you using ?
>>
>> Thanks
>>
>> On Sun, May 29, 2016 at 4:26 PM, Nirav Patel 
>> wrote:
>>
>>> Hi,
>>>
>>> I am getting the following Kryo deserialization error when trying to
>>> bulk load a cached RDD into HBase. It works if I don't cache the RDD; I cache
>>> it with MEMORY_ONLY_SER.
>>>
>>> here's the code snippet:
>>>
>>>
>>> hbaseRdd.values.foreachPartition{ itr =>
>>> val hConf = HBaseConfiguration.create()
>>> hConf.setInt("hbase.client.write.buffer", 16097152)
>>> val table = new HTable(hConf, tableName)
>>> itr.grouped(100).foreach(table.put(_))
>>> table.close()
>>> }
>>> hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
>>>
>>>
>>> Here is the exception I am getting. I read on the Kryo JIRA that this may be
>>> an issue with incorrect use of the serialization library. So could this be an
>>> issue with the twitter-chill library or Spark core itself?
>>>
>>> Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times,
>>> most recent failure: Lost task 16.9 in stage 9.0 (TID 28614,
>>> hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException:
>>> java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>>> Serialization trace:
>>> familyMap (org.apache.hadoop.hbase.client.Put)
>>> at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
>>> at
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
>>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>> at
>>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
>>> at
>>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
>>> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>> at
>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
>>> at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>> at
>>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>>> at
>>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>> at
>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>> at
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>>> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>>> at java.util.ArrayList.get(ArrayList.java:411)
>>> at
>>> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.ja

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Ted Yu
Can you register Put with Kryo ?

Thanks
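
For illustration, a minimal sketch of that suggestion, assuming `conf` is the job's SparkConf (registerKryoClasses is available in Spark 1.2+); ImmutableBytesWritable is registered as well, since it is the key type of the RDD:

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf

// Sketch only: register the classes directly on the SparkConf.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Put], classOf[ImmutableBytesWritable]))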

> On May 29, 2016, at 4:58 PM, Nirav Patel  wrote:
> 
> I pasted a code snippet for that method.
> 
> here's full def:
> 
>   def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)], 
> tableName: String) {
> 
> 
> 
> hbaseRdd.values.foreachPartition{ itr =>
> 
> val hConf = HBaseConfiguration.create()
> 
> hConf.setInt("hbase.client.write.buffer", 16097152)
> 
> val table = new HTable(hConf, tableName)
> 
> //table.setWriteBufferSize(8388608)
> 
> itr.grouped(100).foreach(table.put(_))   // << Exception happens at 
> this point
> 
> table.close()
> 
> }
> 
>   }
> 
> 
> 
> I am using hbase 0.98.12 mapr distribution.
> 
> 
> 
> Thanks
> 
> Nirav
> 
> 
>> On Sun, May 29, 2016 at 4:46 PM, Ted Yu  wrote:
>> bq.  at 
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>> 
>> Can you reveal related code from HbaseUtils.scala ?
>> 
>> Which hbase version are you using ?
>> 
>> Thanks
>> 
>>> On Sun, May 29, 2016 at 4:26 PM, Nirav Patel  wrote:
>>> Hi,
>>> 
>>> I am getting the following Kryo deserialization error when trying to bulk load
>>> a cached RDD into HBase. It works if I don't cache the RDD; I cache it with
>>> MEMORY_ONLY_SER.
>>> 
>>> here's the code snippet:
>>> 
>>> 
>>> hbaseRdd.values.foreachPartition{ itr =>
>>> val hConf = HBaseConfiguration.create()
>>> hConf.setInt("hbase.client.write.buffer", 16097152)
>>> val table = new HTable(hConf, tableName)
>>> itr.grouped(100).foreach(table.put(_))
>>> table.close()
>>> }
>>> hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
>>> 
>>> 
>>> Here is the exception I am getting. I read on the Kryo JIRA that this may be
>>> an issue with incorrect use of the serialization library. So could this be an
>>> issue with the twitter-chill library or Spark core itself?
>>> 
>>> Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times, 
>>> most recent failure: Lost task 16.9 in stage 9.0 (TID 28614, 
>>> hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException: 
>>> java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>>> Serialization trace:
>>> familyMap (org.apache.hadoop.hbase.client.Put)
>>> at 
>>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
>>> at 
>>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
>>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
>>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>>> at 
>>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
>>> at 
>>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
>>> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>>> at 
>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>> at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
>>> at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>> at 
>>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>>> at 
>>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>>> at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>> at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:744)
>>> Caused by: java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>>> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>>> at java.util.ArrayList.get(ArrayList.java:411)
>>> at 
>>> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
>>> at com.es

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
I pasted a code snippet for that method.

here's full def:

  def writeRddToHBase2(hbaseRdd: RDD[(ImmutableBytesWritable, Put)],
                       tableName: String) {

    hbaseRdd.values.foreachPartition { itr =>
      val hConf = HBaseConfiguration.create()
      hConf.setInt("hbase.client.write.buffer", 16097152)
      val table = new HTable(hConf, tableName)
      //table.setWriteBufferSize(8388608)
      itr.grouped(100).foreach(table.put(_))   // << Exception happens at this point
      table.close()
    }
  }


I am using hbase 0.98.12 mapr distribution.


Thanks

Nirav

On Sun, May 29, 2016 at 4:46 PM, Ted Yu  wrote:

> bq.  at com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$
> anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>
> Can you reveal related code from HbaseUtils.scala ?
>
> Which hbase version are you using ?
>
> Thanks
>
> On Sun, May 29, 2016 at 4:26 PM, Nirav Patel 
> wrote:
>
>> Hi,
>>
>> I am getting the following Kryo deserialization error when trying to bulk load
>> a cached RDD into HBase. It works if I don't cache the RDD; I cache it
>> with MEMORY_ONLY_SER.
>>
>> here's the code snippet:
>>
>>
>> hbaseRdd.values.foreachPartition{ itr =>
>> val hConf = HBaseConfiguration.create()
>> hConf.setInt("hbase.client.write.buffer", 16097152)
>> val table = new HTable(hConf, tableName)
>> itr.grouped(100).foreach(table.put(_))
>> table.close()
>> }
>> hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
>>
>>
>> Here is the exception I am getting. I read on the Kryo JIRA that this may be
>> an issue with incorrect use of the serialization library. So could this be an
>> issue with the twitter-chill library or Spark core itself?
>>
>> Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times,
>> most recent failure: Lost task 16.9 in stage 9.0 (TID 28614,
>> hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException:
>> java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>> Serialization trace:
>> familyMap (org.apache.hadoop.hbase.client.Put)
>> at
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
>> at
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
>> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>> at
>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
>> at
>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
>> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>> at
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
>> at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>> at
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
>> at
>> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>> at
>> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>> at
>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
>> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>> at java.util.ArrayList.get(ArrayList.java:411)
>> at
>> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
>> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:773)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
>> at
>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
>> at
>> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
>> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>> at
>> com.esotericsoftware.kryo.serializers

Re: Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Ted Yu
bq.  at com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$
anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)

Can you reveal related code from HbaseUtils.scala ?

Which hbase version are you using ?

Thanks

On Sun, May 29, 2016 at 4:26 PM, Nirav Patel  wrote:

> Hi,
>
> I am getting the following Kryo deserialization error when trying to bulk load
> a cached RDD into HBase. It works if I don't cache the RDD; I cache it
> with MEMORY_ONLY_SER.
>
> here's the code snippet:
>
>
> hbaseRdd.values.foreachPartition{ itr =>
> val hConf = HBaseConfiguration.create()
> hConf.setInt("hbase.client.write.buffer", 16097152)
> val table = new HTable(hConf, tableName)
> itr.grouped(100).foreach(table.put(_))
> table.close()
> }
> hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
>
>
> Here is the exception I am getting. I read on the Kryo JIRA that this may be
> an issue with incorrect use of the serialization library. So could this be an
> issue with the twitter-chill library or Spark core itself?
>
> Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times,
> most recent failure: Lost task 16.9 in stage 9.0 (TID 28614,
> hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException:
> java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
> Serialization trace:
> familyMap (org.apache.hadoop.hbase.client.Put)
> at
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
> at
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> at
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
> at
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
> at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at
> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
> at
> com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
> at
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
> at
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
> at
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at
> com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
> at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:773)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
> at
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
> at
> com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> at
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> ... 26 more
>
>
>


Bulk loading Serialized RDD into Hbase throws KryoException - IndexOutOfBoundsException

2016-05-29 Thread Nirav Patel
Hi,

I am getting the following Kryo deserialization error when trying to bulk load
a cached RDD into HBase. It works if I don't cache the RDD; I cache it
with MEMORY_ONLY_SER.

here's the code snippet:


hbaseRdd.values.foreachPartition { itr =>
  val hConf = HBaseConfiguration.create()
  hConf.setInt("hbase.client.write.buffer", 16097152)
  val table = new HTable(hConf, tableName)
  itr.grouped(100).foreach(table.put(_))
  table.close()
}
hbaseRdd is of type RDD[(ImmutableBytesWritable, Put)]
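
The caching call itself isn't shown above; for illustration, a minimal sketch of what "cache it with MEMORY_ONLY_SER" presumably looks like. With the Kryo serializer enabled, this is the step that forces every (ImmutableBytesWritable, Put) pair through Kryo, and the error below is thrown when the cached blocks are read back.

import org.apache.spark.storage.StorageLevel

// Sketch only: persist the RDD in serialized form before writing it out.
hbaseRdd.persist(StorageLevel.MEMORY_ONLY_SER)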


Here is the exception I am getting. I read on the Kryo JIRA that this may be
an issue with incorrect use of the serialization library. So could this be an
issue with the twitter-chill library or Spark core itself?

Job aborted due to stage failure: Task 16 in stage 9.0 failed 10 times,
most recent failure: Lost task 16.9 in stage 9.0 (TID 28614,
hdn10.mycorptcorporation.local): com.esotericsoftware.kryo.KryoException:
java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
Serialization trace:
familyMap (org.apache.hadoop.hbase.client.Put)
at
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:626)
at
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:192)
at
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:181)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at
com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:80)
at
com.mycorpt.myprojjobs.spark.jobs.hbase.HbaseUtils$$anonfun$writeRddToHBase2$1.apply(HbaseUtils.scala:75)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
at
org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IndexOutOfBoundsException: Index: 100, Size: 6
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at
com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:42)
at com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:773)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:727)
at
com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:134)
at
com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
at
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
... 26 more
