Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-04-14 Thread Akhil Das
One hack you could put in would be to bring the Result class
http://grepcode.com/file_/repository.cloudera.com/content/repositories/releases/com.cloudera.hbase/hbase/0.89.20100924-28/org/apache/hadoop/hbase/client/Result.java/?v=source
into your own code, make it serializable (implement Serializable), and use that copy instead.
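
A lighter variant of the same idea, so you don't have to copy the whole class: a small Serializable holder that copies out just the row key and cell values. A rough sketch only (ResultHolder is a made-up name):

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Copies the row key and cell values out of a Result into plain serializable
// fields, so Spark can ship it between nodes.
public class ResultHolder implements Serializable {
  private final String rowKey;
  private final Map<String, String> values = new HashMap<String, String>();

  public ResultHolder(Result result) {
    this.rowKey = Bytes.toString(result.getRow());
    for (Cell cell : result.rawCells()) {
      values.put(Bytes.toString(CellUtil.cloneQualifier(cell)),
                 Bytes.toString(CellUtil.cloneValue(cell)));
    }
  }

  public String getRowKey() { return rowKey; }
  public Map<String, String> getValues() { return values; }
}

Then map each Result to a ResultHolder inside the RDD transformation, before anything is collected or shuffled.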

Thanks
Best Regards

On Tue, Apr 7, 2015 at 12:07 AM, Jeetendra Gangele gangele...@gmail.com
wrote:

 I hit the same issue again. This time I tried to return an object, and it
 failed with "task not serializable"; the code is below.
 Here VendorRecord is serializable.

 private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc)
     throws IOException {
   return sc
       .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(),
           TableInputFormat.class,
           ImmutableBytesWritable.class, Result.class)
       .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
         @Override
         public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1)
             throws Exception {
           String rowKey = new String(v1._1.get());
           VendorRecord vd = vendorDataDAO.getVendorDataForRowkey(rowKey);
           return vd;
         }
       });
 }


 On 1 April 2015 at 02:07, Ted Yu yuzhih...@gmail.com wrote:

 Jeetendra:
 Please extract the information you need from Result and return the
 extracted portion - instead of returning Result itself.

 Cheers

 On Tue, Mar 31, 2015 at 1:14 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 The example in
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
 might help

 Best,

 --
 Nan Zhu
 http://codingcat.me

 On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote:

 Yep, it's not serializable:

 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

 You can't return this from a distributed operation since that would
 mean it has to travel over the network and you haven't supplied any
 way to convert the thing into bytes.

 On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com
 wrote:

 When I am trying to get the result from HBase and run the mapToPair
 function of the RDD, it gives the error
 java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

 Here is the code

 // private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
 //     throws IOException {
 //   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
 //       TableInputFormat.class, ImmutableBytesWritable.class,
 //       Result.class).mapToPair(
 //           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
 //
 //     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
 //         throws Exception {
 //       System.out.println("In getCompanyDataRDD" + t._2);
 //
 //       String cknid = Bytes.toString(t._1.get());
 //       System.out.println("processing cknids is:" + cknid);
 //       Integer cknidInt = Integer.parseInt(cknid);
 //       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
 //       return returnTuple;
 //     }
 //   });
 // }












Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-04-06 Thread Jeetendra Gangele
I hit the same issue again. This time I tried to return an object, and it
failed with "task not serializable"; the code is below.
Here VendorRecord is serializable.

private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc)
    throws IOException {
  return sc
      .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(),
          TableInputFormat.class,
          ImmutableBytesWritable.class, Result.class)
      .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
        @Override
        public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1)
            throws Exception {
          String rowKey = new String(v1._1.get());
          VendorRecord vd = vendorDataDAO.getVendorDataForRowkey(rowKey);
          return vd;
        }
      });
}
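
One guess at the cause, from the snippet alone: the anonymous Function captures vendorDataDAO, so the class that field lives in gets pulled into the closure, and that class is probably not serializable. A minimal sketch of a workaround (assuming Spark 1.x, where FlatMapFunction.call returns an Iterable; VendorDataDAO and its no-arg constructor are made-up names for illustration) that builds the DAO inside mapPartitions so nothing non-serializable is captured:

// Assumed imports: org.apache.spark.api.java.function.FlatMapFunction,
// java.util.ArrayList, java.util.Iterator, java.util.List
private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc)
    throws IOException {
  return sc
      .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(),
          TableInputFormat.class,
          ImmutableBytesWritable.class, Result.class)
      .mapPartitions(
          new FlatMapFunction<Iterator<Tuple2<ImmutableBytesWritable, Result>>, VendorRecord>() {
            @Override
            public Iterable<VendorRecord> call(Iterator<Tuple2<ImmutableBytesWritable, Result>> it)
                throws Exception {
              // Built here, on the executor, so it never has to be serialized.
              VendorDataDAO dao = new VendorDataDAO();
              List<VendorRecord> out = new ArrayList<VendorRecord>();
              while (it.hasNext()) {
                String rowKey = new String(it.next()._1.get());
                out.add(dao.getVendorDataForRowkey(rowKey));
              }
              return out;
            }
          });
}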


On 1 April 2015 at 02:07, Ted Yu yuzhih...@gmail.com wrote:

 Jeetendra:
 Please extract the information you need from Result and return the
 extracted portion - instead of returning Result itself.

 Cheers

 On Tue, Mar 31, 2015 at 1:14 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 The example in
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
 might help

 Best,

 --
 Nan Zhu
 http://codingcat.me

 On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote:

 Yep, it's not serializable:

 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

 You can't return this from a distributed operation since that would
 mean it has to travel over the network and you haven't supplied any
 way to convert the thing into bytes.

 On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com
 wrote:

 When I am trying to get the result from HBase and run the mapToPair
 function of the RDD, it gives the error
 java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

 Here is the code

 // private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
 //     throws IOException {
 //   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
 //       TableInputFormat.class, ImmutableBytesWritable.class,
 //       Result.class).mapToPair(
 //           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
 //
 //     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
 //         throws Exception {
 //       System.out.println("In getCompanyDataRDD" + t._2);
 //
 //       String cknid = Bytes.toString(t._1.get());
 //       System.out.println("processing cknids is:" + cknid);
 //       Integer cknidInt = Integer.parseInt(cknid);
 //       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
 //       return returnTuple;
 //     }
 //   });
 // }








Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Sean Owen
Yep, it's not serializable:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

You can't return this from a distributed operation since that would
mean it has to travel over the network and you haven't supplied any
way to convert the thing into bytes.
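
For instance (a minimal sketch, assuming sc is a JavaSparkContext and conf is the HBase scan Configuration; "cf" and "col" are placeholder column family and qualifier names), extract what you need inside the map so that only serializable Strings ever leave the executors:

// Assumed imports: org.apache.hadoop.hbase.mapreduce.TableInputFormat,
// org.apache.hadoop.hbase.util.Bytes, org.apache.spark.api.java.function.Function, scala.Tuple2
JavaPairRDD<ImmutableBytesWritable, Result> hbaseRdd =
    sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);

// The value bytes are copied out on the executors; the resulting Strings are
// serializable, so collect() and shuffles work.
JavaRDD<String> values = hbaseRdd.map(
    new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
      public String call(Tuple2<ImmutableBytesWritable, Result> t) throws Exception {
        byte[] v = t._2.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        return v == null ? null : Bytes.toString(v);
      }
    });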

On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com wrote:
 When I am trying to get the result from HBase and run the mapToPair
 function of the RDD, it gives the error
 java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

 Here is the code

 // private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
 //     throws IOException {
 //   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
 //       TableInputFormat.class, ImmutableBytesWritable.class,
 //       Result.class).mapToPair(
 //           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
 //
 //     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
 //         throws Exception {
 //       System.out.println("In getCompanyDataRDD" + t._2);
 //
 //       String cknid = Bytes.toString(t._1.get());
 //       System.out.println("processing cknids is:" + cknid);
 //       Integer cknidInt = Integer.parseInt(cknid);
 //       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
 //       return returnTuple;
 //     }
 //   });
 // }




java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Jeetendra Gangele
When I am trying to get the result from HBase and run the mapToPair
function of the RDD, it gives the error
java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

Here is the code

// private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
//     throws IOException {
//   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
//       TableInputFormat.class, ImmutableBytesWritable.class,
//       Result.class).mapToPair(
//           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
//
//     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
//         throws Exception {
//       System.out.println("In getCompanyDataRDD" + t._2);
//
//       String cknid = Bytes.toString(t._1.get());
//       System.out.println("processing cknids is:" + cknid);
//       Integer cknidInt = Integer.parseInt(cknid);
//       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
//       return returnTuple;
//     }
//   });
// }


Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Nan Zhu
The example in 
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
 might help

Best, 

-- 
Nan Zhu
http://codingcat.me


On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote:

 Yep, it's not serializable:
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html
 
 You can't return this from a distributed operation since that would
 mean it has to travel over the network and you haven't supplied any
 way to convert the thing into bytes.
 
 On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com wrote:
  When I am trying to get the result from HBase and run the mapToPair
  function of the RDD, it gives the error
  java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
  
  Here is the code
  
  // private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
  //     throws IOException {
  //   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
  //       TableInputFormat.class, ImmutableBytesWritable.class,
  //       Result.class).mapToPair(
  //           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
  //
  //     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
  //         throws Exception {
  //       System.out.println("In getCompanyDataRDD" + t._2);
  //
  //       String cknid = Bytes.toString(t._1.get());
  //       System.out.println("processing cknids is:" + cknid);
  //       Integer cknidInt = Integer.parseInt(cknid);
  //       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
  //       return returnTuple;
  //     }
  //   });
  // }
  
 
 
 
 




Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Ted Yu
Jeetendra:
Please extract the information you need from Result and return the
extracted portion - instead of returning Result itself.
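
For the getCompanyDataRDD snippet quoted below, that could look roughly like this (a sketch only; "cf" and "col" are placeholder column family and qualifier names, and the null check is there because getValue returns null if the cell is missing):

// Return Tuple2<Integer, String> with just the extracted value instead of
// Tuple2<Integer, Result>; both sides are serializable.
JavaPairRDD<Integer, String> companyData =
    sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
        TableInputFormat.class, ImmutableBytesWritable.class, Result.class)
      .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, String>() {
        public Tuple2<Integer, String> call(Tuple2<ImmutableBytesWritable, Result> t)
            throws Exception {
          Integer cknid = Integer.parseInt(Bytes.toString(t._1.get()));
          byte[] value = t._2.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
          return new Tuple2<Integer, String>(cknid, value == null ? null : Bytes.toString(value));
        }
      });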

Cheers

On Tue, Mar 31, 2015 at 1:14 PM, Nan Zhu zhunanmcg...@gmail.com wrote:

 The example in
 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
 might help

 Best,

 --
 Nan Zhu
 http://codingcat.me

 On Tuesday, March 31, 2015 at 3:56 PM, Sean Owen wrote:

 Yep, it's not serializable:
 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

 You can't return this from a distributed operation since that would
 mean it has to travel over the network and you haven't supplied any
 way to convert the thing into bytes.

 On Tue, Mar 31, 2015 at 8:51 PM, Jeetendra Gangele gangele...@gmail.com
 wrote:

 When I am trying to get the result from HBase and run the mapToPair
 function of the RDD, it gives the error
 java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

 Here is the code

 // private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc)
 //     throws IOException {
 //   return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(),
 //       TableInputFormat.class, ImmutableBytesWritable.class,
 //       Result.class).mapToPair(
 //           new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {
 //
 //     public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t)
 //         throws Exception {
 //       System.out.println("In getCompanyDataRDD" + t._2);
 //
 //       String cknid = Bytes.toString(t._1.get());
 //       System.out.println("processing cknids is:" + cknid);
 //       Integer cknidInt = Integer.parseInt(cknid);
 //       Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
 //       return returnTuple;
 //     }
 //   });
 // }







Spark : java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2014-09-24 Thread Madabhattula Rajesh Kumar
Hi Team,

I'm getting the exception below. Could you please help me resolve this issue?

Below is my piece of code

import scala.collection.mutable.HashMap  // used for invRecords and invValues below

val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

var s = rdd.map(x => x._2)

var a = test(s.collect)

def test(s: Array[org.apache.hadoop.hbase.client.Result]) {

  var invRecords = HashMap[String, HashMap[String, String]]()
  s foreach (x => {
    var invValues = HashMap[String, String]()
    x.rawCells() foreach (y => {
      invValues += Bytes.toString(y.getQualifier) -> Bytes.toString(y.getValue)
    })
    invRecords += Bytes.toString(x.getRow) -> invValues
  })
  println("* === " + invRecords.size)

}

java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
2014-09-24 20:57:29,703 WARN  [Result resolver thread-0] scheduler.TaskSetManager (Logging.scala:logWarning(70)) - Lost TID 0 (task 0.0:0)
2014-09-24 20:57:29,717 ERROR [Result resolver thread-0] scheduler.TaskSetManager (Logging.scala:logError(74)) - Task 0.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result; not retrying
2014-09-24 20:57:29,722 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Failed to run collect at HBaseTest5.scala:26
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)