Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
One hack you could put in would be to bring the Result class
(http://grepcode.com/file_/repository.cloudera.com/content/repositories/releases/com.cloudera.hbase/hbase/0.89.20100924-28/org/apache/hadoop/hbase/client/Result.java/?v=source)
into your own source tree, make it implement Serializable, and use that copy instead.

Thanks
Best Regards
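A lighter variation on that hack, rather than copying the whole HBase class, is a small Serializable holder that copies the bytes out of each Result. A minimal sketch, assuming HBase 0.96+ for CellUtil; the class name is illustrative, not from the thread:

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // A Serializable snapshot of a Result's contents, safe to ship over the network.
    public class ResultSnapshot implements Serializable {
        private final byte[] row;
        private final Map<String, byte[]> columns = new HashMap<String, byte[]>();

        public ResultSnapshot(Result r) {
            this.row = r.getRow();
            for (Cell cell : r.rawCells()) {
                // Clone each qualifier/value so no reference to HBase internals is kept.
                columns.put(Bytes.toString(CellUtil.cloneQualifier(cell)),
                        CellUtil.cloneValue(cell));
            }
        }

        public byte[] getRow() { return row; }
        public Map<String, byte[]> getColumns() { return columns; }
    }

Mapping each Result to a ResultSnapshot inside the RDD transformation then keeps the non-serializable class out of any shuffled or collected data.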
Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
I hit the same issue again. This time I tried to return the object, and it failed with "task not serializable". Below is the code; VendorRecord here is serializable.

    private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc) throws IOException {
        return sc
            .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(),
                TableInputFormat.class, ImmutableBytesWritable.class, Result.class)
            .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
                @Override
                public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
                    String rowKey = new String(v1._1.get());
                    VendorRecord vd = vendorDataDAO.getVendorDataForRowkey(rowKey);
                    return vd;
                }
            });
    }
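This time the likely culprit is the closure rather than Result itself: the anonymous Function references vendorDataDAO, so Spark must serialize whatever object that reference drags in. If the DAO (or anything it holds) is not serializable, one workaround is to construct it inside the task. A hedged sketch, assuming a usable no-arg constructor (VendorDataDAO is a guessed type name, not from the thread):

    private static JavaRDD<VendorRecord> getVendorDataToProcess(JavaSparkContext sc) throws IOException {
        return sc
            .newAPIHadoopRDD(getVendorDataRowKeyScannerConfiguration(),
                TableInputFormat.class, ImmutableBytesWritable.class, Result.class)
            .map(new Function<Tuple2<ImmutableBytesWritable, Result>, VendorRecord>() {
                @Override
                public VendorRecord call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
                    // Build the DAO on the executor instead of capturing an outer
                    // reference; VendorDataDAO and its constructor are assumptions.
                    VendorDataDAO dao = new VendorDataDAO();
                    String rowKey = Bytes.toString(v1._1.get());
                    return dao.getVendorDataForRowkey(rowKey);
                }
            });
    }

If building the DAO per record is too costly, mapPartitions can build one per partition instead.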
Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
Yep, it's not serializable:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html

You can't return this from a distributed operation since that would mean it has to travel over the network and you haven't supplied any way to convert the thing into bytes.
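One way to supply that conversion is to copy the data out of Result while still on the executors, so only plainly serializable types ever cross the network. A minimal sketch, assuming only the row key and the first column's value are needed downstream:

    // rdd is the JavaPairRDD<ImmutableBytesWritable, Result> from newAPIHadoopRDD.
    JavaPairRDD<String, byte[]> safe = rdd.mapToPair(
        new PairFunction<Tuple2<ImmutableBytesWritable, Result>, String, byte[]>() {
            public Tuple2<String, byte[]> call(Tuple2<ImmutableBytesWritable, Result> t) {
                String rowKey = Bytes.toString(t._1.get());
                byte[] value = t._2.value(); // value of the first column in the Result
                // String and byte[] are serializable, so this pair can be shuffled or collected.
                return new Tuple2<String, byte[]>(rowKey, value);
            }
        });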
java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
When I try to get the result from HBase and run a mapToPair function on the RDD, it gives the error java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result.

Here is the code:

    private static JavaPairRDD<Integer, Result> getCompanyDataRDD(JavaSparkContext sc) throws IOException {
        return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(), TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
            .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, Result>() {

                public Tuple2<Integer, Result> call(Tuple2<ImmutableBytesWritable, Result> t) throws Exception {
                    System.out.println("In getCompanyDataRDD" + t._2);

                    String cknid = Bytes.toString(t._1.get());
                    System.out.println("processing cknids is:" + cknid);
                    Integer cknidInt = Integer.parseInt(cknid);
                    Tuple2<Integer, Result> returnTuple = new Tuple2<Integer, Result>(cknidInt, t._2);
                    return returnTuple;
                }
            });
    }
Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
The example in
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala
might help.

Best,

--
Nan Zhu
http://codingcat.me
Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
Jeetendra:
Please extract the information you need from Result and return the extracted portion instead of returning Result itself.

Cheers
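Applied to the getCompanyDataRDD code above, that advice might look like the following sketch. The column family and qualifier names are placeholders, since the thread doesn't say which columns are actually needed:

    private static JavaPairRDD<Integer, String> getCompanyDataRDD(JavaSparkContext sc) throws IOException {
        return sc.newAPIHadoopRDD(companyDAO.getCompnayDataConfiguration(), TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class)
            .mapToPair(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, Integer, String>() {
                public Tuple2<Integer, String> call(Tuple2<ImmutableBytesWritable, Result> t) throws Exception {
                    Integer cknidInt = Integer.parseInt(Bytes.toString(t._1.get()));
                    // Extract just the needed column as a String; "cf" and "col" are placeholders.
                    String value = Bytes.toString(t._2.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")));
                    return new Tuple2<Integer, String>(cknidInt, value);
                }
            });
    }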
Spark: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
Hi Team,

I'm getting the exception below. Could you please help me resolve this issue? Here is my piece of code:

    val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])
    var s = rdd.map(x => x._2)
    var a = test(s.collect)

    def test(s: Array[org.apache.hadoop.hbase.client.Result]) {
      var invRecords = HashMap[String, HashMap[String, String]]()
      s foreach (x => {
        var invValues = HashMap[String, String]()
        x rawCells () foreach (y => {
          invValues += Bytes.toString(y getQualifier) -> Bytes.toString(y getValue)
        })
        invRecords += Bytes.toString(x getRow) -> invValues
      })
      println("* === " + invRecords.size)
    }

java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
    at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:71)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

2014-09-24 20:57:29,703 WARN  [Result resolver thread-0] scheduler.TaskSetManager (Logging.scala:logWarning(70)) - Lost TID 0 (task 0.0:0)
2014-09-24 20:57:29,717 ERROR [Result resolver thread-0] scheduler.TaskSetManager (Logging.scala:logError(74)) - Task 0.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result; not retrying
2014-09-24 20:57:29,722 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Failed to run collect at HBaseTest5.scala:26

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
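The failure comes from s.collect, which has to ship Result objects back to the driver. Doing the extraction inside a map on the executors, so that only plain Strings and HashMaps are collected, sidesteps it. A minimal sketch along those lines, assuming the same conf as above and using CellUtil to read the cells:

    import org.apache.hadoop.hbase.CellUtil
    import org.apache.hadoop.hbase.util.Bytes
    import scala.collection.mutable.HashMap

    // Convert each Result into a serializable HashMap on the executors, so only
    // plain String data ever travels back to the driver.
    val invRecords = rdd.map { case (_, result) =>
      val invValues = HashMap[String, String]()
      result.rawCells().foreach { cell =>
        invValues += Bytes.toString(CellUtil.cloneQualifier(cell)) ->
          Bytes.toString(CellUtil.cloneValue(cell))
      }
      Bytes.toString(result.getRow) -> invValues
    }.collect()

    println("* === " + invRecords.size)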