Re: Needed some best practices to integrate Spark with HBase
I also need good documentation on this, especially integrating PySpark with Hive to read tables stored in HBase.
Spark with HBase on Spark Runtime 2.2.1
I wrote a simple program to read data from HBase. The program works fine on Spark Runtime 1.6 on Cloudera, backed by HDFS, but it does NOT work on EMR with Spark Runtime 2.2.1: I get an exception while testing against data on EMR with S3.

// Spark conf
SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);

// HBase conf
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");

// Submit scan into hbase conf
// conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
conf.set(TableInputFormat.INPUT_TABLE, "mytable");
conf.set(TableInputFormat.SCAN_ROW_START, "startrow");
conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow");

// Get RDD
JavaPairRDD<ImmutableBytesWritable, Result> source = jsc.newAPIHadoopRDD(conf,
    TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

// Process RDD
System.out.println("&&& " + source.count());

The exception on EMR:

18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry
    at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139)
    at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:59)
    at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:51)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 42 more
Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
    at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
    at org.apache.spark.api.java.
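For reference, the commented-out SCAN line above is how a full Scan object is handed to TableInputFormat, which reads it back out of a Base64-encoded configuration property. A minimal sketch, assuming the same table and row keys as the snippet above; the caching values are illustrative only:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanConfSketch {
    // Builds a TableInputFormat configuration carrying a full Scan object.
    public static Configuration buildScanConf() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "mytable");

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("startrow")); // start row is inclusive
        scan.setStopRow(Bytes.toBytes("endrow"));    // stop row is exclusive
        scan.setCaching(500);        // illustrative; larger caching cuts RPC round trips
        scan.setCacheBlocks(false);  // avoid churning the region server block cache on scans

        // TableInputFormat deserializes the Scan from this property.
        conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
        return conf;
    }
}

As for the IllegalAccessError itself: MetricsInfoImpl is package-private, so this error usually means two different hadoop-common/metrics2 versions are mixed on the classpath. Aligning the Hadoop jars on EMR with the ones the HBase client was built against is the usual fix, though I have not verified this on EMR specifically.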
Needed some best practices to integrate Spark with HBase
Dear All, Greetings! I need some best practices for integrating Spark with HBase. Could you point me to some useful resources/URLs at your convenience, please? Thanks, Debu
RE: Spark with HBase Error - Py4JJavaError
Hi Ram,

Thanks very much, it worked.

Puneet

From: ram kumar [mailto:ramkumarro...@gmail.com]
Sent: Thursday, July 07, 2016 6:51 PM
To: Puneet Tripathi
Cc: user@spark.apache.org
Subject: Re: Spark with HBase Error - Py4JJavaError

Hi Puneet,

Have you tried appending --jars $SPARK_HOME/lib/spark-examples-*.jar to the execution command?

Ram
Re: Spark with HBase Error - Py4JJavaError
Hi Puneet,

Have you tried appending --jars $SPARK_HOME/lib/spark-examples-*.jar to the execution command?

Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <puneet.tripa...@dunnhumby.com> wrote:
> Guys, please can anyone help on the issue below?
>
> Puneet
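A note for anyone searching the archives later: the missing converter classes ship in the Spark examples assembly, so the fix above amounts to putting that jar on the driver and executor classpath at submit time, e.g. spark-submit --jars $SPARK_HOME/lib/spark-examples-*.jar your_script.py. The exact jar name and path depend on your Spark build.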
RE: Spark with HBase Error - Py4JJavaError
Guys, please can anyone help on the issue below?

Puneet

From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
Sent: Thursday, July 07, 2016 12:42 PM
To: user@spark.apache.org
Subject: Spark with HBase Error - Py4JJavaError
Spark with HBase Error - Py4JJavaError
Hi,

We are running HBase in fully distributed mode. I tried to connect to HBase via PySpark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed with:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException: org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

I have been able to create pythonconverters.jar and then did the following:

1. Copied it to a location on HDFS; /sparkjars/ seems as good a directory to create as any. I think the file has to be world-readable.
2. Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. hdfs:///sparkjars.

It still doesn't seem to work. Can someone please help me with this?

Regards,
Puneet
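For reference, the converter classes are only needed on the PySpark side; the Java API writes Puts directly and sidesteps this ClassNotFoundException entirely. A minimal sketch, assuming a table "mytable" with column family "cf" already exists; the table, family, and values here are made up for illustration, and Put.addColumn is the HBase 1.x name (older clients use Put.add):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class HBaseWriteSketch {
    public static void main(String[] args) throws Exception {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("hbase-write-sketch"));

        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable"); // hypothetical table

        // saveAsNewAPIHadoopDataset expects a Job configuration carrying the output format.
        Job job = Job.getInstance(conf);
        job.setOutputFormatClass(TableOutputFormat.class);

        JavaPairRDD<ImmutableBytesWritable, Put> puts = jsc
            .parallelize(Arrays.asList("row1", "row2"))
            .mapToPair(row -> {
                Put put = new Put(Bytes.toBytes(row));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                              Bytes.toBytes("value-" + row));
                return new Tuple2<>(new ImmutableBytesWritable(), put);
            });

        puts.saveAsNewAPIHadoopDataset(job.getConfiguration());
        jsc.stop();
    }
}

The key handed to TableOutputFormat is ignored on write, which is why an empty ImmutableBytesWritable suffices.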
Re: Spark with HBase
In case you are still looking for help: there have been multiple discussions on this mailing list that you can try searching for. Or you can simply use https://github.com/unicredit/hbase-rdd :-)

Thanks,
Aniket

On Wed Dec 03 2014 at 16:11:47 Ted Yu wrote:
> Which hbase release are you running?
> If it is 0.98, take a look at:
>
> https://issues.apache.org/jira/browse/SPARK-1297
>
> Thanks
Re: Spark with HBase
Which hbase release are you running? If it is 0.98, take a look at:

https://issues.apache.org/jira/browse/SPARK-1297

Thanks

On Dec 2, 2014, at 10:21 PM, Jai wrote:
> I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase cluster and I am
> looking for some links regarding the same. Can someone please guide me through the steps
> to accomplish this? Thanks a lot for helping.
Re: Spark with HBase
You could go through these to start with:

http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase
http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark

Thanks
Best Regards

On Wed, Dec 3, 2014 at 11:51 AM, Jai wrote:
> I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase cluster and I am
> looking for some links regarding the same. Can someone please guide me through the steps
> to accomplish this? Thanks a lot for helping.
Spark with HBase
I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase cluster and I am looking for some links regarding the same. Can someone please guide me through the steps to accomplish this? Thanks a lot for helping.
Re: Spark with HBase
These two posts should be good for setting up a Spark + HBase environment and using the results of an HBase table scan as an RDD.

Settings: http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html
Some samples: http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html
Re: Spark with HBase
You can download and compile Spark against your existing Hadoop version. Here's a quick start: https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types

You can also read a bit here (the version is quite old): http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster

Attached is a piece of code (Spark Java API) to connect to HBase.

Thanks
Best Regards

On Thu, Aug 7, 2014 at 1:48 PM, Deepa Jayaveer wrote:
> Hi
> I read your white paper about " ". We wanted to do a proof of concept on Spark with HBase.
> Documents are not much available for setting up a Spark cluster in a Hadoop 2 environment.
> If you have any, can you please give us some reference URLs?
> Also, some sample program to connect to HBase using the Spark Java API.
>
> Thanks
> Deepa

import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.NewHadoopRDD;

import com.google.common.collect.Lists;

import scala.Tuple2;

public class SparkHBaseMain {

    public static void main(String[] args) {
        try {
            List<String> jars = Lists.newArrayList(
                "/home/akhld/Desktop/tools/spark-9/jars/spark-assembly-0.9.0-incubating-hadoop2.3.0-mr1-cdh5.0.0.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-server-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-protocol-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-hadoop2-compat-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-common-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/hbase-client-0.96.0-hadoop2.jar",
                "/home/akhld/Downloads/sparkhbasecode/htrace-core-2.02.jar");

            SparkConf spconf = new SparkConf();
            spconf.setMaster("local");
            spconf.setAppName("SparkHBase");
            spconf.setSparkHome("/home/akhld/Desktop/tools/spark-9");
            spconf.setJars(jars.toArray(new String[jars.size()]));
            spconf.set("spark.executor.memory", "1g");

            final JavaSparkContext sc = new JavaSparkContext(spconf);

            org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();
            conf.addResource("/home/akhld/Downloads/sparkhbasecode/hbase-site.xml");
            conf.set(TableInputFormat.INPUT_TABLE, "blogposts");

            NewHadoopRDD<ImmutableBytesWritable, Result> rdd =
                new NewHadoopRDD<ImmutableBytesWritable, Result>(
                    JavaSparkContext.toSparkContext(sc),
                    TableInputFormat.class,
                    ImmutableBytesWritable.class,
                    Result.class, conf);

            JavaRDD<Tuple2<ImmutableBytesWritable, Result>> jrdd = rdd.toJavaRDD();

            ForEachFunction f = new ForEachFunction();
            JavaRDD<Iterator<String>> retrdd = jrdd.map(f);

            System.out.println("Count => " + retrdd.count());
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Crashed: " + e);
        }
    }

    @SuppressWarnings("serial")
    private static class ForEachFunction
            extends Function<Tuple2<ImmutableBytesWritable, Result>, Iterator<String>> {
        public Iterator<String> call(Tuple2<ImmutableBytesWritable, Result> test) {
            Result tmp = test._2;
            // getColumn is the old (0.94/0.96) accessor; newer clients use getColumnCells.
            List<KeyValue> kvl = tmp.getColumn("post".getBytes(), "title".getBytes());
            for (KeyValue kl : kvl) {
                String sb = new String(kl.getValue());
                System.out.println("Value: " + sb);
            }
            return null;
        }
    }
}
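For anyone adapting the example above to a newer Spark: the same read can be expressed through JavaSparkContext.newAPIHadoopRDD without constructing NewHadoopRDD by hand. A minimal sketch, assuming the same hypothetical "blogposts" table and an hbase-site.xml on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkHBaseRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("spark-hbase-read"));

        // Picks up hbase-site.xml from the classpath if present.
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "blogposts"); // hypothetical table name

        // newAPIHadoopRDD wires TableInputFormat up for us and yields (rowkey, Result) pairs.
        JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
            conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        System.out.println("Count => " + rows.count());
        sc.stop();
    }
}

This also removes the need for the Tuple2-unwrapping ForEachFunction, since the pair RDD exposes key and value directly.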
Spark with HBase
Hi,
I read your white paper about " ". We wanted to do a proof of concept on Spark with HBase. Documents are not much available for setting up a Spark cluster in a Hadoop 2 environment. If you have any, can you please give us some reference URLs? Also, some sample program to connect to HBase using the Spark Java API.

Thanks
Deepa
Use Spark with HBase's HFileOutputFormat
Hi,

I want to use Spark with HBase and I'm confused about how to ingest my data using HBase's HFileOutputFormat. It recommends calling configureIncrementalLoad, which does the following:

- Inspects the table to configure a total order partitioner
- Uploads the partitions file to the cluster and adds it to the DistributedCache
- Sets the number of reduce tasks to match the current number of regions
- Sets the output key/value class to match HFileOutputFormat2's requirements
- Sets the reducer up to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)

But in Spark, it seems I have to do the sorting and partitioning myself, right? Can anyone show me how to do it properly? Is there a better way to ingest data into HBase quickly from Spark?

Cheers,
--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
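In case it helps later readers searching the archives: one workable pattern is to sort the cells yourself, write HFiles with saveAsNewAPIHadoopFile, then hand the output directory to LoadIncrementalHFiles. This is a rough sketch only, against 0.98-era APIs (HTable and the configureIncrementalLoad signature moved around in later releases); the table name, column family, values, and staging path are made up:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        JavaSparkContext jsc = new JavaSparkContext(
            new SparkConf().setAppName("hbase-bulkload-sketch"));

        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical, pre-created table
        Job job = Job.getInstance(conf);
        // Reuse HBase's own job setup for the output key/value classes.
        HFileOutputFormat2.configureIncrementalLoad(job, table);

        // HFiles must be written in total row-key order, so sort before writing.
        // Build the HBase types only after the shuffle: KeyValue and
        // ImmutableBytesWritable are not Java-serializable, so they should not
        // cross a stage boundary with the default serializer.
        JavaPairRDD<ImmutableBytesWritable, KeyValue> cells = jsc
            .parallelizePairs(Arrays.asList(
                new Tuple2<>("row2", "v2"), new Tuple2<>("row1", "v1")))
            .sortByKey()
            .mapToPair(t -> new Tuple2<>(
                new ImmutableBytesWritable(Bytes.toBytes(t._1)),
                new KeyValue(Bytes.toBytes(t._1), Bytes.toBytes("cf"),
                             Bytes.toBytes("col"), Bytes.toBytes(t._2))));

        String staging = "/tmp/hfiles";               // made-up staging directory
        cells.saveAsNewAPIHadoopFile(staging, ImmutableBytesWritable.class,
            KeyValue.class, HFileOutputFormat2.class, job.getConfiguration());

        // Move the finished HFiles into the table's regions.
        new LoadIncrementalHFiles(conf).doBulkLoad(new Path(staging), table);
        jsc.stop();
    }
}

As far as I understand, LoadIncrementalHFiles splits any HFile that crosses a region boundary, so a global sort is enough for correctness; partitioning by region boundaries (what the total order partitioner does in MapReduce) is a performance optimization. With multiple cells per row you also need them ordered by family and qualifier within each row, which is what KeyValueSortReducer handles on the MapReduce side.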
Re: Spark with HBase
Hi,

I met this issue before. The reason is that the HBase client used by Spark is 0.94.6, while your server is 0.96.1.1. To fix this issue, you could choose one of these ways:

a) Deploy an HBase cluster with version 0.94.6.
b) Rebuild the Spark code:
   step 1: modify the hbase version in pom.xml to 0.96.1.1
   step 2: modify the hbase artifactId in example/pom.xml to hbase-it
   step 3: use maven to build Spark again
c) Try adding the HBase jars to SPARK_CLASSPATH (I did not try this way before).

2014-07-04 1:19 GMT-07:00 N.Venkata Naga Ravi:
> Hi,
>
> Any update on the solution? We are still facing this issue...
> We were able to connect to HBase with independent code, but are getting this issue with
> the Spark integration.
>
> Thx,
> Ravi
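For what it's worth, the Spark 1.0.x tree exposes the HBase version as a Maven property, so step 1 may reduce to passing it on the command line, e.g. mvn -Dhbase.version=0.96.1.1 -DskipTests clean package. I have not verified this against every branch, so check your pom.xml first.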
RE: Spark with HBase
Hi,

Any update on the solution? We are still facing this issue...
We were able to connect to HBase with independent code, but are getting this issue with the Spark integration.

Thx,
Ravi

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org; user@spark.apache.org
Subject: RE: Spark with HBase
Date: Sun, 29 Jun 2014 15:32:42 +0530

+user@spark.apache.org

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Spark with HBase
Date: Sun, 29 Jun 2014 15:28:43 +0530
RE: Spark with HBase
+user@spark.apache.org

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Spark with HBase
Date: Sun, 29 Jun 2014 15:28:43 +0530
Spark with HBase
I am using the following versions:

spark-1.0.0-bin-hadoop2
hbase-0.96.1.1-hadoop2

When executing the HBase test, I am facing the following exception. It looks like some version incompatibility; can you please help with it?

NERAVI-M-70HY:spark-1.0.0-bin-hadoop2 neravi$ ./bin/run-example org.apache.spark.examples.HBaseTest local localhost:4040 test

14/06/29 15:14:14 INFO RecoverableZooKeeper: The identifier of this process is 69...@neravi-m-70hy.cisco.com
14/06/29 15:14:14 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
14/06/29 15:14:14 INFO ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
14/06/29 15:14:14 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x146e6fa10750009, negotiated timeout = 4
Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
192.168.1.6�(
    at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
    at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
    at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
    at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
    at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
    at org.apache.spark.examples.HBaseTest$.main(HBaseTest.scala:37)
    at org.apache.spark.examples.HBaseTest.main(HBaseTest.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks,
Ravi
Re: Problem using Spark with Hbase
Thanks Mayur for the reply.

Actually, the issue was that I was running the Spark application on hadoop-2.2.0, where the HBase version was 0.95.2, but Spark by default gets built against an older HBase version. So I had to build Spark again with the HBase version set to 0.95.2 in the Spark build file, and it worked.

Thanks,
-Vibhor

On Wed, May 28, 2014 at 11:34 PM, Mayur Rustagi wrote:
> Try this..
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>

--
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore
Re: Problem using Spark with Hbase
Try this..

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Wed, May 28, 2014 at 7:40 PM, Vibhor Banga wrote:
> Anyone who has used Spark this way or has faced a similar issue, please help.
>
> Thanks,
> -Vibhor

SparkHBaseMain.java
Description: Binary data
Re: Problem using Spark with Hbase
Anyone who has used Spark this way or has faced a similar issue, please help.

Thanks,
-Vibhor

On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga wrote:
> Hi all,
>
> I am facing issues while using Spark with HBase. I am getting a NullPointerException at
> org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288).
>
> Can someone please help to resolve this issue? What am I missing?
Problem using Spark with Hbase
Hi all,

I am facing issues while using Spark with HBase. I am getting a NullPointerException at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288).

Can someone please help to resolve this issue? What am I missing?

I am using the following snippet of code:

Configuration config = HBaseConfiguration.create();

config.set("hbase.zookeeper.znode.parent", "hostname1");
config.set("hbase.zookeeper.quorum", "hostname1");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("hbase.master", "hostname1:
config.set("fs.defaultFS", "hdfs://hostname1/");
config.set("dfs.namenode.rpc-address", "hostname1:8020");

config.set(TableInputFormat.INPUT_TABLE, "tableName");

JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
    System.getenv(sparkHome), JavaSparkContext.jarOfClass(Simple.class));

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = ctx.newAPIHadoopRDD(
    config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();

But when I go to the Spark cluster and check the logs, I see the following error:

INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
    at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
-Vibhor