Re: Needed some best practices to integrate Spark with HBase

2020-07-20 Thread YogeshGovi
I also need good docs on this, especially on integrating pyspark with Hive and
reading tables from HBase.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark with HBase on Spark Runtime 2.2.1

2018-05-05 Thread SparkUser6
I wrote a simple program to read data from HBase. The program works fine in
Cloudera backed by HDFS, on Spark runtime 1.6. But it does NOT work on EMR
with Spark runtime 2.2.1.

I get the following exception while testing against data on EMR with S3.

// Spark conf
SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);

// HBase conf
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.client.port", "2181");

// Submit scan into hbase conf
// conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

conf.set(TableInputFormat.INPUT_TABLE, "mytable");
conf.set(TableInputFormat.SCAN_ROW_START, "startrow");
conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow");

// Get RDD
JavaPairRDD<ImmutableBytesWritable, Result> source =
    jsc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);

// Process RDD
System.out.println("&&& " + source.count());



18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry
        at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:59)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:51)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 42 more

Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
        at org.apache.spark.api.java.
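A note on the trace above: a java.lang.IllegalAccessError between org.apache.hadoop.metrics2.lib.MetricsInfoImpl and DynamicMetricsRegistry usually points to mixed HBase/Hadoop jars on the classpath — for example, an hbase-hadoop2-compat jar built against a different Hadoop release than the one EMR ships. One hedged way to rule that out is to ship a single, mutually consistent set of HBase client jars with the job; the class name, jar paths, and versions below are illustrative, not from the original post:

```shell
# Illustrative sketch: pass one consistent set of HBase client jars to the job
# rather than relying on whatever mix is already on the EMR classpath.
# All jar versions must come from the same HBase release.
spark-submit \
  --master yarn \
  --class com.example.MyHBaseReader \
  --jars /usr/lib/hbase/lib/hbase-client-1.3.1.jar,\
/usr/lib/hbase/lib/hbase-common-1.3.1.jar,\
/usr/lib/hbase/lib/hbase-server-1.3.1.jar,\
/usr/lib/hbase/lib/hbase-protocol-1.3.1.jar,\
/usr/lib/hbase/lib/hbase-hadoop-compat-1.3.1.jar,\
/usr/lib/hbase/lib/hbase-hadoop2-compat-1.3.1.jar \
  my-hbase-app.jar
```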

Needed some best practices to integrate Spark with HBase

2017-09-29 Thread Debabrata Ghosh
Dear All,
 Greetings!

 I need some best practices for integrating Spark
with HBase. Would you be able to point me to some useful resources / URLs
at your convenience, please?

Thanks,

Debu


RE: Spark with HBase Error - Py4JJavaError

2016-07-08 Thread Puneet Tripathi
Hi Ram, thanks very much, it worked.

Puneet

From: ram kumar [mailto:ramkumarro...@gmail.com]
Sent: Thursday, July 07, 2016 6:51 PM
To: Puneet Tripathi
Cc: user@spark.apache.org
Subject: Re: Spark with HBase Error - Py4JJavaError

Hi Puneet,
Have you tried appending
 --jars $SPARK_HOME/lib/spark-examples-*.jar
to the execution command?
Ram
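
Spelled out, the fix Ram suggests is to ship the Spark examples jar (which contains the pythonconverters classes) with the job via --jars. A hedged sketch follows; the exact jar file name varies by Spark build, and my_hbase_writer.py is a placeholder script name:

```shell
# The missing class org.apache.spark.examples.pythonconverters.* lives in the
# Spark examples jar, so pass that jar to pyspark / spark-submit explicitly.
pyspark --jars $SPARK_HOME/lib/spark-examples-*.jar

# or, for a standalone script (script name is a placeholder):
spark-submit --jars $SPARK_HOME/lib/spark-examples-*.jar my_hbase_writer.py
```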

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi
<puneet.tripa...@dunnhumby.com> wrote:
Guys, Please can anyone help on the issue below?

Puneet

From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
Sent: Thursday, July 07, 2016 12:42 PM
To: user@spark.apache.org
Subject: Spark with HBase Error - Py4JJavaError

Hi,

We are running HBase in fully distributed mode. I tried to connect to HBase via 
pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed; 
the error says:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException: 
org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
I have been able to create pythonconverters.jar and then did the below:


1.  I think we have to copy this to a location on HDFS; /sparkjars/ seems as 
good a directory to create as any. I think the file has to be world-readable.

2.  Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. 
hdfs:///sparkjars

It still doesn't seem to work. Can someone please help me with this?

Regards,
Puneet
dunnhumby limited is a limited company registered in England and Wales with 
registered number 02388853 and VAT registered number 927 5871 83. Our 
registered office is at Aurora House, 71-75 Uxbridge Road, London W5 5SL. The 
contents of this message and any attachments to it are confidential and may be 
legally privileged. If you have received this message in error you should 
delete it from your system immediately and advise the sender. dunnhumby may 
monitor and record all emails. The views expressed in this email are those of 
the sender and not those of dunnhumby.


Re: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread ram kumar
Hi Puneet,

Have you tried appending
 --jars $SPARK_HOME/lib/spark-examples-*.jar
to the execution command?

Ram

On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi <
puneet.tripa...@dunnhumby.com> wrote:

> Guys, Please can anyone help on the issue below?
>
>
>
> Puneet
>
>
>
> *From:* Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
> *Sent:* Thursday, July 07, 2016 12:42 PM
> *To:* user@spark.apache.org
> *Subject:* Spark with HBase Error - Py4JJavaError
>
>
>
> Hi,
>
>
>
> We are running Hbase in fully distributed mode. I tried to connect to
> Hbase via pyspark and then write to hbase using *saveAsNewAPIHadoopDataset
> *, but it failed the error says:
>
>
>
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
>
> : java.lang.ClassNotFoundException:
> org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>
> I have been able to create pythonconverters.jar and then did below:
>
>
>
> 1.  I think we have to copy this to a location on HDFS, /sparkjars/
> seems a good a directory to create as any. I think the file has to be world
> readable
>
> 2.  Set the spark_jar_hdfs_path property in Cloudera Manager e.g.
> hdfs:///sparkjars
>
>
>
> It still doesn’t seem to work can someone please help me with this.
>
>
>
> Regards,
>
> Puneet
>


RE: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Guys, Please can anyone help on the issue below?

Puneet

From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com]
Sent: Thursday, July 07, 2016 12:42 PM
To: user@spark.apache.org
Subject: Spark with HBase Error - Py4JJavaError

Hi,

We are running HBase in fully distributed mode. I tried to connect to HBase via 
pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed; 
the error says:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException: 
org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
I have been able to create pythonconverters.jar and then did the below:


1.  I think we have to copy this to a location on HDFS; /sparkjars/ seems as 
good a directory to create as any. I think the file has to be world-readable.

2.  Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. 
hdfs:///sparkjars

It still doesn't seem to work. Can someone please help me with this?

Regards,
Puneet


Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Hi,

We are running HBase in fully distributed mode. I tried to connect to HBase via 
pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed; 
the error says:

Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException: 
org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
I have been able to create pythonconverters.jar and then did the below:


1.  I think we have to copy this to a location on HDFS; /sparkjars/ seems as 
good a directory to create as any. I think the file has to be world-readable.

2.  Set the spark_jar_hdfs_path property in Cloudera Manager, e.g. 
hdfs:///sparkjars

It still doesn't seem to work. Can someone please help me with this?

Regards,
Puneet


Re: Spark with HBase

2014-12-15 Thread Aniket Bhatnagar
In case you are still looking for help, there have been multiple discussions
on this mailing list that you can try searching for. Or you can simply use
https://github.com/unicredit/hbase-rdd :-)

Thanks,
Aniket

On Wed Dec 03 2014 at 16:11:47 Ted Yu  wrote:

> Which hbase release are you running ?
> If it is 0.98, take a look at:
>
> https://issues.apache.org/jira/browse/SPARK-1297
>
> Thanks
>
> On Dec 2, 2014, at 10:21 PM, Jai  wrote:
>
> I am trying to use Apache Spark with a pseudo distributed Hadoop Hbase
> Cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this. Thanks a lot for
> Helping
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


Re: Spark with HBase

2014-12-03 Thread Ted Yu
Which hbase release are you running ?
If it is 0.98, take a look at:

https://issues.apache.org/jira/browse/SPARK-1297

Thanks

On Dec 2, 2014, at 10:21 PM, Jai  wrote:

> I am trying to use Apache Spark with a pseudo distributed Hadoop Hbase
> Cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this. Thanks a lot for
> Helping
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> 


Re: Spark with HBase

2014-12-03 Thread Akhil Das
You could go through these to start with

http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase

http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark

Thanks
Best Regards

On Wed, Dec 3, 2014 at 11:51 AM, Jai  wrote:

> I am trying to use Apache Spark with a pseudo distributed Hadoop Hbase
> Cluster and I am looking for some links regarding the same. Can someone
> please guide me through the steps to accomplish this. Thanks a lot for
> Helping
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
>


Spark with HBase

2014-12-02 Thread Jai
I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase
cluster and I am looking for some links regarding the same. Can someone
please guide me through the steps to accomplish this? Thanks a lot for
helping.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp20226.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark with HBase

2014-08-07 Thread chutium
These two posts should be good for setting up a Spark+HBase environment and
using the results of an HBase table scan as an RDD:

settings
http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html

some samples:
http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-with-HBase-tp11629p11647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark with HBase

2014-08-07 Thread Akhil Das
You can download and compile Spark against your existing Hadoop version.

Here's a quick start
https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types

You can also read a bit here
http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster
(the version is quite old)

Attached is a piece of Code (Spark Java API) to connect to HBase.



Thanks
Best Regards


On Thu, Aug 7, 2014 at 1:48 PM, Deepa Jayaveer 
wrote:

> Hi
> I read your white paper about " " . We wanted to do a Proof of Concept on
> Spark with HBase. Documents
> are not much available to set up the spark cluster  in Hadoop 2
> environment. If you have any,
> can you please give us some reference URLs
> Also, some sample program to connect to HBase using Spark Java API
>
> Thanks
> Deepa
>
> =-=-=
> Notice: The information contained in this e-mail
> message and/or attachments to it may contain
> confidential or privileged information. If you are
> not the intended recipient, any dissemination, use,
> review, distribution, printing or copying of the
> information contained in this e-mail message
> and/or attachments to it are strictly prohibited. If
> you have received this communication in error,
> please notify us by reply e-mail or telephone and
> immediately and permanently delete the message
> and any attachments. Thank you
>
>
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.NewHadoopRDD;

import com.google.common.collect.Lists;

import scala.Tuple2;

public class SparkHBaseMain {

	@SuppressWarnings("deprecation")
	public static void main(String[] arg) {

		try {

			// Jars shipped to the executors; adjust the paths to your environment.
			List<String> jars = Lists.newArrayList(
	"/home/akhld/Desktop/tools/spark-9/jars/spark-assembly-0.9.0-incubating-hadoop2.3.0-mr1-cdh5.0.0.jar",
	"/home/akhld/Downloads/sparkhbasecode/hbase-server-0.96.0-hadoop2.jar",
	"/home/akhld/Downloads/sparkhbasecode/hbase-protocol-0.96.0-hadoop2.jar",
	"/home/akhld/Downloads/sparkhbasecode/hbase-hadoop2-compat-0.96.0-hadoop2.jar",
	"/home/akhld/Downloads/sparkhbasecode/hbase-common-0.96.0-hadoop2.jar",
	"/home/akhld/Downloads/sparkhbasecode/hbase-client-0.96.0-hadoop2.jar",
	"/home/akhld/Downloads/sparkhbasecode/htrace-core-2.02.jar");

			SparkConf spconf = new SparkConf();
			spconf.setMaster("local");
			spconf.setAppName("SparkHBase");
			spconf.setSparkHome("/home/akhld/Desktop/tools/spark-9");
			spconf.setJars(jars.toArray(new String[jars.size()]));
			spconf.set("spark.executor.memory", "1g");

			final JavaSparkContext sc = new JavaSparkContext(spconf);

			// HBase configuration: point at hbase-site.xml and the input table.
			org.apache.hadoop.conf.Configuration conf = HBaseConfiguration.create();
			conf.addResource("/home/akhld/Downloads/sparkhbasecode/hbase-site.xml");
			conf.set(TableInputFormat.INPUT_TABLE, "blogposts");

			// Read the table as an RDD of (row key, row result) pairs.
			NewHadoopRDD<ImmutableBytesWritable, Result> rdd =
					new NewHadoopRDD<ImmutableBytesWritable, Result>(
							JavaSparkContext.toSparkContext(sc),
							TableInputFormat.class,
							ImmutableBytesWritable.class, Result.class, conf);

			JavaRDD<Tuple2<ImmutableBytesWritable, Result>> jrdd = rdd.toJavaRDD();

			ForEachFunction f = new ForEachFunction();
			JavaRDD<Iterator<String>> retrdd = jrdd.map(f);
			System.out.println("Count => " + retrdd.count());

		} catch (Exception e) {
			e.printStackTrace();
			System.out.println("Crashed: " + e);
		}
	}

	@SuppressWarnings("serial")
	private static class ForEachFunction
			extends Function<Tuple2<ImmutableBytesWritable, Result>, Iterator<String>> {
		public Iterator<String> call(Tuple2<ImmutableBytesWritable, Result> test) {
			// Print the "post:title" column of every row.
			Result tmp = test._2;
			List<KeyValue> kvl = tmp.getColumn("post".getBytes(), "title".getBytes());
			for (KeyValue kl : kvl) {
				String sb = new String(kl.getValue());
				System.out.println("Value :" + sb);
			}
			return null;
		}
	}
}


Spark with HBase

2014-08-07 Thread Deepa Jayaveer
Hi,
I read your white paper about " ". We wanted to do a proof of concept on
Spark with HBase. Documentation on setting up a Spark cluster in a Hadoop 2
environment is scarce. If you have any,
can you please give us some reference URLs?
Also, a sample program to connect to HBase using the Spark Java API would be helpful.

Thanks,
Deepa





Use Spark with HBase' HFileOutputFormat

2014-07-16 Thread Jianshi Huang
Hi,

I want to use Spark with HBase and I'm confused about how to ingest my data
using HBase' HFileOutputFormat. It recommends calling
configureIncrementalLoad which does the following:

   - Inspects the table to configure a total order partitioner
   - Uploads the partitions file to the cluster and adds it to the
   DistributedCache
   - Sets the number of reduce tasks to match the current number of regions
   - Sets the output key/value class to match HFileOutputFormat2's
   requirements
   - Sets the reducer up to perform the appropriate sorting (either
   KeyValueSortReducer or PutSortReducer)

But in Spark, it seems I have to do the sorting and partitioning myself, right?

Can anyone show me how to do it properly? Is there a better way to ingest
data fast to HBase from Spark?

Cheers,
-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
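
On the question above: yes — when writing HFiles from Spark, you have to reproduce yourself what configureIncrementalLoad sets up for MapReduce: partition the (row key, KeyValue) pairs along region boundaries and sort within each partition before handing them to HFileOutputFormat2. Once the HFiles are on HDFS, they are typically loaded with the HBase bulk-load tool. A hedged sketch of that final step, with an illustrative output path and table name:

```shell
# Illustrative: load Spark-written HFiles into an existing table.
# The HFiles must already be sorted by row key within region-aligned partitions.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/hfiles mytable
```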


Re: Spark with HBase

2014-07-04 Thread 田毅
Hi, I met this issue before.

The reason is that the HBase client used in Spark is 0.94.6, while your server is
0.96.1.1.

To fix this issue, you could choose one of these ways:

a) deploy an HBase cluster with version 0.94.6
b) rebuild the Spark code:
step 1: modify the hbase version in pom.xml to 0.96.1.1
step 2: modify the hbase artifactId in example/pom.xml to hbase-it
step 3: use Maven to build Spark again
c) try to add the HBase jars to SPARK_CLASSPATH (I did not try this way before)
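
Option (c) above might look like the following. It is untested, as noted, and the jar paths and versions are illustrative — they must match the server release (0.96.1.1 here):

```shell
# Illustrative, untested (option c): prepend matching HBase client jars to
# Spark's classpath so the driver and executors use the 0.96-series client.
export SPARK_CLASSPATH=/opt/hbase/lib/hbase-client-0.96.1.1-hadoop2.jar:\
/opt/hbase/lib/hbase-common-0.96.1.1-hadoop2.jar:\
/opt/hbase/lib/hbase-protocol-0.96.1.1-hadoop2.jar
./bin/run-example org.apache.spark.examples.HBaseTest local localhost:4040 test
```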


2014-07-04 1:19 GMT-07:00 N.Venkata Naga Ravi :

> Hi,
>
> Any update on the solution? We are still facing this issue...
> We could connect to HBase with standalone code, but get this issue
> with the Spark integration.
>
> Thx,
> Ravi
>
> --
> From: nvn_r...@hotmail.com
> To: u...@spark.incubator.apache.org; user@spark.apache.org
> Subject: RE: Spark with HBase
> Date: Sun, 29 Jun 2014 15:32:42 +0530
>
> +user@spark.apache.org
>
> --
> From: nvn_r...@hotmail.com
> To: u...@spark.incubator.apache.org
> Subject: Spark with HBase
> Date: Sun, 29 Jun 2014 15:28:43 +0530
>
> I am using the following versions:
>
> *spark-1.0.0*-bin-hadoop2
> *hbase-0.96.1.1*-hadoop2
>
>
> When executing the HBase test, I am facing the following exception. It looks
> like some version incompatibility; can you please help with it?
>
> NERAVI-M-70HY:spark-1.0.0-bin-hadoop2 neravi$ ./bin/run-example
> org.apache.spark.examples.HBaseTest local localhost:4040 test
>
>
>
> 14/06/29 15:14:14 INFO RecoverableZooKeeper: The identifier of this
> process is 69...@neravi-m-70hy.cisco.com
> 14/06/29 15:14:14 INFO ClientCnxn: Opening socket connection to server
> localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL
> (unknown error)
> 14/06/29 15:14:14 INFO ClientCnxn: Socket connection established to
> localhost/0:0:0:0:0:0:0:1:2181, initiating session
> 14/06/29 15:14:14 INFO ClientCnxn: Session establishment complete on
> server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x146e6fa10750009,
> negotiated timeout = 4
> Exception in thread "main" java.lang.IllegalArgumentException: Not a
> host:port pair: PBUF
>
>
> 192.168.1.6�(
> at
> org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
> at
> org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
> at
> org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
> at
> org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
> at org.apache.spark.examples.HBaseTest$.main(HBaseTest.scala:37)
> at org.apache.spark.examples.HBaseTest.main(HBaseTest.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
> Thanks,
> Ravi
>


RE: Spark with HBase

2014-07-04 Thread N . Venkata Naga Ravi
Hi,

Any update on the solution? We are still facing this issue...
We could connect to HBase with standalone code, but get this issue with the
Spark integration.

Thx,
Ravi

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org; user@spark.apache.org
Subject: RE: Spark with HBase
Date: Sun, 29 Jun 2014 15:32:42 +0530




+user@spark.apache.org

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Spark with HBase
Date: Sun, 29 Jun 2014 15:28:43 +0530




I am using the following versions:

spark-1.0.0-bin-hadoop2
hbase-0.96.1.1-hadoop2


When executing the HBase test, I am facing the following exception. It looks like 
some version incompatibility; can you please help with it?

NERAVI-M-70HY:spark-1.0.0-bin-hadoop2 neravi$ ./bin/run-example 
org.apache.spark.examples.HBaseTest local localhost:4040 test



14/06/29 15:14:14 INFO RecoverableZooKeeper: The identifier of this process is 
69...@neravi-m-70hy.cisco.com
14/06/29 15:14:14 INFO ClientCnxn: Opening socket connection to server 
localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL 
(unknown error)
14/06/29 15:14:14 INFO ClientCnxn: Socket connection established to 
localhost/0:0:0:0:0:0:0:1:2181, initiating session
14/06/29 15:14:14 INFO ClientCnxn: Session establishment complete on server 
localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x146e6fa10750009, negotiated 
timeout = 4
Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port 
pair: PBUF


192.168.1.6�(
at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
at 
org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
at 
org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
at 
org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
at org.apache.spark.examples.HBaseTest$.main(HBaseTest.scala:37)
at org.apache.spark.examples.HBaseTest.main(HBaseTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Thanks,
Ravi

  

RE: Spark with HBase

2014-06-29 Thread N . Venkata Naga Ravi
+user@spark.apache.org

From: nvn_r...@hotmail.com
To: u...@spark.incubator.apache.org
Subject: Spark with HBase
Date: Sun, 29 Jun 2014 15:28:43 +0530




I am using the following versions:

spark-1.0.0-bin-hadoop2
hbase-0.96.1.1-hadoop2


When executing the HBase test, I am facing the following exception. It looks like a
version incompatibility; can you please help with it?


Thanks,
Ravi

  

Spark with HBase

2014-06-29 Thread N . Venkata Naga Ravi
I am using the following versions:

spark-1.0.0-bin-hadoop2
hbase-0.96.1.1-hadoop2


When executing the HBase test, I am facing the following exception. It looks like a
version incompatibility; can you please help with it?

NERAVI-M-70HY:spark-1.0.0-bin-hadoop2 neravi$ ./bin/run-example 
org.apache.spark.examples.HBaseTest local localhost:4040 test



14/06/29 15:14:14 INFO RecoverableZooKeeper: The identifier of this process is 69...@neravi-m-70hy.cisco.com
14/06/29 15:14:14 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
14/06/29 15:14:14 INFO ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
14/06/29 15:14:14 INFO ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x146e6fa10750009, negotiated timeout = 4
Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF

192.168.1.6�(
	at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
	at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
	at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
	at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
	at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
	at org.apache.spark.examples.HBaseTest$.main(HBaseTest.scala:37)
	at org.apache.spark.examples.HBaseTest.main(HBaseTest.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


Thanks,
Ravi
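[Editor's note] The "Not a host:port pair: PBUF" message above is the usual symptom of an HBase client/server mismatch: HBase 0.96+ stores the master address in ZooKeeper as a protobuf blob prefixed with the magic bytes "PBUF", while older (0.94-era) client code, such as what a default Spark build of that period bundled, expects a plain host:port string in the znode. A simplified sketch of why the old parse fails (hypothetical stand-in code, not the real org.apache.hadoop.hbase.util.Addressing source):

```scala
// Simplified model of how a pre-0.96 HBase client interprets the master znode.
// Hypothetical illustration only; the real parsing lives in
// org.apache.hadoop.hbase.util.Addressing and ServerName.
object HostPortParse {
  def parseHostname(hostAndPort: String): String = {
    val colonIndex = hostAndPort.lastIndexOf(':')
    if (colonIndex < 0)
      throw new IllegalArgumentException("Not a host:port pair: " + hostAndPort)
    hostAndPort.substring(0, colonIndex)
  }

  def main(args: Array[String]): Unit = {
    // Old-style znode data is a plain host:port string and parses fine:
    println(parseHostname("master1:60000"))  // prints "master1"

    // A 0.96 znode starts with the magic bytes "PBUF" followed by binary
    // protobuf; there is no host:port layout, so the old parser throws:
    try parseHostname("PBUF")  // stand-in for "PBUF" + protobuf bytes
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

So the error is not a network problem; the client library simply predates the znode format it is reading, which is why matching the client jars to the cluster's HBase version (as in the thread below) resolves it.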
  

Re: Problem using Spark with Hbase

2014-05-30 Thread Vibhor Banga
Thanks Mayur for the reply.

Actually, the issue was that I was running the Spark application on hadoop-2.2.0,
where the HBase version was 0.95.2.

But Spark by default is built against an older HBase version, so I had to rebuild
Spark with the HBase version set to 0.95.2 in the Spark build file. And it
worked.

Thanks,
-Vibhor
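[Editor's note] For readers hitting the same mismatch: in Spark of that era the HBase version for the examples build was pinned in the SBT build definition, so "rebuild with the cluster's HBase version" amounted to overriding one constant and re-running the assembly. The fragment below is a hypothetical sketch, not the actual project/SparkBuild.scala; the constant name and dependency wiring differ between Spark releases:

```scala
// Hypothetical fragment of project/SparkBuild.scala (Spark 1.0.x-era SBT build).
// Set the HBase version to whatever the cluster actually runs -- here 0.95.2,
// matching the hadoop-2.2.0 deployment described above.
val HBASE_VERSION = "0.95.2"  // default was an older 0.94.x line

// The examples project then pulls in that client version:
libraryDependencies += "org.apache.hbase" % "hbase" % HBASE_VERSION
```

After changing the version, rebuilding the assembly jar makes the bundled HBase client speak the same wire/znode format as the cluster.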




-- 
Vibhor Banga
Software Development Engineer
Flipkart Internet Pvt. Ltd., Bangalore


Re: Problem using Spark with Hbase

2014-05-28 Thread Mayur Rustagi
Try this..

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>





SparkHBaseMain.java
Description: Binary data


Re: Problem using Spark with Hbase

2014-05-28 Thread Vibhor Banga
Anyone who has used Spark this way or has faced a similar issue, please help.

Thanks,
-Vibhor



Problem using Spark with Hbase

2014-05-28 Thread Vibhor Banga
Hi all,

I am facing issues while using Spark with HBase. I am getting a
NullPointerException at org.apache.hadoop.hbase.TableName.valueOf (TableName.java:288).

Can someone please help me resolve this issue? What am I missing?


I am using the following snippet of code:

Configuration config = HBaseConfiguration.create();

config.set("hbase.zookeeper.znode.parent", "hostname1");
config.set("hbase.zookeeper.quorum", "hostname1");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("hbase.master", "hostname1:
config.set("fs.defaultFS", "hdfs://hostname1/");
config.set("dfs.namenode.rpc-address", "hostname1:8020");

config.set(TableInputFormat.INPUT_TABLE, "tableName");

JavaSparkContext ctx = new JavaSparkContext(args[0], "Simple",
    System.getenv(sparkHome),
    JavaSparkContext.jarOfClass(Simple.class));

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
    ctx.newAPIHadoopRDD(config, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);

Map<ImmutableBytesWritable, Result> rddMap = hBaseRDD.collectAsMap();


But when I go to the Spark cluster and check the logs, I see the following
error:

INFO NewHadoopRDD: Input split: w3-target1.nm.flipkart.com:,
14/05/28 16:48:51 ERROR TableInputFormat: java.lang.NullPointerException
	at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:288)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:154)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:99)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:92)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:84)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:48)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Thanks,

-Vibhor