Re: Spark and HBase RDD join/get

2016-01-14 Thread Kristoffer Sjögren
Thanks Ted!
On Thu, Jan 14, 2016 at 4:49 PM, Ted Yu wrote:
> For #1, yes it is possible.
>
> You can find some example in hbase-spark module of hbase where hbase as
> DataSource is provided.
> e.g.
>
> https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/

Re: Spark and HBase RDD join/get

2016-01-14 Thread Ted Yu
For #1, yes it is possible.
You can find some examples in the hbase-spark module of HBase, where HBase is provided as a DataSource, e.g.
https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
Cheers
On Thu, Jan 14, 2016 at 5:04 AM, K

Spark and HBase RDD join/get

2016-01-14 Thread Kristoffer Sjögren
Hi
We have an RDD that needs to be mapped with information from HBase, where the exact key is the user id.
What are the different alternatives for doing this?
- Is it possible to do HBase.get() requests from a map function in Spark?
- Or should we join the RDD with a full HBase table scan?
I ask be
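
The two alternatives in the question can be sketched in plain Python, without Spark or HBase. This is only an illustration under stated assumptions: `FakeTable`, `enrich_by_get` and `enrich_by_join` are hypothetical names, the dict stands in for an HBase table keyed by user id, and the nested lists stand in for RDD partitions. It shows the trade-off being asked about: point lookups touch only the requested keys, while a join reads every row of the table.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


class FakeTable:
    """Hypothetical in-memory stand-in for an HBase table keyed by user id."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows
        self.gets = 0      # number of keys requested through point lookups
        self.scanned = 0   # number of rows read through scans

    def get_many(self, keys: List[str]) -> List[Optional[str]]:
        """Batched point lookup by exact row key; None for missing keys."""
        self.gets += len(keys)
        return [self._rows.get(k) for k in keys]

    def scan(self) -> Iterator[Tuple[str, str]]:
        """Full table scan, yielding every (key, value) row."""
        for item in self._rows.items():
            self.scanned += 1
            yield item


def enrich_by_get(partition: Iterable[str], table: FakeTable,
                  batch_size: int = 2) -> Iterator[Tuple[str, Optional[str]]]:
    """Alternative 1: per-partition lookups, sent in small batches."""
    batch: List[str] = []
    for user_id in partition:
        batch.append(user_id)
        if len(batch) == batch_size:
            yield from zip(batch, table.get_many(batch))
            batch = []
    if batch:  # flush the last, possibly smaller, batch
        yield from zip(batch, table.get_many(batch))


def enrich_by_join(user_ids: Iterable[str],
                   table: FakeTable) -> List[Tuple[str, str]]:
    """Alternative 2: scan the whole table and inner-join on the key."""
    wanted = set(user_ids)
    return [(k, v) for k, v in table.scan() if k in wanted]


table = FakeTable({"u1": "alice", "u2": "bob", "u3": "carol", "u4": "dave"})
partitions = [["u1", "u3"], ["u4"]]  # stand-in for the partitions of an RDD

by_get = [pair for part in partitions for pair in enrich_by_get(part, table)]
by_join = enrich_by_join([u for part in partitions for u in part], table)
```

Both approaches produce the same pairs here. The counters on `FakeTable` show the difference in work: three keys were fetched by lookup, whereas the join read all four rows. Which one is cheaper in practice depends on how many of the table's keys the RDD actually needs.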

Re: Python, Spark and HBase

2015-08-03 Thread ericbless
I wanted to confirm whether this is now supported, such as in Spark v1.3.0. I've read varying info online and just thought I'd verify.
Thanks
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p24117.html
Sent from th

Re: Spark and HBase join issue

2015-03-14 Thread Ted Yu
The 4.1 GB table has 3 regions. This means that there would be at least 2 nodes that don't carry any of its regions.
Can you split this table into 12 (or more) regions?
BTW, what's the value of spark.yarn.executor.memoryOverhead?
Cheers
On Sat, Mar 14, 2015 at 10:52 AM, francexo83 wrote:
> Hi all,
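
The region arithmetic above can be checked with a short sketch. It assumes the 5-node cluster described in the original post of this thread, and that each region is served by exactly one node at a time; `idle_nodes` and `avg_region_size_gb` are hypothetical helpers for illustration, not HBase or Spark APIs.

```python
def idle_nodes(num_nodes: int, num_regions: int) -> int:
    """Lower bound on the number of nodes that host no region of the table.

    Each region lives on one node, so at most num_regions nodes can hold
    a piece of the table; the remaining nodes hold none of it.
    """
    return max(0, num_nodes - num_regions)


def avg_region_size_gb(table_gb: float, num_regions: int) -> float:
    """Average region size if the table were split evenly."""
    return table_gb / num_regions


NODES = 5        # from the original post in this thread
TABLE_GB = 4.1   # table size quoted in the reply

current = idle_nodes(NODES, 3)       # 3 regions today
after_split = idle_nodes(NODES, 12)  # suggested split into 12 regions
```

With 3 regions at least 2 of the 5 nodes hold no data for the table. With 12 regions the lower bound drops to 0, so every node can hold part of the table, although the actual placement still depends on how the regions are balanced.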

Spark and HBase join issue

2015-03-14 Thread francexo83
Hi all,
I have the following cluster configuration:
- 5 nodes on a cloud environment.
- Hadoop 2.5.0.
- HBase 0.98.6.
- Spark 1.2.0.
- 8 cores and 16 GB of RAM on each host.
- 1 NFS disk with 300 IOPS mounted on host 1 and 2.
- 1 NFS disk with 300 IOPS mounted on host

Re: Python, Spark and HBase

2014-05-29 Thread Nick Pentreath
es not exist in the JVM'.
> --> 659 format(self._fqn + name))
> 660
> 661 def __call__(self, *args):
>
> Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not
> exist in the JVM
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6507.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py in __getattr__(self, name)
    657 else:
    658 raise Py4JError('{0} does not exist in the JVM'.
--> 659 format(self._fqn + name))
    660
    661 def __call__(self, *args):
Py4JError: org.apache.spa

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
e', value_class='org.apache.hadoop.hbase.client.Result'
Is it possible that the typo is coming from inside the Spark code?
Tommer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6506.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python, Spark and HBase

2014-05-28 Thread Matei Zaharia
rmats branch and cannot find any
> reference to the class org.apache.spark.api.python.PythonRDDnewAPIHadoopFile
>
> Any ideas?
>
> Also, do you have a working example of HBase access with the new code?
>
> Thanks
>
> Tommer
>
> --
> View thi

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
reference to the class org.apache.spark.api.python.PythonRDDnewAPIHadoopFile
Any ideas?
Also, do you have a working example of HBase access with the new code?
Thanks
Tommer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6502.html

Re: Python, Spark and HBase

2014-05-21 Thread twizansk
Thanks Nick and Matei. I'll take a look at the patch and keep you updated.
Tommer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142p6176.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python, Spark and HBase

2014-05-20 Thread Nick Pentreath
Format class. Is there any equivalent in python?
>>
>> Thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python, Spark and HBase

2014-05-20 Thread Matei Zaharia
t in python?
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Python, Spark and HBase

2014-05-20 Thread twizansk
there any equivalent in python?
Thanks
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-Spark-and-HBase-tp6142.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark and HBase

2014-04-26 Thread Nicholas Chammas
an index for each column in my table and I store complex
>>>>> object within the cells. Is it correct?
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>> On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang wrote:
>>>>>
>>>>>> Hi Flavio,
>>>>>>
>>>>>> I happened to attend, actually attending the 2014 Apache Conf, I
>>>>>> heard a project called "Apache Phoenix", which fully leverage HBase and
>>>>>> suppose to be 1000x faster than Hive. And it is not memory bounded, in
>>>>>> which case sets up a limit for Spark. It is still in the incubating group
>>>>>> and the "stats" functions spark has already implemented are still on the
>>>>>> roadmap. I am not sure whether it will be good but might be something
>>>>>> interesting to check out.
>>>>>>
>>>>>> /usr/bin
>>>>>>
>>>>>> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier <
>>>>>> pomperma...@okkam.it> wrote:
>>>>>>
>>>>>>> Hi to everybody,
>>>>>>>
>>>>>>> in these days I looked a bit at the recent evolution of the big
>>>>>>> data stacks and it seems that HBase is somehow fading away in favour of
>>>>>>> Spark+HDFS. Am I correct?
>>>>>>> Do you think that Spark and HBase should work together or not?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Flavio

Re: Spark and HBase

2014-04-26 Thread Josh Mahonin
>>>>> >>>>>> On Tue, Apr 8, 2014 at 6:05 PM, Bin Wang wrote: >>>>>> Hi Flavio, >>>>>> >>>>>> I happened to attend, actually attending the 2014 Apache Conf, I heard a >>>>>> project called "Apache Phoenix", which fully leverage HBase and suppose >>>>>> to be 1000x faster than Hive. And it is not memory bounded, in which >>>>>> case sets up a limit for Spark. It is still in the incubating group and >>>>>> the "stats" functions spark has already implemented are still on the >>>>>> roadmap. I am not sure whether it will be good but might be something >>>>>> interesting to check out. >>>>>> >>>>>> /usr/bin >>>>>> >>>>>> >>>>>>> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier >>>>>>> wrote: >>>>>>> Hi to everybody, >>>>>>> in these days I looked a bit at the recent evolution of the big data >>>>>>> stacks and it seems that HBase is somehow fading away in favour of >>>>>>> Spark+HDFS. Am I correct? >>>>>>> Do you think that Spark and HBase should work together or not? >>>>>>> >>>>>>> Best regards, >>>>>>> Flavio >>>>> >

Re: Spark and HBase

2014-04-25 Thread Nicholas Chammas
ich fully leverage HBase and suppose
>>>>> to be 1000x faster than Hive. And it is not memory bounded, in which case
>>>>> sets up a limit for Spark. It is still in the incubating group and the
>>>>> "stats" functions spark has already implemented are still on the roadmap.
>>>>> I am not sure whether it will be good but might be something interesting to
>>>>> check out.
>>>>>
>>>>> /usr/bin
>>>>>
>>>>> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier <
>>>>> pomperma...@okkam.it> wrote:
>>>>>
>>>>>> Hi to everybody,
>>>>>>
>>>>>> in these days I looked a bit at the recent evolution of the big
>>>>>> data stacks and it seems that HBase is somehow fading away in favour of
>>>>>> Spark+HDFS. Am I correct?
>>>>>> Do you think that Spark and HBase should work together or not?
>>>>>>
>>>>>> Best regards,
>>>>>> Flavio

Re: Spark and HBase

2014-04-25 Thread Josh Mahonin
nix", which fully leverage HBase and suppose
>>>> to be 1000x faster than Hive. And it is not memory bounded, in which case
>>>> sets up a limit for Spark. It is still in the incubating group and the
>>>> "stats" functions spark has already implemented

Re: Spark and HBase

2014-04-08 Thread Nicholas Chammas
> /usr/bin >>> >>> >>> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier >> > wrote: >>> >>>> Hi to everybody, >>>> >>>> in these days I looked a bit at the recent evolution of the big data >>>> stacks and it seems that HBase is somehow fading away in favour of >>>> Spark+HDFS. Am I correct? >>>> Do you think that Spark and HBase should work together or not? >>>> >>>> Best regards, >>>> Flavio >>>> >>> >

Re: Spark and HBase

2014-04-08 Thread Bin Wang
> >> /usr/bin >> >> >> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier >> wrote: >> >>> Hi to everybody, >>> >>> in these days I looked a bit at the recent evolution of the big data >>> stacks and it seems that HBase is somehow fading away in favour of >>> Spark+HDFS. Am I correct? >>> Do you think that Spark and HBase should work together or not? >>> >>> Best regards, >>> Flavio >>> >>

Re: Spark and HBase

2014-04-08 Thread Flavio Pompermaier
> out.
>
> /usr/bin
>
> On Tue, Apr 8, 2014 at 9:57 AM, Flavio Pompermaier
> wrote:
>
>> Hi to everybody,
>>
>> in these days I looked a bit at the recent evolution of the big data
>> stacks and it seems that HBase is somehow fading away in favour

Re: Spark and HBase

2014-04-08 Thread Christopher Nguyen
somehow fading away in favour of
> Spark+HDFS. Am I correct?
> Do you think that Spark and HBase should work together or not?
>
> Best regards,
> Flavio

Re: Spark and HBase

2014-04-08 Thread Bin Wang
se days I looked a bit at the recent evolution of the big data
> stacks and it seems that HBase is somehow fading away in favour of
> Spark+HDFS. Am I correct?
> Do you think that Spark and HBase should work together or not?
>
> Best regards,
> Flavio

Spark and HBase

2014-04-08 Thread Flavio Pompermaier
Hi to everybody,
over the last few days I looked a bit at the recent evolution of the big data stacks, and it seems that HBase is somehow fading away in favour of Spark+HDFS. Am I correct?
Do you think that Spark and HBase should work together or not?
Best regards,
Flavio