hbase sql query

2015-03-11 Thread Udbhav Agarwal
Hi, How can we simply cache an HBase table and run SQL queries against it via the Java API in Spark? Thanks, Udbhav Agarwal
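
A common answer at the time was the Spark 1.x SchemaRDD flow: load the HBase table into an RDD, map it to a typed schema, register it as a temp table, then cache and query it. A minimal Scala sketch of that pattern (the question asks about the Java API; Scala is shown for brevity, and the "users" table, "f1" family, and UserRow schema are hypothetical placeholders):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical row schema for the sketch.
case class UserRow(key: String, name: String)

object HBaseSqlSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseSqlSketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[case class] -> SchemaRDD

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "users") // hypothetical table name

    // Read the table and map each Result to a typed row.
    val rows = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
        classOf[ImmutableBytesWritable], classOf[Result])
      .map { case (key, result) =>
        val name = result.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name"))
        UserRow(Bytes.toString(key.get()), Bytes.toString(name))
      }

    rows.registerTempTable("users")
    sqlContext.cacheTable("users") // keep the table in memory across queries
    sqlContext.sql("SELECT key, name FROM users").collect().foreach(println)
    sc.stop()
  }
}
```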

Re: Does anyone integrate HBASE on Spark

2015-03-04 Thread gen tang
Hi, There are some examples in spark/examples <https://github.com/apache/spark/tree/master/examples> and there are also some examples in spark packages <http://spark-packages.org/>. And I find this blog <http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html> is

Does anyone integrate HBASE on Spark

2015-03-04 Thread sandeep vura
Hi Sparkers, How do I integrate HBase with Spark? Appreciate any replies! Regards, Sandeep.v

Re: How to integrate HBASE on Spark

2015-02-23 Thread sandeep vura
; > user@spark.apache.org> > *Sent:* Monday, February 23, 2015 8:52 AM > *Subject:* Re: How to integrate HBASE on Spark > > Installing hbase on hadoop cluster would allow hbase to utilize features > provided by hdfs, such as short circuit read (See '90.2. Leveraging loc

Re: How to integrate HBASE on Spark

2015-02-23 Thread Deepak Vohra
Or, use the SparkOnHBase lab: http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ From: Ted Yu To: Akhil Das Cc: sandeep vura ; "user@spark.apache.org" Sent: Monday, February 23, 2015 8:52 AM Subject: Re: How to integrate HBASE on Spark Installin

Re: How to integrate HBASE on Spark

2015-02-23 Thread Ted Yu
Installing hbase on hadoop cluster would allow hbase to utilize features provided by hdfs, such as short circuit read (See '90.2. Leveraging local data' under http://hbase.apache.org/book.html#perf.hdfs). Cheers On Sun, Feb 22, 2015 at 11:38 PM, Akhil Das wrote: > If you are ha

Re: How to integrate HBASE on Spark

2015-02-23 Thread sandeep vura
cluster. If you install it on the spark > cluster itself, then hbase might take up a few cpu cycles and there's a > chance for the job to lag. > > Thanks > Best Regards > > On Mon, Feb 23, 2015 at 12:48 PM, sandeep vura > wrote: > >> Hi >> >> I had

Re: How to integrate HBASE on Spark

2015-02-22 Thread Akhil Das
If you are having both the clusters on the same network, then I'd suggest installing it on the hadoop cluster. If you install it on the spark cluster itself, then hbase might take up a few cpu cycles and there's a chance for the job to lag. Thanks Best Regards On Mon, Feb 23, 201

How to integrate HBASE on Spark

2015-02-22 Thread sandeep vura
Hi, I have installed Spark on a 3-node cluster and the Spark services are up and running, but I want to integrate HBase with Spark. Do I need to install HBase on the Hadoop cluster or the Spark cluster? Please let me know ASAP. Regards, Sandeep.v

Re: Hive/Hbase for low latency

2015-02-11 Thread Ravi Kiran
Hi Siddharth, With v 4.3 of Phoenix, you can use the PhoenixInputFormat and OutputFormat classes to pull/push to Phoenix from Spark. HTH Thanks Ravi On Wed, Feb 11, 2015 at 6:59 AM, Ted Yu wrote: > Connectivity to hbase is also avaliable. You can take a look at: > > examples/

Re: Hive/Hbase for low latency

2015-02-11 Thread Ted Yu
Connectivity to hbase is also available. You can take a look at: examples/src/main/python/hbase_inputformat.py examples/src/main/python/hbase_outputformat.py examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala examples/src/main/scala/org/apache/spark/examples/pythonconverters
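
For reference, the HBaseTest.scala example mentioned above boils down to a single newAPIHadoopRDD call over TableInputFormat; a minimal sketch of that pattern (the table name "t1" is a hypothetical placeholder):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseReadSketch"))
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "t1") // hypothetical table name

    // Each HBase region becomes one Spark partition.
    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println("records found: " + hBaseRDD.count())
    sc.stop()
  }
}
```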

Re: Hive/Hbase for low latency

2015-02-11 Thread VISHNU SUBRAMANIAN
> > > > I am new to Spark. We have recently moved from Apache Storm to Apache > Spark to build our OLAP tool. > > Now, earlier we were using Hbase & Phoenix. > > We need to re-think what to use in case of Spark. > > Should we go ahead with Hbase or Hive or C

Hive/Hbase for low latency

2015-02-11 Thread Siddharth Ubale
Hi, I am new to Spark. We have recently moved from Apache Storm to Apache Spark to build our OLAP tool. Now, earlier we were using HBase & Phoenix. We need to re-think what to use in case of Spark. Should we go ahead with HBase or Hive or Cassandra for query processing with Spark

Re: Pyspark Hbase scan.

2015-02-05 Thread gen tang
Hi, In fact, this pull request https://github.com/apache/spark/pull/3920 implements HBase scan support. However, it has not been merged yet. You can also take a look at the example code at http://spark-packages.org/package/20 which uses scala and python to read data from hbase. Hope this can be helpful. Cheers Gen

Pyspark Hbase scan.

2015-02-05 Thread Castberg, René Christian
Hi, I am trying to do an HBase scan and read it into a Spark RDD using pyspark. I have successfully written data to hbase from pyspark, and been able to read a full table from hbase using the python example code. Unfortunately I am unable to find any example code for doing an HBase scan and

HBase Thrift API Error on map/reduce functions

2015-01-30 Thread mtheofilos
Try _fast_serialization=2 or contact PiCloud support Can any developer that works in that stuff tell me if that problem can be fixed? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HBase-Thrift-API-Error-on-

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-28 Thread Jim Green
Thanks to all for responding. Finally I figured out the way to use bulk load to hbase using scala on spark. The sample code is here, which others can refer to in future: http://www.openkb.info/2015/01/how-to-use-scala-on-spark-to-load-data.html Thanks! On Tue, Jan 27, 2015 at 6:27 PM, Jim Green wrote

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Thanks Sun. My understanding is, saveAsNewAPIHadoopFile is to save as HFile on HDFS. Is it doable to use saveAsNewAPIHadoopDataset to load directly into hbase? If so, is there any sample code for that? Thanks! On Tue, Jan 27, 2015 at 6:07 PM, fightf...@163.com wrote: > Hi, Jim > Your gen

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread fightf...@163.com
val kv = new KeyValue(rowkeyBytes, colfam, qual, value) List(kv) } Thanks, Sun fightf...@163.com From: Jim Green Date: 2015-01-28 04:44 To: Ted Yu CC: user Subject: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile I used below code, and it still failed with

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Jim Green wrote: > Thanks Ted. Could you give me a simple example to load one row data in > hbase? How should I generate the KeyValue? > I tried multiple times, and still can not figure it out. > > On Tue, Jan 27, 2015 at 12:10 PM, Ted Yu wrote: > >> Here i

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Thanks Ted. Could you give me a simple example of loading one row of data into hbase? How should I generate the KeyValue? I tried multiple times and still cannot figure it out. On Tue, Jan 27, 2015 at 12:10 PM, Ted Yu wrote: > Here is the method signature used by HFileOutputFormat : >

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Ted Yu
data into hbase. > *Env:* > hbase 0.94 > spark-1.0.2 > > I am trying below code to just bulk load some data into hbase table “t1”. > > import org.apache.spark._ > import org.apache.spark.rdd.NewHadoopRDD > import org.apache.hadoop.hbase.{HBaseConfi

Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Hi Team, I need some help on writing Scala code to bulk load some data into hbase. *Env:* hbase 0.94 spark-1.0.2 I am trying the below code to just bulk load some data into hbase table “t1”. import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import org.apache.hadoop.hbase
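
A minimal sketch of the bulk-load pattern this thread works toward: sort the data by row key, map it to (ImmutableBytesWritable, KeyValue) pairs, and write HFiles through HFileOutputFormat. The family, qualifier, sample data, and output path are hypothetical; this is a sketch of the technique, not the poster's exact code:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x

object BulkLoadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkLoadSketch"))
    val conf = HBaseConfiguration.create()

    // HFileOutputFormat wants (ImmutableBytesWritable, KeyValue) pairs
    // sorted by row key, so sort before mapping to KeyValues.
    val kvs = sc.parallelize(Seq(("row1", "v1"), ("row2", "v2"))) // hypothetical data
      .sortByKey()
      .map { case (row, value) =>
        val rowBytes = Bytes.toBytes(row)
        val kv = new KeyValue(rowBytes, Bytes.toBytes("f1"),
          Bytes.toBytes("q1"), Bytes.toBytes(value)) // hypothetical family/qualifier
        (new ImmutableBytesWritable(rowBytes), kv)
      }

    // Write HFiles to HDFS under a hypothetical output path.
    kvs.saveAsNewAPIHadoopFile("/tmp/hfiles", classOf[ImmutableBytesWritable],
      classOf[KeyValue], classOf[HFileOutputFormat], conf)
  }
}
```

The resulting HFiles still need to be handed to HBase afterwards, e.g. with the completebulkload tool.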

Unread block data exception when reading from HBase

2015-01-13 Thread jeremy p
Hello all, When I try to read data from an HBase table, I get an unread block data exception. I am running HBase and Spark on a single node (my workstation). My code is in Java, and I'm running it from the Eclipse IDE. Here are the versions I'm using: Cloudera: 2.5.0-cdh5.2.1 Hado

Re: Reading HBase data - Exception

2015-01-09 Thread lmsiva
getting. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-HBase-data-Exception-tp21009p21071.html

Re: Spark 1.1.0 and HBase: Snappy UnsatisfiedLinkError

2015-01-06 Thread Charles
Hi, I am getting this same error. Did you figure out how to solve the problem? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-1-0-and-HBase-Snappy-UnsatisfiedLinkError-tp19827p21005.html

RE: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Max Xu
Issue resolved after updating the Hbase version to 0.98.8-hadoop2. Thanks Ted for all the help! For future reference: this problem has nothing to do with Spark 1.2.0; it happened simply because I built Spark 1.2.0 with the wrong Hbase version. From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday

Re: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Ted Yu
I doubt anyone would deploy hbase 0.98.x on hadoop-1. Looks like the hadoop2 profile should be made the default. Cheers On Tue, Jan 6, 2015 at 9:49 AM, Max Xu wrote: > Awesome. Thanks again Ted. I remember there is a block in the pom.xml > under the example folder that default hbase vers

RE: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Max Xu
Awesome. Thanks again Ted. I remember there is a block in the pom.xml under the example folder that defaults the hbase version to hadoop1. I figured this out last time when I built Spark 1.1.1 but forgot this time. hbase-hadoop1 !hbase.profile

Re: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Ted Yu
The default profile is hbase-hadoop1, so you need to specify -Dhbase.profile=hadoop2. See SPARK-1297. Cheers On Tue, Jan 6, 2015 at 9:11 AM, Max Xu wrote: > Thanks Ted. You are right, hbase-site.xml is in the classpath. But > previously I have it in the classpath too and the app works f

RE: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Max Xu
Thanks Ted. You are right, hbase-site.xml is in the classpath. But previously I have it in the classpath too and the app works fine. I believe I found the problem. I built Spark 1.2.0 myself and forgot to change the dependency hbase version to 0.98.8-hadoop2, which is the version I use. When I

Re: Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Ted Yu
I assume hbase-site.xml is in the classpath. Can you try the code snippet in a standalone program to see if the problem persists? Cheers On Tue, Jan 6, 2015 at 6:42 AM, Max Xu wrote: > Hi all, > > I have a Spark streaming application that ingests data from a Kafka topic

Saving data to Hbase hung in Spark streaming application with Spark 1.2.0

2015-01-06 Thread Max Xu
Hi all, I have a Spark streaming application that ingests data from a Kafka topic and persists received data to Hbase. It works fine with Spark 1.1.1 in YARN cluster mode. Basically, I use the following code to persist each partition of each RDD to Hbase: @Override void call
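
The per-partition write pattern referenced here builds the non-serializable HBase connection on the executor, once per partition, rather than on the driver. A minimal Scala sketch of that pattern against the 0.98-era client API (the thread's code is Java; the table, family, and qualifier names are hypothetical):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object HBaseWriteSketch {
  // Persist each partition of an RDD of (rowKey, value) pairs to HBase.
  def saveToHBase(rdd: RDD[(String, String)]): Unit = {
    rdd.foreachPartition { partition =>
      // Build the non-serializable connection on the executor, once per partition.
      val table = new HTable(HBaseConfiguration.create(), "t1") // hypothetical table
      partition.foreach { case (row, value) =>
        val put = new Put(Bytes.toBytes(row))
        put.add(Bytes.toBytes("f1"), Bytes.toBytes("q1"), Bytes.toBytes(value)) // hypothetical family/qualifier
        table.put(put)
      }
      table.close()
    }
  }
}
```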

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Antony Mayi
also hbase itself works ok: hbase(main):006:0> scan 'test' ROW COLUMN+CELL key1 column=f1:asd, timestamp=1419463092904, value=456

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Antony Mayi
I am running it in yarn-client mode and I believe hbase-client is part of the spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar which I am submitting at launch. adding another jstack taken during the hanging - http://pastebin.com/QDQrBw70 - this is of the CoarseGrainedExecutorBackend

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
bq. "hbase.zookeeper.quorum": "localhost" You are running hbase cluster in standalone mode ? Is hbase-client jar in the classpath ? Cheers On Wed, Dec 24, 2014 at 4:11 PM, Antony Mayi wrote: > I just run it by hand from pyspark shell. here is the steps: > > pyspar

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Antony Mayi
e([['testkey', 'f1', 'testqual', 'testval']], 1).map(lambda x: (x[0], x)).saveAsNewAPIHadoopDataset( ... conf=conf, ... keyConverter=keyConv, ... valueConverter=valueConv) then it spills a few INFO-level messages about submitting a tas

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
I went over the jstack but didn't find any call related to hbase or zookeeper. Do you find anything important in the logs? Looks like the container launcher was waiting for the script to return some result: at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecR

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Antony Mayi
it? Thanks On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi wrote: Hi, have been using this without any issues with spark 1.1.0 but after upgrading to 1.2.0 saving a RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just hangs - even when testing with the example from the st

Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Ted Yu
2.0 saving a RDD from pyspark > using saveAsNewAPIHadoopDataset into HBase just hangs - even when testing > with the example from the stock hbase_outputformat.py. > > anyone having same issue? (and able to solve?) > > using hbase 0.98.6 and yarn-client mode. > > thanks, > Antony. > >

saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

2014-12-24 Thread Antony Mayi
Hi, have been using this without any issues with spark 1.1.0 but after upgrading to 1.2.0 saving an RDD from pyspark using saveAsNewAPIHadoopDataset into HBase just hangs - even when testing with the example from the stock hbase_outputformat.py. anyone having the same issue? (and able to solve

Re: Re: Serialization issue when using HBase with Spark

2014-12-23 Thread yangliuyu
2014-12-15 17:52:47, "Aniket Bhatnagar" wrote: "The reason not using sc.newAPIHadoopRDD is it only support one scan each time." I am not sure that's true. You can use multiple scans as follows: val scanStrings = scans.map(scan => convertScanToString(scan)) conf.setSt
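
Piecing together the truncated snippets in this thread: each Scan is serialized to a Base64 string and the whole list is handed to MultiTableInputFormat, so one RDD can cover several key ranges. A hedged sketch against the HBase 0.98 API (the table name and key ranges are hypothetical):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.MultiTableInputFormat
import org.apache.hadoop.hbase.protobuf.ProtobufUtil
import org.apache.hadoop.hbase.util.{Base64, Bytes}
import org.apache.spark.{SparkConf, SparkContext}

object MultiScanSketch {
  // Serialize a Scan into the Base64 string form the input format expects.
  def convertScanToString(scan: Scan): String =
    Base64.encodeBytes(ProtobufUtil.toScan(scan).toByteArray)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MultiScanSketch"))
    val conf = HBaseConfiguration.create()

    // One Scan per key range; each Scan must carry its table name.
    val scans = Seq(("a", "d"), ("q", "t")).map { case (start, stop) =>
      val scan = new Scan(Bytes.toBytes(start), Bytes.toBytes(stop))
      scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("t1")) // hypothetical table
      scan
    }
    conf.setStrings(MultiTableInputFormat.SCANS, scans.map(convertScanToString): _*)

    val rdd = sc.newAPIHadoopRDD(conf, classOf[MultiTableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())
  }
}
```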

Re: custom python converter from HBase Result to tuple

2014-12-22 Thread Ted Yu
Please see http://stackoverflow.com/questions/18565953/wrong-number-of-arguments-when-a-calling-function-from-class-in-python Cheers On Mon, Dec 22, 2014 at 8:04 PM, Antony Mayi wrote: > using hbase 0.98.6 > > there is no stack trace, just this short error. > > just noti

Re: custom python converter from HBase Result to tuple

2014-12-22 Thread Antony Mayi
using hbase 0.98.6 there is no stack trace, just this short error. just noticed it does the fallback to toString as mentioned in the message, as this is what I get back in python: hbase_rdd.collect() [(u'key1', u'List(cf1:12345:14567890, cf2:123:14567896)')] so the question is why it fa

Re: custom python converter from HBase Result to tuple

2014-12-22 Thread Ted Yu
Which HBase version are you using? Can you show the full stack trace? Cheers On Mon, Dec 22, 2014 at 11:02 AM, Antony Mayi wrote: > Hi, > > can anyone please give me some help how to write custom converter of hbase > data to (for example) tuples of ((family, qualifier, value), )

custom python converter from HBase Result to tuple

2014-12-22 Thread Antony Mayi
Hi, can anyone please give me some help with how to write a custom converter of hbase data to (for example) tuples of ((family, qualifier, value), ) for pyspark. I was trying something like this (here trying tuples of ("family:qualifier:value", )): class HBaseResultToTupleConverter extends
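
A minimal sketch of what such a converter could look like, implementing Spark's org.apache.spark.api.python.Converter the way the stock pythonconverters examples do (the output shape here, one "family:qualifier:value" string per cell, is just one choice, not the only one):

```scala
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.api.python.Converter

// Emits one "family:qualifier:value" string per cell instead of the
// default Result.toString fallback.
class HBaseResultToTupleConverter extends Converter[Any, java.util.List[String]] {
  override def convert(obj: Any): java.util.List[String] = {
    val result = obj.asInstanceOf[Result]
    result.rawCells().map { cell =>
      Seq(Bytes.toString(CellUtil.cloneFamily(cell)),
          Bytes.toString(CellUtil.cloneQualifier(cell)),
          Bytes.toString(CellUtil.cloneValue(cell))).mkString(":")
    }.toList.asJava
  }
}
```

Passed as the valueConverter argument when reading from pyspark, this replaces the default Result.toString fallback discussed later in the thread.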

Re: SchemaRDD to Hbase

2014-12-20 Thread Alex Kamil
I'm using JDBCRDD <https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD> + Hbase JDBC driver <http://phoenix.apache.org/>+ schemaRDD <https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD> make sure to use

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, Can someone help me? Any pointers would help. Thanks Subacini On Fri, Dec 19, 2014 at 10:47 PM, Subacini B wrote: > Hi All, > > Is there any API that can be used directly to write schemaRDD to HBase? > If not, what is the best way to write schemaRDD to HBase. > > Thanks > Subacini >

SchemaRDD to Hbase

2014-12-19 Thread Subacini B
Hi All, Is there any API that can be used directly to write a schemaRDD to HBase? If not, what is the best way to write a schemaRDD to HBase? Thanks Subacini
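
There was no dedicated schemaRDD-to-HBase API at the time; the usual workaround was to map the SchemaRDD's rows to Puts and write through TableOutputFormat. A hedged sketch, assuming a hypothetical two-column (key, name) schema and a hypothetical table "t1" with family "f1":

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x
import org.apache.spark.sql.SchemaRDD

object SchemaRddToHBaseSketch {
  def writeToHBase(schemaRDD: SchemaRDD): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set(TableOutputFormat.OUTPUT_TABLE, "t1") // hypothetical table
    val job = new Job(conf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    job.setOutputKeyClass(classOf[ImmutableBytesWritable])
    job.setOutputValueClass(classOf[Put])

    schemaRDD.map { row =>
      // Assumes column 0 is the row key and column 1 a string field.
      val put = new Put(Bytes.toBytes(row.getString(0)))
      put.add(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes(row.getString(1)))
      (new ImmutableBytesWritable, put) // the key is ignored by TableOutputFormat
    }.saveAsNewAPIHadoopDataset(job.getConfiguration)
  }
}
```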

Re: Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Ted Yu
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n20746/pom.xml> -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-1-1-1-with-Hbase-0-98-8-hadoop2-and-hadoop-2-3-0-tp20746.html

Apache Spark 1.1.1 with Hbase 0.98.8-hadoop2 and hadoop 2.3.0

2014-12-17 Thread Amit Singh Hora
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-1-1-1-with-Hbase-0-98-8-hadoop2-and-hadoop-2-3-0-tp20746.html

Re: Spark with HBase

2014-12-15 Thread Aniket Bhatnagar
In case you are still looking for help, there have been multiple discussions in this mailing list that you can try searching for. Or you can simply use https://github.com/unicredit/hbase-rdd :-) Thanks, Aniket On Wed Dec 03 2014 at 16:11:47 Ted Yu wrote: > Which hbase release are you runn

Re: Serialization issue when using HBase with Spark

2014-12-15 Thread Aniket Bhatnagar
rings : _*) where convertScanToString is implemented as: /** * Serializes a HBase scan into string. * @param scan Scan to serialize. * @return Base64 encoded serialized scan. */ private def convertScanToString(scan: Scan) = { val proto: ClientProtos.Scan = ProtobufUtil.toScan(scan) Base64.encodeBytes

Re: Serialization issue when using HBase with Spark

2014-12-15 Thread Shixiong Zhu
ible put all > rowkeys > > into HBaseConfiguration > > Option 2: > > sc.newAPIHadoopRDD(conf, classOf[MultiTableInputFormat], > > classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], > > classOf[org.apache.hadoop.hbase.client.Result]) > > > >

Re: Serialization issue when using HBase with Spark

2014-12-14 Thread Yanbo
ges into several parts then use option 2, but I prefer option 1. So is there any solution for option 1? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Serialization-issue-when-using-HBase-with-Spark-tp2065

Re: Serialization issue when using HBase with Spark

2014-12-12 Thread Akhil Das
Can you paste the complete code? It looks like at some point you are passing a Hadoop Configuration, which is not Serializable. You can look at this thread for a similar discussion: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-into-HBase-td13378.html Thanks Best Regard

Serialization issue when using HBase with Spark

2014-12-12 Thread yangliuyu
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Serialization-issue-when-using-HBase-with-Spark-tp20655.html

RE: Bulk-load to HBase

2014-12-07 Thread fralken
Hello, you can have a look at this project hbase-rdd <https://github.com/unicredit/hbase-rdd> that provides a simple method to bulk load an rdd to HBase. fralken -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bulk-load-to-HBase-tp14667p20567.htm

Re: Loading a large Hbase table into SPARK RDD takes quite long time

2014-12-04 Thread bonnahu
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-a-large-Hbase-table-into-SPARK-RDD-takes-quite-long-time-tp20396p20417.html

Re: Loading a large Hbase table into SPARK RDD takes quite long time

2014-12-04 Thread bonnahu
the sizes of the 3 columns are very small, probably less than 100 bytes. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-a-large-Hbase-table-into-SPARK-RDD-takes-quite-long-time-tp20396p20414.html

Re: Loading a large Hbase table into SPARK RDD takes quite long time

2014-12-04 Thread Ted Yu
ou run other > stuff in the background? > > Best regards > On 04.12.2014 at 23:57, "bonnahu" wrote: > >> I am trying to load a large Hbase table into SPARK RDD to run a SparkSQL >> query on the entity. For an entity with about 6 million rows, it will take >>

Re: Loading a large Hbase table into SPARK RDD takes quite long time

2014-12-04 Thread Jörn Franke
Hi, What is your cluster setup? How much memory do you have? How much space does one row consisting of only the 3 columns consume? Do you run other stuff in the background? Best regards On 04.12.2014 at 23:57, "bonnahu" wrote: > I am trying to load a large Hbase table into SPARK

Loading a large Hbase table into SPARK RDD takes quite long time

2014-12-04 Thread bonnahu
I am trying to load a large Hbase table into SPARK RDD to run a SparkSQL query on the entity. For an entity with about 6 million rows, it will take about 35 seconds to load it to RDD. Is it expected? Is there any way to shorten the loading process? I have been getting some tips from http

Re: Spark with HBase

2014-12-03 Thread Ted Yu
Which hbase release are you running? If it is 0.98, take a look at: https://issues.apache.org/jira/browse/SPARK-1297 Thanks On Dec 2, 2014, at 10:21 PM, Jai wrote: > I am trying to use Apache Spark with a pseudo-distributed Hadoop Hbase > Cluster and I am looking for some links regardi

Re: Spark with HBase

2014-12-03 Thread Akhil Das
You could go through these to start with http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark Thanks Best Regards On Wed, Dec 3, 2014 at

Spark with HBase

2014-12-02 Thread Jai
I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase Cluster and I am looking for some links regarding the same. Can someone please guide me through the steps to accomplish this. Thanks a lot for Helping -- View this message in context: http://apache-spark-user-list.1001560

Using SparkSQL to query Hbase entity takes very long time

2014-12-02 Thread bonnahu
Hi all, I am new to Spark and currently I am trying to run a SparkSQL query on an HBase entity. For an entity with about 4000 rows, it will take about 12 seconds. Is that expected? Is there any way to shorten the query process? Here is the code snippet: SparkConf sparkConf = new SparkConf

Spark 1.1.0 and HBase: Snappy UnsatisfiedLinkError

2014-11-25 Thread Pietro Gentile
a:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) This exception occurs at the line "val peopleRows = new NewHadoopRDD" when trying to read rows from HBase (0.98)

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
; } return CellUtil.cloneValue(cells[0]); This explains why you only got one row. In the thread you mentioned, see the code posted by freedafeng which iterates the Cells in Result. Cheers On Wed, Nov 12, 2014 at 1:04 PM, Ted Yu wrote: > To my knowledge, Spark 1.1 comes with HBase 0.94 > To u

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
To my knowledge, Spark 1.1 comes with HBase 0.94 To utilize HBase 0.98, you will need: https://issues.apache.org/jira/browse/SPARK-1297 You can apply the patch and build Spark yourself. Cheers On Wed, Nov 12, 2014 at 12:57 PM, Alan Prando wrote: > Hi Ted! Thanks for anwsering... > >

Re: Reading from Hbase using python

2014-11-12 Thread Ted Yu
Can you give us a bit more detail: the hbase release you're using, and whether you can reproduce using the hbase shell. I did the following using the hbase shell against 0.98.4: hbase(main):001:0> create 'test', 'f1' 0 row(s) in 2.9140 seconds => Hbase::Table - test hbase(main)

Reading from Hbase using python

2014-11-12 Thread Alan Prando
Hi all, I'm trying to read an hbase table using this example from github ( https://github.com/apache/spark/blob/master/examples/src/main/python/hbase_inputformat.py), however I have two qualifiers in a column family. Ex.: ROW COLUMN+CELL row1 column=f1:1, timestamp=1401883411986,

Re: pyspark get column family and qualifier names from hbase table

2014-11-12 Thread freedafeng
Hi Nick, I saw that the HBase API has experienced lots of changes. If I remember correctly, the default hbase in spark 1.1.0 is 0.94.6. The one I am using is 0.98.1. To get the column family names and qualifier names, we need to call different methods for these two different versions. I don't kno

Re: pyspark get column family and qualifier names from hbase table

2014-11-12 Thread freedafeng
a PR probably later today. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18744.html

Re: pyspark get column family and qualifier names from hbase table

2014-11-11 Thread Nick Pentreath
d you write > HBaseResultToStringConverter to do what you wanted it to do? > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18650.html

Re: pyspark get column family and qualifier names from hbase table

2014-11-11 Thread alaa
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18650.html

Does spark can't work with HBase?

2014-11-11 Thread gzlj
Hello all, I have tested reading an HBase table with Spark 1.1 using SparkContext.newAPIHadoopRDD. I found the performance is much slower than reading from Hive. I also tried reading data using HFileScanner on one region's HFile, but the performance is not good. So, how do I improve the performance of Spark reading

Re: pyspark get column family and qualifier names from hbase table

2014-11-11 Thread freedafeng
just wrote a custom converter in scala to replace HBaseResultToStringConverter. Just a couple of lines of code. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18639.html

Re: pyspark get column family and qualifier names from hbase table

2014-11-11 Thread freedafeng
e is a big limitation. Converting from the 'list()' from the 'Result' is more general and easy to use. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-get-column-family-and-qualifier-names-from-hbase-table-tp18613p18619.html

pyspark get column family and qualifier names from hbase table

2014-11-11 Thread freedafeng
Hello there, I am wondering how to get the column family names and column qualifier names when using pyspark to read an hbase table with multiple column families. I have an hbase table as follows, hbase(main):007:0> scan 'data1' ROW

Re: EC2 cluster set up and access to HBase in a different cluster

2014-10-16 Thread freedafeng
Maybe I should create a private AMI to use for my question No. 1? Assuming I use the default instance type as the base image... Anyone tried this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/EC2-cluster-set-up-and-access-to-HBase-in-a-different-cluster

EC2 cluster set up and access to HBase in a different cluster

2014-10-16 Thread freedafeng
The plan is to create an EC2 cluster and run the (py) spark on it. Input data is from s3, output data goes to an hbase in a persistent cluster (also EC2). My questions are: 1. I need to install some software packages on all the workers (sudo apt-get install ...). Is there a better way to do this

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-16 Thread Soumitra Kumar
ent: Thursday, October 16, 2014 12:50:01 AM Subject: Re: How to add HBase dependencies and conf with spark-submit? Thanks, Soumitra Kumar, I didn’t know why you put hbase-protocol.jar in SPARK_CLASSPATH, while adding hbase-protocol.jar, hbase-common.jar, hbase-client.jar, htrace-core.jar i

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-16 Thread Fengyun RAO
Thanks, Soumitra Kumar. I didn’t know why you put hbase-protocol.jar in SPARK_CLASSPATH while adding hbase-protocol.jar, hbase-common.jar, hbase-client.jar, htrace-core.jar in --jars, but it did work. Actually, I put all these four jars in SPARK_CLASSPATH along with the HBase conf directory. 2014-10

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-15 Thread Soumitra Kumar
I am writing to HBase, following are my options: export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar spark-submit \ --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,/opt/cloudera/parcels/CDH/lib

Re: How to add HBase dependencies and conf with spark-submit?

2014-10-15 Thread Fengyun RAO
+user@hbase 2014-10-15 20:48 GMT+08:00 Fengyun RAO : > We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an > HBase table in Spark program. > > I notice there are: > spark.driver.extraClassPath > spark.executor.extraClassPath properties to manage extra Clas

How to add HBase dependencies and conf with spark-submit?

2014-10-15 Thread Fengyun RAO
We use Spark 1.1, and HBase 0.98.1-cdh5.1.0, and need to read and write an HBase table in a Spark program. I notice there are spark.driver.extraClassPath and spark.executor.extraClassPath properties to manage the extra classpath, or even the deprecated SPARK_CLASSPATH. The problem is what classpath or

Re: Reading from HBase is too slow

2014-10-08 Thread Tao Xiao
Sean, I did specify the number of cores to use as follows: ... ... val sparkConf = new SparkConf() .setAppName("<<< Reading HBase >>>") .set("spark.cores.max", "32") val sc = new SparkContext(sparkConf) ... ... But that d

Re: Reading from HBase is too slow

2014-10-08 Thread Sean Owen
You do need to specify the number of executor cores to use. Executors are not like mappers. After all, they may do much more in their lifetime than just read splits from HBase, so it would not make sense to determine it by something that the first line of the program does. On Oct 8, 2014 8:00 AM,

Re: Reading from HBase is too slow

2014-10-08 Thread Tao Xiao
Hi Sean, Do I need to specify the number of executors when submitting the job? I suppose the number of executors will be determined by the number of regions of the table, just like a MapReduce job, where you needn't specify the number of map tasks when reading from an HBase table. The scri

Re: Reading from HBase is too slow

2014-10-07 Thread Sean Owen
How did you run your program? I don't see from your earlier post that you ever asked for more executors. On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao wrote: > I found the reason why reading HBase is too slow. Although each > regionserver serves multiple regions for the table I'm rea

Re: Reading from HBase is too slow

2014-10-07 Thread Tao Xiao
I found the reason why reading HBase is too slow. Although each regionserver serves multiple regions for the table I'm reading, the number of Spark workers allocated by Yarn is too low. Actually, I could see that the table has dozens of regions spread over about 20 regionservers, but onl

Re: Fixed: spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-06 Thread serkan.dogan
-Dhadoop.version=2.4.1 -DskipTests clean package Now everything is ok. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-hbase-0-98-6-hadoop2-version-py4j-protocol-Py4JJavaError-java-lang-ClassNotFoundException-tp15668p15778.html

Re: spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-04 Thread Nick Pentreath
forgot to copy user list On Sat, Oct 4, 2014 at 3:12 PM, Nick Pentreath wrote: > what version did you put in the pom.xml? > > it does seem to be in Maven central: > http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6-hadoop2%7Cpom > > > org.apa

spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-03 Thread serkan.dogan
Hi, I installed hbase-0.98.6-hadoop2. It's working with no problems. But when I try to run the spark hbase python examples (the wordcount examples work - not a python issue): ./bin/spark-submit --master local --driver-class-path ./examples/target/spark-examples_2.10-1.1.0.jar ./exa

Re: Reading from HBase is too slow

2014-10-01 Thread Vladimir Rodionov
2014 at 9:34 AM, Vladimir Rodionov < > vrodio...@splicemachine.com> wrote: > >> Using TableInputFormat is not the fastest way of reading data from HBase. >> Do not expect 100s of Mb per sec. You probably should take a look at M/R >> over HBase snapshots. >>

Re: Reading from HBase is too slow

2014-10-01 Thread Ted Yu
As far as I know, that feature is not in CDH 5.0.0. FYI On Wed, Oct 1, 2014 at 9:34 AM, Vladimir Rodionov < vrodio...@splicemachine.com> wrote: > Using TableInputFormat is not the fastest way of reading data from HBase. > Do not expect 100s of Mb per sec. You probably should take a

Re: Reading from HBase is too slow

2014-10-01 Thread Vladimir Rodionov
Using TableInputFormat is not the fastest way of reading data from HBase. Do not expect 100s of Mb per sec. You probably should take a look at M/R over HBase snapshots. https://issues.apache.org/jira/browse/HBASE-8369 -Vladimir Rodionov On Wed, Oct 1, 2014 at 8:17 AM, Tao Xiao wrote: > I
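
A hedged sketch of the snapshot-based read suggested above, using TableSnapshotInputFormat (added by HBASE-8369) so the HFiles are scanned directly instead of going through the region servers; the snapshot name and restore directory are hypothetical:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object SnapshotReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SnapshotReadSketch"))
    val job = new Job(HBaseConfiguration.create())
    // Point the input format at an existing snapshot; the restore dir must be
    // a writable HDFS directory outside the HBase root.
    TableSnapshotInputFormat.setInput(job, "t1_snapshot", new Path("/tmp/snapshot_restore"))

    val rdd = sc.newAPIHadoopRDD(job.getConfiguration, classOf[TableSnapshotInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())
  }
}
```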

Re: Reading from HBase is too slow

2014-10-01 Thread Tao Xiao
; This would show whether the slowdown is in HBase code or somewhere else. > > Cheers > > On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao > wrote: > >> I checked HBase UI. Well, this table is not completely evenly spread >> across the nodes, but I think to some extent it c
