Re: Bulk load to HBase

2017-10-22 Thread Jörn Franke
better. BTW if you need to use Spark then go for 2.x - it is also available in HDP. > On 22. Oct 2017, at 10:20, Pradeep wrote: > > We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version 1.6.2. > > We have large volume of data that we bulk load to HBase using

Bulk load to HBase

2017-10-22 Thread Pradeep
We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version 1.6.2. We have a large volume of data that we bulk load to HBase using importtsv. The MapReduce job is very slow, and we are looking for ways to use Spark to improve performance. Please let me know if this can be optimized with
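The per-record work importtsv does is inexpensive tab splitting; moving it into a Spark job mostly changes where the parse runs and how the resulting HFiles are written. A hypothetical sketch of just the parse step in plain Java (the column layout and helper name are assumptions for illustration):

```java
import java.util.Arrays;

public class TsvParse {
    // importtsv-style per-line work: split on tab, first field is the rowkey.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static String[] splitLine(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String[] fields = splitLine("row-001\tus\t42");
        // rowkey followed by the remaining cell values
        System.out.println(fields[0] + " -> "
                + Arrays.toString(Arrays.copyOfRange(fields, 1, fields.length)));
    }
}
```

In a Spark job this split would typically run inside mapPartitions, so the expensive part becomes sorting and writing the HFiles rather than the parse itself.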

RE: Bulk-load to HBase

2014-12-07 Thread fralken
Hello, you can have a look at the hbase-rdd project <https://github.com/unicredit/hbase-rdd>, which provides a simple method to bulk load an RDD to HBase. fralken -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bulk-load-to-HBase-tp14667p20567.htm

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
(For that time, my program did not include the HBase export task.) BTW, I use Spark 1.0.0. Thank you. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, September 22, 2014 6:26 PM To: innowireless TaeYun Kim Cc: user Subject: Re: Bulk-load to HBase On Mon, S

Re: Bulk-load to HBase

2014-09-22 Thread Sean Owen
On Mon, Sep 22, 2014 at 10:21 AM, innowireless TaeYun Kim wrote: > I have to merge the byte[]s that have the same key. > If merging is done with reduceByKey(), a lot of intermediate byte[] > allocation and System.arraycopy() is executed, and it is too slow. So I had > to resort to groupByKey(),
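The allocation cost described here can be seen outside Spark: pairwise merging (what a reduceByKey combine function does) re-copies all earlier bytes at every step, while a single pass over all chunks for one key (possible after groupByKey) allocates the output exactly once. A plain-Java sketch with hypothetical helper names:

```java
import java.util.Arrays;
import java.util.List;

public class MergeDemo {
    // Pairwise merge, as a reduceByKey combine function would do it:
    // each call allocates a new array and re-copies everything so far.
    static byte[] mergePair(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Single-pass merge over all chunks for one key, as is possible
    // after groupByKey: size the output once, then copy each chunk once.
    static byte[] mergeAll(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) total += c.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> chunks = Arrays.asList(
                new byte[]{1, 2}, new byte[]{3}, new byte[]{4, 5});
        System.out.println(Arrays.toString(mergeAll(chunks))); // [1, 2, 3, 4, 5]
    }
}
```

For n bytes split into k chunks, the pairwise version copies O(n·k) bytes in the worst case; the single-pass version copies exactly n.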

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
2014 5:46 PM To: innowireless TaeYun Kim Cc: user Subject: Re: Bulk-load to HBase I see a number of potential issues: On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim wrote: > JavaPairRDD rdd = > // MyKey has a byte[] member for rowkey Two byte[] with the same content

Re: Bulk-load to HBase

2014-09-22 Thread Sean Owen
I see a number of potential issues: On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim wrote: > JavaPairRDD rdd = > // MyKey has a byte[] member for rowkey Two byte[] with the same contents are not equals(), so won't work as you intend as a key. Is there more to it? I assume so
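The pitfall Sean points out is plain Java behavior: arrays inherit identity-based equals() and hashCode() from Object, so two byte[] with identical contents are distinct as hash keys. A small demonstration, including the common workaround of wrapping the bytes in a ByteBuffer, whose equals() and hashCode() are content-based:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class KeyDemo {
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Arrays use identity equals/hashCode, so equal contents don't match.
        System.out.println(a.equals(b));          // false
        System.out.println(Arrays.equals(a, b));  // true

        // ByteBuffer compares (and hashes) by content, so wrapped keys
        // behave correctly in hash-based grouping.
        System.out.println(ByteBuffer.wrap(a).equals(ByteBuffer.wrap(b))); // true
    }
}
```

The same content-vs-identity distinction is why a raw byte[] rowkey fails as a key in groupByKey or reduceByKey: partitioning and grouping both rely on hashCode/equals.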

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
correction would be very helpful. Thanks. -Original Message- From: Soumitra Kumar [mailto:kumar.soumi...@gmail.com] Sent: Saturday, September 20, 2014 1:44 PM To: Ted Yu Cc: innowireless TaeYun Kim; user; Aniket Bhatnagar Subject: Re: Bulk-load to HBase I successfully did this once.

Re: Bulk-load to HBase

2014-09-19 Thread Soumitra Kumar
ormat], conf) Then I do hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/cloudera/spark output to load the HFiles to hbase. - Original Message - From: "Ted Yu" To: "Aniket Bhatnagar" Cc: "innowireless TaeYun Kim" , "user" Sent: Friday, September 19, 2014 2:29:51 PM Subject:

Re: Bulk-load to HBase

2014-09-19 Thread Ted Yu
ile HFileOutputFormat uses it to >> directly build the HFile. >> >> >> >> *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] >> *Sent:* Friday, September 19, 2014 9:20 PM >> >> *To:* user@spark.apache.org >> *Subject:* RE: Bulk-load to H

Re: Bulk-load to HBase

2014-09-19 Thread Aniket Bhatnagar
goes through the normal write path), while HFileOutputFormat uses it to > directly build the HFile. > > > > *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] > *Sent:* Friday, September 19, 2014 9:20 PM > > *To:* user@spark.apache.org > *Subject:* RE:

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Friday, September 19, 2014 9:20 PM To: user@spark.apache.org Subject: RE: Bulk-load to HBase Thank you for the example code. Currently I use foreachPartition() + Put(), but your example code can be used to clean up my code

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
bypasses the write path. Thanks. From: Aniket Bhatnagar [mailto:aniket.bhatna...@gmail.com] Sent: Friday, September 19, 2014 9:01 PM To: innowireless TaeYun Kim Cc: user Subject: Re: Bulk-load to HBase I have been using saveAsNewAPIHadoopDataset but I use TableOutputFormat instead of

Re: Bulk-load to HBase

2014-09-19 Thread Aniket Bhatnagar
t found saveAsNewAPIHadoopDataset. > > Then, Can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there > any example code for that? > > > > Thanks. > > > > *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] > *Sent:* Friday, September 19,

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
@spark.apache.org Subject: RE: Bulk-load to HBase Hi, After reading several documents, it seems that saveAsHadoopDataset cannot use HFileOutputFormat. It's because the saveAsHadoopDataset method uses JobConf, so it belongs to the old Hadoop API, while HFileOutputFormat is a member of mapr

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
Am I right? If so, is there another method to bulk-load to HBase from RDD? Thanks. From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Friday, September 19, 2014 7:17 PM To: user@spark.apache.org Subject: Bulk-load to HBase Hi, Is there a way to bulk-load to

Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
Hi, Is there a way to bulk-load to HBase from an RDD? HBase offers the HFileOutputFormat class for bulk loading by a MapReduce job, but I cannot figure out how to use it with saveAsHadoopDataset. Thanks.
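Whichever API ends up writing the HFiles, bulk load requires the records to be sorted by rowkey in HBase's unsigned lexicographic byte order, which differs from Java's signed byte comparison. A minimal sketch of that ordering (a hypothetical stand-in for what HBase's Bytes.compareTo provides):

```java
import java.util.Arrays;

public class RowKeyComparator {
    // Unsigned lexicographic comparison, the order HBase uses for rowkeys.
    static int compare(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int l = left[i] & 0xff;  // treat each byte as unsigned 0..255
            int r = right[i] & 0xff;
            if (l != r) return l - r;
        }
        return left.length - right.length; // shorter key sorts first on a tie
    }

    public static void main(String[] args) {
        byte[][] keys = {{(byte) 0x80}, {0x01}, {0x7f}};
        Arrays.sort(keys, RowKeyComparator::compare);
        // A signed sort would put 0x80 (-128) first; unsigned puts it last.
        for (byte[] k : keys) System.out.printf("%02x ", k[0]); // 01 7f 80
    }
}
```

Sorting the RDD by this order before writing is what makes the resulting HFiles loadable; keys emitted out of order cause the HFile writer to reject them.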