Re: Bulk load to HBase

2017-10-22 Thread Jörn Franke
Before you look at any new library/tool: what is the process of importing, what is the original file format, file size, compression, etc.? Once you have investigated this you can start improving it. Then, as a last step, a new framework can be explored. Feel free to share those details and we can help you.

RE: Bulk-load to HBase

2014-12-07 Thread fralken
Hello, you can have a look at the hbase-rdd project, which provides a simple method to bulk load an RDD to HBase. fralken

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
(For that time, my program did not include the HBase export task.) BTW, I use Spark 1.0.0. Thank you.

Re: Bulk-load to HBase

2014-09-22 Thread Sean Owen
On Mon, Sep 22, 2014 at 10:21 AM, innowireless TaeYun Kim wrote:
> I have to merge the byte[]s that have the same key.
> If merging is done with reduceByKey(), a lot of intermediate byte[]
> allocation and System.arraycopy() is executed, and it is too slow. So I had
> to resort to groupByKey(),
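The allocation problem described above can be illustrated outside Spark. Pairwise merging (the shape reduceByKey applies) allocates a fresh byte[] and copies on every merge, while collecting all values for a key first (as groupByKey allows) lets you size the result once. A minimal, Spark-free Java sketch — class and method names are illustrative, not from the thread:

```java
import java.util.Arrays;
import java.util.List;

public class MergeSketch {
    // Pairwise merge, as a reduceByKey function would be applied:
    // a new array is allocated and both inputs copied on every call.
    static byte[] mergePair(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Merge all values for one key at once, as one can after groupByKey:
    // a single allocation sized up front, one copy per value.
    static byte[] mergeAll(List<byte[]> values) {
        int total = 0;
        for (byte[] v : values) total += v.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] v : values) {
            System.arraycopy(v, 0, out, pos, v.length);
            pos += v.length;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> vals = Arrays.asList(
                new byte[]{1, 2}, new byte[]{3}, new byte[]{4, 5});
        byte[] pairwise = mergePair(mergePair(vals.get(0), vals.get(1)), vals.get(2));
        byte[] once = mergeAll(vals);
        System.out.println(Arrays.equals(pairwise, once)); // prints true
    }
}
```

Both paths produce the same bytes; the difference is that n values merged pairwise cost n-1 intermediate arrays, versus one final array when merged per key.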

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
From: Sean Owen
Subject: Re: Bulk-load to HBase

I see a number of potential issues:

On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim wrote:
> JavaPairRDD rdd =
> // MyKey has a byte[] member for rowkey

Two byte[] with the same content

Re: Bulk-load to HBase

2014-09-22 Thread Sean Owen
I see a number of potential issues:

On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim wrote:
> JavaPairRDD rdd =
> // MyKey has a byte[] member for rowkey

Two byte[] with the same contents are not equals(), so won't work as you intend as a key. Is there more to it? I assume so
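The equals() pitfall Sean points out is easy to demonstrate with plain Java: arrays use reference equality, so a raw byte[] rowkey cannot serve as a hash or shuffle key. A common workaround (illustrative, not taken from the thread) is a small wrapper key that defines equality and hashing over the array contents:

```java
import java.util.Arrays;

public class RowKey {
    private final byte[] bytes;

    public RowKey(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RowKey)) return false;
        // Compare contents, not references.
        return Arrays.equals(bytes, ((RowKey) o).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash, consistent with equals().
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(a.equals(b));                         // prints false: reference equality
        System.out.println(new RowKey(a).equals(new RowKey(b))); // prints true: content equality
    }
}
```

For real Spark-on-HBase code the wrapper would also need to be serializable and, for sorted HFile output, comparable; this sketch only shows the equality contract.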

RE: Bulk-load to HBase

2014-09-22 Thread innowireless TaeYun Kim
correction would be very helpful. Thanks.

-----Original Message-----
From: Soumitra Kumar
Subject: Re: Bulk-load to HBase

I successfully did this once.

Re: Bulk-load to HBase

2014-09-19 Thread Soumitra Kumar
ormat], conf)

Then I do

  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/cloudera/spark output

to load the HFiles into HBase.
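The load step Soumitra describes is the standard HBase bulk-load tool: LoadIncrementalHFiles takes the HDFS directory of HFiles written by the job and the target table name. A sketch of the invocation, keeping the directory and table name from the message above (adjust both for your cluster):

```shell
# Bulk-load the HFiles produced by the Spark job into HBase.
# First argument:  HDFS directory containing the HFiles.
# Second argument: target HBase table (named "output" in the message above).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  /user/cloudera/spark output
```

Because this moves the HFiles into region directories directly, it bypasses the normal write path entirely, which is what makes bulk load fast.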

Re: Bulk-load to HBase

2014-09-19 Thread Ted Yu
>> while HFileOutputFormat uses it to
>> directly build the HFile.
>>
>> From: innowireless TaeYun Kim
>> Sent: Friday, September 19, 2014 9:20 PM
>> To: user@spark.apache.org
>> Subject: RE: Bulk-load to H

Re: Bulk-load to HBase

2014-09-19 Thread Aniket Bhatnagar
> goes through the normal write path), while HFileOutputFormat uses it to
> directly build the HFile.
>
> From: innowireless TaeYun Kim
> Sent: Friday, September 19, 2014 9:20 PM
> To: user@spark.apache.org
> Subject: RE:

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
Thank you for the example code. Currently I use foreachPartition() + Put(), but your example code can be used to clean up my code

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
bypasses the write path. Thanks.

From: Aniket Bhatnagar
Subject: Re: Bulk-load to HBase

I have been using saveAsNewAPIHadoopDataset but I use TableOutputFormat instead of

Re: Bulk-load to HBase

2014-09-19 Thread Aniket Bhatnagar
> t found saveAsNewAPIHadoopDataset.
>
> Then, can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there
> any example code for that?
>
> Thanks.
>
> From: innowireless TaeYun Kim
> Sent: Friday, September 19,

RE: Bulk-load to HBase

2014-09-19 Thread innowireless TaeYun Kim
Hi, After reading several documents, it seems that saveAsHadoopDataset cannot use HFileOutputFormat. This is because the saveAsHadoopDataset method uses JobConf, so it belongs to the old Hadoop API, while HFileOutputFormat is a member of the mapreduce package, which is for the new Hadoop API. Am I right?