better.
BTW, if you need to use Spark, then go for 2.x - it is also available in HDP.
> On 22. Oct 2017, at 10:20, Pradeep wrote:
>
> We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version 1.6.2.
>
> We have large volume of data that we bulk load to HBase using
We are on Hortonworks 2.5 and very soon upgrading to 2.6. Spark version is 1.6.2.
We have a large volume of data that we bulk load to HBase using ImportTsv. The
MapReduce job is very slow, and we are looking for options; can we use Spark to
improve performance? Please let me know if this can be optimized with Spark.
Hello, you can have a look at this project, hbase-rdd
<https://github.com/unicredit/hbase-rdd>, which provides a simple method to
bulk load an RDD to HBase.
fralken
(For that time, my program did
not include the HBase export task.)
BTW, I use Spark 1.0.0.
Thank you.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, September 22, 2014 6:26 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
On Mon, Sep 22, 2014 at 10:21 AM, innowireless TaeYun Kim
wrote:
> I have to merge the byte[]s that have the same key.
> If merging is done with reduceByKey(), a lot of intermediate byte[]
> allocation and System.arraycopy() is executed, and it is too slow. So I had
> to resort to groupByKey(),
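TaeYun's allocation point can be reproduced outside Spark. A pairwise fold (what reduceByKey does with byte[] values) allocates a fresh array and re-copies every byte accumulated so far on each step, while grouping first lets you pre-size one output buffer and copy each input exactly once. A minimal plain-Java comparison; the class and method names below are illustrative, not from the thread:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeByteArrays {
    // Pairwise merge, as a reduceByKey-style fold would do: every step
    // allocates a new array and re-copies all bytes accumulated so far,
    // so O(n^2) bytes are copied overall.
    static byte[] mergePairwise(List<byte[]> parts) {
        byte[] acc = new byte[0];
        for (byte[] p : parts) {
            byte[] next = new byte[acc.length + p.length];
            System.arraycopy(acc, 0, next, 0, acc.length);
            System.arraycopy(p, 0, next, acc.length, p.length);
            acc = next;
        }
        return acc;
    }

    // Grouped merge, as after groupByKey: one pre-sized allocation,
    // each input byte copied exactly once.
    static byte[] mergeGrouped(List<byte[]> parts) {
        int total = 0;
        for (byte[] p : parts) total += p.length;
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> parts = new ArrayList<>();
        parts.add(new byte[]{1, 2});
        parts.add(new byte[]{3});
        parts.add(new byte[]{4, 5});
        // Both strategies produce the same bytes; only the copy count differs.
        System.out.println(java.util.Arrays.equals(
                mergePairwise(parts), mergeGrouped(parts))); // true
    }
}
```

Both produce identical output; the difference is purely in intermediate allocations, which is why the grouped variant was faster for TaeYun despite groupByKey's shuffle cost.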
Sent: Monday, September 22, 2014 5:46 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
I see a number of potential issues:
On Mon, Sep 22, 2014 at 8:42 AM, innowireless TaeYun Kim
wrote:
> JavaPairRDD<MyKey, byte[]> rdd =
> // MyKey has a byte[] member for rowkey
Two byte[] arrays with the same contents are not equals(), so this won't work as
you intend as a key. Is there more to it? I assume so.
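Sean's equals() point is easy to demonstrate with plain Java, no Spark required: hash-based grouping (a HashMap here, reduceByKey/groupByKey in Spark) only merges keys whose type defines value-based equals() and hashCode(). The RowKey wrapper below is a hypothetical illustration of the fix; HBase's ImmutableBytesWritable serves the same purpose.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeyDemo {
    // Hypothetical wrapper: value-based equals()/hashCode() over the bytes,
    // similar in spirit to HBase's ImmutableBytesWritable.
    static final class RowKey {
        private final byte[] bytes;
        RowKey(byte[] bytes) { this.bytes = bytes; }
        @Override public boolean equals(Object o) {
            return o instanceof RowKey && Arrays.equals(bytes, ((RowKey) o).bytes);
        }
        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        // Identical contents, but arrays inherit reference equality:
        System.out.println(a.equals(b));         // false
        System.out.println(Arrays.equals(a, b)); // true

        // As map keys, the two arrays land in separate entries...
        Map<byte[], Integer> raw = new HashMap<>();
        raw.put(a, 1);
        raw.put(b, 1);
        System.out.println(raw.size());          // 2

        // ...while the wrapper collapses them into one.
        Map<RowKey, Integer> wrapped = new HashMap<>();
        wrapped.put(new RowKey(a), 1);
        wrapped.put(new RowKey(b), 1);
        System.out.println(wrapped.size());      // 1
    }
}
```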
Any correction would be very helpful.
Thanks.
-Original Message-
From: Soumitra Kumar [mailto:kumar.soumi...@gmail.com]
Sent: Saturday, September 20, 2014 1:44 PM
To: Ted Yu
Cc: innowireless TaeYun Kim; user; Aniket Bhatnagar
Subject: Re: Bulk-load to HBase
I successfully did this once.
ormat], conf)
Then I do
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
/user/cloudera/spark output
to load the HFiles into HBase.
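Soumitra's truncated snippet appears to end in a saveAsNewAPIHadoopFile(...) call taking HFileOutputFormat and a conf. A hedged Java sketch of that flow, under stated assumptions: 2014-era APIs (HFileOutputFormat and HTable; newer HBase releases use HFileOutputFormat2 and Connection/Table), a pair RDD already sorted by row key (HFileOutputFormat rejects out-of-order keys), and placeholder output directory and table name.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

// Sketch: write (rowkey, KeyValue) pairs out as HFiles, then hand the
// directory to LoadIncrementalHFiles, as in the message above.
public class BulkLoadSketch {
    static void writeHFiles(JavaPairRDD<ImmutableBytesWritable, KeyValue> pairs,
                            String outputDir, String tableName) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf);
        // Pulls the table's compression/bloom settings into the job config.
        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, tableName));
        pairs.saveAsNewAPIHadoopFile(outputDir,
                ImmutableBytesWritable.class, KeyValue.class,
                HFileOutputFormat.class, job.getConfiguration());
        // Then, from a shell:
        //   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <outputDir> <tableName>
    }
}
```

One caveat: configureIncrementalLoad also installs a MapReduce total-order partitioner keyed on region boundaries, which Spark does not apply; with Spark you must sort (and, for multi-region tables, partition) the RDD by row key yourself before the save.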
- Original Message -
From: "Ted Yu"
To: "Aniket Bhatnagar"
Cc: "innowireless TaeYun Kim" , "user"
Sent: Friday, September 19, 2014 2:29:51 PM
Subject: Re: Bulk-load to HBase
> goes through the normal write path), while HFileOutputFormat uses it to
> directly build the HFile.
>
>
>
> *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
> *Sent:* Friday, September 19, 2014 9:20 PM
>
> *To:* user@spark.apache.org
> *Subject:* RE: Bulk-load to HBase
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 9:20 PM
To: user@spark.apache.org
Subject: RE: Bulk-load to HBase
Thank you for the example code.
Currently I use foreachPartition() + Put(), but your example code can be used
to clean up my code.
bypasses the write path.
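For reference, a hedged sketch of the foreachPartition() + Put() pattern TaeYun mentions, using 2014-era HBase client APIs (HTable, Put.add; later releases renamed these to Table/BufferedMutator and addColumn). The table, column family, and qualifier names are placeholders. Each Put goes through the region server's normal write path (WAL + memstore), which is why HFile bulk load is usually faster for large volumes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

public class PutPerPartitionSketch {
    static void save(JavaPairRDD<byte[], byte[]> rdd, final String tableName) {
        // One HBase connection per partition, not per record.
        rdd.foreachPartition(iter -> {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, tableName);
            table.setAutoFlush(false); // buffer Puts client-side
            while (iter.hasNext()) {
                Tuple2<byte[], byte[]> t = iter.next();
                Put put = new Put(t._1()); // rowkey
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), t._2());
                table.put(put);
            }
            table.close(); // flushes any remaining buffered Puts
        });
    }
}
```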
Thanks.
From: Aniket Bhatnagar [mailto:aniket.bhatna...@gmail.com]
Sent: Friday, September 19, 2014 9:01 PM
To: innowireless TaeYun Kim
Cc: user
Subject: Re: Bulk-load to HBase
I have been using saveAsNewAPIHadoopDataset but I use TableOutputFormat instead
of HFileOutputFormat.
> I could not find saveAsNewAPIHadoopDataset.
>
> Then, Can I use HFileOutputFormat with saveAsNewAPIHadoopDataset? Is there
> any example code for that?
>
>
>
> Thanks.
>
>
>
> *From:* innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
> *Sent:* Friday, September 19,
To: user@spark.apache.org
Subject: RE: Bulk-load to HBase
Hi,
After reading several documents, it seems that saveAsHadoopDataset cannot
use HFileOutputFormat.
It's because the saveAsHadoopDataset method uses JobConf, so it belongs to the
old Hadoop API, while HFileOutputFormat is a member of the mapreduce package
(the new Hadoop API).
Am I right?
If so, is there another method to bulk-load to HBase from RDD?
Thanks.
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Friday, September 19, 2014 7:17 PM
To: user@spark.apache.org
Subject: Bulk-load to HBase
Hi,
Is there a way to bulk-load to HBase from RDD?
HBase offers HFileOutputFormat class for bulk loading by MapReduce job, but
I cannot figure out how to use it with saveAsHadoopDataset.
Thanks.