Chetan: The link you posted was from a personal repo. There hasn't been a commit for at least a year.
Meanwhile, the hbase-spark module in the hbase repo is being actively maintained.

FYI

> On Jan 27, 2017, at 7:47 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
> Adding to @Ted: check the Bulk Put example -
> https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
>
>> On Sat, Jan 28, 2017 at 9:11 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>> Have you looked at the hbase-spark module (currently in the master branch)?
>>
>> See hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/datasources/AvroSource.scala
>> and hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala
>> for examples.
>>
>> There may be other options.
>>
>> FYI
>>
>> On Fri, Jan 27, 2017 at 7:28 PM, jeff saremi <jeffsar...@hotmail.com> wrote:
>>
>>> Hi,
>>> I'm seeking some pointers/guidance on what we could do to insert billions
>>> of records that we already have in Avro files in Hadoop into HBase.
>>>
>>> I read some articles online, and one of them recommended using the HFile
>>> format. I took a cursory look at the documentation for that. Given its
>>> complexity, I think that may be the last resort we want to pursue,
>>> unless some library is out there that easily helps us write our files
>>> into that format. I didn't see any.
>>> Assuming that the HBase native client may be our best bet, is there any
>>> advice around pre-partitioning our records or similar techniques we
>>> could use?
>>> thanks
>>>
>>> Jeff
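
On Jeff's pre-partitioning question: one common technique (a general sketch, not something prescribed in this thread) is to pre-split the table at creation time into N regions with evenly spaced boundary keys over the row-key space, similar in spirit to what HBase's `RegionSplitter` utility does, so that a billion-row load doesn't hammer a single region and then wait on region splits. A minimal, self-contained sketch of computing uniform split keys over a 4-byte unsigned keyspace (the class and method names here are hypothetical, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitKeys {

    /**
     * Compute (numRegions - 1) evenly spaced split keys over the
     * unsigned 4-byte keyspace [0x00000000, 0xFFFFFFFF]. These byte[]
     * values could then be passed as the split keys when creating the
     * table, so all regions exist before the bulk load starts.
     */
    static List<byte[]> uniformSplits(int numRegions) {
        List<byte[]> splits = new ArrayList<>();
        long range = 0xFFFFFFFFL;                  // size of the keyspace
        for (int i = 1; i < numRegions; i++) {
            long boundary = range * i / numRegions; // i-th region boundary
            splits.add(new byte[] {
                (byte) (boundary >>> 24), (byte) (boundary >>> 16),
                (byte) (boundary >>> 8),  (byte) boundary });
        }
        return splits;
    }

    public static void main(String[] args) {
        // For 4 regions, boundaries land at 1/4, 2/4, 3/4 of the keyspace.
        for (byte[] key : uniformSplits(4)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : key) hex.append(String.format("%02x", b));
            System.out.println(hex);
        }
    }
}
```

Note that pre-splitting only helps if the row keys are actually distributed across the keyspace; if the natural keys are monotonic (e.g. timestamps), prefixing each key with a hash or salt of the record is the usual companion technique. Otherwise every write still lands in the last region regardless of how many splits exist.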