Re: Approach: Incremental data load from HBASE

ayan guha Wed, 04 Jan 2017 12:32:39 -0800

Hi Chetan

What do you mean by incremental load from HBase? HBase keeps a timestamp
for each cell, but not at the row level.

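To make the cell-timestamp point concrete, here is a minimal sketch in plain
Python, using an in-memory stand-in for a table rather than a real HBase
client. It treats a row as changed when any of its cells is newer than the
last watermark. The names `Cell`, `Table`, `rows_changed_since` and the sample
data are hypothetical and only for illustration; against a real cluster the
rough equivalent would be a scan restricted to a time range.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    value: str
    timestamp: int  # write time in milliseconds, kept per cell (not per row)


# Hypothetical in-memory table: row key -> {column -> Cell}
Table = Dict[str, Dict[str, Cell]]


def rows_changed_since(table: Table, watermark: int) -> Tuple[List[str], int]:
    """Return keys of rows with any cell newer than watermark, plus the new watermark."""
    changed = []
    new_watermark = watermark
    for key in sorted(table):
        # A row has no timestamp of its own, so derive one from its newest cell.
        newest = max((cell.timestamp for cell in table[key].values()), default=0)
        if newest > watermark:
            changed.append(key)
            new_watermark = max(new_watermark, newest)
    return changed, new_watermark


table: Table = {
    "row1": {"cf:a": Cell("x", 100), "cf:b": Cell("y", 250)},
    "row2": {"cf:a": Cell("z", 120)},
    "row3": {"cf:a": Cell("w", 300)},
}

# Only rows touched after timestamp 200 are picked up.
changed, watermark = rows_changed_since(table, 200)
```

Storing the returned watermark between runs is what would make the next run
incremental; a second call with the new watermark returns no rows until
something is written again.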

On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Ted Yu,
>
> You misunderstood. I meant an incremental load from HBase to Hive; taken
> on its own, you could call it an incremental import from HBase.
>
> On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Incremental load traditionally means generating hfiles and
>> using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load
>> the data into hbase.
>>
>> For your use case, the producer needs to find rows where the flag is
>> 0 or 1. After such rows are obtained, it is up to you how the result of
>> processing is delivered to hbase.
>>
>> Cheers
>>
>> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> OK, sure, I will ask.
>>>
>>> But what would be a generic best-practice solution for incremental load
>>> from HBase?
>>>
>>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> I haven't used Gobblin.
>>>> You can consider asking the Gobblin mailing list about the first option.
>>>>
>>>> The second option would work.
>>>>
>>>>
>>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>>>> chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hello Guys,
>>>>>
>>>>> I would like to understand the different approaches to distributed
>>>>> incremental load from HBase. Is there any *tool / incubator tool* that
>>>>> satisfies this requirement?
>>>>>
>>>>> *Approach 1:*
>>>>>
>>>>> Write a Kafka producer, manually maintain a flag column for events, and
>>>>> ingest the data into HDFS / S3 with LinkedIn Gobblin.
>>>>>
>>>>> *Approach 2:*
>>>>>
>>>>> Run a scheduled Spark job: read from HBase, apply transformations, and
>>>>> maintain the flag column at the HBase level.
>>>>>
>>>>> In both approaches above, I need to maintain column-level flags, such as
>>>>> 0 - default, 1 - sent, 2 - sent and acknowledged, so that next time the
>>>>> producer takes another batch of 1000 rows where the flag is 0 or 1.
>>>>>
>>>>> I am looking for a best-practice approach with any distributed tool.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> - Chetan Khatri
>>>>>
>>>>
>>>>
>>>
>>
>

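For reference, the flag scheme described in the original question (0 by
default, 1 for sent, 2 for sent and acknowledged, batches drawn from rows
flagged 0 or 1) can be sketched as a small state machine. This is only an
illustration in plain Python over an in-memory dict; `next_batch`,
`acknowledge` and the sample keys are hypothetical names, and a real job
would read and update the flag column in HBase instead.

```python
from typing import Dict, List

PENDING, SENT, ACKED = 0, 1, 2  # flag values described in the thread


def next_batch(flags: Dict[str, int], batch_size: int = 1000) -> List[str]:
    """Pick up to batch_size row keys whose flag is 0 or 1 and mark them sent."""
    candidates = [key for key in sorted(flags) if flags[key] in (PENDING, SENT)]
    batch = candidates[:batch_size]
    for key in batch:
        flags[key] = SENT
    return batch


def acknowledge(flags: Dict[str, int], keys: List[str]) -> None:
    """Mark rows as sent and acknowledged so later batches skip them."""
    for key in keys:
        flags[key] = ACKED


flags = {"r1": PENDING, "r2": SENT, "r3": ACKED, "r4": PENDING}
first = next_batch(flags, batch_size=2)   # takes r1 and r2
acknowledge(flags, ["r1"])                # only r1 is confirmed downstream
second = next_batch(flags, batch_size=2)  # r2 is retried, r4 is new
```

Because rows flagged 1 stay eligible, an unacknowledged row is sent again in
a later batch, so the consumer side would need to tolerate duplicates.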

-- 
Best Regards,
Ayan Guha
