Hi Chetan,

What do you mean by incremental load from HBase? HBase keeps a timestamp for each cell, but not at the row level.
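Because timestamps are per cell, one flag-free way to pull increments is to fetch, on each run, only the cells written since the previous run. The sketch below is an illustration under simplifying assumptions: the table is modeled as a Python dict, and the names `Table` and `changed_cells` are hypothetical. Against a real cluster, the same window would be expressed with `Scan.setTimeRange(minStamp, maxStamp)` in the HBase Java client, which also uses a half-open `[min, max)` range.

```python
from typing import Dict, Tuple

# Hypothetical in-memory model of an HBase table:
# row key -> {column: (value, cell timestamp in ms)}
Table = Dict[str, Dict[str, Tuple[str, int]]]


def changed_cells(table: Table, min_ts: int, max_ts: int) -> Dict[str, Dict[str, str]]:
    """Return cells whose timestamp falls in [min_ts, max_ts), grouped by row.

    A row appears only if at least one of its cells changed in the window,
    and only the changed cells are returned, since HBase timestamps are
    kept per cell rather than per row.
    """
    result: Dict[str, Dict[str, str]] = {}
    for row_key, cells in table.items():
        for column, (value, ts) in cells.items():
            if min_ts <= ts < max_ts:
                result.setdefault(row_key, {})[column] = value
    return result


table: Table = {
    "row1": {"cf:name": ("alice", 100), "cf:city": ("pune", 250)},
    "row2": {"cf:name": ("bob", 120)},
    "row3": {"cf:name": ("carol", 300)},
}
last_run, now = 200, 301
print(changed_cells(table, last_run, now))
# {'row1': {'cf:city': 'pune'}, 'row3': {'cf:name': 'carol'}}
```

Each run would persist `max_ts` and reuse it as the next run's `min_ts`. Two caveats: deletions are not reported by this kind of window, and the approach assumes cell timestamps are assigned by the server rather than set explicitly by writers.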
On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Ted Yu,
>
> You misunderstood: I meant incremental load from HBase to Hive, or, put
> simply, incremental import from HBase.
>
> On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Incremental load traditionally means generating HFiles and using
>> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the
>> data into HBase.
>>
>> For your use case, the producer needs to find rows where the flag is 0
>> or 1. After such rows are obtained, it is up to you how the result of
>> processing is delivered to HBase.
>>
>> Cheers
>>
>> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> OK, sure, I will ask.
>>>
>>> But what would be the generic best-practice solution for incremental
>>> load from HBase?
>>>
>>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> I haven't used Gobblin.
>>>> You can consider asking the Gobblin mailing list about the first option.
>>>>
>>>> The second option would work.
>>>>
>>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>>>> chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Hello Guys,
>>>>>
>>>>> I would like to understand the different approaches for distributed
>>>>> incremental load from HBase. Is there any *tool / incubator tool* that
>>>>> satisfies this requirement?
>>>>>
>>>>> *Approach 1:*
>>>>>
>>>>> Write a Kafka producer, manually maintain a column flag for events,
>>>>> and ingest the events with LinkedIn Gobblin to HDFS / S3.
>>>>>
>>>>> *Approach 2:*
>>>>>
>>>>> Run a scheduled Spark job: read from HBase, do the transformations,
>>>>> and maintain the flag column at the HBase level.
>>>>>
>>>>> In both approaches, I need to maintain column-level flags, such as
>>>>> 0 by default, 1 = sent, 2 = sent and acknowledged. The next time, the
>>>>> producer will take another batch of 1000 rows where the flag is 0 or 1.
>>>>>
>>>>> I am looking for a best-practice approach with any distributed tool.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> - Chetan Khatri

--
Best Regards,
Ayan Guha
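The flag scheme described in the original question (0 by default, 1 = sent, 2 = sent and acknowledged, with the producer taking the next batch of 1000 rows whose flag is 0 or 1) can be sketched as a small state machine. This is an in-memory illustration only: the dict of flags and the helpers `next_batch`, `mark_sent`, and `acknowledge` are hypothetical stand-ins for an HBase scan with a value filter and the Puts that update the flag column.

```python
from typing import Dict, Iterable, List

PENDING, SENT, ACKED = 0, 1, 2  # flag values from the thread


def next_batch(flags: Dict[str, int], batch_size: int = 1000) -> List[str]:
    """Pick up to batch_size row keys whose flag is 0 (new) or 1 (sent, not acked)."""
    # sorted() imitates HBase returning rows in row-key order during a scan
    return [k for k in sorted(flags) if flags[k] in (PENDING, SENT)][:batch_size]


def mark_sent(flags: Dict[str, int], keys: Iterable[str]) -> None:
    """Record that these rows were handed to the consumer."""
    for k in keys:
        flags[k] = SENT


def acknowledge(flags: Dict[str, int], keys: Iterable[str]) -> None:
    """Move sent rows to acknowledged; rows not yet sent are left alone."""
    for k in keys:
        if flags[k] == SENT:
            flags[k] = ACKED


flags = {"r1": PENDING, "r2": PENDING, "r3": PENDING}
batch = next_batch(flags, batch_size=2)  # ['r1', 'r2']
mark_sent(flags, batch)
acknowledge(flags, ["r1"])               # r2 stays at 1: never acknowledged
print(next_batch(flags, batch_size=2))   # ['r2', 'r3']
```

Because rows left at 1 are picked up again, this design gives at-least-once delivery, so downstream consumers should tolerate duplicates. In HBase the flag is not part of the row key, so selecting rows by flag typically means a filtered full-table scan (for example with `SingleColumnValueFilter`), plus one extra write per status change.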