You should first try setting the 'autoflush' boolean on the HTable to false:
the client then buffers the writes for you and sends them out asynchronously,
so all the multithreading / buffering work is done for you.
If you need a synchronisation point (to free the resources on the sending
side), you can call flushCommits on the HTable.
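
For example, here is a minimal, untested sketch against the 0.98-era HTable
client API (the table, column family, and qualifier names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedStreamWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "stream_table"); // placeholder table name
    try {
      // Disable autoflush: puts accumulate in the client-side write buffer
      // and are shipped in batches instead of one RPC per put.
      table.setAutoFlush(false);
      table.setWriteBufferSize(8 * 1024 * 1024); // optional: 8 MB buffer

      for (long i = 0; i < 1000000; i++) {
        // Monotonically increasing row key, as in your use case.
        Put p = new Put(Bytes.toBytes(String.format("row-%012d", i)));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
              Bytes.toBytes("value-" + i));
        table.put(p); // buffered locally; flushed when the buffer fills
      }

      // Synchronisation point: push out whatever is still buffered.
      table.flushCommits();
    } finally {
      table.close(); // close() also flushes any remaining buffered puts
    }
  }
}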

Then the server-side settings depend on your data volume. See this blog:
http://gbif.blogspot.fr/2012/07/optimizing-writes-in-hbase.html. It's 3
years old, but most of it is still true.

Nicolas



On Fri, Feb 13, 2015 at 7:20 AM, hongbin ma <[email protected]> wrote:

> hi,
>
> I'm trying to use an HTable to store data that comes in a streaming fashion.
> The incoming data is guaranteed to have a larger KEY than ANY existing
> key in the table.
> And the data will be READONLY.
>
> The data is streaming in at a very high rate, and I don't want to issue a PUT
> operation for each data entry, because that is obviously poor in performance.
> I'm thinking about pooling the data entries and flushing them to HBase every
> five minutes, and AFAIK there are a few options:
>
> 1. Pool the data entries, and every 5 minutes run an MR job to convert the
> data to HFile format. This approach could avoid the overhead of single PUTs,
> but I'm afraid the MR job might be too costly (waiting in the job queue) to
> keep pace.
>
> 2. Use HTableInterface.put(List<Put>). The batched version should be faster,
> but I'm not quite sure by how much.
>
> 3.?
>
> Can anyone give me some advice on this?
> Thanks!
>
> hongbin
>
