Re: Efficient use of buffered writes in a post-HTablePool world?

Aaron Beppu Fri, 19 Dec 2014 10:59:08 -0800

Nick : Thanks, I've created an issue [1].

Pradeep : Yes, I have considered using that. However, for the moment we've
set it out of scope, since our migration from 0.94 -> 0.98 is already a bit
complicated, and we hoped to isolate these changes by not moving
to the async client until after the current migration is complete.


Andrew : HTableMultiplexer does seem like it would solve our buffered write
problem, albeit in an awkward way -- thanks! It kind of seems like HTable
should then (if autoFlush == false) send writes to the multiplexer, rather
than putting them in its own, short-lived writeBuffer. If nothing else, it's
still super confusing that HTableInterface exposes setAutoFlush() and
setWriteBufferSize(), given that the writeBuffer won't meaningfully buffer
anything if all tables are short-lived.
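
To make the difference concrete, here is a minimal sketch in plain Java. It
uses no HBase classes: WriteBuffer, BufferDemo and the helper method names
are hypothetical stand-ins, and the flush threshold counts writes rather
than bytes, which is a simplification. It only models the behaviour
described in this thread (a buffer that flushes when full or on close) and
counts how many flushes each usage pattern causes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for a client-side write buffer. Not an HBase class.
final class WriteBuffer<T> implements AutoCloseable {
    private final int flushThreshold;
    private final Consumer<List<T>> sink; // receives each flushed batch
    private final List<T> pending = new ArrayList<>();

    WriteBuffer(int flushThreshold, Consumer<List<T>> sink) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    // Buffers one write and flushes only once the threshold is reached.
    synchronized void put(T write) {
        pending.add(write);
        if (pending.size() >= flushThreshold) {
            flush();
        }
    }

    // Hands the pending writes to the sink as one batch; no-op when empty.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    // Like closing a table: whatever is still buffered gets flushed.
    @Override
    public void close() {
        flush();
    }
}

final class BufferDemo {
    // One buffer per write, closed right away (the short-lived pattern).
    static int flushesWithShortLivedBuffers(int writes, int threshold) {
        AtomicInteger flushes = new AtomicInteger();
        for (int i = 0; i < writes; i++) {
            try (WriteBuffer<String> buffer =
                    new WriteBuffer<>(threshold, batch -> flushes.incrementAndGet())) {
                buffer.put("row-" + i);
            } // close() flushes the single buffered write
        }
        return flushes.get();
    }

    // One buffer reused for all writes, closed once at the end.
    static int flushesWithLongLivedBuffer(int writes, int threshold) {
        AtomicInteger flushes = new AtomicInteger();
        try (WriteBuffer<String> buffer =
                new WriteBuffer<>(threshold, batch -> flushes.incrementAndGet())) {
            for (int i = 0; i < writes; i++) {
                buffer.put("row-" + i);
            }
        } // close() flushes only a partial final batch, if any
        return flushes.get();
    }

    public static void main(String[] args) {
        System.out.println(flushesWithShortLivedBuffers(1000, 100));
        System.out.println(flushesWithLongLivedBuffer(1000, 100));
    }
}
```

With 1000 writes and a threshold of 100, the short-lived variant flushes
1000 times and the long-lived one 10 times. The put and flush methods are
synchronized so that one long-lived buffer could be shared between threads
without a pool; whether that is acceptable under real contention is a
separate question this sketch does not answer.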

[1] https://issues.apache.org/jira/browse/HBASE-12728

On Fri, Dec 19, 2014 at 10:31 AM, Andrew Purtell <apurt...@apache.org>
wrote:
>
> I believe HTableMultiplexer[1] is meant to stand in for HTablePool for
> buffered writing. FWIW, I've not used it.
>
> [1] https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableMultiplexer.html
>
>
> On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > Hi Aaron,
> >
> > Your analysis is spot on and I do not believe this is by design. I see the
> > write buffer is owned by the table, while I would have expected there to be
> > a buffer per table all managed by the connection. I suggest you raise a
> > blocker ticket vs the 1.0.0 release that's just around the corner to give
> > this the attention it needs. Let me know if you're not into JIRA, I can
> > raise one on your behalf.
> >
> > cc Lars, Enis.
> >
> > Nice work Aaron.
> > -n
> >
> > On Wed, Dec 17, 2014 at 6:44 PM, Aaron Beppu <abe...@siftscience.com>
> > wrote:
> > >
> > > Hi All,
> > >
> > > TLDR; in the absence of HTablePool, if HTable instances are short-lived,
> > > how should clients use buffered writes?
> > >
> > > I’m working on migrating a codebase from using 0.94.6 (CDH4.4) to 0.98.6
> > > (CDH5.2). One issue I’m confused by is how to effectively use buffered
> > > writes now that HTablePool has been deprecated[1].
> > >
> > > In our 0.94 code, a pathway could get a table from the pool, configure it
> > > with table.setAutoFlush(false); and write Puts to it. Those writes would
> > > then go to the table instance’s writeBuffer, and those writes would only
> > > be flushed when the buffer was full, or when we were ready to close out
> > > the pool. We were intentionally choosing to have fewer, larger writes
> > > from the client to the cluster, and we knew we were giving up a degree of
> > > safety in exchange (i.e. if the client dies after it’s accepted a write
> > > but before the flush for that write occurs, the data is lost). This seems
> > > to be generally considered a reasonable choice (cf. the HBase Book [2],
> > > § 14.8.4).
> > >
> > > However in the 0.98 world, without HTablePool, the endorsed pattern [3]
> > > seems to be to create a new HTable via table =
> > > stashedHConnection.getTable(tableName, myExecutorService). However, even
> > > if we do table.setAutoFlush(false), because that table instance is
> > > short-lived, its buffer never gets full. We’ll create a table instance,
> > > write a put to it, try to close the table, and the close call will
> > > trigger a (synchronous) flush. Thus, not having HTablePool seems like it
> > > would cause us to have many more small writes from the client to the
> > > cluster, and basically wipe out the advantage of turning off autoflush.
> > >
> > > More concretely :
> > >
> > > // Given these two helpers ...
> > >
> > > private HTableInterface getAutoFlushTable(String tableName)
> > >     throws IOException {
> > >   // (autoflush is true by default)
> > >   return storedConnection.getTable(tableName, executorService);
> > > }
> > >
> > > private HTableInterface getBufferedTable(String tableName)
> > >     throws IOException {
> > >   HTableInterface table = getAutoFlushTable(tableName);
> > >   table.setAutoFlush(false);
> > >   return table;
> > > }
> > >
> > > // it's my contention that these two methods would behave almost
> > > // identically, except the first will hit a synchronous flush during
> > > // the put call, and the second will flush during the (hidden) close
> > > // call on table.
> > >
> > > private void writeAutoFlushed(Put somePut) throws IOException {
> > >   try (HTableInterface table = getAutoFlushTable(tableName)) {
> > >     table.put(somePut); // will do synchronous flush
> > >   }
> > > }
> > >
> > > private void writeBuffered(Put somePut) throws IOException {
> > >   try (HTableInterface table = getBufferedTable(tableName)) {
> > >     table.put(somePut);
> > >   } // auto-close will trigger synchronous flush
> > > }
> > >
> > > It seems like the only way to avoid this is to have long-lived HTable
> > > instances, which get reused for multiple writes. However, since the
> > > actual writes are driven from highly concurrent code, and since HTable is
> > > not threadsafe, this would involve having a number of HTable instances,
> > > and a control mechanism for leasing them out to individual threads
> > > safely. Except at this point it seems like we will have recreated
> > > HTablePool, which suggests that we’re doing something deeply wrong.
> > >
> > > What am I missing here? Since the HTableInterface.setAutoFlush method
> > > still exists, it must be anticipated that users will still want to buffer
> > > writes. What’s the recommended way to actually buffer a meaningful number
> > > of writes, from a multithreaded context, that doesn’t just amount to
> > > creating a table pool?
> > >
> > > Thanks in advance,
> > > Aaron
> > >
> > > [1] https://issues.apache.org/jira/browse/HBASE-6580
> > > [2] http://hbase.apache.org/book/perf.writing.html
> > > [3] https://issues.apache.org/jira/browse/HBASE-6580?focusedCommentId=13501302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501302
> > > 
> > >
> >
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
