@Devaraja, Would you mind posting that on https://issues.apache.org/jira/browse/HBASE-12728? The HBase group is talking about this topic on that JIRA issue.
Thanks,
-Solomon

On Wed, Dec 24, 2014 at 9:40 PM, Devaraja Swami <devarajasw...@gmail.com> wrote:

> Would like to add my perspective as a user. (Thanks to Aaron Beppu for uncovering this hidden issue.) In my applications, I have some tables for which I need autoflushing, and others for which I need a write buffer. Plus, the size of the write buffer is different for different tables.
> All of this seems to imply that the HBase client side will need to maintain and operate write buffers on a per-table basis, whether or not the ephemeral Table/HTableInterface instances come and go (i.e., are closed).
> The question then, as Nick points out, is what entity is responsible for flushing the buffers. By elimination, my feeling is that this would end up being the Connection instance.
>
> On Fri, Dec 19, 2014 at 5:55 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
> > Could be in an API-compatible way, though semantics would change, which is probably worse. Table keeps these methods. When setAutoFlush is used, write buffer managed by connection is created. If multiple Table instances for the same table setWriteBufferSize(), perhaps the largest value wins. Writes across these instances all hit the same buffer. What's not clear here is who owns the ExecutorService(s) that handles flushing the buffer.
> >
> > My original thought was make this a blocker of 1.0, but we've shipped 0.96 and 0.98 this way, so we have to keep API and semantics around for backward compatibility anyway. Doesn't mean we can't do the new API better though. HTablePool is still in 1.0, so this would be thinking ahead to the fancy new Table-based API. If we drop these two methods from Table, we can ship with a feature gap between old and new API, resolve this in 1.1. Folks who need this kind of pooling can continue to use HTablePool with HTables.
> >
> > On Friday, December 19, 2014, Solomon Duskis <sdus...@gmail.com> wrote:
> >
> > > My first thought based on this discussion was that it would require moving some methods (setAutoFlush() and setWriteBufferSize()) from Table to Connection. That would be a breaking API change.
> > >
> > > -Solomon
> > >
> > > On Fri, Dec 19, 2014 at 3:04 PM, Andrew Purtell <apurt...@apache.org> wrote:
> > >
> > > > I think it would be critical if we're contemplating something that requires a breaking API change? Do we have that here? I'm not sure.
> > > >
> > > > On Fri, Dec 19, 2014 at 12:02 PM, Solomon Duskis <sdus...@gmail.com> wrote:
> > > >
> > > > > Is this critical to sort out before 1.0, or is fixing this a post-1.0 enhancement?
> > > > >
> > > > > -Solomon
> > > > >
> > > > > On Fri, Dec 19, 2014 at 2:19 PM, Andrew Purtell <apurt...@apache.org> wrote:
> > > > >
> > > > > > I don't like the dropped writes either. Just pointing out what we have now. There is a gap no doubt.
> > > > > >
> > > > > > On Fri, Dec 19, 2014 at 11:16 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > > > > >
> > > > > > > Thanks for the reminder about the Multiplexer, Andrew. It sort-of solves this problem, but I think its semantics of dropping writes are not desirable in the general case. Further, my understanding was that the new connection implementation is designed to handle this kind of use-case (hence cc'ing Lars).
> > > > > > >
> > > > > > > On Fri, Dec 19, 2014 at 11:02 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > > > > > >
> > > > > > > > Aaron: Please post a copy of that feedback on the JIRA, pretty sure we will be having an improvement discussion there.
> > > > > > > >
> > > > > > > > On Fri, Dec 19, 2014 at 10:58 AM, Aaron Beppu <abe...@siftscience.com> wrote:
> > > > > > > >
> > > > > > > > > Nick : Thanks, I've created an issue [1].
> > > > > > > > >
> > > > > > > > > Pradeep : Yes, I have considered using that. However, for the moment we've set it out of scope, since our migration from 0.94 -> 0.98 is already a bit complicated, and we hoped to isolate these changes by not moving to the async client until after the current migration is complete.
> > > > > > > > >
> > > > > > > > > Andrew : HTableMultiplexer does seem like it would solve our buffered write problem, albeit in an awkward way -- thanks! It kind of seems like HTable should then (if autoFlush == false) send writes to the multiplexer, rather than putting them in its own, short-lived writeBuffer. If nothing else, it's still super confusing that HTableInterface exposes setAutoFlush() and setWriteBufferSize(), given that the writeBuffer won't meaningfully buffer anything if all tables are short-lived.
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/HBASE-12728
> > > > > > > > >
> > > > > > > > > On Fri, Dec 19, 2014 at 10:31 AM, Andrew Purtell <apurt...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > I believe HTableMultiplexer[1] is meant to stand in for HTablePool for buffered writing. FWIW, I've not used it.
> > > > > > > > > >
> > > > > > > > > > 1: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableMultiplexer.html
> > > > > > > > > >
> > > > > > > > > > On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Aaron,
> > > > > > > > > > >
> > > > > > > > > > > Your analysis is spot on and I do not believe this is by design. I see the write buffer is owned by the table, while I would have expected there to be a buffer per table all managed by the connection. I suggest you raise a blocker ticket vs the 1.0.0 release that's just around the corner to give this the attention it needs. Let me know if you're not into JIRA, I can raise one on your behalf.
> > > > > > > > > > >
> > > > > > > > > > > cc Lars, Enis.
> > > > > > > > > > >
> > > > > > > > > > > Nice work Aaron.
> > > > > > > > > > > -n
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Dec 17, 2014 at 6:44 PM, Aaron Beppu <abe...@siftscience.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi All,
> > > > > > > > > > > >
> > > > > > > > > > > > TLDR; in the absence of HTablePool, if HTable instances are short-lived, how should clients use buffered writes?
> > > > > > > > > > > >
> > > > > > > > > > > > I’m working on migrating a codebase from using 0.94.6 (CDH4.4) to 0.98.6 (CDH5.2). One issue I’m confused by is how to effectively use buffered writes now that HTablePool has been deprecated[1].
> > > > > > > > > > > >
> > > > > > > > > > > > In our 0.94 code, a pathway could get a table from the pool, configure it with table.setAutoFlush(false); and write Puts to it. Those writes would then go to the table instance’s writeBuffer, and those writes would only be flushed when the buffer was full, or when we were ready to close out the pool. We were intentionally choosing to have fewer, larger writes from the client to the cluster, and we knew we were giving up a degree of safety in exchange (i.e. if the client dies after it’s accepted a write but before the flush for that write occurs, the data is lost). This seems to be generally considered a reasonable choice (cf the HBase Book [2] SS 14.8.4)
> > > > > > > > > > > >
> > > > > > > > > > > > However in the 0.98 world, without HTablePool, the endorsed pattern [3] seems to be to create a new HTable via table = stashedHConnection.getTable(tableName, myExecutorService). However, even if we do table.setAutoFlush(false), because that table instance is short-lived, its buffer never gets full. We’ll create a table instance, write a put to it, try to close the table, and the close call will trigger a (synchronous) flush. Thus, not having HTablePool seems like it would cause us to have many more small writes from the client to the cluster, and basically wipe out the advantage of turning off autoflush.
> > > > > > > > > > > >
> > > > > > > > > > > > More concretely :
> > > > > > > > > > > >
> > > > > > > > > > > > // Given these two helpers ...
> > > > > > > > > > > >
> > > > > > > > > > > > private HTableInterface getAutoFlushTable(String tableName) throws IOException {
> > > > > > > > > > > >   // (autoflush is true by default)
> > > > > > > > > > > >   return storedConnection.getTable(tableName, executorService);
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > private HTableInterface getBufferedTable(String tableName) throws IOException {
> > > > > > > > > > > >   HTableInterface table = getAutoFlushTable(tableName);
> > > > > > > > > > > >   table.setAutoFlush(false);
> > > > > > > > > > > >   return table;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > // it's my contention that these two methods would behave almost identically,
> > > > > > > > > > > > // except the first will hit a synchronous flush during the put call,
> > > > > > > > > > > > // and the second will flush during the (hidden) close call on table.
> > > > > > > > > > > >
> > > > > > > > > > > > private void writeAutoFlushed(Put somePut) throws IOException {
> > > > > > > > > > > >   try (HTableInterface table = getAutoFlushTable(tableName)) {
> > > > > > > > > > > >     table.put(somePut); // will do synchronous flush
> > > > > > > > > > > >   }
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > private void writeBuffered(Put somePut) throws IOException {
> > > > > > > > > > > >   try (HTableInterface table = getBufferedTable(tableName)) {
> > > > > > > > > > > >     table.put(somePut);
> > > > > > > > > > > >   } // auto-close will trigger synchronous flush
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > It seems like the only way to avoid this is to have long-lived HTable instances, which get reused for multiple writes. However, since the actual writes are driven from highly concurrent code, and since HTable is not threadsafe, this would involve having a number of HTable instances, and a control mechanism for leasing them out to individual threads safely. Except at this point it seems like we will have recreated HTablePool, which suggests that we’re doing something deeply wrong.
> > > > > > > > > > > >
> > > > > > > > > > > > What am I missing here? Since the HTableInterface.setAutoFlush method still exists, it must be anticipated that users will still want to buffer writes.
> > > > > > > > > > > > What’s the recommended way to actually buffer a meaningful number of writes, from a multithreaded context, that doesn’t just amount to creating a table pool?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks in advance,
> > > > > > > > > > > > Aaron
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/HBASE-6580
> > > > > > > > > > > > [2] http://hbase.apache.org/book/perf.writing.html
> > > > > > > > > > > > [3] https://issues.apache.org/jira/browse/HBASE-6580?focusedCommentId=13501302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501302
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > >    - Andy
> > > > > > > > > >
> > > > > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
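
The proposal that runs through this thread (write buffers kept per table and owned by the long-lived connection, so that short-lived table handles can be opened and closed without forcing a flush) can be sketched roughly as follows. This is not HBase code and not the design the project adopted: the class SharedWriteBuffers, its method names, the String stand-in for a Put, and the flusher callback are all hypothetical. The threshold here counts mutations for simplicity, whereas HBase sizes its write buffer in bytes. The "largest value wins" rule follows Nick's suggestion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch (not an HBase API): one write buffer per table name,
 * owned by a single long-lived, connection-like object.
 */
final class SharedWriteBuffers implements AutoCloseable {

    /** Pending mutations for one table; guarded by the instance's own monitor. */
    private static final class TableBuffer {
        final List<String> pending = new ArrayList<>();
        int maxPending;

        TableBuffer(int maxPending) {
            this.maxPending = maxPending;
        }
    }

    private final Map<String, TableBuffer> buffers = new ConcurrentHashMap<>();
    private final int defaultMaxPending;
    // Stand-in for the code that would send a batch to the cluster.
    private final BiConsumer<String, List<String>> flusher;

    SharedWriteBuffers(int defaultMaxPending, BiConsumer<String, List<String>> flusher) {
        this.defaultMaxPending = defaultMaxPending;
        this.flusher = flusher;
    }

    private TableBuffer bufferFor(String table) {
        return buffers.computeIfAbsent(table, t -> new TableBuffer(defaultMaxPending));
    }

    /** If several callers set a size for the same table, the largest value wins. */
    void setWriteBufferSize(String table, int maxPending) {
        TableBuffer buffer = bufferFor(table);
        synchronized (buffer) {
            buffer.maxPending = Math.max(buffer.maxPending, maxPending);
        }
    }

    /** Buffers one mutation; hands a batch to the flusher once the threshold is reached. */
    void put(String table, String mutation) {
        TableBuffer buffer = bufferFor(table);
        List<String> batch = null;
        synchronized (buffer) {
            buffer.pending.add(mutation);
            if (buffer.pending.size() >= buffer.maxPending) {
                batch = drain(buffer);
            }
        }
        // Call the flusher outside the lock so other writers are not blocked by I/O.
        if (batch != null) {
            flusher.accept(table, batch);
        }
    }

    /** Flushes whatever is pending for one table, if anything. */
    void flush(String table) {
        TableBuffer buffer = buffers.get(table);
        if (buffer == null) {
            return;
        }
        List<String> batch;
        synchronized (buffer) {
            batch = drain(buffer);
        }
        if (!batch.isEmpty()) {
            flusher.accept(table, batch);
        }
    }

    /** Copies and clears the pending list; caller must hold the buffer's monitor. */
    private static List<String> drain(TableBuffer buffer) {
        List<String> batch = new ArrayList<>(buffer.pending);
        buffer.pending.clear();
        return batch;
    }

    /** Closing the connection-like owner is the only point where everything is flushed. */
    @Override
    public void close() {
        for (String table : buffers.keySet()) {
            flush(table);
        }
    }
}
```

In this sketch a caller would create one SharedWriteBuffers next to its stored connection, call put(tableName, mutation) from any thread, and close it only at shutdown. Because batches are handed to the flusher outside the lock, two batches for the same table may reach the flusher out of order under contention; a real implementation would also need to decide who owns the flushing threads, which is the open question Nick raises.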