Nick: Thanks, I've created an issue [1].

Pradeep: Yes, I have considered using that. However, for the moment we've set it out of scope, since our migration from 0.94 to 0.98 is already a bit complicated, and we hoped to isolate these changes by not moving to the async client until after the current migration is complete.

Andrew: HTableMultiplexer does seem like it would solve our buffered-write problem, albeit in an awkward way. Thanks! It kind of seems like HTable should then (if autoFlush == false) send writes to the multiplexer, rather than holding them in its own short-lived writeBuffer. If nothing else, it's still super confusing that HTableInterface exposes setAutoFlush() and setWriteBufferSize(), given that the writeBuffer won't meaningfully buffer anything if all tables are short-lived.

[1] https://issues.apache.org/jira/browse/HBASE-12728

On Fri, Dec 19, 2014 at 10:31 AM, Andrew Purtell <apurt...@apache.org> wrote:
>
> I believe HTableMultiplexer [1] is meant to stand in for HTablePool for buffered writing. FWIW, I've not used it.
>
> [1] https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTableMultiplexer.html
>
> On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > Hi Aaron,
> >
> > Your analysis is spot on, and I do not believe this is by design. I see the write buffer is owned by the table, while I would have expected there to be a buffer per table, all managed by the connection. I suggest you raise a blocker ticket against the 1.0.0 release that's just around the corner to give this the attention it needs. Let me know if you're not into JIRA; I can raise one on your behalf.
> >
> > cc Lars, Enis.
> >
> > Nice work, Aaron.
> > -n
> >
> > On Wed, Dec 17, 2014 at 6:44 PM, Aaron Beppu <abe...@siftscience.com> wrote:
> > >
> > > Hi All,
> > >
> > > TL;DR: in the absence of HTablePool, if HTable instances are short-lived, how should clients use buffered writes?
> > >
> > > I'm working on migrating a codebase from 0.94.6 (CDH4.4) to 0.98.6 (CDH5.2). One issue I'm confused by is how to effectively use buffered writes now that HTablePool has been deprecated [1].
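
Nick's expectation above, one write buffer per table owned by the connection instead of by each short-lived table instance, can be modeled in plain Java with no HBase dependency. This is a hypothetical sketch, not HBase's API and not how HTableMultiplexer is implemented: `ConnectionBuffers`, `TableHandle`, `flushThreshold`, and the `sink` callback (standing in for the RPC to the cluster) are invented names, and strings stand in for Put objects. Short-lived handles append to a shared, synchronized per-table buffer and closing a handle does not flush; a batch is sent only when a table's buffer reaches the threshold or when the connection itself is closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical model of a connection-owned write buffer: one buffer per table,
 * shared by every short-lived handle created from the same connection.
 * Strings stand in for Puts; "sink" stands in for the write to the cluster.
 */
public class ConnectionBuffers implements AutoCloseable {
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final int flushThreshold;
    private final BiConsumer<String, List<String>> sink;

    public ConnectionBuffers(int flushThreshold, BiConsumer<String, List<String>> sink) {
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    /** Lightweight, short-lived handle; closing it does NOT flush. */
    public final class TableHandle implements AutoCloseable {
        private final String tableName;

        private TableHandle(String tableName) { this.tableName = tableName; }

        public void put(String row) { buffer(tableName, row); }

        @Override
        public void close() { /* nothing to do: the connection owns the buffer */ }
    }

    public TableHandle getTable(String tableName) { return new TableHandle(tableName); }

    private void buffer(String tableName, String row) {
        List<String> batch = null;
        synchronized (buffers) { // handles may be used from many threads
            List<String> buf = buffers.computeIfAbsent(tableName, t -> new ArrayList<>());
            buf.add(row);
            if (buf.size() >= flushThreshold) {
                batch = new ArrayList<>(buf);
                buf.clear();
            }
        }
        if (batch != null) {
            sink.accept(tableName, batch); // send outside the lock
        }
    }

    /** Flush whatever is still buffered, e.g. at shutdown. */
    @Override
    public void close() {
        Map<String, List<String>> remaining = new HashMap<>();
        synchronized (buffers) {
            for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    remaining.put(e.getKey(), new ArrayList<>(e.getValue()));
                    e.getValue().clear();
                }
            }
        }
        remaining.forEach(sink);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try (ConnectionBuffers conn = new ConnectionBuffers(3,
                (table, batch) -> log.add(table + ":" + batch.size()))) {
            for (int i = 0; i < 5; i++) {
                // one short-lived handle per write, as in the 0.98 pattern
                try (TableHandle t = conn.getTable("events")) {
                    t.put("row-" + i);
                }
            }
            System.out.println(log); // [events:3]
        }
        System.out.println(log); // [events:3, events:2]
    }
}
```

Because the buffer outlives the handles, five one-write handles produce one full batch of three plus a final batch of two at shutdown, instead of five single-Put flushes. The durability trade-off Aaron describes still applies: anything buffered is lost if the process dies before close().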
> > >
> > > In our 0.94 code, a pathway could get a table from the pool, configure it with table.setAutoFlush(false), and write Puts to it. Those writes would then go to the table instance's writeBuffer, and would only be flushed when the buffer was full, or when we were ready to close out the pool. We were intentionally choosing to have fewer, larger writes from the client to the cluster, and we knew we were giving up a degree of safety in exchange (i.e. if the client dies after it has accepted a write but before the flush for that write occurs, the data is lost). This seems to be generally considered a reasonable choice (cf. the HBase Book [2], section 14.8.4).
> > >
> > > However, in the 0.98 world, without HTablePool, the endorsed pattern [3] seems to be to create a new HTable via table = stashedHConnection.getTable(tableName, myExecutorService). But even if we call table.setAutoFlush(false), that table instance is short-lived, so its buffer never fills. We'll create a table instance, write a Put to it, and close the table, and the close call will trigger a (synchronous) flush. Thus, not having HTablePool seems like it would cause us to make many more small writes from the client to the cluster, and basically wipe out the advantage of turning off autoflush.
> > >
> > > More concretely:
> > >
> > >     // Given these two helpers ...
> > >
> > >     // Fields implied by the snippet (declared on the enclosing class, not shown):
> > >     //   HConnection storedConnection; ExecutorService executorService; String tableName;
> > >
> > >     private HTableInterface getAutoFlushTable(String tableName) throws IOException {
> > >         // (autoflush is true by default)
> > >         return storedConnection.getTable(tableName, executorService);
> > >     }
> > >
> > >     private HTableInterface getBufferedTable(String tableName) throws IOException {
> > >         HTableInterface table = getAutoFlushTable(tableName);
> > >         table.setAutoFlush(false);
> > >         return table;
> > >     }
> > >
> > >     // It's my contention that these two methods would behave almost identically,
> > >     // except the first will hit a synchronous flush during the put call, and the
> > >     // second will flush during the (hidden) close call on the table.
> > >
> > >     private void writeAutoFlushed(Put somePut) throws IOException {
> > >         try (HTableInterface table = getAutoFlushTable(tableName)) {
> > >             table.put(somePut); // will do a synchronous flush
> > >         }
> > >     }
> > >
> > >     private void writeBuffered(Put somePut) throws IOException {
> > >         try (HTableInterface table = getBufferedTable(tableName)) {
> > >             table.put(somePut);
> > >         } // auto-close will trigger a synchronous flush
> > >     }
> > >
> > > It seems like the only way to avoid this is to have long-lived HTable instances, which get reused for multiple writes. However, since the actual writes are driven from highly concurrent code, and since HTable is not thread-safe, this would involve having a number of HTable instances and a control mechanism for leasing them out to individual threads safely. Except at this point it seems like we will have recreated HTablePool, which suggests that we're doing something deeply wrong.
> > >
> > > What am I missing here? Since the HTableInterface.setAutoFlush method still exists, it must be anticipated that users will still want to buffer writes.
> > > What's the recommended way to actually buffer a meaningful number of writes, from a multithreaded context, that doesn't just amount to creating a table pool?
> > >
> > > Thanks in advance,
> > > Aaron
> > >
> > > [1] https://issues.apache.org/jira/browse/HBASE-6580
> > > [2] http://hbase.apache.org/book/perf.writing.html
> > > [3] https://issues.apache.org/jira/browse/HBASE-6580?focusedCommentId=13501302&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501302
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
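
Aaron's contention that writeAutoFlushed and writeBuffered behave almost identically can be checked with a toy model. This sketch is not HBase code: `FlushCounter`, `FakeTable`, and a buffer capacity counted in puts (HBase's setWriteBufferSize() is in bytes) are hypothetical simplifications, assuming only the behavior described in the thread, namely that a table with autoflush off flushes when its buffer is full or when it is closed. It counts simulated client-to-cluster writes for 100 puts through short-lived tables versus one reused table.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not HBase code) of the behavior described in the thread:
 * with autoFlush on, every put is sent immediately; with autoFlush off,
 * puts are held until the buffer reaches capacity or the table is closed.
 */
public class FlushCounter {
    static int flushes = 0; // simulated client-to-cluster writes

    static final class FakeTable implements AutoCloseable {
        private final List<String> writeBuffer = new ArrayList<>();
        private final int bufferCapacity; // hypothetical: counted in puts, not bytes
        private boolean autoFlush = true;

        FakeTable(int bufferCapacity) { this.bufferCapacity = bufferCapacity; }

        void setAutoFlush(boolean autoFlush) { this.autoFlush = autoFlush; }

        void put(String row) {
            writeBuffer.add(row);
            if (autoFlush || writeBuffer.size() >= bufferCapacity) {
                flushCommits();
            }
        }

        void flushCommits() {
            if (!writeBuffer.isEmpty()) {
                flushes++;
                writeBuffer.clear();
            }
        }

        @Override
        public void close() { flushCommits(); } // closing always flushes what is left
    }

    /** One short-lived buffered table per put, as in writeBuffered(). */
    static int shortLived(int puts, int capacity) {
        flushes = 0;
        for (int i = 0; i < puts; i++) {
            try (FakeTable table = new FakeTable(capacity)) {
                table.setAutoFlush(false);
                table.put("row-" + i);
            }
        }
        return flushes;
    }

    /** One long-lived buffered table reused for every put. */
    static int longLived(int puts, int capacity) {
        flushes = 0;
        try (FakeTable table = new FakeTable(capacity)) {
            table.setAutoFlush(false);
            for (int i = 0; i < puts; i++) {
                table.put("row-" + i);
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println("short-lived: " + shortLived(100, 10)); // short-lived: 100
        System.out.println("long-lived: " + longLived(100, 10));   // long-lived: 10
    }
}
```

In this model a short-lived buffered table never holds more than one Put, so every close is a separate flush, the same count autoflush would give. Only reuse amortizes the writes, and since HTable is not thread-safe, reuse from concurrent code needs per-thread instances or a pool, which is the dilemma the thread describes.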