Thanks for all the explanations. Perhaps this is something we should clearly spell out in the documentation once all the facts are in. I'll keep a task open for now (https://issues.apache.org/jira/browse/ACCUMULO-2185).
On Sun, Jan 12, 2014 at 4:26 PM, Donald Miner <[email protected]> wrote:

> HDFS-385 (https://issues.apache.org/jira/plugins/servlet/mobile#issue/HDFS-385)
> is for custom pluggable block placement policies, and there has been some
> talk (I think) about improving mean time to recovery and data locality in
> HBase.
>
> Basically this would allow Accumulo to have a policy for its blocks and
> control its own destiny... instead of things like the rebalancer screwing
> things up.
>
> I honestly don't know much else about this. Just thought it might be
> relevant to the conversation.
>
> On Jan 12, 2014, at 6:42 PM, Josh Elser <[email protected]> wrote:
>
> >> On 1/12/14, 6:17 PM, Sean Busbey wrote:
> >> On Sun, Jan 12, 2014 at 4:42 PM, William Slacum <[email protected]> wrote:
> >>
> >>     Some data on short circuit reads would be great to have.
> >>
> >> What kind of data are you looking for? Just HDFS read rates? Or
> >> specifically Accumulo when set up to make use of it?
> >
> > I believe what Bill means, and what I'm also curious about, is
> > specifically the impact on performance for Accumulo's workload: a merged
> > read over multiple files. An easy test might be to create multiple RFiles
> > (1 to 10 files?) which contain interspersed data. Test some sort of
> > random-read and random-seek+sequential-read workloads, from 1 to 10 RFiles,
> > and with short-circuit reads on and off.
> >
> > Perhaps a slightly more accurate test would be to up the compaction
> > ratio on a table, then bulk import the RFiles to a single table, and then
> > just use the regular client API.
> >
> >> I'm unsure of how correct the "compaction leading to eventual
> >> locality" postulation is. It seems, to me at least, that in the case
> >> of a multi-block file, the file system would eventually try to
> >> distribute those blocks rather than leave them all on a single host.
> >>
> >> I know in HBase setups, it's common to either disable the HDFS Balancer
> >> or just disable it for a namespace containing the part of the filesystem
> >> that handles HBase. Otherwise, when the blocks are moved off to other
> >> hosts you get performance degradation until compaction can happen again.
> >> I would expect the same thing ought to be done for Accumulo.
> >
> > AFAIK, HBase also does a lot more with regard to assigning Tablets based
> > on the blocks that serve them, no? To my knowledge, Accumulo doesn't do
> > anything like this. I don't want users to think that disabling the HDFS
> > balancer is a good idea for Accumulo unless we have actual evidence.
