Re: ISAM file location vs. read performance

Arshak Navruzyan Mon, 13 Jan 2014 11:44:12 -0800

Thanks for all the explanations.  Perhaps this is something we should
clearly spell out in the documentation once all the facts are in.  I'll
keep a task open for now:
https://issues.apache.org/jira/browse/ACCUMULO-2185



On Sun, Jan 12, 2014 at 4:26 PM, Donald Miner <[email protected]> wrote:

> HDFS-385 (
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HDFS-385 ) is
> for custom pluggable block placement policies, and there has been some talk
> (I think) about improving mean time to recovery and data locality in
> HBase.
>
> Basically this would allow Accumulo to have a policy for its blocks and
> control its own destiny, instead of things like the rebalancer screwing
> things up.
>
> I honestly don't know much else about this. Just thought it might be
> relevant to the conversation.
>
> > On Jan 12, 2014, at 6:42 PM, Josh Elser <[email protected]> wrote:
> >
> >
> >
> >> On 1/12/14, 6:17 PM, Sean Busbey wrote:
> >> On Sun, Jan 12, 2014 at 4:42 PM, William Slacum
> >> <[email protected]> wrote:
> >>
> >>    Some data on short circuit reads would be great to have.
> >>
> >>
> >> What kind of data are you looking for? Just HDFS read rates? or
> >> specifically Accumulo when set up to make use of it?
> >
> > I believe what Bill means, and what I'm also curious about, is
> > specifically the impact on performance for Accumulo's workload: a merged
> > read over multiple files. An easy test might be to create multiple RFiles
> > (1 to 10 files?) which contain interspersed data. Test some sort of
> > random-read and random-seek+sequential-read workloads, from 1 to 10 RFiles,
> > and with short-circuit reads on and off.
> >
> > Perhaps a slightly more accurate test would be to up the compaction
> > ratio on a table, bulk import the RFiles into that single table, and then
> > just use the regular client API.
> >
> >>    I'm unsure of how correct the "compaction leading to eventual
> >>    locality" postulation is. It seems, to me at least, that in the case
> >>    of a multi-block file, the file system would eventually try to
> >>    distribute those blocks rather than leave them all on a single host.
> >>
> >>
> >>
> >>
> >> I know in HBase setups, it's common to either disable the HDFS Balancer
> >> or just disable it for a namespace containing the part of the filesystem
> >> that handles HBase. Otherwise, when the blocks are moved off to other
> >> hosts you get performance degradation until compaction can happen again.
> >> I would expect the same thing ought to be done for Accumulo.
> >
> > AFAIK, HBase also does a lot more with regard to assigning Tablets
> > (regions, in HBase terms) based on where the blocks that back them are
> > located, no? To my knowledge, Accumulo doesn't do anything like this. I
> > don't want users to think that disabling the HDFS balancer is a good idea
> > for Accumulo unless we have actual evidence.
>
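As a rough illustration of the test described in the thread above
(interspersed keys across 1 to 10 files, read back through a merge, with
random-read and random-seek+sequential-read workloads), here is a minimal
sketch in Python. It is only a model: the sorted in-memory lists stand in
for RFiles, and the names make_interspersed_files, merged_scan, random_read
and seek_then_scan are hypothetical, not part of Accumulo. It does not touch
HDFS, so it says nothing about short-circuit reads by itself; a real test
would swap the lists for RFiles on HDFS and run once with short-circuit
reads on and once with them off.

```python
import bisect
import heapq
import itertools
import random
import time


def make_interspersed_files(num_files, keys_per_file):
    """Build num_files sorted key lists whose keys interleave across files.

    Key k lands in file k % num_files, so any key range touches every file.
    The lists are in-memory stand-ins for RFiles (an assumption).
    """
    total = num_files * keys_per_file
    return [list(range(f, total, num_files)) for f in range(num_files)]


def _iter_from(sorted_keys, start_key):
    """Seek one file to start_key (binary search), then yield keys in order."""
    pos = bisect.bisect_left(sorted_keys, start_key)
    for i in range(pos, len(sorted_keys)):
        yield sorted_keys[i]


def merged_scan(files, start_key, count):
    """Seek every file to start_key, then merge-read up to `count` keys."""
    streams = [_iter_from(f, start_key) for f in files]
    return list(itertools.islice(heapq.merge(*streams), count))


def random_read(files, rng, lookups):
    """Random-read workload: one seek plus a single key per lookup."""
    total = sum(len(f) for f in files)
    return [merged_scan(files, rng.randrange(total), 1) for _ in range(lookups)]


def seek_then_scan(files, rng, seeks, scan_len):
    """Random-seek + sequential-read workload: seek, then read scan_len keys."""
    total = sum(len(f) for f in files)
    return [merged_scan(files, rng.randrange(total), scan_len)
            for _ in range(seeks)]


if __name__ == "__main__":
    total_keys = 10_000  # arbitrary size chosen for the sketch
    for num_files in range(1, 11):
        files = make_interspersed_files(num_files, total_keys // num_files)
        rng = random.Random(42)  # fixed seed so runs are comparable

        t0 = time.perf_counter()
        random_read(files, rng, lookups=500)
        t1 = time.perf_counter()
        seek_then_scan(files, rng, seeks=100, scan_len=50)
        t2 = time.perf_counter()

        print(f"{num_files:2d} files: random-read {t1 - t0:.4f}s, "
              f"seek+scan {t2 - t1:.4f}s")
```

The point of spreading key k to file k % num_files is that no single file
can answer a range scan alone, which is the worst case for a merged read
and therefore the case where file locality should matter most.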
