I had to think about this problem a lot for a product I once worked on,
and I think much of the same applies here.
To Corey's point, running the rebalancer is most definitely an issue, but
simply turning it off is not a good answer in a lot of situations. It
exists for a reason! You can r
I may also be getting this conflated with how reads work. Time for me to
read some HDFS code.
On 6/19/14, 8:52 AM, Josh Elser wrote:
I believe this happens via the DFSClient, but you can only expect the
first block of a file to actually be on the local datanode (assuming
there is one). Everything else may be remote. Assuming you
have a proper rack script set up, you would imagine that you'll still
get at least one
AFAIK, locality is not guaranteed right away unless the data for a
tablet was first ingested on the tablet server responsible for that
tablet. Otherwise, you'll need to wait for a major compaction to rewrite the
RFiles locally on the tablet server. I would assume if the tablet server
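
One way to make the locality question above concrete is to count how many of a file's blocks have a replica on the host running the tablet server. The following is only a rough sketch under stated assumptions: the function name, the host names, and the block-to-host mapping are all hypothetical, and on a real cluster the host lists would have to come from HDFS itself (for example from the output of `hdfs fsck <path> -files -blocks -locations`).

```python
from typing import Dict, List


def local_block_fraction(block_hosts: Dict[int, List[str]], local_host: str) -> float:
    """Return the fraction of blocks with at least one replica on local_host.

    block_hosts maps a block index to the hosts holding a replica of it.
    This is an illustrative helper, not part of HDFS or Accumulo.
    """
    if not block_hosts:
        return 0.0
    local = sum(1 for hosts in block_hosts.values() if local_host in hosts)
    return local / len(block_hosts)


# Hypothetical replica placement for a four-block RFile (made-up data).
example_blocks = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node1", "node3", "node4"],
    3: ["node2", "node4", "node5"],
}

# Share of blocks a tablet server on "node1" could read from a local replica.
print(local_block_fraction(example_blocks, "node1"))
```

Comparing this number before and after a major compaction would be one way to check whether the rewritten RFiles really ended up local to the tablet server.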
At the Accumulo Summit and on a recent client site, there have been
conversations about Data Locality and Accumulo.
I ran an experiment to confirm that Accumulo can scan tables when the
tserver process runs on a server without a datanode process. I
followed these steps:
1. Start a three-node cluster