Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread Donald Miner
I had to think about this problem a lot for a product I worked on at one point, but I think a lot of the same applies here. To Corey's point, running the rebalancer is most definitely an issue, but simply turning it off is not a good answer in a lot of situations. It exists for a reason! You can r

Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread Josh Elser
I may also be conflating this with how reads work. Time for me to read some HDFS code. On 6/19/14, 8:52 AM, Josh Elser wrote: I believe this happens via the DFSClient, but you can only expect the first block of a file to actually be on the local datanode (assuming there is one). Everythi

Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread Josh Elser
I believe this happens via the DFSClient, but you can only expect the first block of a file to actually be on the local datanode (assuming there is one). Everything else may be remote. Assuming you have a proper rack script set up, you would imagine that you'll still get at least one

Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread Corey Nolet
AFAIK, locality may not be guaranteed right away unless the data for a tablet was first ingested on the tablet server that is responsible for that tablet. Otherwise, you'll need to wait for a major compaction to rewrite the RFiles locally on the tablet server. I would assume if the tablet server

Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread David Medinets
At the Accumulo Summit and at a recent client site, there have been conversations about data locality and Accumulo. I ran an experiment to confirm that Accumulo can scan tables when the tserver process runs on a server without a datanode process. I followed these steps: 1. Start three node cluster