This might be a very easy question, but I was wondering how AccumuloInputFormat handles a tablet file whose data is spread across multiple nodes.
For example, suppose I have a tablet file that is 1GB in size and my Hadoop block size is 256MB. The file is then stored as 4 blocks, so up to 4 nodes could be holding its data. However, when AccumuloInputFormat creates mappers, it creates one mapper per tablet. The mapper can only be local to some of those blocks, so up to 3 of them would have to be read over the network by the node where the mapper is running. Am I correct in this assumption? Or is the TabletServer doing something underneath to make sure all of the data actually resides on one server, so that there is no network overhead from moving blocks when a MapReduce job runs?

Thanks!
Roshan
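The block arithmetic in the example above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not a description of how HDFS or Accumulo actually place data: it ignores HDFS replication (it treats each block as having a single copy on a distinct node) and assumes the one mapper reading the tablet's file runs on a node holding one of those blocks. The function name `block_placement` is hypothetical.

```python
def block_placement(file_size_bytes: int, block_size_bytes: int) -> tuple[int, int]:
    """Return (total_blocks, worst_case_remote_blocks) for a single file.

    Hypothetical helper. Simplifying assumptions: every block has exactly
    one copy on its own node (replication is ignored), and the single
    mapper reading the whole file runs on the node holding one block.
    """
    # Integer ceiling division: a partial final block still counts as a block.
    total_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Worst case: only the block co-located with the mapper is read locally.
    remote_blocks = max(total_blocks - 1, 0)
    return total_blocks, remote_blocks


MB = 1024 * 1024
GB = 1024 * MB

# The scenario from the question: a 1GB file with a 256MB block size.
print(block_placement(1 * GB, 256 * MB))  # (4, 3)
```

With replication (three copies per block by default in HDFS), the mapper's node may hold replicas of more than one block, so the real number of remote reads can be lower than this worst case.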
