I'd like to improve block locality on a system where nearly 100% of data ingest is via bulk loading. Presently, I measure block locality by monitoring the hdfsBlocksLocalityIndex metric. On a 10-node cluster with a block replication factor of 3, the block locality index is about 30%, which is what I'd expect to see from random block placement (3 replicas spread across 10 nodes). Running a major compaction does not significantly improve the locality.
How can I maximize block locality in a system that relies on bulk loading?
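
For reference, this is roughly how I arrive at the 30% baseline. It is only a minimal sketch, assuming each block's replicas land on distinct nodes chosen uniformly at random; it ignores rack awareness and any preference for placing the first replica on the writing node. The function names `expected_locality` and `simulate_locality` are made up for illustration and are not part of any HBase or HDFS API.

```python
import random
from fractions import Fraction


def expected_locality(num_nodes: int, replication: int) -> Fraction:
    """Chance that a given node holds a replica of a given block, assuming
    the replicas sit on `replication` distinct nodes picked uniformly at random."""
    if not 0 < replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    return Fraction(replication, num_nodes)


def simulate_locality(num_nodes: int, replication: int,
                      trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the same quantity: place many blocks at random and
    count how often node 0 (standing in for the region server) holds a replica."""
    rng = random.Random(seed)
    local = 0
    for _ in range(trials):
        replicas = rng.sample(range(num_nodes), replication)
        if 0 in replicas:
            local += 1
    return local / trials


# 10 nodes, replication factor 3, as in the cluster described above
print(float(expected_locality(10, 3)))  # 0.3
```

Under these assumptions the simulated value should come out close to the closed-form 3/10, which matches the roughly 30% I observe.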
