I'd like to improve block locality on a system where nearly 100% of data
ingest is via bulkloading.  Presently, I measure block locality by
monitoring the hdfsBlocksLocalityIndex metric. On a 10 node cluster with
block replication of 3, the block locality index is about 30%, which is
what I'd expect to see from random block placement.  Running a major
compaction does not significantly improve the locality.
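
For reference, here is the quick calculation behind the 30% figure. It is
only a sketch, and it assumes each block's 3 replicas land on distinct nodes
chosen uniformly at random, which is a simplification of real HDFS placement.
The function name is my own and not part of any HBase or HDFS API.

```python
from fractions import Fraction


def expected_locality(num_nodes: int, replication: int) -> Fraction:
    """Chance that a given node holds a replica of a given block.

    Assumption: replicas are placed on distinct nodes picked uniformly
    at random, so the chance is simply replication / num_nodes.
    """
    if not 0 < replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    return Fraction(replication, num_nodes)


# 10 nodes, replication factor 3, as in the cluster described above.
print(float(expected_locality(10, 3)))  # 0.3
```

Under that assumption a region server finds about 3 in 10 of its blocks
locally, which matches the index I am seeing.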

How can I maximize block locality in a bulkloading-based system?
