Hi Scott,

What do you mean by "Running a major compaction does not significantly improve the locality."? If there are no other writes on your table during or after the major compaction, it should be at 100%.
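Here is a rough model of why a major compaction should get you to 100%. HDFS's default block placement puts the first replica on the writer's own DataNode. A compaction runs inside the RegionServer, so its output files are local, assuming that host also runs a DataNode. Bulk-loaded HFiles are different: they are written by whichever nodes ran the job that generated them, and then they are moved into the region without being rewritten. The sketch below assumes those replicas land on distinct nodes chosen uniformly at random. Under that assumption it reproduces the ~30% you report for 10 nodes with replication 3. The function name `expected_locality` is hypothetical and is not an HBase API.

```python
from math import comb


def expected_locality(num_nodes: int, replication: int, writer_local: bool) -> float:
    """Probability that a given RegionServer host holds a replica of a block.

    Simplified model (assumption, not HBase code):
    - writer_local=True: the block was written on this host's DataNode, and the
      default HDFS placement policy keeps the first replica there, so it is local.
    - writer_local=False: the `replication` replicas sit on distinct nodes chosen
      uniformly at random among `num_nodes`.
    """
    if writer_local:
        return 1.0
    # P(host holds none of the replicas) = C(n-1, r) / C(n, r); locality is the complement.
    miss = comb(num_nodes - 1, replication) / comb(num_nodes, replication)
    return 1.0 - miss


# Bulk-loaded HFiles written by tasks on arbitrary nodes:
print(round(expected_locality(10, 3, writer_local=False), 2))  # 0.3
# HFiles rewritten by a major compaction on the RegionServer itself:
print(expected_locality(10, 3, writer_local=True))  # 1.0
```

The closed form simplifies to replication / num_nodes. The combinatorial version is kept so the reasoning ("no replica on this host") stays visible.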
hdfsBlocksLocalityIndex is computed for the entire node (RegionServer), not just for a specific table. If you have other tables that have not been major compacted, they can pull this value down.

Is this a production cluster, or is it a lab cluster where you can run some tests?

JM

2013/8/7 Scott Kuehn <[email protected]>
> I'd like to improve block locality on a system where nearly 100% of data
> ingest is via bulkloading. Presently, I measure block locality by
> monitoring the hdfsBlocksLocalityIndex metric. On a 10 node cluster with
> block replication of 3, the block locality index is about 30%, which is
> what I'd expect to see from random block placement. Running a major
> compaction does not significantly improve the locality.
>
> How can I maximize block locality in a bulkloading-based system?
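This sketch makes the node-wide point concrete. It assumes the index is the byte-weighted fraction of store-file data that has a replica on the node, taken over every region the RegionServer hosts regardless of table. Under that model, one fully compacted table cannot lift the number much if a larger, uncompacted bulk-loaded table sits on the same node. `StoreFile`, `node_locality_index`, and the table names and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    table: str
    size_bytes: int
    local_bytes: int  # bytes of this file with a replica on this node (assumed known)


def node_locality_index(files: list[StoreFile]) -> float:
    """Byte-weighted fraction of local store-file data across ALL tables on one node.

    Simplified model (assumption): an empty node is treated as 0.0.
    """
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    return sum(f.local_bytes for f in files) / total


GB = 1024 ** 3
files = [
    StoreFile("compacted_table", 100 * GB, 100 * GB),  # fully local after major compaction
    StoreFile("bulkloaded_table", 300 * GB, 90 * GB),  # ~30% local, never compacted
]
print(node_locality_index(files))  # 0.475
```

Even with one table at 100%, the node reports 47.5% here, because the larger uncompacted table dominates the weighting.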
