Hi JM,

After forcing major compactions on all tables, the locality index
crept up to ~100%.  This means the table I suspected to be problematic
was actually fine, and some of the legacy tables on the cluster had a
high percentage of non-local blocks. A per-table version of
hdfsBlocksLocalityIndex would have been useful in this scenario, since
it wasn't obvious which tables in the cluster had the most non-local
blocks and the major compactions were costly to run.  So your
assumption that other tables were causing the low block locality score
was correct.  Thanks for the help.
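
For anyone hitting the same problem, here is a rough sketch of what a per-table locality figure could look like. It is not an existing HBase metric or API. It assumes you have already collected, for one region server, a list of (table, block size, hosts holding a replica) tuples by some means of your own, and it assumes a byte-weighted definition of locality. The function name, host names, and sample data are all made up for illustration.

```python
from collections import defaultdict

def per_table_locality(blocks, local_host):
    """Return {table: fraction of block bytes that have a replica on local_host}.

    `blocks` is a hypothetical iterable of (table, size_bytes, replica_hosts).
    """
    total = defaultdict(int)
    local = defaultdict(int)
    for table, size, replica_hosts in blocks:
        total[table] += size
        if local_host in replica_hosts:
            local[table] += size
    return {t: local[t] / total[t] for t in total if total[t] > 0}

# Made-up sample: one freshly compacted table and one legacy table.
blocks = [
    ("new_table", 128, {"rs1", "dn4", "dn7"}),
    ("new_table", 128, {"rs1", "dn2", "dn9"}),
    ("legacy_table", 128, {"dn3", "dn5", "dn8"}),
    ("legacy_table", 128, {"rs1", "dn5", "dn6"}),
]

locality = per_table_locality(blocks, "rs1")

# List the least-local tables first, since those are the ones worth compacting.
for table, frac in sorted(locality.items(), key=lambda kv: kv[1]):
    print(f"{table}: {frac:.0%}")
```

Sorting by the lowest fraction would have pointed at the legacy tables right away, without paying for a major compaction on every table.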


Scott

On Thu, 08 Aug 2013 00:11:58 GMT Jean-Marc Spaggiari <[email protected]

> Hi Scott,
>
> What do you mean by "Running a major compaction does not significantly
> improve the locality."? If there are no other writes to your table
> while/after the major compaction, it should be at 100%.
>
> hdfsBlocksLocalityIndex is for the entire node, not just for a specific
> table. If you have other tables which are not major_compacted, they might
> bring this value down.
>
> Is this a production cluster, or is it a lab cluster where you can run some
> tests?
>
> JM
>
> 2013/8/7 Scott Kuehn <[email protected]>
>
>> > I'd like to improve block locality on a system where nearly 100% of data
>> > ingest is via bulkloading.  Presently,  I measure block locality by
>> > monitoring the hdfsBlocksLocalityIndex metric. On a 10 node cluster with
>> > block replication of 3, the block locality index is about 30%, which is
>> > what I'd expect to see from random block placement.  Running a major
>> > compaction does not significantly improve the locality.
>> >
>> > How can I maximize block locality in a bulkloading-based system?
>> >
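
The 30% figure in the original question can be checked with a quick calculation. This is only a sketch, assuming each block's replicas land on distinct nodes chosen uniformly at random and independently of which region server reads the block. The function name is made up for illustration.

```python
from fractions import Fraction

def expected_random_locality(num_nodes, replication):
    """Chance that a given node holds one of a block's replicas,
    assuming replicas sit on distinct, uniformly chosen nodes."""
    return Fraction(min(replication, num_nodes), num_nodes)

# 10 nodes, replication factor 3, as in the question.
expected = expected_random_locality(10, 3)
print(float(expected))  # 0.3
```

Under that assumption the expected locality is simply replication divided by node count, which matches the roughly 30% observed.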
