Can you tell us how often you run major compactions after the import? Have you noticed imbalanced read/write requests in the cluster, meaning that a subset of the region servers receives the bulk of the writes?
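To make "imbalanced" a bit more concrete, here is a rough Python sketch of the kind of check I mean. It assumes you have already collected a write request count per region server from whatever monitoring you use; the function name `hot_servers`, the sample numbers and the 2x-median threshold are all made up for illustration and are not part of HBase.

```python
from statistics import median


def hot_servers(write_requests, factor=2.0):
    """Return the servers whose write count exceeds factor * cluster median.

    write_requests: dict mapping region server name -> write request count
                    (hypothetical input, gathered by hand or from monitoring).
    factor:         how far above the median counts as "hot" (an assumption).
    """
    if not write_requests:
        return []
    typical = median(write_requests.values())
    return sorted(
        server
        for server, count in write_requests.items()
        if count > factor * typical
    )


# Hypothetical counts: rs3 takes most of the writes.
sample = {"rs1": 1200, "rs2": 1100, "rs3": 9800, "rs4": 1300}
print(hot_servers(sample))  # ['rs3']
```

The median is used instead of the mean so that one very busy server does not drag the baseline up and hide itself.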
We do some manual movement of regions when the above happens.

Cheers

On Sat, Nov 3, 2012 at 8:12 AM, David Koch <ogd...@googlemail.com> wrote:

> Hello,
>
> Every now and then we need to flatten our cluster and re-import all data
> from log files (changes in data format, etc.) Afterwards we notice a
> significant increase in scan performance. As data is added and shuffled
> around between region servers, performance goes down again over time (say a
> couple of weeks). Are there any routine operations that one should run
> manually, or settings to activate in the HBase configuration to keep the
> data well distributed? We use HBase 0.92 as part of a Cloudera4 cluster.
>
> Thank you,
>
> /David