Hi,

We are running a small-scale implementation on HBase. We receive around 200-300 MB of data each day for processing. Some of our number crunching and processing is based on a single day's data.
The problem we are facing is that, because the data is so small, a single day's data resides in at most 2-5 regions (our HBase split size is set to 64 MB). As a result, when our MapReduce job runs, only about 2-5 tasks do the effective work, so performance falls short of our expectations. It would be ideal to have more tasks working on this 200-300 MB of data.

One way to increase the number of tasks is to lower the split size further, but then other jobs that process 30 or 60 days of data would be split into a very large number of tasks.

Is further lowering the HBase file split size recommended? Could anyone please suggest another option for handling this scenario? Is it possible, through configuration or code, to split the 200-300 MB of daily data (perhaps while it is inserted into HBase) across multiple regions while still sticking with the HBase split size (64/128/256 MB, whatever)?

----------
Thanks,
Himanish
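P.S. To make the last question concrete, here is a rough sketch of the kind of thing I have in mind: prefixing each row key with a small bucket number ("salting") so that one day's rows land in several pre-split regions instead of 2-5 adjacent ones. Everything here is an assumption for illustration only: the class SaltedKeys, the key layout <bucket>|<yyyyMMdd>|<recordId>, and the bucket count are made up, and the code is plain Java that does not call any HBase API. I have not verified that this is the recommended approach.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: builds salted row keys and the
// matching split points / per-day scan ranges.
final class SaltedKeys {
    private SaltedKeys() {}

    // Derive the bucket from the record id so the same record always maps
    // to the same bucket. The two-digit prefix limits us to 100 buckets.
    static int bucketFor(String recordId, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in 1..100");
        }
        return Math.floorMod(recordId.hashCode(), buckets);
    }

    // Assumed row key layout: <2-digit bucket>|<yyyyMMdd>|<recordId>
    static String rowKey(String day, String recordId, int buckets) {
        return String.format("%02d|%s|%s", bucketFor(recordId, buckets), day, recordId);
    }

    // Split points "01", "02", ... that would give one region per bucket
    // if supplied when the table is created pre-split.
    static List<byte[]> splitKeys(int buckets) {
        List<byte[]> splits = new ArrayList<>();
        for (int b = 1; b < buckets; b++) {
            splits.add(String.format("%02d", b).getBytes(StandardCharsets.UTF_8));
        }
        return splits;
    }

    // One [start, stop) row range per bucket for a single day. The stop key
    // is the prefix with its trailing '|' (0x7C) replaced by '}' (0x7D),
    // i.e. the smallest string greater than every key with that prefix.
    static List<String[]> dayRanges(String day, int buckets) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            String start = String.format("%02d|%s|", b, day);
            String stop = start.substring(0, start.length() - 1) + "}";
            ranges.add(new String[] { start, stop });
        }
        return ranges;
    }

    public static void main(String[] args) {
        int buckets = 8; // assumed value, chosen only for the example
        System.out.println(rowKey("20110101", "rec-42", buckets));
        for (String[] r : dayRanges("20110101", buckets)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

With 8 buckets, a single day's data would sit in 8 regions, so a job over that day could get 8 map tasks regardless of the 64/128/256 MB split size. The cost, as far as I can see, is that a one-day job has to scan one range per bucket instead of one contiguous range, and the 30-day and 60-day jobs would need the same treatment. Is this a reasonable direction, or is there a simpler way?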
