> I am not surprised by the fact that there is a performance hit, I just
> expected it to be less. I figured it to be somewhere between 2 and 3 times
> slower, not 5 times. So with my question I was basically looking for some
> measure of what to expect based on someone else's experience. Apart from
> that, I hoped it would just take longer and not die.

GC does blow :)

> I will re-check. I only grepped for long pauses. I guess a series of short
> collections could also get in the way of application code. Perhaps I need to
> tweak GC params some more. Is highly increased GC activity a logical
> consequence of using WAL? Does it create a lot of short lived objects while
> pushing things to WAL?

It's not something I wanted to explore in my first email, but the answer is:
yes, it probably generates more garbage for the GC to collect, and it also
generates more IO traffic.

 - Writing to the network requires serializing the objects, so that's an
   extra copy of the data.
 - Each request takes a bit longer, so its payload sticks around longer in
   memory.
 - Logs have an upper bound, in order to keep log replay time under control,
   but that has the adverse effect of forcing flushes. HBASE-3242 is about
   helping that specific case.

> Nope. This happens when all the RS stay up and running. It looks like a hang.
> It does not happen very often. After the reducers are killed the subsequent
> attempt always succeeds, so it just increases the running time of the job by
> ten minutes, which is OK for me for now.

I think it's worth looking into. Start by jstacking those processes to see
where they hang, and be sure to enable DEBUG for HBase.

> Do I need to consider this massive? We do this import every 8 hours and have
> been doing so for months without trouble (without WAL), while servicing
> reads. By nature of the stuff we store, we get it in batches. The reading
> side of things is low volume (small number of users).

I don't know anything about your write workload, so it's hard to tell whether
it's appropriate, but it's generally a better solution for prolonged imports.

> One other option would be to detect RS failures and just re-submit the job
> when that happens during the insert job. But this wouldn't scale (with the 8
> RS we have, I guess we might get away with it).

Or lower the number of clients.
>
> Are you referring to this:
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html ? I need to do
> read-modify-write, so I am not sure if this would work for me.

Yes, that page. I don't know either if it'll work for you :)

J-D
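
P.S. In case it helps with the point above about the log upper bound forcing
flushes, here is a toy model of the effect. It is not HBase code: the function
name `simulate`, the parameters `edits_per_log` and `max_logs`, and the idea of
counting edits instead of bytes are all assumptions made for illustration. It
only assumes that a log file can be dropped once every region with edits in it
has flushed, and that going over the cap forces those flushes.

```python
def simulate(writes, edits_per_log=4, max_logs=2):
    """Toy write-ahead log with a cap on the number of log files.

    writes: sequence of region names, one per edit, in arrival order.
    Returns the regions that were force-flushed, in the order it happened.
    (Hypothetical model for illustration, not the HBase implementation.)
    """
    # Each log tracks its size and which regions still have unflushed edits in it.
    logs = [{"size": 0, "dirty": set()}]
    forced = []
    for region in writes:
        current = logs[-1]
        if current["size"] == edits_per_log:
            # Current log is full: roll to a new one.
            current = {"size": 0, "dirty": set()}
            logs.append(current)
        current["size"] += 1
        current["dirty"].add(region)

        # Over the cap: the oldest log must go, so every region that still
        # depends on it is flushed, whether or not its memstore is full.
        while len(logs) > max_logs:
            oldest = logs.pop(0)
            for r in sorted(oldest["dirty"]):
                forced.append(r)
                # A flush persists all of the region's edits, so no
                # remaining log needs to be kept on its behalf.
                for log in logs:
                    log["dirty"].discard(r)
    return forced


if __name__ == "__main__":
    # Regions a and b write a little, then c writes a lot and rolls the logs.
    writes = list("abab") + list("cccc") + ["a"]
    print(simulate(writes, edits_per_log=4, max_logs=2))   # ['a', 'b']
    print(simulate(writes, edits_per_log=4, max_logs=10))  # []
```

With the low cap, regions a and b get flushed only because a busy region c
rolled the logs past the limit. With a higher cap nothing is forced, at the
price of more log to replay after a crash.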
