> I am not surprised by the fact that there is a performance hit, I just 
> expected it to be less. I figured it to be somewhere between 2 and 3 times 
> slower, not 5 times. So with my question I was basically looking for some 
> measure of what to expect based on someone else's experience. Apart from 
> that, I hoped it would just take longer and not die.

GC does blow :)

> I will re-check. I only grepped for long pauses. I guess a series of short 
> collections could also get in the way of application code. Perhaps I need to 
> tweak GC params some more. Is highly increased GC activity a logical 
> consequence of using WAL? Does it create a lot of short lived objects while 
> pushing things to WAL?

It's not something I wanted to explore in my first email, but the
answer is: yes, writing to the WAL probably creates more garbage for
the GC to collect, and it also generates more IO traffic.

 - Writing to the network requires serializing the objects, so that's
an extra copy of the data
 - Each request takes a bit longer, so its payload stays in memory longer
 - Logs have an upper bound, in order to keep log replay time under
control, but that has the adverse effect of forcing flushes. HBASE-3242
is about helping with that specific case.
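
To make the last point concrete, here is a toy Java model of how a cap
on the number of log files can end up forcing flushes. It is a sketch
under assumptions, not HBase code: the class WalBoundSketch, its
fields, and the numbers used are all made up for illustration. The
idea it shows is that when rolling a new log pushes the count over the
cap, every region that still has unflushed edits in the oldest log has
to be flushed before that log can be dropped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only, NOT HBase code: a cap on log files forcing flushes. */
class WalBoundSketch {
    private final int maxLogs;      // hypothetical cap on retained log files
    private final int editsPerLog;  // hypothetical roll threshold, in edits
    // One entry per log file: the regions with unflushed edits in that file.
    private final Deque<Set<String>> logs = new ArrayDeque<>();
    private int editsInCurrentLog = 0;
    final List<String> forcedFlushes = new ArrayList<>();

    WalBoundSketch(int maxLogs, int editsPerLog) {
        this.maxLogs = maxLogs;
        this.editsPerLog = editsPerLog;
        logs.addLast(new LinkedHashSet<>());
    }

    /** Record one edit for a region, rolling the log first if it is full. */
    void append(String region) {
        if (editsInCurrentLog == editsPerLog) {
            roll();
        }
        logs.peekLast().add(region);
        editsInCurrentLog++;
    }

    /** Start a new log file and evict the oldest ones while over the cap. */
    private void roll() {
        logs.addLast(new LinkedHashSet<>());
        editsInCurrentLog = 0;
        while (logs.size() > maxLogs) {
            Set<String> oldest = logs.pollFirst();
            for (String region : oldest) {
                flush(region);
            }
        }
    }

    /** A flushed region no longer needs its edits in any existing log. */
    private void flush(String region) {
        forcedFlushes.add(region);
        for (Set<String> log : logs) {
            log.remove(region);
        }
    }

    public static void main(String[] args) {
        WalBoundSketch wal = new WalBoundSketch(2, 2);
        for (String r : new String[] {"r1", "r1", "r2", "r2", "r3"}) {
            wal.append(r);
        }
        System.out.println(wal.forcedFlushes);
    }
}
```

In this run the third log file pushes the count over the cap of two,
so r1 is flushed even though it holds only two edits. A higher cap
avoids the forced flush at the price of more logs to replay.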

> Nope. This happens when all the RS stay up and running. It looks like a hang. 
> It does not happen very often. After the reducers are killed the subsequent 
> attempt always succeeds, so it just increases the running time of the job by 
> ten minutes, which is OK for me for now.

I think it's worth looking into. Start by running jstack on those
processes to see where they hang, and be sure to enable DEBUG logging
for HBase.
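
If attaching jstack from outside is awkward, a rough in-process
equivalent is possible with the JDK's own management API. The sketch
below is only an illustration: the class name ThreadDumpSketch is made
up, and you would have to call dump() from somewhere inside the hung
process yourself (for example from a watchdog thread). It prints each
thread's name, state and stack frames, which is the part of a jstack
dump that shows where things are stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: an in-process approximation of a jstack dump. */
class ThreadDumpSketch {

    /** Returns name, state and stack frames for every live thread. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock and synchronizer details to keep it cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Take a few dumps some seconds apart: threads that show the same frames
every time are the ones to look at.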

> Do I need to consider this massive? We do this import every 8 hours and have 
> been doing so for months without trouble (without WAL), while servicing 
> reads. By nature of the stuff we store, we get it in batches. The reading 
> side of things is low volume (small number of users).

I don't know anything about your write workload, so it's hard to tell
if it's appropriate, but it's generally a better solution for
prolonged imports.

> One other option would be to detect RS failures and just re-submit the job 
> when that happens during the insert job. But this wouldn't scale (with the 8 
> RS we have, I guess we might get away with it).

Or lower the number of clients.

>
> Are you referring to this: 
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html ? I need to do 
> read-modify-write, so I am not sure if this would work for me.

Yes, that page. I don't know whether it'll work for you either :)

J-D
