You were able to work around the durability concerns by skipping the WAL (never 
forget that this means your data in HBase is *not* guaranteed to be there).
We’re already doing this. It's actually not a problem for us, because we 
verify the data after the import (using our own restore-test MapReduce report).
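
For reference, we set this per mutation through the client API; a minimal 
sketch (the table name and column values are just examples):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SkipWalPut {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("import_target"))) {
                Put put = new Put(Bytes.toBytes("row-1"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
                // SKIP_WAL trades durability for throughput: the edit lives only
                // in the MemStore until the next flush, so a RegionServer crash
                // before that flush silently loses it.
                put.setDurability(Durability.SKIP_WAL);
                table.put(put);
            }
        }
    }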

Yes, I was summarizing what you had said to make sure you understood the implications of what you had done. Good to hear you are verifying it.

Of course, you could also change your application (the Import m/r job) so that 
it injects sleeps, but I assume you don't want to do that. To my knowledge we 
don't expose an option in that job that would inject slowdowns.
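
If you did want to go down that road, it would amount to wrapping the import 
mapper with a throttle. A hypothetical sketch, not anything the stock job 
offers; the batch size and sleep time are made-up numbers:

    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical throttled import mapper: converts each exported Result
    // back into a Put and sleeps every BATCH_SIZE records to cap the write rate.
    public class ThrottledImportMapper
            extends Mapper<ImmutableBytesWritable, Result, ImmutableBytesWritable, Put> {
        private static final int BATCH_SIZE = 1000;  // made-up; tune per cluster
        private static final long SLEEP_MS = 200;    // made-up; tune per cluster
        private long seen = 0;

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(key.copyBytes());
            for (Cell cell : value.rawCells()) {
                put.add(cell);                       // copy the exported cells
            }
            context.write(key, put);
            if (++seen % BATCH_SIZE == 0) {
                Thread.sleep(SLEEP_MS);              // crude, fixed-rate backpressure
            }
        }
    }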

That's funny - I was just talking about this with my colleague, more in jest. 
But would it be possible for the MemStore to realize that the incoming write 
rate is higher than the flushing rate and to slow down the write requests a 
little bit?
That would mean putting the "sleep" into the MemStore as a kind of adaptive 
congestion control: the MemStore could measure the incoming rate and the 
flushing rate and add some sleeps on demand...
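
Something like this, purely to illustrate the idea (not HBase code, and all 
the numbers are invented):

    // Compares the observed write rate with the observed flush rate and
    // sleeps writers just enough to keep the two in balance.
    public class AdaptiveThrottle {
        private long bytesIn, bytesFlushed;
        private long windowStart = System.nanoTime();

        public void onWrite(long bytes) throws InterruptedException {
            long delayMs = 0;
            synchronized (this) {
                bytesIn += bytes;
                double elapsedSec = (System.nanoTime() - windowStart) / 1e9;
                if (elapsedSec >= 1.0) {             // re-evaluate once per second
                    double inRate = bytesIn / elapsedSec;
                    double outRate = Math.max(1.0, bytesFlushed / elapsedSec);
                    if (inRate > outRate) {
                        // back off in proportion to how far writers outrun the flusher
                        delayMs = Math.min(100L, (long) (10 * (inRate / outRate - 1)));
                    }
                    bytesIn = 0;
                    bytesFlushed = 0;
                    windowStart = System.nanoTime();
                }
            }
            if (delayMs > 0) {
                Thread.sleep(delayMs);               // sleep outside the lock
            }
        }

        public synchronized void onFlush(long bytes) {
            bytesFlushed += bytes;
        }
    }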

HBase essentially does what you're asking. By throwing the RegionTooBusyException, the client is pushed into a retry loop: it pauses before retrying, increases the amount of time it waits on the next attempt (by some function, I forget exactly what), and then retries the same operation.
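
Roughly the shape of it, illustrative only (the real client uses a bounded 
backoff schedule, not this exact doubling):

    import org.apache.hadoop.hbase.RegionTooBusyException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;

    public final class RetryLoop {
        // Illustrative only: retry a put with a growing pause, the way the
        // client does internally when a region reports itself too busy.
        public static void putWithRetries(Table table, Put put,
                                          int maxRetries, long basePauseMs)
                throws Exception {
            for (int attempt = 0; ; attempt++) {
                try {
                    table.put(put);
                    return;
                } catch (RegionTooBusyException e) {
                    if (attempt >= maxRetries) {
                        throw e;                     // out of retries, give up
                    }
                    // Grow the pause with each attempt (capped so the shift
                    // cannot overflow).
                    Thread.sleep(basePauseMs * (1L << Math.min(attempt, 6)));
                }
            }
        }
    }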

The problem you're facing is that the default configuration is insufficient for the load and/or hardware that you're throwing at HBase.
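
The settings that usually govern this behavior, shown here through the Java 
Configuration API for concreteness (they normally live in hbase-site.xml; the 
values are examples only, the right numbers depend on your heap and load):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class WriteLoadTuning {
        public static Configuration tuned() {
            Configuration conf = HBaseConfiguration.create();
            // How large a MemStore may grow before a flush is triggered.
            conf.setLong("hbase.hregion.memstore.flush.size", 256L * 1024 * 1024);
            // How many multiples of the flush size a MemStore may reach before
            // writes to the region are blocked (the RegionTooBusyException path).
            conf.setInt("hbase.hregion.memstore.block.multiplier", 8);
            // How many store files may accumulate before flushes are held up.
            conf.setInt("hbase.hstore.blockingStoreFiles", 20);
            return conf;
        }
    }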

The other thing you should be asking yourself is whether you have a hotspot in your table design that is causing the load to be spread unevenly across the RegionServers.
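
If the row keys are monotonically increasing (timestamps, sequence IDs), 
salting the key is the usual remedy; a sketch with a made-up bucket count:

    import org.apache.hadoop.hbase.util.Bytes;

    // Prefix each row key with a one-byte salt derived from its hash so that
    // consecutive writes spread across SALT_BUCKETS regions instead of
    // hammering one. The trade-off: reads and scans must fan out over all
    // buckets. The bucket count here is a made-up example.
    public final class SaltedKeys {
        private static final int SALT_BUCKETS = 16;

        public static byte[] salt(byte[] originalKey) {
            byte bucket = (byte) ((Bytes.hashCode(originalKey) & 0x7fffffff) % SALT_BUCKETS);
            byte[] salted = new byte[originalKey.length + 1];
            salted[0] = bucket;
            System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
            return salted;
        }
    }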
