Hi J-D,

Some more inline...

Hey Friso,

A few thoughts:

- Using the WAL will degrade your performance a lot and this is
expected. Remember, appending is the operation of sending a chunk of
data to the memory of 3 datanodes in a pipeline (compared to just
writing to memory when not using it). I'm not sure I quite understand
your surprise there.

I am not surprised that there is a performance hit; I just expected it to be 
smaller. I figured it would be somewhere between 2 and 3 times slower, not 5 
times. So my question was really about getting some measure of what to expect, 
based on someone else's experience. Apart from that, I hoped it would just take 
longer and not die.
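
For what it's worth, the with/without-WAL difference in these runs boils down to 
the per-Put flag in the 0.90 client API (as far as I know that is the only knob 
there is); a minimal sketch, with made-up table, row and column names:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalToggleSketch {
      public static void main(String[] args) throws IOException {
        // Made-up table/row/column names, only to show where the flag goes.
        HTable table = new HTable(HBaseConfiguration.create(), "some_index_table");
        Put put = new Put(Bytes.toBytes("some-row-key"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("some-value"));
        put.setWriteToWAL(false); // skip the HLog append: fast, but the edit is lost if the RS dies
        table.put(put);           // with the default (true), every edit is appended to the WAL first
        table.flushCommits();
      }
    }

Everything else about the job is identical between the two runs.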


- Regarding the YouAreDeadException, the fact that it prints out
"have not heard from server in 70697ms" and that just before that
nothing was printed in the log between 02:38:55 and 02:40:11, strongly
indicates GC activity in that JVM. I'd like to see the GC log before
ruling that out.

I will re-check. I only grepped for long pauses. I guess a series of short 
collections could also get in the way of application code. Perhaps I need to 
tweak GC params some more. Is highly increased GC activity a logical 
consequence of using the WAL? Does it create a lot of short-lived objects while 
pushing things to the WAL?
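
For the re-check I will turn on full GC logging on the region servers instead of 
just grepping the application log; something along these lines in hbase-env.sh 
(the log path is a placeholder):

    # Verbose GC logging for the RS JVM. PrintGCApplicationStoppedTime also reports
    # the total time the application was stopped, which shows many short pauses
    # adding up, not just single long collections.
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -XX:+PrintGCApplicationStoppedTime \
      -Xloggc:/path/to/logs/gc-regionserver.log"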


- Regarding the clients that died, did it happen at the same time as
a region server died?

Nope. This happens when all the RS stay up and running. It looks like a hang. 
It does not happen very often. After the reducers are killed the subsequent 
attempt always succeeds, so it just increases the running time of the job by 
ten minutes, which is OK for me for now.
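
As an aside, I assume the ten minutes is just the default mapred.task.timeout of 
600000 ms; if the hangs become more frequent, I could make the retry kick in 
sooner by lowering it on the job. A sketch, with a made-up value and job name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TaskTimeoutSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Kill a task that has not reported progress after 5 minutes instead of 10.
        conf.setLong("mapred.task.timeout", 300000L); // made-up value
        Job job = new Job(conf, "index-import");      // made-up job name
        // ... the usual mapper/reducer/input/output setup would go here ...
      }
    }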


- Finally, doing massive imports requires tuning your cluster more finely
than a cluster that serves "normal" traffic. Have you considered
using the bulk loading tools instead?

Do I need to consider this massive? We do this import every 8 hours and have 
been doing so for months without trouble (without WAL), while servicing reads. 
By nature of the stuff we store, we get it in batches. The reading side of 
things is low volume (small number of users).

One other option would be to detect RS failures and just re-submit the job when 
that happens during the insert job. But this wouldn't scale (with the 8 RS we 
have, I guess we might get away with it).

Are you referring to this: 
http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html ? I need to do 
read-modify-write, so I am not sure if this would work for me.
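
If that is the page you mean, then my understanding of the bulk load path is: the 
job writes HFiles via HFileOutputFormat and those files are handed to the cluster 
afterwards with the completebulkload tool, so nothing goes through the region 
servers (and therefore no WAL). A minimal sketch of the job setup, with a made-up 
table name; as far as I can tell it only covers blind writes, which is why I doubt 
it fits my read-modify-write case:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load-sketch");
        HTable table = new HTable(conf, "some_index_table"); // made-up table name
        // Configures the partitioner, reducer and output format so that the job
        // writes region-aligned HFiles instead of doing puts against the RS.
        HFileOutputFormat.configureIncrementalLoad(job, table);
        // After the job completes, the HFiles are moved into the table with:
        //   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hfile dir> <table>
      }
    }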




J-D

On Wed, Dec 15, 2010 at 6:44 AM, Friso van Vollenhoven
<[email protected]> wrote:
Hi,

I am experiencing some performance issues doing puts with the WAL enabled. Without 
it, everything runs fine.

My workload does roughly 30 million reads (rows) and, after each read, a number of 
puts (updating multiple indexes, basically). The total is about 165 million puts. 
The work is done from an MR job that uses 15 reducers against 8 RS. The reads and 
writes are spread across 16 tables, but I hit only a small number of regions (out 
of about 1000 in total for all tables). Without WAL, the job takes about 30 to 45 
minutes. With WAL, if it runs to completion, it takes close to 4 hours (3h45m). 
Can the difference be that large? The master UI shows HBase doing between 10K and 
50K requests per second, with frequent drops to almost zero for stretches of time, 
while without WAL the same job easily sustains over 100K.
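
Back-of-the-envelope, in case it helps: ~165 million puts in 30 to 45 minutes is 
roughly 60K-90K puts per second, while the same amount in 3h45m is roughly 12K per 
second, so the job times are at least consistent with the request rates the master 
UI shows.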

Any hint on where to look is greatly appreciated. Below is a description of 
what happens.

Also, on some runs, region servers die because they fail to report for more 
than 1 minute (YouAreDeadException). I could set the timeout longer, but I 
think it should work for this setup. The GC log does not show any obvious long 
pause. Is it possible for flushing / log appending to block the ZK client / 
heartbeat? In the log snippet below you see a pause of about 1m30s between two 
log lines before the RS starts to shut down.

2010-12-15 02:38:55,457 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://m1r1.inrdb.ripe.net:9000/hbase/inrdb_ris_update_rrc12/fe902bb3224a1522b0be94d8459f7217/meta/3270107182044894418, entries=10653, sequenceid=367327898, memsize=2.4m, filesize=102.7k
2010-12-15 02:40:11,959 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 70697ms for sessionid 0x12ce510319b0004, closing socket connection and attempting reconnect
2010-12-15 02:40:12,125 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 66270ms for sessionid 0x12ce510319b0005, closing socket connection and attempting reconnect
2010-12-15 02:40:12,174 INFO org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: hconnection-0x12ce510319b0004 Received Disconnected from ZooKeeper, ignoring
2010-12-15 02:40:12,276 INFO org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020-0x12ce510319b0005 Received Disconnected from ZooKeeper, ignoring
2010-12-15 02:40:12,623 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server m1r1.inrdb.ripe.net/2001:610:240:1:0:0:c100:1733:2181
2010-12-15 02:40:12,802 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=w2r1.inrdb.ripe.net,60020,1292333234919, load=(requests=10822, regions=575, usedHeap=6806, maxHeap=16000): Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing w2r1.inrdb.ripe.net,60020,1292333234919 as dead server
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing w2r1.inrdb.ripe.net,60020,1292333234919 as dead server
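
Should I decide to raise that timeout after all, I take it that would be 
zookeeper.session.timeout in hbase-site.xml, with the ZK server's maxSessionTimeout 
raised to match; the value below is just an example:

    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value> <!-- example only: 2 minutes -->
    </property>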

On some other runs, blocking occurs on the client side. This causes the 
reducers to get killed after ten minutes of not reporting. Still, the GC log 
does not show any long pauses.

I guess that the RS dying and the clients blocking are just side effects of 
HBase not being able to cope with the load.

Versions and setup:
Hadoop CDH3b3
HBase 0.90 rc1
1 master, running: NN, HM, JT, ZK
8 workers, running: DN, RS, TT
RS gets 16GB heap
HBase max filesize = 1 GB, client-side write buffer = 16 MB, memstore flush 
size = 128 MB
CPU usage is not off the chart and no swapping is happening.
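
For completeness, those settings map to the usual hbase-site.xml properties, 
presumably along these lines:

    <property>
      <name>hbase.hregion.max.filesize</name>
      <value>1073741824</value>   <!-- 1 GB -->
    </property>
    <property>
      <name>hbase.client.write.buffer</name>
      <value>16777216</value>     <!-- 16 MB -->
    </property>
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value>    <!-- 128 MB -->
    </property>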



Friso


