Hey guys,

I'm banging my head against some perf issues on EC2. I'm using .20.6
on ASF hadoop .20.2, and tweaked the ec2 hbase scripts to handle the
new version.

I'm trying to insert about 22G of data across nodes on EC2 m1.large
instances. I'm getting speeds of about 1200 rows/minute. It seems like
most inserts are <1 ms. But then some take 3sec, and occasionally I
see some take 30sec. It felt like a GC issue, but the data volume
should be nowhere near enough to cause that.

I'm using cascading.hbase, which does use the old API -- but I've
never run into these perf issues before.

Ideas? I'm sure it's something painfully obvious to everyone but moi :)

Here's some logs:

NameNode: http://pastebin.com/j09CJQJJ
DataNode: http://pastebin.com/XudWcaxW
RS: http://pastebin.com/wXPBAjpu
RS GC: http://pastebin.com/jqJyKAXq


-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

Reply via email to