Hey guys, I'm banging my head against some perf issues on EC2. I'm using .20.6 on ASF hadoop .20.2, and tweaked the ec2 hbase scripts to handle the new version.
I'm trying to insert about 22G of data across nodes on EC2 m1.large instances. I'm getting speeds of about 1200 rows/minute. It seems like most inserts are <1 ms. But then some take 3sec, and occasionally I see some take 30sec. It felt like a GC issue, but the data volume should be nowhere near enough to cause that. I'm using cascading.hbase, which does use the old API -- but I've never run into these perf issues before. Ideas? I'm sure it's something painfully obvious to everyone but moi :) Here's some logs: NameNode: http://pastebin.com/j09CJQJJ DataNode: http://pastebin.com/XudWcaxW RS: http://pastebin.com/wXPBAjpu RS GC: http://pastebin.com/jqJyKAXq -- Bradford Stephens, Founder, Drawn to Scale drawntoscalehq.com 727.697.7528 http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data. http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
