Slow MR data load to table

Bradford Stephens Mon, 20 Dec 2010 19:56:04 -0800

Greetings HBase Homies,

I'm running the 0.89 dev release (though I had this problem in 0.20.6
as well). I'm trying to load 10 x 8.5 CSV files from HDFS into an empty
HBase table.


Loads are pretty slow: about 85,000 records/minute/node. Based on past
experience I'd expect at least 5x that. The cluster has 5 RegionServers
on AWS, c1.xlarge instances (7 GB RAM, 8 "cores"). Occasionally map
tasks are killed with "Failed to report status for 601 seconds.
Killing!". The WAL is disabled.
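A quick back-of-the-envelope check of the reported rate, as a minimal sketch. It assumes "maybe 1k" per record means 1,000 bytes and that the rate applies to each of the 5 RegionServer nodes; neither is measured here.

```java
// Back-of-the-envelope throughput check for the reported load rate.
// Assumptions (not measured): ~1,000 bytes per record, and the
// 85,000 records/minute figure applies to each of the 5 nodes.
public class LoadRate {

    // Convert a records-per-minute rate into bytes per second.
    static double bytesPerSecond(long recordsPerMinute, long bytesPerRecord) {
        return recordsPerMinute * (double) bytesPerRecord / 60.0;
    }

    public static void main(String[] args) {
        long recordsPerMinutePerNode = 85_000L; // reported rate per node
        long bytesPerRecord = 1_000L;           // assumed ~1 KB per record
        int nodes = 5;                          // 5 RegionServers

        double perNode = bytesPerSecond(recordsPerMinutePerNode, bytesPerRecord);
        double cluster = perNode * nodes;

        System.out.printf("per node: %.2f MB/s%n", perNode / 1e6);
        System.out.printf("cluster:  %.2f MB/s%n", cluster / 1e6);
        System.out.printf("5x target per node: %d records/min%n",
                5 * recordsPerMinutePerNode);
    }
}
```

Under those assumptions the load works out to roughly 1.4 MB/s per node and about 7 MB/s across the five nodes.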

What's odd is that I could have sworn it was *much* faster last week,
and I don't remember the code changing. Could it be environmental?
top isn't showing anything interesting.

The schema is pretty simple; each record is maybe 1 KB:
id_set:id, id_set:mid, id_set:aguid, id_set:sid
metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
event:event
data_set:ts, data_set:data, data_set:geo
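To make the row layout concrete, here is a hypothetical sketch of how one CSV line could map onto the columns listed above. It assumes each line holds exactly these 13 fields in the same order as the listing; the real mapper's parsing is in the linked pastebin and may differ, and the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: assumes each CSV line holds the 13 fields in the
// same order as the schema listing. Not the actual mapper's logic.
public class RowLayout {

    // family:qualifier names, in the order the schema lists them
    static final String[] COLUMNS = {
        "id_set:id", "id_set:mid", "id_set:aguid", "id_set:sid",
        "metadata:seq", "metadata:rdu", "metadata:deploytype",
        "metadata:ver", "metadata:type",
        "event:event",
        "data_set:ts", "data_set:data", "data_set:geo"
    };

    // Split one CSV line into an ordered family:qualifier -> value map.
    // Naive comma split: quoted fields containing commas are not handled.
    static Map<String, String> toColumns(String csvLine) {
        String[] fields = csvLine.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "expected " + COLUMNS.length + " fields, got " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], fields[i]);
        }
        return row;
    }

    // Count the distinct column families used by the layout.
    static long familyCount() {
        return Arrays.stream(COLUMNS)
            .map(c -> c.substring(0, c.indexOf(':')))
            .distinct()
            .count();
    }

    public static void main(String[] args) {
        System.out.println(toColumns("1,2,3,4,5,6,7,8,9,10,11,12,13"));
        System.out.println("families: " + familyCount());
    }
}
```

Each record therefore becomes 13 cells spread over 4 column families, and in HBase every family is kept in its own store (MemStore plus store files) within each region.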

The code is simple (I didn't write it):
(Main): http://pastebin.com/vmPgeqNj
(Mapper): http://pastebin.com/T2BQjs0k

The logs are quite boring:
HMaster: http://pastebin.com/zvyvNc3k
RegionServer: http://pastebin.com/QvJ4J7Ps


Any ideas?

-- 
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science
