Greetings HBase Homies,

I'm running the .89 dev release (though I had this problem in .20.6 as well). I'm trying to load 10 x 8.5 CSV files from HDFS into an empty HBase table.
I'm getting pretty slow loads: 85,000 records/minute/node. I'd expect this to be at least 5x faster based on past experience.

The cluster has 5 RSs on AWS (c1.xlarge: 7 GB RAM, 8 "cores"). Occasionally I'm getting "Failed to report status for 601 seconds. Killing!" on map tasks. WAL is disabled.

What's odd is, I could have sworn it used to be *much* faster last week. I don't remember the code changing. Could it be environmental? top isn't displaying anything interesting.

The schema is pretty simple. Each record is maybe 1k:

id_set:id, id_set:mid, id_set:aguid, id_set:sid
metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
event:event
data_set:ts, data_set:data, data_set:geo

The code is simple (didn't write it):
(Main): http://pastebin.com/vmPgeqNj
(Mapper): http://pastebin.com/T2BQjs0k

The logs are quite boring:
HMaster: http://pastebin.com/zvyvNc3k
Regionserver: http://pastebin.com/QvJ4J7Ps

Any ideas?

--
Bradford Stephens, Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
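
P.S. For scale, here is a rough back-of-the-envelope on the numbers above. It is only a sketch: it assumes every one of the 5 nodes sustains the quoted rate and that each record really is about 1,000 bytes ("maybe 1k" is a guess), and the function name is just made up for illustration.

```python
# Back-of-the-envelope throughput from the figures quoted in this message.
RECORDS_PER_MIN_PER_NODE = 85_000   # observed load rate per node
NODES = 5                           # number of region servers
RECORD_BYTES = 1_000                # "maybe 1k" per record: a rough assumption


def cluster_rate(records_per_min_per_node, nodes, record_bytes):
    """Return (records/sec, MB/sec) summed across the whole cluster."""
    records_per_sec = records_per_min_per_node * nodes / 60
    mb_per_sec = records_per_sec * record_bytes / 1_000_000
    return records_per_sec, mb_per_sec


rps, mbps = cluster_rate(RECORDS_PER_MIN_PER_NODE, NODES, RECORD_BYTES)
print(f"{rps:,.0f} records/s, {mbps:.1f} MB/s cluster-wide")
# prints "7,083 records/s, 7.1 MB/s cluster-wide"
```

So under those assumptions the whole cluster is only taking in about 7 MB/s, which is why the rate looks low to me.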
