On Tue, Feb 14, 2012 at 7:56 AM, Oliver Meyn (GBIF) <om...@gbif.org> wrote:
> 1) With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation
> randomWrite 10', I see 100 mappers spawned rather than the expected 10. I expect 10
> because that's what the usage text implies, and what the javadoc explicitly states -
> quoting from doMapReduce: "Run as many maps as asked-for clients." The culprit
> appears to be the outer loop in writeInputFile, which sets up 10 splits for every
> "asked-for client" - at least, if I'm reading it right. Is this somehow expected, or
> is that code left over from some previous iteration/experiment?
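The kind of outer loop being described would look roughly like the sketch below (an
illustrative reconstruction, not the actual PerformanceEvaluation source; numClients,
perClientRows, and the split-record format are all assumptions):

    import java.util.ArrayList;
    import java.util.List;

    public class SplitSketch {
      public static void main(String[] args) {
        int numClients = 10;              // the "asked-for clients"
        int perClientRows = 1024 * 1024;  // 1M rows per client
        List<String> splits = new ArrayList<String>();
        // Suspected outer loop: ten sub-ranges per client...
        for (int i = 0; i < 10; i++) {
          for (int j = 0; j < numClients; j++) {
            int startRow = (j * perClientRows) + (i * (perClientRows / 10));
            splits.add("startRow=" + startRow
                + ", perClientRunRows=" + (perClientRows / 10));
          }
        }
        // ...yielding 10 * numClients = 100 input records, one mapper each.
        System.out.println("splits written: " + splits.size());
      }
    }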
Yeah. I'd expect ten clients, each with its own map, each doing 1M items. Looking at
writeInputFile, it seems to be dividing the namespace by ten, so yeah, 10x mappers.

> 2) With that same randomWrite command line above, I would expect a resulting table
> with 10 * (1024 * 1024) rows (so 10485760, roughly 10M rows). Instead, what I'm
> seeing is that the randomWrite job reports writing exactly that many rows, but
> running rowcounter against the table reveals only 6549899 rows. A second attempt
> at building the table produces slightly different results (e.g. 6627689). I see a
> similar discrepancy when using 50 clients instead of 10 (~35% smaller than
> expected). Key collision could explain it, but that seems pretty unlikely, given
> I only need e.g. 10M keys from a potential 2B.

Yeah, I'd think key overlap (print out the span for each mapper, or check the file
written by writeInputFile). Are your clocks all in sync?

St.Ack
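A quick back-of-envelope check supports the key-overlap theory, under one assumption
the thread doesn't confirm: that each mapper draws its random keys uniformly from the
full [0, totalRows) range rather than from a 2B keyspace.

    public class CollisionEstimate {
      public static void main(String[] args) {
        double n = 10.0 * 1024 * 1024;  // 10485760 attempted writes
        // Expected distinct keys after n uniform draws from a space of size n:
        // n * (1 - (1 - 1/n)^n), which tends to n * (1 - 1/e) for large n.
        double expectedDistinct = n * (1.0 - Math.pow(1.0 - 1.0 / n, n));
        System.out.printf("expected distinct rows: %.0f%n", expectedDistinct);
        // Prints roughly 6.63M - close to the observed 6549899 and 6627689.
      }
    }

If the generator really spanned the 2B keyspace assumed in the question, collisions
would indeed be negligible; printing the actual per-mapper spans, as suggested above,
is the decisive check.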