On Tue, Feb 14, 2012 at 7:56 AM, Oliver Meyn (GBIF) <om...@gbif.org> wrote:
> 1) With a command line like 'hbase 
> org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' I see 100 
> mappers spawned, rather than the expected 10.  I expect 10 because that's 
> what the usage text implies, and what the javadoc explicitly states - quoting 
> from doMapReduce "Run as many maps as asked-for clients."  The culprit 
> appears to be the outer loop in writeInputFile which sets up 10 splits for 
> every "asked-for client" - at least, if I'm reading it right.  Is this 
> somehow expected, or is that code leftover from some previous 
> iteration/experiment?
>

Yeah.  I'd expect ten clients, each in its own map, each doing 1M items.

Looking at writeInputFile, it seems to be dividing each client's
keyspace by ten, so yeah, 10x mappers.

> 2) With that same randomWrite command line above, I would expect a resulting 
> table with 10 * (1024 * 1024) rows (so 10485760 = roughly 10M rows).  Instead 
> what I'm seeing is that the randomWrite job reports writing that many rows 
> (exactly) but running rowcounter against the table reveals only 6549899 rows. 
>  A second attempt to build the table produces slightly different results 
> (e.g. 6627689).  I see a similar discrepancy when using 50 instead of 10 
> clients (~35% smaller than expected).  Key collision could explain it, but it 
> seems pretty unlikely (given I only need e.g. 10M keys from a potential 2B).
>

Yeah, I'd think key overlap (print out the span for each mapper, or
check the file written by writeInputFile).

Your clocks are all in sync?

St.Ack
