Jean-Daniel Cryans wrote:
> You expect a MapReduce job to be faster than a Scan on small data;
> your expectation is wrong.

I never expected an MR job to be faster in every context.

> There's a minimal cost to every MR job, on the order of a few seconds,
> and you can't get around it.

Sure, there is overhead for every MR job, and a few seconds are OK, but not a
whole minute.

So how long can a full scan of, e.g., 1,000,000,000 rows be expected to take
on an HBase cluster with, e.g., 3 region servers?
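
As a rough back-of-envelope sketch (not something stated in the thread), the wall-clock time can be modeled as the fixed MR job overhead plus the time each region server needs for its share of the rows. This assumes the rows are spread evenly across the region servers and that all servers scan in parallel at the same rate. The function name and the per-server throughput are hypothetical placeholders that would have to be measured on a real cluster.

```python
def estimate_full_scan_seconds(rows, region_servers, rows_per_sec_per_server,
                               job_overhead_s=0.0):
    """Back-of-envelope wall-clock estimate for a full-table scan.

    Assumes (hypothetically) that rows are evenly distributed over the
    region servers and that every server scans its share in parallel at
    the same rate. job_overhead_s models the fixed per-job MR start-up cost.
    """
    if region_servers <= 0 or rows_per_sec_per_server <= 0:
        raise ValueError("region_servers and rows_per_sec_per_server must be positive")
    rows_per_server = rows / region_servers
    return job_overhead_s + rows_per_server / rows_per_sec_per_server


# Illustrative numbers only: 100,000 rows/s per server is an assumed rate.
print(estimate_full_scan_seconds(1_000_000_000, 3, 100_000, job_overhead_s=30))
```

With those assumed inputs the estimate is about 3363 seconds, i.e. roughly 56 minutes, which shows why the fixed job overhead matters little at this scale: the scan itself dominates, so the measured per-server throughput is the number to pin down.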

I'm just wondering whether it's worth running the full scan only once a day and
persisting the results.
I hoped to be able to process it on demand, but if it takes too long, that's
not acceptable.
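
The once-a-day alternative amounts to a time-bounded cache: serve the persisted result while it is younger than a day, and rerun the expensive scan only when it is stale. The sketch below persists to a local JSON file purely for illustration; the function name, file path, and `compute` callback are hypothetical, and in an HBase setup the result would more likely be written to a separate table.

```python
import json
import os
import time


def load_or_recompute(path, compute, max_age_s=24 * 60 * 60, now=None):
    """Return the result persisted at `path` if it is younger than max_age_s.

    Otherwise call compute() (standing in for the expensive full scan),
    persist its result together with a timestamp, and return it.
    """
    now = time.time() if now is None else now
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
        if now - saved["computed_at"] < max_age_s:
            return saved["result"]  # still fresh: skip the scan
    result = compute()
    with open(path, "w") as f:
        json.dump({"computed_at": now, "result": result}, f)
    return result
```

The trade-off is staleness: requests are answered instantly, but the answer can be up to `max_age_s` old.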

andre
