On Mon, Feb 20, 2012 at 1:12 PM, Himanish Kushary <[email protected]> wrote:
> Where is the time being spent? In the server, in the mapper?
> - Most of the time is spent calling HTable.batch(...) inside the mapper.
>
So, 100 Gets at a time?

> Why are you having scanner timeouts if you are doing big batch Gets?
> - We are getting the scanner timeout from the original Scan, which serves
> the input records to the mapper. The scanner caching is set to 100.
>
> I think that because the mapper is taking too long (because of the batch
> Gets inside it) to process the initial 100 records, the next batch of
> scanned records throws the exception.
>

How big are these 100 rows? How many regions on this single RegionServer?

> Also, could it be happening due to concurrency? I am currently on a single
> region server. When I run the test case, the batch Gets happen sequentially,
> whereas from the map-reduce job the batch Gets happen concurrently on the
> same region server. Could this be the reason that during map-reduce the
> performance degrades due to thrashing on the same region server? Thoughts?
>

A single regionserver? How many datanodes? What's it look like on the
machine running the regionserver? Is it working hard? Seems odd that 100
Gets would take longer than a minute to complete.

You've checked out the performance section of the reference guide?

St.Ack
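
For reference, the timeout reasoning in this thread can be worked through as a rough calculation. With scanner caching at 100, the client has to finish all 100 cached rows before it goes back to the regionserver for the next batch, so the time per batch has to stay under the scanner lease. The sketch below is plain Java and makes no HBase calls. The 60-second lease (believed to be the default for `hbase.regionserver.lease.period` in releases of this era) and the 700 ms per-row cost are assumptions for illustration, not numbers measured in this thread, and the class and method names are hypothetical.

```java
public class ScannerLeaseBudget {
    // Assumption: scanner lease of 60 s; check hbase.regionserver.lease.period on your cluster.
    static final long ASSUMED_LEASE_MS = 60_000L;

    // Time the mapper spends on one cached batch before the next scanner call.
    static long batchTimeMs(int caching, long perRowMs) {
        return (long) caching * perRowMs;
    }

    // Largest caching value whose batch time stays strictly below the lease.
    // Returns at least 1, because caching cannot go lower, even if one row alone is too slow.
    static int maxSafeCaching(long leaseMs, long perRowMs) {
        if (leaseMs <= 0 || perRowMs <= 0) {
            throw new IllegalArgumentException("leaseMs and perRowMs must be positive");
        }
        long rows = (leaseMs - 1) / perRowMs;
        return (int) Math.max(1L, Math.min(rows, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        int caching = 100;      // scanner caching reported in the thread
        long perRowMs = 700L;   // hypothetical per-row cost of the batch Gets

        long batchMs = batchTimeMs(caching, perRowMs);
        System.out.println("Time per cached batch (ms): " + batchMs);
        System.out.println("Exceeds assumed lease: " + (batchMs >= ASSUMED_LEASE_MS));
        System.out.println("Largest caching under the lease: "
                + maxSafeCaching(ASSUMED_LEASE_MS, perRowMs));
    }
}
```

If the numbers on a real cluster look like this, the usual options are to lower the caching on the input Scan (`Scan.setCaching`) or to make the per-row work faster. Raising the lease only hides the slow Gets.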
