Thanks Alan! We are using 0.79. Also got an answer from the #hadoop channel and from this Quora answer:
http://www.quora.com/Where-does-Hadoop-latency-come-from-e-g-it-takes-15-25-seconds-for-an-empty-job?q=hadoop+latency

We will look into combining more work into each mapper and/or using Pig 0.8. Thanks again for your help.

Dexin

On Wed, Mar 23, 2011 at 5:55 PM, Alan Gates <[email protected]> wrote:

> What version of Pig are you using? Starting in 0.8, Pig will combine small
> blocks into a single map. This prevents jobs that are actually reading
> small amounts of data from taking up a lot of slots on the cluster. You can
> turn this off by adding -Dpig.noSplitCombination=true to your command line.
>
> Alan.
>
>
> On Mar 23, 2011, at 5:45 PM, Dexin Wang wrote:
>
>> And the nodes are pretty lightly loaded (~1.0) and there's plenty of free
>> memory. Now I'm seeing 2 mappers per node. Very much under-utilized.
>>
>> On Wed, Mar 23, 2011 at 1:39 PM, Dexin Wang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We've seen a strange problem where some Pig jobs run fewer mappers
>>> concurrently than the mapper capacity. Specifically, we have a 10-node
>>> cluster and each node is configured for 12 mappers, so normally we have
>>> 120 mappers running. But some Pig jobs will run only 10 mappers (while
>>> nothing else is running), which appears to be 1 mapper per node.
>>>
>>> We have not noticed the same problem with other, non-Pig Hadoop jobs.
>>> Has anyone experienced the same thing, and is there an explanation or
>>> remedy?
>>>
>>> Thanks!
>>> Dexin
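[Editor's note: for readers landing on this thread later, a minimal sketch of the command lines discussed above, assuming Pig 0.8+. -Dpig.noSplitCombination=true is quoted directly from Alan's reply; pig.maxCombinedSplitSize is the assumed property name for capping combined split size, and the script name and byte value are illustrative only.]

    # Disable split combination entirely (Alan's suggestion), reverting to
    # one map task per input block:
    pig -Dpig.noSplitCombination=true myscript.pig

    # Or leave combination on but cap how much input is combined into one
    # map (value in bytes; 256 MB shown as an example):
    pig -Dpig.maxCombinedSplitSize=268435456 myscript.pig

With a cap like the second example, many small input files still get packed into fewer maps, but each map stays below the size limit, which trades per-job latency against cluster slot usage.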
