Thanks Alan! We are using 0.79. Also got an answer from the #hadoop channel and from this Quora answer:
http://www.quora.com/Where-does-Hadoop-latency-come-from-e-g-it-takes-15-25-seconds-for-an-empty-job?q=hadoop+latency

We will look into combining more work into each mapper and/or using Pig 0.8. Thanks again for your help.

Dexin

On Wed, Mar 23, 2011 at 5:55 PM, Alan Gates <[email protected]> wrote:

> What version of Pig are you using? Starting in 0.8, Pig will combine small
> blocks into a single map. This prevents jobs that are actually reading
> small amounts of data from taking up a lot of slots on the cluster. You can
> turn this off by adding -Dpig.noSplitCombination=true to your command line.
>
> Alan.
>
>
> On Mar 23, 2011, at 5:45 PM, Dexin Wang wrote:
>
>> And the nodes are pretty lightly loaded (~1.0) and there's plenty of free
>> memory. Now I'm seeing 2 mappers per node. Very much under-utilized.
>>
>> On Wed, Mar 23, 2011 at 1:39 PM, Dexin Wang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We've seen a strange problem where some Pig jobs run fewer mappers
>>> concurrently than the mapper capacity. Specifically, we have a 10-node
>>> cluster and each node is configured for 12 mappers, so normally we have
>>> 120 mappers running. But some Pig jobs will run only 10 mappers (while
>>> nothing else is running), which appears to be 1 mapper per node.
>>>
>>> We have not noticed the same problem with other, non-Pig Hadoop jobs.
>>> Has anyone experienced the same thing, and is there an explanation or
>>> remedy?
>>>
>>> Thanks!
>>> Dexin
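[Editor's note: for readers landing on this thread later, a minimal sketch of the command lines discussed above, assuming Pig 0.8+. -Dpig.noSplitCombination=true is quoted directly from Alan's reply; pig.maxCombinedSplitSize is the assumed property name for capping combined split size, and the script name and byte value are illustrative only.]

    # Disable split combination entirely (Alan's suggestion), reverting to
    # one map task per input block:
    pig -Dpig.noSplitCombination=true myscript.pig

    # Or leave combination on but cap how much input is combined into one
    # map (value in bytes; 256 MB shown as an example):
    pig -Dpig.maxCombinedSplitSize=268435456 myscript.pig

With a cap like the second example, many small input files still get packed into fewer maps, but each map stays below the size limit, which trades per-job latency against cluster slot usage.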
