As far as I understand, in Pig you cannot directly decide how many mappers will run. That number comes out of the input-split planning, based on the number of input files, their sizes, the block size, etc. What you can control is the number of reducers, via the PARALLEL clause. You can certainly SET mapreduce.job.maps from the script, but I am not sure what effect it will have. That is what I remember from the docs.
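
To make that concrete, here is a rough sketch of what such a script could look like. The paths, relation names, field names and all the numbers are made up for illustration; only the property names and the PARALLEL clause come from this thread, plus default_parallel, which is Pig's script-wide default for reducers. As far as I know mapreduce.job.maps is treated only as a hint, so do not expect it to pin the mapper count, and none of this directly caps the number of concurrent containers.

```pig
-- Hypothetical example: paths, field names and values are placeholders.

-- Per-task container sizes; larger values mean fewer containers fit on a node.
SET mapreduce.map.memory.mb 2048;
SET mapreduce.reduce.memory.mb 4096;

-- Hint for the number of map tasks; the real count is driven by the input splits.
SET mapreduce.job.maps 20;

-- Script-wide default number of reducers.
SET default_parallel 10;

logs = LOAD 'input/logs' USING PigStorage('\t') AS (user:chararray, bytes:long);

-- PARALLEL overrides the default for the reduce phase of this one operator.
by_user = GROUP logs BY user PARALLEL 5;

totals = FOREACH by_user GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

STORE totals INTO 'output/totals';
```

PARALLEL only has an effect on operators that trigger a reduce phase (GROUP, JOIN, ORDER BY and so on), which is why it sits on the GROUP here.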
Hope this helps

On 21 October 2014 13:30, Shahab Yunus <[email protected]> wrote:

> Jakub, are you saying that we can't change the mappers per job through the
> script, right? Because, otherwise, if invoking through command line or
> code, then we can, I think. We do have this property mapreduce.job.maps.
>
> Regards,
> Shahab
>
> On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky <[email protected]>
> wrote:
>
>> Hello,
>>
>> as far as I understand, you cannot drive the number of mappers. The number
>> of reducers you can control via the PARALLEL keyword. The number of
>> containers on a node is given by the following combination of settings:
>> yarn.nodemanager.resource.memory-mb - set on the cluster. And the following
>> properties can be "modified" from your script by setting them to a different
>> number: mapreduce.map.memory.mb and mapreduce.reduce.memory.mb.
>>
>> Hope this helps
>>
>> On 21 October 2014 07:31, Sunil S Nandihalli <[email protected]>
>> wrote:
>>
>>> Hi Everybody,
>>> I would like to know how I can limit the number of concurrent
>>> containers requested (and used, of course) by my pig-script (not as a yarn
>>> queue configuration or some such stuff.. I want to limit it from outside
>>> on a per job basis. I would ideally like to set the number in my
>>> pig-script.) Can I do this?
>>> Thanks,
>>> Sunil.
>>>
>>
>> --
>> Jakub Stransky
>> cz.linkedin.com/in/jakubstransky
>>
>

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky
