So how does Hadoop get this property if it is per node? Does it take the minimum of all nodes?
--> No, it is not the minimum of all nodes. Each NodeManager reads this configuration from its own configuration file (yarn-site.xml). The NodeManager is an agent that manages the lifecycle of containers and is installed on each node where you want to run containers. It communicates with the ResourceManager, and that is how the ResourceManager learns each node's capability. At registration time, the NodeManager reports its node's capability to the RM (for scheduling) by reading the above 2 configuration items (one for memory and one for vcores). By "capability of a node" I mean that you may have some nodes with 8 cores and some with 16 cores, for instance; some may have 16 GB of memory and others 24 GB. So the above 2 configurations can be set accordingly on each node, because up to Hadoop 2.7 we could not get a node's hardware capability from the operating system. From 2.8 onwards, this will be read automatically from the OS (Linux/Windows), if configured to do so. This is a NodeManager configuration and does not need to be configured on the client side when submitting a job.

Regards,
Varun Saxena

On Mon, Aug 24, 2015 at 1:26 AM, Varun Saxena <[email protected]> wrote:

> This configuration is read and used by the NodeManager on whichever node it
> is running.
> If it is not configured, the default value will be taken.
>
> Regards,
> Varun Saxena.
>
> On Mon, Aug 24, 2015 at 1:21 AM, Pedro Magalhaes <[email protected]>
> wrote:
>
>> Thanks Varun! Like we say in Brazil: "U are the guy!" (Você é o cara!)
>>
>> I have another question. You said that:
>> "yarn.nodemanager.resource.cpu-vcores on the other hand will have to be
>> configured as per resource capability of that particular node."
>>
>> I read the configuration from my job and printed it:
>> yarn.nodemanager.resource.cpu-vcores 8
>> yarn.nodemanager.resource.memory-mb 8192
>>
>> So how does Hadoop get this property if it is per node? Does it take the
>> minimum of all nodes? Thanks again!
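[The two per-node items discussed above live in each NodeManager's yarn-site.xml. A minimal sketch for one hypothetical 16-core, 24 GB node — the values here are purely illustrative, not recommendations:]

```xml
<!-- yarn-site.xml on ONE NodeManager; each node can carry different values -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>24576</value>
  </property>
</configuration>
```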
>>
>> On Sun, Aug 23, 2015 at 4:40 PM, Varun Saxena <[email protected]>
>> wrote:
>>
>>> The fix will be released in the next version (2.8.0).
>>> I had checked the code to find out the default value and then found it
>>> fixed in the documentation (configuration list).
>>>
>>> As this is an unreleased version, a URL link (of the form
>>> https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml)
>>> may not be available AFAIK.
>>> However, this XML (yarn-default.xml) can be checked online in the git
>>> repository.
>>>
>>> The associated JIRA which fixes this is
>>> https://issues.apache.org/jira/browse/YARN-3823
>>>
>>> Regards,
>>> Varun Saxena.
>>>
>>> On Mon, Aug 24, 2015 at 12:53 AM, Pedro Magalhaes <[email protected]>
>>> wrote:
>>>
>>>> Thanks Varun!
>>>> Could you please send me the link with the fix?
>>>>
>>>> On Sun, Aug 23, 2015 at 2:20 PM, Varun Saxena <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Pedro,
>>>>>
>>>>> The real default value of yarn.scheduler.maximum-allocation-vcores is 4.
>>>>> The value of 32 is actually a documentation issue and has been fixed
>>>>> recently.
>>>>>
>>>>> Regards,
>>>>> Varun Saxena.
>>>>>
>>>>> On Sun, Aug 23, 2015 at 10:39 PM, Pedro Magalhaes <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Varun,
>>>>>> Thanks for the reply. I understand the
>>>>>> yarn.scheduler.maximum-allocation-vcores parameter. I am just asking
>>>>>> why the defaults are yarn.scheduler.maximum-allocation-vcores=32 and
>>>>>> yarn.nodemanager.resource.cpu-vcores=8.
>>>>>>
>>>>>> In my opinion, if yarn.scheduler.maximum-allocation-vcores is 32
>>>>>> by default, then yarn.nodemanager.resource.cpu-vcores should be equal
>>>>>> to or greater than 32 by default.
>>>>>> Does this make sense?
>>>>>>
>>>>>> On Sun, Aug 23, 2015 at 2:00 PM, Varun Saxena <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Pedro,
>>>>>>>
>>>>>>> The actual allocation depends on the total resource capability
>>>>>>> advertised by the NM while registering with the RM.
>>>>>>>
>>>>>>> yarn.scheduler.maximum-allocation-vcores merely puts an upper cap on
>>>>>>> the number of vcores which can be allocated by the RM, i.e. any resource
>>>>>>> request/ask from an AM which asks for vcores > 32 (the default value) for
>>>>>>> a container will be normalized back down to 32.
>>>>>>>
>>>>>>> If there is no node with that capability available, the allocation will
>>>>>>> not be fulfilled.
>>>>>>>
>>>>>>> yarn.scheduler.maximum-allocation-vcores is configured on the
>>>>>>> ResourceManager and is therefore common to the whole cluster, which can
>>>>>>> possibly have multiple nodes with heterogeneous resource capabilities.
>>>>>>>
>>>>>>> yarn.nodemanager.resource.cpu-vcores, on the other hand, has to be
>>>>>>> configured as per the resource capability of that particular node.
>>>>>>>
>>>>>>> Recently there has been work done to automatically get memory and
>>>>>>> CPU information from the underlying OS (the supported OSes being Linux
>>>>>>> and Windows), if configured to do so. This change will be available in
>>>>>>> 2.8.
>>>>>>> I hope this answers your question.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Varun Saxena.
>>>>>>>
>>>>>>> On Sun, Aug 23, 2015 at 9:40 PM, Pedro Magalhaes
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> I was looking at the default parameters for:
>>>>>>>>
>>>>>>>> yarn.nodemanager.resource.cpu-vcores = 8
>>>>>>>> yarn.scheduler.maximum-allocation-vcores = 32
>>>>>>>>
>>>>>>>> To me these two defaults don't make any sense together.
>>>>>>>>
>>>>>>>> The first one says "the number of CPU cores that can be allocated
>>>>>>>> for containers" (I imagine that means vcores). The second says: "The
>>>>>>>> maximum allocation for every container request at the RM". In my
>>>>>>>> opinion, the second one must be equal to or less than the first one.
>>>>>>>>
>>>>>>>> How can I allocate 32 vcores for a container if I have only 8 vcores
>>>>>>>> available per node?
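[The interaction Varun describes — the RM caps an oversized ask at the scheduler maximum, then the request can only be placed if some node actually advertises that much capacity — can be sketched as follows. This is NOT YARN's actual scheduler code; the function names and node values are illustrative.]

```python
# Sketch of the behavior described in the thread; names and values are
# illustrative, not YARN internals.

MAX_ALLOCATION_VCORES = 32   # yarn.scheduler.maximum-allocation-vcores (RM-wide cap)
NODE_VCORES = [8, 8, 16]     # yarn.nodemanager.resource.cpu-vcores, per node

def normalize_request(requested_vcores):
    """The RM normalizes any container ask down to the scheduler maximum."""
    return min(requested_vcores, MAX_ALLOCATION_VCORES)

def can_be_fulfilled(requested_vcores):
    """Even after normalization, some node must advertise enough capacity."""
    ask = normalize_request(requested_vcores)
    return any(node >= ask for node in NODE_VCORES)

# An ask for 40 vcores is normalized back to the cap of 32 ...
print(normalize_request(40))   # -> 32
# ... but no node here advertises 32 vcores, so it is never placed.
print(can_be_fulfilled(40))    # -> False
# An ask for 12 vcores fits on the 16-vcore node.
print(can_be_fulfilled(12))    # -> True
```

[This is why the cap being larger than any single node's cpu-vcores is not a contradiction: the cap is cluster-wide, while fulfillment is per node.]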
