You are getting the NPE in *org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName*, which is not in the Hadoop codebase. I can see you are using a different scheduler implementation, *com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair*, so *SourceFile:204* (the obfuscated Pepperdata source) is the place to look for more details.
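For context on why it only bites at DEBUG: the trace goes through CompositeService.addService (CompositeService.java:73), which in the 2.6.x line only asks the service for its name inside a debug-log guard. Roughly (paraphrased from the Hadoop source, not copied exactly):

{code}
// org.apache.hadoop.service.CompositeService (Hadoop 2.6.x), approximately:
protected void addService(Service service) {
  if (LOG.isDebugEnabled()) {
    // getName() is only invoked when DEBUG logging is on, which is why the
    // RM starts fine at INFO and fails at DEBUG
    LOG.debug("Adding service " + service.getName());
  }
  synchronized (serviceList) {
    serviceList.add(service);
  }
}
{code}

The NPE itself is thrown inside the obfuscated Pepperdata class at SourceFile:204, so its getName() is presumably dereferencing a name field that has not been set yet when addIfService() registers the scheduler. As a sanity check you could start the RM at DEBUG with the stock org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler and the same allocation file; if that comes up cleanly, the fix belongs on the Pepperdata side.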
My guess is that you need to set some name parameter that only gets read on that DEBUG path.

Thanks,

On Wed, Jan 11, 2017 at 10:59 PM, Stephen Sprague <[email protected]> wrote:

> ok. i would attach but... i think there might be an aversion to
> attachments so i'll paste inline. hopefully its not too confusing.
>
> $ cat fair-scheduler.xml
>
> <?xml version="1.0"?>
>
> <!--
>   This is a sample configuration file for the Fair Scheduler. For details
>   on the options, please refer to the fair scheduler documentation at
>   http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.
>
>   To create your own configuration, copy this file to conf/fair-scheduler.xml
>   and add the following property in mapred-site.xml to point Hadoop to the
>   file, replacing [HADOOP_HOME] with the path to your installation directory:
>   <property>
>     <name>mapred.fairscheduler.allocation.file</name>
>     <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
>   </property>
>
>   Note that all the parameters in the configuration file below are optional,
>   including the parameters inside <pool> and <user> elements. It is only
>   necessary to set the ones you want to differ from the defaults.
> -->
>
> <!-- https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html -->
>
> <allocations>
>
>   <!-- NOTE. ** Preemption IS NOT turn on! ** -->
>
>   <!-- Preemption timeout for jobs below their fair share, in seconds.
>        If a job is below half its fair share for this amount of time, it
>        is allowed to kill tasks from other jobs to go up to its fair share.
>        Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
>
>   <!-- Default min share preemption timeout for pools where it is not
>        explicitly configured, in seconds. Requires mapred.fairscheduler.preemption
>        to be set to true in your mapred-site.xml. -->
>   <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
>
>   <!-- Default running job limit pools where it is not explicitly set. -->
>   <queueMaxJobsDefault>20</queueMaxJobsDefault>
>
>   <!-- Default running job limit users where it is not explicitly set. -->
>   <userMaxJobsDefault>10</userMaxJobsDefault>
>
>   <!-- QUEUES:
>        dwr.interactive : 10 at once
>        dwr.batch_sql : 15 at once
>        dwr.batch_hdfs : 5 at once (distcp, sqoop, hfs -put, anything besides 'sql')
>        dwr.qa : 3 at once
>        dwr.truck_lane : 1 at once
>
>        cad.interactive : 5 at once
>        cad.batch : 10 at once
>
>        comms.interactive : 5 at once
>        comms.batch : 3 at once
>
>        default : 2 at once (to discourage its use)
>   -->
>
>   <!-- queue placement -->
>   <queuePlacementPolicy>
>     <rule name="specified" />
>     <rule name="default" />
>   </queuePlacementPolicy>
>
>   <!-- footprint -->
>   <queue name='footprint'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <maxRunningApps>4</maxRunningApps>
>     <aclSubmitApps>*</aclSubmitApps>
>
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
>
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
>
>     <queue name="dev">
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>
>     <queue name="stage">
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>   </queue>
>
>   <!-- comms -->
>   <queue name='comms'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <queue name="interactive">
>       <maxRunningApps>5</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <queue name="batch">
>       <maxRunningApps>10</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>   </queue>
>
>   <!-- cad -->
>   <queue name='cad'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <queue name="interactive">
>       <maxRunningApps>5</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <queue name="batch">
>       <maxRunningApps>10</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>   </queue>
>
>   <!-- dwr -->
>   <queue name="dwr">
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
>
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
>
>     <!-- INTERACTiVE. 5 at once -->
>     <queue name="interactive">
>       <weight>2.0</weight>
>       <maxRunningApps>5</maxRunningApps>
>
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>
>       <!-- per user. but given everything is dwr (for now) its not helpful -->
>       <userMaxAppsDefault>5</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- BATCH. 15 at once -->
>     <queue name="batch_sql">
>       <weight>1.5</weight>
>       <maxRunningApps>15</maxRunningApps>
>
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- sqoop, distcp, hdfs-put type jobs here. 3 at once -->
>     <queue name="batch_hdfs">
>       <weight>1.0</weight>
>       <maxRunningApps>3</maxRunningApps>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- QA. 3 at once -->
>     <queue name="qa">
>       <weight>1.0</weight>
>       <maxRunningApps>100</maxRunningApps>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <aclSubmitApps>*</aclSubmitApps>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>     </queue>
>
>     <!-- big, unruly jobs -->
>     <queue name="truck_lane">
>       <weight>0.75</weight>
>       <maxRunningApps>1</maxRunningApps>
>       <minMaps>5</minMaps>
>       <minReduces>5</minReduces>
>
>       <!-- lets try without static values and see how the "weight" works -->
>       <maxMaps>192</maxMaps>
>       <maxReduces>192</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 200 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <!--
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <aclSubmitApps>*</aclSubmitApps>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       -->
>     </queue>
>   </queue>
>
>   <!-- DEFAULT. 2 at once -->
>   <queue name="default">
>     <maxRunningApps>2</maxRunningApps>
>
>     <maxMaps>40</maxMaps>
>     <maxReduces>40</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>20000 mb, 10 vcores</maxResources>
>
>     <!-- not used. Number of seconds after which the pool can preempt other pools -->
>     <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>     <userMaxAppsDefault>5</userMaxAppsDefault>
>     <aclSubmitApps>*</aclSubmitApps>
>   </queue>
>
> </allocations>
>
> <!-- some other stuff
>
>   <minResources>10000 mb,0vcores</minResources>
>   <maxResources>90000 mb,0vcores</maxResources>
>
>   <minMaps>10</minMaps>
>   <minReduces>5</minReduces>
>
> -->
>
> <!-- enabling
>   * Bringing the queues in effect:
>     Once the required parameters are defined in fair-scheduler.xml file,
>     run the command to bring the changes in effect.
>       yarn rmadmin -refreshQueues
> -->
>
> <!-- verifying
>   Once the command runs properly, verify if the queues are setup using 2 options:
>
>   1) hadoop queue -list
>      or
>   2) Open YARN resourcemanager GUI from Resource Manager GUI:
>      http://<Resouremanager-hostname>:8088, click Scheduler.
> -->
>
> <!-- notes
>   [fail_user@phd11-nn ~]$ id
>   uid=507(fail_user) gid=507(failgroup) groups=507(failgroup)
>   [fail_user@phd11-nn ~]$ hadoop queue -showacls
> -->
>
> <!-- submit
>   To submit an application use the parameter
>     -Dmapred.job.queue.name=<queue-name>
>   or -Dmapred.job.queuename=<queue-name>
> -->
>
>
> *** yarn-site.xml
>
> $ cat yarn-site.xml
>
> ssprague-mbpro:~ spragues$ cat yarn-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>   <!-- Autogenerated yarn params from puppet yaml hash yarn_site_parameters__xml -->
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.local-dirs</name>
>     <value>/storage0/hadoop/yarn/local,/storage1/hadoop/yarn/local,/storage2/hadoop/yarn/local,/storage3/hadoop/yarn/local,/storage4/hadoop/yarn/local,/storage5/hadoop/yarn/local</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair</value>
>   </property>
>   <property>
>     <name>yarn.application.classpath</name>
>     <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$TEZ_HOME/*,$TEZ_HOME/lib/*</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.specification</name>
>     <value>data://removed</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.comments</name>
>     <value>License Type: PRODUCTION Expiration Date (UTC): 2017/02/01 Company Name: Trulia, LLC Cluster Name: trulia-production Number of Nodes: 150 Contact Person Name: Deep Varma Contact Person Email: [email protected]</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.webapp.address</name>
>     <value>FOO.sv2.trulia.com:8188</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.http-cross-origin.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.ttl-enable</name>
>     <value>false</value>
>   </property>
>
>   <!--
>   <property>
>     <name>yarn.timeline-service.store-class</name>
>     <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore</value>
>   </property>
>   -->
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.user-as-default-queue</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.preemption</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.sizebasedweight</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>8192</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>     <value>98.5</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation.retain-seconds</name>
>     <value>604800</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.log-dirs</name>
>     <value>${yarn.log.dir}/userlogs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.remote-app-log-dir</name>
>     <value>/app-logs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.delete.debug-delay-sec</name>
>     <value>600</value>
>   </property>
>   <property>
>     <name>yarn.log.server.url</name>
>     <value>http://FOO.sv2.trulia.com:19888/jobhistory/logs</value>
>   </property>
>
> </configuration>
>
>
> On Wed, Jan 11, 2017 at 2:27 PM, Akash Mishra <[email protected]> wrote:
>
>> Please post your fair-scheduler.xml file and yarn-site.xml
>>
>> On Wed, Jan 11, 2017 at 9:14 PM, Stephen Sprague <[email protected]> wrote:
>>
>>> hey guys,
>>> i'm running the RM with the above options (version 2.6.1) and get an NPE
>>> upon startup.
>>>
>>> {code}
>>> 17/01/11 12:44:45 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
>>> java.lang.NullPointerException
>>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName(SourceFile:204)
>>>         at org.apache.hadoop.service.CompositeService.addService(CompositeService.java:73)
>>>         at org.apache.hadoop.service.CompositeService.addIfService(CompositeService.java:88)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:993)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1214)
>>> 17/01/11 12:44:45 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
>>> {code}
>>>
>>> the fair-scheduler.xml file is fine and works in INFO level logging so
>>> i'm pretty sure there's nothing "wrong" with it. So with DEBUG level its
>>> making this java call and barfing.
>>>
>>> Any ideas how to fix this?
>>>
>>> thanks,
>>> Stephen.
>>
>>
>> --
>> Regards,
>> Akash Mishra.
>>
>> "It's not our abilities that make us, but our decisions." --Albus Dumbledore
>

--
Regards,
Akash Mishra.

"It's not our abilities that make us, but our decisions." --Albus Dumbledore
