You are getting the NPE in *org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName*, which is not in the Hadoop codebase. I can see you are using a different scheduler implementation, *com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair*, so *SourceFile:204* (the obfuscated Pepperdata source) is the place to look for more details.
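For context on why it only bites at DEBUG: the trace goes through CompositeService.addService (CompositeService.java:73), which in the 2.6.x line only asks the service for its name inside a debug-log guard. Roughly (paraphrased from the Hadoop source, not copied exactly):

{code}
// org.apache.hadoop.service.CompositeService (Hadoop 2.6.x), approximately:
protected void addService(Service service) {
  if (LOG.isDebugEnabled()) {
    // getName() is only invoked when DEBUG logging is on, which is why the
    // RM starts fine at INFO and fails at DEBUG
    LOG.debug("Adding service " + service.getName());
  }
  synchronized (serviceList) {
    serviceList.add(service);
  }
}
{code}

The NPE itself is thrown inside the obfuscated Pepperdata class at SourceFile:204, so its getName() is presumably dereferencing a name field that has not been set yet when addIfService() registers the scheduler. As a sanity check you could start the RM at DEBUG with the stock org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler and the same allocation file; if that comes up cleanly, the fix belongs on the Pepperdata side.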
My guess is that you need to set some name parameter that only gets read on that DEBUG path.

Thanks,

On Wed, Jan 11, 2017 at 10:59 PM, Stephen Sprague <[email protected]> wrote:

> ok. i would attach but... i think there might be an aversion to
> attachments so i'll paste inline. hopefully its not too confusing.
>
> $ cat fair-scheduler.xml
>
> <?xml version="1.0"?>
>
> <!--
>   This is a sample configuration file for the Fair Scheduler. For details
>   on the options, please refer to the fair scheduler documentation at
>   http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.
>
>   To create your own configuration, copy this file to conf/fair-scheduler.xml
>   and add the following property in mapred-site.xml to point Hadoop to the
>   file, replacing [HADOOP_HOME] with the path to your installation directory:
>   <property>
>     <name>mapred.fairscheduler.allocation.file</name>
>     <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
>   </property>
>
>   Note that all the parameters in the configuration file below are optional,
>   including the parameters inside <pool> and <user> elements. It is only
>   necessary to set the ones you want to differ from the defaults.
> -->
>
> <!-- https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html -->
>
> <allocations>
>
>   <!-- NOTE. ** Preemption IS NOT turn on! ** -->
>
>   <!-- Preemption timeout for jobs below their fair share, in seconds.
>        If a job is below half its fair share for this amount of time, it
>        is allowed to kill tasks from other jobs to go up to its fair share.
>        Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
>   <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
>
>   <!-- Default min share preemption timeout for pools where it is not
>        explicitly configured, in seconds. Requires mapred.fairscheduler.preemption
>        to be set to true in your mapred-site.xml. -->
>   <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
>
>   <!-- Default running job limit pools where it is not explicitly set. -->
>   <queueMaxJobsDefault>20</queueMaxJobsDefault>
>
>   <!-- Default running job limit users where it is not explicitly set. -->
>   <userMaxJobsDefault>10</userMaxJobsDefault>
>
>   <!-- QUEUES:
>        dwr.interactive : 10 at once
>        dwr.batch_sql : 15 at once
>        dwr.batch_hdfs : 5 at once (distcp, sqoop, hfs -put, anything besides 'sql')
>        dwr.qa : 3 at once
>        dwr.truck_lane : 1 at once
>
>        cad.interactive : 5 at once
>        cad.batch : 10 at once
>
>        comms.interactive : 5 at once
>        comms.batch : 3 at once
>
>        default : 2 at once (to discourage its use)
>   -->
>
>   <!-- queue placement -->
>   <queuePlacementPolicy>
>     <rule name="specified" />
>     <rule name="default" />
>   </queuePlacementPolicy>
>
>   <!-- footprint -->
>   <queue name='footprint'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <maxRunningApps>4</maxRunningApps>
>     <aclSubmitApps>*</aclSubmitApps>
>
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
>
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
>
>     <queue name="dev">
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>
>     <queue name="stage">
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>     </queue>
>   </queue>
>
>   <!-- comms -->
>   <queue name='comms'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <queue name="interactive">
>       <maxRunningApps>5</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <queue name="batch">
>       <maxRunningApps>10</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>   </queue>
>
>   <!-- cad -->
>   <queue name='cad'>
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>
>     <queue name="interactive">
>       <maxRunningApps>5</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <queue name="batch">
>       <maxRunningApps>10</maxRunningApps>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>   </queue>
>
>   <!-- dwr -->
>   <queue name="dwr">
>     <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
>     <minMaps>10</minMaps>
>     <minReduces>5</minReduces>
>     <userMaxJobsDefault>50</userMaxJobsDefault>
>
>     <maxMaps>200</maxMaps>
>     <maxReduces>200</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>500000 mb, 175 vcores</maxResources>
>
>     <!-- INTERACTiVE. 5 at once -->
>     <queue name="interactive">
>       <weight>2.0</weight>
>       <maxRunningApps>5</maxRunningApps>
>
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>
>       <!-- per user. but given everything is dwr (for now) its not helpful -->
>       <userMaxAppsDefault>5</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- BATCH. 15 at once -->
>     <queue name="batch_sql">
>       <weight>1.5</weight>
>       <maxRunningApps>15</maxRunningApps>
>
>       <maxMaps>200</maxMaps>
>       <maxReduces>200</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 175 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- sqoop, distcp, hdfs-put type jobs here. 3 at once -->
>     <queue name="batch_hdfs">
>       <weight>1.0</weight>
>       <maxRunningApps>3</maxRunningApps>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       <aclSubmitApps>*</aclSubmitApps>
>     </queue>
>
>     <!-- QA. 3 at once -->
>     <queue name="qa">
>       <weight>1.0</weight>
>       <maxRunningApps>100</maxRunningApps>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <aclSubmitApps>*</aclSubmitApps>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>     </queue>
>
>     <!-- big, unruly jobs -->
>     <queue name="truck_lane">
>       <weight>0.75</weight>
>       <maxRunningApps>1</maxRunningApps>
>       <minMaps>5</minMaps>
>       <minReduces>5</minReduces>
>
>       <!-- lets try without static values and see how the "weight" works -->
>       <maxMaps>192</maxMaps>
>       <maxReduces>192</maxReduces>
>       <minResources>20000 mb, 10 vcores</minResources>
>       <maxResources>500000 mb, 200 vcores</maxResources>
>
>       <!-- not used. Number of seconds after which the pool can preempt other pools -->
>       <!--
>       <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
>       <aclSubmitApps>*</aclSubmitApps>
>       <userMaxAppsDefault>50</userMaxAppsDefault>
>       -->
>     </queue>
>   </queue>
>
>   <!-- DEFAULT. 2 at once -->
>   <queue name="default">
>     <maxRunningApps>2</maxRunningApps>
>
>     <maxMaps>40</maxMaps>
>     <maxReduces>40</maxReduces>
>     <minResources>20000 mb, 10 vcores</minResources>
>     <maxResources>20000 mb, 10 vcores</maxResources>
>
>     <!-- not used. Number of seconds after which the pool can preempt other pools -->
>     <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
>     <userMaxAppsDefault>5</userMaxAppsDefault>
>     <aclSubmitApps>*</aclSubmitApps>
>   </queue>
>
> </allocations>
>
> <!-- some other stuff
>
>   <minResources>10000 mb,0vcores</minResources>
>   <maxResources>90000 mb,0vcores</maxResources>
>
>   <minMaps>10</minMaps>
>   <minReduces>5</minReduces>
>
> -->
>
> <!-- enabling
>   * Bringing the queues in effect:
>     Once the required parameters are defined in fair-scheduler.xml file,
>     run the command to bring the changes in effect.
>       yarn rmadmin -refreshQueues
> -->
>
> <!-- verifying
>   Once the command runs properly, verify if the queues are setup using 2 options:
>
>   1) hadoop queue -list
>      or
>   2) Open YARN resourcemanager GUI from Resource Manager GUI:
>      http://<Resouremanager-hostname>:8088, click Scheduler.
> -->
>
> <!-- notes
>   [fail_user@phd11-nn ~]$ id
>   uid=507(fail_user) gid=507(failgroup) groups=507(failgroup)
>   [fail_user@phd11-nn ~]$ hadoop queue -showacls
> -->
>
> <!-- submit
>   To submit an application use the parameter
>     -Dmapred.job.queue.name=<queue-name>
>   or -Dmapred.job.queuename=<queue-name>
> -->
>
>
> *** yarn-site.xml
>
> $ cat yarn-site.xml
>
> ssprague-mbpro:~ spragues$ cat yarn-site.xml
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>   <!-- Autogenerated yarn params from puppet yaml hash yarn_site_parameters__xml -->
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.local-dirs</name>
>     <value>/storage0/hadoop/yarn/local,/storage1/hadoop/yarn/local,/storage2/hadoop/yarn/local,/storage3/hadoop/yarn/local,/storage4/hadoop/yarn/local,/storage5/hadoop/yarn/local</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair</value>
>   </property>
>   <property>
>     <name>yarn.application.classpath</name>
>     <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$TEZ_HOME/*,$TEZ_HOME/lib/*</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.specification</name>
>     <value>data://removed</value>
>   </property>
>   <property>
>     <name>pepperdata.license.key.comments</name>
>     <value>License Type: PRODUCTION Expiration Date (UTC): 2017/02/01 Company Name: Trulia, LLC Cluster Name: trulia-production Number of Nodes: 150 Contact Person Name: Deep Varma Contact Person Email: [email protected]</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.hostname</name>
>     <value>FOO.sv2.trulia.com</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.webapp.address</name>
>     <value>FOO.sv2.trulia.com:8188</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.http-cross-origin.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.timeline-service.ttl-enable</name>
>     <value>false</value>
>   </property>
>
>   <!--
>   <property>
>     <name>yarn.timeline-service.store-class</name>
>     <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore</value>
>   </property>
>   -->
>   <property>
>     <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.user-as-default-queue</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.preemption</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.fair.sizebasedweight</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>2048</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.maximum-allocation-mb</name>
>     <value>8192</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
>     <value>98.5</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation.retain-seconds</name>
>     <value>604800</value>
>   </property>
>   <property>
>     <name>yarn.log-aggregation-enable</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.log-dirs</name>
>     <value>${yarn.log.dir}/userlogs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.remote-app-log-dir</name>
>     <value>/app-logs</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.delete.debug-delay-sec</name>
>     <value>600</value>
>   </property>
>   <property>
>     <name>yarn.log.server.url</name>
>     <value>http://FOO.sv2.trulia.com:19888/jobhistory/logs</value>
>   </property>
>
> </configuration>
>
>
> On Wed, Jan 11, 2017 at 2:27 PM, Akash Mishra <[email protected]> wrote:
>
>> Please post your fair-scheduler.xml file and yarn-site.xml
>>
>> On Wed, Jan 11, 2017 at 9:14 PM, Stephen Sprague <[email protected]> wrote:
>>
>>> hey guys,
>>> i'm running the RM with the above options (version 2.6.1) and get an NPE
>>> upon startup.
>>>
>>> {code}
>>> 17/01/11 12:44:45 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
>>> java.lang.NullPointerException
>>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName(SourceFile:204)
>>>         at org.apache.hadoop.service.CompositeService.addService(CompositeService.java:73)
>>>         at org.apache.hadoop.service.CompositeService.addIfService(CompositeService.java:88)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:993)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
>>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1214)
>>> 17/01/11 12:44:45 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
>>> {code}
>>>
>>> the fair-scheduler.xml file is fine and works in INFO level logging so
>>> i'm pretty sure there's nothing "wrong" with it. So with DEBUG level its
>>> making this java call and barfing.
>>>
>>> Any ideas how to fix this?
>>>
>>> thanks,
>>> Stephen.
>>
>>
>> --
>> Regards,
>> Akash Mishra.
>>
>> "It's not our abilities that make us, but our decisions." --Albus Dumbledore
>

--
Regards,
Akash Mishra.

"It's not our abilities that make us, but our decisions." --Albus Dumbledore
