ok. i would attach but... i think there might be an aversion to attachments, so i'll paste inline. hopefully it's not too confusing.
$ cat fair-scheduler.xml
<?xml version="1.0"?>
<!--
  This is a sample configuration file for the Fair Scheduler. For details
  on the options, please refer to the fair scheduler documentation at
  http://hadoop.apache.org/core/docs/r0.21.0/fair_scheduler.html.

  To create your own configuration, copy this file to conf/fair-scheduler.xml
  and add the following property in mapred-site.xml to point Hadoop to the
  file, replacing [HADOOP_HOME] with the path to your installation directory:

  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>[HADOOP_HOME]/conf/fair-scheduler.xml</value>
  </property>

  Note that all the parameters in the configuration file below are optional,
  including the parameters inside <pool> and <user> elements. It is only
  necessary to set the ones you want to differ from the defaults.
-->
<!-- https://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html -->
<allocations>

  <!-- NOTE: ** Preemption IS NOT turned on! ** -->

  <!-- Preemption timeout for jobs below their fair share, in seconds.
       If a job is below half its fair share for this amount of time, it is
       allowed to kill tasks from other jobs to go up to its fair share.
       Requires mapred.fairscheduler.preemption to be true in mapred-site.xml. -->
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>

  <!-- Default min share preemption timeout for pools where it is not
       explicitly configured, in seconds. Requires
       mapred.fairscheduler.preemption to be set to true in your
       mapred-site.xml. -->
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>

  <!-- Default running job limit for pools where it is not explicitly set. -->
  <queueMaxJobsDefault>20</queueMaxJobsDefault>

  <!-- Default running job limit for users where it is not explicitly set. -->
  <userMaxJobsDefault>10</userMaxJobsDefault>

  <!-- QUEUES:
       dwr.interactive   : 10 at once
       dwr.batch_sql     : 15 at once
       dwr.batch_hdfs    :  5 at once (distcp, sqoop, hdfs -put, anything besides 'sql')
       dwr.qa            :  3 at once
       dwr.truck_lane    :  1 at once
       cad.interactive   :  5 at once
       cad.batch         : 10 at once
       comms.interactive :  5 at once
       comms.batch       :  3 at once
       default           :  2 at once (to discourage its use)
  -->

  <!-- queue placement -->
  <queuePlacementPolicy>
    <rule name="specified" />
    <rule name="default" />
  </queuePlacementPolicy>

  <!-- footprint -->
  <queue name="footprint">
    <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
    <maxRunningApps>4</maxRunningApps>
    <aclSubmitApps>*</aclSubmitApps>
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <userMaxJobsDefault>50</userMaxJobsDefault>
    <maxMaps>200</maxMaps>
    <maxReduces>200</maxReduces>
    <minResources>20000 mb, 10 vcores</minResources>
    <maxResources>500000 mb, 175 vcores</maxResources>
    <queue name="dev">
      <maxMaps>200</maxMaps>
      <maxReduces>200</maxReduces>
      <minResources>20000 mb, 10 vcores</minResources>
      <maxResources>500000 mb, 175 vcores</maxResources>
    </queue>
    <queue name="stage">
      <maxMaps>200</maxMaps>
      <maxReduces>200</maxReduces>
      <minResources>20000 mb, 10 vcores</minResources>
      <maxResources>500000 mb, 175 vcores</maxResources>
    </queue>
  </queue>

  <!-- comms -->
  <queue name="comms">
    <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
    <queue name="interactive">
      <maxRunningApps>5</maxRunningApps>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>
    <queue name="batch">
      <maxRunningApps>10</maxRunningApps>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>
  </queue>

  <!-- cad -->
  <queue name="cad">
    <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
    <queue name="interactive">
      <maxRunningApps>5</maxRunningApps>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>
    <queue name="batch">
      <maxRunningApps>10</maxRunningApps>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>
  </queue>

  <!-- dwr -->
  <queue name="dwr">
    <schedulingPolicy>fair</schedulingPolicy> <!-- can be fifo too -->
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <userMaxJobsDefault>50</userMaxJobsDefault>
    <maxMaps>200</maxMaps>
    <maxReduces>200</maxReduces>
    <minResources>20000 mb, 10 vcores</minResources>
    <maxResources>500000 mb, 175 vcores</maxResources>

    <!-- INTERACTIVE. 5 at once -->
    <queue name="interactive">
      <weight>2.0</weight>
      <maxRunningApps>5</maxRunningApps>
      <maxMaps>200</maxMaps>
      <maxReduces>200</maxReduces>
      <minResources>20000 mb, 10 vcores</minResources>
      <maxResources>500000 mb, 175 vcores</maxResources>
      <!-- not used. Number of seconds after which the pool can preempt other pools -->
      <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
      <!-- per user. but given everything is dwr (for now) it's not helpful -->
      <userMaxAppsDefault>5</userMaxAppsDefault>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>

    <!-- BATCH. 15 at once -->
    <queue name="batch_sql">
      <weight>1.5</weight>
      <maxRunningApps>15</maxRunningApps>
      <maxMaps>200</maxMaps>
      <maxReduces>200</maxReduces>
      <minResources>20000 mb, 10 vcores</minResources>
      <maxResources>500000 mb, 175 vcores</maxResources>
      <!-- not used. Number of seconds after which the pool can preempt other pools -->
      <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
      <userMaxAppsDefault>50</userMaxAppsDefault>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>

    <!-- sqoop, distcp, hdfs-put type jobs here. 3 at once -->
    <queue name="batch_hdfs">
      <weight>1.0</weight>
      <maxRunningApps>3</maxRunningApps>
      <!-- not used. Number of seconds after which the pool can preempt other pools -->
      <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
      <userMaxAppsDefault>50</userMaxAppsDefault>
      <aclSubmitApps>*</aclSubmitApps>
    </queue>

    <!-- QA. 3 at once -->
    <queue name="qa">
      <weight>1.0</weight>
      <maxRunningApps>100</maxRunningApps>
      <!-- not used. Number of seconds after which the pool can preempt other pools -->
      <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
      <aclSubmitApps>*</aclSubmitApps>
      <userMaxAppsDefault>50</userMaxAppsDefault>
    </queue>

    <!-- big, unruly jobs -->
    <queue name="truck_lane">
      <weight>0.75</weight>
      <maxRunningApps>1</maxRunningApps>
      <minMaps>5</minMaps>
      <minReduces>5</minReduces>
      <!-- let's try without static values and see how the "weight" works -->
      <maxMaps>192</maxMaps>
      <maxReduces>192</maxReduces>
      <minResources>20000 mb, 10 vcores</minResources>
      <maxResources>500000 mb, 200 vcores</maxResources>
      <!-- not used. Number of seconds after which the pool can preempt other pools -->
      <!--
      <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
      <aclSubmitApps>*</aclSubmitApps>
      <userMaxAppsDefault>50</userMaxAppsDefault>
      -->
    </queue>
  </queue>

  <!-- DEFAULT. 2 at once -->
  <queue name="default">
    <maxRunningApps>2</maxRunningApps>
    <maxMaps>40</maxMaps>
    <maxReduces>40</maxReduces>
    <minResources>20000 mb, 10 vcores</minResources>
    <maxResources>20000 mb, 10 vcores</maxResources>
    <!-- not used. Number of seconds after which the pool can preempt other pools -->
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    <userMaxAppsDefault>5</userMaxAppsDefault>
    <aclSubmitApps>*</aclSubmitApps>
  </queue>

</allocations>

<!-- some other stuff
  <minResources>10000 mb, 0 vcores</minResources>
  <maxResources>90000 mb, 0 vcores</maxResources>
  <minMaps>10</minMaps>
  <minReduces>5</minReduces>
-->

<!-- enabling
  * Bringing the queues into effect: once the required parameters are defined
    in the fair-scheduler.xml file, run this command to apply the changes:

      yarn rmadmin -refreshQueues
-->

<!-- verifying
  Once the command runs properly, verify that the queues are set up using
  either of 2 options:
    1) hadoop queue -list
    2) Open the YARN ResourceManager GUI at
       http://<Resourcemanager-hostname>:8088 and click Scheduler.
-->

<!-- notes
  [fail_user@phd11-nn ~]$ id
  uid=507(fail_user) gid=507(failgroup) groups=507(failgroup)
  [fail_user@phd11-nn ~]$ hadoop queue -showacls
-->

<!-- submit
  To submit an application, use the parameter
    -Dmapred.job.queue.name=<queue-name>
  or
    -Dmapred.job.queuename=<queue-name>
-->
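fwiw, submitting into one of these leaf queues looks roughly like this (hypothetical jar and paths, just for illustration; on YARN/MR2 the property is spelled mapreduce.job.queuename -- the mapred.* names in the comment above are the older MR1 spellings):

  # hypothetical example: pin a job to the dwr.interactive leaf queue
  # defined in fair-scheduler.xml above (fair scheduler resolves it
  # under root., so root.dwr.interactive should work too)
  hadoop jar hadoop-mapreduce-examples.jar wordcount \
      -Dmapreduce.job.queuename=root.dwr.interactive \
      /user/foo/input /user/foo/output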
*** yarn-site.xml

$ cat yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Autogenerated yarn params from puppet yaml hash yarn_site_parameters__xml -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>FOO.sv2.trulia.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/storage0/hadoop/yarn/local,/storage1/hadoop/yarn/local,/storage2/hadoop/yarn/local,/storage3/hadoop/yarn/local,/storage4/hadoop/yarn/local,/storage5/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>com.pepperdata.supervisor.scheduler.PepperdataSupervisorYarnFair</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$TEZ_HOME/*,$TEZ_HOME/lib/*</value>
  </property>
  <property>
    <name>pepperdata.license.key.specification</name>
    <value>data://removed</value>
  </property>
  <property>
    <name>pepperdata.license.key.comments</name>
    <value>License Type: PRODUCTION
      Expiration Date (UTC): 2017/02/01
      Company Name: Trulia, LLC
      Cluster Name: trulia-production
      Number of Nodes: 150
      Contact Person Name: Deep Varma
      Contact Person Email: [email protected]
    </value>
  </property>
  <property>
    <name>yarn.timeline-service.hostname</name>
    <value>FOO.sv2.trulia.com</value>
  </property>
  <property>
    <name>yarn.timeline-service.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.webapp.address</name>
    <value>FOO.sv2.trulia.com:8188</value>
  </property>
  <property>
    <name>yarn.timeline-service.http-cross-origin.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.timeline-service.ttl-enable</name>
    <value>false</value>
  </property>
  <!--
  <property>
    <name>yarn.timeline-service.store-class</name>
    <value>org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore</value>
  </property>
  -->
  <property>
    <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.user-as-default-queue</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.sizebasedweight</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>98.5</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>600</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://FOO.sv2.trulia.com:19888/jobhistory/logs</value>
  </property>
</configuration>
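and one more way to sanity-check what the RM actually loaded, besides the GUI and hadoop queue -list mentioned above: the scheduler REST endpoint returns the same queue hierarchy. a minimal sketch, assuming the RM hostname from yarn-site.xml:

  # dump the live queue hierarchy as JSON straight from the ResourceManager
  curl -s 'http://FOO.sv2.trulia.com:8088/ws/v1/cluster/scheduler'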
On Wed, Jan 11, 2017 at 2:27 PM, Akash Mishra <[email protected]> wrote:

> Please post your fair-scheduler.xml file and yarn-site.xml
>
> On Wed, Jan 11, 2017 at 9:14 PM, Stephen Sprague <[email protected]> wrote:
>
>> hey guys,
>> i'm running the RM with the above options (version 2.6.1) and get an NPE
>> upon startup.
>>
>> {code}
>> 17/01/11 12:44:45 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
>> java.lang.NullPointerException
>>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.a.getName(SourceFile:204)
>>         at org.apache.hadoop.service.CompositeService.addService(CompositeService.java:73)
>>         at org.apache.hadoop.service.CompositeService.addIfService(CompositeService.java:88)
>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:490)
>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:993)
>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:255)
>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1214)
>> 17/01/11 12:44:45 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
>> {code}
>>
>> the fair-scheduler.xml file is fine and works with INFO-level logging, so
>> i'm pretty sure there's nothing "wrong" with it. With DEBUG level it's
>> making this java call and barfing.
>>
>> Any ideas how to fix this?
>>
>> thanks,
>> Stephen.
>>
>
> --
>
> Regards,
> Akash Mishra.
>
> "It's not our abilities that make us, but our decisions." --Albus Dumbledore
