Hi Leon,

First of all, the latest HAWQ uses "hawq_global_rm_type" to select "NONE" mode or "YARN" mode. (This is not the reason for the failure below, though.)

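For illustration, a hawq-site.xml entry using the newer parameter name might look like the fragment below. Only the parameter name comes from the note above; the lowercase value "yarn" mirrors the working config quoted later in this thread and is an assumption, so please check the hawq-site.xml template shipped with your build for the exact accepted values.

```xml
<!-- hawq-site.xml: sketch only; the value casing is an assumption -->
<property>
  <name>hawq_global_rm_type</name>
  <value>yarn</value>
</property>
```
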
The log you attached shows that HAWQ is trying to run in YARN mode and attempted to register itself with the Hadoop YARN ResourceManager, but failed. (If registration had succeeded, the Progress would be 50%, not 0%.)

Please open your yarn-site.xml and check whether the property yarn.resourcemanager.system-metrics-publisher.enabled is true or false.

If yarn.resourcemanager.system-metrics-publisher.enabled is true, HAWQ will fail to register itself with Hadoop YARN, and the progress of the HAWQ application stays at 0% (expected 50%). A null pointer exception then appears in the Hadoop YARN log, just like the one in your log. This is similar to
http://zh.hortonworks.com/community/forums/topic/error-in-handling-event-type-registered-for-applicationattempt/

If yarn.resourcemanager.system-metrics-publisher.enabled is disabled, HAWQ can register itself with YARN successfully. I haven't investigated the root cause and don't know why the null pointer exception happens; I have only traced the failure to this setting.

If this property is not the cause in your environment, something else may be triggering a null pointer exception in YARN.

Thanks!

On Thu, Nov 26, 2015 at 4:46 PM, Leon Zhang <[email protected]> wrote:

> Thanks Daniel
>
> After I switch "hawq_resourcemanager_server_type" to "yarn", I can see
> the application now:
>
> $ yarn application -list
>
>
> Application-Id Application-Name
> Application-Type User Queue State
> Final-State Progress
> Tracking-URL
> application_1447985660182_0558 hawq
> YARN xiaolin default RUNNING
> UNDEFINED 0%
> url
>
> But, my hawq application hang at RUNNING state.
> And the log shows:
>
>
> 2015-11-26 16:40:16,186 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1447985660182_0620_000001
> 2015-11-26 16:40:16,187 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(762)) - appattempt_1447985660182_0620_000001 State change from LAUNCHED_UNMANAGED_SAVING to LAUNCHED
> 2015-11-26 16:40:17,193 INFO ipc.Server (Server.java:saslProcess(1306)) - Auth successful for appattempt_1447985660182_0620_000001 (auth:SIMPLE)
> 2015-11-26 16:40:17,194 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerApplicationMaster(274)) - AM registration appattempt_1447985660182_0620_000001
> 2015-11-26 16:40:17,194 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(127)) - USER=xiaolin IP=10.10.0.11 OPERATION=Register App Master TARGET=ApplicationMasterService RESULT=SUCCESS APPID=application_1447985660182_0620 APPATTEMPTID=appattempt_1447985660182_0620_000001
> 2015-11-26 16:40:17,194 ERROR resourcemanager.ResourceManager (ResourceManager.java:handle(851)) - Error in handling event type REGISTERED for applicationAttempt application_1447985660182_0620
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:143)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1365)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1341)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:849)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:830)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:266)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-11-26 16:40:17,195 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(718)) - application_1447985660182_0620 State change from ACCEPTED to RUNNING
> 2015-11-26 16:40:17,196 ERROR attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(757)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:849)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:830)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:266)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-11-26 16:40:22,197 ERROR attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(757)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: STATUS_UPDATE at LAUNCHED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:849)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:830)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$MultiListenerHandler.handle(AsyncDispatcher.java:266)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
>
>
> Any clue for this issue?
>
> Thanks in advance.
>
>
> On Tue, Nov 24, 2015 at 12:48 AM, Daniel Lynch <[email protected]> wrote:
>
>> here is a working config example from my lab where hawq will execute in
>> yarn
>>
>> $GPHOME/etc/hawq-site.xml
>> <?xml version="1.0" encoding="UTF-8"?>
>> <configuration>
>>
>>   <property>
>>     <name>hawq_resourcemanager_query_noresource_timeout</name>
>>     <value>30</value>
>>   </property>
>>
>>   <property>
>>     <name>hawq_master_address_host</name>
>>     <value>node2</value>
>>     <description>The host name of hawq master.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_master_address_port</name>
>>     <value>2020</value>
>>     <description>The port of hawq master.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_segment_address_port</name>
>>     <value>40000</value>
>>     <description>The port of hawq segment.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_dfs_url</name>
>>     <value>node2:8020/hawq_default</value>
>>     <description>URL for accessing HDFS.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_master_directory</name>
>>     <value>/data/master</value>
>>     <description>The directory of hawq master.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_segment_directory</name>
>>     <value>/data/primary</value>
>>     <description>The directory of hawq segment.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_master_temp_directory</name>
>>     <value>/tmp</value>
>>     <description>The temporary directory reserved for hawq master.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_segment_temp_directory</name>
>>     <value>/tmp</value>
>>     <description>The temporary directory reserved for hawq segment.</description>
>>   </property>
>>
>>   <!-- HAWQ resource manager parameters -->
>>   <property>
>>     <name>hawq_resourcemanager_server_type</name>
>>     <value>yarn</value>
>>     <description>The resource manager type to start for allocating resource.
>>       'none' means hawq resource manager exclusively uses whole
>>       cluster; 'yarn' means hawq resource manager contacts YARN
>>       resource manager to negotiate resource.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_segment_limit_memory_use</name>
>>     <value>64GB</value>
>>     <description>The limit of memory usage in a hawq segment when
>>       hawq_resourcemanager_server_type is set 'none'.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_segment_limit_core_use</name>
>>     <value>16</value>
>>     <description>The limit of virtual core usage in a hawq segment when
>>       hawq_resourcemanager_server_type is set 'none'.
>>     </description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_yarn_resourcemanager_address</name>
>>     <value>node3:8050</value>
>>     <description>The address of YARN resource manager server.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_yarn_resourcemanager_scheduler_address</name>
>>     <value>node3:8030</value>
>>     <description>The address of YARN scheduler server.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_yarn_queue</name>
>>     <value>default</value>
>>     <description>The YARN queue name to register hawq resource manager.</description>
>>   </property>
>>
>>   <property>
>>     <name>hawq_resourcemanager_yarn_application_name</name>
>>     <value>hawq</value>
>>     <description>The application name to register hawq resource manager in YARN.</description>
>>   </property>
>>
>>   <property>
>>     <name>default_segment_num</name>
>>     <value>16</value>
>>   </property>
>>   <property>
>>     <name>hawq_resourcemanager_query_vsegment_number_per_segment_limit</name>
>>     <value>8</value>
>>   </property>
>> </configuration>
>>
>>
>> Daniel Lynch
>> Mon-Fri 9-5 PST
>> Office: 408 780 4498
>>
>> On Sun, Nov 22, 2015 at 9:23 PM, Leon Zhang <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Is there any tutorial about how to deploy latest HAWQ 2.0-beta on
>>> YARN cluster?
>>> I just rebuild the latest code from git, and after "hawq init
>>> cluster", it seems the segments does not work on YARN container. Any help
>>> will be appreciated.
>>>
>>>
>>> Thanks.
>>>
>>
>>
>
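
P.S. For reference, the yarn-site.xml change suggested at the top of this reply would look roughly like the fragment below. It is only a sketch: the property name is the one discussed above, and the YARN ResourceManager presumably needs a restart before the new value takes effect.

```xml
<!-- yarn-site.xml: turn off the system metrics publisher (sketch) -->
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>false</value>
</property>
```
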
