I also see the error messages below in AppMaster.stderr after 10 to 15 minutes.

19/01/04 07:30:31 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1
19/01/04 07:30:31 INFO callback.ContainerRequestListener: Got container status for containerID=container_e08_1545053754589_0089_01_000002, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch.
Container id: container_e08_1545053754589_0089_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:944)
        at org.apache.hadoop.util.Shell.run(Shell.java:848)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1

19/01/04 07:30:31 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e08_1545053754589_0089_01_000002
19/01/04 07:30:31 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e08_1545053754589_0089_01_000002
19/01/04 07:30:31 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e08_1545053754589_0089_01_000002 due to: Unable.
java.lang.IllegalStateException: Unable.
        at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:204)
        at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:296)
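
The trace above is only the NodeManager's launch failure; the container's own stdout/stderr is not part of AppMaster.stderr. Below is a minimal sketch of how the failed container IDs could be pulled out of the AppMaster log and turned into `yarn logs` commands. It assumes YARN log aggregation is enabled on the cluster. The helper names are hypothetical and not part of Metron, and some Hadoop 2.x releases may additionally require `-nodeAddress` alongside `-containerId`.

```python
import re

# Matches the status line that ContainerRequestListener logs on completion.
# Whitespace is matched loosely because mail clients wrap the log lines.
STATUS_RE = re.compile(
    r"containerID=(container_\S+?),\s*state=(\w+),\s*exitStatus=(-?\d+)"
)


def failed_containers(log_text):
    """Return (container_id, exit_status) pairs with a non-zero exit status."""
    return [
        (cid, int(status))
        for cid, _state, status in STATUS_RE.findall(log_text)
        if int(status) != 0
    ]


def application_id(container_id):
    """Derive the YARN application ID from a container ID."""
    parts = container_id.split("_")
    # Layout: container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<container>
    if parts[1].startswith("e"):
        parts = parts[:1] + parts[2:]
    return "application_{}_{}".format(parts[1], parts[2])


def yarn_logs_command(container_id):
    """Build (but do not run) the command that fetches the container's logs."""
    return "yarn logs -applicationId {} -containerId {}".format(
        application_id(container_id), container_id
    )


# Excerpt of the status line from the log above, wrapped as in the mail.
sample = (
    "19/01/04 07:30:31 INFO callback.ContainerRequestListener: Got container status \n"
    "for containerID=container_e08_1545053754589_0089_01_000002, state=COMPLETE, \n"
    "exitStatus=1, diagnostics=Exception from container-launch.\n"
)

for cid, status in failed_containers(sample):
    print(status, yarn_logs_command(cid))
```

The script only prints the command, so it can be reviewed before being run on a cluster node as the user that owns the application.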




From: Anil Donthireddy
Sent: Friday, January 4, 2019 6:08 PM
To: [email protected]
Cc: Christopher Berry <[email protected]>; Satish Abburi 
<[email protected]>
Subject: MAAS: Unable to deploy model to MAAS

Hi,

I am trying to deploy the sample model to MAAS. I executed the steps below, but 
in the end I am unable to deploy the model. I have shared the logs I see for 
each step, but I can't find any issue in them. I request that someone with 
expertise in MAAS go through the logs and help me troubleshoot and resolve 
the issue.

To start the MAAS service: $METRON_HOME/bin/maas_service.sh -zq <ZookeeperNode>:2181

After the above command, I can see the application in the YARN ResourceManager, 
and the latest message in the logs is "19/01/04 07:17:36 INFO 
service.ApplicationMaster: Ready to accept requests.."

Later I ran the command below to deploy my model:
$METRON_HOME/bin/maas_deploy.sh -zq <ZookeeperNode>:2181 -lmp /home/metron/mock_dga -hmp /user/metron/models -mo ADD -m 512 -n dga -v 1.0 -ni 1

Below are the logs I see in AppMaster.stderr:
***************Deploy model logs start*******

19/01/04 07:20:23 INFO service.ApplicationMaster: [ADD]: Received request for model dga:1.0x1 containers of size 512M at path /user/metron/models
19/01/04 07:20:25 INFO impl.AMRMClientImpl: Received new token for : <YarnNode>:45454
19/01/04 07:20:25 INFO callback.ContainerRequestListener: Got response from RM for container ask, allocatedCnt=1
19/01/04 07:20:25 INFO service.ApplicationMaster: Found container id of 8796093022210
19/01/04 07:20:25 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e08_1545053754589_0089_01_000002, containerNode=<YarnNode>:45454, containerNodeURI=<YarnNode>:8042, containerResourceMemory=1024, containerResourceVirtualCores=1
19/01/04 07:20:25 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO callback.LaunchContainer: Local Directory Contents
19/01/04 07:20:25 INFO callback.LaunchContainer:   6 - tmp
19/01/04 07:20:25 INFO callback.LaunchContainer:   74 - container_tokens
19/01/04 07:20:25 INFO callback.LaunchContainer:   12 - .container_tokens.crc
19/01/04 07:20:25 INFO callback.LaunchContainer:   3930 - launch_container.sh
19/01/04 07:20:25 INFO callback.LaunchContainer:   40 - .launch_container.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer:   655 - default_container_executor_session.sh
19/01/04 07:20:25 INFO callback.LaunchContainer:   16 - .default_container_executor_session.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer:   709 - default_container_executor.sh
19/01/04 07:20:25 INFO callback.LaunchContainer:   16 - .default_container_executor.sh.crc
19/01/04 07:20:25 INFO callback.LaunchContainer:   20508144 - AppMaster.jar
19/01/04 07:20:25 INFO callback.LaunchContainer: Localizing /user/metron/models
19/01/04 07:20:25 INFO callback.LaunchContainer: Model payload: /user/metron/models
19/01/04 07:20:25 INFO callback.LaunchContainer: AppJAR Location: hdfs://<HDFSNODE>:8020/user/metron/MaaS/application_1545053754589_0089/AppMaster.jar
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga/dga.py; isDirectory=false; length=754; replication=3; blocksize=134217728; modification_time=1544799906052; access_time=1544799905761; owner=metron; group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh -> LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga/rest.sh; isDirectory=false; length=27; replication=3; blocksize=134217728; modification_time=1544799906093; access_time=1544799906058; owner=metron; group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/dga.py; isDirectory=false; length=754; replication=3; blocksize=134217728; modification_time=1546604423149; access_time=1546604422374; owner=metron; group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh -> LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/rest.sh; isDirectory=false; length=27; replication=3; blocksize=134217728; modification_time=1544797916505; access_time=1544797916294; owner=metron; group=metron; permission=rwxr-xr-x; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: Localized run.sh -> LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/run.sh; isDirectory=false; length=26; replication=3; blocksize=134217728; modification_time=1546604423282; access_time=1546604423214; owner=metron; group=metron; permission=rw-r--r--; isSymlink=false}
19/01/04 07:20:25 INFO callback.LaunchContainer: AppMaster.jar localized: scheme: "hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/MaaS/application_1545053754589_0089/AppMaster.jar"
19/01/04 07:20:25 INFO callback.LaunchContainer: run.sh localized: scheme: "hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/run.sh"
19/01/04 07:20:25 INFO callback.LaunchContainer: dga.py localized: scheme: "hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/dga.py"
19/01/04 07:20:25 INFO callback.LaunchContainer: rest.sh localized: scheme: "hdfs" host: "<HDFSNODE>" port: 8020 file: "/user/metron/models/rest.sh"
19/01/04 07:20:25 INFO callback.LaunchContainer: Executing container command: {{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci 8796093022210 -zq <ZookeeperNode>:2181 -zr /metron/maas/config -s run.sh -n dga -hn <YarnNode> -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
19/01/04 07:20:25 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO impl.ContainerManagementProtocolProxy: Opening proxy : <YarnNode>:45454
19/01/04 07:20:25 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e08_1545053754589_0089_01_000002
19/01/04 07:20:25 INFO impl.ContainerManagementProtocolProxy: Opening proxy : <YarnNode>:45454
                ****************************************
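
The "Localized ... -> LocatedFileStatus{...}" entries in the deploy log carry the HDFS permission of every file shipped to the container. Below is a minimal sketch, with hypothetical helper names, that lists the localized files whose owner execute bit is not set. It assumes plain nine-character permission strings as they appear in this log, and the sample entries are abbreviated from the log above. Whether a missing execute bit is related to the failure here is not established; this is only one thing that could be checked.

```python
import re

# One "Localized <name> -> LocatedFileStatus{...}" entry per localized file.
# re.S lets ".*?" cross the line breaks that mail clients add to long entries.
LOCALIZED_RE = re.compile(
    r"Localized (\S+) ->\s*LocatedFileStatus\{path=([^;]+);.*?permission=([rwx-]{9})",
    re.S,
)


def localized_permissions(log_text):
    """Return (hdfs_path, permission) for every localized file in the log."""
    return [
        (path.strip(), perm)
        for _name, path, perm in LOCALIZED_RE.findall(log_text)
    ]


def not_owner_executable(entries):
    """Keep entries whose owner execute bit (third character) is not 'x'."""
    return [(path, perm) for path, perm in entries if perm[2] != "x"]


# Two entries from the deploy log, with most fields dropped for brevity.
sample = (
    "19/01/04 07:20:25 INFO callback.LaunchContainer: Localized rest.sh -> \n"
    "LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/rest.sh; \n"
    "isDirectory=false; length=27; permission=rwxr-xr-x; isSymlink=false}\n"
    "19/01/04 07:20:25 INFO callback.LaunchContainer: Localized run.sh -> \n"
    "LocatedFileStatus{path=hdfs://<HDFSNODE>:8020/user/metron/models/run.sh; \n"
    "isDirectory=false; length=26; permission=rw-r--r--; isSymlink=false}\n"
)

entries = localized_permissions(sample)
for path, perm in not_owner_executable(entries):
    print(perm, path)
```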
Now when I run the command below to list the deployed/registered models, I do 
not get a proper result:
$METRON_HOME/bin/maas_deploy.sh -zq <ZookeeperNode>:2181 -mo LIST
*********************Log in AppMaster.stderr I see ****************

19/01/04 07:21:57 ERROR service.ApplicationMaster: Received a null request...

************************************************************
********************In console I see below*****************
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.2.3.el7.x86_64
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:user.name=metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/metron
19/01/04 07:25:08 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=<ZookeeperNode>:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@7cd2d3b6
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Opening socket connection to server <ZookeeperNode>/<ZookeeperNode>:2181. Will not attempt to authenticate using SASL (unknown error)
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Socket connection established to <ZookeeperNode>/<ZookeeperNode>:2181, initiating session
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: Session establishment complete on server <ZookeeperNode>/<ZookeeperNode>:2181, sessionid = 0x167b7a3ae3cc404, negotiated timeout = 60000
19/01/04 07:25:09 INFO state.ConnectionStateManager: State change: CONNECTED
19/01/04 07:25:09 INFO zookeeper.ZooKeeper: Session: 0x167b7a3ae3cc404 closed
19/01/04 07:25:09 INFO zookeeper.ClientCnxn: EventThread shut down
**************************************

I cannot see any error messages that explain why the model fails to register 
in MAAS. Please look at the logs and help me troubleshoot further to resolve 
the issue.

Thanks,
Anil.
