Hi,

I'm trying to run Ignite on AWS EMR as a YARN application, using ZooKeeper
for node discovery. I compiled Ignite with:

```
mvn clean package -DskipTests -Dignite.edition=hadoop -Dhadoop.version=2.7.3
```

I'm using this ignite_yarn.properties:

```
# The number of nodes in the cluster.
IGNITE_NODE_COUNT=3

# The number of CPU Cores for each Apache Ignite node.
IGNITE_RUN_CPU_PER_NODE=1

# The number of Megabytes of RAM for each Apache Ignite node.
IGNITE_MEMORY_PER_NODE=500

IGNITE_PATH=hdfs:///user/hadoop/ignite/apache-ignite-2.3.0-hadoop-2.7.3.zip

IGNITE_XML_CONFIG=hdfs:///user/hadoop/ignite/ignite_conf.xml

# Local path
IGNITE_WORK_DIR=/mnt

# Local path
IGNITE_RELEASES_DIR=/mnt

IGNITE_WORKING_DIR=/mnt
```

and this ignite_conf.xml:

```
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd">
    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="cacheConfiguration">
            <list>
                <!-- Partitioned replicated cache configuration (Atomic mode). -->
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <property name="name" value="default"/>
                    <property name="atomicityMode" value="ATOMIC"/>
                    <property name="backups" value="3"/>
                    <property name="cacheMode" value="PARTITIONED"/>
                </bean>
            </list>
        </property>

        <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.zk.TcpDiscoveryZookeeperIpFinder">
                        <!-- FIXME change to master internal API (as used by YARN), e.g. ip-10-0-0-154.ec2.internal:2181 -->
                        <property name="zkConnectionString" value="ip-10-0-0-173.ec2.internal:2181"/>
                    </bean>
                </property>
            </bean>
        </property>
        <property name="gridLogger">
            <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
                <!-- Default path relative to IGNITE_HOME, assuming IGNITE_HOME is set to the
                     root of the Ignite installation. -->
                <constructor-arg type="java.lang.String" value="config/ignite-log4j2.xml"/>
            </bean>
        </property>
    </bean>
</beans>
```
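
Separately from the launch failure, note that the cache requests `backups=3` while `IGNITE_NODE_COUNT=3`. Assuming Ignite's usual placement rule for partitioned caches (one primary copy plus up to `backups` backup copies, each on a different node), three nodes can hold at most three copies of a partition, so the third backup cannot be placed until a fourth node joins. A minimal sketch of that arithmetic, using a hypothetical `BackupMath` class that is not an Ignite API:

```java
// Hypothetical helper illustrating the arithmetic; not part of Ignite.
final class BackupMath {
    private BackupMath() {
    }

    /**
     * Number of copies of one partition that can exist: one primary plus up to
     * {@code backups} backups, each on a distinct node, so capped at the node count.
     */
    static int effectiveCopies(int backups, int nodeCount) {
        if (backups < 0 || nodeCount < 0) {
            throw new IllegalArgumentException("backups and nodeCount must be non-negative");
        }
        return Math.min(backups + 1, nodeCount);
    }

    public static void main(String[] args) {
        // backups=3 from ignite_conf.xml, IGNITE_NODE_COUNT=3 from ignite_yarn.properties.
        System.out.println(effectiveCopies(3, 3)); // prints 3
        // A one-node topology, e.g. a single manually started node.
        System.out.println(effectiveCopies(3, 1)); // prints 1
    }
}
```

With only one node in the topology, each partition exists exactly once regardless of the `backups` setting.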

Then I launch the YARN job with:

```
IGNITE_YARN_JAR=/mnt/ignite/apache-ignite-2.3.0-src/modules/yarn/target/ignite-yarn-2.3.0.jar
yarn jar ${IGNITE_YARN_JAR} ${IGNITE_YARN_JAR} /mnt/ignite/ignite_yarn.properties
```

The application launches and the ApplicationMaster is writing logs, but each
container runs for only a few seconds, and the ApplicationMaster keeps
requesting new containers. For example, from the ApplicationMaster log:

```

Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
INFO: Launching container: container_1511142795395_0005_01_017079.
17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-230.ec2.internal:8041
Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
INFO: Launching container: container_1511142795395_0005_01_017080.
17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-78.ec2.internal:8041
Nov 20, 2017 3:08:30 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated
INFO: Launching container: container_1511142795395_0005_01_017081.
17/11/20 03:08:30 INFO impl.ContainerManagementProtocolProxy: Opening proxy : ip-10-0-0-193.ec2.internal:8041
Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
INFO: Container completed. Container id: container_1511142795395_0005_01_017080. State: COMPLETE.
Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
INFO: Container completed. Container id: container_1511142795395_0005_01_017081. State: COMPLETE.
Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersCompleted
INFO: Container completed. Container id: container_1511142795395_0005_01_017079. State: COMPLETE.
Nov 20, 2017 3:08:31 AM org.apache.ignite.yarn.ApplicationMaster onContainersAllocated

```
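
To see how fast containers churn over a longer stretch of this log, one option is to count launch and completion lines. If both counts keep growing at the same rate, every container is exiting right after launch. A sketch, assuming the ApplicationMaster lines keep the format shown above; the class `ContainerChurn` is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; the patterns match the ApplicationMaster log lines above.
final class ContainerChurn {
    private static final Pattern LAUNCHED =
            Pattern.compile("Launching container: (container_\\w+)");
    // \\s+ also tolerates a line break after "Container id:" (as in mail-wrapped logs).
    private static final Pattern COMPLETED =
            Pattern.compile("Container id:\\s+(container_\\w+)\\. State: COMPLETE");

    private ContainerChurn() {
    }

    /** Counts non-overlapping matches of the pattern in the log text. */
    static int count(Pattern pattern, String log) {
        Matcher matcher = pattern.matcher(log);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    static int launched(String log) {
        return count(LAUNCHED, log);
    }

    static int completed(String log) {
        return count(COMPLETED, log);
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "INFO: Launching container: container_1511142795395_0005_01_017079.",
                "INFO: Launching container: container_1511142795395_0005_01_017080.",
                "INFO: Container completed. Container id: container_1511142795395_0005_01_017080. State: COMPLETE.",
                "INFO: Container completed. Container id: container_1511142795395_0005_01_017079. State: COMPLETE.");
        System.out.println(launched(log) + " launched, " + completed(log) + " completed"); // prints "2 launched, 2 completed"
    }
}
```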

In the NodeManager logs, the containers appear to fail as soon as they are
launched, because the generated launch_container.sh script is not valid bash:

```
2017-11-20 03:08:47,810 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl (AsyncDispatcher event handler): Container container_1511142795395_0005_01_017281 transitioned from LOCALIZED to RUNNING
2017-11-20 03:08:47,811 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): launchContainer: [bash, /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/default_container_executor.sh]
2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exit code from container container_1511142795395_0005_01_017281 is : 2
2017-11-20 03:08:47,819 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor (ContainersLauncher #4): Exception from container-launch with container ID: container_1511142795395_0005_01_017281 and exit code: 2
ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
/mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
        at org.apache.hadoop.util.Shell.run(Shell.java:479)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception from container-launch.
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Container id: container_1511142795395_0005_01_017281
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exit code: 2
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Exception message: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: `export BASH_FUNC_run_prestart()="() {  su -s /bin/bash $SVC_USER -c "cd $WORKING_DIR && $EXEC_PATH --config '$CONF_DIR' start $DAEMON_FLAGS"'
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4):
2017-11-20 03:08:47,819 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher #4): Stack trace: ExitCodeException exitCode=2: /mnt/yarn/usercache/hadoop/appcache/application_1511142795395_0005/container_1511142795395_0005_01_017281/launch_container.sh: line 4: syntax error near unexpected token `('
```
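
One plausible reading of this error, offered as an unverified assumption rather than a confirmed diagnosis: the Ignite ApplicationMaster may copy its own process environment into each container's launch context, and on these EMR nodes that environment contains an exported bash function. Patched bash builds export a function `f` as a variable named `BASH_FUNC_f()` (other builds use `BASH_FUNC_f%%`), which is not a valid shell variable name, so the generated `export BASH_FUNC_run_prestart()=...` line fails to parse. The sketch below shows the kind of filtering that would drop such entries before they are written into a launch script; the class `EnvFilter` and method `shellSafe` are hypothetical and not part of Ignite or YARN:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Ignite or Hadoop YARN.
final class EnvFilter {
    // A POSIX shell variable name: a letter or underscore, then letters, digits, or underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private EnvFilter() {
    }

    /** Returns a copy of env without entries whose names cannot be used as shell variable names. */
    static Map<String, String> shellSafe(Map<String, String> env) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (VALID_NAME.matcher(entry.getKey()).matches()) {
                safe.put(entry.getKey(), entry.getValue());
            }
        }
        return safe;
    }

    public static void main(String[] args) {
        // List variables in the current environment that would break an "export NAME=..." line.
        for (String name : System.getenv().keySet()) {
            if (!VALID_NAME.matcher(name).matches()) {
                System.out.println("not a valid shell name: " + name);
            }
        }
    }
}
```

Running `main` from the same shell environment the YARN job was submitted from would list any such names; if the assumption above holds, `BASH_FUNC_run_prestart()` should appear among them.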

When I start Ignite manually on the master node, it starts fine and connects
to ZooKeeper, but the topology contains only one node.

Any thoughts on what I might be doing wrong here?

Thanks in advance.

Juan Rodriguez
