RE: About Ignite Map Reduce OOM problem

Xia Qu Tue, 05 Feb 2019 13:09:11 -0800

Hi All,
We were trying to use Ignite Map Reduce to accelerate Hive queries on an 
existing HDFS cluster. The changes we made include:


  1.  Changed core-site.xml, added
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hacluster</value>
</property>

  2.  Changed hive-site.xml, added
<property>
    <name>hive.rpc.query.plan</name>
    <value>true</value>
</property>

  3.  Changed mapred-site.xml, added
<property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
</property>
<property>
    <name>mapreduce.jobtracker.address</name>
    <value>localhost:11211</value>
</property>

  4.  Added ignite-core, ignite-hadoop and ignite-shmem to the Hadoop classpath.
  5.  Downloaded the In-Memory Hadoop Accelerator 2.6.0 version of Ignite from 
https://ignite.apache.org/download.cgi
  6.  Changed ${ignite_home}/conf/default-config.xml, added
<property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="messageQueueLimit" value="1024"/>
    </bean>
</property>

  7.  Changed ${ignite_home}/bin/ignite.sh, enabled G1GC.
  8.  Increased both the on-heap and off-heap sizes.
  9.  Restarted HiveServer to let it pick up the latest config.
Then we started Beeline and executed some queries over around 1 billion 
records. On a cluster of two nodes, this is what happened:

  1.  node1 got
[04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 9417 milliseconds.
[04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 12707 milliseconds.
[04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 8077 milliseconds.
[04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 30785 milliseconds.
[04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 8936 milliseconds.
[04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM 
pause: 14208 milliseconds.
Failed to execute IGFS ad-hoc thread: GC overhead limit exceeded

  2.  After a while, node2 got
Timed out waiting for message delivery receipt (most probably, the reason is in 
long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' 
configuration property). Will retry to send message with increased timeout 
[currentTimeout=10000, rmtAddr=host1/192.69.2.27:47500, rmtPort=47500]

  3.  Eventually, the terminal running Beeline got
Caused by: java.io.IOException: Did not receive any packets within ping 
response interval (connection is considered to be half-opened) 
[lastPingReceiveTime=9223372036854775807, lastPingSendTime=1549397555438, 
now=1549397562438, timeout=7000, addr=/192.69.2.12:11211]
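
To make the fields in the Beeline error easier to read, here is a small sketch that only does arithmetic on the values printed above. The interpretation of lastPingReceiveTime is an assumption: the value equals Java's Long.MAX_VALUE, which presumably acts as a "nothing received yet" placeholder rather than a real timestamp.

```python
# Values copied verbatim from the Beeline error message above.
last_ping_receive_time = 9223372036854775807
last_ping_send_time = 1549397555438
now = 1549397562438
timeout_ms = 7000

# java.lang.Long.MAX_VALUE is 2**63 - 1.
JAVA_LONG_MAX = 2**63 - 1

# Assumption: Long.MAX_VALUE here means no ping response was ever recorded.
never_received = last_ping_receive_time == JAVA_LONG_MAX

# Time between the last ping sent by the client and the moment it gave up.
waited_ms = now - last_ping_send_time

print("no ping response recorded:", never_received)
print("waited since last ping sent (ms):", waited_ms)
print("wait reached the timeout:", waited_ms >= timeout_ms)
```

If that reading is right, the client gave up exactly when the 7000 ms ping timeout expired without having heard back from the node on port 11211, which fits a server that was stalled rather than a network fault.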
Any ideas how could we solve this problem?
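
As a rough check on how severe the node1 pauses are, the sketch below adds up the pause durations from the warnings quoted above and compares them with the time span they cover. It assumes each warning is written when the pause ends, so the span starts one pause length before the first timestamp. That assumption is not confirmed by the logs, so treat the result as an estimate.

```python
import re
from datetime import datetime, timedelta

# The six node1 warnings quoted above, with the mail line wrapping undone.
LOG = """\
[04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 9417 milliseconds.
[04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 12707 milliseconds.
[04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8077 milliseconds.
[04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 30785 milliseconds.
[04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8936 milliseconds.
[04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 14208 milliseconds.
"""

PATTERN = re.compile(r"\[(\d{2}:\d{2}:\d{2},\d{3})\]\[WARNING\].*?pause: (\d+) milliseconds")

# Parse each line into (timestamp, pause length in ms).
events = [
    (datetime.strptime(ts, "%H:%M:%S,%f"), int(ms))
    for ts, ms in PATTERN.findall(LOG)
]

total_pause_ms = sum(ms for _, ms in events)

# Assumption: a warning is logged at the end of its pause, so the observed
# window begins one pause length before the first warning's timestamp.
first_ts, first_pause_ms = events[0]
last_ts, _ = events[-1]
window_start = first_ts - timedelta(milliseconds=first_pause_ms)
window_ms = (last_ts - window_start) // timedelta(milliseconds=1)

paused_fraction = total_pause_ms / window_ms

print("total paused (ms):", total_pause_ms)
print("window covered (ms):", window_ms)
print("fraction of window paused: {:.1%}".format(paused_fraction))
```

Under that assumption the JVM on node1 was paused for nearly the whole window, which matches the "GC overhead limit exceeded" error that follows the warnings.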
