Hi All,
We were trying to use Ignite MapReduce to accelerate Hive queries on an existing
HDFS cluster. The changes we made include:
1. Changed core-site.xml, added
<property>
<name>fs.defaultFS</name>
<value>hdfs://hacluster</value>
</property>
2. Changed hive-site.xml, added
<property>
<name>hive.rpc.query.plan</name>
<value>true</value>
</property>
3. Changed mapred-site.xml, added
<property>
<name>mapreduce.framework.name</name>
<value>ignite</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:11211</value>
</property>
4. Added ignite-core, ignite-hadoop and ignite-shmem to the Hadoop classpath.
5. Downloaded the In-Memory Hadoop Accelerator 2.6.0 build of Ignite from
https://ignite.apache.org/download.cgi
6. Changed ${ignite_home}/conf/default-config.xml, added
<property name="communicationSpi">
<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
<property name="messageQueueLimit" value="1024"/>
</bean>
</property>
7. Changed ${ignite_home}/bin/ignite.sh, enabled G1GC.
8. Increased both the on-heap and off-heap sizes.
9. Restarted HiveServer to let it pick up the latest config.
Then we started beeline and executed some queries over around 1 billion records.
On a cluster of two nodes, this is what happened:
1. node1 got
[04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 9417 milliseconds.
[04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 12707 milliseconds.
[04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 8077 milliseconds.
[04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 30785 milliseconds.
[04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 8936 milliseconds.
[04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 14208 milliseconds.
Failed to execute IGFS ad-hoc thread: GC overhead limit exceeded
2. After a while, node2 got
Timed out waiting for message delivery receipt (most probably, the reason is in
long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout'
configuration property). Will retry to send message with increased timeout
[currentTimeout=10000, rmtAddr=host1/192.69.2.27:47500, rmtPort=47500]
3. Eventually, the terminal running beeline got
Caused by: java.io.IOException: Did not receive any packets within ping
response interval (connection is considered to be half-opened)
[lastPingReceiveTime=9223372036854775807, lastPingSendTime=1549397555438,
now=1549397562438, timeout=7000, addr=/192.69.2.12:11211]
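For reference, the 'ackTimeout' property named in node2's warning appears to
belong to TcpDiscoverySpi (port 47500 is the default discovery port). Below is a
minimal sketch of where it would go in ${ignite_home}/conf/default-config.xml,
assuming no discoverySpi bean is defined there yet. The value 20000 is only a
placeholder, not a tested recommendation:

```xml
<!-- Assumption: this sits inside the IgniteConfiguration bean, next to communicationSpi. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<!-- Placeholder acknowledgement timeout in milliseconds; the right value is unknown. -->
<property name="ackTimeout" value="20000"/>
</bean>
</property>
```

A larger timeout would presumably only hide the long pauses on node1 rather than
remove them.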
Any ideas on how we could solve this problem?