Hello! Have you tried profiling the heap, to see what is taking up the heap space?
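For example (a rough sketch only, assuming a HotSpot JVM and that your ignite.sh picks up the JVM_OPTS environment variable; the PID placeholder and the /tmp path are illustrative):

    # quick class histogram of a live Ignite node
    jmap -histo:live <ignite-node-pid> | head -n 40

    # or capture a full dump automatically when the node hits OOM
    export JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ignite-heap.hprof"

The resulting .hprof can then be opened in Eclipse MAT or VisualVM to see which objects dominate the heap.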
Regards,
--
Ilya Kasnacheev

Wed, 6 Feb 2019 at 00:08, Xia Qu <[email protected]>:

> Hi All,
>
> We were trying to use Ignite Map Reduce to accelerate Hive queries on an
> existing HDFS. The changes we made include:
>
> 1. Changed core-site.xml, added
>
>    <property>
>      <name>fs.defaultFS</name>
>      <value>hdfs://hacluster</value>
>    </property>
>
> 2. Changed hive-site.xml, added
>
>    <property>
>      <name>hive.rpc.query.plan</name>
>      <value>true</value>
>    </property>
>
> 3. Changed mapred-site.xml, added
>
>    <property>
>      <name>mapreduce.framework.name</name>
>      <value>ignite</value>
>    </property>
>
>    <property>
>      <name>mapreduce.jobtracker.address</name>
>      <value>localhost:11211</value>
>    </property>
>
> 4. Added ignite-core, ignite-hadoop and ignite-shmem to the Hadoop class path.
>
> 5. Downloaded the In-Memory Hadoop Accelerator 2.6.0 build of Ignite
>    from https://ignite.apache.org/download.cgi
>
> 6. Changed ${ignite_home}/conf/default-config.xml, added
>
>    <property name="communicationSpi">
>      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>        <property name="messageQueueLimit" value="1024"/>
>      </bean>
>    </property>
>
> 7. Changed ${ignite_home}/bin/ignite.sh, enabled G1GC.
>
> 8. Increased both the on-heap and off-heap sizes.
>
> 9. Restarted HiveServer to let it pick up the latest config.
>
> Then we started a beeline session and executed some queries over roughly
> 1 billion records. For a cluster of two nodes, it turned out that:
>
> 1. node1 got
>
>    [04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 9417 milliseconds.
>    [04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 12707 milliseconds.
>    [04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8077 milliseconds.
>    [04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 30785 milliseconds.
>    [04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8936 milliseconds.
>    [04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 14208 milliseconds.
>
>    Failed to execute IGFS ad-hoc thread: GC overhead limit exceeded
>
> 2. after a while, node2 got
>
>    Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=host1/192.69.2.27:47500, rmtPort=47500]
>
> 3. eventually, the terminal running beeline got
>
>    Caused by: java.io.IOException: Did not receive any packets within ping response interval (connection is considered to be half-opened) [lastPingReceiveTime=9223372036854775807, lastPingSendTime=1549397555438, now=1549397562438, timeout=7000, addr=/192.69.2.12:11211]
>
> Any ideas how we could solve this problem?
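As a side note on the node2 message: the 'ackTimeout' it refers to belongs to the discovery SPI. Assuming the default TcpDiscoverySpi is in use, raising it in ${ignite_home}/conf/default-config.xml would look roughly like the sketch below, placed on the IgniteConfiguration bean next to the communicationSpi block already shown above (the 30000 ms value is only illustrative, not a recommendation, and the real fix is still to get GC pauses under control):

    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <!-- how long to wait for a discovery message acknowledgement
             before retrying; default is much lower -->
        <property name="ackTimeout" value="30000"/>
      </bean>
    </property>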
