Hello!

Have you tried profiling the heap to see where it is being consumed?
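
For example, you could let the JVM write a heap dump when the OutOfMemoryError
hits and open it in a heap analysis tool. A minimal sketch, assuming ignite.sh
picks these up via JVM_OPTS (the dump path is a placeholder):

    # write a heap dump on OOM so you can see what is filling the heap
    JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ignite-heap.hprof"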

Regards,
-- 
Ilya Kasnacheev


Wed, 6 Feb 2019 at 00:08, Xia Qu <[email protected]>:

> Hi All,
>
> We were trying to use Ignite MapReduce to accelerate Hive queries on an
> existing HDFS cluster. The changes we made include:
>
>    1. Changed core-site.xml, added
>
> <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://hacluster</value>
> </property>
>
>    2. Changed hive-site.xml, added
>
> <property>
>     <name>hive.rpc.query.plan</name>
>     <value>true</value>
> </property>
>
>    3. Changed mapred-site.xml, added
>
> <property>
>     <name>mapreduce.framework.name</name>
>     <value>ignite</value>
> </property>
>
> <property>
>     <name>mapreduce.jobtracker.address</name>
>     <value>localhost:11211</value>
> </property>
>
>    4. Added ignite-core, ignite-hadoop and ignite-shmem to the Hadoop
>    classpath (see the sketch after this list).
>    5. Downloaded the In-Memory Hadoop Accelerator 2.6.0 build of Ignite
>    from https://ignite.apache.org/download.cgi
>    6. Changed ${ignite_home}/conf/default-config.xml, added
>
> <property name="communicationSpi">
>     <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
>         <property name="messageQueueLimit" value="1024"/>
>     </bean>
> </property>
>
>    7. Changed ${ignite_home}/bin/ignite.sh to enable G1GC (see the sketch
>    after this list).
>    8. Increased both the on-heap and off-heap sizes (see the sketch after
>    this list).
>    9. Restarted HiveServer so it would pick up the latest config.
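>
> For reference, the general shape of the classpath, G1GC and memory changes
> from steps 4, 7 and 8 is sketched below; the paths and sizes are
> placeholders rather than our exact values.
>
> # step 4: put the Ignite jars on the Hadoop classpath (illustrative path)
> export HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${IGNITE_HOME}/libs/*"
>
> # steps 7-8: JVM options in ignite.sh -- G1GC plus a larger on-heap size
> JVM_OPTS="-Xms8g -Xmx8g -XX:+UseG1GC -server"
>
> For the off-heap side of step 8, a maximum size on the default data region
> in default-config.xml, for example:
>
> <property name="dataStorageConfiguration">
>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>         <property name="defaultDataRegionConfiguration">
>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                 <!-- placeholder: 16 GB off-heap for the default region -->
>                 <property name="maxSize" value="#{16L * 1024 * 1024 * 1024}"/>
>             </bean>
>         </property>
>     </bean>
> </property>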
>
> Then we started beeline and executed some queries over around 1 billion
> records. It turns out that for a cluster of two nodes:
>
>    1. node1 got
>
> [04:14:59,978][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 9417 milliseconds.
> [04:15:12,735][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 12707 milliseconds.
> [04:15:26,561][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8077 milliseconds.
> [04:15:51,697][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 30785 milliseconds.
> [04:16:00,683][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 8936 milliseconds.
> [04:16:14,941][WARNING][jvm-pause-detector-worker][] Possible too long JVM pause: 14208 milliseconds.
>
> Failed to execute IGFS ad-hoc thread: GC overhead limit exceeded
>
>    2. After a while, node2 got
>
> Timed out waiting for message delivery receipt (most probably, the reason
> is in long GC pauses on remote node; consider tuning GC and increasing
> 'ackTimeout' configuration property). Will retry to send message with
> increased timeout [currentTimeout=10000, rmtAddr=host1/192.69.2.27:47500,
> rmtPort=47500]
>
>    3. Eventually, the terminal running beeline got
>
> Caused by: java.io.IOException: Did not receive any packets within ping
> response interval (connection is considered to be half-opened)
> [lastPingReceiveTime=9223372036854775807, lastPingSendTime=1549397555438,
> now=1549397562438, timeout=7000, addr=/192.69.2.12:11211]
>
> Any ideas how we could solve this problem?
>
>
>
