Thanks for the tips,

I increased the memory to 28 GB (the Kylin node has 32 GB in total).
Even so, the memory consumption of the Java process (it is the only Java
process on the server) keeps growing until it finally crashes with an
OutOfMemoryError.

This happens in the 4th step, "4 Step Name #: Build Dimension Dictionary
Duration: 0 Seconds", which runs for about 25 minutes before the crash.
Why does this step need that much memory on the Kylin side?
Also, I couldn't find any logs to investigate the issue.
Apart from a GC dump, where else can I find useful information?
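
One option beyond the GC dump would be to have the JVM write a heap dump when it runs out of memory. The lines below are only a sketch of what could be appended to $KYLIN_HOME/bin/setenv.sh, assuming a Java 8 HotSpot JVM (the stack traces below suggest Java 8) and assuming the start script passes KYLIN_JVM_SETTINGS to the JVM unchanged. DIAG_DIR and DIAG_OPTS are hypothetical names, and /tmp/kylin-diag is a placeholder path, not something Kylin defines.

```shell
# Sketch of lines one might append to $KYLIN_HOME/bin/setenv.sh.
# DIAG_DIR is a hypothetical location; choose a disk with room for a dump
# roughly as large as the configured heap.
DIAG_DIR="${DIAG_DIR:-/tmp/kylin-diag}"
mkdir -p "$DIAG_DIR"

# Standard HotSpot (Java 8) diagnostics: heap dump on OOM plus a GC log.
DIAG_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$DIAG_DIR"
DIAG_OPTS="$DIAG_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
DIAG_OPTS="$DIAG_OPTS -Xloggc:$DIAG_DIR/kylin.gc.log"

# Append to whatever KYLIN_JVM_SETTINGS already holds instead of replacing it.
export KYLIN_JVM_SETTINGS="${KYLIN_JVM_SETTINGS:-} $DIAG_OPTS"
echo "$KYLIN_JVM_SETTINGS"
```

The resulting .hprof file can be opened in a heap analyzer such as Eclipse MAT to see which objects fill the heap during the dictionary step. While the job is still running, `jmap -histo <pid>` prints a class histogram of the live process.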


On Wed, Sep 28, 2016 at 4:55 PM, Li Yang <[email protected]> wrote:

> Increase memory in $KYLIN_HOME/bin/setenv.sh
>
> # (if you're deploying KYLIN on a powerful server and want to replace the
> default conservative settings)
> # uncomment the following for it to take effect
> export KYLIN_JVM_SETTINGS=...
> # export KYLIN_JVM_SETTINGS=...
>
> The commented line is a reference.
>
> Cheers
> Yang
>
>
> On Wed, Sep 28, 2016 at 3:06 PM, Ashika Umanga Umagiliya <
> [email protected]> wrote:
>
>> Looks like tomcat crashed after running out of memory.
>> I saw this in "kylin.out" :
>>
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 12727"...
>>
>>
>>
>> Before the crash, the "kylin.log" file shows the following lines.
>> It seems to keep trying to reconnect to ZooKeeper.
>> What is the reason for Kylin to communicate with ZK?
>>
>> I see the line "System free memory less than 100 MB."
>>
>> ---- kylin.log ----
>>
>> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
>> curator.ConnectionState:200 : Connection timed out for connection string
>> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
>> and timeout (15000) / elapsed (28428)
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
>> ConnectionLoss
>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS
>> tate.java:197)
>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt
>> ate.java:87)
>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat
>> orZookeeperClient.java:115)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>> rmBackgroundOperation(CuratorFrameworkImpl.java:806)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>> s$300(CuratorFrameworkImpl.java:62)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>> l(CuratorFrameworkImpl.java:257)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-09-28 06:50:02,495 INFO  
>> [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1279 : Session establishment complete on server
>> hdp-jz5001.hadoop.local/100.78.7.155:2181, sessionid =
>> 0x156d401adb1701a, negotiated timeout = 40000
>> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1019 : Opening
>> socket connection to server hdp-jz5003.hadoop.local/100.78.8.153:2181.
>> Will not attempt to authenticate using SASL (unknown error)
>> 2016-09-28 06:50:02,495 ERROR [Curator-Framework-0]
>> curator.ConnectionState:200 : Connection timed out for connection string
>> (hdp-jz5001.hadoop.local:2181,hdp-jz5002.hadoop.local:2181,hdp-jz5003.hadoop.local:2181)
>> and timeout (15000) / elapsed (28429)
>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode =
>> ConnectionLoss
>> at org.apache.curator.ConnectionState.checkTimeouts(ConnectionS
>> tate.java:197)
>> at org.apache.curator.ConnectionState.getZooKeeper(ConnectionSt
>> ate.java:87)
>> at org.apache.curator.CuratorZookeeperClient.getZooKeeper(Curat
>> orZookeeperClient.java:115)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>> rmBackgroundOperation(CuratorFrameworkImpl.java:806)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyn
>> cForSuspendedConnection(CuratorFrameworkImpl.java:681)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>> s$700(CuratorFrameworkImpl.java:62)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.ret
>> riesExhausted(CuratorFrameworkImpl.java:677)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.check
>> BackgroundRetry(CuratorFrameworkImpl.java:696)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>> rmBackgroundOperation(CuratorFrameworkImpl.java:826)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>> s$300(CuratorFrameworkImpl.java:62)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>> l(CuratorFrameworkImpl.java:257)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-09-28 06:50:02,495 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:864 : Socket
>> connection established to hdp-jz5003.hadoop.local/100.78.8.153:2181,
>> initiating session
>> 2016-09-28 06:50:15,060 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5003.hadoop.local:2181)] zookeeper.ClientCnxn:1140 : Client
>> session timed out, have not heard from server in 12565ms for sessionid
>> 0x356d401ac017143, closing socket connection and attempting reconnect
>> 2016-09-28 06:50:02,495 INFO  [Thread-10-EventThread]
>> state.ConnectionStateManager:228 : State change: RECONNECTED
>> 2016-09-28 06:50:31,040 INFO  
>> [Thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
>> server in 28544ms for sessionid 0x156d401adb1701a, closing socket
>> connection and attempting reconnect
>> 2016-09-28 06:50:31,042 DEBUG [http-bio-7070-exec-7]
>> service.AdminService:89 : Get Kylin Runtime Config
>> 2016-09-28 06:50:31,043 DEBUG [http-bio-7070-exec-1]
>> controller.UserController:64 : authentication.getPrincipal() is
>> org.springframework.security.core.userdetails.User@3b40b2f: Username:
>> ADMIN; Password: [PROTECTED]; Enabled: true; AccountNonExpired: true;
>> credentialsNonExpired: true; AccountNonLocked: true; Granted Authorities:
>> ROLE_ADMIN,ROLE_ANALYST,ROLE_MODELER
>> 2016-09-28 06:50:43,799 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:1019 : Opening
>> socket connection to server hdp-jz5002.hadoop.local/100.78.8.20:2181.
>> Will not attempt to authenticate using SASL (unknown error)
>> 2016-09-28 06:50:43,799 INFO  [Thread-10-EventThread]
>> state.ConnectionStateManager:228 : State change: SUSPENDED
>> 2016-09-28 06:50:59,925 INFO  [BadQueryDetector]
>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>> queries running.
>> 2016-09-28 06:50:59,926 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:864 : Socket
>> connection established to hdp-jz5002.hadoop.local/100.78.8.20:2181,
>> initiating session
>> 2016-09-28 06:51:28,723 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5002.hadoop.local:2181)] zookeeper.ClientCnxn:1140 : Client
>> session timed out, have not heard from server in 28798ms for sessionid
>> 0x356d401ac017143, closing socket connection and attempting reconnect
>> 2016-09-28 06:51:41,129 INFO  
>> [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1142 : Unable to read additional data from server
>> sessionid 0x356d401ac01714a, likely server has closed socket, closing
>> socket connection and attempting reconnect
>> 2016-09-28 06:51:53,474 INFO  
>> [Thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1019 : Opening socket connection to server
>> hdp-jz5003.hadoop.local/100.78.8.153:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 2016-09-28 06:51:12,316 INFO  
>> [pool-8-thread-10-SendThread(hdp-jz5003.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1140 : Client session timed out, have not heard from
>> server in 28517ms for sessionid 0x256d401adbf6f77, closing socket
>> connection and attempting reconnect
>> 2016-09-28 06:54:29,304 INFO  [localhost-startStop-1-SendTh
>> read(hdp-jz5001.hadoop.local:2181)] zookeeper.ClientCnxn:1019 : Opening
>> socket connection to server hdp-jz5001.hadoop.local/100.78.7.155:2181.
>> Will not attempt to authenticate using SASL (unknown error)
>> 2016-09-28 06:52:05,570 INFO  [BadQueryDetector]
>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>> queries running.
>> 2016-09-28 06:56:29,665 ERROR [Curator-Framework-0]
>> imps.CuratorFrameworkImpl:537 : Background operation retry gave up
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.check
>> BackgroundRetry(CuratorFrameworkImpl.java:708)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.perfo
>> rmBackgroundOperation(CuratorFrameworkImpl.java:826)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.backg
>> roundOperationsLoop(CuratorFrameworkImpl.java:792)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl.acces
>> s$300(CuratorFrameworkImpl.java:62)
>> at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.cal
>> l(CuratorFrameworkImpl.java:257)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-09-28 06:57:31,275 INFO  [BadQueryDetector]
>> service.BadQueryDetector:151 : System free memory less than 100 MB. 0
>> queries running.
>> 2016-09-28 06:56:29,665 INFO  
>> [pool-8-thread-10-SendThread(hdp-jz5001.hadoop.local:2181)]
>> zookeeper.ClientCnxn:1019 : Opening socket connection to server
>> hdp-jz5001.hadoop.local/100.78.7.155:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>>
>>
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 12727"...
>>
>>
>


-- 
Umanga
http://jp.linkedin.com/in/umanga
http://umanga.ifreepages.com
