[ https://issues.apache.org/jira/browse/HDDS-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee updated HDDS-3559:
--------------------------------------
Target Version/s: 0.7.0 (was: 0.6.0)
> Datanode doesn't handle java heap OutOfMemory exception
> --------------------------------------------------------
>
> Key: HDDS-3559
> URL: https://issues.apache.org/jira/browse/HDDS-3559
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Li Cheng
> Priority: Major
> Labels: Triaged, pull-request-available
>
> 2020-05-05 15:47:41,568 [Datanode State Machine Thread - 167] WARN org.apache.hadoop.ozone.container.common.statemachine.EndpointStateMachine: Unable to communicate to SCM server at host-10-51-87-181:9861 for past 0 seconds.
> java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:118)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.sendHeartbeat(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:148)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:145)
>         at org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask.call(HeartbeatEndpointTask.java:76)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.getReturnMessage(ProtobufRpcEngine.java:293)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:270)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy38.submitRequest(Unknown Source)
>         at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.submitRequest(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:116)
>
> On a cluster, one datanode stops reporting to SCM and is left in an unknown state, even though the datanode process is still running. The log shows a Java heap OutOfMemoryError thrown while serializing the protobuf RPC message. However, the datanode silently stops sending heartbeats to SCM and the process becomes stale.
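>
> One possible direction, sketched below only as an illustration and not as the actual HDDS-3559 patch: inspect the cause chain of exceptions caught on the heartbeat path and halt the JVM when an OutOfMemoryError is wrapped inside, so SCM sees a dead datanode instead of a silently stale one. The class and method names here are hypothetical; running the datanode JVM with -XX:+ExitOnOutOfMemoryError would be an alternative mitigation at the JVM level.
>
>   // Hypothetical sketch (not the actual fix): escalate a wrapped
>   // java.lang.OutOfMemoryError to a process halt instead of letting the
>   // heartbeat retry loop log and swallow it.
>   public final class OomEscalator {
>
>     private OomEscalator() {
>     }
>
>     /** Returns true if the throwable or any of its causes is an OutOfMemoryError. */
>     static boolean wrapsOom(Throwable t) {
>       for (Throwable cur = t; cur != null; cur = cur.getCause()) {
>         if (cur instanceof OutOfMemoryError) {
>           return true;
>         }
>       }
>       return false;
>     }
>
>     /** Intended to be called from the catch block that currently only logs the heartbeat failure. */
>     public static void haltIfOom(Throwable t) {
>       if (wrapsOom(t)) {
>         // Avoid allocation-heavy cleanup here; halt immediately so the
>         // failure surfaces as a dead datanode rather than a stale one.
>         Runtime.getRuntime().halt(1);
>       }
>     }
>   }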