Hanisha Koneru created HDDS-3642:
------------------------------------
Summary: Stop/Pause Background services while replacing OM DB with
checkpoint from Leader
Key: HDDS-3642
URL: https://issues.apache.org/jira/browse/HDDS-3642
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Hanisha Koneru
Assignee: Hanisha Koneru
When a follower OM needs to replace its DB with a checkpoint from the Leader (to
catch up on transactions), it should pause or stop the services which read from
or write to the DB.
During OM HA testing, we found that the OM could crash with a fatal JVM error in
RocksDB. This happened because KeyDeletingService was trying to access memory
that had already been freed.
{code:java}
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f19de835af0, pid=1389, tid=1712
#
# JRE version: OpenJDK Runtime Environment (11.0.6+10) (build 11.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (11.0.6+10-LTS, mixed mode, sharing,
tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# C [librocksdbjni10001996641283911793.so+0x1aeaf0]
Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
#
# Core dump will be written. Default location: Core dumps may be processed with
"/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /opt/core.1389)
#
# An error report file with more information is saved as:
# /opt/hs_err_pid1389.log
{code}
From the hs_err log file:
{code:java}
--------------- T H R E A D ---------------
Current thread (0x00000000011a4000): JavaThread "KeyDeletingService#1" daemon
[_thread_in_native, id=1712, stack(0x00007f19d2443000,0x00007f19d2544000)]
Stack: [0x00007f19d2443000,0x00007f19d2544000], sp=0x00007f19d2541e78, free space=1019k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [librocksdbjni10001996641283911793.so+0x1aeaf0] Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
j org.rocksdb.AbstractRocksIterator.seekToFirst()V+26
j org.apache.hadoop.hdds.utils.db.RDBStoreIterator.<init>(Lorg/rocksdb/RocksIterator;)V+13
j org.apache.hadoop.hdds.utils.db.RDBTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+30
j org.apache.hadoop.hdds.utils.db.TypedTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+4
j org.apache.hadoop.ozone.om.OmMetadataManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+8
j org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+5
j org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Lorg/apache/hadoop/hdds/utils/BackgroundTaskResult;+39
j org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Ljava/lang/Object;+1
J 4791 c1 java.util.concurrent.FutureTask.run()V [email protected] (123 bytes) @
0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4802 c1
java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;
[email protected] (14 bytes) @ 0x00007f19f0c87214
[0x00007f19f0c870e0+0x0000000000000134]
J 4791 c1 java.util.concurrent.FutureTask.run()V [email protected] (123 bytes) @
0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4802 c1
java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;
[email protected] (14 bytes) @ 0x00007f19f0c87214
[0x00007f19f0c870e0+0x0000000000000134]
J 4791 c1 java.util.concurrent.FutureTask.run()V [email protected] (123 bytes) @
0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4954 c1
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V
[email protected] (57 bytes) @ 0x00007f19f0cfe10c
[0x00007f19f0cfde40+0x00000000000002cc]
{code}
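One possible way to keep background services away from the DB while it is being replaced is sketched below. This is only an illustration under stated assumptions, not the fix adopted for this issue: FakeDb and DbSwapGuard are hypothetical names that do not exist in Ozone, and FakeDb stands in for a RocksDB-backed store whose native memory is freed on close. Background tasks access the DB under a shared lock, and the checkpoint install takes the exclusive lock, so it waits for in-flight tasks and blocks new ones until the swap is complete.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical stand-in for a DB handle backed by native memory. */
final class FakeDb implements AutoCloseable {
  private final String name;
  private volatile boolean closed;

  FakeDb(String name) {
    this.name = name;
  }

  String read() {
    if (closed) {
      // With a real native handle this would be a SIGSEGV, not an exception.
      throw new IllegalStateException("use after close: " + name);
    }
    return name;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Hypothetical guard that serializes DB replacement against background reads. */
final class DbSwapGuard {
  // Fair mode so a waiting checkpoint install is not starved by readers.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private FakeDb db; // only accessed while holding the lock

  DbSwapGuard(FakeDb initial) {
    this.db = initial;
  }

  /** Background tasks run their DB access under the shared (read) lock. */
  <T> T withDb(Function<FakeDb, T> task) {
    lock.readLock().lock();
    try {
      return task.apply(db);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Checkpoint install: waits for in-flight tasks, then closes and swaps the DB. */
  void replaceWith(FakeDb checkpoint) {
    lock.writeLock().lock();
    try {
      db.close();
      db = checkpoint;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

In this sketch a task such as KeyDeletingService would wrap its iteration in guard.withDb(...), so it can never hold an iterator over a DB that replaceWith(...) has already closed. Explicitly stopping the services before the swap and restarting them afterwards would be an alternative design with the same goal.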