Hanisha Koneru created HDDS-3642:
------------------------------------

             Summary: Stop/Pause Background services while replacing OM DB with checkpoint from Leader
                 Key: HDDS-3642
                 URL: https://issues.apache.org/jira/browse/HDDS-3642
             Project: Hadoop Distributed Data Store
          Issue Type: Sub-task
            Reporter: Hanisha Koneru
            Assignee: Hanisha Koneru


When a follower OM needs to replace its DB with a checkpoint from the Leader
(to catch up on transactions), it should pause or stop the services that read
from or write to the DB.
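
A minimal sketch of what "pause" could look like for one background service is
given below. It is an illustration only, not the fix for this issue:
PausableService, runOnce(), pause() and resume() are hypothetical names and
are not existing Ozone classes or methods. The idea is that pause() blocks new
runs and waits for in-flight runs to finish, so the DB can then be swapped
without a task still reading it.

```java
/**
 * Hypothetical wrapper (not an Ozone class) around a periodic background task
 * that can be paused while the DB underneath it is being replaced.
 */
final class PausableService {
  private final Object lock = new Object();
  private final Runnable task;
  private boolean paused;
  private int inFlight;

  PausableService(Runnable task) {
    this.task = task;
  }

  /** Called by the scheduler on every tick; does nothing while paused. */
  void runOnce() {
    synchronized (lock) {
      if (paused) {
        return;
      }
      inFlight++;
    }
    try {
      task.run();
    } finally {
      synchronized (lock) {
        inFlight--;
        lock.notifyAll();
      }
    }
  }

  /** Blocks new runs and waits until runs already in progress have finished. */
  void pause() {
    synchronized (lock) {
      paused = true;
      while (inFlight > 0) {
        try {
          lock.wait();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while pausing", e);
        }
      }
    }
  }

  /** Allows runs again, e.g. after the checkpoint has been installed. */
  void resume() {
    synchronized (lock) {
      paused = false;
    }
  }
}

class PausableServiceDemo {
  public static void main(String[] args) {
    int[] runs = {0};
    PausableService service = new PausableService(() -> runs[0]++);
    service.runOnce();   // runs normally
    service.pause();     // the DB would be replaced after this returns
    service.runOnce();   // skipped because the service is paused
    service.resume();
    service.runOnce();   // runs again against the new DB
    System.out.println("runs = " + runs[0]);
  }
}
```

In this sketch a tick that arrives while paused is dropped rather than queued,
which seems acceptable for a periodic task that will simply run again on the
next interval.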



During OM HA testing, we found that the OM could crash with a fatal JVM error
(SIGSEGV) in RocksDB. This happened because KeyDeletingService was trying to
access memory that had already been freed.
{code:java}
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f19de835af0, pid=1389, tid=1712
#
# JRE version: OpenJDK Runtime Environment (11.0.6+10) (build 11.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM (11.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /opt/core.1389)
#
# An error report file with more information is saved as:
# /opt/hs_err_pid1389.log

{code}
From the hs_err log file:
{code:java}
---------------  T H R E A D  ---------------

Current thread (0x00000000011a4000):  JavaThread "KeyDeletingService#1" daemon [_thread_in_native, id=1712, stack(0x00007f19d2443000,0x00007f19d2544000)]

Stack: [0x00007f19d2443000,0x00007f19d2544000],  sp=0x00007f19d2541e78,  free space=1019k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [librocksdbjni10001996641283911793.so+0x1aeaf0]  Java_org_rocksdb_RocksIterator_seekToFirst0+0x0
j  org.rocksdb.AbstractRocksIterator.seekToFirst()V+26
j  org.apache.hadoop.hdds.utils.db.RDBStoreIterator.<init>(Lorg/rocksdb/RocksIterator;)V+13
j  org.apache.hadoop.hdds.utils.db.RDBTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+30
j  org.apache.hadoop.hdds.utils.db.TypedTable.iterator()Lorg/apache/hadoop/hdds/utils/db/TableIterator;+4
j  org.apache.hadoop.ozone.om.OmMetadataManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+8
j  org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(I)Ljava/util/List;+5
j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Lorg/apache/hadoop/hdds/utils/BackgroundTaskResult;+39
j  org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call()Ljava/lang/Object;+1
J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4802 c1 java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; java.base@11.0.6 (14 bytes) @ 0x00007f19f0c87214 [0x00007f19f0c870e0+0x0000000000000134]
J 4791 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.6 (123 bytes) @ 0x00007f19f0c7b414 [0x00007f19f0c7ad20+0x00000000000006f4]
J 4954 c1 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V java.base@11.0.6 (57 bytes) @ 0x00007f19f0cfe10c [0x00007f19f0cfde40+0x00000000000002cc]

{code}
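
The trace above shows the crash inside RocksIterator.seekToFirst(), i.e. a
reader using a native handle after the DB behind it was gone. One possible way
to rule that out, sketched below under stated assumptions, is to guard the DB
handle with a read/write lock: readers hold the read lock for as long as they
use the DB, and the checkpoint install takes the write lock before closing and
swapping it. FakeDb and DbGate are hypothetical stand-ins for illustration;
they are not RocksDB or Ozone classes, and this is not necessarily how the
issue should be fixed.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a DB handle that becomes unusable once closed. */
final class FakeDb implements AutoCloseable {
  private final String name;
  private volatile boolean closed;

  FakeDb(String name) {
    this.name = name;
  }

  String read() {
    if (closed) {
      // A real native handle could crash the JVM here instead of throwing.
      throw new IllegalStateException("read after close: " + name);
    }
    return name;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Hypothetical guard that serializes DB replacement against readers. */
final class DbGate {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private FakeDb db;

  DbGate(FakeDb initial) {
    this.db = initial;
  }

  /** Background services read under the read lock. */
  String readFromDb() {
    lock.readLock().lock();
    try {
      return db.read();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Waits for active readers, then closes the old DB and swaps in the new one. */
  void installCheckpoint(FakeDb checkpoint) {
    lock.writeLock().lock();
    try {
      db.close();
      db = checkpoint;
    } finally {
      lock.writeLock().unlock();
    }
  }
}

class DbGateDemo {
  public static void main(String[] args) {
    DbGate gate = new DbGate(new FakeDb("old"));
    System.out.println(gate.readFromDb());
    gate.installCheckpoint(new FakeDb("checkpoint"));
    System.out.println(gate.readFromDb());
  }
}
```

With a real iterator, the read lock would have to be held for the whole
lifetime of the iterator, not just for the call that creates it.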


