Дмитрий Павлов Tue, 02 Jul 2019 01:25:59 -0700

 
Hi guys

I've run into strange behaviour in the replication of 2 tablets of a table in
my Kudu cluster.
The table was in UNDER REPLICATED status, so I stopped all activity on the
cluster to make it cold.
But even after 2 hours the table was still UNDER REPLICATED, so I checked the
rows_updated/rows_inserted metrics and found that replication was very slow:
1-2K rows per second. I checked the logs of the 2 tservers where the tablets
were located and found the following errors:
 
W0701 12:10:07.124253    10 kernel_stack_watchdog.cc:198] Thread 141396 stuck at /tmp/apache-kudu-1.9
Kernel stack:
[<ffffffffffffffff>] 0xffffffffffffffff

User stack:
    @     0x7f04971a45d0  (unknown)
    @           0xb4b21c  kudu::consensus::LogCache::EvictSomeUnlocked()
    @           0xb4bec6  kudu::consensus::LogCache::EvictThroughOp()
    @           0xb47d2f  kudu::consensus::PeerMessageQueue::ResponseFromPeer()
    @           0xb490b1  kudu::consensus::PeerMessageQueue::LocalPeerAppendFinished()
    @           0xb4bbcc  kudu::consensus::LogCache::LogCallback()
    @           0xb972d2  kudu::log::Log::AppendThread::HandleGroup()
    @           0xb97c2d  kudu::log::Log::AppendThread::DoWork()
    @          0x1e5bdff  kudu::ThreadPool::DispatchThread()
    @          0x1e51634  kudu::Thread::SuperviseThread()
    @     0x7f049719cdd5  start_thread
    @     0x7f0495473ead  __clone
 
After both tservers were restarted, replication finished immediately.
 
Did anybody else encounter this issue?


Thanks,
Dmitry Pavlov
