Alexey Serbin created KUDU-3163:
-----------------------------------

             Summary: Long after restarting kudu-tserver nodes, follower 
replicas continue rejecting scan requests with 'Uninitialized: safe time has 
not yet been initialized' error
                 Key: KUDU-3163
                 URL: https://issues.apache.org/jira/browse/KUDU-3163
             Project: Kudu
          Issue Type: Bug
          Components: tserver
            Reporter: Alexey Serbin
         Attachments: logs.tar.bz2

There was a report of tablet replicas left in a strange state after a rolling 
restart of some sort.  ksck with checksum reported the tablet as healthy, but 
follower replicas kept rejecting scan requests with the {{Uninitialized: safe 
time has not yet been initialized}} error.  The issue seemed to go away after 
forcing a tablet leader re-election.  No new write operations (INSERT, UPDATE, 
DELETE) were issued against the tablet.

As mentioned above, some nodes in the cluster were restarted.  Before the 
restart, the {{\-\-follower_unavailable_considered_failed_sec}} flag was set to 
{{3600}}.

At this time, I don't have a clear picture of what was going on, but I just 
wanted to dump available information. I need to do a root cause analysis to 
produce a clear description and diagnosis for the issue.

The logs are attached: these are tablet server logs filtered to contain only 
the lines attributed to the affected tablet, UUID 
{{c56432b0164e45d98175f26a54d65270}}.  At the time the logs were captured, 
{{hdp025}} hosted the leader replica of the tablet, while {{hdp014}} and 
{{hdp035}} hosted the follower replicas.
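
For anyone who wants to produce a similar per-tablet extract from full tablet 
server logs, below is a minimal sketch.  It is not necessarily how the attached 
logs were produced: it assumes that every relevant log line contains the tablet 
UUID as a plain substring, and the function name and the log file path in the 
usage note are hypothetical.

```python
from typing import Iterable, Iterator

# Tablet UUID taken from this report.
TABLET_ID = "c56432b0164e45d98175f26a54d65270"


def filter_tablet_lines(lines: Iterable[str],
                        tablet_id: str = TABLET_ID) -> Iterator[str]:
    """Yield only the log lines that mention the given tablet UUID.

    Assumption: a line belongs to the tablet if the UUID appears anywhere
    in it; lines that refer to the tablet without its UUID are missed.
    """
    for line in lines:
        if tablet_id in line:
            yield line
```

For example, {{filter_tablet_lines(open("kudu-tserver.INFO"))}} would iterate 
over the matching lines of a (hypothetically named) log file.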



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
