Alexey Serbin created KUDU-3163:
-----------------------------------

             Summary: Long after restarting kudu-tserver nodes, follower replicas continue rejecting scan requests with 'Uninitialized: safe time has not yet been initialized' error
                 Key: KUDU-3163
                 URL: https://issues.apache.org/jira/browse/KUDU-3163
             Project: Kudu
          Issue Type: Bug
          Components: tserver
            Reporter: Alexey Serbin
         Attachments: logs.tar.bz2

A strange state of tablet replicas was reported after some sort of rolling restart. ksck with checksum reported the tablet as fine, yet the follower replicas kept rejecting scan requests with the {{Uninitialized: safe time has not yet been initialized}} error. The issue appeared to go away after forcing a tablet leader re-election.

No new write operations (INSERT, UPDATE, DELETE) were issued against the tablet. As already mentioned, some nodes in the cluster were restarted; before the restart, the {{\-\-follower_unavailable_considered_failed_sec}} flag was set to {{3600}}.

At this time I don't have a clear picture of what was going on; I just want to dump the available information. A root cause analysis is still needed to produce a clear description and diagnosis of the issue.

The logs are attached: these are filtered tablet server logs containing only the lines attributed to the affected tablet (UUID {{c56432b0164e45d98175f26a54d65270}}). At the time the logs were captured, {{hdp025}} hosted the leader replica of the tablet, while {{hdp014}} and {{hdp035}} hosted the follower replicas.