Which version of Kudu do you use? I also encountered this error a few days
ago using Kudu 1.3.0: "Tried to update clock beyond the max. error." After
restarting the cluster, everything went back to normal. I checked dmesg and
asked SRE to check the NTP service; everything looked normal. I still have no
idea what caused that error.

2017-11-01 10:12 GMT+08:00 Franco Venturi <fvent...@comcast.net>:

> A few days ago at work our Kudu servers started having fatal errors and
> shutting down with the following error message:
>
>
>      Couldn't get the current time: Clock unsynchronized. Status: Service unavailable: Error: Clock synchronized but error was too high (10000016 us).
>
>
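The message above describes a clock that the kernel considered synchronized but whose error estimate was above Kudu's limit. Below is a minimal sketch of that kind of check. It makes two assumptions, both inferred from this thread and not taken from Kudu's code: the error value is the kernel's maximum-error estimate in microseconds, and the limit is `max_clock_sync_error_usec` with a 10-second default. `CheckClockError` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// Assumed default for --max_clock_sync_error_usec (10 seconds).
constexpr int64_t kDefaultMaxClockSyncErrorUsec = 10'000'000;

// Hypothetical helper, not Kudu's API: returns an empty string when the
// clock is usable, otherwise a message resembling the log line above.
std::string CheckClockError(bool synchronized, int64_t max_error_usec,
                            int64_t limit_usec = kDefaultMaxClockSyncErrorUsec) {
  if (!synchronized) {
    return "Clock unsynchronized";
  }
  if (max_error_usec > limit_usec) {
    return "Clock synchronized but error was too high (" +
           std::to_string(max_error_usec) + " us)";
  }
  return "";
}
```

With the reported 10,000,016 us, this check fails at a 10-second limit and passes at 20 seconds.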
> After some research in the community forums, I found a post by Todd that
> pointed to this JIRA issue: https://issues.apache.org/jira/browse/KUDU-2079
>
> I then checked our ntpd configuration, and sure enough we had the '-x'
> option in the daemon command line, so I removed that option,
> restarted ntpd, and a few minutes later restarted all the Kudu processes
> (one master and three tablet servers).
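The arithmetic below shows one way a maximum error of about 10 seconds can build up. It is an assumption, not a diagnosis of this cluster. On Linux, if nothing refreshes the kernel's `maxerror`, the kernel adds 500 us to it every second (`MAXFREQ`). Once it passes 16 seconds (`NTP_PHASE_LIMIT`), the kernel caps it and flags the clock as unsynchronized. `SecondsUntilMaxErrorExceeds` is a hypothetical helper.

```cpp
#include <cstdint>

// Assumed Linux behaviour: maxerror grows by 500 us per second unless an
// NTP daemon resets it.
constexpr int64_t kMaxErrorGrowthUsecPerSec = 500;
// Assumed kernel cap on maxerror; past it the clock is flagged unsynchronized.
constexpr int64_t kKernelMaxErrorCapUsec = 16'000'000;

// Hypothetical helper: seconds until an un-refreshed maxerror starting at
// start_usec first exceeds limit_usec. Returns 0 if it already does, and -1
// if the kernel cap keeps it from ever exceeding the limit.
int64_t SecondsUntilMaxErrorExceeds(int64_t start_usec, int64_t limit_usec) {
  if (start_usec > limit_usec) return 0;
  if (limit_usec >= kKernelMaxErrorCapUsec) return -1;
  // Smallest n such that start + n * rate > limit.
  return (limit_usec - start_usec) / kMaxErrorGrowthUsecPerSec + 1;
}
```

Under these assumptions, a maxerror that starts at zero takes 20,001 seconds (about 5.6 hours) to cross the 10-second default.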
> A few minutes later a couple of those Kudu processes were down again, this
> time with this new time sync related error message:
>
>
>      Tried to update clock beyond the max. error.
>
>
> To try to address this new error, I brought down all the Kudu processes,
> stopped ntpd, resynced the time on all the servers with ntpdate, brought
> ntpd back up, waited a bit, and restarted Kudu (master and tablet servers).
> Within a few minutes, a couple of them were down again with the same
> error: 'Tried to update clock beyond the max. error.'
>
>
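The "waited a bit" step above could be made explicit. The idea is to poll the clock until it reports synchronized with an acceptable maximum error before starting Kudu. This is a minimal sketch that assumes a caller-supplied query function (on Linux, this could wrap `ntp_adjtime`). `ClockReading` and `WaitForClockSync` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical snapshot of the kernel clock state.
struct ClockReading {
  bool synchronized;       // false if the kernel reports the clock unsynced
  int64_t max_error_usec;  // kernel's maximum-error estimate
};

// Polls `query` up to `max_attempts` times, sleeping `interval` between
// attempts. Returns true as soon as the clock is synchronized and its
// maximum error is within `limit_usec`, false if that never happens.
bool WaitForClockSync(const std::function<ClockReading()>& query,
                      int64_t limit_usec, int max_attempts,
                      std::chrono::milliseconds interval) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const ClockReading r = query();
    if (r.synchronized && r.max_error_usec <= limit_usec) {
      return true;
    }
    if (attempt + 1 < max_attempts) {
      std::this_thread::sleep_for(interval);
    }
  }
  return false;
}
```

Taking the query as a parameter keeps the loop testable with fake readings. A startup script could then pass a real query and the same limit as `max_clock_sync_error_usec`.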
> In the end, I doubled the flag 'max_clock_sync_error_usec' from 10,000,000
> to 20,000,000 (20 seconds), and everything stayed up (and is still up).
>
>
> Looking at the source code in git, I found the relevant section here
> (source file https://github.com/apache/kudu/blob/master/src/kudu/clock/hybrid_clock.cc):
>
>
>      // we won't update our clock if to_update is more than 'max_clock_sync_error_usec'
>      // into the future as it might have been corrupted or originated from an out-of-sync
>      // server.
>      if ((to_update_physical - now_physical) > FLAGS_max_clock_sync_error_usec) {
>        return Status::InvalidArgument("Tried to update clock beyond the max. error.");
>      }
>
>
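Below is a self-contained reconstruction of the condition quoted above, for experimenting with different limits. It assumes Kudu's hybrid timestamps keep physical microseconds in the upper bits and a 12-bit logical counter in the low bits. `PhysicalMicros`, `MakeHybrid` and `WouldRejectUpdate` are hypothetical names, and Kudu's `Status` is replaced by a plain bool.

```cpp
#include <cstdint>

// Assumed layout: upper bits = physical microseconds, low 12 bits = logical.
constexpr int kLogicalBits = 12;

uint64_t PhysicalMicros(uint64_t hybrid_timestamp) {
  return hybrid_timestamp >> kLogicalBits;
}

// Builds a hybrid timestamp from its two parts (for testing only).
uint64_t MakeHybrid(uint64_t physical_micros, uint64_t logical) {
  return (physical_micros << kLogicalBits) |
         (logical & ((uint64_t{1} << kLogicalBits) - 1));
}

// Mirrors the quoted condition: reject an incoming timestamp whose physical
// part is more than max_error_usec ahead of the local one. Signed arithmetic
// makes a timestamp in the past give a negative delta.
bool WouldRejectUpdate(uint64_t to_update, uint64_t now,
                       int64_t max_error_usec) {
  const int64_t delta = static_cast<int64_t>(PhysicalMicros(to_update)) -
                        static_cast<int64_t>(PhysicalMicros(now));
  return delta > max_error_usec;
}
```

An incoming timestamp 10,000,016 us ahead is rejected with a 10-second limit and accepted with 20 seconds. Note that the quantity compared is the physical part of a received timestamp, which need not equal the NTP offset between the hosts.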
> If I understand this code correctly, it is complaining because, for some
> reason, Kudu is trying to update its clock by more than 10 seconds. However,
> I ran ntptime and several ntpq queries, and I don't see the clocks on the
> servers being off from each other by that much (or even by, say, half a
> second, since they are all synchronized with a stratum 3 NTP server).
>
>
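ntptime prints the kernel's "maximum error" and "estimated error" alongside the offset. The first error message in this thread appears to be about the maximum-error value, which is an upper bound on the local clock's error and not the offset between servers. Below is a minimal Linux-only sketch that reads those fields directly through `ntp_adjtime` with no modes set, which is a read-only query. Treating these fields as what Kudu consults is an assumption, and `KernelClockState`, `ReadKernelClock` and `KernelClockSynchronized` are hypothetical names.

```cpp
#include <sys/timex.h>

#include <cstdint>

// Hypothetical summary of the kernel clock state (Linux-specific).
struct KernelClockState {
  int state;               // ntp_adjtime result: TIME_OK..TIME_ERROR, or -1
  int64_t max_error_usec;  // timex.maxerror
  int64_t est_error_usec;  // timex.esterror
};

KernelClockState ReadKernelClock() {
  struct timex tx = {};
  tx.modes = 0;  // query only; changing the clock would need CAP_SYS_TIME
  KernelClockState s;
  s.state = ntp_adjtime(&tx);
  // On failure tx stays zero-initialized, so both errors read as 0.
  s.max_error_usec = tx.maxerror;
  s.est_error_usec = tx.esterror;
  return s;
}

// TIME_ERROR means the kernel considers the clock unsynchronized.
bool KernelClockSynchronized(const KernelClockState& s) {
  return s.state != -1 && s.state != TIME_ERROR;
}
```

Running this on the master and each tablet server shows how close each host's maximum error is to the `max_clock_sync_error_usec` limit.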
> Has anyone in this group seen anything similar or does anyone have a
> better understanding of what this message means and what could be causing
> it?
>
>
> Thanks,
> Franco
>
