Sounds fine to me. -Todd
On Wed, Mar 18, 2020 at 1:18 AM Alexey Serbin <aser...@cloudera.com.invalid> wrote:
> Thank you all for the feedback!
>
> As Grant mentioned, parsing the configuration files of local NTP servers or getting the list of their peers via an appropriate CLI was considered at some point in the scope of the 'auto' time source. However, this option didn't look robust enough and no progress was made in that direction. Among the concerns were the following corner cases: (a) the configuration file might be in a non-default location, (b) certain CLI commands might be prohibited by a custom security policy, (c) there might be both ntpd and chronyd installed, and it's necessary to consult systemd/chkconfig (which is version/platform dependent) to resolve the ambiguity if neither of the NTP daemons is running during Kudu's startup.
>
> I don't think we evaluated the idea of multiple time sources described by Todd, at least as it is articulated in this e-mail thread (i.e. using multiple time sources with a fallback behavior between them). We looked at auto-configuring the time source at startup based on the system clock's NTP synchronization status. The latter was considered to be not very robust because it might (a) mask configuration issues, (b) result in different time sources within the same Kudu cluster due to transient issues, (c) introduce extra startup delay. The preferred choice was having something more deterministic and static, and --time_source=auto currently means using the built-in NTP client configured with the internal NTP server for AWS and GCE instances and using the system clock synchronized by NTP in all other cases.
>
> It's true that the built-in NTP client has numerous TODOs. We have some test coverage based on chronyd as part of our external mini-cluster test harness, but the built-in NTP client is not battle-tested at this point.
> I agree with Adar that it's not clear how many existing and new Kudu users would benefit from switching to the built-in NTP client by default. From that perspective, keeping the default time source as 'system' and allowing users to switch to the built-in NTP client is a conservative, but very reasonable approach.
>
> OK, at this point I can see the following options for the default time source:
>
> 1. Keep the default clock source as 'system' and make it possible to switch to the built-in NTP client when --time_source=builtin is set explicitly (that's already how it is now).
> 2. Switch the default clock source to 'builtin' and mention in the release notes that it's not a backwards-compatible change and might require updating Kudu's configuration after upgrading to 1.12.
> 3. Switch the default clock source to 'builtin', set its list of NTP servers empty by default, and introduce a parser for chronyd/ntpd configuration files. This way upgraded Kudu masters and tablet servers would seamlessly switch to the built-in NTP client working with the same set of NTP servers as local NTP daemons (assuming they are using either chronyd or ntpd at their Kudu nodes).
> 4. Implement a new mode with multiple time sources with a fallback behavior between them. Make the new time source the default one. This way existing users will not need to change anything unless they want to stick with the 'system' time source.
>
> It's clear that option 2 brings usability issues, so it's not a good one. Options 3 and 4 require some extra functionality: it's not too cumbersome, but it requires some time to implement and test. However, it's necessary to re-evaluate the decision to allow inherent 'dynamicity' of the time source for a Kudu cluster with option 4. Option 1 looks like the safest bet at this point.
>
> So, here is the proposal: let's keep 'system' as the default time source for the 1.12 release.
> This automatically removes upgrade-related risks for existing Kudu clusters. It's always possible to switch to the built-in NTP client with --time_source=builtin, of course.
>
> I'll document options 3 and 4 in upstream JIRAs. If we see more value for Kudu users in making 'builtin' the default time source, we can reconsider and move forward with option 3 or 4 (or some other option).
>
> Let me know if you have concerns about keeping 'system' as the default time source for Kudu. Your feedback is appreciated!
>
>
> Kind regards,
>
> Alexey
>
> On Tue, Mar 17, 2020 at 1:30 PM Todd Lipcon <t...@cloudera.com.invalid> wrote:
>
> > I seem to recall discussing at one point the idea of setting multiple time sources with a fallback behavior between them. In other words, the default could be the built-in client, but in the case that the built-in client can't contact NTP servers, we would fall back to the system NTP. This switching behavior could be dynamic (e.g., it would switch back when system NTP is down). Has this been considered and ruled out for some reason?
> >
> > -Todd
> >
> > On Mon, Mar 16, 2020 at 9:51 PM Adar Lieber-Dembo <a...@cloudera.com.invalid> wrote:
> >
> > > I share the two concerns you highlighted, Alexey.
> > > 1. This would be a backwards incompatible change.
> > > 2. The default NTP server list could be unreachable, or could be a poorer choice than whatever the cluster is currently using. Grant's suggestion could mitigate that somewhat, but it's sort of weird for Kudu to go rooting around in system configuration files, not to mention the possibility that we could get it wrong.
> > >
> > > To that I'll add a third concern:
> > > 3. The built-in NTP client has an awful lot of TODOs. Does it work correctly when NTP servers misbehave? I presume chronyd and ntpd are battle-tested in this regard.
> > >
> > > Taken together, I'd be hesitant to change the default time source, at least not without more concrete feedback suggesting that it'd be an improvement for the vast majority of our user base. Ad hoc switching is always an option when the system time source doesn't work.
> > >
> > > On Mon, Mar 16, 2020 at 8:12 PM Grant Henke <ghe...@cloudera.com.invalid> wrote:
> > >
> > > > > Also, in case of Kudu clusters running without access to the internet, it will be necessary to point the built-in NTP client to some internal NTP servers since pool.ntp.org servers (the default servers for the built-in NTP client) might not be accessible.
> > > >
> > > > I think this was discussed at some point, but I don't remember the outcome/answer.
> > > > Would it be possible to load the ntp servers from /etc/ntp.conf or /etc/chrony.conf if --builtin_ntp_servers aren't specified? We could still fall back to the default pool.ntp.org if an ntp configuration isn't found.
> > > > Looking in two files isn't great, but we could give preference to chrony.conf.
> > > >
> > > > Thank you,
> > > > Grant
> > > >
> > > >
> > > > On Mon, Mar 16, 2020 at 9:16 PM Alexey Serbin <ale...@apache.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'd like to get feedback on the subj, please.
> > > > >
> > > > > The built-in NTP client for Kudu masters and tablet servers was introduced in Kudu 1.11.0. Back then, there were thoughts of switching to the built-in client by default starting with Kudu 1.12.
> > > > >
> > > > > Since it's time for cutting the 1.12 release branch pretty soon, I think it's a good opportunity to clarify whether we want to make that change or we want to keep the time source as is (i.e. 'system') in the 1.12 release.
> > > > >
> > > > > For more context, the built-in NTP client has been used to run external mini-cluster-based test scenarios since the 1.11.0 release for every gerrit pre-commit build. In addition, I ran a 6-node cluster for a few weeks at two clusters in public cloud with a basic write/read workload ('kudu perf loadgen' with the --run_scan option). So far I've seen no issues there. As for the use in a production environment, at this point I'm not aware of any Kudu clusters running in production using the built-in NTP client.
> > > > >
> > > > > The benefit of the internal built-in NTP client is that it allows running Kudu without the requirement of having the local machines' clocks synchronized by the kernel NTP discipline. That might benefit newer Kudu installations where machines' clocks are not synchronized out-of-the-box and users are not keen on performing an extra step of deploying NTP servers (and configuring them appropriately if the default configuration is not good enough, e.g., in case of fire-walled internal clusters).
> > > > >
> > > > > If we switch to the 'builtin' time source by default (i.e. use the built-in NTP client), existing installations running with the 'system' time source will need to add an extra flag if it's desired to stay with the 'system' time source after the upgrade to 1.12. In that regard, the update would not be backwards-compatible, but Kudu users should not care much about the clock source assuming the built-in NTP client is reliable enough.
> > > > > Also, in case of Kudu clusters running without access to the internet, it will be necessary to point the built-in NTP client to some internal NTP servers since pool.ntp.org servers (the default servers for the built-in NTP client) might not be accessible.
> > > > >
> > > > > So, it seems enabling the built-in NTP client by default could benefit newer installations, but might require extra configuration steps for existing Kudu deployments where pool.ntp.org NTP servers are not accessible. The latter step should be described in the release notes for the 1.12 release, of course. Also, there is some risk of hitting a not-yet-detected bug in the built-in NTP client.
> > > > >
> > > > > Do you think the benefits of removing the requirement to have the local clock synchronized by a local NTP server outweigh the drawbacks of adding an extra configuration step during the 1.12 upgrade for Kudu clusters isolated from the Internet?
> > > > >
> > > > > Your feedback is highly appreciated!
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Alexey
> > > > >
> > > > >
> > > > > P.S. I sent the original message one week ago, but it seems it went into the spam box or the like, so I'm re-sending it.
> > > > >
> > > >
> > > >
> > > > --
> > > > Grant Henke
> > > > Software Engineer | Cloudera
> > > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
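
For reference, the lookup order Grant suggested in the thread (explicit --builtin_ntp_servers first, then chrony.conf, then ntp.conf, then the default pool) might look roughly like the sketch below. This is an illustration only and not Kudu code: the function names are hypothetical, the parser only understands plain `server` and `pool` directives, and "pool.ntp.org" is used as a stand-in for the built-in client's default server list. It also does not address the corner cases Alexey listed (non-default file locations, both daemons installed).

```python
from pathlib import Path

# Candidate files in order of preference (chrony.conf first, as suggested
# in the thread). Non-default locations are not handled here.
CONFIG_PATHS = ("/etc/chrony.conf", "/etc/ntp.conf")

# Stand-in for the built-in client's default servers (an assumption).
DEFAULT_SERVERS = ["pool.ntp.org"]


def parse_ntp_servers(text):
    """Return hosts named by 'server' or 'pool' directives in config text."""
    servers = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and whitespace
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("server", "pool"):
            servers.append(fields[1])
    return servers


def discover_ntp_servers(explicit=None, paths=CONFIG_PATHS):
    """Pick NTP servers: explicit list, then config files, then the default."""
    if explicit:
        return list(explicit)
    for path in paths:
        try:
            servers = parse_ntp_servers(Path(path).read_text())
        except (OSError, UnicodeDecodeError):
            continue  # missing or unreadable file: try the next candidate
        if servers:
            return servers
    return list(DEFAULT_SERVERS)
```

Calling `discover_ntp_servers()` with no arguments walks the two default paths; passing `explicit=[...]` models the case where --builtin_ntp_servers is set and the configuration files are never consulted.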