Sounds fine to me.

-Todd

On Wed, Mar 18, 2020 at 1:18 AM Alexey Serbin <aser...@cloudera.com.invalid>
wrote:

> Thank you all for the feedback!
>
> As Grant mentioned, parsing the configuration files of local NTP servers or
> getting the list of their peers via the appropriate CLI was considered at
> some point in the scope of the 'auto' time source.  However, this option
> didn't look robust enough and no progress was made in that direction.
> Among the concerns were the following corner cases: (a) the configuration
> file might be in a non-default location; (b) certain CLI commands might be
> prohibited by a custom security policy; (c) both ntpd and chronyd might be
> installed, and it would be necessary to consult systemd/chkconfig (which is
> version/platform dependent) to resolve the ambiguity if neither NTP daemon
> is running at Kudu's startup time.
>
> I don't think we evaluated the idea of multiple time sources described by
> Todd, at least as it is articulated in this e-mail thread (i.e. using
> multiple time sources with a fallback behavior between them).  We looked at
> auto-configuring the time source at startup based on the system clock’s NTP
> synchronization status.  The latter was considered not very robust
> because it might (a) mask configuration issues, (b) result in different
> time sources being used within the same Kudu cluster due to transient
> issues, or (c) introduce extra startup delay.  The preferred choice was
> something more deterministic and static: --time_source=auto currently
> means using the built-in NTP client configured with the internal NTP
> server for AWS and GCE instances, and using the system clock synchronized
> by NTP in all other cases.
>
> It's true that the built-in NTP client has numerous TODOs.  We have some
> test coverage that uses chronyd as part of our external mini-cluster test
> harness, but the built-in NTP client is not battle-tested at this point.  I
> agree with Adar that it's not clear how many existing and new Kudu users
> would benefit from switching to the built-in NTP client by default.  From
> that perspective, keeping the default time source as 'system' and allowing
> users to switch to the built-in NTP client is a conservative but very
> reasonable approach.
>
> OK, at this point I can see the following options for the default time
> source:
>
> 1. Keep the default clock source as 'system' and make it possible to switch
> to the built-in NTP client when --time_source=builtin is set explicitly
> (that's already how it is now).
> 2. Switch the default clock source to 'builtin' and mention in the release
> notes that it's a backwards-incompatible change and might require updating
> Kudu's configuration after upgrading to 1.12.
> 3. Switch the default clock source to 'builtin', set its list of NTP
> servers empty by default, and introduce a parser for chronyd/ntpd
> configuration files.  This way upgraded Kudu masters and tablet servers
> would seamlessly switch to the built-in NTP client working with the same
> set of NTP servers as local NTP daemons (assuming they are using either
> chronyd or ntpd on their Kudu nodes).
> 4. Implement a new mode with multiple time sources with a fallback behavior
> between them.  Make the new time source the default one.  This way existing
> users will not need to change anything unless they want to stick with the
> 'system' time source.
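The configuration-file parser from option 3 (which is also Grant's suggestion of reading /etc/chrony.conf or /etc/ntp.conf when --builtin_ntp_servers isn't set) could look roughly like the Python sketch below. It is a hypothetical illustration, not Kudu code: the function names, the fallback server list, the search order, and the directive handling (only `server` and `pool` lines are read; include-style directives are ignored) are all assumptions. It also shows why the corner cases above matter: a config file in a non-default location is silently missed unless its path is passed in.

```python
from typing import Iterable, List, Sequence

# Assumed fallback mirroring the pool.ntp.org default mentioned in the thread;
# the exact list the built-in client uses is not confirmed here.
DEFAULT_SERVERS: List[str] = [f"{i}.pool.ntp.org" for i in range(4)]

# Assumed search order: chrony.conf is preferred over ntp.conf.
DEFAULT_PATHS: Sequence[str] = ("/etc/chrony.conf", "/etc/ntp.conf")

# chronyd treats lines starting with these characters as comments; ntpd uses '#'.
COMMENT_CHARS = "!;#%"

# Both daemons name an upstream source as 'server <host> [options]' or
# 'pool <host> [options]'; all other directives are ignored in this sketch.
SOURCE_DIRECTIVES = ("server", "pool")


def parse_ntp_servers(lines: Iterable[str]) -> List[str]:
    """Return the hosts named by 'server'/'pool' directives, in file order."""
    servers: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in COMMENT_CHARS:
            continue  # blank line or whole-line comment
        fields = line.split("#", 1)[0].split()  # drop trailing '#' comments
        if len(fields) >= 2 and fields[0].lower() in SOURCE_DIRECTIVES:
            servers.append(fields[1])
    return servers


def load_ntp_servers(paths: Sequence[str] = DEFAULT_PATHS) -> List[str]:
    """Use the first readable config file that lists any servers;
    otherwise fall back to DEFAULT_SERVERS."""
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as f:
                servers = parse_ntp_servers(f)
        except OSError:
            continue  # missing or unreadable file: try the next candidate
        if servers:
            return servers
    return list(DEFAULT_SERVERS)
```

The returned list could then be joined with commas and passed as --builtin_ntp_servers, assuming that flag accepts a comma-separated list of servers.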
>
> It's clear that option 2 brings usability issues, so it's not a good one.
> Options 3 and 4 require some extra functionality: it's not too cumbersome,
> but it requires some time to implement and test.  In addition, option 4
> would require re-evaluating whether to allow the time source of a Kudu
> cluster to be inherently dynamic.  Option 1 looks like the safest bet at
> this point.
>
> So, here is the proposal: let's keep 'system' as the default time source
> for 1.12 release.  This automatically removes upgrade-related risks for
> existing Kudu clusters. It's always possible to switch to the built-in NTP
> client with --time_source=builtin, of course.
>
> I'll document options 3 and 4 in upstream JIRAs.  If we see more value for
> Kudu users in making 'builtin' the default time source, we can reconsider
> and move forward with option 3 or 4 (or some other option).
>
> Let me know if you have concerns about keeping 'system' as the default time
> source for Kudu.  Your feedback is appreciated!
>
>
> Kind regards,
>
> Alexey
>
> On Tue, Mar 17, 2020 at 1:30 PM Todd Lipcon <t...@cloudera.com.invalid>
> wrote:
>
> > I seem to recall discussing at one point the idea of setting multiple time
> > sources with a fallback behavior between them. In other words, the default
> > could be the built-in client, but in the case that the built-in client
> > can't contact NTP servers, we would fall back to the system NTP. This
> > switching behavior could be dynamic (e.g., it would switch back when system
> > NTP is down). Has this been considered and ruled out for some reason?
> >
> > -Todd
> >
> > On Mon, Mar 16, 2020 at 9:51 PM Adar Lieber-Dembo
> > <a...@cloudera.com.invalid>
> > wrote:
> >
> > > I share the two concerns you highlighted, Alexey.
> > > 1. This would be a backwards incompatible change.
> > > 2. The default NTP server list could be unreachable, or could be a
> > > poorer choice than whatever the cluster is currently using. Grant's
> > > suggestion could mitigate that somewhat, but it's sort of weird for
> > > Kudu to go rooting around in system configuration files, not to
> > > mention the possibility that we could get it wrong.
> > >
> > > To that I'll add a third concern:
> > > 3. The built-in NTP client has an awful lot of TODOs. Does it work
> > > correctly when NTP servers misbehave? I presume chronyd and ntpd are
> > > battle-tested in this regard.
> > >
> > > Taken together, I'd be hesitant to change the default time source, at
> > > least not without more concrete feedback suggesting that it'd be an
> > > improvement for the vast majority of our user base. Ad hoc switching
> > > is always an option when the system time source doesn't work.
> > >
> > > On Mon, Mar 16, 2020 at 8:12 PM Grant Henke <ghe...@cloudera.com.invalid>
> > > wrote:
> > > >
> > > > >
> > > > > Also, in case of Kudu clusters running without access to the internet,
> > > > > it will be necessary to point the built-in NTP client to some internal
> > > > > NTP servers since pool.ntp.org servers (the default servers for the
> > > > > built-in NTP client) might not be accessible.
> > > > >
> > > >
> > > > I think this was discussed at some point, but I don't remember the
> > > > outcome/answer.
> > > > Would it be possible to load the ntp servers from /etc/ntp.conf
> > > > or /etc/chrony.conf if --builtin_ntp_servers
> > > > aren't specified? We could still fall back to the default pool.ntp.org
> > > > if an ntp configuration isn't found.
> > > > Looking in two files isn't great, but we could give preference to
> > > > chrony.conf.
> > > >
> > > > Thank you,
> > > > Grant
> > > >
> > > >
> > > > On Mon, Mar 16, 2020 at 9:16 PM Alexey Serbin <ale...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'd like to get feedback on the subject, please.
> > > > >
> > > > > The built-in NTP client for Kudu masters and tablet servers was
> > > > > introduced in Kudu 1.11.0.  Back then, there were thoughts of switching
> > > > > to the built-in client by default starting with Kudu 1.12.
> > > > >
> > > > > Since the 1.12 release branch is going to be cut pretty soon, I think
> > > > > it's a good opportunity to clarify whether we want to make that change
> > > > > or keep the time source as is (i.e. 'system') in the 1.12 release.
> > > > >
> > > > > For more context, the built-in NTP client has been used to run external
> > > > > mini-cluster-based test scenarios for every gerrit pre-commit build
> > > > > since the 1.11.0 release.  In addition, I ran two 6-node clusters in a
> > > > > public cloud for a few weeks with a basic write/read workload ('kudu
> > > > > perf loadgen' with the --run_scan option).  So far I've seen no issues
> > > > > there.  As for use in a production environment, at this point I'm not
> > > > > aware of any Kudu clusters running in production with the built-in NTP
> > > > > client.
> > > > >
> > > > > The benefit of the built-in NTP client is that it allows running Kudu
> > > > > without requiring the local machines' clocks to be synchronized by the
> > > > > kernel NTP discipline.  That might benefit newer Kudu installations
> > > > > where machines' clocks are not synchronized out-of-the-box and users are
> > > > > not keen on performing the extra step of deploying NTP servers (and
> > > > > configuring them appropriately if the default configuration is not good
> > > > > enough, e.g., in the case of firewalled internal clusters).
> > > > >
> > > > > If we switch to the 'builtin' time source by default (i.e. use the
> > > > > built-in NTP client), existing installations running with the 'system'
> > > > > time source will need to add an extra flag to stay with the 'system'
> > > > > time source after upgrading to 1.12.  In that regard, the update would
> > > > > not be backwards-compatible, but Kudu users should not care much about
> > > > > the clock source, assuming the built-in NTP client is reliable enough.
> > > > > Also, in the case of Kudu clusters running without access to the
> > > > > internet, it will be necessary to point the built-in NTP client to some
> > > > > internal NTP servers, since the pool.ntp.org servers (the default
> > > > > servers for the built-in NTP client) might not be accessible.
> > > > >
> > > > > So, it seems enabling the built-in NTP client by default could benefit
> > > > > newer installations, but might require extra configuration steps for
> > > > > existing Kudu deployments where pool.ntp.org NTP servers are not
> > > > > accessible.  Those steps should be described in the release notes for
> > > > > the 1.12 release, of course.  Also, there is some risk of hitting a
> > > > > not-yet-detected bug in the built-in NTP client.
> > > > >
> > > > > Do you think the benefits of removing the requirement to have the local
> > > > > clock synchronized by a local NTP server outweigh the drawbacks of
> > > > > adding an extra configuration step during the 1.12 upgrade for Kudu
> > > > > clusters isolated from the Internet?
> > > > >
> > > > > Your feedback is highly appreciated!
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Alexey
> > > > >
> > > > >
> > > > > P.S. I sent the original message one week ago, but it seems it went
> > > > > into spam folders or the like, so I'm re-sending it.
> > > > >
> > > >
> > > >
> > > > --
> > > > Grant Henke
> > > > Software Engineer | Cloudera
> > > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>


-- 
Todd Lipcon
Software Engineer, Cloudera
