[MediaWiki-commits] [Gerrit] operations/puppet[production]: cron_splay() with first use in cache_upload
BBlack has submitted this change and it was merged.

Change subject: cron_splay() with first use in cache_upload
..

cron_splay() with first use in cache_upload

Change-Id: I995c8e55018bbd6544a55cae744658e972c72726
---
M modules/role/manifests/cache/upload.pp
M modules/varnish/templates/varnish-backend-restart.cron.erb
A modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
3 files changed, 147 insertions(+), 10 deletions(-)

Approvals:
  BBlack: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/modules/role/manifests/cache/upload.pp b/modules/role/manifests/cache/upload.pp
index 908f7ad..ee8707a 100644
--- a/modules/role/manifests/cache/upload.pp
+++ b/modules/role/manifests/cache/upload.pp
@@ -131,15 +131,20 @@
 }
 
 # XXX: temporary, we need this to mitigate T145661
-$rt_parts = split(inline_template("<%= require 'digest/md5'; x = Random.new(Digest::MD5.hexdigest(@fqdn).to_i(16)).rand(1440); hh = x / 60; mm = x % 60; hh.to_s() + ':' + mm.to_s(); %>"), ':')
-$be_restart_h = $rt_parts[0]
-$be_restart_m = $rt_parts[1]
+if $::realm == 'production' {
+$hnodes = hiera('cache::upload::nodes')
+$all_nodes = array_concat($hnodes['eqiad'], $hnodes['esams'], $hnodes['ulsfo'], $hnodes['codfw'])
+$times = cron_splay($all_nodes, 'daily', 'upload-backend-restarts')
+$be_restart_h = $times['hour']
+$be_restart_m = $times['minute']
+$be_restart_d = $times['weekday']
 
-file { '/etc/cron.d/varnish-backend-restart':
-mode=> '0444',
-owner => 'root',
-group => 'root',
-content => template('varnish/varnish-backend-restart.cron.erb'),
-require => File['/usr/local/sbin/varnish-backend-restart'],
+file { '/etc/cron.d/varnish-backend-restart':
+mode=> '0444',
+owner => 'root',
+group => 'root',
+content => template('varnish/varnish-backend-restart.cron.erb'),
+require => File['/usr/local/sbin/varnish-backend-restart'],
+}
 }
 }
diff --git a/modules/varnish/templates/varnish-backend-restart.cron.erb b/modules/varnish/templates/varnish-backend-restart.cron.erb
index bc88893..2844280 100644
--- a/modules/varnish/templates/varnish-backend-restart.cron.erb
+++ b/modules/varnish/templates/varnish-backend-restart.cron.erb
@@ -1 +1 @@
-<%= @be_restart_m %> <%= @be_restart_h %> * * * root /usr/local/sbin/varnish-backend-restart > /dev/null
+<%= @be_restart_m %> <%= @be_restart_h %> * * <%= @be_restart_d %> root /usr/local/sbin/varnish-backend-restart > /dev/null
diff --git a/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb b/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
new file mode 100644
index 000..08fa6ab
--- /dev/null
+++ b/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
@@ -0,0 +1,132 @@
+#
+# cron_splay.rb
+#
+
+require 'digest/md5'
+
+module Puppet::Parser::Functions
+  newfunction(:cron_splay, :type => :rvalue, :doc => <<-EOS
+Given an array of fqdn which a cron is applicable to, and a period arg which is
+one of 'hourly', 'daily', or 'weekly', this sorts the fqdn set with
+per-datacenter interleaving for DC-numbered hosts, splays them to fixed even
+intervals within the total period, and then outputs a set of crontab time
+fields for the fqdn currently being compiled-for.
+
+The idea here is to ensure each host in the set executes the cron once per time
+period, and also ensure the time between hosts is consistent (no edge cases
+much closer than the average) by splaying them as evenly as possible with
+rounding errors. For the case of hosts with numbers indicating the
+datacenter in the first digit, we also maximize the period between any two
+hosts in a given datacenter by interleaving sorted per-DC lists of hosts before
+splaying.
+
+The third and final argument is a static seed which modulates the splayed
+values in two different ways to minimize the effects of multiple cron_splay()
+with the same hostlist and period. It is used to select a determinstically
+random "offset" for the splayed time values (so that the first host doesn't
+always start at 00:00), and is also used to permute the order of the hosts
+within each DC uniquely.
+
+*Examples:*
+
+$times = fqdn_splay($hosts, 'weekly', 'foo-static-seed')
+cron { 'foo':
+minute => $times['minute'],
+hour => $times['hour'],
+weekday => $times['weekday'],
+}
+
+EOS
+  ) do |arguments|
+
+raise(Puppet::ParseError, "cron_splay(): Wrong number of arguments " +
+  "given (#{arguments.size} for 3)") if arguments.size != 3
+
+hosts = arguments[0]
+period = arguments[1]
+seed = arguments[2]
+
+unless hosts.is_a?(Array)
+  raise(Puppet::ParseError, 'cron_splay(): Argument 1 must be an array')
+end
+
+unless period.is_a?(String)
+
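
The diff above is cut off before the splaying logic itself, so the following is only a minimal sketch of the idea the doc string describes: evenly spaced slots across the period, shifted by a seed-derived offset. It is not the committed implementation. The method name splay_sketch, the explicit this_host parameter, and the PERIOD_MINUTES table are assumptions made for illustration; per-datacenter interleaving and the seed-based permutation of hosts are deliberately left out.

```ruby
require 'digest/md5'

# Minutes in each supported period (same values as the case statement
# in the change: 60, 1440, 10080).
PERIOD_MINUTES = { 'hourly' => 60, 'daily' => 1440, 'weekly' => 10080 }.freeze

# Hypothetical helper, not part of the commit: returns crontab fields for
# this_host so that all hosts land on evenly spaced slots in the period.
def splay_sketch(hosts, period, seed, this_host)
  mins = PERIOD_MINUTES.fetch(period) { raise ArgumentError, 'invalid period' }

  ordered = hosts.sort
  index = ordered.index(this_host)
  raise ArgumentError, 'host not in list' if index.nil?

  # Deterministic offset derived from the seed, so that the first host
  # does not always start at 00:00.
  offset = Digest::MD5.hexdigest(seed).to_i(16) % mins

  # Even spacing with rounding: host i gets slot floor(i * mins / n).
  slot = (index * mins.to_f / ordered.size).floor
  t = (offset + slot) % mins

  {
    'weekday' => period == 'weekly' ? t / 1440 : '*',
    'hour'    => period == 'hourly' ? '*' : (t % 1440) / 60,
    'minute'  => t % 60,
  }
end
```

With four hosts and a 'daily' period, consecutive hosts in sorted order end up 360 minutes apart, wherever the seed places the first one.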
[MediaWiki-commits] [Gerrit] operations/puppet[production]: cron_splay() with first use in cache_upload
BBlack has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/311239

Change subject: cron_splay() with first use in cache_upload
..

cron_splay() with first use in cache_upload

Change-Id: I995c8e55018bbd6544a55cae744658e972c72726
---
M modules/role/manifests/cache/upload.pp
M modules/varnish/templates/varnish-backend-restart.cron.erb
A modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
3 files changed, 138 insertions(+), 4 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/39/311239/1

diff --git a/modules/role/manifests/cache/upload.pp b/modules/role/manifests/cache/upload.pp
index 908f7ad..7e828de 100644
--- a/modules/role/manifests/cache/upload.pp
+++ b/modules/role/manifests/cache/upload.pp
@@ -131,9 +131,12 @@
 }
 
 # XXX: temporary, we need this to mitigate T145661
-$rt_parts = split(inline_template("<%= require 'digest/md5'; x = Random.new(Digest::MD5.hexdigest(@fqdn).to_i(16)).rand(1440); hh = x / 60; mm = x % 60; hh.to_s() + ':' + mm.to_s(); %>"), ':')
-$be_restart_h = $rt_parts[0]
-$be_restart_m = $rt_parts[1]
+$hnodes = hiera('cache::upload::nodes')
+$all_nodes = array_concat($hnodes['eqiad'], $hnodes['esams'], $hnodes['ulsfo'], $hnodes['codfw'])
+$times = cron_splay($all_nodes, 'daily', 'upload-backend-restarts')
+$be_restart_h = $times['hour']
+$be_restart_m = $times['minute']
+$be_restart_d = $times['weekday']
 
 file { '/etc/cron.d/varnish-backend-restart':
 mode=> '0444',
diff --git a/modules/varnish/templates/varnish-backend-restart.cron.erb b/modules/varnish/templates/varnish-backend-restart.cron.erb
index bc88893..2844280 100644
--- a/modules/varnish/templates/varnish-backend-restart.cron.erb
+++ b/modules/varnish/templates/varnish-backend-restart.cron.erb
@@ -1 +1 @@
-<%= @be_restart_m %> <%= @be_restart_h %> * * * root /usr/local/sbin/varnish-backend-restart > /dev/null
+<%= @be_restart_m %> <%= @be_restart_h %> * * <%= @be_restart_d %> root /usr/local/sbin/varnish-backend-restart > /dev/null
diff --git a/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb b/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
new file mode 100644
index 000..7dba3e2
--- /dev/null
+++ b/modules/wmflib/lib/puppet/parser/functions/cron_splay.rb
@@ -0,0 +1,131 @@
+#
+# cron_splay.rb
+#
+
+require 'digest/md5'
+
+module Puppet::Parser::Functions
+  newfunction(:cron_splay, :type => :rvalue, :doc => <<-EOS
+Given an array of fqdn which a cron is applicable to, and a period arg which is
+one of 'hourly', 'daily', or 'weekly', this sorts the fqdn set with
+per-datacenter interleaving for DC-numbered hosts, splays them to fixed even
+intervals within the total period, and then outputs a set of crontab time
+fields for the fqdn currently being compiled-for.
+
+The idea here is to ensure each host in the set executes the cron once per time
+period, and also ensure the time between hosts is consistent (no edge cases
+much closer than the average) by splaying them as evenly as possible with
+rounding errors. For the case of hosts with numbers indicating the
+datacenter in the first digit, we also maximize the period between any two
+hosts in a given datacenter by interleaving sorted per-DC lists of hosts before
+splaying.
+
+The third and final argument is a static seed which modulates the splayed
+values in two different ways to minimize the effects of multiple cron_splay()
+with the same hostlist and period. It is used to select a
+determinstically-random "offset" for the splayed time values (so that the first
+host doesn't always start at 00:00), and is also used to permute the order of
+the hosts within each DC uniquely.
+
+*Examples:*
+
+$times = fqdn_splay($hosts, 'weekly', 'foo-static-seed')
+crontab { 'foo':
+minute => $times['minute'],
+hour => $times['hour'],
+weekday => $times['weekday'],
+}
+
+EOS
+  ) do |arguments|
+
+raise(Puppet::ParseError, "cron_splay(): Wrong number of arguments " +
+  "given (#{arguments.size} for 3)") if arguments.size != 3
+
+hosts = arguments[0]
+period = arguments[1]
+seed = arguments[2]
+
+unless hosts.is_a?(Array)
+  raise(Puppet::ParseError, 'cron_splay(): Argument 1 must be an array')
+end
+
+unless period.is_a?(String)
+  raise(Puppet::ParseError, 'cron_splay(): Argument 2 must be an string')
+end
+
+unless seed.is_a?(String)
+  raise(Puppet::ParseError, 'cron_splay(): Argument 3 must be an string')
+end
+
+case period
+  when 'hourly'
+    mins = 60
+  when 'daily'
+    mins = 1440
+  when 'weekly'
+    mins = 10080
+  else
+    raise(Puppet::ParseError, 'cron_splay(): invalid period')
+end
+
+# Avoid this edge case for now. At sufficiently large host counts and
+# small period, randomization is
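
The patch text is truncated here, before the per-datacenter ordering step. As a rough illustration of what the doc string calls "interleaving sorted per-DC lists", the sketch below groups hosts by the first digit in their name and takes them round-robin. The method name interleave_by_dc and the example hostnames are hypothetical, the digit-matching rule is an assumption, and the seed-based permutation within each DC mentioned in the doc string is not reproduced.

```ruby
# Hypothetical helper, not part of the commit: order hosts so that
# neighbours in the list come from different datacenters where possible.
def interleave_by_dc(hosts)
  # Assumption: the first digit in the hostname identifies the datacenter
  # (e.g. "cp1001" -> "1"). Hosts without any digit share the nil group.
  groups = hosts.sort.group_by { |h| h[/\d/] }
  lists = groups.keys.sort_by(&:to_s).map { |k| groups[k] }

  # Round-robin: take the i-th host of every DC list in turn.
  longest = lists.map(&:size).max || 0
  (0...longest).flat_map { |i| lists.map { |l| l[i] }.compact }
end

hosts = %w[cp1001.example.org cp1002.example.org
           cp2001.example.org cp2002.example.org cp3001.example.org]
ordered = interleave_by_dc(hosts)
```

Feeding such an ordered list into an even splay places two hosts of the same datacenter several slots apart instead of next to each other. A plain round-robin is only an approximation when the per-DC lists differ in length.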