[MediaWiki-commits] [Gerrit] operations/puppet[production]: Tune varnishkafka-webrequest parameters

2016-10-18 Thread Elukey (Code Review)
Elukey has submitted this change and it was merged.

Change subject: Tune varnishkafka-webrequest parameters
..


Tune varnishkafka-webrequest parameters

The Analytics team discovered many webrequests missing the end
datetime field, which results in data consistency errors.
Varnishlog was run on various cp hosts with the following
configuration to spot anomalies:

sudo varnishlog -c -n frontend -L 5000 -T 1500 \
  -q 'VSL or (Timestamp:Start and not Timestamp:Resp)' | tee timeouts.txt

The VSL timeout settings (-L and -T) are the same as those used by Varnishkafka.
This query matches any request that is either logged with a VSL timeout
or has a Start timestamp but no Resp one. Two things came up:

1) Many requests carrying the HttpGarbage tag are discarded by Varnish but
still logged. Example:

*   << Request  >> 173757699
-   Begin  req 173757698 rxreq
-   Timestamp  Start: 1476449182.479356 0.00 0.00
-   Timestamp  Req: 1476449182.479356 0.00 0.00
-   BogoHeader Header has ctrl char 0x1f
-   HttpGarbage  "GET%00"
-   ReqAcct  643 0 643 28 0 28
-   End

2) The VSL store overflow error is still present but happens less frequently.
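
A capture such as timeouts.txt can be tallied with a small script to see how
much of it is HttpGarbage and how much carries a VSL record. The following is
a minimal Python sketch and not part of this change. It assumes the varnishlog
layout shown in the example above (a line starting with '*' opens a
transaction, lines starting with '-' carry a tag and its value). The function
names and the second sample transaction are hypothetical.

```python
from collections import Counter


def split_transactions(text):
    """Return one list of tag names per transaction found in varnishlog output."""
    transactions = []
    current = None
    for line in text.splitlines():
        if line.startswith("*"):
            # A '*' line opens a new transaction.
            current = []
            transactions.append(current)
        elif line.startswith("-") and current is not None:
            # Record lines look like "-   Tag  value"; keep only the tag name.
            parts = line.split(None, 2)
            if len(parts) >= 2:
                current.append(parts[1])
    return transactions


def classify(tags):
    """Bucket a transaction by the most relevant tag it contains."""
    if "HttpGarbage" in tags:
        return "http_garbage"
    if "VSL" in tags:
        return "vsl_synthetic"
    return "other"


def summarize(text):
    """Count transactions per bucket."""
    return Counter(classify(tags) for tags in split_transactions(text))


# First transaction: the example from this commit message.
# Second transaction: hypothetical, made up to exercise the VSL bucket.
SAMPLE = """\
*   << Request  >> 173757699
-   Begin  req 173757698 rxreq
-   Timestamp  Start: 1476449182.479356 0.00 0.00
-   Timestamp  Req: 1476449182.479356 0.00 0.00
-   BogoHeader Header has ctrl char 0x1f
-   HttpGarbage  "GET%00"
-   ReqAcct  643 0 643 28 0 28
-   End

*   << Request  >> 1
-   Begin  req 0 rxreq
-   Timestamp  Start: 1476449182.479356 0.00 0.00
-   VSL  timeout
-   End
"""

if __name__ == "__main__":
    for bucket, count in summarize(SAMPLE).items():
        print(bucket, count)
```

To run it on a real capture, pass the contents of timeouts.txt to summarize()
in place of SAMPLE.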

The proposed solution for 1) is to avoid logging any request with
the HttpGarbage tag; for 2), the maximum number of incomplete requests
kept in memory is raised to 1.
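
The filter part of the change amounts to appending one more clause to the
existing VSL query. As a rough illustration only (the manifest hard-codes the
string, and the helper name below is hypothetical), the composition can be
sketched in Python:

```python
# Clauses already present in the 'q' option of webrequest.pp before this change.
EXISTING_CLAUSES = [
    'ReqMethod ne "PURGE"',
    'not Timestamp:Pipe',
    'not ReqHeader:Upgrade ~ "[wW]ebsocket"',
]


def build_query(clauses):
    """Join VSL query clauses with 'and' so that every clause must hold."""
    return " and ".join(clauses)


old_query = build_query(EXISTING_CLAUSES)
# The change adds a clause dropping transactions that carry an HttpGarbage record.
new_query = build_query(EXISTING_CLAUSES + ["not HttpGarbage"])

if __name__ == "__main__":
    print(new_query)
```

The resulting string can be passed to varnishlog with -q on a cp host to check
what the filter keeps before the setting is rolled out.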

Bug: T148412
Change-Id: I68ada5789a848a676989c08590819625740b6bd8
---
M modules/role/manifests/cache/kafka/webrequest.pp
1 file changed, 11 insertions(+), 6 deletions(-)

Approvals:
  Elukey: Looks good to me, approved
  Ottomata: Looks good to me, but someone else must approve
  Ema: Looks good to me, but someone else must approve
  jenkins-bot: Verified



diff --git a/modules/role/manifests/cache/kafka/webrequest.pp b/modules/role/manifests/cache/kafka/webrequest.pp
index 65fcdd7..f541861 100644
--- a/modules/role/manifests/cache/kafka/webrequest.pp
+++ b/modules/role/manifests/cache/kafka/webrequest.pp
@@ -15,10 +15,11 @@
 {
 # Set varnish.arg.q or varnish.arg.m according to Varnish version
 if (hiera('varnish_version4', false)) {
-# Background from T136314:
+# Background task: T136314
+# Background info about the parameters used:
 # 'q':
-# Filter out PURGE requests and Pipe creation traffic.
-# A Varnish log containing Timestamp:Pipe does not carry Timestamp:Resp,
+# 1) Filter out PURGE requests and Pipe creation traffic.
+# 2) A Varnish log containing Timestamp:Pipe does not carry Timestamp:Resp,
 # used by Analytics to bucket data on Hadoop and for data consistency
 # checks. These requests indicate that Varnish tried to establish a pipe
 # channel between the client and the backend, an information that
@@ -30,13 +31,15 @@
 # At the moment these requests get logged incorrectly and with partial
 # data (due to the VSL timeout) so it makes sense to filter them out to
 # remove noise from Analytics data.
+# 3) A request marked with the VSL tag 'HttpGarbage' indicates unparseable
+# HTTP requests, generating spurious Varnish logs.
 # 'T':
 # VLS API timeout is the maximum time that Varnishkafka will wait between
 # "Begin" and "End" timestamps before flushing the available tags to a log.
 # When a timeout occurs most of the times the result is a webrequest log
 # missing values like the end timestamp.
 #
-# Parameters modified during the upload migration:
+# VSL Timeout parameters modified during the upload migration:
 # 'L':
 # Sets the upper limit of incomplete transactions kept before the oldest
 # one is force completed. This setting keeps an upper bound
@@ -44,14 +47,16 @@
 # A change in the -T timeout value has the side effect of keeping more
 # incomplete transactions in memory for each varnishkafka query (in our case
 # it directly corresponds to a varnishkafka instance running).
+# The threshold has been raised to '5000' the first time (which removed
+# the bulk of the timeouts) and to '1' the second time.
 # 'T':
 # Raised the maximum timeout for incomplete records from '700' to '1500'
 # after setting the -L to '5000'. VSL timeouts were masked
 # by VSL store overflow errors.
 $varnish_opts = {
-'q' => 'ReqMethod ne "PURGE" and not Timestamp:Pipe and not ReqHeader:Upgrade ~ "[wW]ebsocket"',
+'q' => 'ReqMethod ne "PURGE" and not Timestamp:Pipe and not ReqHeader:Upgrade ~ "[wW]ebsocket" and not HttpGarbage',
 'T' => '1500',
-'L' => '5000'
+'L' => '1'
 }
 $conf_template = 'varnishkafka/varnishkafka_v4.conf.erb'
 } else {

-- 
To view, visit https://gerrit.wikimedia.org/r/316306
To unsubscribe, visit 

[MediaWiki-commits] [Gerrit] operations/puppet[production]: Tune varnishkafka-webrequest parameters

2016-10-17 Thread Elukey (Code Review)
Elukey has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/316306

Change subject: Tune varnishkafka-webrequest parameters
..

Tune varnishkafka-webrequest parameters

The Analytics team discovered many webrequests missing the end
datetime field, which results in data consistency errors.
Varnishlog was run on various cp hosts with the following
configuration to spot anomalies:

sudo varnishlog -c -n frontend -L 5000 -T 1500 \
  -q 'VSL or (Timestamp:Start and not Timestamp:Resp)' | tee timeouts.txt

The VSL timeout settings (-L and -T) are the same as those used by Varnishkafka.
This query matches any request that is either logged with a VSL timeout
or has a Start timestamp but no Resp one. Two things came up:

1) Many requests carrying the HttpGarbage tag are discarded by Varnish but
still logged.
2) The VSL store overflow error is still present but happens less frequently.

The proposed solution for 1) is to avoid logging any request with
the HttpGarbage tag; for 2), the maximum number of incomplete requests
kept in memory is raised to 1.

Change-Id: I68ada5789a848a676989c08590819625740b6bd8
---
M modules/role/manifests/cache/kafka/webrequest.pp
1 file changed, 11 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/06/316306/1


-- 
To view, visit https://gerrit.wikimedia.org/r/316306
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I68ada5789a848a676989c08590819625740b6bd8
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Elukey 

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org