[MediaWiki-commits] [Gerrit] Imported Upstream version 0.36+git1acdff3 - change (operations...carbon-c-relay)

2014-12-18 Thread Filippo Giunchedi (Code Review)
Filippo Giunchedi has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/180756

Change subject: Imported Upstream version 0.36+git1acdff3
..

Imported Upstream version 0.36+git1acdff3

Change-Id: Ia850be88f3e7ffc8dcd23f1b13f04f14996e28e3
---
M README.md
A contrib/relay.conf
A contrib/relay.init
A contrib/relay.logrotate
A contrib/relay.monit
A contrib/relay.spec
A contrib/relay.sysconfig
M dispatcher.c
M dispatcher.h
M receptor.c
M receptor.h
M relay.c
M router.c
13 files changed, 442 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/debs/carbon-c-relay refs/changes/56/180756/1

diff --git a/README.md b/README.md
index 4d670e5..bc612b9 100644
--- a/README.md
+++ b/README.md
@@ -28,8 +28,22 @@
 right destination(s).  The route file supports two main constructs:
 clusters and matches.  The first define groups of hosts data metrics can
 be sent to, the latter define which metrics should be sent to which
-cluster.  Aggregation rules are seen as matches.  The syntax in this
-file is as follows:
+cluster.  Aggregation rules are seen as matches.
+
+For every metric received by the relay, cleansing is performed.  The
+following changes are performed before any match, aggregate or rewrite
+rule sees the metric:
+
+  - double dot elimination (necessary for correctly functioning
+consistent hash routing)
+  - trailing/leading dot elimination
+  - whitespace normalisation (this mostly affects output of the relay
+to other targets: metric, value and timestamp will be separated by
+a single space only, ever)
+  - irregular char replacement with underscores (\_), currently
+irregular is defined as not being in [0-9a-zA-Z-_:#].
+
+The route file syntax is as follows:
 
 ```
 # comments are allowed in any place and start with a hash (#)
@@ -124,9 +138,11 @@
 final, as no new entries are allowed to be added any more.  On top of an
 aggregation multiple aggregations can be computed.  They can be of the
 same or different aggregation types, but should write to a unique new
-metric.  Produced metrics are sent to the relay as if they were
-submitted from the outside, hence match and aggregation rules apply to
-those.  Care should be taken that loops are avoided.  Also, since
+metric.  The metric names can include back references like in rewrite
+expressions, allowing for powerful single aggregation rules that yield
+in many aggregations.  Produced metrics are sent to the relay as if they
+were submitted from the outside, hence match and aggregation rules apply
+to those.  Care should be taken that loops are avoided.  Also, since
 aggregations appear as matches without `stop` keyword, their positioning
 matters in the same way ordering of match statements.
 
@@ -313,8 +329,8 @@
 e.g. for each hostname encountered.  A typical aggregation looks like:
 
 aggregate
-sys.dc1.somehost-[0-9]+.somecluster.mysql.replication_delay
-sys.dc2.somehost-[0-9]+.somecluster.mysql.replication_delay
+^sys\.dc1\.somehost-[0-9]+\.somecluster\.mysql\.replication_delay
+^sys\.dc2\.somehost-[0-9]+\.somecluster\.mysql\.replication_delay
 every 10 seconds
 expire after 35 seconds
 compute sum write to
@@ -350,6 +366,31 @@
 carbon-c-relay instance, such that it is easy to forward the produced
 metrics to another relay instance is a good practice.
 
+The previous example could also be written as follows to be more
+dynamic:
+
+aggregate
+^sys\.dc[0-9].(somehost-[0-9]+)\.([^.]+)\.mysql\.replication_delay
+every 10 seconds
+expire after 35 seconds
+compute sum write to
+mysql.host.\1.replication_delay
+compute sum write to
+mysql.host.all.replication_delay
+compute sum write to
+mysql.cluster.\2.replication_delay
+compute sum write to
+mysql.cluster.all.replication_delay
+;
+
+Here a single match, results in four aggregations, each of a different
+scope.  In this example aggregation based on hostname and cluster are
+being made, as well as the more general `all` targets, which in this
+example have both identical values.  Note that with this single
+aggregation rule, both per-cluster, per-host and total aggregations are
+produced.  Obviously, the input metrics define which hosts and clusters
+are produced.
+
 
 Author
 --
diff --git a/contrib/relay.conf b/contrib/relay.conf
new file mode 100644
index 000..a32242c
--- /dev/null
+++ b/contrib/relay.conf
@@ -0,0 +1,63 @@
+# comments are allowed in any place and start with a hash (#)
+#
+#cluster name
+#forward | any_of | carbon_ch | fnv1a_ch [replication count]
+#host[:port] [proto udp | tcp] ...
+#;
+#match * | expression
+#send to cluster | blackhole
+#[stop]
+#;
+#rewrite expression
+#into replacement
+#;
+#aggregate
+#
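
The cleansing steps listed in the README hunk above can be illustrated with a short sketch. This is Python written for illustration only, not the relay's C implementation: the function names `cleanse_metric` and `cleanse_line`, the order in which the steps run, and the decision to treat the dot as a regular character (it separates path components but is not listed in the README's character class) are all assumptions.

```python
import re

# Characters the README lists as regular: [0-9a-zA-Z-_:#].
# Assumption: "." is also kept, since it separates metric path components.
_IRREGULAR = re.compile(r"[^0-9a-zA-Z\-_:#.]")


def cleanse_metric(name: str) -> str:
    """Apply the name-related cleansing steps to one metric name."""
    name = _IRREGULAR.sub("_", name)      # irregular char replacement
    name = re.sub(r"\.{2,}", ".", name)   # double dot elimination
    return name.strip(".")                # leading/trailing dot elimination


def cleanse_line(line: str) -> str:
    """Emit 'metric value timestamp' separated by single spaces."""
    metric, value, timestamp = line.split()  # split on any whitespace run
    return f"{cleanse_metric(metric)} {value} {timestamp}"
```

Under these assumptions, `cleanse_line("  a..b   1.5\t1418900000\n")` gives `a.b 1.5 1418900000`, so a consistent-hash router would see the same metric name regardless of stray dots or spacing in the input.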

[MediaWiki-commits] [Gerrit] Imported Upstream version 0.36+git1acdff3 - change (operations...carbon-c-relay)

2014-12-18 Thread Filippo Giunchedi (Code Review)
Filippo Giunchedi has submitted this change and it was merged.

Change subject: Imported Upstream version 0.36+git1acdff3
..


Imported Upstream version 0.36+git1acdff3

Change-Id: Ia850be88f3e7ffc8dcd23f1b13f04f14996e28e3
---
M README.md
A contrib/relay.conf
A contrib/relay.init
A contrib/relay.logrotate
A contrib/relay.monit
A contrib/relay.spec
A contrib/relay.sysconfig
M dispatcher.c
M dispatcher.h
M receptor.c
M receptor.h
M relay.c
M router.c
13 files changed, 442 insertions(+), 70 deletions(-)

Approvals:
  Filippo Giunchedi: Verified; Looks good to me, approved



diff --git a/contrib/relay.conf b/contrib/relay.conf
new file mode 100644
index 000..a32242c
--- /dev/null
+++ b/contrib/relay.conf
@@ -0,0 +1,63 @@
+# comments are allowed in any place and start with a hash (#)
+#
+#cluster name
+#forward | any_of | carbon_ch | fnv1a_ch [replication count]
+#host[:port] [proto udp | tcp] ...
+#;
+#match * | expression
+#send to cluster | blackhole
+#[stop]
+#;
+#rewrite expression
+#into replacement
+#;
+#aggregate
+#expression ...
+#every interval seconds
+#expire after
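
To see how the back references in the dynamic `aggregate` rule from the README change expand, here is a small Python sketch. It only computes the target metric names for one input metric; it does not compute any sums and is not how the relay itself is implemented. The relay uses its own regular expression matching in C, so using Python's `re` module here is an assumption, as is the helper name `aggregation_targets`. The expression and the four `write to` targets are copied from the README example.

```python
import re

# Expression and targets taken verbatim from the README example.
PATTERN = re.compile(
    r"^sys\.dc[0-9].(somehost-[0-9]+)\.([^.]+)\.mysql\.replication_delay"
)
TARGETS = [
    r"mysql.host.\1.replication_delay",
    r"mysql.host.all.replication_delay",
    r"mysql.cluster.\2.replication_delay",
    r"mysql.cluster.all.replication_delay",
]


def aggregation_targets(metric: str) -> list[str]:
    """Return the metric names one input metric would contribute to."""
    m = PATTERN.search(metric)
    if m is None:
        return []  # the rule does not apply to this metric
    # expand() substitutes \1 and \2 with the captured host and cluster
    return [m.expand(target) for target in TARGETS]
```

For the input `sys.dc1.somehost-12.web.mysql.replication_delay` this yields one per-host name (`mysql.host.somehost-12.replication_delay`), one per-cluster name (`mysql.cluster.web.replication_delay`) and the two `all` names, which matches the four scopes the README describes.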