Gehel has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/403933 )

Change subject: prometheus blazegraph exporter should not fail when blazegraph is down
......................................................................


prometheus blazegraph exporter should not fail when blazegraph is down

The prometheus python client collects all metrics on startup to enumerate the
metric names. If blazegraph isn't available at that time, the exporter crashes.
Note that the same issue occurs if blazegraph goes down while the exporter is
running.

Arguably, the prometheus python client itself should be more robust. But we
can already make metric collection more robust by handling at least the most
obvious exceptions.

Bug: T184434
Change-Id: I8bfd360b2ca864e8c21ec8b03bde58b16a2839ae
---
M prometheus-blazegraph-exporter
1 file changed, 30 insertions(+), 21 deletions(-)

Approvals:
  jenkins-bot: Verified
  Filippo Giunchedi: Looks good to me, but someone else must approve
  Gehel: Looks good to me, approved



diff --git a/prometheus-blazegraph-exporter b/prometheus-blazegraph-exporter
index 9f9a23e..a75ffa2 100755
--- a/prometheus-blazegraph-exporter
+++ b/prometheus-blazegraph-exporter
@@ -27,6 +27,7 @@
 
 from datetime import timedelta, tzinfo
 from dateutil.parser import parse
+from urllib2 import URLError
 from xml.etree import ElementTree
 
 from prometheus_client import start_http_server, Summary
@@ -52,9 +53,13 @@
         url = self.url + "counters?depth=10&" + \
             urllib.urlencode({'path': cnt_name})
 
-        req = urllib2.Request(url)
-        req.add_header('Accept', 'application/xml')
-        response = urllib2.urlopen(req)
+        try:
+            req = urllib2.Request(url)
+            req.add_header('Accept', 'application/xml')
+            response = urllib2.urlopen(req)
+        except URLError:
+            return None
+
         el = ElementTree.fromstring(response.read())
         last_name = cnt_name.split('/')[-1]
 
@@ -102,33 +107,37 @@
 
                 try:
                     value = float(metric_value)
-                except ValueError:
+                except (ValueError, TypeError):
                     value = float('nan')
 
                 metric_family.add_metric([], value)
 
-        sparql_query = """ prefix schema: <http://schema.org/>
-                    SELECT * WHERE { {
-                      SELECT ( COUNT( * ) AS ?count ) { ?s ?p ?o }
-                    } UNION {
-                      SELECT * WHERE { <http://www.wikidata.org> schema:dateModified ?y }
-                    } }"""
-        data = self.execute_sparql(sparql_query)
-
         triple_metric = CounterMetricFamily('blazegraph_triples', '')
         lag_metric = CounterMetricFamily('blazegraph_lastupdated', '')
+        try:
+            sparql_query = """ prefix schema: <http://schema.org/>
+                        SELECT * WHERE { {
+                          SELECT ( COUNT( * ) AS ?count ) { ?s ?p ?o }
+                        } UNION {
+                          SELECT * WHERE { <http://www.wikidata.org> schema:dateModified ?y }
+                        } }"""
 
-        for binding in data['results']['bindings']:
-            if 'count' in binding:
-                triple_count = binding['count']['value']
-                triple_metric.add_metric([], float(triple_count))
+            data = self.execute_sparql(sparql_query)
 
-            elif 'y' in binding:
-                lastUpdated = parse(binding['y']['value'])
-                lag_metric.add_metric([], float(lastUpdated.strftime('%s')))
-            else:
-                raise ValueError('SPARQL binding returned with unexpected key')
+            for binding in data['results']['bindings']:
+                if 'count' in binding:
+                    triple_count = binding['count']['value']
+                    triple_metric.add_metric([], float(triple_count))
 
+                elif 'y' in binding:
+                    lastUpdated = parse(binding['y']['value'])
+                    lag_metric.add_metric([], float(lastUpdated.strftime('%s')))
+                else:
+                    raise ValueError('SPARQL binding returned with unexpected key')
+
+        except URLError:
+            triple_metric.add_metric([], float('nan'))
+            lag_metric.add_metric([], float('nan'))
         yield triple_metric
         yield lag_metric
 

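The error-handling pattern in the diff above can be sketched in isolation as
follows. This is a minimal sketch, not the exporter's code: it assumes Python 3
(urllib.error.URLError in place of urllib2.URLError), and the names
fetch_counter, safe_float and backend_down are hypothetical, introduced only
for illustration.

```python
import math
from urllib.error import URLError  # assumption: Python 3; the patch uses urllib2.URLError


def fetch_counter(fetch):
    """Call a hypothetical fetch() callable; return None if the backend is unreachable."""
    try:
        return fetch()
    except URLError:
        # Mirrors the patch: an unreachable backend yields None instead of crashing.
        return None


def safe_float(raw):
    """Convert a raw counter value to float, mapping bad or missing input to NaN."""
    try:
        return float(raw)
    except (ValueError, TypeError):
        # ValueError: unparsable string. TypeError: None from a failed fetch.
        return float('nan')


def backend_down():
    """Hypothetical stand-in for a request made while the backend is down."""
    raise URLError('connection refused')


value_when_down = safe_float(fetch_counter(backend_down))
value_when_up = safe_float(fetch_counter(lambda: '42'))
```

Here value_when_down is NaN and value_when_up is 42.0. Catching TypeError
alongside ValueError matters because float(None) raises TypeError, which is
what a None returned from a failed fetch would trigger.
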
-- 
To view, visit https://gerrit.wikimedia.org/r/403933
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8bfd360b2ca864e8c21ec8b03bde58b16a2839ae
Gerrit-PatchSet: 3
Gerrit-Project: operations/debs/prometheus-blazegraph-exporter
Gerrit-Branch: master
Gerrit-Owner: Gehel <guillaume.leder...@wikimedia.org>
Gerrit-Reviewer: Filippo Giunchedi <fgiunch...@wikimedia.org>
Gerrit-Reviewer: Gehel <guillaume.leder...@wikimedia.org>
Gerrit-Reviewer: Muehlenhoff <mmuhlenh...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>
