[Wikidata-bugs] [Maniphest] [Commented On] T144948: some icinga checks on WDQS do not send notifications

2016-09-15 Thread Gehel
Gehel added a comment. Actually, it seems that only the WDQS_Lag check is not reported to the wdqs-admins group. Patch coming up.TASK DETAILhttps://phabricator.wikimedia.org/T144948EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Smalyshev, Gehel

[Wikidata-bugs] [Maniphest] [Commented On] T144948: some icinga checks on WDQS do not send notifications

2016-09-15 Thread Gehel
Gehel added a comment. So it seems that puppet failures on wdqs1001 are notified on IRC (#wikidata), but 'WDQS HTTP Port' is not. Looking at the check definition on neon:/etc/icinga/puppet_services.cfg I don't see a significant difference: define service { # --PUPPET_NAME-- wdqs1001

[Wikidata-bugs] [Maniphest] [Block] T144380: Install and configure new WDQS nodes on codfw

2016-09-20 Thread Gehel
Gehel created subtask T146158: Configure varnish to include wdqs nodes in codfw. TASK DETAILhttps://phabricator.wikimedia.org/T144380EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: thcipriani, Stashbot, gerritbot, Gehel, Aklapper, Smalyshev

[Wikidata-bugs] [Maniphest] [Created] T146158: Configure varnish to include wdqs nodes in codfw

2016-09-20 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata, Wikidata-Query-Service, Operations, Discovery.Herald added a subscriber: Aklapper. TASK DESCRIPTIONWDQS does not seem to have an LVS service configured, so wdqs servers are configured directly in the cache role. I'm not entirely sure how we

[Wikidata-bugs] [Maniphest] [Edited] T146158: Configure varnish to include wdqs nodes in codfw

2016-09-20 Thread Gehel
Gehel edited the task description. (Show Details) EDIT DETAILSWDQS does not seem to have an LVS service configured, so wdqs servers are configured directly in the [[ https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/role/manifests/cache/misc.pp | cache role ]]. I'm

[Wikidata-bugs] [Maniphest] [Updated] T138637: codfw: (2) wdqs200[12] systems

2016-09-20 Thread Gehel
Gehel added a comment. The configuration of those servers is tracked on T144380. This task can probably be closed, unless it is still used to track some work on the procurement side. @RobH, I'll let you close unless you need it.TASK DETAILhttps://phabricator.wikimedia.org/T138637EMAIL

[Wikidata-bugs] [Maniphest] [Created] T146468: wdqs - move metric collections to diamond

2016-09-23 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Operations, Discovery.Herald added a subscriber: Aklapper.Herald added a project: Wikidata. TASK DESCRIPTIONSome of WDQS metrics are collected through an external PHP script, not integrated with diamond or our usual metric

[Wikidata-bugs] [Maniphest] [Updated] T132457: Move wdqs to an LVS service

2016-09-22 Thread Gehel
Gehel added a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAILhttps://phabricator.wikimedia.org/T132457EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: ema, Gehel, BBlack, Aklapper, mschwarzer, Avner, debt, D3r1ck01, Jonas, FloNight

[Wikidata-bugs] [Maniphest] [Updated] T146207: publish lag and response time for wdqs codfw to graphite

2016-09-20 Thread Gehel
Gehel removed subscribers: Aklapper, gerritbot, Stashbot, thcipriani.Gehel removed a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAILhttps://phabricator.wikimedia.org/T146207EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Smalyshev

[Wikidata-bugs] [Maniphest] [Block] T144380: Install and configure new WDQS nodes on codfw

2016-09-20 Thread Gehel
Gehel created subtask T146207: publish lag and response time for wdqs codfw to graphite. TASK DETAILhttps://phabricator.wikimedia.org/T144380EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: thcipriani, Stashbot, gerritbot, Gehel, Aklapper, Smalyshev

[Wikidata-bugs] [Maniphest] [Created] T146207: publish lag and response time for wdqs codfw to graphite

2016-09-20 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata, Wikidata-Query-Service, Operations, Discovery, Discovery-Wikidata-Query-Service-Sprint, Patch-For-Review.Herald removed a project: Patch-For-Review. TASK DESCRIPTIONSome metrics are collected on wdqs outside of diamond, and not deployed

[Wikidata-bugs] [Maniphest] [Commented On] T137238: Tune WDQS caching headers

2016-08-23 Thread Gehel
Gehel added a comment. @Smalyshev yes there is: adding some cache-control headers. Change submitted. Thanks for reminding me!TASK DETAILhttps://phabricator.wikimedia.org/T137238EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: gerritbot, Smalyshev

[Wikidata-bugs] [Maniphest] [Commented On] T146576: 502 Bad Gateway errors while trying to run simple queries with the Wikidata Query Service

2016-09-25 Thread Gehel
Gehel added a comment. logs on wdqs1002 show error opening sockets: Sep 25 14:35:08 wdqs1002 bash[10426]: 2016-09-25 14:30:56.279:WARN:oejs.ServerConnector:qtp927028538-34-acceptor-0@6b2731c9-ServerConnector@74cdad37{HTTP/1.1}{localhost:}: Sep 25 14:35:08 wdqs1002 bash[10426
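For illustration only: one quick way to check whether such accept errors come from file-descriptor exhaustion (an assumption; the thread does not confirm the cause) is to inspect the process limits, assuming a single matching blazegraph process:
    pid=$(pgrep -f -n blazegraph)          # newest process whose command line mentions blazegraph
    ls /proc/$pid/fd | wc -l               # descriptors currently open
    grep 'open files' /proc/$pid/limits    # configured soft/hard limit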

[Wikidata-bugs] [Maniphest] [Commented On] T146576: 502 Bad Gateway errors while trying to run simple queries with the Wikidata Query Service

2016-09-25 Thread Gehel
Gehel added a comment. Restarting blazegraph on wdqs1002 seems to have solved the issue. I'll dig into the logs to see if I find something that would explain the issue.TASK DETAILhttps://phabricator.wikimedia.org/T146576EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T146576: 502 Bad Gateway errors while trying to run simple queries with the Wikidata Query Service

2016-09-25 Thread Gehel
Gehel added a comment. It seems that wdqs1002 started showing replication lag issues around 16:00 UTC on Saturday Sept 24. Shortly after, there is a hole in that metric in Graphite. HTTP 502 went up much later, around 19:00 UTC.TASK DETAILhttps://phabricator.wikimedia.org/T146576EMAIL

[Wikidata-bugs] [Maniphest] [Created] T147130: Response times of Wikidata Query Service increasing

2016-10-02 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery, Operations.Herald added a subscriber: Aklapper.Herald added a project: Wikidata. TASK DESCRIPTIONThe graphite check for wikidata query service has been alerting regularly since Saturday October 1st. Some investigations have

[Wikidata-bugs] [Maniphest] [Closed] T146576: 502 Bad Gateway errors while trying to run simple queries with the Wikidata Query Service

2016-10-02 Thread Gehel
Gehel closed this task as "Resolved".Gehel claimed this task.Gehel added a comment. As @jcrespo pointed out, the current issue is different that the one raised here. I created T147130 to track this new issue. And I'm closing this one.TASK DETAILhttps://phabricator.wikimedia.org/T1

[Wikidata-bugs] [Maniphest] [Commented On] T124627: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline.

2016-09-29 Thread Gehel
Gehel added a comment. Work is happening in sub tasks, but it is happening. This should be done by the end of next week.TASK DETAILhttps://phabricator.wikimedia.org/T124627EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: greg, Southparkfan, hoo, Gehel

[Wikidata-bugs] [Maniphest] [Created] T150356: Wikidata Query Service is overly verbose toward logstash

2016-11-09 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Operations, Discovery.Herald added a subscriber: Aklapper.Herald added a project: Wikidata. TASK DESCRIPTIONwdqs100[12] are the hosts generating the most traffic to logstash. A quick look seems to indicate that they send

[Wikidata-bugs] [Maniphest] [Reassigned] T148747: Estimate hardware requirements for WDQS upgrade

2016-10-25 Thread Gehel
Gehel reassigned this task from Gehel to Deskana.Gehel added a comment. @Deskana I'm reassigning this to you. Let me know if you need more details or if you want me to move forward and open a hardware request for this.TASK DETAILhttps://phabricator.wikimedia.org/T148747EMAIL PREFERENCEShttps

[Wikidata-bugs] [Maniphest] [Claimed] T148747: Estimate hardware requirements for WDQS upgrade

2016-10-25 Thread Gehel
Gehel claimed this task.Gehel added a comment. Side note: at this point, the need to increase hardware is more for availability than for scalability. Constraints: We want to be able to continue operations in the case where we lose a datacenter, including normal maintenance operations. Each

[Wikidata-bugs] [Maniphest] [Unblock] T144380: Install and configure new WDQS nodes on codfw

2016-10-25 Thread Gehel
Gehel closed subtask T146158: Configure varnish to include wdqs nodes in codfw as "Resolved". TASK DETAILhttps://phabricator.wikimedia.org/T144380EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: thcipriani, Stashbot, gerritbot, Gehel

[Wikidata-bugs] [Maniphest] [Closed] T146158: Configure varnish to include wdqs nodes in codfw

2016-10-25 Thread Gehel
Gehel closed this task as "Resolved".Gehel added a comment. This has been resolved by implementing LVS as described on T132457.TASK DETAILhttps://phabricator.wikimedia.org/T146158EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Aklapper,

[Wikidata-bugs] [Maniphest] [Commented On] T108488: Look into limiting connection rate to WDQS per external IP

2016-11-28 Thread Gehel
Gehel added a comment. I propose to start by enabling connection limiting with a fairly high limit and lowering it after a week, once we get some feedback.TASK DETAILhttps://phabricator.wikimedia.org/T108488EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Closed] T148015: WDQS monitoring of response times needs to be adapted now that we use LVS

2016-11-04 Thread Gehel
Gehel closed this task as "Resolved".Gehel added a comment. This is resolved by https://gerrit.wikimedia.org/r/315651TASK DETAILhttps://phabricator.wikimedia.org/T148015EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: gerritbot, Smalys

[Wikidata-bugs] [Maniphest] [Declined] T147674: codfw hosts seem to have wrong logstash server

2016-10-11 Thread Gehel
Gehel closed this task as "Declined".Gehel added a comment. We only have logstash servers in eqiad (yes, I know, we should have some in codfw as well, but that's not the case).TASK DETAILhttps://phabricator.wikimedia.org/T147674EMAIL PREFERENCEShttps://phabricator.wikimedia.org/sett

[Wikidata-bugs] [Maniphest] [Created] T148015: WDQS monitoring of response times needs to be adapted now that we use LVS

2016-10-13 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint, Operations.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONThe merge of https://gerrit.wikimedia.org/r/#/c/312225/ changed the metrics

[Wikidata-bugs] [Maniphest] [Triaged] T148015: WDQS monitoring of response times needs to be adapted now that we use LVS

2016-10-13 Thread Gehel
Gehel triaged this task as "High" priority. TASK DETAILhttps://phabricator.wikimedia.org/T148015EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Smalyshev, Gehel, Aklapper, mschwarzer, Avner, debt, D3r1ck01, Jonas, FloNight, Xmlizer, Iz

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T108488: Look into limiting connection rate to WDQS per external IP

2016-11-30 Thread Gehel
Gehel added a subscriber: mpopov.Gehel added a comment. A quick analysis of the situation after 2 days of limiting connections: around 2k connections have been rate limited over a 24h period (grep limiting /var/log/nginx/error.log.1 | wc -l), which is ~2% of the non-cached hits. The 95th %ile
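For reference, the count quoted above can be reproduced with either of these equivalent commands (the error-log path is taken from the comment itself):
    grep limiting /var/log/nginx/error.log.1 | wc -l
    grep -c limiting /var/log/nginx/error.log.1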

[Wikidata-bugs] [Maniphest] [Commented On] T151889: nginx proxy on wdqs servers sometimes tries to connect to backend over IPv6

2016-11-29 Thread Gehel
Gehel added a comment. Of course, one option could be to make the backend also listen to IPv6.TASK DETAILhttps://phabricator.wikimedia.org/T151889EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Aklapper, Smalyshev, Gehel, EBjune, mschwarzer, Avner

[Wikidata-bugs] [Maniphest] [Triaged] T151889: nginx proxy on wdqs servers sometimes tries to connect to backend over IPv6

2016-11-29 Thread Gehel
Gehel triaged this task as "High" priority. TASK DETAILhttps://phabricator.wikimedia.org/T151889EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Aklapper, Smalyshev, Gehel, EBjune, mschwarzer, Avner, Zppix, debt, D3r1ck01, Jonas, FloNigh

[Wikidata-bugs] [Maniphest] [Created] T151889: nginx proxy on wdqs servers sometimes tries to connect to backend over IPv6

2016-11-29 Thread Gehel
Gehel created this task.Gehel added projects: Operations, Wikidata-Query-Service, Discovery.Herald added a subscriber: Aklapper.Herald added a project: Wikidata. TASK DESCRIPTIONnginx logs on wdqs100[12] show errors (see below for example) that seem to indicate that nginx tries to connect

[Wikidata-bugs] [Maniphest] [Closed] T151889: nginx proxy on wdqs servers sometimes tries to connect to backend over IPv6

2016-11-29 Thread Gehel
Gehel closed this task as "Resolved".Gehel added a comment. nginx is now configured to talk to the backend 127.0.0.1 and not localhost, so resolution to IPv6 is bypassed. No more errors seen in the logs.TASK DETAILhttps://phabricator.wikimedia.org/T151889EMAIL PREFER

[Wikidata-bugs] [Maniphest] [Updated] T152644: rack/setup/install wdqs2003

2016-12-22 Thread Gehel
Gehel added a parent task: T124627: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline.. TASK DETAILhttps://phabricator.wikimedia.org/T152644EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Southparkfan, Smalyshev

[Wikidata-bugs] [Maniphest] [Updated] T124627: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline.

2016-12-22 Thread Gehel
Gehel added subtasks: T152643: rack/setup/install wdqs1003, T152644: rack/setup/install wdqs2003. TASK DETAILhttps://phabricator.wikimedia.org/T124627EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: greg, Southparkfan, hoo, Gehel, RobH, Deskana, Tfinc

[Wikidata-bugs] [Maniphest] [Updated] T152643: rack/setup/install wdqs1003

2016-12-22 Thread Gehel
Gehel added a parent task: T124627: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline.. TASK DETAILhttps://phabricator.wikimedia.org/T152643EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Cmjohnson, GehelCc: Southparkfan

[Wikidata-bugs] [Maniphest] [Unblock] T124862: Deploy WDQS nodes on codfw

2016-12-22 Thread Gehel
Gehel closed subtask T144380: Install and configure new WDQS nodes on codfw as "Resolved". TASK DETAILhttps://phabricator.wikimedia.org/T124862EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: hoo, Gehel, Tfinc, Deskana, Joe, EBernhardson

[Wikidata-bugs] [Maniphest] [Closed] T144380: Install and configure new WDQS nodes on codfw

2016-12-22 Thread Gehel
Gehel closed this task as "Resolved".Gehel added a comment. Oops... yes, it has been done for some time... We now have 2 new servers (T152643 and T152644) but that's a different task...TASK DETAILhttps://phabricator.wikimedia.org/T144380EMAIL PREFERENCEShttps://phabricator.wikimedia.or

[Wikidata-bugs] [Maniphest] [Closed] T124862: Deploy WDQS nodes on codfw

2016-12-22 Thread Gehel
Gehel closed this task as "Resolved".Gehel claimed this task. TASK DETAILhttps://phabricator.wikimedia.org/T124862EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: hoo, Gehel, Tfinc, Deskana, Joe, EBernhardson, Aklapper, Smalyshev, Th3d3v1

[Wikidata-bugs] [Maniphest] [Unblock] T124627: Adjust balance of WDQS nodes to allow continued operation if eqiad went offline.

2016-12-22 Thread Gehel
Gehel closed subtask T124862: Deploy WDQS nodes on codfw as "Resolved". TASK DETAILhttps://phabricator.wikimedia.org/T124627EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: greg, Southparkfan, hoo, Gehel, RobH, Deskana, Tfinc, Smalyshev, Akl

[Wikidata-bugs] [Maniphest] [Commented On] T159574: LDF endpoint ordering is not stable between servers when paging

2017-03-21 Thread Gehel
Gehel added a comment. At this point, the only workable option is the "single LDF server" (apart from abandoning LDF completely). So let's see how we could implement that and see what feedback we get. Limitations: As far as I can see, LVS does not support an active / passive fa

[Wikidata-bugs] [Maniphest] [Created] T161240: Expose wikidata query service LDF endpoint in a scalable and available way

2017-03-23 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONT159574 has been fixed by sending all LDF traffic to a single server. This has obvious

[Wikidata-bugs] [Maniphest] [Changed Project Column] T159574: LDF endpoint ordering is not stable between servers when paging

2017-03-23 Thread Gehel
Gehel moved this task from Needs review to Done on the Discovery-Wikidata-Query-Service-Sprint board.Gehel added a comment. This is done. Longer term solution is tracked on T161240.TASK DETAILhttps://phabricator.wikimedia.org/T159574WORKBOARDhttps://phabricator.wikimedia.org/project/board/1239

[Wikidata-bugs] [Maniphest] [Commented On] T159574: LDF endpoint ordering is not stable between servers when paging

2017-03-23 Thread Gehel
Gehel added a comment. Varnish patch deployed. I'll keep an eye on logs to make sure all requests are routed as we expect. We still need to find a longer term solution, but that's another ticket.TASK DETAILhttps://phabricator.wikimedia.org/T159574EMAIL PREFERENCEShttps://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Created] T162111: Make WDQS active / active

2017-04-03 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Search (Current work), Operations.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONTraffic is ready for active / active applications, and WDQS is ready to be active / active

[Wikidata-bugs] [Maniphest] [Triaged] T162111: Make WDQS active / active

2017-04-03 Thread Gehel
Gehel triaged this task as "High" priority. TASK DETAILhttps://phabricator.wikimedia.org/T162111EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Smalyshev, Gehel, Aklapper, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, Salgo60, Avner, Z

[Wikidata-bugs] [Maniphest] [Updated] T162111: Make WDQS active / active

2017-04-10 Thread Gehel
Gehel edited projects, added Traffic; removed Patch-For-Review.Gehel added a subscriber: BBlack.Gehel added a comment. I'm not sure the change is effective. While I do see a few requests (outside of pybal / icinga) in the nginx logs on the wdqs codfw servers, I don't see as many as I would expect

[Wikidata-bugs] [Maniphest] [Commented On] T162111: Make WDQS active / active

2017-04-10 Thread Gehel
Gehel added a comment. Using the following curl to test, I don't see an entry in the nginx access log: curl 'https://query.wikidata.org/bigdata/namespace/wdq/sparql?query=%23Streets%20without%20a%20city%0ASELECT%20%3Fstreet%20%3FstreetLabel%0AWHERE%0A%7B%0A%20%20%20%20%3Fstreet%20wdt%3AP31%2Fwdt
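The query string above is truncated in the archive; purely as an illustration, a minimal request against the same endpoint looks like this (the query and Accept header are arbitrary examples, not the ones used in the test):
    curl -G 'https://query.wikidata.org/bigdata/namespace/wdq/sparql' \
         --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
         -H 'Accept: application/sparql-results+json'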

[Wikidata-bugs] [Maniphest] [Changed Project Column] T162111: Make WDQS active / active

2017-04-10 Thread Gehel
Gehel moved this task from Backlog to Done on the Discovery-Search (Current work) board.Gehel added a comment. grafana dashboard was wrongly filtering on eqiad only (that's why I did not see any traffic there). More tests and checking x-cache and x-served-by headers show that indeed traffic
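A hedged sketch of the kind of header check mentioned above (the header names come from the comment; the URL and flags are illustrative):
    curl -s -o /dev/null -D - 'https://query.wikidata.org/' | grep -iE 'x-cache|x-served-by'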

[Wikidata-bugs] [Maniphest] [Commented On] T162111: Make WDQS active / active

2017-04-07 Thread Gehel
Gehel added a comment. Initial import is completed, wdqs-updater is restarted and is catching up on the differences since last export.TASK DETAILhttps://phabricator.wikimedia.org/T162111EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Stashbot

[Wikidata-bugs] [Maniphest] [Updated] T159574: LDF endpoint ordering is not stable between servers when paging

2017-03-07 Thread Gehel
Gehel added subscribers: ema, BBlack.Gehel added a comment. For some more context: LDF is a way to cheaply get large lists of triples from WDQS, and to move some logic onto the clients. Retrieving this list is done page by page. We already have use cases for this. The iteration order is just

[Wikidata-bugs] [Maniphest] [Commented On] T159245: wdqs1001 and wdqs1003 unresponsive

2017-03-07 Thread Gehel
Gehel added a comment. An early report based on GC logs is available. This report analyses only 14h of data, so no hard conclusions yet. Still, a few things to note: we use G1, which I don't have much experience with, so I need to learn! Over the period analysed, we rarely go over 8GB of heap after GC

[Wikidata-bugs] [Maniphest] [Commented On] T159248: collect usual GC metrics for Blazegraph JVMs

2017-03-01 Thread Gehel
Gehel added a comment. Elasticsearch configuration is done through puppet (https://github.com/wikimedia/puppet/blob/production/modules/elasticsearch/manifests/init.pp#L132-L141)TASK DETAILhttps://phabricator.wikimedia.org/T159248EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T119915: Create response time monitoring for WDQS endpoint

2017-07-10 Thread Gehel
Gehel added a comment. The UNKNOWN disappeared now that we are active/active. Previously, when no traffic was sent to codfw, we had no meaningful data about response time. This can be closed again.TASK DETAILhttps://phabricator.wikimedia.org/T119915EMAIL PREFERENCEShttps

[Wikidata-bugs] [Maniphest] [Commented On] T171210: rack/setup/install wdqs100[45].eqiad.wmnet

2017-07-24 Thread Gehel
Gehel added a comment. @Cmjohnson now that I am back, do you need anything from me to move forward on this?TASK DETAILhttps://phabricator.wikimedia.org/T171210EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: Cmjohnson, GehelCc: PokestarFan, Aklapper, Gehel

[Wikidata-bugs] [Maniphest] [Created] T171855: reimage wdq-beta to debian jessie

2017-07-27 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Search (Current work).Herald added subscribers: PokestarFan, Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONwdq-beta is still running on ubuntu trusty. This would allow us to remove a significant

[Wikidata-bugs] [Maniphest] [Commented On] T166244: Reload WDQS data after T131960 is merged

2017-07-06 Thread Gehel
Gehel added a comment. After discussion with @Smalyshev, the data reload procedure should be: 1 server at a time on eqiad; we can reload 2 servers at the same time on codfw as the traffic there is minimal; care must be taken to send LDF traffic away from wdqs1001 before reloading
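A rough per-host sketch of that procedure, assuming conftool's pool/depool wrappers and the wdqs systemd units named elsewhere in this archive; the reload step itself is elided because the exact commands are not given here:
    sudo depool
    sudo systemctl stop wdqs-updater wdqs-blazegraph
    # ... reload the journal from the latest dump ...
    sudo systemctl start wdqs-blazegraph wdqs-updater
    sudo pool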

[Wikidata-bugs] [Maniphest] [Created] T172774: Review users being throttled

2017-08-08 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added subscribers: PokestarFan, Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONNow that T170860 is deployed, having a look at what users are being throttled

[Wikidata-bugs] [Maniphest] [Created] T172782: wdqs-updater is throttled and does not recover

2017-08-08 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added subscribers: PokestarFan, Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONOn Aug 08 11:12:50, wdqs-updater on wdqs1001 was throttled. It stopped doing any

[Wikidata-bugs] [Maniphest] [Commented On] T172774: Review users being throttled

2017-08-08 Thread Gehel
Gehel added a comment. 16 hours after deployment, ~900 requests have been throttled (for a total of ~30k requests, so ~3% of requests are being throttled). ~650 of those requests are obviously by bots. We might want to contact the bot owners (where possible) to see if the restriction is affecting

[Wikidata-bugs] [Maniphest] [Created] T172798: allow wdqs-admins to pool / depool wdqs servers

2017-08-08 Thread Gehel
Gehel created this task.Gehel added projects: Ops-Access-Requests, Operations, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added subscribers: PokestarFan, Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONFor maintenance operations, Stas (or other wdqs

[Wikidata-bugs] [Maniphest] [Updated] T172713: WDQS logstash parser does not parse some requests

2017-08-09 Thread Gehel
Gehel added a project: Discovery-Search (Current work). TASK DETAILhttps://phabricator.wikimedia.org/T172713EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Stashbot, gerritbot, Aklapper, PokestarFan, Gehel, Smalyshev, Lordiis, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Created] T172710: send wdqs logs to logstash

2017-08-07 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Operations.Herald added subscribers: PokestarFan, Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONwdqs log messages would be much easier to read / analyze if they were sent to logstash. wdqs logging

[Wikidata-bugs] [Maniphest] [Created] T168636: wdqs-updater is stuck after restart of wdqs-blazegraph

2017-06-22 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery.Herald added a subscriber: Aklapper.Herald added a project: Wikidata. TASK DESCRIPTIONAs seen multiple times on grafana, wdqs-updater gets stuck after the wdqs-blazegraph restarts that happen during deployment of new

[Wikidata-bugs] [Maniphest] [Retitled] T166378: wdqs-updater fails when tail-poller queue is full

2017-05-26 Thread Gehel
Gehel created this task.Gehel added a project: Wikidata-Query-Service.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONDuring a recent high edit rate on wikidata, wikidata query service stopped processing updates. The updater logs show: May 26 10:14

[Wikidata-bugs] [Maniphest] [Edited] T166378: wdqs-updater fails when tail-poller queue is full

2017-05-26 Thread Gehel
Gehel updated the task description. (Show Details) CHANGES TO TASK DESCRIPTION...Looking at [[ https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/java/org/wikidata/query/rdf/tool/change/TailingChangesPoller.java#L64-L96 | TailingChangesPoller ]], it looks like the case

[Wikidata-bugs] [Maniphest] [Retitled] T166524: high replication lag on wdqs1002

2017-05-29 Thread Gehel
Gehel created this task.Gehel added a project: Wikidata-Query-Service.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONToday (May 29th) since ~ noon UTC, wdqs1002 is lagging behind on replication. Updates are still happening, just not fast enough

[Wikidata-bugs] [Maniphest] [Updated] T166524: high replication lag on wdqs1002

2017-05-29 Thread Gehel
Gehel added a project: Operations.Gehel added a comment. Looking at dmesg, there are a lot of warnings about CPU temperature and throttling: [9098037.343804] CPU23: Package temperature above threshold, cpu clock throttled (total events = 52647618) This has been going on at least since May 22

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T166524: high replication lag on wdqs1002

2017-05-29 Thread Gehel
Gehel added a subscriber: Cmjohnson.Gehel added a comment. @Cmjohnson is this high temperature an indication that you should do some magic with thermal paste?TASK DETAILhttps://phabricator.wikimedia.org/T166524EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T166524: high replication lag on wdqs1002

2017-05-30 Thread Gehel
Gehel added a subscriber: RobH.Gehel added a comment. @RobH: from racktables, it looks like wdqs1002 is 4.5 years old (purchase date = 2012-12-05, same as wdqs1001 - other servers are newer). I'm not sure about the warranty status, or when we should think about renewing those servers. Any idea

[Wikidata-bugs] [Maniphest] [Updated] T166524: high replication lag on wdqs1002

2017-05-30 Thread Gehel
Gehel added a project: Discovery-Search (Current work).Gehel added a comment. Taking wdqs1002 out of LVS seems to have given it sufficient breathing space to catch up on replication. I added it back and it seems stable so far. I'm still not trusting it entirely...TASK DETAILhttps

[Wikidata-bugs] [Maniphest] [Updated] T166378: wdqs-updater fails when tail-poller queue is full

2017-05-30 Thread Gehel
Gehel added a project: Discovery-Search (Current work). TASK DETAILhttps://phabricator.wikimedia.org/T166378EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: gerritbot, Stashbot, Smalyshev, Gehel, Aklapper, GoranSMilovanovic, Adik2382, Th3d3v1ls

[Wikidata-bugs] [Maniphest] [Claimed] T166378: wdqs-updater fails when tail-poller queue is full

2017-05-30 Thread Gehel
Gehel claimed this task. TASK DETAILhttps://phabricator.wikimedia.org/T166378EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: gerritbot, Stashbot, Smalyshev, Gehel, Aklapper, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune

[Wikidata-bugs] [Maniphest] [Claimed] T166524: high replication lag on wdqs1002

2017-05-30 Thread Gehel
Gehel claimed this task. TASK DETAILhttps://phabricator.wikimedia.org/T166524EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: hoo, Jonas, Lydia_Pintscher, RobH, Cmjohnson, Stashbot, Alphos, Smalyshev, Gehel, Aklapper, GoranSMilovanovic, Th3d3v1ls

[Wikidata-bugs] [Maniphest] [Claimed] T166524: high replication lag on wdqs1002

2017-06-01 Thread Gehel
Gehel claimed this task.Gehel added a comment. thermal paste has been added by @Cmjohnson, this can be closed.TASK DETAILhttps://phabricator.wikimedia.org/T166524EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: hoo, Jonas, Lydia_Pintscher, RobH

[Wikidata-bugs] [Maniphest] [Retitled] T166780: Replace wdqs100[12] servers as they are getting old

2017-06-01 Thread Gehel
Gehel created this task.Gehel added projects: hardware-requests, Wikidata-Query-Service.Herald added projects: Operations, Wikidata, Discovery. TASK DESCRIPTIONWe had a recent issue on wdqs1002 (T166524) with CPU overheating. After a chat with @RobH and since wdqs1001 and wdqs1002 are almost 5

[Wikidata-bugs] [Maniphest] [Reassigned] T166524: high replication lag on wdqs1002

2017-06-01 Thread Gehel
Gehel moved this task from In progress to Done on the Discovery-Search (Current work) board.Gehel reassigned this task from Gehel to Cmjohnson.Gehel added a comment. wdqs1002 has not had any issue since then. Hardware request is done on a separate ticket. It still probably makes sense to have

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-09-15 Thread Gehel
Gehel added a comment. Looking at wdqs-updater GC logs on wdqs1004, in the last 7 days: heap before GC peaks at ~1.4GB (with a few higher peaks at ~2GB); heap after full GC is ~512MB; max heap size is configured at 2GB; allocation rate over that period is ~70MB/s (but probably peaks much higher

[Wikidata-bugs] [Maniphest] [Created] T175919: investigate GC times on wikidata query service

2017-09-14 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONA quick look at GC logs on wdqs1004 shows that almost 5% of time is spent in GC. That seems

[Wikidata-bugs] [Maniphest] [Updated] T176192: WDQS clone keeps freezing up on GC

2017-09-19 Thread Gehel
Gehel added a comment. I had a chat with @Yurik to dig a bit more into this issue. He is seeing a 75% GC overhead, which seems very different from the behaviour we observe on our own wdqs instances. We are doing our own investigation into GC tuning (T175919). I did give @Yurik a few ideas to try

[Wikidata-bugs] [Maniphest] [Declined] T176192: WDQS clone keeps freezing up on GC

2017-09-19 Thread Gehel
Gehel closed this task as "Declined". TASK DETAILhttps://phabricator.wikimedia.org/T176192EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Yurik, Smalyshev, Gehel, Aklapper, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, mer

[Wikidata-bugs] [Maniphest] [Commented On] T176160: Puppet fails on newly created WDQS labs instance

2017-09-19 Thread Gehel
Gehel added a comment. My bad... There is now a role::wdqs::labs class that can be applied on VMs on Horizon. This class includes everything in role::wdqs except the few things that should not be enabled in WMCS. I already applied the class and ran puppet. There is still an error, since

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread Gehel
Gehel added a comment. New GC configuration deployed, all servers restarted. I'll wait a few hours and I'll have a look at GC logs to find out if we see improvements. For reference, the JVM options before: -XX:+UseG1GC -Xms12g -Xmx12g -Xloggc:/var/log/wdqs/wdqs-blazegraph_jvm_gc.%p.log -XX

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. See F9995047: wdqs gc logs for an example of problematic GC log.TASK DETAILhttps://phabricator.wikimedia.org/T175919EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Stashbot, gerritbot, Smalyshev, Gehel, Aklapper, Gq86, Lordiis

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. For reference: JVM:
gehel@wdqs1004:~$ java -version
openjdk version "1.8.0_141"
OpenJDK Runtime Environment (build 1.8.0_141-8u141-b15-1~bpo8+1-b15)
OpenJDK 64-Bit Server VM (build 25.141-b15, mixed mode)
JVM options (full command line): java -server -XX:+UseG1

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. Using a demo version of JClarity Censum, I see the following problems. Premature promotion: there are a number of possible causes for this problem: survivor spaces are too small; young gen is too small; the -XX:MaxTenuringThreshold flag may have been set too low

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-03 Thread Gehel
Gehel added a comment. It would be interesting to see if allocation rate goes up when we see the JVM locking up. gceasy does not graph allocation rate over time, but I think that JClarity Censum might do that (closed source, but demo license available).TASK DETAILhttps://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. Transparent huge pages seem to be disabled:
gehel@wdqs1004:~$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
Patch coming up to test other options.TASK DETAILhttps://phabricator.wikimedia.org/T175919EMAIL PREFERENCEShttps
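If other THP modes need to be tested, the mode can be switched at runtime via the same sysfs knob (not persistent across reboots); this is a generic example, not the actual patch referred to above:
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled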

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-04 Thread Gehel
Gehel added a comment. A thread on Friends of JClarity suggests: adding -XX:+ParallelRefProcEnabled (since we seem to have really high Reference processing times); turning off PrintTenuringThreshold, since it is rarely useful with G1; checking that transparent huge pages are disabled; activating
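As a sketch only, combining those suggestions with the flags quoted earlier in this task (heap size and GC log path come from the "options before" comment; the rest of the blazegraph command line is unchanged and omitted, and -XX:+PrintTenuringThreshold would simply be dropped):
    GC_FLAGS="-XX:+UseG1GC -Xms12g -Xmx12g -XX:+ParallelRefProcEnabled \
      -Xloggc:/var/log/wdqs/wdqs-blazegraph_jvm_gc.%p.log"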

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-03 Thread Gehel
Gehel added a comment. Comments inline. Keep in mind that my understanding of GC is limited, I am most probably wrong in a lot of what I write below. And my understanding of G1 is even more limited... In T175919#3647588, @Smalyshev wrote: So I took a look at our logs from Sep 29 with http

[Wikidata-bugs] [Maniphest] [Changed Project Column] T175919: investigate GC times on wikidata query service

2017-10-10 Thread Gehel
Gehel moved this task from In progress to Done on the Discovery-Wikidata-Query-Service-Sprint board.Gehel added a comment. Looking at a few days of logs, after the recent configuration changes: overall, things look better; GC overhead is under control, < 5%; we still have long GC pauses (betw

[Wikidata-bugs] [Maniphest] [Created] T178271: Allow Kirk and Martijn (JClarity) access to our WDQS production servers

2017-10-16 Thread Gehel
Gehel created this task.Gehel added projects: Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint.Herald added a subscriber: Aklapper.Herald added projects: Wikidata, Discovery. TASK DESCRIPTIONAfter an interesting discussion on the Friends of JClarity Google group, Martijn Verburg

[Wikidata-bugs] [Maniphest] [Commented On] T175948: Add normalized predicates to Blazegraph vocabulary

2017-10-16 Thread Gehel
Gehel added a comment. In T175948#3630631, @Smalyshev wrote: The tricky part here is that once we deploy the change, we'd need to reload immediately (since the dictionary would be incompatible). But we don't want to take all servers down at the same time. So we need a way to run server

[Wikidata-bugs] [Maniphest] [Edited] T178271: Allow Kirk and Martijn (JClarity) access to our WDQS production servers

2017-10-16 Thread Gehel
Gehel updated the task description. (Show Details) CHANGES TO TASK DESCRIPTION...[ ] [[ https://www.mediawiki.org/wiki/Developer_access | Request developer access ]] (should be done directly by Martijn and Kirk, does not need the NDA to be signed, so we can move forward here already) [ ] [[ https

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-05 Thread Gehel
Gehel added a comment. Looking at Grafana, we can already see a decrease in overall GC time. Looking good!TASK DETAILhttps://phabricator.wikimedia.org/T175919EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Stashbot, gerritbot, Smalyshev, Gehel

[Wikidata-bugs] [Maniphest] [Commented On] T175919: investigate GC times on wikidata query service

2017-10-11 Thread Gehel
Gehel added a comment. According to Kirk Pepperdine, we might have run into a G1 bug... I'll try to help the JVM guys debug it, and we might get a long term solution at some point...TASK DETAILhttps://phabricator.wikimedia.org/T175919EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T172710: send wdqs logs to logstash

2017-08-29 Thread Gehel
Gehel added a comment. Oh, I was expecting %{HOSTNAME} to be interpreted by logstash itself, not as a ref in the same document. There is something about HOSTNAME being lazy-loaded in recent versions of logback. I'll try to find the reference...TASK DETAILhttps://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T171210: rack/setup/install wdqs100[45].eqiad.wmnet

2017-09-11 Thread Gehel
Gehel added a comment. initial data import is done, wdqs100[45] can now be pooled.TASK DETAILhttps://phabricator.wikimedia.org/T171210EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: ops-monitoring-bot, gerritbot, PokestarFan, Aklapper, Gehel

[Wikidata-bugs] [Maniphest] [Updated] T175017: WDQS - MWApiServiceCall / MWAPI REQUEST

2017-09-06 Thread Gehel
Gehel added a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAILhttps://phabricator.wikimedia.org/T175017EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: gerritbot, Aklapper, Smalyshev, Gehel, Lordiis, Lucas_Werkmeister_WMDE

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T172774: Review users being throttled

2017-09-06 Thread Gehel
Gehel added a subscriber: EBjune.Gehel added a comment. @EBjune: you should have access to kibana already. You need to use your wikitech (LDAP) username. The authorization is done on the ops / nda / wmf groups, and I expect that you are already in the nda and wmf groups. Ping me when you
