[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-07 Thread Stashbot
Stashbot added a comment. Mentioned in SAL [2016-05-07T12:32:44Z] Restarted blazegraph on wdqs1002 (Unresponsive, even locally: java.io.IOException: Too many open files) https://phabricator.wikimedia.org/T134238

2016-05-06 Thread Smalyshev
Smalyshev added a comment. Looks like the real cause is here: https://jira.blazegraph.com/browse/BLZG-1884 I'll add the patch as soon as it's released.

2016-05-05 Thread Gehel
Gehel added a comment. CLOSE_WAIT is when the TCP stack is waiting for the application to acknowledge that the connection is terminated. So yes, a high number of CLOSE_WAIT is an indication of a problem. If Blazegraph (or Jetty, more precisely) is frozen, that could explain it. If I read the
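A quick overview of the socket states can be taken with a one-liner along these lines (a sketch, assuming the usual net-tools netstat used elsewhere in this thread is available on the host):

  # count TCP connections per state; a large CLOSE_WAIT count points at the application not closing sockets
  sudo netstat -tan | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn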

2016-05-05 Thread Smalyshev
Smalyshev added a comment. TIME_WAIT seems to be normal, but I also saw a very high number of CLOSE_WAIT ones when Blazegraph was stalling. I'm not sure if that's the cause or the symptom, but this seems to be what was keeping the file handles open and causing "too many open files".
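To confirm which process is actually holding those CLOSE_WAIT sockets (a sketch, assuming lsof is installed on the host), they can be grouped by owning command and PID:

  # list CLOSE_WAIT TCP sockets and count them per process
  sudo lsof -nP -iTCP -sTCP:CLOSE_WAIT | awk 'NR>1 {print $1, $2}' | sort | uniq -c | sort -rn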

2016-05-05 Thread Gehel
Gehel added a comment. The number of established connections on nginx is low and they all seem to be coming from varnish (I checked a sample over a few minutes):
gehel@wdqs1002:/etc/nginx$ sudo netstat -pn | grep 10.64.32.183:80 | grep ESTABLISHED | wc -l
3
Number of connections in
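To double-check where those established connections come from (a sketch building on the command above; it assumes IPv4 foreign addresses), they can be grouped by remote host:

  # count ESTABLISHED connections to the nginx listener per remote address
  sudo netstat -tn | grep 10.64.32.183:80 | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn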

2016-05-05 Thread Gehel
Gehel added a comment. Random things I see (and have no idea if they are relevant to the issue):
- We have holes in Graphite data for unrelated metrics (for example load average). Those holes do not completely coincide with blazegraph being unavailable, but there seem to be some

2016-05-05 Thread hoo
hoo added a comment. I looked into it myself very briefly today (although I'm constantly distracted as I'm at a conference). When the service via query.wikidata.org is slow or even down, I usually still get very fast responses when querying the server locally (via curl on the host,
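For reference, a local check of that kind could look as follows (a sketch only; the port and namespace path are assumptions based on a default Blazegraph setup, not taken from this thread):

  # query the local Blazegraph SPARQL endpoint directly, bypassing nginx/varnish
  curl -s -G 'http://localhost:9999/bigdata/namespace/wdq/sparql' \
       --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' \
       -H 'Accept: application/sparql-results+json'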

2016-05-05 Thread Yurik
Yurik added a comment. Probably 0 at the moment - I don't think there are any interactive graphs out there that use sparql yet. Only a few non-interactive samples (btw, there is another bug there - the order of latitude &

2016-05-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL [2016-05-04T16:19:14Z] restarting blazegraph (https://phabricator.wikimedia.org/T134238)

2016-05-04 Thread Stashbot
Stashbot added a comment. Mentioned in SAL [2016-05-04T12:34:11Z] restarting blazegraph (https://phabricator.wikimedia.org/T134238)

2016-05-03 Thread Gehel
Gehel added a comment. So the FD issue is probably a side issue that was detected because we were looking more closely than usual. The main cause of this outage is most probably a traffic peak as seen in Grafana

2016-05-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL [2016-05-03T21:11:17Z] restarting wdqs1002 (https://phabricator.wikimedia.org/T134238)

2016-05-03 Thread Smalyshev
Smalyshev added a comment. Looks like the updater is slowly leaking FDs, at a rate of 2-3 FDs per minute. Eventually the leaked ones accumulate, but for now restarting the updater once every day or two would eliminate the problem. I'll try to see where it comes from.
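A rough way to watch that leak rate (a sketch; the process name used to find the updater's PID is an assumption and should be adjusted to the real service):

  # log the updater's open-FD count once a minute
  PID=$(pgrep -f wdqs-updater | head -1)   # assumed process name
  while true; do
    echo "$(date -u +%FT%TZ) $(sudo ls /proc/$PID/fd | wc -l)"
    sleep 60
  done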

2016-05-03 Thread Gehel
Gehel added a comment. The previous analysis is mostly wrong. File descriptors seem to be under control and NOT jumping from 5k to 100k (if that were the case, I would be wary of raising the limit that far).

2016-05-03 Thread Joe
Joe added a comment. @gehel what about unsetting the file descriptor limit for wdqs? If those are due to real use and not an fd leak, I mean.
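If the limit were raised or removed, one way to do it (a sketch, assuming the service is managed by systemd; the unit name wdqs-blazegraph and the drop-in path are assumptions) would be a drop-in override:

  # /etc/systemd/system/wdqs-blazegraph.service.d/filelimit.conf (hypothetical path)
  [Service]
  LimitNOFILE=infinity

  # then reload systemd and restart the service
  sudo systemctl daemon-reload && sudo systemctl restart wdqs-blazegraph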

2016-05-03 Thread Stashbot
Stashbot added a comment. Mentioned in SAL [2016-05-03T17:18:08Z] restarting blazegraph on wdqs1002 (https://phabricator.wikimedia.org/T134238)

2016-05-03 Thread Gehel
Gehel added a comment. With the updater shut down, it seems that the number of open pipes stays stable at 5-8K. I did not take those measurements long enough to be confident about the correlation, but that's at least an idea. Of course, keeping the updater down is not a solution...
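The measurement referred to here can be taken roughly like this (a sketch, assuming lsof is installed; it counts anonymous pipe FDs held by java processes):

  # count pipe (FIFO) file descriptors across all java processes
  sudo lsof -nP -c java 2>/dev/null | awk '$5 == "FIFO"' | wc -l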

2016-05-03 Thread Gehel
Gehel added a comment. WDQS is down again. 32K file handles are taken by Java processes. More investigation needed.
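A per-process breakdown can show which java process actually holds those handles (a sketch using /proc, assuming root access):

  # print PID, open-FD count and command name for every java process
  for pid in $(pgrep java); do
    printf '%s %s %s\n' "$pid" "$(sudo ls /proc/$pid/fd 2>/dev/null | wc -l)" "$(ps -o comm= -p $pid)"
  done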