[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-07 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-07T12:32:44Z]  Restarted blazegraph on 
wdqs1002 (Unresponsive, even locally: java.io.IOException: Too many open files) 
https://phabricator.wikimedia.org/T134238

TASK DETAIL
  https://phabricator.wikimedia.org/T134238

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev, Stashbot
Cc: Gehel, aude, Yurik, Smalyshev, Joe, Stashbot, Envlh, JanZerebecki, hoo, 
Frog23, Karima, Aklapper, Yair_rand, Zppix, Avner, debt, D3r1ck01, FloNight, 
Izno, jkroll, Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-06 Thread Smalyshev
Smalyshev added a comment.


  Looks like the real cause is here: 
https://jira.blazegraph.com/browse/BLZG-1884
  
  I'll add the patch as soon as it's released.



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread Gehel
Gehel added a comment.


  CLOSE_WAIT is when the remote end has closed the connection and the TCP stack
is waiting for the local application to acknowledge it by closing its socket.
So yes, a high number of CLOSE_WAIT connections is an indication of a problem.
If blazegraph (or more precisely Jetty) is frozen, that could explain it. If I
read the code / stacktraces correctly, blazegraph is using only async IO, so
the reasons I see for a delay in leaving CLOSE_WAIT are an overloaded HTTP
thread pool or a completely frozen JVM (GC going crazy?). There are probably
other scenarios, but I can't think of any at the moment.
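
  One rough way to watch for this is to tally socket states from
`netstat -tan` output. The sketch below is illustrative only: the function
name, addresses and sample lines are hypothetical (not taken from wdqs1002),
and it assumes the Linux net-tools column layout in which the state is the
sixth column.

```python
from collections import Counter

def count_tcp_states(netstat_lines):
    """Tally TCP connection states from `netstat -tan`-style lines.

    Assumes the Linux net-tools layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    states = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers and non-TCP rows; TCP rows start with tcp or tcp6.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states

# Hypothetical sample lines, made up for illustration.
sample = [
    "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
    "tcp        0      0 10.0.0.1:80             10.0.0.2:51000          ESTABLISHED",
    "tcp        1      0 10.0.0.1:8080           10.0.0.1:42000          CLOSE_WAIT",
    "tcp        1      0 10.0.0.1:8080           10.0.0.1:42002          CLOSE_WAIT",
    "tcp        0      0 10.0.0.1:80             10.0.0.2:51002          TIME_WAIT",
]
print(count_tcp_states(sample))
```

  A CLOSE_WAIT count that keeps growing between runs would point at sockets
the application never closes, each of which holds a file descriptor.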



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread Smalyshev
Smalyshev added a comment.


  The TIME_WAIT count seems to be normal, but I also saw a very high number of
CLOSE_WAIT connections when Blazegraph was stalling. I'm not sure if it's the
cause or the symptom, but this seems to be what was holding the file handles
and causing "too many open files".



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread Gehel
Gehel added a comment.


  The number of established connections on nginx is low, and they all seem to
come from varnish (I checked a sample over a few minutes):
  
gehel@wdqs1002:/etc/nginx$ sudo netstat -pn | grep 10.64.32.183:80 | grep ESTABLISHED | wc -l
3
  
  The number of connections in TIME_WAIT is consistent with the number of
requests seen for the last minute in the nginx access.log.



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread Gehel
Gehel added a comment.


  Random things I see (and have no idea if they are relevant to the issue):
  
  - We have holes in Graphite data for unrelated metrics (for example load
average). Those holes do not completely coincide with blazegraph being
unavailable, but there seems to be some correlation.
  - Thread dumps taken while blazegraph was unresponsive show only a few
threads doing actual work, no stuck threads and no deadlocks at the JVM level.
  - Blazegraph stops sending any logs (unusual, blazegraph is usually pretty
verbose).



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread hoo
hoo added a comment.


  I looked into it myself very briefly today (although I'm constantly 
distracted as I'm at a conference).
  
  When the service via query.wikidata.org is slow or even down, I usually still
get very fast responses when querying the server locally (via curl on the host,
using the nginx at port 80). I suspect we have a problem where either nginx is
not allowing more than N active workers for the misc web Varnish (but still
allows them for other hosts or specifically localhost), or Varnish doesn't want
to have more than N open connections to the service and thus makes the other
connections wait.



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-05 Thread Yurik
Yurik added a comment.


  Probably 0 at the moment - I don't think there are any interactive graphs out
there that use sparql yet.  Only a few non-interactive samples (btw, there is
another bug there - the order of latitude & longitude seems to have been
switched.)



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-04 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-04T16:19:14Z]  restarting blazegraph 
(https://phabricator.wikimedia.org/T134238)



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-04 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-04T12:34:11Z]  restarting blazegraph 
(https://phabricator.wikimedia.org/T134238)



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Gehel
Gehel added a comment.


  So the FD issue is probably a side issue that was detected because we were
looking more closely than usual. The main cause of this outage is most probably
a traffic peak, as seen in Grafana, made worse by the fact that we run on a
single server. If that's really the case, it shows that we run too close to our
load limit (peaks are around 2-3x normal load).



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-03T21:11:17Z]  restarting wdqs1002 
(https://phabricator.wikimedia.org/T134238)



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Smalyshev
Smalyshev added a comment.


  Looks like the updater is slowly leaking FDs, at a rate of 2-3 FDs per
minute. Eventually the leaked ones accumulate, but for now restarting the
updater every day or two would eliminate the problem. I'll try to see where
it comes from.
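
  A leak rate like this can be estimated from a handful of periodic FD counts.
The sketch below fits a least-squares slope and projects how long until a limit
would be reached. The readings and the limit of 32768 are hypothetical numbers
chosen for illustration, not measurements from wdqs1002, and the function names
are made up.

```python
def fd_leak_rate(samples):
    """Least-squares slope of open-FD count over time, in FDs per minute.

    samples: list of (minutes_elapsed, open_fd_count) pairs, with at least
    two distinct times.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

def minutes_until_limit(current_fds, limit, rate_per_minute):
    """Minutes before the FD limit is hit, assuming a constant leak rate."""
    if rate_per_minute <= 0:
        return float("inf")  # not leaking, so the limit is never reached
    return (limit - current_fds) / rate_per_minute

# Hypothetical readings: (minutes since start, open FDs).
samples = [(0, 5000), (10, 5025), (20, 5050), (30, 5075)]
rate = fd_leak_rate(samples)
print(rate)  # 2.5
# Assumed limit of 32768 FDs; the real limit on the host may differ.
print(minutes_until_limit(5075, 32768, rate) / 60, "hours")
```

  The projection only holds if the leak stays roughly linear, which is an
assumption rather than something the measurements establish.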



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Gehel
Gehel added a comment.


  The previous analysis is mostly wrong. File descriptors seem under control
and are NOT jumping from 5k to 100k (if that were the case, I would be wary of
raising the limit that far).



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Joe
Joe added a comment.


  @gehel what about unsetting the file descriptor limit for wdqs? If those are 
due to real use and not fd leaking, I mean.



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Stashbot
Stashbot added a comment.


  Mentioned in SAL [2016-05-03T17:18:08Z]  restarting blazegraph on 
wdqs1002 (https://phabricator.wikimedia.org/T134238)



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Gehel
Gehel added a comment.


  With the updater shut down, the number of open pipes seems to stay stable at
5-8K. I did not take those measurements for long enough to be confident about
the correlation, but that's at least an idea. Of course, keeping the updater
down is not a solution...
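
  For repeating this kind of measurement, one option is to count a process's
open descriptors by type. The sketch below reads the symlinks under
/proc/<pid>/fd, which is Linux-specific and needs permission to inspect the
target process. The function names and the example targets are hypothetical,
and this is not necessarily how the numbers above were obtained.

```python
import os
from collections import Counter

def classify_fd_target(target):
    """Map a /proc/<pid>/fd symlink target to a coarse FD type."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    return "file"

def count_fd_types(pid):
    """Count the open FDs of a process by type (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[classify_fd_target(target)] += 1
    return counts

# Hypothetical symlink targets, as they typically appear on Linux.
for target in ["pipe:[123456]", "socket:[654321]", "/var/log/example.log"]:
    print(target, "->", classify_fd_target(target))

# Only touch /proc where it exists; inspect this process as a harmless demo.
if os.path.isdir("/proc/self/fd"):
    print(count_fd_types(os.getpid()))
```

  Sampling these counts with the updater running and stopped would give a
firmer basis for the correlation than a short observation.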



[Wikidata-bugs] [Maniphest] [Commented On] T134238: Query service fails with "Too many open files"

2016-05-03 Thread Gehel
Gehel added a comment.


  WDQS down again. 32K file handles taken by Java processes. More investigation 
needed.
