[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-04-28 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Sam Reed (reedy) s...@reedyboy.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Sam Reed (reedy) s...@reedyboy.net ---
Tampa is deaded

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-04-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Andre Klapper aklap...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|PATCH_TO_REVIEW             |NEW

--- Comment #14 from Andre Klapper aklap...@wikimedia.org ---
All patches merged; resetting ticket status

(In reply to Andre Klapper from comment #13)
> Patch merged for Eqiad; somebody needs to decide whether it's worth doing
> something about Tampa or not (and close the ticket accordingly).



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #13 from Andre Klapper aklap...@wikimedia.org ---
Patch merged for Eqiad; somebody needs to decide whether it's worth doing
something about Tampa or not (and close the ticket accordingly).



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #10 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 113387 had a related patch set uploaded by Aaron Schulz:
Set retry_timeout to -1 for memcached in eqiad only

https://gerrit.wikimedia.org/r/113387



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Gerrit Notification Bot gerritad...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |PATCH_TO_REVIEW



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #11 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 113387 merged by jenkins-bot:
Set retry_timeout to -1 for memcached in eqiad only

https://gerrit.wikimedia.org/r/113387



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #12 from Aaron Schulz aschulz4...@gmail.com ---
(In reply to Gerrit Notification Bot from comment #10)
> Change 113387 had a related patch set uploaded by Aaron Schulz:
> Set retry_timeout to -1 for memcached in eqiad only
>
> https://gerrit.wikimedia.org/r/113387

This doesn't work for Tampa, so most errors will remain. I guess they can
probably be ignored.



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2014-02-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #9 from Aaron Schulz aschulz4...@gmail.com ---
I set the timeout to -1 in production, which is kind of hacky judging by the C
code, but should work for now. Tim's upstream patch seems to be merged as well.



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Andre Klapper aklap...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |High

--- Comment #6 from Andre Klapper aklap...@wikimedia.org ---
[Tim: Your browser seems to reset the Priority field.]



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #7 from Tim Starling tstarl...@wikimedia.org ---
A potential workaround is to reduce the retry timeout to zero. I have written a
libmemcached patch which allows this. Faidon will deploy it. The patch has been
submitted upstream at https://bugs.launchpad.net/libmemcached/+bug/1251482



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #8 from Tim Starling tstarl...@wikimedia.org ---
(In reply to comment #6)
> [Tim: Your browser seems to reset the Priority field.]

Maybe priority changes don't trigger the mid-air collision page.



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Andre Klapper aklap...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |High



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Tim Starling tstarl...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|High                        |Unprioritized

--- Comment #5 from Tim Starling tstarl...@wikimedia.org ---
Here's an aggregation of error messages in the current memcached-serious.log:

      1  ITEM TOO BIG
   2081  CONNECTION FAILURE
8199483  SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
      9  SERVER ERROR
    101  A TIMEOUT OCCURRED
      3  A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE

If you exclude TIMED RETRY messages, you get only 27 different appservers,
whereas there are 387 appservers in the log overall. CONNECTION FAILURE and A
TIMEOUT OCCURRED can lead to a flood of TIMED RETRY messages afterwards, but
obviously they can't account for most of the TIMED RETRY messages. In fact, all
2081 CONNECTION FAILURE messages in this log came from snapshot3. So we need to
look for unlogged causes of servers being marked for timed retry.
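
A tally like the one above, together with the per-appserver comparison, could
be produced with a short script. The following is only a minimal sketch in
Python. It assumes a hypothetical line layout of
"<date> <time> <host> <wiki>: <text ending in the error string>"; the real
memcached-serious.log format is not shown in this report, so the host field
index and the matching would need adjusting. The function name summarize() is
made up for illustration.

```python
from collections import Counter

# Error strings copied from the aggregation above.
TIMED_RETRY = "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY"
KNOWN_ERRORS = [
    "ITEM TOO BIG",
    "CONNECTION FAILURE",
    TIMED_RETRY,
    "SERVER ERROR",
    "A TIMEOUT OCCURRED",
    "A BAD KEY WAS PROVIDED/CHARACTERS OUT OF RANGE",
]


def summarize(lines):
    """Return (count per error, all hosts seen, hosts with a non-TIMED-RETRY error)."""
    counts = Counter()
    all_hosts = set()
    other_hosts = set()
    for line in lines:
        text = line.rstrip()
        fields = text.split()
        if len(fields) < 3:
            continue  # too short to carry a host field
        host = fields[2]  # ASSUMPTION: third field is the appserver name
        error = next((e for e in KNOWN_ERRORS if text.endswith(e)), None)
        if error is None:
            continue  # not one of the known error strings
        counts[error] += 1
        all_hosts.add(host)
        if error != TIMED_RETRY:
            other_hosts.add(host)
    return counts, all_hosts, other_hosts
```

Comparing len(other_hosts) with len(all_hosts) gives the kind of contrast
described above (27 appservers versus 387), and filtering on a single error
string shows whether one host, such as snapshot3, accounts for all of its
occurrences.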



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Rob Lanphier ro...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |High
                 CC|                            |ro...@wikimedia.org
           Assignee|wikibugs-l@lists.wikimedia. |tstarl...@wikimedia.org
                   |org                         |



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Tim Starling tstarl...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|High                        |Unprioritized

--- Comment #3 from Tim Starling tstarl...@wikimedia.org ---
(In reply to comment #2)
> The keys enwiki:newtalk:ip:10.0.0.14 and enwiki:newtalk:ip:10.0.0.13 are
> preponderant in the logs and odd to behold. Possibly a different bug that is
> getting exposed by this one.

Those are LVS servers. Pybal is regularly requesting
http://en.wikipedia.org/wiki/Main_Page on all backends, which is essentially
the only MW traffic to the pmtpa apaches.



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #4 from Tim Starling tstarl...@wikimedia.org ---
The error message indicates that a libmemcached memcached_connect() call gave a
result of MEMCACHED_SERVER_TEMPORARILY_DISABLED. This can happen if
memcached_mark_server_for_timeout() was called on the server, which can happen
if memcached_quit_server is called with io_death=true, which can happen in all
sorts of different cases in io.cc and in one case in response.cc. More
investigation is needed to determine exactly which case is causing it. 

Unfortunately, MEMCACHED_BEHAVIOR_RETRY_TIMEOUT (i.e. retry_timeout in our
config) has a minimum of one whole second. In the case of PHP talking to
twemproxy, immediate reconnection would probably be a better policy. The
one-second timeout is why we see floods of messages in bursts that last
approximately one second each.

The fact that pmtpa apaches responding to pybal monitoring requests are heavily
represented in the logs may be caused by transient packet loss on the pmtpa to
eqiad link, which would increase the rate of connection timeouts.
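
To illustrate why a one-second retry timeout produces bursts of messages, here
is a toy model in Python. It is not libmemcached code: the class ToyServer and
the function count_disabled() are made up for illustration, and the model only
assumes that a server marked for timeout rejects every connect attempt until
the retry time has passed.

```python
class ToyServer:
    """Toy stand-in for one server entry; NOT the libmemcached implementation."""

    def __init__(self, retry_timeout):
        self.retry_timeout = retry_timeout  # seconds, like retry_timeout in config
        self.next_retry = None  # earliest time a reconnect may be attempted

    def mark_for_timeout(self, now):
        # Models the effect of the server being marked after an I/O failure.
        self.next_retry = now + self.retry_timeout

    def connect(self, now):
        # While the retry time has not been reached, fail without trying.
        if self.next_retry is not None and now < self.next_retry:
            return "TEMPORARILY_DISABLED"
        self.next_retry = None
        return "SUCCESS"


def count_disabled(retry_timeout, requests_per_second, duration):
    """Count rejected connects after a single failure at time 0."""
    server = ToyServer(retry_timeout)
    server.mark_for_timeout(now=0.0)
    disabled = 0
    for i in range(int(requests_per_second * duration)):
        now = i / requests_per_second
        if server.connect(now) == "TEMPORARILY_DISABLED":
            disabled += 1
    return disabled
```

In this model, count_disabled(1, 1000, 3) gives 1000: a single failure turns
into one second's worth of rejected requests, each of which would be logged.
With a retry timeout of zero, count_disabled(0, 1000, 3) gives 0, which
corresponds to the immediate-reconnection policy suggested above.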



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

Ori Livneh o...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |o...@wikimedia.org

--- Comment #1 from Ori Livneh o...@wikimedia.org ---
Possibly related to https://bugs.launchpad.net/libmemcached/+bug/928696



[Bug 56882] memcached-serious log flooded with TIMED RETRY errors

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56882

--- Comment #2 from Ori Livneh o...@wikimedia.org ---
The keys enwiki:newtalk:ip:10.0.0.14 and enwiki:newtalk:ip:10.0.0.13 are
preponderant in the logs and odd to behold. Possibly a different bug that is
getting exposed by this one.
