https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #25 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167553 merged by Ottomata:
Require 2 ACKs from kafka brokers per default
https://gerrit.wikimedia.org/r/167553
--
You are receiving this mail because:
You are
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Bug 69667 depends on bug 72550, which changed state.
Bug 72550 Summary: analytics1021 getting kicked out of kafka partition leader
role on 2014-10-27 ~07:12
https://bugzilla.wikimedia.org/show_bug.cgi?id=72550
What|Removed
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||72550
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #24 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167552 merged by Ottomata:
Require 2 ACKs from kafka brokers for bits caches
https://gerrit.wikimedia.org/r/167552
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #23 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167551 merged by Ottomata:
Require 2 ACKs from kafka brokers for text caches
https://gerrit.wikimedia.org/r/167551
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #21 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167550 merged by Ottomata:
Require 2 ACKs from kafka brokers for mobile caches
https://gerrit.wikimedia.org/r/167550
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #22 from Andrew Otto o...@wikimedia.org ---
Set vm.dirty_writeback_centisecs = 200 (was 500)
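For reference, this sysctl is expressed in hundredths of a second, so the change shortens the kernel's periodic writeback wakeup interval from 5 s to 2 s. The Python sketch below only illustrates that unit conversion and how the current value could be read from procfs on a Linux host. The helper names are made up for this example and are not taken from whatever was actually run on the brokers.

```python
from pathlib import Path

# Standard Linux procfs location for this sysctl.
SYSCTL_PATH = Path("/proc/sys/vm/dirty_writeback_centisecs")


def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a value in hundredths of a second to seconds."""
    return centisecs / 100.0


def read_writeback_interval(path: Path = SYSCTL_PATH):
    """Return the current setting in centisecs, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Values quoted in the comment above: 500 before, 200 after.
    for label, value in (("old", 500), ("new", 200)):
        print(f"{label}: {value} centisecs = {centisecs_to_seconds(value):.1f} s")
    current = read_writeback_interval()
    if current is not None:
        print(f"this host: {current} centisecs")
```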
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #17 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167550 had a related patch set uploaded by QChris:
Require 2 ACKs from kafka brokers for mobile caches
https://gerrit.wikimedia.org/r/167550
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #18 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167551 had a related patch set uploaded by QChris:
Require 2 ACKs from kafka brokers for text caches
https://gerrit.wikimedia.org/r/167551
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #19 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167552 had a related patch set uploaded by QChris:
Require 2 ACKs from kafka brokers for bits caches
https://gerrit.wikimedia.org/r/167552
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #20 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 167553 had a related patch set uploaded by QChris:
Require 2 ACKs from kafka brokers per default
https://gerrit.wikimedia.org/r/167553
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Bug 69667 depends on bug 72252, which changed state.
Bug 72252 Summary: Raw webrequest partitions for 2014-10-20T02:xx:xx not marked
successful
https://bugzilla.wikimedia.org/show_bug.cgi?id=72252
What|Removed
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||72252
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||72028
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #14 from christ...@quelltextlich.at ---
Happened again on 2014-10-13 around 13:37:31.
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #15 from Toby Negrin tneg...@wikimedia.org ---
Hi Christian - would you say that the fix did not solve this problem?
thanks,
-Toby
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #16 from christ...@quelltextlich.at ---
(In reply to Toby Negrin from comment #15)
> would you say that the fix did not solve this problem?
If you are referring to the change from comment 13, it totally did the trick.
For the
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||71876
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Bug 69667 depends on bug 71876, which changed state.
Bug 71876 Summary: Raw webrequest partitions for 2014-10-08T23:xx:xx not marked
successful
https://bugzilla.wikimedia.org/show_bug.cgi?id=71876
What|Removed
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on|71876 |
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Bug 69667 depends on bug 71425, which changed state.
Bug 71425 Summary: Raw webrequest partitions for 2014-09-28T04:xx:xx not marked
successful
https://bugzilla.wikimedia.org/show_bug.cgi?id=71425
What|Removed
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||71425
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #13 from Gerrit Notification Bot gerritad...@wikimedia.org ---
Change 163744 had a related patch set uploaded by QChris:
Force ACKs from all in-sync kafka replicas
https://gerrit.wikimedia.org/r/163744
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Gerrit Notification Bot gerritad...@wikimedia.org changed:
What|Removed |Added
Status|NEW
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #12 from Andrew Otto o...@wikimedia.org ---
This happened again on 2014-09-06 around 19:43:22. Here are some relevant sar
captures:
https://gist.github.com/ottomata/ae372649afc914a6f606
Something is happening with paging stats,
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #11 from Andrew Otto o...@wikimedia.org ---
Yeah, strange indeed that this only happens on analytics1021. I *think* we
have seen this elsewhere before, but not often. And, I think not since the
cluster reinstall in July.
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #7 from Andrew Otto o...@wikimedia.org ---
I have done some more sleuthing on the zookeeper timeouts today, and I want to
record my thoughts.
- The timeout can happen when the broker is connected to any zookeeper.
- Timeouts
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #8 from christ...@quelltextlich.at ---
Adding some more records:
To rule out temporary basic network problems, I started a script to
ping all zookeepers from analytics1021 in an endless loop.
That produced results already:
There
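The script itself is not included in this comment. A minimal sketch of such a loop, assuming a Linux `ping` binary (iputils, where `-W` is a reply timeout in seconds) and placeholder host names, could look like this in Python. All names below are hypothetical and chosen for illustration only.

```python
import subprocess
import time
from datetime import datetime, timezone

# Placeholder host names; the real zookeeper hosts are not listed in the comment.
ZOOKEEPERS = ["zookeeper-a.example.org", "zookeeper-b.example.org"]


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request; True if a reply arrived in time."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def probe_round(hosts, probe=ping_once):
    """Probe every host once and return the ones that did not answer."""
    return [host for host in hosts if not probe(host)]


def monitor(hosts, interval_s: float = 1.0, rounds=None, probe=ping_once):
    """Loop forever (or for `rounds` iterations), logging hosts that miss a ping."""
    done = 0
    while rounds is None or done < rounds:
        for host in probe_round(hosts, probe):
            stamp = datetime.now(timezone.utc).isoformat()
            print(f"{stamp} no reply from {host}")
        done += 1
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor(ZOOKEEPERS)
```

Timestamping each missed reply makes it possible to line the network failures up against the zookeeper session timeouts in the broker logs.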
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #9 from christ...@quelltextlich.at ---
Created attachment 16376
-- https://bugzilla.wikimedia.org/attachment.cgi?id=16376&action=edit
analytics1021-2014-09-04T15-26-03.tar.gz
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #10 from christ...@quelltextlich.at ---
Trying to understand why it only happens on analytics1021 and not the
other brokers, the only difference that I could find is that
analytics1021 is the only broker that does not have
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #4 from christ...@quelltextlich.at ---
It happened again on:
2014-08-24 ~16:00 (with recovery for the next reading in
ganglia. Since ganglia shows a decrease of volume for that time,
it might align with the kafkatee
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #5 from christ...@quelltextlich.at ---
(In reply to christian from comment #4)
> 2014-08-24 ~16:00 (with recovery for the next reading in
Wrong day. It should read
2014-08-25 ~16:00 (with recovery for the next reading in
.
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #6 from christ...@quelltextlich.at ---
(In reply to Andrew Otto from comment #3)
> Today I increased kafka.queue.buffering.max.ms from 1 second to 5 seconds.
> Leader election can take up to 3 seconds. Hopefully this will solve the
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Depends on||70087
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
christ...@quelltextlich.at changed:
What|Removed |Added
Blocks|69244, 69665, 69666 |70087
Depends
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #3 from Andrew Otto o...@wikimedia.org ---
Today I increased kafka.queue.buffering.max.ms from 1 second to 5 seconds.
Leader election can take up to 3 seconds. Hopefully this will solve the second
bullet point above. We'll have
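The apparent reasoning is that the producer-side buffering window should be longer than the worst-case leader election, so that messages queued during an election are still held when a new leader becomes available. A rough Python illustration of that comparison, using only the numbers quoted in this comment, follows. The function name is hypothetical, and this is a simplification rather than a model of how the producer actually behaves.

```python
# "Leader election can take up to 3 seconds", as stated in the comment.
LEADER_ELECTION_MAX_MS = 3000


def outlasts_election(buffering_max_ms: int,
                      election_max_ms: int = LEADER_ELECTION_MAX_MS) -> bool:
    """True if the buffering window is longer than the worst-case election."""
    return buffering_max_ms > election_max_ms


if __name__ == "__main__":
    # Old and new values of kafka.queue.buffering.max.ms from the comment.
    for label, value_ms in (("old", 1000), ("new", 5000)):
        print(f"{label}: {value_ms} ms, outlasts election: {outlasts_election(value_ms)}")
```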
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
--- Comment #2 from Andrew Otto o...@wikimedia.org ---
The core of this issue is a timeout of the Zookeeper connection, which neither
Gage nor I have been able to solve.
Quick summary: Kafka brokers need to maintain a live connection with
https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
Toby Negrin tneg...@wikimedia.org changed:
What|Removed |Added
CC|