Ottomata added a comment.
Can we close this task?
TASK DETAIL
https://phabricator.wikimedia.org/T161731
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Ottomata
Cc: gerritbot, JAllemandou, Pchelolo, Ladsgroup, Nuria, Anomie, Aklapper,
Smalyshev,
Ottomata added a comment.
OO yes @Smalyshev, and in case you didn't see, we also increased retention of mediawiki topics to 31 days in the main kafka clusters.
Smalyshev added a comment.
@Nuria I don't see any immediate blockers so far.
Nuria added a comment.
Ping @Smalyshev: now that you have a reliable stream on the new kafka cluster (one that supports time-based consumption), are there any other blockers on your end?
Smalyshev added a comment.
yes, definitely
Nuria added a comment.
@Smalyshev Please, would 45 minutes with me and @Ottomata do?
Smalyshev added a comment.
@Nuria yes, mostly, though I do have some questions; maybe we should set up a short meeting to discuss them?
Nuria added a comment.
@Smalyshev Ok, we aim to have the cluster handling all prod traffic by end of next quarter; until then it will be mirroring data, which I think should be sufficient for you to get started on the wdqs consumer? Correct me if I am wrong.
Smalyshev added a comment.
@Nuria yes, consuming the data works.
Ottomata added a comment.
So, FYI, the timestamps as they are now are the timestamp that the kafka jumbo-eqiad cluster received the messages. These are replicated from the main-eqiad cluster, and might have a short (seconds usually, minutes max) delay.
Eventually (work not planned yet) we will up
Nuria added a comment.
Nice. Can @Smalyshev check whether consuming from these topics as set would work for his purposes?
Ottomata added a comment.
Woot, that did it ^. We need topics to default to LogAppendTime.
[@stat1005:/home/otto] $ ./kafkacat -Q -b kafka-jumbo1001.eqiad.wmnet:9092 -t eqiad.mediawiki.revision-create:0:151275919
eqiad.mediawiki.revision-create [0] offset 3658631
gerritbot added a comment.
Change 396439 merged by Ottomata:
[operations/puppet@production] Set default topic timestamp.type to LogAppendTime
https://gerrit.wikimedia.org/r/396439
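The merged change sets the broker-side default so topics stamp each message with the broker's local time on append, rather than trusting a (possibly missing) producer-supplied create time. In raw Kafka broker configuration that default corresponds to the fragment below; this is a sketch of the relevant setting, not the actual puppet diff:

```properties
# server.properties: default timestamp.type for newly created topics.
# LogAppendTime means the broker stamps messages on write, so time-based
# offset lookups (offsetsForTimes / kafkacat -Q) return real offsets
# even when producers never set a timestamp.
log.message.timestamp.type=LogAppendTime
```

Note that existing topics keep whatever `message.timestamp.type` they were created with; only the broker default for new topics changes.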
gerritbot added a comment.
Change 396439 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Set default topic timestamp.type to LogAppendTime
https://gerrit.wikimedia.org/r/396439
Ottomata added a comment.
Seems like the mirroring is done by 0.9 MirrorMaker and timestamp handling was added only in 0.10 MirrorMaker.
Hm, ya but I had thought that if a timestamp was not set by the producer, it would be set to server receive time. Maybe I was wrong!
Nuria added a comment.
I got same doing:
/home/otto/kafkacat -Q -b kafka-jumbo1003.eqiad.wmnet -t eqiad.mediawiki.revision-create:0:1512687299 -Xdebug=all
Pchelolo added a comment.
Hm, actually if I just try to consume from that topic (any topic, actually) with -F "%T", which should give me message timestamps, it gives -1 as well.
I suppose that the problem is that we're actually producing these messages into Kafka 0.9 and perhaps not specifying the timestamp.
Smalyshev added a comment.
/home/otto/kafkacat runs fine but -Q seems to return this for everything:
eqiad.mediawiki.revision-create [0] offset -1
Maybe I'm doing something wrong?
Ottomata added a comment.
You can easily 'quickbuild' kafkacat with a statically linked librdkafka. I've just done this on a stretch labs host, and copied the kafkacat binary to stat1005 at /home/otto/kafkacat. Try it out!
Smalyshev added a comment.
@Pchelolo thanks for the pointer, this is very helpful!
Indeed, kafkacat for example has supported it since a year ago. However, it looks like we have this version of kafkacat:
Copyright (c) 2014-2015, Magnus Edenhill
Version KAFKACAT_VERSION (JSON) (librdkafka 0.9.3)
which doesn't support it.
Pchelolo added a comment.
In T161731#3814596, @Smalyshev wrote:
@Ottomata thanks, I can connect to the hosts above, but still not sure how to control the starting point. I'll try to look around for clients that can do this.
The Java client has offsetsForTimes implemented and supports seek to an offset.
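To make the shape of that concrete, here is a minimal sketch of the offsetsForTimes + seek pattern in the Java client against the topic and brokers mentioned in this thread; the group id and one-day lookback are placeholder values, not anything agreed on here:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class SeekByTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker from the thread; group id is a made-up placeholder.
        props.put("bootstrap.servers", "kafka-jumbo1001.eqiad.wmnet:9092");
        props.put("group.id", "wdqs-updater-test");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp =
                new TopicPartition("eqiad.mediawiki.revision-create", 0);
            consumer.assign(Collections.singletonList(tp));

            // Ask the broker for the earliest offset whose timestamp is
            // >= this value (milliseconds since epoch).
            long startMs = System.currentTimeMillis() - Duration.ofDays(1).toMillis();
            Map<TopicPartition, Long> query = new HashMap<>();
            query.put(tp, startMs);
            Map<TopicPartition, OffsetAndTimestamp> offsets =
                consumer.offsetsForTimes(query);

            OffsetAndTimestamp oat = offsets.get(tp);
            if (oat != null) {
                // Resume consumption from that point in time.
                consumer.seek(tp, oat.offset());
            }
            // consumer.poll(...) from here on yields messages newer than startMs.
        }
    }
}
```

This only returns useful offsets once messages carry real timestamps (i.e. after the LogAppendTime change discussed elsewhere in this task); against 0.9-produced data without timestamps, the lookup result is null.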
Smalyshev added a comment.
@Ottomata thanks, I can connect to the hosts above, but still not sure how to control the starting point. I'll try to look around for clients that can do this.
Ottomata added a comment.
Sure, I suppose! You can connect to it with a Kafka client now. The Kafka brokers are kafka-jumbo100[1-6].eqiad.wmnet:9092
I think you are most interested in the eqiad.mediawiki.revision-create topic. I haven't tried yet at all, but these topics should have a broker re
Nuria added a comment.
@Ottomata Could @Smalyshev do a test on consuming from the new cluster, though, with the understanding that it is not yet productionized, to make sure it fits the use cases?
Ottomata added a comment.
Unfortunately not yet! We are very close...the cluster is up and running, but porting clients has been blocked on getting proper keys and certificates for SSL support for a long time now. SSL is finally moving now, so we should be able to start porting clients over soon.
Smalyshev added a comment.
@Ottomata, @Nuria what's the status on seekable Kafka streaming - do we have the necessary infrastructure now?
Smalyshev added a comment.
As a result of the discussion, we've arrived at the following conclusion:
After we have a Kafka version installed that allows starting by timestamp, we can create a prototype that takes recent changes from either Kafka or EventStreams.
We need to evaluate if unfiltered st
Nuria added a comment.
From meeting:
@Smalyshev can consume from either Kafka or EventStreams once we add the ability to consume from a given point in time; this is what is meant by "seekable" (on the new kafka cluster, next quarter, Q1).
Keeping data for longer than 7 days is not an issue for top
Smalyshev added a comment.
@Nuria yes, still very much needed and unsolved. Please feel welcome to set up a meeting.
Nuria added a comment.
ping @Smalyshev is this still a need? Maybe we should set up a short 30-minute sync-up.
Smalyshev added a comment.
FYI, neither base Kafka Consumer clients nor EventStreams does this.
Yes, I know :) It's one of the decisions I still haven't figured out - how much I can/should do on the backend so I don't have to do it on the client, vs sending the client the raw firehose output and l
Ottomata added a comment.
I'd rather have some intermediary that cleans up, deduplicates, etc. the changes.
FYI, neither base Kafka Consumer clients nor EventStreams does this.
Smalyshev added a comment.
It will be more, a lot more. What language are you working in?
The end consumer will be Java, but I don't want to consume the raw Kafka stream from Java, I'd rather have some intermediary that cleans up, deduplicates, etc. the changes.
Load balanced parallel consumers,
Ottomata added a comment.
If so, you may want to consider consuming from Kafka rather than EventStreams.
I am considering this too, but I assume it's more code for me to write (maybe wrongly, I didn't look at it closely).
It will be more, a lot more. What language are you working in?
But what i
Smalyshev added a comment.
let's jump in a hangout sometime to discuss this more.
Would be glad to. I'll try to set up something next week.
If there is a desire, we can expose these in EventStreams. Do you have desire? :)
Yes, see T145712 - recentchanges ignores pageprops updates, and it would b
Ottomata added a comment.
@Smalyshev let's jump in a hangout sometime to discuss this more.
Just a few quick points:
Does not have data back more than 7 days
We could probably bump this up to 14 days for specific topics like recentchange.
Scalable - there's no hard limit on the number of client
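For anyone following along, the EventStreams side of the comparison can be sampled with nothing more than curl against the public SSE endpoint; the stream name below is the public recentchange stream, and whether a given deployment exposes other streams is an assumption, not something this thread establishes:

```shell
# Tap the public EventStreams SSE endpoint and show the first few events.
# Each event arrives as Server-Sent Events framing: "event:", "id:", "data:" lines.
curl -s https://stream.wikimedia.org/v2/stream/recentchange | head -n 20
```

The `id:` field in each SSE frame carries offset/timestamp positioning, which is what makes resuming a stream possible on the EventStreams side, analogous to seeking by timestamp in a raw Kafka consumer.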
Smalyshev added a comment.
I think EventStreams is closest to the goal too, but I want to have a complete description of the pony for the record so that we know what we need and what is missing. If and once it's implemented (T152731) covers part of it but not all - still need seeking and longer bac