[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2016-01-12 Thread Ottomata
Ottomata added a comment. I believe we can close this task, ja? Got a few defined here: https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki TASK DETAIL https://phabricator.wikimedia.org/T116247 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2016-01-12 Thread Eevans
Eevans added a comment. In https://phabricator.wikimedia.org/T116247#1927791, @Ottomata wrote: > I believe we can close this task, ja? Got a few defined here: > https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki +1

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-03 Thread Ottomata
Ottomata added a comment. Hm, not sure I follow. We are proposing that a schema be ID-able via a URI, and also remotely locatable if that URI happens to be a full URL with schema and domain information. Is the opaqueness issue the fact that it is `/` that IDs the schema, instead of just a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-03 Thread JanZerebecki
JanZerebecki added a comment. I think that only means that a client that gets a URL ending in '/' for an API should not assume it can extract name and revision from it without asking the API.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-03 Thread Ottomata
Ottomata added a comment. Hm, I think I see. We are coupling the URI to the ID, which according to the W3C should not be relied upon. Ok, noted.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-03 Thread mobrovac
mobrovac added a comment. From my POV, the URL **is** the ID.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-02 Thread RobLa-WMF
RobLa-WMF added a subscriber: RobLa-WMF. RobLa-WMF added a comment. @ottomata: I don't know all of the details, but I think the ID idea is a good one. One //possible hitch//: according to the W3C, URIs are supposed to be opaque. If you're using it

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-12-01 Thread Eevans
Eevans added a comment. In https://phabricator.wikimedia.org/T116247#1839888, @Ottomata wrote: > @gwicke and I discussed the schema/revision in meta issue in IRC today. He > had an idea that I quite like! > > @gwicke suggested that instead of using (schema, revision) to uniquely ID a > schema,

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-23 Thread gerritbot
gerritbot added a comment. Change 254180 merged by Ottomata: Basic MediaWiki events https://gerrit.wikimedia.org/r/254180

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-19 Thread gerritbot
gerritbot added a subscriber: gerritbot. gerritbot added a comment. Change 254180 had a related patch set uploaded (by Mobrovac): Basic MediaWiki events https://gerrit.wikimedia.org/r/254180

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-19 Thread Ottomata
Ottomata added a comment. FYI, the repo is here, waiting for some schemas! :) https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/event-schemas For Avro, @ebernhardson, you can go ahead and submit a patch there in an avro/ directory. We should probably maintain the usual Java

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-12 Thread EBernhardson
EBernhardson added a comment. We have already run into many annoyances with trying to keep schemas in line across repositories. I'd be happy to be proved wrong, but I don't see any way, outside of a submoduled schema repository, to have versioned dependencies between java and php.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-12 Thread daniel
daniel added a comment. In https://phabricator.wikimedia.org/T116247#1799843, @Ottomata wrote: > Is it time to consider creating a standalone repo for these schemas? In my opinion, schemas (and documentation) should always live in the same repo as the code, so it is easier to keep them in

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-12 Thread Ottomata
Ottomata added a comment. @daniel This schema repo will be used by many codebases. EventLogging, MediaWiki, analytics refinery, etc. etc. Anyone creating events will need this code. There are various ways to share these schemas, but one idea is to use git submodules.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-12 Thread daniel
daniel added a comment. @Ottomata If we have good versioned dependencies between the modules, that should work too. My concern is making sure that code, specs and docs are in sync.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-11 Thread Ottomata
Ottomata added a comment. Is it time to consider creating a standalone repo for these schemas? If so, then that means it is time for repo name bikeshed, woohoo! No idea what to call this or where to put it. `mediawiki/schemas`?

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-03 Thread mobrovac
mobrovac added a comment. Please take a look at the proposed event definitions and voice any concerns you might have. We'd like to settle on it in the next couple of days so that we can continue with our QGs.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-11-03 Thread Ottomata
Ottomata added a comment. Cool, added some comments.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Halfak
Halfak added a comment. > I think understanding the semantics of an event primarily requires knowledge > of the topic. This is true if you are consuming from something that has a "topic", but what if you are downloading a historical dump of events? It seems to me that we should aim to have

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread GWicke
GWicke added a comment. @ottomata, they will be filled in somewhere, but I think we haven't necessarily decided on filling them in at production time. To me it seems that filling in either at production or consumption time will work, as long as defaults don't change. It sounds like you have a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Ottomata
Ottomata added a comment. > @Ottomata, I think understanding the semantics of an event primarily requires > knowledge of the topic. Hm, I don't think this is true. You will need some understanding of what a historical dataset is, but that's all. The historical datasets are going to be made

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Ottomata
Ottomata added a comment. Producer A has schema version 1. Producer B has schema version 2, which has added field "name" with default "nonya". All of these events are being imported into Hadoop. An analyst looks at the latest schema and wants to do some analysis on "name". They write a job
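
One way to picture the scenario in this comment is a consumer that fills in schema defaults when it reads events. The following is only a minimal sketch, assuming a JSON-Schema-like dict whose `properties` may carry a `default`. The helper `fill_defaults`, the `SCHEMA_V2` dict, and the `page` field are hypothetical and not taken from the task; only the `name` field and its default `nonya` come from the comment.

```python
import copy

# Hypothetical schema version 2: adds "name" with default "nonya".
SCHEMA_V2 = {
    "properties": {
        "page": {"type": "string"},
        "name": {"type": "string", "default": "nonya"},
    }
}


def fill_defaults(event, schema):
    """Return a copy of event with missing fields set to the schema defaults."""
    filled = copy.deepcopy(event)
    for field, spec in schema.get("properties", {}).items():
        if field not in filled and "default" in spec:
            filled[field] = copy.deepcopy(spec["default"])
    return filled


# Event from Producer A (schema version 1, no "name" field).
old_event = {"page": "San_Francisco"}
# Event from Producer B (schema version 2, explicit "name").
new_event = {"page": "San_Francisco", "name": "otto"}

filled_old = fill_defaults(old_event, SCHEMA_V2)
filled_new = fill_defaults(new_event, SCHEMA_V2)
```

Note that after filling, a value of `nonya` looks the same whether a producer sent it or the consumer supplied it from the default.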

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread GWicke
GWicke added a comment. @ottomata: Based on our backwards-compatibility rules, the latest schema will be a superset of previous schemas. This means that you will be able to understand both old and new data in a given topic using the //latest// schema.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Ottomata
Ottomata added a comment. Have we decided that defaults will be filled in for missing fields?

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread GWicke
GWicke added a comment. @ottomata, you are basically making the case for filling in the defaults at consumption time.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Ottomata
Ottomata added a comment. Or produce time. But really, even if we fill in defaults during production or consumption, this will still be a problem for historical data. Data is only consumed into Hadoop once, and schema changes can happen after consumption time. If you have no way of

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread Ottomata
Ottomata added a comment. Events will be consumed into Hadoop close to production time (within an hour usually). Schema changes made years after the fact cannot be reflected in years old historical data unless it is reprocessed and rewritten.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-29 Thread GWicke
GWicke added a comment. @ottomata: If you fill in the defaults at consumption time, then you have a choice of how you want to treat old events. You can either fill in the defaults from the latest schema (probably what you want in most cases), or choose to explicitly distinguish fields that

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-28 Thread Ottomata
Ottomata added a comment. > > If we adopt a convention of always storing schema name and/or revision in > > the schemas themselves, then we can do like EventLogging does and infer and > > validate the schema based on this value. This would especially be helpful > > in associating a message

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-28 Thread GWicke
GWicke added a comment. @ottomata, I think understanding the semantics of an event primarily requires knowledge of the topic. The topic in turn provides access to the schema, which describes the structure of the events. It is likely that we'll have multiple topics record similarly-structured

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-27 Thread GWicke
GWicke added a comment. > I've been thinking about it too. Ideally, we could leave these fields out of > schema defs, simply reference them. But, that seems not to be in correlation > with storing them in a git repo. What I see as a possible solution is to put > these common fields into a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-27 Thread Ottomata
Ottomata added a comment. Ok, cool, I'm cool with that, so: `request_id` - UUID1 from Varnish, not necessarily unique for an individual event `event_id` (or maybe just `uuid`? since that is what EL uses?) - Actual UUID for an event. `dt` - ISO8601 timestamp, usually derived from the timestamp

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-27 Thread mobrovac
mobrovac added a comment. In https://phabricator.wikimedia.org/T116247#1754709, @Ottomata wrote: > What do y'all think about keeping these 'framing' fields in a nested object? > I'm not sure if this is a good or bad idea. If later we decide we do want to > use $ref to share common schema

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. > If we have a use case for emitting two secondary events *to the same topic* > that were both triggered by the same primary event (user click / request id), > then we can generate a new ID for at least one of those events, and record > the parent event id in a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. What do y'all think about keeping these 'framing' fields in a nested object? I'm not sure if this is a good or bad idea. If later we decide we do want to use $ref to share common schema fields between different schemas, it'll be easier to do so if these are in a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread GWicke
GWicke added a comment. In https://phabricator.wikimedia.org/T116247#1754698, @Ottomata wrote: > > If we have a use case for emitting two secondary events *to the same topic* > > that were both triggered by the same primary event (user click / request > > id), then we can generate a new ID for

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. To avoid possible conflicts, I'd suggest we call this not just `id`. How about `uuid`? That's what EventLogging capsule does: https://meta.wikimedia.org/wiki/Schema:EventCapsule

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. I'm still a little confused about how this reqid/id will work? You are suggesting that it comes from the x-request-id that we want varnish to set, right? Won't this mean that multiple events (those produced during the same http request at varnish level) will have

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. Also, this is just a personal preference, but I'd prefer if we had a convention differentiating integer/second based 'timestamps' and string/date based 'datetimes'. For webrequest data, the ISO8601 is called `dt`.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. > Hm, I think duplicates should be detected based on the content of the message > itself and the time stamp. EventLogging explicitly uses the uuid in MySQL as a unique key for all tables. Having it standardized on a single field means that the unique index creation
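
As a rough illustration of deduplicating on a single standardized field, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The table name `events`, the `dt` and `payload` columns, and the helper `store_event` are assumptions made for this example; they are not EventLogging's actual table layout.

```python
import sqlite3

# In-memory database standing in for the real store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (uuid TEXT PRIMARY KEY, dt TEXT, payload TEXT)"
)


def store_event(conn, event):
    """Insert an event; return True if stored, False if its uuid already existed."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO events (uuid, dt, payload) VALUES (?, ?, ?)",
        (event["uuid"], event["dt"], event["payload"]),
    )
    conn.commit()
    # rowcount is 1 for a new row and 0 when the unique key rejected it.
    return cur.rowcount == 1


event = {
    "uuid": "b54adc00-67f9-11d9-9669-0800200c9a66",
    "dt": "2015-10-26T00:00:00Z",
    "payload": "{}",
}
first = store_event(conn, event)   # stored
second = store_event(conn, event)  # duplicate, ignored
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Because the unique key sits on one field, the same insert logic works for any table, regardless of what the rest of the event contains.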

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. > Manual schema versions. We could increase the schema version every time we > change something in the schema. Easy to achieve but it's also easy to forget > to bump the version when something has been changed. FWIW, this is how EventLogging does it, although the

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Ottomata
Ottomata added a comment. > I don't see a conflicting problem with id (even though id is a JSONSchema > keyword, but it relates to the schema, not its properties, so we're good > there). uuid is not a good choice, IMHO, it's like naming a field string > because its value is a string. The most

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread mobrovac
mobrovac added a comment. In https://phabricator.wikimedia.org/T116247#1753398, @Ottomata wrote: > Ok cool, if that's the case, then `reqid` or even `request_id` (I like long > names...what can I say?) sounds good. `request_id` works for me. I also happen to like //snake_case//. Let's

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread mobrovac
mobrovac added a comment. In https://phabricator.wikimedia.org/T116247#1752974, @Ottomata wrote: > I'm still a little confused about how this reqid/id will work? You are > suggesting that it comes from the x-request-id that we want varnish to set, > right? Won't this mean that multiple

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread Eevans
Eevans added a comment. In https://phabricator.wikimedia.org/T116247#1749452, @Ottomata wrote: > Right, but how would you do this in say, Hive? Or in bash? In bash: $ sudo apt-get install uuid $ ID=$(uuid -v 1) $ grep "content: time" <(uuid -d $ID) content: time: 2015-10-26

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread GWicke
GWicke added a comment. > If we adopt a convention of always storing schema name and/or revision in the > schemas themselves, then we can do like EventLogging does and infer and > validate the schema based on this value. This would especially be helpful in > associating a message with an Avro

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-26 Thread GWicke
GWicke added a comment. > I'm not so sure actually that these will always be redundant. I think the > request ID should be persisted to track the same event throughout the system. > Imagine a user clicks on something which produces an event in the queue and > that event triggers another one to

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Eevans
Eevans added a comment. In https://phabricator.wikimedia.org/T116247#1748095, @Ottomata wrote: > > So the producer would store the same time stamp twice? UUID v1 already > > contains it. > > > Could you provide an example of what this UUID would look like? > > A reason for having a timestamp

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Ottomata
Ottomata added a comment. > topics named something like mw-edit and mw-edit-private perhaps (where the > latter contains this extra info). I'd prefer if we did this the other way around. The 'private' topic will have more data and be the main source of truth. The public one will contain a

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread mobrovac
mobrovac added a comment. In https://phabricator.wikimedia.org/T116247#1747924, @Ottomata wrote: > I'd like an actual timestamp to be part of the framing for all events too. > I'm all for a reqid, (although I'd bikeshed about the name a bit), but having > a standardized canonical timestamp in

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Ottomata
Ottomata added a comment. I'd like an actual timestamp to be part of the framing for all events too. I'm all for a reqid, (although I'd bikeshed about the name a bit), but having a standardized canonical timestamp in all events is very useful. Can we add: - **ts**: iso 8601 timestamp. This

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread GWicke
GWicke added a comment. @ottomata, UUIDs are described in https://en.wikipedia.org/wiki/Universally_unique_identifier. An example for a v1 UUID is `b54adc00-67f9-11d9-9669-0800200c9a66`. There are libraries to extract the high-resolution timestamp for most environments. Regarding a separate
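
To make the timestamp extraction concrete, here is a minimal sketch using only Python's standard `uuid` and `datetime` modules. The helper name `uuid1_to_datetime` is hypothetical; the example UUID is the one quoted in the comment. A v1 UUID stores a count of 100-nanosecond intervals since 1582-10-15, so the conversion is an offset from that date.

```python
import uuid
from datetime import datetime, timedelta, timezone

# v1 UUID timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value):
    """Return the UTC datetime embedded in a version 1 UUID string."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 UUID: %s" % value)
    # u.time is in 100-nanosecond units; convert to whole microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


dt = uuid1_to_datetime("b54adc00-67f9-11d9-9669-0800200c9a66")
ts = dt.isoformat()  # ISO 8601 string derived from the UUID
```

The conversion drops the sub-microsecond part of the UUID timestamp, since `datetime` only has microsecond resolution.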

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread GWicke
GWicke added a comment. @JanZerebecki: Suppression information would indeed be needed for public access to older events. One option would be to key this on the event's UUID. We could also consider superseding the message using Kafka's deduplication (compaction) based on the same UUID.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread JanZerebecki
JanZerebecki added a comment. If we offer public access to the public events of the past we need to rewrite them according to new events that hide previous public events. Can you make sure that events that hide any part of previous public events are also public? So that a public archive of

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Ottomata
Ottomata added a comment. > So the producer would store the same time stamp twice? UUID v1 already > contains it. Could you provide an example of what this UUID would look like? A reason for having a timestamp only field is so that applications can use it for time based logic without having

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread GWicke
GWicke added a comment. > Right, but how would you do this in say, Hive? Or in bash? Timestamp logic > should be easy and immediate. Yeah, Hive really seems to be lacking built-in support for UUIDs. There seems to be UDF code to deal with them, but it's definitely not as convenient as it

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread Ottomata
Ottomata added a comment. Right, but how would you do this in say, Hive? Or in bash? Timestamp logic should be easy and immediate. > Regarding a separate timestamp in the framing information: Which time would > this correspond to? This is up to the producer, I think. If there are more

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread GWicke
GWicke added a comment. I went ahead and updated the task description with the current framing / per-event schema. I renamed the `reqid` to just `id`, and added a `ts` field containing the same timestamp in ISO 8601 format.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-23 Thread JanZerebecki
JanZerebecki added a comment. As long as a separate public suppression event exists that refers to the old one it sounds fine.

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-22 Thread GWicke
GWicke added a comment. Some notes from the meeting: 1. Framing, for all events - **uri**: string; path or url. Example: /en.wikipedia.org/v1/page/title/San_Francisco - **reqid**: v1 UUID ;

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-22 Thread Ottomata
Ottomata added a comment. etherpad from today's meeting: https://etherpad.wikimedia.org/p/eventbus-events

[Wikidata-bugs] [Maniphest] [Commented On] T116247: Define edit related events for change propagation

2015-10-22 Thread Ottomata
Ottomata added a comment. COOL. As part of this discussion, I'd like us to think about not only fields that are relevant to edit events, but also those fields that might be useful for most, if not all, standardized WMF events to have. These might be required for all events to share. Things