Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Stas Malyshev
Hi!

> You can seek back on EventBus events, but not permanently (by default, only
> up to 1 week).  If you want to respond to changes in an event stream, you

1 week is not enough for this use case, but if it could be extended to,
say, 1 month, that could be workable.

The reason is that the starting point for a WDQS server install is the
Wikidata dump, which is made weekly. The server then catches up on
the data that changed between the dump point and the current moment.
However, there could be dump failures or other conditions which make
the most recent dump unusable. It also takes time to load the dump
itself. So the delta between the current moment and the data in a
freshly deployed WDQS server could be 2 weeks or even more. We need to
be able to catch up on the changes since then. We will probably never
need the full month, but it's the conservative limit we're using now for
how far back we can ask for data. 2 weeks would probably work too, even
if it could mean some scenarios become more complicated to handle.
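
To put rough numbers on the catch-up window, here is a small sketch. Only the weekly dump interval comes from the text above; the function name, the single failed dump, and the two-day load time are assumptions chosen for illustration.

```python
from datetime import timedelta


def required_lookback(dump_interval, missed_dumps, load_time):
    """Worst-case age of the data in a freshly deployed server.

    The newest usable dump can be almost one full interval old, plus one
    more interval for every dump that failed, plus the time spent loading.
    """
    return dump_interval * (1 + missed_dumps) + load_time


weekly = timedelta(days=7)
# Assumed for illustration: one failed dump and two days of loading.
lag = required_lookback(weekly, missed_dumps=1, load_time=timedelta(days=2))
print(lag.days)                   # 16
print(lag <= timedelta(days=7))   # False: a 1-week limit is too short
print(lag <= timedelta(days=30))  # True: a 1-month limit covers it
```

Under these assumed figures the lag already exceeds 2 weeks, which is why a 1-month limit is the comfortable choice.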

> should consume the full event stream realtime and react to the events as
> they come in.  A proper Stream Processing system (like Flink or Spark

This is not possible for the WDQS Updater. Since a WDQS server is
completely independent of Wikidata, it can be started and stopped at
any time. There's no way to ensure that, at every moment something
changes in Wikidata, all WDQS instances that are interested in this
change are up and running. There needs to be an intermediary system that
keeps the data. So far the recent changes API has served as this system,
but since it does not know about secondary data, it's no longer enough.

> this stream will be relatively small, and you don’t need fancy features
> like time based windowing.  You just need to update something based on an
> event, right?

Well, I need something event-based that I can ask something like:
"give me all events that happened since time point T". For T being, say,
from a second ago to 2 weeks ago.
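
The kind of query meant here can be sketched with a small in-memory log. This is not EventBus or Kafka code: the class, its method names, and the sample events are hypothetical, and it only illustrates "events since T" under a bounded retention window.

```python
import bisect


class SeekableEventLog:
    """Hypothetical in-memory stand-in for an event store with bounded retention."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._timestamps = []  # sorted, parallel to _events
        self._events = []

    def append(self, timestamp, event):
        # Events are assumed to arrive in timestamp order.
        self._timestamps.append(timestamp)
        self._events.append(event)

    def expire(self, now):
        # Drop everything older than the retention window.
        cutoff = now - self.retention_seconds
        keep_from = bisect.bisect_left(self._timestamps, cutoff)
        del self._timestamps[:keep_from]
        del self._events[:keep_from]

    def events_since(self, t, now):
        """Return all events with timestamp >= t, or fail if t has expired."""
        if t < now - self.retention_seconds:
            raise LookupError("requested time is older than the retention window")
        start = bisect.bisect_left(self._timestamps, t)
        return self._events[start:]


DAY = 86400
log = SeekableEventLog(retention_seconds=30 * DAY)
log.append(1 * DAY, "Q1 edited")
log.append(10 * DAY, "Q2 page props updated")
log.append(20 * DAY, "Q1 page props updated")

now = 21 * DAY
log.expire(now)
print(log.events_since(7 * DAY, now))
# ['Q2 page props updated', 'Q1 page props updated']
```

With `retention_seconds=7 * DAY` the same request (T being 2 weeks before `now`) raises `LookupError`, which is the 1-week problem in miniature.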

> The change-propagation service that the Services team is building can help
> you with this.  It allows you to consume events, and specify matching rules
> and actions to take based on those rules.
> 
> https://www.mediawiki.org/wiki/Change_propagation

I see no mention of the ability to consume past events. Is it possible?

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Andrew Otto
You can seek back on EventBus events, but not permanently (by default, only
up to 1 week).  If you want to respond to changes in an event stream, you
should consume the full event stream realtime and react to the events as
they come in.  A proper Stream Processing system (like Flink or Spark
Streaming) could help with this, but we don’t have that right now.  But, I
think for your use case, you don’t need a big stream processing system, as
this stream will be relatively small, and you don’t need fancy features
like time based windowing.  You just need to update something based on an
event, right?

The change-propagation service that the Services team is building can help
you with this.  It allows you to consume events, and specify matching rules
and actions to take based on those rules.

https://www.mediawiki.org/wiki/Change_propagation
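
As a rough illustration of "matching rules and actions", the sketch below dispatches events to handlers. It is not the change-propagation service's configuration format or API; the rule structure, topic names, and field names are all made up for the example.

```python
def matches(rule, event):
    """A rule matches when every key/value pair in rule['match'] is in the event."""
    return all(event.get(key) == value for key, value in rule["match"].items())


def dispatch(rules, event):
    """Run the action of every rule that matches; return the collected results."""
    return [rule["action"](event) for rule in rules if matches(rule, event)]


# Hypothetical topic and field names, for illustration only.
rules = [
    {
        "match": {"topic": "page-properties-change", "wiki": "wikidatawiki"},
        "action": lambda event: "refresh " + event["title"],
    },
    {
        "match": {"topic": "revision-create"},
        "action": lambda event: "reload " + event["title"],
    },
]

event = {"topic": "page-properties-change", "wiki": "wikidatawiki", "title": "Q42"}
print(dispatch(rules, event))  # ['refresh Q42']
```

Note that a dispatcher like this only reacts to events as they arrive; it does not by itself answer the question of replaying past events.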



On Fri, Sep 23, 2016 at 2:55 PM, Stas Malyshev wrote:

> Hi!
>
> > Could we emit a page/properties-change event to EventBus when page props
> > are updated?  Similar to how we emit an event for revision visibility
> > changes:
>
> This, however, still is missing a part because, as I understand,
> EventBus is not seekable. I.e., if I have data up-to-date to timepoint
> T, and I am now at timepoint N, I can scan recent changes list from T to
> N and know if certain item X has changed or not. However, since recent
> changes list has no entries for page props, and events on EventBus past
> N are lost to me, I have no idea if page props for X changed between T
> and N. To know that, I need permanent seekable record of changes. Or
> some flag that says when it was last updated, at least.
>
> Unless of course I'm missing the part where you can seek back on
> EventBus events, then please point me to the API that allows to do so.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Stas Malyshev
Hi!

> Could we emit a page/properties-change event to EventBus when page props
> are updated?  Similar to how we emit an event for revision visibility
> changes:

This, however, is still missing a part because, as I understand it,
EventBus is not seekable. I.e., if I have data up-to-date to timepoint
T, and I am now at timepoint N, I can scan the recent changes list from
T to N and know if a certain item X has changed or not. However, since
the recent changes list has no entries for page props, and events on
EventBus past N are lost to me, I have no idea if page props for X
changed between T and N. To know that, I need a permanent seekable
record of changes. Or some flag that says when it was last updated, at
least.

Unless of course I'm missing the part where you can seek back on
EventBus events, then please point me to the API that allows doing so.

-- 
Stas Malyshev
smalys...@wikimedia.org


Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Brad Jorsch (Anomie)
On Fri, Sep 23, 2016 at 9:25 AM, Andrew Otto  wrote:

> Could we emit a page/properties-change event to EventBus when page props
> are updated?


I don't know how the event stuff works, but if you can do it by hooking
hooks then 'LinksUpdateComplete' would likely be the hook to use.

Note that hook signals not just the page properties being updated, but also
pagelinks, imagelinks, externallinks, langlinks, iwlinks (interwikis),
templatelinks, and categorylinks.


-- 
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Aaron Halfaker
I like the idea of having a unique event for this sort of thing.  There's a
large class of annotations like that that happen on a shifted timescale.
E.g. abuse filter tags are applied after an edit is saved.  If we are to
build a queue of edits for review, we'd like to have up-to-date abuse
filter tags too.

On Fri, Sep 23, 2016 at 8:25 AM, Andrew Otto  wrote:

> Could we emit a page/properties-change event to EventBus when page props
> are updated?  Similar to how we emit an event for revision visibility
> changes:
>
> https://github.com/wikimedia/mediawiki-event-schemas/blob/master/jsonschema/mediawiki/revision/visibility-change/1.yaml
>
> These events would be available to you as a stream from Kafka, or (soon) as
> a publicly consumable stream.

Re: [Wikitech-l] Recent changes, notifications & pageprops

2016-09-23 Thread Andrew Otto
Could we emit a page/properties-change event to EventBus when page props
are updated?  Similar to how we emit an event for revision visibility
changes:

https://github.com/wikimedia/mediawiki-event-schemas/blob/master/jsonschema/mediawiki/revision/visibility-change/1.yaml

These events would be available to you as a stream from Kafka, or (soon) as
a publicly consumable stream.

[Wikitech-l] Recent changes, notifications & pageprops

2016-09-22 Thread Stas Malyshev
Hi!

I'd like to raise the topic of handling change notifications and the
up-to-dateness of wiki page data in relation to page props.

First, a little background about how I arrived at the issue. I maintain
the Wikidata Query Service, which updates from Wikidata using the
recent changes API and the RDF export format for Wikidata pages.
Recently, we started using certain page properties, such as link &
statement counts. This is when I discovered the issue: the page
properties are not updated when the page (Wikidata item) is edited, but
are updated later, as I understand it, by a job.

Now, this leads to a situation where, when I have a recent changes entry
and I look at the RDF export page - which now contains data derived from
page props - I cannot know if the page props data is up-to-date or not.
Moreover, if the job - some unknown and undefined time later - updates
the page props, I get no notification, since the modification is not
reflected in recent changes. This makes using information derived
from page props very hard - you never know if the data is stale or
whether the data in page props matches the data in the page.
The problem is described in more detail in
https://phabricator.wikimedia.org/T145712
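
The sequence described above can be replayed in a few lines. This is a toy model, not MediaWiki code: the data structures, function names, and timestamps are hypothetical and only mirror the order of events (edit recorded, props updated later by a job, no second recent changes entry).

```python
# Hypothetical timeline for one item; times are arbitrary seconds.
recent_changes = []  # what the updater can poll
page_props = {"Q42": {"statements": 10, "updated_at": 0}}


def edit_item(item, now):
    # The edit is recorded in recent changes immediately.
    recent_changes.append({"item": item, "timestamp": now})


def run_deferred_job(item, statements, now):
    # Page props change later, and no recent changes entry is written.
    page_props[item] = {"statements": statements, "updated_at": now}


def poll(since):
    """Return the items with a recent changes entry newer than `since`."""
    return [rc["item"] for rc in recent_changes if rc["timestamp"] > since]


edit_item("Q42", now=100)
seen_at_105 = poll(since=0)              # updater learns of the edit
stale = page_props["Q42"]["statements"]  # still 10, the job has not run yet
run_deferred_job("Q42", statements=11, now=130)
seen_at_140 = poll(since=105)            # nothing new is reported

print(seen_at_105, stale, seen_at_140, page_props["Q42"]["statements"])
# ['Q42'] 10 [] 11
```

The updater ends up holding the value 10 while the page props say 11, and nothing it can poll tells it to look again.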

I'd like to find a solution for it, but I'm not sure how to proceed.
The data specific to this case can be easily generated from the data
already present in memory during the page update, but I assume there
were some reasons why it was deferred.
We could make some kind of notification when updating page props, though
that would probably seriously increase the number of notifications and
thus slow the updates. Also, in some cases, the second notification may
not be necessary since the page props were updated before I've processed
the first one, but I have no way of knowing it now.

Any advice on how to solve this issue?
-- 
Stas Malyshev
smalys...@wikimedia.org
