It might be a good idea to add an API that outputs the IDs of the entities
that changed since time x or revision y. For older data it could refer to
index files for the dumps. That probably makes more sense than creating a
dump every minute.

hourly dumps: https://phabricator.wikimedia.org/T85100
changed entities API: https://phabricator.wikimedia.org/T85103
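
To make the changed entities idea concrete, here is a minimal sketch of the
lookup such an API could perform. Nothing here is an existing Wikibase
interface: the `Change` record, the in-memory log, and the function name
`changed_entity_ids` are all assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Change:
    """One edit in a hypothetical change log (not a real Wikibase type)."""
    revision: int        # revision ID of the edit
    timestamp: datetime  # when the edit was saved (UTC)
    entity_id: str       # e.g. "Q42"


def changed_entity_ids(
    log: Iterable[Change],
    since_revision: Optional[int] = None,
    since_time: Optional[datetime] = None,
) -> List[str]:
    """Return each entity ID changed after the given revision or time,
    listed once, in order of first appearance in the log."""
    if (since_revision is None) == (since_time is None):
        raise ValueError("pass exactly one of since_revision or since_time")
    seen = {}  # dict keeps insertion order and drops duplicates
    for change in log:
        if since_revision is not None:
            is_newer = change.revision > since_revision
        else:
            is_newer = change.timestamp > since_time
        if is_newer:
            seen.setdefault(change.entity_id, None)
    return list(seen)
```

A consumer would remember the last revision it processed and pass it as
`since_revision` on the next call. Requests reaching further back than the
retained log would be answered with a pointer to the dump index files
instead, which this sketch does not model.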

On Sat, Dec 20, 2014 at 9:03 PM, Stas Malyshev <[email protected]>
wrote:

> Hi!
>
> > The best place for this kind of question would be the wikidata-tech
> > mailing list <[email protected]>. It would probably be a good
> > idea if you (and whoever else deals with wikidata on the technical
> > level) were subscribed there. It's pretty low traffic.
>
> Thanks, I've sent the subscription request and am adding it to the CC.
> Still learning the right places to go for things :)
>
> > Statement IDs are GUIDs (with the Item ID prefixed), and they do not
> > change when the Statement changes (otherwise, they would be hashes,
> > not IDs - References are currently handled by hash).
>
> From the export/import point of view, I think I'd prefer immutable
> claims (i.e. the ID changes each time the claim changes) as they are
> easier to handle, but as that is not the case, I can switch to using the
> content hash instead. The performance impact (time spent calculating the
> hashes) should not be too big.
>
> > One thing that would be rather easy to do is to make JSON dumps of just
> > the items that changed in the last X hours. But that wouldn't tell you
> > which statements changed.
>
> I think for imports the best thing would be to have real diffs - i.e. a
> list of the claims/item fields that were added/removed/changed - but if
> that's not feasible, a list of changed items would be great too. We may
> want this more frequently than hourly. Item data is not that big, so
> loading it and running the diff manually would still be workable. It
> would be slightly slower for big items (since each claim for the item
> has to be examined) and requires maintaining an additional data
> structure to efficiently enumerate the claims, but it should still be
> workable.
>
> Thanks,
> Stas
>
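
For the manual diff described above, a minimal sketch could look like the
following. It assumes the item JSON lists statements under a `claims` key,
grouped by property, each with an `id` field. The SHA-1 over canonical JSON
is only a stand-in for a content hash and is not the hash Wikibase computes
itself; the function names are made up for this example.

```python
import hashlib
import json
from typing import Dict, Set


def statement_hashes(item: dict) -> Dict[str, str]:
    """Map each statement ID to a hash of that statement's JSON content."""
    hashes = {}
    for statements in item.get("claims", {}).values():
        for statement in statements:
            # sort_keys gives a stable serialization, so equal content
            # always produces an equal hash
            canonical = json.dumps(statement, sort_keys=True)
            digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
            hashes[statement["id"]] = digest
    return hashes


def diff_statements(old: dict, new: dict) -> Dict[str, Set[str]]:
    """Return the statement IDs added, removed, or changed between two
    versions of the same item."""
    old_hashes = statement_hashes(old)
    new_hashes = statement_hashes(new)
    shared = set(old_hashes) & set(new_hashes)
    return {
        "added": set(new_hashes) - set(old_hashes),
        "removed": set(old_hashes) - set(new_hashes),
        "changed": {sid for sid in shared if old_hashes[sid] != new_hashes[sid]},
    }
```

The `statement_hashes` map is the additional data structure mentioned above:
an importer that stores it per item only has to hash the new version and
compare, rather than keep the full old JSON around.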



-- 
Best regards,
Jan Zerebecki
Software Engineer

Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de

Imagine a world, in which every single human being can freely share in the
sum of all knowledge. That's our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
