You could just use the regular recentchanges feed for the main namespace: https://www.wikidata.org/w/api.php?action=query&list=recentchanges&rcnamespace=0&rctoponly&rclimit=50
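That query can also be built programmatically rather than hard-coding the URL. A minimal sketch in Python (the parameter names come straight from the URL above; the helper name `recentchanges_url` is made up for illustration, and actually fetching the result is left to whatever HTTP client you use):

```python
# Sketch: build the recentchanges query URL for the Wikidata API.
# Parameters mirror the example URL above; rcnamespace=0 restricts
# the feed to the main namespace (items), rctoponly skips pages whose
# latest revision is not the listed change.
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def recentchanges_url(limit=50):
    """URL listing the most recent main-namespace changes as JSON."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcnamespace": 0,   # namespace 0 = items on Wikidata
        "rctoponly": 1,     # only the current top revision per page
        "rclimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)
```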
If that's too slow, just query the recentchanges table directly. Or, if you want to be more Wikibase-centric, query the wb_changes table; it's conceptually similar.

HTH
Daniel

On 21.12.2014 12:35, Jan Zerebecki wrote:
> It might be a good idea to add an API that outputs the entity IDs that
> changed since time x or revision y. For older data it could refer to
> index files for the dumps. That probably makes more sense than creating
> a dump each minute.
>
> hourly dumps: https://phabricator.wikimedia.org/T85100
> changed entities API: https://phabricator.wikimedia.org/T85103
>
> On Sat, Dec 20, 2014 at 9:03 PM, Stas Malyshev <[email protected]> wrote:
>
> > Hi!
> >
> > > The best place for this kind of question would be the wikidata-tech
> > > mailing list <[email protected]>. It would
> > > probably be a good idea if you (and whoever else deals with wikidata
> > > on the technical level) were subscribed there. It's pretty low
> > > traffic.
> >
> > Thanks, I've sent the subscription request and am adding it to the CC.
> > Still learning the right places to go for things :)
> >
> > > Statement IDs are GUIDs (with the Item ID prefixed), and they do not
> > > change when the Statement changes (otherwise, they would be hashes,
> > > not IDs - References are currently handled by hash).
> >
> > From the export/import point of view, I think I'd prefer immutable
> > claims (i.e. the ID changes each time the claim changes), as they are
> > easier to handle, but since that is not the case, I can switch to
> > using the content hash instead. The performance impact (time spent
> > calculating the hashes) should not be too big.
> >
> > > One thing that would be rather easy to do is to make JSON dumps of
> > > just the items that changed in the last X hours. But that wouldn't
> > > tell you which statements changed.
> >
> > I think for imports the best thing would be to have real diffs - i.e.
> > a list of the claims/item fields that were added/removed/changed - but
> > if that's not feasible, a list of changed items would be great too. We
> > may want this even more frequently than hourly. Item data is not that
> > big, so loading it and running the diff manually would still be
> > workable. It would be slightly slower for big items (since each claim
> > of the item has to be examined) and requires maintaining an additional
> > data structure to efficiently enumerate the claims, but it should
> > still be workable.
> >
> > Thanks,
> > Stas
>
> --
> Best regards,
> Jan Zerebecki
> Software Engineer
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Phone: +49 (0)30 219 158 26-0
> http://wikimedia.de
>
> Imagine a world in which every single human being can freely share in
> the sum of all knowledge. That's our commitment.
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
> by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

--
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikidata-tech mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
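P.S. The manual diff Stas describes (load two revisions of an item and compare claims by content hash, since statement GUIDs stay stable while the content changes) could be sketched like this. This is a simplified illustration, not Wikibase code; the claim dicts shown are much flatter than real Wikibase claim JSON:

```python
# Sketch: diff two lists of claims by a content hash. JSON serialization
# with sorted keys gives a stable fingerprint regardless of key order.
import hashlib
import json

def claim_hash(claim):
    """Stable content hash of a claim (dict), independent of key order."""
    blob = json.dumps(claim, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def diff_claims(old_claims, new_claims):
    """Return (added, removed) claims, compared by content hash.

    A claim whose content changed shows up as one removal plus one
    addition, which is exactly what an importer needs to apply.
    """
    old_hashes = {claim_hash(c) for c in old_claims}
    new_hashes = {claim_hash(c) for c in new_claims}
    added = [c for c in new_claims if claim_hash(c) not in old_hashes]
    removed = [c for c in old_claims if claim_hash(c) not in new_hashes]
    return added, removed
```

As noted above, this examines every claim of the item on each run, so big items pay a cost proportional to their claim count.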
