Basically, the XML dumps have two IDs: page_id and revision_id.

The page_id points to the article. In this case, 14640471 is the page_id
for Mars (https://en.wikipedia.org/wiki/Mars)

The revision_id points to the latest revision for the article. For Mars,
the latest revision_id is 699008434 which was generated on 2016-01-09 (
https://en.wikipedia.org/w/index.php?title=Mars&oldid=699008434). Note that
a revision_id is generated every time a page is edited.

So, to answer your question, the IDs never change. 14640471 will always
point to Mars, while 699008434 points to the 2016-01-09 revision for Mars.

That said, different dumps will have different revision_ids, because an
article may be updated. If Mars gets updated tomorrow, and the English
Wikipedia dump is generated afterwards, then that dump will list Mars with
a new revision_id (something higher than 699008434). However, that dump
will still show Mars with a page_id of 14640471. You're probably better off
using the page_id.
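
To make the two IDs concrete, here is a rough sketch of pulling them out of
a dump fragment with Python's standard library. The SAMPLE fragment and the
function name are made up for illustration; real dump files also declare an
XML namespace on the root element and are far too large to load in one go,
so treat this as a sketch rather than a working dump parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment shaped like a dump <page> entry.
# Real dumps declare an XML namespace and carry many more fields.
SAMPLE = """
<mediawiki>
  <page>
    <title>Mars</title>
    <ns>0</ns>
    <id>14640471</id>
    <revision>
      <id>699008434</id>
    </revision>
  </page>
</mediawiki>
"""


def page_and_revision_ids(xml_text):
    """Map each page title to a (page_id, revision_id) tuple."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for page in root.iter("page"):
        title = page.findtext("title")
        # The <id> directly under <page> is the page_id.
        page_id = int(page.findtext("id"))
        # The <id> under <revision> is the revision_id.
        revision_id = int(page.findtext("revision/id"))
        result[title] = (page_id, revision_id)
    return result


if __name__ == "__main__":
    print(page_and_revision_ids(SAMPLE))
```

Keying your own data on the page_id means a newer dump only changes the
second value of the tuple.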

Finally, you can also reference the Wikimedia API to get a similar view
to the dump. For example:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Mars&rvprop=content|ids
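
If you want to read the IDs from that API call in a script, something like
the following might work. It is only a sketch: I added format=json and
dropped "content" from rvprop to keep the reply small, the function names
and the User-Agent string are placeholders of mine, and SAMPLE_RESPONSE is
a hand-written stand-in for the reply shape, not captured output.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# Hand-written stand-in for the JSON reply shape (not real captured output).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "14640471": {
                "pageid": 14640471,
                "ns": 0,
                "title": "Mars",
                "revisions": [{"revid": 699008434}],
            }
        }
    }
}


def build_query_url(title):
    """Build the revisions query from the mail, asking for IDs as JSON."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids",
        "format": "json",  # assumption: not part of the URL in the mail
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_ids(response):
    """Return (page_id, revision_id) pairs found in a decoded API reply."""
    pairs = []
    for page in response["query"]["pages"].values():
        # Missing pages carry no "revisions" key, so they are skipped.
        for rev in page.get("revisions", []):
            pairs.append((page["pageid"], rev["revid"]))
    return pairs


def fetch_ids(title):
    """Query the live API; requires network access."""
    request = urllib.request.Request(
        build_query_url(title),
        # Placeholder User-Agent; replace with your own contact details.
        headers={"User-Agent": "id-lookup-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_ids(json.load(reply))


if __name__ == "__main__":
    print(extract_ids(SAMPLE_RESPONSE))
```

Calling fetch_ids("Mars") today should give the same page_id but a newer
revision_id than the one quoted above.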

Hope this helps.


On Mon, Jan 11, 2016 at 5:09 AM, Luigi Assom <[email protected]> wrote:

> yep, same here!
>
> Also another question about consistency of _IDs in time.
> I was working with an old version of wikipedia dump, and testing some
> data models I built on the dump using as pivot a few topics.
> I might have data corrupted on my side, but just to be sure:
> are _IDs of article *persistent* over time, or are they subjected to
> change?
>
> Might happen that due any fallback or merge in an article history, ID
> would change?
> E.g. as test article "Mars" would first point to a version _ID ="4285430"
> and then changed to "14640471"
>
> I need to ensure _IDs will persist.
> thank you!
>
>
> *P.S. sorry for cross posting - I've replied from wrong email - could you
> please delete the other message and keep only this email address? thank
> you! *
>
> On Mon, Jan 11, 2016 at 11:05 AM, XDiscovery Team <[email protected]>
> wrote:
>
>> On Mon, Jan 11, 2016 at 6:22 AM, Tilman Bayer <[email protected]>
>> wrote:
>>
>>> On Sun, Jan 10, 2016 at 4:05 PM, Bernardo Sulzbach <
>>> [email protected]> wrote:
>>>
>>>> On Sun, Jan 10, 2016 at 9:55 PM, Neil Harris <[email protected]>
>>>> wrote:
>>>> > Hello! I've noticed that no enwiki dump seems to have been generated
>>>> so far
>>>> > this month. Is this by design, or has there been some sort of dump
>>>> failure?
>>>> > Does anyone know when the next enwiki dump might happen?
>>>> >
>>>>
>>>> I would also be interested.
>>>>
>>>> --
>>>> Bernardo Sulzbach
>>>>
>>>> _______________________________________________
>>>> Wikitech-l mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>
>>>
>>> CCing the Xmldatadumps mailing list
>>> <https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l>, where
>>> someone has already posted
>>> <https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-January/001214.html>
>>>  about
>>> what might be the same issue.
>>>
>>> --
>>> Tilman Bayer
>>> Senior Analyst
>>> Wikimedia Foundation
>>> IRC (Freenode): HaeB
>>>
>>> _______________________________________________
>>> Xmldatadumps-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>
>>>
>>
>>
>> --
>> *Luigi Assom*
>> Founder & CEO @ XDiscovery - Crazy on Human Knowledge
>> *Corporate*
>> www.xdiscovery.com
>> *Mobile App for knowledge Discovery*
>> APP STORE <http://tiny.cc/LearnDiscoveryApp>  | PR
>> <http://tiny.cc/app_Mindmap_Wikipedia>  | WEB
>> <http://www.learndiscovery.com/>
>>
>> T +39 349 3033334 | +1 415 707 9684
>>
>
>
>
> --
> *Luigi Assom*
>
> T +39 349 3033334 | +1 415 707 9684
> Skype oggigigi
>