Re: [Xmldatadumps-l] Missing pages in enwiki pages-articles-multistream dumps

2018-02-27 Thread Ryan Hitchman
Thanks for the quick fix! I'll verify it too with the next run.

I discovered this while building a link graph directly from the
pages-articles dump, and finding that I had more broken links (missing
target articles) than expected.

On Tue, Feb 27, 2018 at 4:10 AM, Ariel Glenn WMF wrote:

> It turns out that this happens for exactly 27 pages, those at the end of
> each enwiki-20180220-stub-articlesXX.xml.gz file.  Tracking here:
> https://phabricator.wikimedia.org/T188388
>
> Ariel
>
> On Tue, Feb 27, 2018 at 10:45 AM, Ryan Hitchman wrote:
>
>> Multiple pages are missing from the enwiki pages-articles-multistream
>> dumps from 20180201 and 20180220.
>>
>> Page id 88444: "Phosphor" doesn't appear in the index or in the data
>> stream. This also happens for TARDIS, Psalm 132, and many others.
>>
>> Why would the dump be partial?
>>
___
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l


Re: [Xmldatadumps-l] Missing pages in enwiki pages-articles-multistream dumps

2018-02-27 Thread Ariel Glenn WMF
It turns out that this happens for exactly 27 pages, those at the end of
each enwiki-20180220-stub-articlesXX.xml.gz file.  Tracking here:
https://phabricator.wikimedia.org/T188388
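
For anyone who wants to list the affected pages themselves, here is a minimal sketch that reads one gzipped stub file and reports its final page. It assumes the usual MediaWiki export layout (a <page> element with <title> and <id> as direct children); the function name last_page_in_stub is hypothetical and is not part of any dump tooling.

```python
import gzip
import xml.etree.ElementTree as ET


def last_page_in_stub(stub_path):
    """Return (page_id, title) of the final <page> in a gzipped stub XML file."""
    last = None
    with gzip.open(stub_path, "rb") as stub_file:
        for _event, elem in ET.iterparse(stub_file, events=("end",)):
            # Tags carry the export namespace, e.g. "{http://...}page",
            # so compare only the local part of the tag name.
            if elem.tag.rsplit("}", 1)[-1] != "page":
                continue
            title = page_id = None
            for child in elem:  # direct children only, so revision ids are skipped
                name = child.tag.rsplit("}", 1)[-1]
                if name == "title":
                    title = child.text
                elif name == "id":
                    page_id = int(child.text)
            last = (page_id, title)
            elem.clear()  # drop the contents of pages already seen
    return last
```

Running this over each enwiki-20180220-stub-articlesXX.xml.gz file should give one candidate page per file, which can then be checked against the multistream index.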

Ariel

On Tue, Feb 27, 2018 at 10:45 AM, Ryan Hitchman wrote:

> Multiple pages are missing from the enwiki pages-articles-multistream
> dumps from 20180201 and 20180220.
>
> Page id 88444: "Phosphor" doesn't appear in the index or in the data
> stream. This also happens for TARDIS, Psalm 132, and many others.
>
> Why would the dump be partial?
>


[Xmldatadumps-l] Missing pages in enwiki pages-articles-multistream dumps

2018-02-27 Thread Ryan Hitchman
Multiple pages are missing from the enwiki pages-articles-multistream dumps
from 20180201 and 20180220.

Page id 88444: "Phosphor" doesn't appear in the index or in the data
stream. This also happens for TARDIS, Psalm 132, and many others.
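
To reproduce the index check, a rough sketch follows. It assumes the multistream index is a bz2-compressed text file with one "offset:page_id:title" entry per line, and that titles are compared exactly as they appear in the index; the function name find_in_multistream_index is hypothetical.

```python
import bz2


def find_in_multistream_index(index_path, wanted_titles):
    """Return {title: (offset, page_id)} for the wanted titles found in the index."""
    wanted = set(wanted_titles)
    found = {}
    with bz2.open(index_path, mode="rt", encoding="utf-8") as index_file:
        for line in index_file:
            # Titles may themselves contain colons, so split on the first two only.
            parts = line.rstrip("\n").split(":", 2)
            if len(parts) != 3:
                continue  # skip lines that do not match the assumed layout
            offset, page_id, title = parts
            if title in wanted:
                found[title] = (int(offset), int(page_id))
    return found
```

Any title passed in but absent from the returned dict (for example "Phosphor", "TARDIS" or "Psalm 132") is missing from that index file.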

Why would the dump be partial?