Re: [Wikitech-l] strange page id numbering

2011-02-15 Thread Anthony Ventresque (Dr)
Anthony Ventresque (Dr) wrote: Hi, I've found something strange in some files. The maximum ids for a page are: latest pages-articles.xml: 29189922; page.sql: 28707562; categorylinks.sql: 28705949 (15,684 categories and 135,521 articles are missing

Re: [Wikitech-l] strange page id numbering

2011-02-15 Thread Anthony Ventresque (Dr)
On Tue, Feb 15, 2011 at 9:29 AM, Anthony Ventresque (Dr) aventres...@ntu.edu.sg wrote: I was indeed suspecting something like that, but the difference in number of pages is large while we are talking about a relatively short delay (minutes?). Depending on what site you're talking

Re: [Wikitech-l] categorisation issues in dumps

2011-02-14 Thread Anthony Ventresque (Dr)
Anthony Ventresque (Dr) wrote: Hi, I am trying to build an offline version of the Wikipedia categorisation tree. As usual with projects on Wikipedia, I've downloaded dumps (actually the interesting one here is pages-articles.xml). And I found that none

[Wikitech-l] strange page id numbering

2011-02-14 Thread Anthony Ventresque (Dr)
Hi, I've found something strange in some files. The maximum ids for a page are: latest pages-articles.xml: 29189922; page.sql: 28707562; categorylinks.sql: 28705949 (15,684 categories and 135,521 articles are missing); 2011-01-15 pages-articles.xml: 30492297

[Wikitech-l] categorisation issues in dumps

2011-02-07 Thread Anthony Ventresque (Dr)
Hi, I am trying to build an offline version of the Wikipedia categorisation tree. As usual with projects on Wikipedia, I've downloaded dumps (actually the interesting one here is pages-articles.xml). And I found that none of the dumps has the relation between Category:1960_works and