If you are just trying to get the category structure from the dump
files, three tables cover it. The page table (*-page.sql.gz) has page
ids, titles, and a flag for whether the page is a redirect. The category
table (*-category.sql.gz) has category names, ids, and summary
information (member counts). The categorylinks table
(*-categorylinks.sql.gz) lists every category link on a page, giving the
page id and the category name. You can find details on the tables here:
http://www.mediawiki.org/wiki/Manual:Categorylinks_table
(here's the category:
http://www.mediawiki.org/wiki/Category:MediaWiki_database_tables )
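
As a rough sketch of how the page and categorylinks tables fit together,
assuming you have already loaded the rows into memory (for example by
importing the SQL files into MySQL and selecting the columns): the
column names page_id, page_namespace, page_title, page_is_redirect,
cl_from and cl_to are the ones described in the manual pages, while the
function name and the sample rows are made up for illustration. Check
the manual for the schema version that matches your dump.

```python
from collections import defaultdict


def build_category_members(pages, categorylinks):
    """Group non-redirect pages under the categories they link to.

    pages: iterable of (page_id, page_namespace, page_title, page_is_redirect)
    categorylinks: iterable of (cl_from, cl_to), where cl_from is a page id
                   and cl_to is a category name without the namespace prefix
    Returns a dict: category name -> list of (namespace, title) members.
    """
    # Index pages by id, dropping redirects so they are not counted as pages.
    titles = {}
    for page_id, namespace, title, is_redirect in pages:
        if is_redirect:
            continue
        titles[page_id] = (namespace, title)

    # Attach each remaining page to every category it links to.
    members = defaultdict(list)
    for cl_from, cl_to in categorylinks:
        if cl_from in titles:
            members[cl_to].append(titles[cl_from])
    return dict(members)


# Hypothetical sample rows; namespace 0 is articles, 14 is categories.
sample_pages = [
    (1, 0, "Apple", 0),
    (2, 0, "Apples", 1),   # a redirect, so it is skipped
    (3, 14, "Fruit", 0),   # a category page, i.e. a subcategory member
]
sample_links = [(1, "Fruit"), (2, "Fruit"), (3, "Food")]

structure = build_category_members(sample_pages, sample_links)
```

Members with namespace 14 are subcategories, so walking those entries
gives you the category tree. This sketch does not filter hidden
categories; that would need an extra lookup beyond these two tables.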

Hopefully this gets you started.

Ariel

On Wed, 09-01-2013 at 10:51 -0800, Robert Crowe wrote:
> I'd like to mirror just the category structure of the English
> Wikipedia, and I'm wondering which of the dump files I need to start
> with.
>
> I don't need the page content, just the page names, and only for the
> most current revision.  I need the categories and category members,
> and I'd like to exclude hidden categories.  I also need to distinguish
> redirects, because I don't want to treat them as separate pages.  As
> much as possible I'd like to work with SQL files, but I can crunch
> through XML if necessary.
>
> So which files do I need to download?  I may also need some help in
> understanding the schemas.
>
> Thanks,
>
> Robert
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l



