Hello,

I'm trying to extract information from a Wikipedia dump based on 
categorization or template usage. I first query the MediaWiki API with 
an embeddedin or categorymembers query to get a list of the articles I'm 
interested in. Then I look those articles up in the dump and extract the 
information I need. The problem is that the current titles returned by 
the API sometimes don't match what's in the dump, for example because 
an article has been moved since the dump was made.
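
For reference, the query step looks roughly like the sketch below. It 
assumes the English Wikipedia endpoint and uses helper names 
(build_params, extract_pages, fetch_all) that are made up for this 
example; only the API parameters themselves (list, cmtitle/eititle, 
cmlimit/eilimit, continue) come from the MediaWiki API.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; any MediaWiki api.php endpoint should work.
API_URL = "https://en.wikipedia.org/w/api.php"

# Parameter prefixes used by the two list modules.
PREFIXES = {"categorymembers": "cm", "embeddedin": "ei"}


def build_params(list_name, title, cont=None):
    """Build the query string parameters for one API request."""
    prefix = PREFIXES[list_name]
    params = {
        "action": "query",
        "format": "json",
        "list": list_name,
        prefix + "title": title,   # e.g. "Category:Physics" or "Template:Infobox person"
        prefix + "limit": "max",
    }
    if cont:
        # Continuation values returned by the previous response.
        params.update(cont)
    return params


def extract_pages(response, list_name):
    """Return (pageid, title) pairs from one decoded API response."""
    entries = response.get("query", {}).get(list_name, [])
    return [(entry["pageid"], entry["title"]) for entry in entries]


def fetch_all(list_name, title):
    """Yield (pageid, title) pairs, following continuation until exhausted."""
    cont = None
    while True:
        query = urllib.parse.urlencode(build_params(list_name, title, cont))
        request = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "dump-lookup-sketch/0.1"},  # placeholder UA
        )
        with urllib.request.urlopen(request) as reply:
            data = json.load(reply)
        yield from extract_pages(data, list_name)
        cont = data.get("continue")
        if not cont:
            break
```

The titles yielded here are the current ones, which is where the 
mismatch with the older dump comes from.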

I can see two ways to solve the problem:
- Parse the categorization and template usage information from all 
articles in the dump and build the list of articles in a given 
category, or using a given template, myself. This is likely to be 
error-prone because it requires custom wikitext parsing.
- Import the dump into a local MediaWiki installation and query the 
API locally. But from what I read in the documentation, importing the 
dump into a database can take an excessive amount of time.
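
To make the first option concrete, here is a minimal sketch of the kind 
of custom parsing I mean. The function name scan_dump and the two 
regular expressions are my own assumptions, not an existing tool. It 
only sees category links and template calls written directly in the 
page text, so categories added by a template, or templates used inside 
other templates, are missed, which is why I expect errors.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: categories appear as [[Category:Name]] or [[Category:Name|sortkey]].
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)
# Assumption: a template call starts with {{Name; parser functions ({{#if:...}}) are skipped.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}#<>\[\]\n]+)")


def local_name(tag):
    """Strip the XML namespace, which differs between dump schema versions."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream):
    """Yield (title, categories, templates) for each <page> in a dump stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            categories = {c.strip() for c in CATEGORY_RE.findall(text) if c.strip()}
            templates = {t.strip() for t in TEMPLATE_RE.findall(text) if t.strip()}
            yield title, categories, templates
            title, text = None, ""
            elem.clear()  # release the page's children to keep memory use low
```

The stream can be any binary file object, for example one returned by 
bz2.open() on the compressed dump.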

Is there an easier option? Is a dump of categorization and template 
usage data kept somewhere? Or perhaps I missed something and this 
information can be retrieved from the dump without parsing it?

Thanks,
Piotr

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
