On 9 March 2011 16:00, Platonides <[email protected]> wrote:
>> Dear Members,
>> I am Ramesh, pursuing my PhD at Monash University, Malaysia. My
>> research is on blog classification using Wikipedia categories.
>> For my experiment, I use 12 main categories of Wikipedia.
>> I want to identify "which particular article belongs to which of the
>> 12 main categories?".
>> So I wrote a program to collect the subcategories of each article and
>> classify them offline based on the 12 categories.
>> I have already downloaded the wiki dump, which consists of around
>> 3 million article titles.
>> My program takes these 3 million article titles, goes to the online
>> Wikipedia website and fetches the subcategories.
>
> Why do you need to access the live Wikipedia for this?
> Using categorylinks.sql and page.sql you should be able to fetch the
> same data. Probably faster.

I concur. Everything required for this project should be in the dumps.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
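
For illustration, here is a rough sketch of the offline lookup suggested in the quoted text. It assumes page.sql and categorylinks.sql have been imported into a local database. A small in-memory SQLite database with invented rows stands in for that here. The column names (page_id, page_namespace, page_title, cl_from, cl_to) follow the MediaWiki schema as I understand it for dumps of this period, so check them against your own dump. The helper names and the depth limit are hypothetical choices, not part of any Wikimedia tool.

```python
import sqlite3
from collections import deque


def build_toy_db():
    """Create a tiny stand-in for the imported page and categorylinks tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """)
    # Invented sample rows: namespace 0 = article, namespace 14 = category.
    db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 0, "Python_(programming_language)"),
        (2, 14, "Programming_languages"),
        (3, 14, "Computing"),
        (4, 14, "Technology"),
    ])
    # cl_from is the id of the member page, cl_to the category title.
    db.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
        (1, "Programming_languages"),
        (2, "Computing"),
        (3, "Technology"),
    ])
    return db


def direct_categories(db, title, namespace=0):
    """Return the categories a page is directly placed in."""
    rows = db.execute(
        """SELECT cl.cl_to
             FROM page p
             JOIN categorylinks cl ON cl.cl_from = p.page_id
            WHERE p.page_namespace = ? AND p.page_title = ?
            ORDER BY cl.cl_to""",
        (namespace, title),
    )
    return [row[0] for row in rows]


def main_categories(db, title, mains, max_depth=5):
    """Walk up the category graph breadth-first until a main category is hit."""
    found, seen = set(), set()
    queue = deque((cat, 1) for cat in direct_categories(db, title))
    while queue:
        cat, depth = queue.popleft()
        if cat in seen:
            continue  # the category graph contains cycles
        seen.add(cat)
        if cat in mains:
            found.add(cat)
            continue
        if depth < max_depth:
            for parent in direct_categories(db, cat, namespace=14):
                queue.append((parent, depth + 1))
    return found


if __name__ == "__main__":
    db = build_toy_db()
    print(main_categories(db, "Python_(programming_language)", {"Technology"}))
```

The depth limit matters on real data: the category graph is not a tree, and an unbounded walk tends to reach most main categories from almost any article.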
