Hello John,

Thanks for your effort. However, I need the official dumps, since I will be using them in my thesis. Could you please tell me how you generated these files? Also, any idea why the API doesn't work properly for the English Wikipedia? I used the same code for other languages and it worked.
Thanks,
Abed

On Sun, May 7, 2017 at 1:45 AM John <[email protected]> wrote:

> Here you go
> ns_0.7z <http://tools.wmflabs.org/betacommand-dev/reports/ns_0.7z>
> ns_14.7z <http://tools.wmflabs.org/betacommand-dev/reports/ns_14.7z>
>
> On Sat, May 6, 2017 at 5:27 PM, John <[email protected]> wrote:
>
> > Give me a few minutes, I can get you a database dump of what you need.
> >
> > On Sat, May 6, 2017 at 5:25 PM, Abdulfattah Safa <[email protected]>
> > wrote:
> >
> >> 1. I'm using max as the limit parameter
> >> 2. I'm not sure if the dumps have the data I need. I need to get the
> >> titles of all Articles (namespace = 0), with no redirects, and also the
> >> titles of all Categories (namespace = 14), without redirects
> >>
> >> On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal <[email protected]>
> >> wrote:
> >>
> >> > 1. You can use the limit parameter to get more titles in each request
> >> > 2. For getting many entries it is recommended to extract from dumps or
> >> > from the database using quarry
> >> >
> >> > On May 6, 2017 22:36, "Abdulfattah Safa" <[email protected]> wrote:
> >> >
> >> > > As for the & in $Continue=-||, it's a typo. It doesn't exist in the
> >> > > code.
> >> > >
> >> > > On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <[email protected]>
> >> > > wrote:
> >> > >
> >> > > > I'm trying to get all the page titles in Wikipedia in a namespace
> >> > > > using the API as follows:
> >> > > >
> >> > > > https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
> >> > > >
> >> > > > I keep requesting this URL and checking whether the response
> >> > > > contains a continue tag. If it does, I send the same request but
> >> > > > change *BASE_PAGE_TITLE* to the value of the apcontinue attribute
> >> > > > in the response.
> >> > > > My application has been running for 3 days and the number of
> >> > > > retrieved titles exceeds 30M, whereas it is about 13M in the dumps.
> >> > > > Any idea?
> >> > > >
> >> > > _______________________________________________
> >> > > Wikitech-l mailing list
> >> > > [email protected]
> >> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
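
For reference, the continuation loop described in the original question could be sketched roughly as below. This is only an illustration under a few assumptions, not the code used in the thread: it requests `format=json` instead of `format=xml`, the names `fetch_json` and `iter_titles` are hypothetical, and the User-Agent string is a placeholder. Rather than rebuilding the URL by hand, it passes every value from the response's `continue` object back unchanged on the next request.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # endpoint quoted in the thread


def fetch_json(params):
    """Hypothetical helper: send one GET request and parse the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); replace it with your own.
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-lister-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_titles(namespace, fetch=fetch_json):
    """Yield non-redirect page titles in one namespace, following continuation."""
    base = {
        "action": "query",
        "format": "json",  # assumption: JSON instead of the XML used in the thread
        "list": "allpages",
        "apnamespace": namespace,
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }
    cont = {}
    while True:
        # Merge the server-supplied continuation values into the base request.
        data = fetch({**base, **cont})
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break  # no "continue" object means the listing is complete
```

Calling `iter_titles(0)` would walk the article namespace and `iter_titles(14)` the category namespace. The `fetch` argument exists only so the loop can be tried against canned responses without touching the network.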
