1. I'm using max as the limit parameter (aplimit=max).
2. I'm not sure whether the dumps have the data I need. I need the titles of all articles (namespace = 0), excluding redirects, and also the titles of all categories (namespace = 14), excluding redirects.
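
For reference, here is a minimal sketch of the continuation loop discussed in this thread. It assumes format=json rather than xml, and it assumes that every key/value pair in the reply's "continue" object should be copied into the next request unchanged. The names iter_titles and http_fetch, and the User-Agent string, are made up for illustration and are not part of the API. This is a sketch under those assumptions, not a tested crawler.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def http_fetch(params):
    """Send one GET request to the API and return the decoded JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder chosen for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "title-lister-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_titles(namespace, fetch=http_fetch):
    """Yield each non-redirect title in one namespace, following 'continue'."""
    base = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": str(namespace),
        "apfilterredir": "nonredirects",
        "aplimit": "max",
    }
    cont = {}  # empty on the first request
    while True:
        reply = fetch({**base, **cont})
        for page in reply.get("query", {}).get("allpages", []):
            yield page["title"]
        if "continue" not in reply:
            break  # no 'continue' object means the listing is finished
        # Assumption: pass back every continuation parameter as received.
        cont = reply["continue"]
```

Calling iter_titles(0) would walk the articles and iter_titles(14) the categories. The fetch argument exists only so the loop can be exercised against canned replies without touching the network.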

On Sat, May 6, 2017 at 11:39 PM Eran Rosenthal <eranro...@gmail.com> wrote:

> 1. You can use the limit parameter to get more titles in each request.
> 2. For getting many entries, it is recommended to extract from the dumps or
> from the database using Quarry.
>
> On May 6, 2017 22:36, "Abdulfattah Safa" <fattah.s...@gmail.com> wrote:
>
> > for the & in $Continue=-||, it's a typo. It doesn't exist in the code.
> >
> > On Sat, May 6, 2017 at 10:12 PM Abdulfattah Safa <fattah.s...@gmail.com>
> > wrote:
> >
> > > I'm trying to get all the page titles in Wikipedia in a namespace using
> > > the API as follows:
> > >
> > > https://en.wikipedia.org/w/api.php?action=query&format=xml&list=allpages&apnamespace=0&apfilterredir=nonredirects&aplimit=max&$continue=-||$apcontinue=BASE_PAGE_TITLE
> > >
> > > I keep requesting this URL and checking whether the response contains a
> > > continue tag. If yes, then I use the same request but change the
> > > *BASE_PAGE_TITLE* to the value of the apcontinue attribute in the response.
> > > My application has been running for 3 days and the number of retrieved
> > > titles exceeds 30M, whereas it is about 13M in the dumps.
> > > Any idea?
> >
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l