Revision: 6380
Author:   nicdumz
Date:     2009-02-20 02:37:42 +0000 (Fri, 20 Feb 2009)

Log Message:
-----------
Modifying linksearch():

When looking up a specific top-level domain, e.g. "-weblink:*.yu", we were retrieving both Linksearch/yu and Linksearch/*.yu ... :s
Now we only retrieve the pages that the user asked for, i.e. URLs matching *.yu

Question for code reviewer:
Does anyone know why, when the user asks for -weblink:wikimedia.org, we feel the need to return pages containing links to http://wikimedia.org AND to every subsite http://*.wikimedia.org ??

Modified Paths:
--------------
    trunk/pywikipedia/wikipedia.py

Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py	2009-02-19 13:58:20 UTC (rev 6379)
+++ trunk/pywikipedia/wikipedia.py	2009-02-20 02:37:42 UTC (rev 6380)
@@ -5660,12 +5660,14 @@
 
     def linksearch(self, siteurl, limit=500):
         """Yield Pages from results of Special:Linksearch for 'siteurl'."""
-        if siteurl.startswith('*.'):
-            siteurl = siteurl[2:]
         output(u'Querying [[Special:Linksearch]]...')
         cache = []
         R = re.compile('title ?=\"([^<>]*?)\">[^<>]*</a></li>')
-        for url in [siteurl, '*.' + siteurl]:
+
+        urlsToRetrieve = [siteurl]
+        if not siteurl.startswith('*.'):
+            urlsToRetrieve.append('*.' + siteurl)
+        for url in urlsToRetrieve:
             offset = 0
             while True:
                 path = self.linksearch_address(url, limit=limit, offset=offset)
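
The effect of this revision on the list of queried patterns can be sketched in isolation. The two helper functions below are hypothetical and are not part of wikipedia.py; they only mirror the URL-selection logic before and after the change as described in the log message, without performing any Special:Linksearch request.

```python
def urls_before_r6380(siteurl):
    """Patterns queried before the change (hypothetical helper).

    A leading '*.' was stripped, so both the bare form and the
    wildcard form were always queried.
    """
    if siteurl.startswith('*.'):
        siteurl = siteurl[2:]
    return [siteurl, '*.' + siteurl]


def urls_after_r6380(siteurl):
    """Patterns queried after the change (hypothetical helper).

    An explicit wildcard is queried as-is; a bare domain still gets
    its wildcard form appended.
    """
    urls = [siteurl]
    if not siteurl.startswith('*.'):
        urls.append('*.' + siteurl)
    return urls


if __name__ == '__main__':
    for pattern in ('*.yu', 'wikimedia.org'):
        print(pattern, urls_before_r6380(pattern), urls_after_r6380(pattern))
```

Under these assumptions, "*.yu" now yields a single query, while a bare domain such as "wikimedia.org" is still expanded to two queries, which is the behaviour the question for the reviewer is about.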
