Lee McFadden wrote:
> If everyone else agrees with Chris I will take a look at the moinmoin
> instance to try and remove the comment system.
I hope the */PageCommentData pages will not be removed as well? We could
harvest them automatically later and add their content to the parent pages.
That could probably be done by a little script with BeautifulSoup and
mechanize; I have appended a very rough sketch of the idea below, after the
attached script.

BTW, I just hacked together a little script that downloads every page in the
wiki and checks whether it is broken (i.e. contains a DIV with
class="traceback"). This could easily be extended to do more checks, and we
could run it on a regular basis. It should cache the downloaded pages,
though (a possible caching helper is sketched at the very end). The script
is attached. It currently coughs up the following list of broken pages:

/1.0/AlternativeTemplating
/1.0/CLIReference
/1.0/Configuration
/1.0/GenerateFigures
/1.0/GettingStarted/Admin
/1.0/GettingStarted/Configuration
/1.0/TgAdmin
/1.0/ThirdParty
/DocTeam
/VideoHelp

Chris
#!/usr/bin/env python
"""Check every page of the TurboGears doc wiki for tracebacks."""
import urllib
import sys

from BeautifulSoup import BeautifulSoup

BASE_URL = 'http://docs.turbogears.org'
TITLE_INDEX = BASE_URL + '/TitleIndex'


def search_pages(urls, searchstring):
    """Download all pages in urls and return those whose content contains
    searchstring."""
    pages = []
    print >>sys.stderr, "Downloading and parsing wiki pages..."
    for url in urls:
        try:
            print >>sys.stderr, "Downloading '%s'..." % url
            ret = urllib.urlopen(BASE_URL + url)
        except IOError:
            print >>sys.stderr, "Could not open '%s'" % url
        else:
            if searchstring in ret.read():
                pages.append(url)
    return pages


def main(args):
    try:
        print >>sys.stderr, "Retrieving title index..."
        ret = urllib.urlopen(TITLE_INDEX)
    except IOError:
        print >>sys.stderr, "Could not retrieve title index from", TITLE_INDEX
        return 1
    else:
        title_index = ret.read()
    soup = BeautifulSoup(title_index)
    content = soup.find('div', id='content')
    links = content.findAll('a', href=True)
    # only visit relative URLs without query params
    urls = [l['href'] for l in links
            if not (l['href'].startswith('http://')
                    or l['href'].startswith('#')
                    or '?' in l['href'])]
    urls.sort()
    print >>sys.stderr, urls
    broken_pages = search_pages(urls, '<div class="traceback">')
    print "Broken pages"
    print
    print "\n".join(broken_pages)


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
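
Here is the harvesting sketch I mentioned above. It is completely untested
and makes a few assumptions: that the title index lists the comment pages,
that MoinMoin serves the raw wiki text via "?action=raw", and that for now
we only want to dump each comment page's text next to its parent page name.
Actually writing the text back into the parent page (e.g. by driving the
edit form with mechanize) is left out.

#!/usr/bin/env python
"""Rough sketch: collect the raw text of all */PageCommentData pages."""
import sys
import urllib

from BeautifulSoup import BeautifulSoup

BASE_URL = 'http://docs.turbogears.org'
TITLE_INDEX = BASE_URL + '/TitleIndex'


def comment_page_urls():
    """Return the URLs of all pages whose name ends in /PageCommentData."""
    soup = BeautifulSoup(urllib.urlopen(TITLE_INDEX).read())
    content = soup.find('div', id='content')
    urls = [a['href'] for a in content.findAll('a', href=True)]
    return [u for u in urls if u.endswith('/PageCommentData')]


def main():
    for url in comment_page_urls():
        parent = url[:-len('/PageCommentData')]
        try:
            # assumption: "?action=raw" returns the raw wiki markup
            raw = urllib.urlopen(BASE_URL + url + '?action=raw').read()
        except IOError:
            print >>sys.stderr, "Could not fetch", url
            continue
        # for now just dump parent page name and comment text;
        # merging the text into the parent page would go here
        print '== Comments for %s ==' % parent
        print raw


if __name__ == '__main__':
    main()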

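And this is roughly what I mean by caching the downloaded pages (also
untested; the cache directory name and the six-hour maximum age are just
made up for illustration). search_pages() and main() would then call
cached_read(BASE_URL + url) and cached_read(TITLE_INDEX) instead of
urllib.urlopen(...).read():

import os
import time
import urllib
import hashlib

CACHE_DIR = 'wikicache'
MAX_AGE = 6 * 60 * 60  # re-download pages older than six hours


def cached_read(url):
    """Return the content of url, reusing a local file cache if possible."""
    if not os.path.isdir(CACHE_DIR):
        os.makedirs(CACHE_DIR)
    path = os.path.join(CACHE_DIR, hashlib.md5(url).hexdigest())
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < MAX_AGE:
        # cached copy is recent enough, use it
        return open(path, 'rb').read()
    data = urllib.urlopen(url).read()
    open(path, 'wb').write(data)
    return data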