On Thu, July 30, 2009 9:11 pm, Stig Meireles Johansen wrote:
> I hacked a little on some old perl-code I had laying around which I once
> wrote for making fake apache-logs from a full xml-history dump (for
> analyzing with analog)..
> It ain't pretty, but works at least for me... I have only tried it on the
> full "pages-meta-history" files, but it should work on any similar
> XML-dump.. it only searches the last revision of a page.
> here it is: http://toolserver.no/~stigmj/tools/src/xml-search.pl.txt
>
> /Stigmj

Suggestion: pywikipediabot has good built-in support. My attempt at
building a simple parser (http://arctus.nl/~valhallasw/pulldom.py) is
about 10 times slower than just using four (much more readable) lines of
code:

import xmlreader

for page in
xmlreader.XmlDump('/home/valhallasw/download/nlwikiquote-20090730-pages-articles.xml').parse():
  if '{|' in page.text:
    print page.title


I sometimes am surprised of pywikipediabot myself :)

-Merlijn


_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l

Reply via email to