Danny B. wrote:
> 
> I'm looking for any kind of tool which would take the XML dump (most probably 
> the pages-meta-current.xml.bz2, at least the pages-articles.xml.bz2) and 
> would return the list of page titles (or alternatively/configurably page ids) 
> of pages containing given string.
> 
> Does anybody have such (kind of) tool and is willing to share? Both command 
> line or webpage interface are OK.

If you're only interested in page titles, why not just download 
all-titles-in-ns0.gz and grep it?
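
For anyone who would rather script the search than run grep by hand, here is a minimal sketch in Python. It assumes the file holds one title per line, with underscores in place of spaces; the function name grep_titles and its ignore_case parameter are made up for this illustration.

```python
import gzip


def grep_titles(path, needle, ignore_case=True):
    """Yield every title in a gzipped, one-title-per-line file that contains needle.

    Assumption: titles use underscores instead of spaces, so a search
    string with spaces should be written with underscores too.
    """
    if ignore_case:
        needle = needle.lower()
    # Stream the file line by line so the whole list is never held in memory.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            title = line.rstrip("\n")
            haystack = title.lower() if ignore_case else title
            if needle in haystack:
                yield title


# Hypothetical usage (the path is whatever you saved the download as):
#     for title in grep_titles("all-titles-in-ns0.gz", "toolserver"):
#         print(title)
```

This is a plain substring match, roughly what "zcat all-titles-in-ns0.gz | grep -i string" would give you; it only looks at titles, not at page text.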

Alternatively, if you want titles in other namespaces too, I have a 
small Perl script I once wrote that can extract such a list from the 
page.sql.gz dump -- I can clean it up and put it online somewhere if 
you're interested.
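
To give a rough idea of what such an extraction involves, here is a sketch in Python. It is not the Perl script mentioned above. It assumes the dump consists of "INSERT INTO `page` VALUES (...),(...);" lines whose tuples begin with page_id, page_namespace and page_title in that order, and that quotes inside titles are backslash-escaped; the names TUPLE_HEAD, parse_page_rows and titles_from_page_sql are invented for this example.

```python
import gzip
import re

# Assumed tuple layout: (page_id,page_namespace,'page_title',...more columns...)
# The title group accepts backslash escapes such as \' inside the quoted string.
TUPLE_HEAD = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)',")


def parse_page_rows(line):
    """Yield (page_id, namespace, title) for each row tuple found in one INSERT line."""
    for m in TUPLE_HEAD.finditer(line):
        # Drop the backslash from escaped characters (\' becomes ', \\ becomes \).
        title = re.sub(r"\\(.)", r"\1", m.group(3))
        yield int(m.group(1)), int(m.group(2)), title


def titles_from_page_sql(path, needle):
    """Yield (page_id, namespace, title) for rows whose title contains needle."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # Skip schema definitions, comments and other non-data lines.
            if not line.startswith("INSERT INTO"):
                continue
            for page_id, namespace, title in parse_page_rows(line):
                if needle in title:
                    yield page_id, namespace, title
```

Because each row carries the page id and the namespace number along with the title, the same loop can print ids instead of titles or filter by namespace. A regular expression is a shortcut here rather than a real SQL parser, so treat the output as approximate if the dump layout differs from the assumption above.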

-- 
Ilmari Karonen

_______________________________________________
Toolserver-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
