Hi

I have downloaded and unzipped the xml dump of Wikipedia (40+GB). I want to use 
Python and the SAX module (running under Windows 7) to carry out off-line 
phrase-searches of Wikipedia and to return a count of the number of hits for 
each search. Typical phrase-searches might be "of the dog" and "dog's".

I have some limited prior programming experience (from many years ago) and I am 
currently learning Python from a course of YouTube tutorials. Before I get much 
further, I wanted to ask:

Is what I am trying to do actually feasible?

Are there any example programs or code snippets that would help me?

Any advice or guidance would be gratefully received.

Best regards,
Kevin Glover
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to