Hi I have downloaded and unzipped the xml dump of Wikipedia (40+GB). I want to use Python and the SAX module (running under Windows 7) to carry out off-line phrase-searches of Wikipedia and to return a count of the number of hits for each search. Typical phrase-searches might be "of the dog" and "dog's".
I have some limited prior programming experience (from many years ago) and I am currently learning Python from a course of YouTube tutorials. Before I get much further, I wanted to ask: Is what I am trying to do actually feasible? Are there any example programs or code snippets that would help me? Any advice or guidance would be gratefully received. Best regards, Kevin Glover -- https://mail.python.org/mailman/listinfo/python-list