On 30/06/15 16:10, Joshua Valdez wrote:
So I wrote this script to go over a large wiki XML dump and pull out the
pages I want. However, every time I run it the kernel displays 'Killed'.
I'm assuming this is a memory issue after reading around, but I'm not
sure where the memory problem is in my script

That's quite a big assumption.
How big is the wiki file? How much RAM do you have?
What do your system resource monitoring tools (e.g. top) say?
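
If you want a number from inside the script itself, the standard library
can report the peak memory the process has used so far. A minimal sketch
(Unix only; note the units differ between platforms):

import resource

# Peak resident set size of this process so far.
# On Linux the value is in kilobytes; on macOS it is in bytes.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print('peak RSS:', peak)

Sprinkle that through the script and you will see where the usage jumps.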

and if there were any tricks to reduce
the virtual memory usage.

Of course, but as always, be sure what you are tweaking before you start;
otherwise you can waste a lot of time doing nothing useful. (There is a
sketch of one such trick below, after your code.)

from bs4 import BeautifulSoup
import sys

pages_file = open('pages_file.txt', 'r')

....

#####################################

with open(sys.argv[1], 'r') as wiki:
     soup = BeautifulSoup(wiki)
wiki.closed

Is that really what you mean? Or should it be

wiki.close()?

Either way the line is redundant: the 'with' block already closes the
file for you when it exits. As written, wiki.closed merely reads the
attribute and throws the result away.
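
You can check that for yourself:

with open('pages_file.txt') as f:
    pass
print(f.closed)   # True - the with block closed it on exit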

wiki_page = soup.find_all("page")
del soup
for item in wiki_page:
     title = item.title.get_text()
     if title in page_titles:
         print item
         del title
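
As for tricks to reduce the memory usage: BeautifulSoup builds the whole
document tree in memory at once, and del soup will not release much,
because every tag in wiki_page still holds references back into that
tree. For a dump bigger than your RAM the usual approach is a streaming
parser that handles one <page> at a time. Here is a minimal sketch using
xml.etree.ElementTree.iterparse from the standard library. I am guessing
that pages_file.txt holds one wanted title per line (the elided part of
your script is not shown), and note that real MediaWiki dumps put their
elements in an XML namespace, which is why the sketch compares only the
local part of each tag name:

import sys
import xml.etree.ElementTree as ET

# One wanted title per line - adjust if your pages_file differs.
with open('pages_file.txt') as f:
    page_titles = set(line.strip() for line in f)

# iterparse yields each element as its end tag is read, so only
# one <page> subtree needs to be live at a time.
for event, elem in ET.iterparse(sys.argv[1], events=('end',)):
    if elem.tag.rsplit('}', 1)[-1] == 'page':
        for child in elem:
            if child.tag.rsplit('}', 1)[-1] == 'title':
                if child.text in page_titles:
                    # Do whatever you need with the page here.
                    print(ET.tostring(elem))
                break
        # Discard the finished page so the tree does not keep growing.
        elem.clear()

The elem.clear() call is the important part: without it iterparse still
accumulates the whole tree, just more slowly.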

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

