SAX is the simplest of the XML libraries to get started with. Here is a simple example. See http://docs.python.org/lib/content-handler-objects.html for more info on the methods of ContentHandler. Keeping the tags you're processing in a list or dict, as I do in the following example, keeps tag-specific code to a minimum. In principle, to handle a new tag you'd only have to change the list you initialize the handler with, plus the endElement method.
    from xml import sax
    from xml.sax.handler import ContentHandler

    class myhandler(ContentHandler):
        def __init__(self, tagsToChirpOn=None):
            ContentHandler.__init__(self)
            self.last = ""
            self.info = ""
            if tagsToChirpOn is None:
                self.chirpTags = []
            else:
                self.chirpTags = tagsToChirpOn

        # then you define a start element method;
        # this is called for each open tag you see
        def startElement(self, name, attr):
            self.last = name
            self.info = ""
            if name in self.chirpTags:
                print("starting %s tag" % name)

        # then you define a characters method, which may be
        # called several times for one run of text inside a tag,
        # so append each piece until it has all been seen
        def characters(self, content):
            if self.last in self.chirpTags:
                self.info += content

        # then if you need an action to happen when an
        # end tag is hit, you write an endElement method
        def endElement(self, name):
            """called at </closetag>"""
            if self.info:
                print("In tag %s was data{{%s}}" % (self.last, self.info))
            if name in self.chirpTags:
                print("Now leaving the %s tag" % name)
            # forget the finished element so the whitespace after its
            # close tag isn't reported again when the parent closes
            self.last = ""
            self.info = ""

    if __name__ == "__main__":
        document = """<xml>
    <foo>line 1 bars are fun
    </foo>
    <bar>line 2 dogs don't like celery
    </bar>
    <baz> 121309803124.12 </baz>
    </xml>"""
        hand = myhandler(["bar", "baz"])
        # Python 3 before 3.7 needs bytes here: document.encode("utf-8")
        sax.parseString(document, hand)

You often need to build a state machine or some other way of tracking state to make SAX parsers do complicated things, but the above is good enough for most simple data-extraction jobs. If you use startElement to create a new object, characters to populate it, and endElement to add the finished object to a larger data structure, you can easily build objects from XML data from any source.

I most recently used SAX parsers to parse REST data from Amazon. urllib2 plus a SAX parser is a fast, capable combination for simple parsing needs. Look into BeautifulSoup as well: http://www.crummy.com/software/BeautifulSoup/

--Michael

On Dec 20, 2007 4:15 PM, Lockhart, Luke <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> So I'm a very novice Python programmer.
> I've done stuff up to the intermediate level in Microsoft flavors of BASIC and C++, but now I'm a Linux man and trying to realize my overly ambitious programming dreams with Python, mainly because I have friends who use it and because it has libraries that in general are very good at doing what I want to do.
>
> Now, the program I'm working on would hypothetically store and read all data as XML, and yes I'm aware of the performance drawbacks and I'm willing to live with them. But I just can't figure out the various libraries that Python uses to read XML, and my friend's code doesn't help much.
>
> Basically, what I really would like to do, without going into a lot of detail, is be able to read various tags into classes and arrays until the entire file has been read, then remove the file from memory. I first tried to use the basic Python XML libraries, and then my friend recommended SAX - but so far as I can tell, either method requires numerous lines of code to support one new tag. Is this what I'm going to have to do, or is there a simpler way?
>
> Thanks in advance,
> Luke
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor

--
Michael Langford
Phone: 404-386-0495
Consulting: http://www.RowdyLabs.com
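A minimal sketch of the create/populate/submit pattern described in the reply above, assuming Python 3 and the standard-library xml.sax module. It also keeps a stack of open element names, which is about the simplest form of the stateful tracking mentioned above. The RecordBuilder class, its record_tag and field_tags parameters, and the sample <item> document are all hypothetical, made up for illustration rather than taken from any real feed.

```python
from xml import sax
from xml.sax.handler import ContentHandler


class RecordBuilder(ContentHandler):
    """Hypothetical handler: turns each <record_tag> element into a dict.

    startElement creates the object, characters populates it, and
    endElement submits it to self.records (the larger data structure).
    """

    def __init__(self, record_tag, field_tags):
        ContentHandler.__init__(self)
        self.record_tag = record_tag
        self.field_tags = set(field_tags)
        self.records = []    # finished objects end up here
        self.current = None  # object currently being built, if any
        self.stack = []      # names of the currently open elements
        self.text = []       # text chunks collected for the current field

    def startElement(self, name, attrs):
        self.stack.append(name)
        if name == self.record_tag:
            # Create a new object, seeded with the element's attributes.
            self.current = dict(attrs.items())
        elif self.current is not None and name in self.field_tags:
            self.text = []

    def characters(self, content):
        # characters() may fire several times for one run of text, so
        # collect the pieces; ignore text that isn't inside a field of a record.
        if self.current is not None and self.stack[-1] in self.field_tags:
            self.text.append(content)

    def endElement(self, name):
        self.stack.pop()
        if self.current is None:
            return
        if name in self.field_tags:
            # Populate the object with the field's (stripped) text.
            self.current[name] = "".join(self.text).strip()
        elif name == self.record_tag:
            # Submit the finished object and stop building.
            self.records.append(self.current)
            self.current = None


# Made-up sample data, for illustration only.
document = b"""<catalog>
  <item id="1"><name>widget</name><price>2.50</price></item>
  <item id="2"><name>gadget</name><price>10.00</price></item>
</catalog>"""

handler = RecordBuilder("item", ["name", "price"])
sax.parseString(document, handler)
for record in handler.records:
    print(record)
```

With this layout, supporting another field only means adding its name to field_tags, which is close to the "one list change per new tag" Luke was asking about. For data fetched over HTTP (the urllib2 route mentioned above; in Python 3 that module became urllib.request), you could pass the response bytes to sax.parseString, or hand the response object to sax.parse, which accepts a filename or a file object.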