"Not clear from your question whether your goal is to learn to parse XML in python or to solve a particular problem. If your goal is to learn python XML processing, then go right ahead -- however, it looks like you are using SAX below, and the sort of thing you describe might be done better using a DOM parser ( or maybe etree )" - It's a bit of both - learning XML parsing through solving a problem. I started with SAX because that's how the book I have does it.
I have looked up ElementTree and this looks like a much easier and much more elegant solution to my problem.

"Not that it can't be done in SAX -- it's just that, as you discovered, low level SAX parsing requires that you keep track of the containment hierarchy yourself, which is a lot of work to solve a simple problem." - I see now that I was doing a lot more work than I really needed to to accomplish my goal.

Thanks a lot Steve for the in-depth (from my perspective) explanation of all the solutions available to me. I appreciate the help.

Bryan

On Sat, Feb 7, 2009 at 2:18 AM, Steve Majewski <sd...@mac.com> wrote:

> Not clear from your question whether your goal is to learn to parse XML in python
> or to solve a particular problem. If your goal is to learn python XML processing,
> then go right ahead -- however, it looks like you are using SAX below, and the sort
> of thing you describe might be done better using a DOM parser ( or maybe etree )
>
> If what you want is not just to select some info from the xml file, but to get it
> into a Python object so that you can then manipulate it further, then DOM or etree
> is also probably a better model. It will parse the XML ( likely using SAX underneath )
> and give you an object that encodes the whole file.
>
> [ Not that it can't be done in SAX -- it's just that, as you discovered, low level
> SAX parsing requires that you keep track of the containment hierarchy yourself,
> which is a lot of work to solve a simple problem. ]
>
> If you're just trying to work with XML, then most folks don't write XML parsers for
> that sort of thing, but use higher level tools: XSLT, XPATH and or XQUERY.
>
> The Mac has xsltproc as a built-in xslt (1.0) processor.
> There is a xpath program written in perl in Leopard/10.5. ( /usr/bin/xpath )
> And Saxon is easily downloaded and does xslt 2.0 and xquery 1.0 .
>
> The following XSLT 1.0 stylesheet:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> <xsl:output method="text"/>
>
> <xsl:template match="/">
> <xsl:apply-templates select="/topalbums/album[@rank &lt; 6]"/>
> <!-- just select the top 5 albums -->
> </xsl:template>
>
> <xsl:template match="/topalbums/album" >
> album: <xsl:value-of select="name"/>
> artist: <xsl:value-of select="artist/name"/>
> count=<xsl:value-of select="playcount"/>
> <xsl:text>
> </xsl:text> <!-- this is here to insert the blank line break -->
> </xsl:template>
>
> </xsl:stylesheet>
>
> Will, when run on that file, produce this output:
>
> ~$ xsltproc Untitled1.xsl topalbums.xml
>
> album: Vheissu
> artist: Thrice
> count=332
>
> album: The Artist in the Ambulance
> artist: Thrice
> count=289
>
> album: Appeal To Reason
> artist: Rise Against
> count=286
>
> album: Favourite Worst Nightmare
> artist: Arctic Monkeys
> count=210
>
> album: The Sufferer & The Witness
> artist: Rise Against
> count=206
>
> [ Not sure if that's anything like what you want. ]
>
> I'm sure that the whole thing would reduce to an even more concise XQuery request.
>
> I was trying to do the whole thing as an xpath one liner, but it didn't like
> my attempts to include alternates in parentheses. I think this is an xpath 1.0
> vs. xpath 2.0 issue. Saxon is the only thing that supports 2.0. The perl, python
> and java libraries only support xpath 1.0.
>
> This sort of expression did work using xpath 2.0 (in oxygen editor):
>
> //album[@rank < 6]/(name|playcount|artist/name)
>
> But I couldn't figure out a 1.0 syntax that would grab all three fields.
>
> ( and the perl xpath seems to have a bug that interprets '@rank < 6' as
> less-than-or-equal! )
>
> -- Steve Majewski
>
> On Feb 6, 2009, at 11:00 PM, Bryan Smith wrote:
>
>> Hi everyone,
>>
>> I have another question I'm hoping someone would be kind enough to answer.
>> I am new to parsing XML (not to mention much of Python itself) and I am
>> trying to parse an XML file. The file I am trying to parse is this one:
>> http://ws.audioscrobbler.com/2.0/user/bryansmith/topalbums.xml.
>>
>> So far, I have written up a class for parsing this file in my attempts to
>> present to the user a list of top albums on their last.fm profile. If you
>> note, the artist name and album name are both signified by the <name> tag
>> which makes my job harder. If the tag names were different, I wouldn't have
>> a problem. Listed below is the class I have written to parse the file. My
>> question then is this: is there a way I can say something like "if tag_name
>> == album name tag then....elif tag_name == artist name tag....". I hope this
>> is clear.
>>
>> As it stands right now, if I parse this file and print the results, this
>> is what I get (understandably) if I try to print out in the following
>> fashion - album (playcount): Vheissu (332), Thrice (289), The Artist in the
>> Ambulance (286), Thrice (210) and so on. Thrice is the artist name. I want
>> to be able to differentiate between the "artist" name tag and the "album"
>> name tag.
>>
>> Class as it stands right now:
>>
>> class GetTopAlbums(ContentHandler):
>>
>>     in_album_tag = False
>>     in_playcount_tag = False
>>
>>     def __init__(self, album, playcount):
>>         ContentHandler.__init__(self)
>>         self.album = album
>>         self.playcount = playcount
>>         self.data = []
>>
>>     def startElement(self, tag_name, attr):
>>         if tag_name == "name":
>>             self.in_album_tag = True
>>         elif tag_name == "playcount":
>>             self.in_playcount_tag = True
>>
>>     def endElement(self, tag_name):
>>         if tag_name == "name":
>>             content = "".join(self.data)
>>             self.data = []
>>             self.album.append(content)
>>             self.in_album_tag = False
>>         elif tag_name == "playcount":
>>             content = "".join(self.data)
>>             self.data = []
>>             self.playcount.append(content)
>>             self.in_playcount_tag = False
>>
>>     def characters(self, string):
>>         if self.in_album_tag == True:
>>             self.data.append(string)
>>         elif self.in_playcount_tag == True:
>>             self.data.append(string)
>>
>> Thanks in advance!
>> Bryan
>> _______________________________________________
>> Pythonmac-SIG maillist - Pythonmac-SIG@python.org
>> http://mail.python.org/mailman/listinfo/pythonmac-sig
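The containment tracking that Steve describes can be sketched in SAX by keeping a stack of the currently open tag names, so that a <name> directly inside <album> can be told apart from a <name> inside <artist>. This is only a rough sketch, not a drop-in replacement for the class quoted above: the names TopAlbumsHandler, parse_top_albums and SAMPLE are made up for illustration, and the element layout (album/name, album/artist/name, album/playcount) is assumed from the XSLT in the thread rather than checked against the live file.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Hypothetical sample; layout assumed from the XSLT quoted in the thread.
SAMPLE = """\
<topalbums>
  <album rank="1">
    <name>Vheissu</name>
    <playcount>332</playcount>
    <artist><name>Thrice</name></artist>
  </album>
  <album rank="2">
    <name>The Artist in the Ambulance</name>
    <playcount>289</playcount>
    <artist><name>Thrice</name></artist>
  </album>
</topalbums>
"""


class TopAlbumsHandler(ContentHandler):
    """Collects (album, artist, playcount) tuples using a stack of open tags."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.path = []      # names of the elements that are currently open
        self.text = []      # character chunks seen since the last tag
        self.current = {}   # fields of the album being read
        self.albums = []    # finished (album, artist, playcount) tuples

    def startElement(self, tag_name, attrs):
        self.path.append(tag_name)
        self.text = []
        if tag_name == "album":
            self.current = {}

    def characters(self, content):
        self.text.append(content)

    def endElement(self, tag_name):
        content = "".join(self.text).strip()
        # The parent is the element just below the current one on the stack.
        parent = self.path[-2] if len(self.path) > 1 else None
        if tag_name == "name" and parent == "album":
            self.current["album"] = content
        elif tag_name == "name" and parent == "artist":
            self.current["artist"] = content
        elif tag_name == "playcount":
            self.current["playcount"] = content
        elif tag_name == "album":
            self.albums.append((self.current.get("album"),
                                self.current.get("artist"),
                                self.current.get("playcount")))
        self.path.pop()
        self.text = []


def parse_top_albums(xml_text):
    """Parse an XML string and return the collected album tuples."""
    handler = TopAlbumsHandler()
    parseString(xml_text.encode("utf-8"), handler)
    return handler.albums


for album, artist, playcount in parse_top_albums(SAMPLE):
    print("%s by %s (%s)" % (album, artist, playcount))
```

The parent check in endElement is the "if tag_name == album name tag ... elif tag_name == artist name tag" test asked for in the original question. Playcounts are left as strings here, as in the original class.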
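For comparison with the SAX class quoted above, here is a minimal sketch of the same task using ElementTree from the standard library. It assumes the element layout implied by the XSLT in the thread (a <topalbums> root holding <album rank="..."> elements, each with <name>, <playcount> and <artist><name>). The function name top_albums, its limit parameter, and the SAMPLE document are illustrative assumptions, not anything from the last.fm API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; layout assumed from the XSLT quoted in the thread.
SAMPLE = """\
<topalbums>
  <album rank="1">
    <name>Vheissu</name>
    <playcount>332</playcount>
    <artist><name>Thrice</name></artist>
  </album>
  <album rank="2">
    <name>The Artist in the Ambulance</name>
    <playcount>289</playcount>
    <artist><name>Thrice</name></artist>
  </album>
</topalbums>
"""


def top_albums(xml_text, limit=5):
    """Return (album, artist, playcount) for albums ranked 1..limit."""
    root = ET.fromstring(xml_text)
    results = []
    for album in root.findall("album"):
        # Assumes every <album> carries a numeric rank attribute.
        if int(album.get("rank", "0")) > limit:
            continue
        results.append((
            album.findtext("name"),          # direct child <name> only
            album.findtext("artist/name"),   # the <name> nested in <artist>
            int(album.findtext("playcount")),
        ))
    return results


for name, artist, count in top_albums(SAMPLE):
    print("%s by %s (%d)" % (name, artist, count))
```

Because findtext("name") only looks at direct children of each <album>, the album name and the artist name no longer collide, and no bookkeeping of the containment hierarchy is needed.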