Re: [Tutor] reading and processing xml files with python

Stefan Behnel Sun, 21 Jun 2009 00:31:24 -0700

Hi,

[email protected] wrote:
> I am a total python XML noob and wanted some clarification on using python 
> with reading remote XML data.


For XML in general, there's xml.etree.ElementTree in the stdlib. For remote
data (and for various other features), you should also try lxml.etree,
which is an advanced re-implementation.

http://codespeak.net/lxml/


> All examples I have found assume the data is stored locally, or have I 
> misunderstood this?

Likely a misunderstanding. Any XML library I know of can parse from a
string or a file-like object (such as the response object that urllib2
returns, or a socket wrapped with makefile()).
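
To make the string versus file-like distinction concrete, here is a minimal
sketch using the stdlib xml.etree.ElementTree. The one-element document is a
cut-down stand-in for the real response, and io.BytesIO only plays the role
of a network stream here; lxml.etree offers the same two entry points.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down stand-in for the server response (not the full document).
xml_bytes = b"<Hotel><name>Milford Plaza at Times Square</name></Hotel>"

# 1) Parse data that is already in memory; fromstring() returns the root Element.
root_from_string = ET.fromstring(xml_bytes)

# 2) Parse from any object that has a read() method; parse() returns an
#    ElementTree, so ask it for the root element.
stream = io.BytesIO(xml_bytes)  # stands in for an HTTP response object
root_from_stream = ET.parse(stream).getroot()

print(root_from_string.findtext("name"))
print(root_from_stream.findtext("name"))
```

Either way you end up with the same tree, so where the bytes came from does
not matter to the rest of your code.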


> If I browse to:
> 'user:[email protected]/external/xmlinterface.jsp?cid=xxx&resType=hotel200631&intfc=ws&xml='
> 
> This request returns a page like:
> 
> <HotelAvailabilityListResults size="25">
> <Hotel>
> <hotelId>134388</hotelId>
> <name>Milford Plaza at Times Square</name>
> <address1>700 8th Avenue</address1>
> <address2/>
> <address3/>
> <city>New York</city>
> <stateProvince>NY</stateProvince>
> <country>US</country>
> <postalCode>10036</postalCode>
> <airportCode>NYC</airportCode>
> <lowRate>155.4</lowRate>
> <highRate>259.0</highRate>
> <rateCurrencyCode>USD</rateCurrencyCode>
> <latitude>40.75905</latitude>
> <longitude>-73.98844</longitude>
[...]
> <rateFrequency>B</rateFrequency>
> </PromoRateInfo>
> </HotelProperty>
> </Hotel>
> 
> 
> I got this so far:
> 
> >>> import urllib2
> >>> request = urllib2.Request('user:[email protected]/external/xmlinterface.jsp?cid=xxx&resType=hotel200631&intfc=ws&xml=')
> >>> opener = urllib2.build_opener()
> >>> firstdatastream = opener.open(request)
> >>> firstdata = firstdatastream.read()
> >>> print firstdata

I never used HTTP authentication with lxml (ElementTree doesn't support
parsing from remote URLs at all), so I'm not sure if this works:

        url = 'user:[email protected]/external/...'

        from lxml import etree
        document = etree.parse(url)

If it doesn't, you can use your above code (BTW, isn't urlopen() enough
here?) up to the .open() call and do this afterwards:

        document = etree.parse( firstdatastream )
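
In case the credentials are the sticking point, here is a rough sketch of
the second approach with an explicit HTTP Basic header. It is untested
against your server and makes several assumptions: the service accepts Basic
authentication, the URL carries an explicit http:// scheme, and the helper
names build_request() and fetch_document() are made up for illustration. I
would not rely on credentials embedded in the URL being picked up by urllib.
It uses the stdlib ElementTree; lxml's etree.parse() accepts the same
file-like object.

```python
import base64
import xml.etree.ElementTree as ET

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2


def build_request(url, user, password):
    """Return a Request that carries an HTTP Basic Authorization header."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_document(url, user, password):
    """Open the URL and parse the response stream directly (hypothetical helper)."""
    response = urlopen(build_request(url, user, password))
    try:
        # The response is file-like, so there is no need to read() it first.
        return ET.parse(response)
    finally:
        response.close()
```

You would call fetch_document() with your real URL and login. Parsing
straight from the response also avoids holding the raw text and the tree in
memory at the same time.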


> <HotelAvailabilityListResults size='25'>
>   <Hotel>
>     <hotelId>134388</hotelId>
>     <name>Milford Plaza at Times Square</name>
>     <address1>700 8th Avenue</address1>
>     <address2/>
>     <address3/>
>     <city>New York</city>
>     <stateProvince>NY</stateProvince>
>     <country>US</country>
>     <postalCode>10036</postalCode>
> 
> ...
> 
> I would like to understand how to manipulate the data further and extract for 
> example all the hotel names in a list?

Read the tutorials on ElementTree and/or lxml. To get a list of hotel
names, I'd expect this to work:

        print [ name.text for name in document.findall('.//Hotel/name') ]
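
For something you can run without the network, here is a small
self-contained sketch against a shortened copy of your sample (one hotel,
fields taken from your post, so the size="25" attribute no longer matches
the content). It uses the stdlib ElementTree and assumes the elements carry
no XML namespace; the dictionary keys are my own choice.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample response from the original post.
SAMPLE = """\
<HotelAvailabilityListResults size="25">
  <Hotel>
    <hotelId>134388</hotelId>
    <name>Milford Plaza at Times Square</name>
    <city>New York</city>
    <lowRate>155.4</lowRate>
    <highRate>259.0</highRate>
  </Hotel>
</HotelAvailabilityListResults>
"""

root = ET.fromstring(SAMPLE)

# findall() returns every matching element; find() returns only the first.
names = [name.text for name in root.findall("Hotel/name")]

# findtext() fetches the text of a child element in one step.
hotels = [
    {
        "id": hotel.findtext("hotelId"),
        "name": hotel.findtext("name"),
        "low_rate": float(hotel.findtext("lowRate")),
    }
    for hotel in root.findall("Hotel")
]

print(names)
print(hotels)
```

Looping over the Hotel elements and pulling out the children you need keeps
the fields of each hotel together, which is usually handier than separate
lists per field.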

Stefan

