On 13/09/2007, sacha rook <[EMAIL PROTECTED]> wrote:
> [CODE]
>
> from BeautifulSoup import BeautifulSoup
> doc = ['<html><head><title>Page title</title></head>',
>        '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
>        '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
>        '<a href="http://www.google.co.uk"></a>',
>        '<a href="http://www.bbc.co.uk"></a>',
>        '<a href="http://www.amazon.co.uk"></a>',
>        '<a href="http://www.redhat.co.uk"></a>',
>        '</html>']
> soup = BeautifulSoup(''.join(doc))
> blist = soup.findAll('a')
> print blist
> import urlparse
> for a in blist:
>     href = a['href']
>     print urlparse.urlparse(href)[1]
>
> [/CODE]

Works fine for me:

>>> ## working on region in file python-tmp-371673F...
[<a href="http://www.google.co.uk"></a>, <a href="http://www.bbc.co.uk"></a>, <a href="http://www.amazon.co.uk"></a>, <a href="http://www.redhat.co.uk"></a>]
www.google.co.uk
www.bbc.co.uk
www.amazon.co.uk
www.redhat.co.uk

But as Kent wrote: show the whole traceback, not just the last line.

-- 
- Rikard - http://bos.hack.org/cv/
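
For anyone trying this on Python 3, where `print` is a function and the `urlparse` module has moved to `urllib.parse`, here is a rough sketch of the same link extraction using only the standard library. It swaps BeautifulSoup for `html.parser`, so it is not the code from the quoted message; `LinkCollector` and `hostnames` are names made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def hostnames(html):
    """Return the network location (host) part of each link in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # .netloc is the same field as index [1] of the urlparse() result.
    return [urlparse(href).netloc for href in parser.hrefs]


# Same sample document as in the quoted message.
doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '<a href="http://www.google.co.uk"></a>',
       '<a href="http://www.bbc.co.uk"></a>',
       '<a href="http://www.amazon.co.uk"></a>',
       '<a href="http://www.redhat.co.uk"></a>',
       '</html>']

for host in hostnames(''.join(doc)):
    print(host)
```

If a variant of this does fail, the advice above still applies: paste everything Python prints from "Traceback (most recent call last):" down to the final error line.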
