Problem plucking CGI URLs with & parameters

Craig Artley Wed, 14 Aug 2002 16:23:59 -0700

Greetings, I am a newbie Clie owner and Plucker user.  This is wonderful stuff!


I have a problem with the Spider.py script when it comes to URLs with 
&name=val parameters.  I am running Python 2.1.1 on Mandrake Linux 8.1.  
Spider.py appears to truncate such HREF URLs at the first &.  For 
example:

<a HREF=http://www.hti.umich.edu/cgi/r/rsv/rsv-idx?type=DIV1&byte=1801>Genesis</a>

gets scanned as

http://www.hti.umich.edu/cgi/r/rsv/rsv-idx?type=DIV1

without the second 'byte=' parameter.  When it goes to pluck this URL, it of 
course fails because of the missing parameter.

I run with --verbosity=3 and see this:

Looking at suburl http://www.hti.umich.edu/cgi/r/rsv/rsv-idx?type=DIV1...

I spent more time than I care to admit digging through the PyPlucker Python 
files, but I cannot see where it is going wrong, whether in sgmllib or in 
TextParser.
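
For anyone trying to reproduce this, here is a minimal sketch of how this kind 
of truncation could arise.  It assumes (I have not confirmed this against the 
sgmllib source) that unquoted attribute values are matched with a character 
class that leaves out '&'.  Both patterns below are hypothetical illustrations, 
not code taken from sgmllib or PyPlucker.

```python
import re

# Hypothetical pattern for an unquoted HREF value. NOT copied from sgmllib.
# The character class deliberately omits '&' to mimic the suspected behaviour.
UNQUOTED_NARROW = re.compile(r'HREF=([-a-zA-Z0-9./:+*%?!()_#=~]*)', re.I)

# Hypothetical alternative: accept everything up to whitespace or '>'.
UNQUOTED_WIDE = re.compile(r'HREF=([^\s>]*)', re.I)

tag = '<a HREF=http://www.hti.umich.edu/cgi/r/rsv/rsv-idx?type=DIV1&byte=1801>'

narrow = UNQUOTED_NARROW.search(tag).group(1)  # match stops at the first '&'
wide = UNQUOTED_WIDE.search(tag).group(1)      # match keeps the full query string

print(narrow)
print(wide)
```

If the real parser behaves like the narrow pattern, that would explain the 
suburl I see in the verbose output.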

Anyone else have this problem?

Regards,
    -craig    [EMAIL PROTECTED]
