I think Nutch is behaving correctly.
Maybe that page declares a BASE URL (view the source and look for a
<base href> element in the HEAD) that throws off either the browser or Nutch.
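
To make the BASE URL point concrete, here is a rough sketch of how a
query-only href resolves with and without a base override. It uses Python's
urllib.parse.urljoin, which follows RFC 3986, and is not Nutch's own parser
code. The resolve() helper and the base_href value are assumptions made up
for illustration; the URLs are the placeholders from the quoted message.

```python
from urllib.parse import urljoin

# Placeholder URLs taken from the quoted message below.
page_url = "http://www.xxxxx.com/xxxxx.cgi"
href = "?paramname=paramvalue"


def resolve(href, page_url, base_href=None):
    """Resolve href against the <base href> value if one is given,
    otherwise against the URL of the page itself.

    Hypothetical helper for illustration; not part of Nutch.
    """
    # A <base href> may itself be relative, so resolve it against the page URL.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


# No <base> element: the query-only href keeps the page's path.
print(resolve(href, page_url))

# Assumed <base href="http://www.xxxxx.com/">: the script name is dropped,
# because the href is now resolved against the site root.
print(resolve(href, page_url, base_href="http://www.xxxxx.com/"))
```

If the page source shows no base element, the first result is what RFC 3986
resolution gives for that link.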

Otis


--- Raymond Creel <[EMAIL PROTECTED]> wrote:

> Has anyone experienced a problem with the way the
> standard HTML parser plugin handles relative URLs?
> 
> There is a site where the home page is something like
> 
> http://www.xxxxx.com/xxxxx.cgi
> 
> and when browsing a link with its href set to
> 
> '?paramname=paramvalue'
> 
> a browser will naturally take you to
> 
> http://www.xxxxx.com/xxxxx.cgi?paramname=paramvalue
> 
> However, in Nutch, when the outlinks are parsed from
> the page, the link ends up being
> 
> http://www.xxxxx.com/?paramname=paramvalue
> 
> which of course is broken.  So why is the xxxxx.cgi
> gone?  Is this a bug or am I missing something?
> 
> Thanks
> 
> 
> 
>               
> _______________________________________________
> Nutch-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-general
> 


