I think Nutch is behaving correctly. Maybe that page declares a BASE URL (view the source and look for a BASE element in the HEAD) that throws off either the browser or Nutch.
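To make the BASE idea concrete, here is a rough sketch in Python using the standard library's `urljoin`. This is not Nutch's parser code, and `resolve_outlink` is a made-up helper for illustration only. It shows how the same query-only href resolves differently depending on whether the page declares a BASE URL pointing at the site root (that BASE value is an assumption, not something taken from the actual page).

```python
from urllib.parse import urljoin


def resolve_outlink(page_url, href, base_href=None):
    """Resolve href against the page URL, or against a declared BASE URL.

    Hypothetical helper: base_href stands for the value of a BASE element
    found in the page HEAD, or None when the page has none.
    """
    # A declared BASE replaces the page URL as the base for relative links.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)


page = "http://www.xxxxx.com/xxxxx.cgi"
link = "?paramname=paramvalue"

# No BASE element: the query is attached to the page's own path.
no_base = resolve_outlink(page, link)

# Assumed BASE of "http://www.xxxxx.com/": the xxxxx.cgi part disappears.
with_base = resolve_outlink(page, link, base_href="http://www.xxxxx.com/")

print(no_base)    # http://www.xxxxx.com/xxxxx.cgi?paramname=paramvalue
print(with_base)  # http://www.xxxxx.com/?paramname=paramvalue
```

If the page source shows no BASE element, this explanation does not apply and the difference has to come from somewhere else.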
Otis

--- Raymond Creel <[EMAIL PROTECTED]> wrote:

> Has anyone experienced a problem with the way the
> standard html parser plugin handles relative urls?
>
> There is a site where the home page is something like
>
> http://www.xxxxx.com/xxxxx.cgi
>
> and when browsing a link with its href set to
>
> '?paramname=paramvalue'
>
> a browser will naturally take you to
>
> http://www.xxxxx.com/xxxxx.cgi?paramname=paramvalue
>
> However, in nutch when the outlinks are parsed from
> the page the link ends up being
>
> http://www.xxxxx.com/?paramname=paramvalue
>
> which of course is broken. So why is the xxxxx.cgi
> gone? Is this a bug or am I missing something?
>
> Thanks

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
