Protocol-level redirects (ASP redirects), where the server sends a 3xx redirect response code, work correctly in Nutch 0.8 dev. Nutch processes the redirect target as a completely new page. If you are doing ASP forwards, I believe the original page (www.domain.com/?code.aspx&redirect=445454) would be the URL that shows up in the search results, because Nutch doesn't know what is going on behind the scenes in the ASP code. It only knows the URL it requested and the content it received. As of right now in 0.8 dev, meta-level redirects (meta refresh tags) don't work correctly. They did in 0.7, but I don't think that functionality has been ported yet.
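
To make the distinction concrete, here is a rough Python sketch of how a crawler could tell the two kinds of redirect apart from a fetched response. This is not Nutch's code; the function name classify_redirect, the helper class, and the http:// scheme added to the example URLs are all assumptions for illustration. A server-side forward would fall into the "none" case, since the crawler only sees a normal 200 response under the original URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Hypothetical helper: records the target of the first meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = {k.lower(): (v or "") for k, v in attrs}
        if attrs.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.com/".
        _, _, rest = attrs.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def classify_redirect(base_url, status, headers, body):
    """Return (kind, target), where kind is 'protocol', 'meta', or 'none'."""
    location = {k.lower(): v for k, v in headers.items()}.get("location")
    # Protocol-level redirect: 3xx status code plus a Location header.
    if 300 <= status < 400 and location:
        return "protocol", urljoin(base_url, location)
    # Meta-level redirect: a refresh tag inside an otherwise normal page.
    finder = _MetaRefreshFinder()
    finder.feed(body)
    if finder.target:
        return "meta", urljoin(base_url, finder.target)
    # Anything else (including a server-side forward) keeps the original URL.
    return "none", base_url


if __name__ == "__main__":
    print(classify_redirect(
        "http://www.domain.com/?code.aspx&redirect=445454",
        302,
        {"Location": "http://www.another-domain.com/"},
        "",
    ))
```

In the "protocol" and "meta" cases the second element of the returned tuple is the URL a crawler would go on to fetch as a new page; in the "none" case it is simply the URL that was requested.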

Dennis

Insurance Squared Inc. wrote:
How are redirects listed in version 0.7? If the crawler finds a link like:
www.domain.com/?code.aspx&redirect=445454
and that link redirects through to www.another-domain.com, which of those two links will show up in Nutch?

(I'm wondering if I can use Nutch to crawl sites with a lot of redirects and still end up with the correct redirected domain in the listings.)


