2007/4/19, qi wu <[EMAIL PROTECTED]>:
I found a bug in Fetcher which causes the problem you reported...
Currently, Nutch performs the external link check only during parsing, which
ensures that all generated outlinks are on the same host as the from-URL. But
for links that are redirected during fetch, this is not enough: we also
need to make sure the redirected URL is on the same host as the source
URL.
Just take the link below as an example:
http://www.nxtravel.net/?feed=AS&template=Lander_Hybrid&rank=4&keyword=Loans&d=unsecured-direct-loan.com&rid=http%3A%2F%2Fwww.google.com%2Furl%3Fsa%3DL%26ai%3DBLo7nXConRq6MG5_IhQS6xtEClJquHNzjjKMGrOuW0wTAuAIQBBgEIInKzAcoBzABOAFQ0PfZ2vj_____AWCdudCBkAWYAeeHAZgBhogBqgEFMDI1MTSyAQxueHRyYXZlbC5uZXTIAQHaAQxueHRyYXZlbC5uZXTIApS06QHZAzr5xMjNnhl44AMC%26num%3D4%26q%3Dhttp%3A%2F%2Funsecured-direct-loan.com%2Funsecured-loans-online.html%26usg%3DAFrqEzct1VSZnZ48RrXOwHNyxS8qzm9O_w
It will be redirected to
http://unsecured-direct-loan.com/unsecured-loans-online.html
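
The kind of check described above could be sketched roughly as follows. This
is only an illustration, not the actual Fetcher code: the class
RedirectHostCheck and the method sameHost are hypothetical names, and the
source URL is shortened from the example above for readability.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RedirectHostCheck {

    // Hypothetical helper, not part of Nutch: returns true only when both
    // URLs parse and their hosts match (case-insensitively).
    static boolean sameHost(String fromUrl, String toUrl) {
        try {
            String fromHost = new URI(fromUrl).getHost();
            String toHost = new URI(toUrl).getHost();
            // getHost() may return null; equalsIgnoreCase(null) is false.
            return fromHost != null && fromHost.equalsIgnoreCase(toHost);
        } catch (URISyntaxException e) {
            // Treat unparsable URLs as external, i.e. do not follow them.
            return false;
        }
    }

    public static void main(String[] args) {
        // Source URL shortened from the example in this message.
        String from = "http://www.nxtravel.net/?feed=AS&template=Lander_Hybrid";
        String to = "http://unsecured-direct-loan.com/unsecured-loans-online.html";
        System.out.println(sameHost(from, to)); // prints false
    }
}
```

A fetcher that wants to stay on the source host would skip the redirect
target whenever such a check returns false.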

Nice to know I haven't lost it completely: finally someone else
acknowledged the problem exists. :)
Could you please clarify what you meant by "So just add external link
check for moved and temp_moved urls should fix this problem"?

TIA,
t.n.a.
