2007/4/19, qi wu <[EMAIL PROTECTED]>:
I found a bug in Fetcher that causes the problem you reported... Currently, Nutch performs the external link check only during the parsing process, which makes sure that all generated outlinks are on the same host as the from-URL. But for links that are redirected during fetch, this is not enough: we also need to make sure the redirected URL is on the same host as the source URL. Take the link below as an example: http://www.nxtravel.net/?feed=AS&template=Lander_Hybrid&rank=4&keyword=Loans&d=unsecured-direct-loan.com&rid=http%3A%2F%2Fwww.google.com%2Furl%3Fsa%3DL%26ai%3DBLo7nXConRq6MG5_IhQS6xtEClJquHNzjjKMGrOuW0wTAuAIQBBgEIInKzAcoBzABOAFQ0PfZ2vj_____AWCdudCBkAWYAeeHAZgBhogBqgEFMDI1MTSyAQxueHRyYXZlbC5uZXTIAQHaAQxueHRyYXZlbC5uZXTIApS06QHZAzr5xMjNnhl44AMC%26num%3D4%26q%3Dhttp%3A%2F%2Funsecured-direct-loan.com%2Funsecured-loans-online.html%26usg%3DAFrqEzct1VSZnZ48RrXOwHNyxS8qzm9O_w It will be redirected to http://unsecured-direct-loan.com/unsecured-loans-online.html
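To make the check described above concrete, here is a minimal sketch of a same-host test for a redirect target. It is not Nutch's actual Fetcher code: the class name RedirectHostCheck and the method isSameHost are hypothetical, and it assumes that "same host" means an exact, case-insensitive match of the host names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: illustrates the same-host
// check that would have to be applied to a redirect target.
class RedirectHostCheck {

    // Returns true only if both URLs parse and their hosts match
    // (case-insensitive). Anything unparsable is treated as external.
    static boolean isSameHost(String fromUrl, String redirectUrl) {
        try {
            String fromHost = new URI(fromUrl).getHost();
            String toHost = new URI(redirectUrl).getHost();
            if (fromHost == null || toHost == null) {
                return false;
            }
            return fromHost.equalsIgnoreCase(toHost);
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened form of the example above: the redirect leaves the host.
        String from = "http://www.nxtravel.net/?feed=AS";
        String to = "http://unsecured-direct-loan.com/unsecured-loans-online.html";
        System.out.println(isSameHost(from, to));
    }
}
```

With a check like this, a fetcher that is meant to ignore external links would skip the redirect target whenever isSameHost returns false, instead of following it.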
Nice to know I haven't lost it completely: finally someone else has acknowledged that the problem exists. :) Could you please clarify what you meant by "So just add external link check for moved and temp_moved urls should fix this problem"? TIA, t.n.a.
