2007/4/19, qi wu <[EMAIL PROTECTED]>:
> I found a bug in Fetcher which causes the problem you reported...
> Right now, Nutch only performs the external link check during parsing, which
> makes sure all the generated outlinks are on the same host as the
> from-URL. But for links that are redirected during fetch, this is not
> enough. We also need to make sure the redirected URL is on the same host
> as the source URL.
> Just take the link below as an example:
> http://www.nxtravel.net/?feed=AS&template=Lander_Hybrid&rank=4&keyword=Loans&d=unsecured-direct-loan.com&rid=http%3A%2F%2Fwww.google.com%2Furl%3Fsa%3DL%26ai%3DBLo7nXConRq6MG5_IhQS6xtEClJquHNzjjKMGrOuW0wTAuAIQBBgEIInKzAcoBzABOAFQ0PfZ2vj_____AWCdudCBkAWYAeeHAZgBhogBqgEFMDI1MTSyAQxueHRyYXZlbC5uZXTIAQHaAQxueHRyYXZlbC5uZXTIApS06QHZAzr5xMjNnhl44AMC%26num%3D4%26q%3Dhttp%3A%2F%2Funsecured-direct-loan.com%2Funsecured-loans-online.html%26usg%3DAFrqEzct1VSZnZ48RrXOwHNyxS8qzm9O_w
> it will be redirected to
> http://unsecured-direct-loan.com/unsecured-loans-online.html

Nice to know I haven't lost it completely: finally someone else
acknowledged the problem exists. :) Could you please clarify what you
meant by "So just add external link check for moved and temp_moved
urls should fix this problem"?

TIA,
t.n.a.
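
For readers following the thread, the check being proposed could look roughly like the sketch below: before following a redirect, compare the host of the redirect target with the host of the source URL, the same way the parse-time outlink check is described above. This is not the actual Fetcher code. The class name `RedirectHostCheck`, both method names, and the `ignoreExternalLinks` flag are hypothetical and only stand in for whatever external-link setting the crawl uses.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: illustrates a same-host test
// that could be applied to redirect targets (moved / temp_moved) at fetch time.
class RedirectHostCheck {

    // Returns true only if both URLs parse and their hosts match (case-insensitive).
    static boolean isSameHost(String fromUrl, String toUrl) {
        try {
            String fromHost = new URI(fromUrl).getHost();
            String toHost = new URI(toUrl).getHost();
            if (fromHost == null || toHost == null) {
                return false; // no usable host component: treat as external
            }
            return fromHost.equalsIgnoreCase(toHost);
        } catch (URISyntaxException e) {
            return false; // unparsable URL: treat as external
        }
    }

    // Decides whether a redirect should be followed.
    // ignoreExternalLinks is an assumed flag: when false, every redirect is allowed.
    static boolean shouldFollowRedirect(String fromUrl, String redirectUrl,
                                        boolean ignoreExternalLinks) {
        return !ignoreExternalLinks || isSameHost(fromUrl, redirectUrl);
    }
}
```

With the example from the quoted message, the source host is www.nxtravel.net and the redirect target host is unsecured-direct-loan.com, so `shouldFollowRedirect` would return false when the flag is set and the redirect would be skipped. Comparing full host names is the simplest choice; a real crawl might prefer to compare domains instead, so that a redirect from example.com to www.example.com is still followed.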
