2007/4/19, qi wu <[EMAIL PROTECTED]>:
> I found a bug in Fetcher which causes the problem you reported.
> Currently, Nutch only performs the external link check during parsing, which
> makes sure all the generated outlinks are on the same host as the
> from-URL. But for links which are redirected during fetch, this is not
> enough: we also need to make sure the redirected URL is on the same host
> as the source URL.
> Just take the link below as an example:
> http://www.nxtravel.net/?feed=AS&template=Lander_Hybrid&rank=4&keyword=Loans&d=unsecured-direct-loan.com&rid=http%3A%2F%2Fwww.google.com%2Furl%3Fsa%3DL%26ai%3DBLo7nXConRq6MG5_IhQS6xtEClJquHNzjjKMGrOuW0wTAuAIQBBgEIInKzAcoBzABOAFQ0PfZ2vj_____AWCdudCBkAWYAeeHAZgBhogBqgEFMDI1MTSyAQxueHRyYXZlbC5uZXTIAQHaAQxueHRyYXZlbC5uZXTIApS06QHZAzr5xMjNnhl44AMC%26num%3D4%26q%3Dhttp%3A%2F%2Funsecured-direct-loan.com%2Funsecured-loans-online.html%26usg%3DAFrqEzct1VSZnZ48RrXOwHNyxS8qzm9O_w
> it will be redirected to
> http://unsecured-direct-loan.com/unsecured-loans-online.html

Nice to know I haven't lost it completely: finally someone else
acknowledged the problem exists. :)
Could you please clarify what you meant by "So just add external link
check for moved and temp_moved urls should fix this problem"?
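
If I read the suggestion right, it would amount to something like the
sketch below: before following a redirect, compare the host of the
redirect target with the host of the source URL, and skip the redirect
if they differ. This is only my guess at the idea, not actual Nutch
code; the class and method names (RedirectHostCheck, hostOf,
isSameHost) are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a redirect
// target stays on the same host as the URL that was originally fetched.
class RedirectHostCheck {

    // Returns the lower-cased host of a URL, or null if it cannot be parsed
    // or has no host component.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // True only when both URLs parse and their hosts are identical.
    // Unparseable URLs are treated as external, i.e. not followed.
    static boolean isSameHost(String fromUrl, String redirectUrl) {
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(redirectUrl);
        return fromHost != null && fromHost.equals(toHost);
    }
}
```

With the example above, a fetch of a www.nxtravel.net URL that gets
redirected to unsecured-direct-loan.com would fail this check, so the
redirect would not be followed when external links are being ignored.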

TIA,
t.n.a.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general