Hello,

When I fetch the following links with Nutch 1.3:
  http://blog.mises.org/archives/010450.asp
  http://feedproxy.google.com/~r/readwriteweb/~3/frC1ndi7-V8/google_docs_goes_back_to_schoo.php
and the setting
  http.redirect.max = 2
the first link is fetched OK, including its two redirects:
  http://blog.mises.org/?p=010450
  http://blog.mises.org/10450/what-the-bubble-did-to-technology/
However, for the second link (feedproxy.google.com), the redirects are
not followed during the fetch.
Both redirects are "301 Moved Permanently".
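
For anyone who wants to reproduce the redirect chains outside of Nutch, here is a minimal Python sketch. It is not Nutch code and does not reflect how the Nutch fetcher is implemented: trace_redirects and http_fetch are hypothetical helper names, and max_redirects only mimics the intent of http.redirect.max (the initial request plus at most that many follow-ups).

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_fetch(url):
    """Issue one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": "redirect-check"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=http_fetch, max_redirects=2):
    """Follow redirects hop by hop and return the visited (url, status) pairs.

    `fetch` is any callable mapping a URL to (status, location), so the
    chain logic can be exercised without network access.
    """
    chain = []
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain


if __name__ == "__main__":
    start = "http://blog.mises.org/archives/010450.asp"
    for hop_url, hop_status in trace_redirects(start):
        print(hop_status, hop_url)
```

Running it against each of the two starting URLs shows every hop and its status code, which makes it easier to see where the two chains differ.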

Maybe somebody could suggest what is causing this behavior? I am
using the default settings plus http.agent.name and http.robots.agents.

Further, if I update the crawldb with the results of the fetch and
then generate a new segment, the link
  http://www.readwriteweb.com/archives/google_docs_goes_back_to_schoo.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29
which is the redirect target of
  http://feedproxy.google.com/~r/readwriteweb/~3/frC1ndi7-V8/google_docs_goes_back_to_schoo.php
is never added to the new segment.

What am I doing wrong? :)

Thank you!
Oleg Mürk
