In some cases the content downloaded in the Fetch phase from an HTTP
URL is not what you would get by requesting the same URL from a
well-known browser such as Google Chrome. This happens because the
server expects a User-Agent value that identifies a browser.

There is an http.agent.name property in nutch-site.xml. Is this the
property that should be used to set the user agent so that the server
responds to a Nutch GET request the same way it would to a request
from a browser? Or is there another configurable property?

For example, the user agent value for a Chrome browser is below.

Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
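
To make the question concrete, this is roughly the override I have in
mind for nutch-site.xml. It is only a sketch of what I am asking about:
I am assuming that http.agent.name is the right property and that it
accepts a full browser string as its value, which is exactly what I
have not been able to confirm.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: http.agent.name is the property that controls the
       User-Agent header sent by the fetcher. Whether a full browser
       string is a valid value here is the open question. -->
  <property>
    <name>http.agent.name</name>
    <value>Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36</value>
  </property>
</configuration>
```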


Thanks.
