Hi Karanjeet,

On Mon, Sep 28, 2015 at 10:54 PM, <[email protected]> wrote:
>
> I am facing the same problem here. Tried rebuilding it but in logs I can
> only see the agent name mentioned in http.agent.name property.
>

So you have a file called agents.txt in $NUTCH_HOME/conf? Does this file
have agent names listed one per line?

>
> By $NUTCH_HOME/conf do you mean runtime/local/conf directory ?
>

Yes. This is where (if running locally) your Nutch crawler is being run
from.

>
> Also can you please brief me on how the rotation works ?

Certainly. Rotation depends on an agents.txt file (or another suitable
file that matches your configuration) being present in $NUTCH_HOME/conf,
with agent names listed one per line. Each agent name is read from the
agents.txt file as per the logic in
https://github.com/apache/nutch/blob/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java#L158-L194

The agent names are then cached in an ArrayList as per
https://github.com/apache/nutch/blob/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java#L64

Fetcher threads then access this ArrayList, pulling a different agent
name from it and assigning it to the HTTP request.

> Does the agent
> rotates after crawling some X links and if so can we configure that X ?

It is changed (rotated) on every link a fetcher thread fetches. The
rotation frequency is managed internally rather than exposed as a
setting. There has been no really compelling case for making this an
adaptive rotation mechanism, although Giuseppe and I did discuss it once.

Does this make sense?
Thanks
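
P.S. In case it helps, here is a rough sketch of the idea in Java. It is
not the HttpBase code itself. The class name AgentRotationSketch, the
loadAgents and pickAgent helpers, the random pick per request and the
fallback to a single default name are all assumptions I made for
illustration. Please treat the HttpBase links above as the reference for
what Nutch actually does.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

// Illustrative sketch only; every name here is hypothetical, not Nutch's API.
public class AgentRotationSketch {

  // Read agent names, one per line, trimming whitespace and skipping blanks.
  // The caller owns the reader; this method does not close it.
  public static List<String> loadAgents(Reader source) {
    return new BufferedReader(source).lines()
        .map(String::trim)
        .filter(name -> !name.isEmpty())
        .collect(Collectors.toList());
  }

  // Choose the agent for a single request. Falls back to the one configured
  // name when no list is available. Random choice is an assumption here.
  public static String pickAgent(List<String> agents, String fallback) {
    if (agents == null || agents.isEmpty()) {
      return fallback;
    }
    int index = ThreadLocalRandom.current().nextInt(agents.size());
    return agents.get(index);
  }

  public static void main(String[] args) {
    // Stand-in for the contents of conf/agents.txt.
    String agentsTxt = "AgentOne\n\nAgentTwo\nAgentThree\n";
    List<String> agents = loadAgents(new StringReader(agentsTxt));

    // Each simulated fetch asks for an agent again, so the name can change
    // from one request to the next.
    for (int i = 0; i < 5; i++) {
      System.out.println("fetch " + i + " -> User-Agent: "
          + pickAgent(agents, "DefaultAgent"));
    }
  }
}
```

The point of the fallback branch is that with an empty or missing list you
would only ever see the single default name in the logs, which is worth
ruling out first.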

