Hi Karanjeet,

On Mon, Sep 28, 2015 at 10:54 PM, <[email protected]> wrote:

>
> I am facing the same problem here. Tried rebuilding it but in logs I can only
> see the agent name mentioned in http.agent.name property.
>

So you have a file called agents.txt in $NUTCH_HOME/conf?
Does this file have agent names listed one per line?


>
> By $NUTCH_HOME/conf do you mean runtime/local/conf directory ?
>

Yes. If you are running locally, this is where your Nutch crawler is run
from.


>
> Also can you please brief me on how the rotation works ?


Certainly. Rotation relies on an agents.txt file (or another suitably named
file that matches your configuration) being present in $NUTCH_HOME/conf,
with agent names listed one per line. Each agent name is read from the
agents.txt file as per the logic in
https://github.com/apache/nutch/blob/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java#L158-L194
Each agent name is then cached in an ArrayList as per
https://github.com/apache/nutch/blob/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java#L64
Fetcher threads then access this ArrayList, pulling a different
http.agent.name and assigning it to the HTTP request.
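
To make the flow above concrete, here is a rough, standalone Java sketch of
the idea. It is not the HttpBase code: the class and method names
(AgentRotationSketch, parseAgents, loadAgents, pickAgent) are made up for
illustration, and picking an agent at random is an assumption of the sketch,
so please check the linked source for the actual selection logic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from Nutch.
class AgentRotationSketch {

  // Keep one agent name per non-blank line, with surrounding whitespace removed.
  static List<String> parseAgents(List<String> lines) {
    List<String> agents = new ArrayList<>();
    for (String line : lines) {
      String name = line.trim();
      if (!name.isEmpty()) {
        agents.add(name);
      }
    }
    return agents;
  }

  // Read the agents file once and cache the names in a list.
  static List<String> loadAgents(Path agentsFile) throws IOException {
    return parseAgents(Files.readAllLines(agentsFile, StandardCharsets.UTF_8));
  }

  // Called once per fetch. Random choice is an assumption of this sketch.
  // Falls back to the single configured agent name if the list is empty.
  static String pickAgent(List<String> agents, String fallback, Random rnd) {
    if (agents.isEmpty()) {
      return fallback;
    }
    return agents.get(rnd.nextInt(agents.size()));
  }

  public static void main(String[] args) throws IOException {
    // Use a file passed on the command line, or a small in-memory sample.
    List<String> agents = args.length > 0
        ? loadAgents(Paths.get(args[0]))
        : parseAgents(Arrays.asList("agent-one", "agent-two", "agent-three"));
    Random rnd = new Random();
    for (int i = 0; i < 5; i++) {
      System.out.println("fetch " + i + " uses agent: "
          + pickAgent(agents, "default-agent", rnd));
    }
  }
}
```

Running it with the path to your agents.txt as the only argument prints the
agent chosen for five simulated fetches, which is a quick way to confirm the
file parses to the names you expect.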


> Does the agent
> rotate after crawling some X links, and if so, can we configure that X?


It is changed (rotated) on every link a fetcher thread fetches. The change
frequency is managed internally, so there is no X to configure. There has
been no really compelling case for making this an adaptive rotation
mechanism, although Giuseppe and I did discuss it once.
Does this make sense?
Thanks
