Fetcher halting and throttling
------------------------------
Key: NUTCH-372
URL: http://issues.apache.org/jira/browse/NUTCH-372
Project: Nutch
Issue Type: Sub-task
Components: fetcher
Reporter: Andrzej Bialecki
Assigned To: Andrzej Bialecki
This patch uses the message queueing framework to implement the following
functionality:
* ability to gracefully stop fetching the current segment. This is different
from simply killing the job in that the partial results (partially fetched
segment) are available and can be further processed. This is especially useful
for fetching large segments with long "tails", i.e. pages which are fetched
very slowly, either because of politeness settings or the target site's
bandwidth limitations.
* ability to dynamically adjust the number of fetcher threads. For a
long-running fetch job it makes sense to decrease the number of fetcher threads
during the day, and increase it during the night. This can now be done with a
cron script, using the MsgQueueTool command line.
It is worth noting that the patch itself is trivial; most of the work is done
by the MQ framework.
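
To illustrate the two behaviours described above, here is a minimal,
self-contained sketch in plain Java. It is not the code from the patch and it
does not use the MQ framework API: the class name ControllableFetcher, the
text messages "halt" and "threads N", and the fetchPage callback are all
hypothetical, chosen only to show how a fetcher could react to control
messages. Worker threads check the control state between pages, so a halt
leaves the pages fetched so far intact, and surplus threads retire once they
finish their current page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names come from the NUTCH-372 patch
// or from the MQ framework; they only illustrate the idea.
final class ControllableFetcher {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> fetched = new CopyOnWriteArrayList<>();
    private final List<Thread> workers = new ArrayList<>();
    private final Consumer<String> fetchPage; // stand-in for the real page fetch
    private boolean started = false;
    private boolean halted = false;
    private int targetThreads;
    private int activeThreads = 0;

    ControllableFetcher(List<String> urls, int threads, Consumer<String> fetchPage) {
        pending.addAll(urls);
        this.targetThreads = threads;
        this.fetchPage = fetchPage;
    }

    synchronized void start() {
        started = true;
        spawnWorkers();
    }

    // Applies a control message; the "halt" / "threads N" format is assumed.
    synchronized void onMessage(String message) {
        String[] parts = message.trim().split("\\s+");
        if (parts[0].equals("halt")) {
            halted = true; // workers stop after their current page
        } else if (parts[0].equals("threads") && parts.length == 2) {
            targetThreads = Math.max(0, Integer.parseInt(parts[1]));
            if (started) {
                spawnWorkers(); // grow the pool if the target went up
            }
        }
    }

    // Caller must hold the lock. Starts threads until the target is reached.
    private void spawnWorkers() {
        while (!halted && activeThreads < targetThreads && !pending.isEmpty()) {
            activeThreads++;
            Thread t = new Thread(this::work);
            workers.add(t);
            t.start();
        }
    }

    // Returns false (and gives up the slot) when halted or when the pool is too big.
    private synchronized boolean mayContinue() {
        if (halted || activeThreads > targetThreads) {
            activeThreads--;
            return false;
        }
        return true;
    }

    private void work() {
        while (mayContinue()) {
            String url = pending.poll();
            if (url == null) { // nothing left to fetch
                synchronized (this) { activeThreads--; }
                return;
            }
            fetchPage.accept(url);
            fetched.add(url); // partial results survive a halt
        }
    }

    // Waits for the workers started so far; threads spawned later are not awaited.
    void awaitCompletion() {
        List<Thread> snapshot;
        synchronized (this) { snapshot = new ArrayList<>(workers); }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    List<String> fetched() { return new ArrayList<>(fetched); }
    List<String> remaining() { return new ArrayList<>(pending); }
    synchronized int targetThreads() { return targetThreads; }
}
```

In this sketch, a controller (for example a script run from cron) would
deliver "threads 2" in the morning and "threads 20" at night, or "halt" to end
the fetch early; fetched() then holds the partial results and remaining() the
URLs that were never attempted. How such messages actually reach the fetcher
in the patch is handled by the MQ framework and is not shown here.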
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers