Fetcher halting and throttling
------------------------------

                 Key: NUTCH-372
                 URL: http://issues.apache.org/jira/browse/NUTCH-372
             Project: Nutch
          Issue Type: Sub-task
          Components: fetcher
            Reporter: Andrzej Bialecki 
         Assigned To: Andrzej Bialecki 


This patch uses the message queueing framework to implement the following 
functionality:

* ability to gracefully stop fetching the current segment. This is different 
from simply killing the job in that the partial results (partially fetched 
segment) are available and can be further processed. This is especially useful 
for fetching large segments with long "tails", i.e. pages which are fetched 
very slowly, either because of politeness settings or the target site's 
bandwidth limitations.

* ability to dynamically adjust the number of fetcher threads. For a 
long-running fetch job it makes sense to decrease the number of fetcher threads 
during the day, and increase it during the night. This can now be done with a 
cron script, using the MsgQueueTool command line.
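
The two capabilities above can be illustrated with a minimal, self-contained
sketch. It is not the code from the patch and does not use the MQ framework:
the class FetcherControlSketch, the message strings "halt" and "threads N",
and every method name are assumptions made up for illustration. The sketch
only shows the general idea: workers check a shared control state between
URLs, so a halt keeps everything fetched so far, and a lower thread target
lets surplus workers exit after their current URL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: these names are NOT the Nutch Fetcher or MQ framework API.
class FetcherControlSketch {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final List<String> fetched = new CopyOnWriteArrayList<>();
    private final List<Thread> workers = new ArrayList<>(); // guarded by "this"
    private boolean halted = false;  // guarded by "this"
    private int targetThreads = 0;   // guarded by "this"
    private int activeThreads = 0;   // guarded by "this"

    FetcherControlSketch(List<String> urls) {
        pending.addAll(urls);
    }

    // Handles an assumed control message format: "halt" or "threads N".
    synchronized void onMessage(String msg) {
        if (msg.equals("halt")) {
            halted = true; // workers finish their current URL, then exit
        } else if (msg.startsWith("threads ")) {
            targetThreads = Integer.parseInt(msg.substring("threads ".length()).trim());
            // Scale up right away; scaling down happens in nextUrl().
            while (!halted && activeThreads < targetThreads) {
                activeThreads++;
                Thread t = new Thread(this::workerLoop);
                workers.add(t);
                t.start();
            }
        } else {
            throw new IllegalArgumentException("unknown message: " + msg);
        }
    }

    // Returns the next URL, or null if the calling worker should stop
    // (halt requested, too many workers, or nothing left to fetch).
    private synchronized String nextUrl() {
        if (halted || activeThreads > targetThreads) {
            activeThreads--;
            return null;
        }
        String url = pending.poll();
        if (url == null) {
            activeThreads--;
        }
        return url;
    }

    private void workerLoop() {
        String url;
        while ((url = nextUrl()) != null) {
            fetched.add("content of " + url); // stand-in for a real, slow HTTP fetch
        }
    }

    // Partial results stay available after a halt.
    List<String> fetched() { return new ArrayList<>(fetched); }

    int remaining() { return pending.size(); }

    synchronized int targetThreads() { return targetThreads; }

    // Waits for all workers started so far to exit.
    void awaitWorkers() {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(workers);
        }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        FetcherControlSketch f = new FetcherControlSketch(
                Arrays.asList("http://a.example/", "http://b.example/", "http://c.example/"));
        f.onMessage("threads 2"); // e.g. the daytime setting
        f.onMessage("halt");      // graceful stop: fetched pages are kept
        f.awaitWorkers();
        System.out.println("fetched=" + f.fetched().size() + " remaining=" + f.remaining());
    }
}
```

In this sketch the split between fetched and remaining URLs after a halt
depends on thread timing. In the patch, such messages would presumably be
delivered through the MQ framework (for example from a cron job running
MsgQueueTool) rather than by calling a method directly.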

Note that the patch itself is trivial; most of the work is done by the MQ 
framework.


        
