Generator exits incorrectly for small fetchlists 
-------------------------------------------------

                 Key: NUTCH-503
                 URL: https://issues.apache.org/jira/browse/NUTCH-503
             Project: Nutch
          Issue Type: Bug
          Components: generator
    Affects Versions: 0.9.0, 0.8.1, 0.8
         Environment: Fedora Core 2, JDK 1.6
            Reporter: Vishal Shah
             Fix For: 0.8.2


   I think I found the reason why the generator returns with an empty fetchlist 
for small fetchsizes. 
 
   After the first job finishes running, the generator checks the following 
condition to see if it got an empty list:
 
    if (readers == null || readers.length == 0 || !readers[0].next(new FloatWritable())) {
 
   The third condition is incorrect. In some cases, especially for small 
fetchlists, the first partition might be empty while other partitions 
contain URLs. The Generator then wrongly concludes that all partitions are 
empty after looking only at the first one. The same problem can occur when all 
URLs in the fetchlist come from the same host (or from a very small number of 
hosts, or from hosts that all map to a small number of partitions).
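
   To illustrate the difference, here is a minimal sketch. It is not the 
Generator code and not a proposed patch: plain Java iterators stand in for the 
per-partition readers, and the class and method names (FetchlistCheck, 
isEmptyFirstOnly, isEmptyAllPartitions) are made up for this example. It 
compares a check that looks only at the first partition with one that looks at 
every partition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class FetchlistCheck {

    // Models the condition quoted above: only partition 0 is inspected.
    // (Hypothetical stand-in; the real code calls readers[0].next(...).)
    public static boolean isEmptyFirstOnly(List<Iterator<String>> readers) {
        return readers == null || readers.isEmpty() || !readers.get(0).hasNext();
    }

    // Treats the fetchlist as empty only if every partition is empty.
    public static boolean isEmptyAllPartitions(List<Iterator<String>> readers) {
        if (readers == null) {
            return true;
        }
        for (Iterator<String> reader : readers) {
            if (reader.hasNext()) {
                return false; // at least one partition has a URL
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Partition 0 is empty, partition 1 holds the only URL.
        List<Iterator<String>> readers = Arrays.asList(
                Collections.<String>emptyIterator(),
                Arrays.asList("http://example.com/a").iterator());

        System.out.println("first-only check says empty: " + isEmptyFirstOnly(readers));
        System.out.println("all-partitions check says empty: " + isEmptyAllPartitions(readers));
    }
}
```

   With one URL that lands in the second partition, the first-only check 
reports an empty fetchlist while the all-partitions check does not.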



