Hi list,

My seed list contains URLs from about 20 different domains. In the first fetch 
cycles everything is fine and the selected URLs are spread fairly evenly across 
all domains. But after about 10-15 cycles one domain starts to dominate, and 
URLs from all other domains are no longer selected. It seems that URLs from 
that one domain have the highest scores, so URLs from the other domains no 
longer stand a chance. Is this assumption correct?
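
To make my suspicion concrete, here is a toy sketch of the effect I think I am 
seeing. It is not Nutch code: the function name, the example domains and the 
scores are all made up, and I am only assuming that selection boils down to 
"take the N highest-scoring URLs" with no per-domain limit.

```python
from collections import Counter


def select_top_n(candidates, top_n):
    """Pick the top_n highest-scoring URLs, ignoring their domain.

    Hypothetical stand-in for a purely score-based selection step.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return ranked[:top_n]


# Made-up candidates as (url, domain, score) tuples.
candidates = []

# One domain with many URLs that all score slightly higher.
for i in range(50):
    candidates.append((f"http://big.example/page{i}", "big.example", 2.0))

# 19 other domains with a few lower-scoring URLs each.
for d in range(19):
    for i in range(5):
        candidates.append(
            (f"http://site{d}.example/page{i}", f"site{d}.example", 1.0)
        )

selected = select_top_n(candidates, top_n=30)

# Count how many selected URLs belong to each domain.
per_domain = Counter(domain for _url, domain, _score in selected)
print(per_domain)
```

In this sketch all 30 selected URLs come from the single high-scoring domain, 
which matches what I observe after 10-15 cycles.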

I'm not very happy with this because I would like to fetch URLs from all 
domains in each cycle. What would you do in that case?

Best regards, and thanks for any answers
Thomas

(Using nutch-1.2)

