Hi, you could set generate.max.per.host to a reasonable value to prevent this. In the default configuration it is set to -1, which means unlimited, so a single host can fill the whole fetchlist.
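As a rough sketch, the override would go into conf/nutch-site.xml something like this. The value 100 is only an illustrative assumption, not a recommendation; pick whatever suits the size of your fetchlists.

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml -->
<configuration>
  <property>
    <name>generate.max.per.host</name>
    <!-- Assumed example value: at most 100 URLs per host in a single
         fetchlist. The default of -1 means unlimited. -->
    <value>100</value>
  </property>
</configuration>
```

The limit applies when the next fetchlist is generated, so it should show up from the next generate cycle onwards.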
BR
Hannes

---
Hannes Carl Meyer
www.informera.de

On Fri, Jul 8, 2011 at 2:53 PM, Eggebrecht, Thomas (GfK Marktforschung) <[email protected]> wrote:

> Hi list,
>
> My seed list contains URLs from about 20 different domains. In the first
> fetch cycles everything is all right and all domains will be selected quite
> equally distributed. But after about 10-15 cycles one domain starts to
> prevail. URLs from all other domains will not be selected anymore. It seems
> that URLs from that certain domain have the highest scoring and URLs from
> other domains don't have a chance anymore. Is this a right assumption?
>
> I'm not very happy because I would like to fetch URLs from all domains in
> each cycle. What would you do in that case?
>
> Best regards and thanks for answers
> Thomas
>
> (Using nutch-1.2)
>
>
> GfK SE, Nuremberg, Germany, commercial register Nuremberg HRB 25014;
> Management Board: Professor Dr. Klaus L. Wübbenhorst (CEO), Pamela Knapp
> (CFO), Dr. Gerhard Hausruckinger, Petra Heinlein, Debra A. Pruent, Wilhelm
> R. Wessels; Chairman of the Supervisory Board: Dr. Arno Mahlert
> This email and any attachments may contain confidential or privileged
> information. Please note that unauthorized copying, disclosure or
> distribution of the material in this email is not permitted.

