Hi,

You could set generate.max.per.host to a reasonable value to prevent this.
In the default configuration it is set to -1, which means unlimited, so a
single host can fill an entire fetchlist.
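
A minimal sketch of the override, assuming it goes into conf/nutch-site.xml
inside the <configuration> element. The value 100 is only an illustrative
assumption, not a recommendation; pick a cap that fits your topN and the
number of hosts you crawl:

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml -->
<property>
  <name>generate.max.per.host</name>
  <!-- 100 is a placeholder value; the default of -1 means unlimited -->
  <value>100</value>
  <description>Maximum number of URLs per host in a single fetchlist.</description>
</property>
```

The cap applies to each generated fetchlist, so it should take effect the
next time you run the generate step.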

BR

Hannes

---
Hannes Carl Meyer
www.informera.de

On Fri, Jul 8, 2011 at 2:53 PM, Eggebrecht, Thomas (GfK Marktforschung) <
[email protected]> wrote:

> Hi list,
>
> My seed list contains URLs from about 20 different domains. In the first
> fetch cycles everything is all right and all domains will be selected quite
> equally distributed. But after about 10-15 cycles one domain starts to
> prevail. URLs from all other domains will not be selected anymore. It seems
> that URLs from that certain domain have the highest scoring and URLs from
> other domains don't have a chance anymore. Is this a right assumption?
>
> I'm not very happy because I would like to fetch URLs from all domains in
> each cycle. What would you do in that case?
>
> Best regards and thanks for answers
> Thomas
>
> (Using nutch-1.2)
>
>
> GfK SE, Nuremberg, Germany, commercial register Nuremberg HRB 25014;
> Management Board: Professor Dr. Klaus L. Wübbenhorst (CEO), Pamela Knapp
> (CFO), Dr. Gerhard Hausruckinger, Petra Heinlein, Debra A. Pruent, Wilhelm
> R. Wessels; Chairman of the Supervisory Board: Dr. Arno Mahlert
> This email and any attachments may contain confidential or privileged
> information. Please note that unauthorized copying, disclosure or
> distribution of the material in this email is not permitted.
>
