Tim Martin wrote:
Thanks. I made the changes you suggested but the problem persisted.
After about 5 rounds of 1000 URLs one site would "take over." I made
the attached small change to get around this problem. It allows you to
specify the maximum number of URLs you want from any single host. I
now use -topN 1000 -maxSite 500 and things are going as I had hoped.

I like this idea and think it will make a useful addition to Nutch. However, the filtering should be done in the loop at line 478, not at line 400, right? That way you'd get the highest-scoring N pages from each site. If you agree, can you please modify the patch to work that way?
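
To make the intent concrete, here is a rough sketch of the selection I have in mind. It is not the actual Nutch code: the class PerHostLimiter, the Scored type, and the select method are hypothetical names made up for illustration, and topN and maxSite simply mirror the -topN and -maxSite options mentioned above. The idea is to walk the candidates in descending score order and skip any URL whose host has already reached its cap, so each host contributes only its highest-scoring pages.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch: caps the number of URLs per host
// while keeping the highest-scoring candidates overall.
final class PerHostLimiter {

    /** A candidate URL with its score (illustrative type only). */
    static final class Scored {
        final String url;
        final float score;

        Scored(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    /**
     * Returns at most topN URLs, best score first, with at most maxSite
     * URLs taken from any single host.
     */
    static List<String> select(List<Scored> candidates, int topN, int maxSite) {
        // Sort a copy so the caller's list is left untouched.
        List<Scored> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());

        Map<String, Integer> perHost = new HashMap<>();
        List<String> selected = new ArrayList<>();

        for (Scored s : sorted) {
            if (selected.size() >= topN) {
                break; // overall limit reached
            }
            // URI.create throws IllegalArgumentException on malformed input;
            // this sketch assumes the URLs are already well formed.
            String host = URI.create(s.url).getHost();
            if (host == null) {
                continue; // no host component, skip it
            }
            host = host.toLowerCase(Locale.ROOT);

            int taken = perHost.getOrDefault(host, 0);
            if (taken >= maxSite) {
                continue; // this host already contributed its share
            }
            perHost.put(host, taken + 1);
            selected.add(s.url);
        }
        return selected;
    }
}
```

Because the cap is applied while iterating in score order, the pages dropped for a busy host are its lowest-scoring ones, which is the difference from filtering before the scores are taken into account.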


Thanks,

Doug


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers