This brings to mind a minor suggestion -- rather than topN, why not accept
a top percentage? Each time I use topN, I think in terms of a percentage
of sites. It seems easier to have the machine do such a simple
calculation.
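
To make the idea concrete, here is a rough sketch of the conversion. The
helper name top_n_from_percent and its arguments are my own illustration,
not anything that exists in Nutch, and I am assuming the total number of
candidates is known before selection.

```python
import math


def top_n_from_percent(total_candidates: int, percent: float) -> int:
    """Convert a percentage of the candidates into an absolute topN count.

    Hypothetical helper for illustration only; not part of Nutch.
    """
    if total_candidates < 0:
        raise ValueError("total_candidates must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Round up so that a small non-zero percentage still selects something.
    return math.ceil(total_candidates * percent / 100)
```

For example, 5 percent of 20000 candidates would give the same result as
passing -topN 1000 by hand.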
- Bill
Tim said:
> Thanks. I made the changes you suggested but the problem persisted.
> After about 5 rounds of 1000 URLs one site would "take over." I made
> the attached small change to get around this problem. It allows you to
> specify the maximum number of URLs you want from any single host. I
> now use -topN 1000 -maxSite 500 and things are going as I had hoped.
>
> Thanks,
> Tim
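
For readers following along, the per-host cap Tim describes might look
roughly like the sketch below. This is not Tim's attached patch and not
Nutch code; the function select_urls, its parameters, and the (url, score)
input format are assumptions made purely for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def select_urls(scored_urls, top_n, max_per_host):
    """Pick up to top_n URLs by descending score, capping each host.

    scored_urls is an iterable of (url, score) pairs. Hypothetical sketch
    of the idea behind "-topN 1000 -maxSite 500"; not the actual patch.
    """
    per_host = Counter()
    selected = []
    ranked = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
    for url, _score in ranked:
        if len(selected) >= top_n:
            break
        host = urlsplit(url).hostname or ""
        if per_host[host] >= max_per_host:
            # This host has already reached its cap; skip to the next URL.
            continue
        per_host[host] += 1
        selected.append(url)
    return selected
```

With a cap in place, a single high-scoring host can fill at most
max_per_host of the top_n slots, so it cannot "take over" a round.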
--
*------------------------------------------------------*
| Bill Goffe [EMAIL PROTECTED] |
| Department of Economics voice: (315) 312-3444 |
| SUNY Oswego fax: (315) 312-5444 |
| 416 Mahar Hall <wuecon.wustl.edu/~goffe> |
| Oswego, NY 13126 |
*--------*------------------------------------------------------*-----------*
| "Two physics majors, Justin Kasper and Fred Niell, gathered up some |
| spare junk from their physics labs and dorm rooms and built a |
| plutonium-producing reactor. |
| "`It's kind of scary how easy it was to do,' said Niell, assuring |
| onlookers that there was only a trace of plutonium -- nothing harmful. |
| `It only took us about a day to build it. We've been thinking about it |
| for a few days and we gathered the parts, and last night we assembled |
| it. In Justin's room -- he lost the coin toss.'" |
| -- A description of part of the University of Chicago Scavenger Hunt, |
| where making a reactor was one of the possible projects. New York |
| Times, May 19, 1999. |
*---------------------------------------------------------------------------*
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers