This brings to mind a minor suggestion -- rather than topN, why not specify
the top percentage? Each time I use topN I think in terms of a
percentage of sites. It seems easier to have the machine do such a simple
calculation...
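
A minimal sketch of that calculation, only to illustrate the idea. The
function name and parameters are hypothetical and not part of Nutch, and
what exactly gets counted (sites or URLs) is an assumption left to the
caller:

```python
import math


def top_n_from_percent(total_candidates: int, percent: float) -> int:
    """Turn a percentage of the known candidates into an absolute topN.

    Hypothetical helper, not a Nutch API. Rounds up so that any
    non-zero percentage of a non-empty set selects at least one entry.
    """
    if total_candidates < 0:
        raise ValueError("total_candidates must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return math.ceil(total_candidates * percent / 100)


# Example: asking for the top 5% of 20000 candidates.
top_n = top_n_from_percent(20000, 5)
```

The result could then be passed wherever an absolute topN is expected.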

      - Bill


Tim said:

> Thanks. I made the changes you suggested but the problem persisted.
> After about 5 rounds of 1000 URLs one site would "take over." I made
> the attached small change to get around this problem. It allows you to
> specify the maximum number of URLs you want from any single host. I
> now use -topN 1000 -maxSite 500 and things are going as I had hoped.
> 
> Thanks,
> Tim
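
For readers without the attachment, the kind of per-host cap Tim describes
might look roughly like the sketch below. This is not Tim's patch and not
Nutch code; the function name, the (url, score) input format, and the
selection order are all assumptions made for illustration:

```python
from collections import Counter
from urllib.parse import urlsplit


def select_urls(scored_urls, top_n, max_per_host):
    """Pick up to top_n URLs by descending score, capping each host.

    scored_urls is an iterable of (url, score) pairs (assumed format).
    A host that has already contributed max_per_host URLs is skipped,
    so no single site can fill the whole batch.
    """
    per_host = Counter()
    selected = []
    ranked = sorted(scored_urls, key=lambda pair: pair[1], reverse=True)
    for url, _score in ranked:
        if len(selected) >= top_n:
            break
        host = urlsplit(url).hostname or ""
        if per_host[host] >= max_per_host:
            continue  # this host has reached its cap
        per_host[host] += 1
        selected.append(url)
    return selected


# Example with made-up hosts: at most 2 URLs per host, 3 overall.
candidates = [
    ("http://a.example/1", 0.9),
    ("http://a.example/2", 0.8),
    ("http://a.example/3", 0.7),
    ("http://b.example/1", 0.6),
]
batch = select_urls(candidates, top_n=3, max_per_host=2)
```

With -topN 1000 -maxSite 500, the same logic would let one host supply at
most half of each round.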

-- 
         *------------------------------------------------------*
         | Bill Goffe                 [EMAIL PROTECTED]          |
         | Department of Economics    voice: (315) 312-3444     |
         | SUNY Oswego                fax:   (315) 312-5444     |
         | 416 Mahar Hall             <wuecon.wustl.edu/~goffe> |          
         | Oswego, NY  13126                                    |
*--------*------------------------------------------------------*-----------*
|   "Two physics majors, Justin Kasper and Fred Niell, gathered up some     |
| spare junk from their physics labs and dorm rooms and built a             |
| plutonium-producing reactor.                                              |
|   "`It's kind of scary how easy it was to do,' said Niell, assuring       |
| onlookers that there was only a trace of plutonium -- nothing harmful.    |
| `It only took us about a day to build it.  We've been thinking about it   |
| for a few days and we gathered the parts, and last night we assembled     |
| it. In Justin's room -- he lost the coin toss.'"                          |
|   -- A description of part of the University of Chicago Scavenger Hunt,   |
|      where making a reactor was one of the possible projects. New York    |
|      Times, May 19, 1999.                                                 |
*---------------------------------------------------------------------------*



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
