RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Thread Teruhiko Kurosaka
How about introducing these changes in an effort to force the nutch admins to properly edit the bot identity strings? 1. Add the http.agent.* entries to nutch-site.xml with the value being EDITME. The description should clearly state that these values *must* be edited to reflect the true

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Thread Paul Sutter
: Friday, June 16, 2006 5:52 AM To: nutch-dev@lucene.apache.org Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch? Paul Sutter wrote: I think that Nutch has to solve the problem: if you leave the problem to the websites, they're more likely to cut you off than

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Thread Vanderdray, Jacob
. Just my 2cents, Jake. -Original Message- From: Paul Sutter [mailto:[EMAIL PROTECTED] Sent: Friday, June 16, 2006 2:14 PM To: nutch-dev@lucene.apache.org Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch? Michael, Superb idea! And if those crawls could

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Thread peter decrem
PROTECTED] Date: Fri, 16 Jun 2006 14:36:03 To: nutch-dev@lucene.apache.org Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch? That does sound fairly brilliant. One thing you'll have to keep in mind is that different plugins index different things and sometimes the same

Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-16 Thread ogjunk-nutch
] To: nutch-dev@lucene.apache.org Sent: Friday, June 16, 2006 2:05:41 PM Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch? How about introducing these changes in an effort to force the nutch admins to properly edit the bot identity strings? 1. Add the http.agent.* entries

Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Michael Wechner
Doug Cutting wrote: http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html well, I think incrediBILL has an argument, that people might really start excluding bots from their servers if it's becoming too much. What might help is that incrediBILL would offer an index

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Gal Nitzan
:[EMAIL PROTECTED] Sent: Thursday, June 15, 2006 9:30 AM To: nutch-dev@lucene.apache.org Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch? Doug Cutting wrote: http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html well, I think incrediBILL has

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-15 Thread Paul Sutter
Random Rants: How Much Nutch is TOO MUCH Nutch? Doug Cutting wrote: http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html well, I think incrediBILL has an argument, that people might really start excluding bots from their servers if it's becoming too much. What might

Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-14 Thread Matt Kangas
Heh. Perhaps we should eliminate the default user-agent string? Then he'd have less of a target to aim at... :) On a more serious note, it seems reasonable to require a customized bot URL at least. But publishing an email contact is questionable these days. Neither Y! nor G do it,

RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

2006-06-14 Thread Wootton, Alan
The 'bot blocker' image server at blogspot is broken so it's impossible to reply to this blog! -Original Message- From: Matt Kangas [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 14, 2006 10:38 AM To: nutch-dev@lucene.apache.org Subject: Re: IncrediBILL's Random Rants: How Much Nutch