How about introducing these changes in an effort to force the Nutch admins to properly edit the bot identity strings?
1. Add the http.agent.* entries to nutch-site.xml with the value being EDITME.
The description should clearly state that these values *must* be edited to reflect the true
Sent: Friday, June 16, 2006 5:52 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Paul Sutter wrote:
I think that Nutch has to solve the problem: if you leave the problem to the websites, they're more likely to cut you off than
Just my 2 cents,
Jake.
-----Original Message-----
From: Paul Sutter [mailto:[EMAIL PROTECTED]
Sent: Friday, June 16, 2006 2:14 PM
To: nutch-dev@lucene.apache.org
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Michael,
Superb idea! And if those crawls could
Date: Fri, 16 Jun 2006 14:36:03
To: nutch-dev@lucene.apache.org
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
That does sound fairly brilliant. One thing you'll have to keep in mind is that different plugins index different things and sometimes the same
To: nutch-dev@lucene.apache.org
Sent: Friday, June 16, 2006 2:05:41 PM
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
How about introducing these changes in an effort to force the Nutch admins to properly edit the bot identity strings?
1. Add the http.agent.* entries
Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
Well, I think incrediBILL has an argument that people might really start excluding bots from their servers if it's becoming too much. What might help is that incrediBILL would offer an index
Sent: Thursday, June 15, 2006 9:30 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
Well, I think incrediBILL has an argument that people might really start excluding bots from their servers if it's becoming too much. What might
Heh. Perhaps we should eliminate the default user-agent string? Then he'd have less of a target to aim at... :)
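Combining the EDITME proposal from this thread with dropping the default user-agent string suggests a start-up check that refuses to crawl until the bot identity has been edited. The following is a minimal Python sketch under stated assumptions, not Nutch's actual code (Nutch itself is Java). It assumes nutch-site.xml uses the usual <property>/<name>/<value> layout, and the choice of http.agent.name and http.agent.url as required (with other http.agent.* values, such as the email, flagged only if left at EDITME) is an illustrative assumption reflecting the suggestion in this thread to require at least a bot URL. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder value from the EDITME proposal in this thread.
PLACEHOLDER = "EDITME"

# Assumption: identity properties that must be customized before any fetch.
# Other http.agent.* values (e.g. the email) are optional in this sketch.
REQUIRED = ("http.agent.name", "http.agent.url")


def read_properties(xml_text):
    """Parse <property><name/><value/></property> entries into a dict.

    Uses iter() so the name of the root element does not matter.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def unedited_agent_properties(props, required=REQUIRED):
    """List http.agent.* properties that still need editing."""
    # Required properties fail if missing, empty, or still the placeholder.
    problems = [n for n in required if props.get(n, "") in ("", PLACEHOLDER)]
    # Optional properties are flagged only if left at the placeholder.
    problems += [
        n for n, v in props.items()
        if n.startswith("http.agent.") and n not in required and v == PLACEHOLDER
    ]
    return problems


def check_agent_identity(xml_text):
    """Raise instead of crawling with an unedited bot identity."""
    problems = unedited_agent_properties(read_properties(xml_text))
    if problems:
        raise ValueError("edit in nutch-site.xml before crawling: " + ", ".join(problems))


# Hypothetical nutch-site.xml as shipped under the EDITME proposal.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>EDITME</value>
    <description>Must be edited to identify your crawler.</description>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>EDITME</value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value></value>
  </property>
</configuration>
"""
```

Calling check_agent_identity(SAMPLE) raises a ValueError naming http.agent.name and http.agent.url. Once both are edited, the empty http.agent.email passes, so publishing an email contact stays optional. Raising an error rather than substituting a built-in value is what removes the default user-agent string as a target.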
On a more serious note, it seems reasonable to require a customized bot URL at least. But publishing an email contact is questionable these days. Neither Yahoo! nor Google does it,
The 'bot blocker' image server at blogspot is broken, so it's impossible to reply to this blog!
-----Original Message-----
From: Matt Kangas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 14, 2006 10:38 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch