AIL PROTECTED]>
To: nutch-dev@lucene.apache.org
Sent: Friday, June 16, 2006 2:05:41 PM
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
How about introducing these changes in an effort to force the nutch admins
to properly edit the bot identity strings?
1. Add the http.agen
b" <[EMAIL PROTECTED]>
Date: Fri, 16 Jun 2006 14:36:03
To:
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
That does sound fairly brilliant. One thing you'll have to keep in mind
is that different plugins index different things and sometimes
th
he plugins.
Just my 2 cents,
Jake.
-----Original Message-----
From: Paul Sutter [mailto:[EMAIL PROTECTED]
Sent: Friday, June 16, 2006 2:14 PM
To: nutch-dev@lucene.apache.org
Subject: RE: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Michael,
Superb idea! And if those
o:[EMAIL PROTECTED]
Sent: Friday, June 16, 2006 5:52 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Paul Sutter wrote:
> I think that Nutch has to solve the problem: if you leave the problem to the
> websites, they're mo
How about introducing these changes in an effort to force the nutch admins
to properly edit the bot identity strings?
1. Add the http.agent.* entries to nutch-site.xml with the value being "EDITME".
The description should clearly state that these values *must* be edited
to reflect the true
e a pretty good point.
-----Original Message-----
From: Michael Wechner [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 15, 2006 12:30 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Doug Cutting wrote:
http://incredibill.blog
rediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Doug Cutting wrote:
> http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
>
>
well, I think incrediBILL has an argument, that people might really
start excluding bots from their servers if it
:[EMAIL PROTECTED]
Sent: Thursday, June 15, 2006 9:30 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?
Doug Cutting wrote:
> http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
>
>
well, I think i
Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
well, I think incrediBILL has an argument, that people might really
start excluding bots from their servers if it's becoming too much.
What might help is that incrediBILL would offer an index of
The 'bot blocker' image server at blogspot is broken so it's impossible to
reply to this blog!
-----Original Message-----
From: Matt Kangas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 14, 2006 10:38 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rant
Heh. Perhaps we should eliminate the default user-agent string? Then
he'd have less of a target to aim at... :)
On a more serious note, it seems reasonable to require a customized
"bot" URL at least. But publishing an email contact is questionable
these days. Neither Y! nor G do it, precise
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html