: What I mean is that when you have publicly exposed search that bots crawl, they
: issue all kinds of crazy "queries" that result in errors, that add noise to Solr
: caches, increase Solr cache evictions, etc. etc.
I dealt with this type of thing a few years back by having my front end app ...
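A minimal sketch of that kind of front-end guard, assuming a Java servlet
filter sitting in front of Solr (the class name, the bot list, and the "safe
query" pattern below are illustrative, not taken from the original setup):

import java.io.IOException;
import java.util.regex.Pattern;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BotQueryGuardFilter implements Filter {
    // Very rough bot detection by User-Agent; the list is illustrative only.
    private static final Pattern BOT_UA =
        Pattern.compile("(?i)(googlebot|bingbot|slurp|baiduspider|spider|crawler)");
    // For bots, only plain keyword-style queries are allowed through to Solr.
    private static final Pattern SAFE_BOT_QUERY =
        Pattern.compile("^[\\p{Alnum}\\s\\-\"]{1,100}$");

    public void init(FilterConfig cfg) { }
    public void destroy() { }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        String ua = http.getHeader("User-Agent");
        String q = http.getParameter("q");
        boolean isBot = ua != null && BOT_UA.matcher(ua).find();
        if (isBot && (q == null || !SAFE_BOT_QUERY.matcher(q).matches())) {
            // Reject malformed or clearly non-human bot queries before they reach Solr.
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_BAD_REQUEST);
            return;
        }
        chain.doFilter(req, res);
    }
}

The point is simply that anything a crawler sends which does not look like a
sane query gets rejected in the application tier and never touches Solr's caches.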
Sent: Mon, January 10, 2011 9:07:43 AM
Subject: Re: How to let crawlers in, but prevent their damage?
On Jan 10, 2011, at 7:02am, Otis Gospodnetic wrote:
> Hi Ken, thanks Ken. :)
>
> The problem with this approach is that it exposes very limited content to
> bots/web search engines.
>
> Take http:/
- Original Message
From: lee carroll
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011 6:48:12 AM
Subject: Re: How to let crawlers in, but prevent their damage?
Sorry not an answer but a +1 vote for finding out best practice for this.
Related to it is DoS attacks. We have rewrite rules in between the proxy
server and Solr which attempt to filter out undesirable stuff, but would it
be better to have a query app doing this? Any standard rewrite rules which ...
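A sketch of the kind of rules in question, assuming Apache mod_rewrite sits in
front of Solr (the /solr/select path and the thresholds are illustrative):

RewriteEngine On
# Refuse absurdly long query strings before they reach Solr.
RewriteCond %{QUERY_STRING} ^.{512,}
RewriteRule ^/solr/select - [F,L]
# Refuse deep paging (start=10000 and up), a typical crawler pattern.
RewriteCond %{QUERY_STRING} (^|&)start=[0-9]{5,}
RewriteRule ^/solr/select - [F,L]
# Refuse requests that carry no q parameter at all.
RewriteCond %{QUERY_STRING} !(^|&)q=[^&]
RewriteRule ^/solr/select - [F,L]

A dedicated query application can apply the same checks with more context
(sessions, per-IP rate limits), which a plain rewrite layer cannot easily do.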
- Original Message
From: Ken Krugler
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011 9:43:49 AM
Subject: Re: How to let crawlers in, but prevent their damage?
Hi Otis,
From what I learned at Krugle, the approach that worked for us was:
1. Block all bots on the search page.
2. Expose the target content via statically linked pages that are
separately generated from the same backing store, and optimized for
target search terms (extracted from your o...
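For step 1, a robots.txt along these lines is usually enough (assuming the
search UI lives under /search; the path is illustrative):

User-agent: *
Disallow: /search

The statically generated pages from step 2 then sit under a path that stays
crawlable (say, /browse/), so bots index real content without ever hitting
the search handler.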
- Original Message
From: Otis Gospodnetic
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011 5:41:17 AM
Subject: How to let crawlers in, but prevent their damage?
Hi,
How do people with public search services deal with bots/crawlers?
And I don't mean to ask how one bans them (robots.txt) or slows them down (the
Crawl-delay directive in robots.txt) or prevents them from digging too deep in
search results...
What I mean is that when you have publicly exposed search that bots crawl, they
issue all kinds of crazy "queries" that result in errors, that add noise to Solr
caches, increase Solr cache evictions, etc. etc.
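For example (made-up requests), a crawler that blindly follows every facet,
sort, and paging link can end up issuing things like:

/select?q=%22%22&start=48750&sort=price+asc
/select?q=laptop&fq=cat:electronics&fq=cat:electronics%2F&rows=500

Each distinct q/sort combination becomes a new queryResultCache entry and each
distinct fq a new filterCache entry, so this junk steadily evicts entries that
real users would have reused.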
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: Ken Krugler
> To: solr-user@lucene.apache.org
> Sent: Mon, January 10, 2011 9:43:49 AM
> Subject: Re: How to let crawlers in, but prevent their damage?
>