Hi Otis,
From what I learned at Krugle, the approach that worked for us was:
1. Block all bots on the search page (a robots.txt sketch is below).
2. Expose the target content via statically linked pages that are
separately generated from the same backing store and optimized for the
target search terms (extracted from your own search logs); a rough
generation sketch follows below as well.
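
For step 1, something like this in robots.txt usually covers it. The
paths are only assumptions, so adjust them to wherever your search
handler and generated pages actually live:

  # keep every crawler off the live search handler
  User-agent: *
  Disallow: /search
  # the pre-generated landing pages stay crawlable
  Allow: /pages/

(Allow isn't in the original robots.txt spec, but the major crawlers
such as Googlebot and Bingbot honor it.)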
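
For step 2, here's a minimal sketch of the offline generation job in
Python. The log format (one query per line), the file names, and
fetch_results() are all made up for illustration; fetch_results() stands
in for whatever call pulls matching records out of your backing store:

#!/usr/bin/env python3
from collections import Counter
from pathlib import Path

LOG_FILE = Path("search_queries.log")  # hypothetical: one query per line
OUT_DIR = Path("static_pages")         # hypothetical output directory
TOP_N = 100

def fetch_results(term):
    # Placeholder: return a list of (title, url) hits for `term` from
    # the same backing store the live search uses.
    raise NotImplementedError

def main():
    # Count how often each query shows up in the search log.
    queries = (line.strip().lower() for line in LOG_FILE.read_text().splitlines())
    counts = Counter(q for q in queries if q)
    OUT_DIR.mkdir(exist_ok=True)
    index_links = []
    for term, _ in counts.most_common(TOP_N):
        hits = fetch_results(term)
        rows = "".join('<li><a href="%s">%s</a></li>' % (url, title)
                       for title, url in hits)
        page = OUT_DIR / (term.replace(" ", "-") + ".html")
        page.write_text("<html><head><title>%s</title></head>"
                        "<body><h1>%s</h1><ul>%s</ul></body></html>"
                        % (term, term, rows))
        index_links.append('<li><a href="%s">%s</a></li>' % (page.name, term))
    # One statically linked index page so crawlers can reach every term page.
    (OUT_DIR / "index.html").write_text(
        "<html><body><ul>%s</ul></body></html>" % "".join(index_links))

if __name__ == "__main__":
    main()

Run something like that on a schedule, and point the crawlers at the
generated index page instead of the live search handler.
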
-- Ken
On Jan 10, 2011, at 5:41am, Otis Gospodnetic wrote:
Hi,
How do people with public search services deal with bots/crawlers?
And I don't mean to ask how one bans them (robots.txt), slows them down
(the Crawl-delay directive in robots.txt), or prevents them from digging
too deep into search results...
What I mean is that when you have a publicly exposed search that bots
crawl, they issue all kinds of crazy "queries" that result in errors,
add noise to Solr caches, increase Solr cache evictions, etc.
Are there some known recipes for dealing with them, minimizing their
negative side-effects, while still letting them crawl you?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g