Hi Otis,

From what I learned at Krugle, the approach that worked for us was:

1. Block all bots on the search page.

2. Expose the target content via statically linked pages that are separately generated from the same backing store and optimized for target search terms (extracted from your own search logs); rough sketches of both steps follow below.
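For step 1, a minimal sketch of the robots.txt entry, assuming the search page lives under a path like /search (the actual path is whatever yours is):

    # Keep all crawlers off the dynamic search endpoint
    User-agent: *
    Disallow: /search

For step 2, one rough way to pull the target terms out of your search logs is a quick log scan. This is just a sketch in Python; the log file name and the "q=" parameter format are assumptions, so adjust the pattern to your own log layout:

    # Count the most frequent query terms in a search access log.
    # Assumes each request line carries a "q=<query>" parameter (hypothetical format).
    import re
    from collections import Counter
    from urllib.parse import unquote_plus

    counts = Counter()
    with open("solr_access.log") as log:
        for line in log:
            m = re.search(r"[?&]q=([^&\s]+)", line)
            if m:
                counts[unquote_plus(m.group(1)).lower()] += 1

    # The top terms are the ones worth turning into static, crawlable pages.
    for term, hits in counts.most_common(100):
        print(hits, term)

The static pages themselves can then be regenerated offline from the same backing store, so crawler traffic never touches the live search handler.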

-- Ken

On Jan 10, 2011, at 5:41am, Otis Gospodnetic wrote:

Hi,

How do people with public search services deal with bots/crawlers?
And I don't mean how one bans them (robots.txt), slows them down (the Crawl-delay directive in robots.txt), or prevents them from digging too deep into search results...

What I mean is that when you expose search publicly and bots crawl it, they issue all kinds of crazy "queries" that result in errors, add noise to the Solr caches, increase Solr cache evictions, and so on.

Are there some known recipes for dealing with them, minimizing their negative
side-effects, while still letting them crawl you?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




