I don't know about stopping the problems with the issues you've raised. But I do know that web sites that aren't idempotent with GET requests are in the hurt locker, and WAY too many of them aren't. In short: don't do anything with a GET that changes the contents of your web site.
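To make that concrete, here is a minimal sketch of the idea in Python. All names here (`store`, `handle_get`, `handle_post`) are illustrative, not from any real framework: reads go through GET-style handlers that never mutate state, so a crawler following links can hit them any number of times without side effects, and mutation lives only behind POST-style handlers that crawlers never invoke.

```python
# Toy in-memory "site": GET is read-only, POST is the only mutator.
# (Hypothetical handler names, for illustration only.)
store = {"counter": 0}

def handle_get(path):
    # Safe and idempotent: a crawler can request this repeatedly
    # without changing anything on the site.
    return store.get(path.lstrip("/"))

def handle_post(path, value):
    # All state changes happen here. Crawlers only follow links
    # (GETs), so they never trigger this path.
    store[path.lstrip("/")] = value
    return value
```

The design point is simply that anything reachable by following a link must be harmless to repeat; if a GET deletes or edits content, a crawler walking your URLs will do real damage.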
As a more direct answer to your question, you'd probably have to apply some sort of filtering. And anyway, crawlers only issue 'queries' based on the URLs found in the site, right? So are you going to have weird URLs embedded in your site?

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.


----- Original Message ----
From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Mon, January 10, 2011 5:41:17 AM
Subject: How to let crawlers in, but prevent their damage?

Hi,

How do people with public search services deal with bots/crawlers? And I don't mean to ask how one bans them (robots.txt), slows them down (the Crawl-delay stuff in robots.txt), or prevents them from digging too deep into search results...

What I mean is that when you have a publicly exposed search that bots crawl, they issue all kinds of crazy "queries" that result in errors, add noise to Solr caches, increase Solr cache evictions, etc.

Are there some known recipes for dealing with them, minimizing their negative side effects, while still letting them crawl you?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
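One way to sketch the filtering idea: detect known crawlers by User-Agent and route their search requests to a cached or static path, so their one-off "queries" never reach the live Solr handler and can't churn its caches. The pattern below is illustrative and deliberately incomplete (real bot detection also checks IPs and behavior), and the function names are hypothetical.

```python
import re

# Illustrative, non-exhaustive pattern of common crawler User-Agents.
BOT_PATTERN = re.compile(
    r"(googlebot|bingbot|slurp|baiduspider|crawler|spider)", re.IGNORECASE
)

def is_bot(user_agent):
    """Crude User-Agent check; real setups also verify IP ranges."""
    return bool(BOT_PATTERN.search(user_agent or ""))

def route_request(user_agent, path):
    # Hypothetical routing: crawlers hitting the search endpoint get a
    # cached/static response; everyone else hits the live search backend.
    if path.startswith("/search") and is_bot(user_agent):
        return "cached"
    return "live"
```

This keeps crawlers indexing your content while isolating the live search backend (and its caches) from the noise they generate.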