Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Chris Hostetter
: What I mean is that when you have publicly exposed search that bots crawl, they : issue all kinds of crazy "queries" that result in errors, that add noise to Solr : caches, increase Solr cache evictions, etc. etc. I dealt with this type of thing a few years back by having my front end app e
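A minimal sketch of the kind of front-end clean-up Chris is describing, assuming a small Java helper sitting between the web app and Solr; the class name, limits, and rejection rules are illustrative guesses, not Chris's actual implementation:

    // Hypothetical pre-filter applied by the front-end app before a query is
    // forwarded to Solr. Limits and rules are illustrative assumptions only.
    public final class CrawlerQueryGuard {

        private static final int MAX_QUERY_LENGTH = 200;
        private static final int MAX_START_OFFSET = 1000;  // stop bots from paging forever

        /** Returns a cleaned query string, or null if the request should be rejected. */
        public static String sanitize(String rawQuery, int start) {
            if (rawQuery == null || rawQuery.trim().isEmpty()) {
                return null;                              // empty or missing query -> reject
            }
            if (rawQuery.length() > MAX_QUERY_LENGTH || start > MAX_START_OFFSET) {
                return null;                              // oversized query or deep paging -> reject
            }
            // Keep only letters, digits and whitespace so broken Lucene syntax
            // from crawlers cannot trigger parse errors or pollute the caches.
            return rawQuery.replaceAll("[^\\p{L}\\p{Nd}\\s]", " ").trim();
        }

        private CrawlerQueryGuard() { }
    }

Anything the guard rejects can be answered with a static error page or an empty result, so the junk never reaches Solr's caches.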

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Dennis Gearon
, 2011 9:07:43 AM Subject: Re: How to let crawlers in, but prevent their damage? On Jan 10, 2011, at 7:02am, Otis Gospodnetic wrote: > Hi Ken, thanks Ken. :) > > The problem with this approach is that it exposes very limited content to > bots/web search engines. > > Take http:/

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Dennis Gearon
- Original Message From: lee carroll To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 6:48:12 AM Subject: Re: How to let crawlers in, but prevent their damage? Sorry not an answer but a +1 vote for finding out best practice for this. Related to it is DOS attacks. We have

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Ken Krugler
Krugler To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 9:43:49 AM Subject: Re: How to let crawlers in, but prevent their damage? Hi Otis, From what I learned at Krugle, the approach that worked for us was: 1. Block all bots on the search page. 2. Expose the target content via stat

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Dennis Gearon
EARTH has a Right To Life, otherwise we all die. - Original Message From: Otis Gospodnetic To: solr-user@lucene.apache.org Sent: Mon, January 10, 2011 5:41:17 AM Subject: How to let crawlers in, but prevent their damage? Hi, How do people with public search services deal with bots/crawlers? And

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Otis Gospodnetic
p://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message > From: Ken Krugler > To: solr-user@lucene.apache.org > Sent: Mon, January 10, 2011 9:43:49 AM > Subject: Re: How to let crawlers in, but prevent their damage? >

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread lee carroll
Sorry not an answer but a +1 vote for finding out best practice for this. Related to it is DOS attacks. We have rewrite rules in between the proxy server and Solr which attempt to filter out undesirable stuff, but would it be better to have a query app doing this? any standard rewrite rules whic
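For reference, one common shape for such proxy-level rules is an Apache httpd mod_rewrite block in front of the Solr handler; this is a generic sketch with made-up paths and thresholds, not the rules Lee's setup actually uses:

    # Sketch only: block obvious crawler traffic and deep paging before it reaches Solr.
    RewriteEngine On

    # Return 403 to requests whose user agent looks like a bot or spider.
    RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider) [NC]
    RewriteRule ^/solr/select - [F]

    # Return 403 to deep-paging requests (start=10000 and beyond), a typical crawler pattern.
    RewriteCond %{QUERY_STRING} (^|&)start=[0-9]{5,}
    RewriteRule ^/solr/select - [F]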

Re: How to let crawlers in, but prevent their damage?

2011-01-10 Thread Ken Krugler
Hi Otis, From what I learned at Krugle, the approach that worked for us was: 1. Block all bots on the search page. 2. Expose the target content via statically linked pages that are separately generated from the same backing store, and optimized for target search terms (extracted from your o
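A robots.txt along these lines would cover step 1 while leaving the static pages crawlable; the /search and /browse paths are assumptions for illustration, and Allow is honoured by the major crawlers even though it is not part of the original robots.txt spec:

    # Keep crawlers off the live search handler, let them index the static pages.
    User-agent: *
    Disallow: /search
    Allow: /browse/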

How to let crawlers in, but prevent their damage?

2011-01-10 Thread Otis Gospodnetic
Hi, How do people with public search services deal with bots/crawlers? And I don't mean to ask how one bans them (robots.txt), slows them down (the Delay stuff in robots.txt), or prevents them from digging too deep in search results... What I mean is that when you have publicly exposed search that bo
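For completeness, the robots.txt mechanisms Otis mentions in passing look roughly like this; Crawl-delay is only honoured by some crawlers, and the /search path is just an example:

    User-agent: *
    # Ask polite bots to wait 10 seconds between requests.
    Crawl-delay: 10
    # Or keep them away from the search result pages entirely.
    Disallow: /search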