Re: stop spider

2008-01-17 Thread Dennis Kubes
You would need to contact them directly.  Nutch is an open source 
project and does NOT run crawlers of its own.  You would need to contact 
the organization that is running the crawlers and/or modify your 
robots.txt file to block (well-behaved) robots.


Dennis Kubes

georgiosi ... wrote:

please can you STOP sitesell from leaching and crawling all over my site
www.georgiosi.com , i am receiving false statistics and this is NOT good.
just take it off my site.  : (



Re: stop spider

2008-01-17 Thread Martin Kuen
Hi,

Nutch is a software project and does not host or store a search index.
Furthermore, the project itself does not crawl any websites.
You are observing somebody USING Nutch to crawl your site. The people
using, maintaining, and developing Nutch are indeed interested in hearing
about misbehaving crawlers.

However, I just tried to access http://www.georgiosi.com/robots.txt and
could not find anything. If you don't want web spiders to crawl your site, you
need to maintain a "robots.txt" file. By default, the Nutch spider obeys
the robots exclusion protocol.

Adding:
User-agent: Nutch
Disallow: /
to robots.txt blocks a well-behaved Nutch spider from the whole site.
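
If you want to check such a rule before uploading it, here is a minimal sketch using Python's standard urllib.robotparser module. It assumes the crawler announces itself with the agent token "Nutch" and uses "Disallow: /" to cover the whole site; the helper name is_allowed and the agent "ExampleBot" are made up for illustration. It only shows how a rule-obeying parser reads the file, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Rules under test: block the agent token "Nutch" from the whole site.
# The token is an assumption; an operator may run Nutch under another name.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.georgiosi.com/index.html"
    print("Nutch allowed:", is_allowed(ROBOTS_TXT, "Nutch", page))
    print("ExampleBot allowed:", is_allowed(ROBOTS_TXT, "ExampleBot", page))
```

Agents not named in the file stay allowed, so this rule only affects crawlers that identify themselves as Nutch and honor robots.txt.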


Best Regards,

Martin

On Jan 17, 2008 2:26 PM, georgiosi ... <[EMAIL PROTECTED]> wrote:

> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site.  : (
>


Re: stop spider

2008-01-17 Thread Andrzej Bialecki

georgiosi ... wrote:

please can you STOP sitesell from leaching and crawling all over my site
www.georgiosi.com , i am receiving false statistics and this is NOT good.
just take it off my site.  : (



Please contact the admins at Sitesell. This mailing list concerns the 
Nutch software project - we are not doing any crawling, we just develop 
the software. The user agent string that they report is a generic value 
in the default Nutch configuration.



--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



stop spider

2008-01-17 Thread georgiosi ...
please can you STOP sitesell from leaching and crawling all over my site
www.georgiosi.com , i am receiving false statistics and this is NOT good.
just take it off my site.  : (