Re: stop spider
You would need to contact them directly. Nutch is an open source project and does NOT run crawlers of its own, so the organization running the crawler is the one to contact. You can also modify your robots.txt file to block (well-behaved) robots.

Dennis Kubes

georgiosi ... wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (
Re: stop spider
Hi,

Nutch is a software project and does not host or store a search index. Furthermore, the software project itself does not crawl any websites. You are observing somebody USING Nutch to crawl your site.

The people using, maintaining, and developing Nutch are indeed interested in misbehaving crawlers. However, I just tried to access http://www.georgiosi.com/robots.txt and could not find anything. If you don't want web spiders to crawl your site, you have to maintain a "robots.txt" file. The Nutch spider obeys the robots exclusion protocol by default. Adding

User-agent: Nutch
disallow: /*

to robots.txt blocks the Nutch spider.

Best Regards,
Martin

On Jan 17, 2008 2:26 PM, georgiosi ... <[EMAIL PROTECTED]> wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (
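For anyone who wants to check a rule like the one above before deploying it, here is a minimal sketch using Python's standard-library urllib.robotparser. It is an illustration under stated assumptions, not how Nutch itself evaluates robots.txt. The helper name is_allowed, the page path /index.html, and the agent name "OtherBot" are made up for the example. The rule is written as the plain prefix form "Disallow: /" because whether "/*" is treated as a wildcard depends on the parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content. The "Nutch" agent name comes from the thread;
# "Disallow: /" is the plain prefix form that blocks every path.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on the site mentioned in the thread.
    page = "http://www.georgiosi.com/index.html"
    print("Nutch allowed:", is_allowed(ROBOTS_TXT, "Nutch", page))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", page))
```

Note that a rule like this only affects crawlers that identify themselves with a matching user agent string and that choose to honor robots.txt.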
Re: stop spider
georgiosi ... wrote:
> please can you STOP sitesell from leaching and crawling all over my site
> www.georgiosi.com , i am receiving false statistics and this is NOT good.
> just take it off my site. : (

Please contact the admins at Sitesell. This mailing list concerns the Nutch software project: we are not doing any crawling, we just develop the software. The user agent string that they report is a generic value in the default Nutch configuration.

-- 
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
stop spider
please can you STOP sitesell from leaching and crawling all over my site www.georgiosi.com , i am receiving false statistics and this is NOT good. just take it off my site. : (