http keep alive

2009-10-14 Thread Marko Bauhardt
Hi. Is there a way to use HTTP keep-alive with Nutch? Does protocol-http or protocol-httpclient support keep-alive? I can't find any use of HTTP keep-alive in the code or in the configuration files. Thanks, Marko

Re: http keep alive

2009-10-14 Thread Andrzej Bialecki
Marko Bauhardt wrote: Hi. Is there a way to use HTTP keep-alive with Nutch? Does protocol-http or protocol-httpclient support keep-alive? I can't find any use of HTTP keep-alive in the code or in the configuration files.
protocol-httpclient can support keep-alive. However, I think that it

Re: Recrawling Nutch

2009-10-14 Thread Paul Tomblin
Nutch doesn't do a good job of storing or testing the Last-Modified time of pages it has crawled. I made the following changes, which seem to help a lot:
snowbird:~/src/nutch/trunk svn diff
Index: src/java/org/apache/nutch/fetcher/Fetcher.java

RE: http keep alive

2009-10-14 Thread Fuad Efendi
I'd like to add: Keep-Alive is not polite. It uses a dedicated listener on the server side. Establishing a TCP socket via its handshake takes time, which is why Keep-Alive exists for web servers: to improve the performance of subsequent requests. However, it allocates a dedicated listener for specific
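The trade-off described above (one TCP connection reused for several requests versus a fresh connection per request) can be illustrated with a minimal sketch. It uses only the Python standard library against a throwaway local server and is not Nutch code; `CountingHandler` and `count_connections` are hypothetical names introduced for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy handler (hypothetical) that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets http.server keep connections open
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(reuse, requests=3):
    """Send `requests` GETs and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        for _ in range(requests):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()  # the next request() opens a new socket
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print(count_connections(reuse=True))   # 1
    print(count_connections(reuse=False))  # 3
```

With reuse, the server pays for one handshake but keeps one handler busy for the whole exchange; without it, each request pays for its own handshake and the handler is released right away.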

Nutch-based Application for Windows - New Release

2009-10-14 Thread John Whelan
All, Version 2.1 has now been released. This version adds the following:
1. Updated Nutch from 1.0-dev (build 2008-10-28) to 1.1-dev (build 2009-09-09).
2. Updated Tomcat from 6.0.16 to 6.0.20.
3. Fixed bugs related to running in non-English locales.
4. Fixed bug in uninstaller. (Improved

NUTCH_CRAWLING

2009-10-14 Thread meh
Hi,
bin/nutch crawl urls -dir crawl_NEW1 -depth 3 -topN 50
I used the above command to crawl and am getting the following error:
Dedup: adding indexes in: crawl_NEW1/indexes
Exception in thread "main" java.io.IOException: Job failed! at