Hi,
Is there a way to use HTTP keep-alive with Nutch?
Does protocol-http or protocol-httpclient support keep-alive?
I can't find any use of HTTP keep-alive in the code or in the
configuration files.
Thanks,
Marko
Marko Bauhardt wrote:
Hi,
Is there a way to use HTTP keep-alive with Nutch?
Does protocol-http or protocol-httpclient support keep-alive?
I can't find any use of HTTP keep-alive in the code or in the
configuration files.
protocol-httpclient can support keep-alive. However, I think that
Nutch doesn't do a good job of storing or testing the Last-Modified
time of pages it has crawled. I made the following changes, which seem
to help a lot:
snowbird:~/src/nutch/trunk svn diff
Index: src/java/org/apache/nutch/fetcher/Fetcher.java
I'd like to add:
Keep-Alive is not polite. It uses a dedicated listener on the server side.
Establishing a TCP socket to a specific IP via a handshake takes time, which
is why Keep-Alive exists for web servers: to improve the performance of
subsequent requests. However, it allocates a dedicated listener for specific
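To make the trade-off above concrete, here is a minimal sketch of what keep-alive means on the wire. It is not Nutch code and does not use protocol-httpclient; it only relies on the JDK's built-in com.sun.net.httpserver.HttpServer and a plain socket, and it assumes Java 11 or later. The class and method names (KeepAliveSketch, fetchOverOneConnection) are made up for illustration. The client sends several GET requests over a single TCP connection, and the server records the client port it sees for each request. If the connection is reused, every request arrives from the same port, which also means the server has to keep that one connection open between requests.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class: illustrates HTTP/1.1 persistent connections only.
class KeepAliveSketch {

    // Starts a local server, sends `requests` GETs over ONE socket, and
    // returns the client port the server observed for each request.
    static List<Integer> fetchOverOneConnection(int requests) throws IOException {
        List<Integer> seenPorts = new CopyOnWriteArrayList<>();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
        server.createContext("/", exchange -> {
            seenPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.US_ASCII);
            // A fixed Content-Length lets the server keep the connection open.
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try (Socket socket = new Socket(loopback, server.getAddress().getPort())) {
            OutputStream out = socket.getOutputStream();
            InputStream in = new BufferedInputStream(socket.getInputStream());
            for (int i = 0; i < requests; i++) {
                String request = "GET /page" + i + " HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: keep-alive\r\n\r\n";
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                readResponse(in); // consume the full response before reusing the socket
            }
        } finally {
            server.stop(0);
        }
        return seenPorts;
    }

    // Reads one line terminated by LF, dropping any CR.
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        if (c == -1 && sb.length() == 0) {
            throw new EOFException("connection closed by server");
        }
        return sb.toString();
    }

    // Reads the status line and headers, then skips Content-Length body bytes.
    static void readResponse(InputStream in) throws IOException {
        int length = 0;
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).equalsIgnoreCase("Content-Length")) {
                length = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        in.readNBytes(length);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("client ports seen by server: " + fetchOverOneConnection(3));
    }
}
```

Without keep-alive, each request would open a new socket and pay for a new TCP handshake; with it, the handshake happens once, but the server keeps a connection tied up for that one client in between.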
All,
Version 2.1 has now been released. This version adds the following:
1. Updated Nutch from 1.0-dev (build 2008-10-28) to 1.1-dev (build
2009-09-09)
2. Updated Tomcat from 6.0.16 to 6.0.20.
3. Fixed bugs related to running in non-English locales.
4. Fixed bug in uninstaller. (Improved
Hi,
I used the following command to crawl:
bin/nutch crawl urls -dir crawl_NEW1 -depth 3 -topN 50
I am getting the following error:
Dedup: adding indexes in: crawl_NEW1/indexes
Exception in thread "main" java.io.IOException: Job failed!
at