I followed the media-style.com quick tutorial, but when I try to
fetch my segment, the fetch is killed!
I have tried setting the system clock forward by 30 days, and no
anti-virus software is running on the systems.
Systems: SUSE 9.2 and SUSE 10
# bin/nutch fetch segments/20060109014654/
060109 014714 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-default.xml
060109 014715 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-site.xml
060109 014715 No FS indicated, using default:local
060109 014715 Plugins: looking in: /home/hkongsgaard/nutch-0.7.1/plugins
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/query-more
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-site/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-html/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-text/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-ext
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-pdf
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-rss
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/index-more
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-js
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-regex/plugin.xml
060109 014715 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-ftp
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-msword
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/creativecommons
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/ontology
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/nutch-extensionpoints/plugin.xml
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-file
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-http/plugin.xml
060109 014715 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/clustering-carrot2
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/language-identifier
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-prefix
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-url/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/index-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-httpclient
060109 014715 logging at INFO
060109 014715 fetching http://www.sourceforge.net/
060109 014715 fetching http://www.apache.org/
060109 014715 fetching http://www.nutch.org/
060109 014715 http.proxy.host = null
060109 014715 http.proxy.port = 8080
060109 014715 http.timeout = 10000
060109 014715 http.content.limit = -1
060109 014715 http.agent = NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected])
060109 014715 fetcher.server.delay = 5000
060109 014715 http.max.delays = 52
060109 014718 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060109 014724 status: segment 20060109014654, 3 pages, 0 errors, 51033 bytes, 8309 ms
060109 014724 status: 0.36105427 pages/s, 47.98355 kb/s, 17011.0 bytes/page