I followed the media-style.com quick tutorial, but when I try to
fetch my segment, the fetch is killed!
I have tried setting the system time forward 30 days, and no anti-virus
is running on the systems.
Systems: SUSE 9.2 and SUSE 10
# bin/nutch fetch segments/20060109014654/
060109 014714 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-default.xml
060109 014715 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-site.xml
060109 014715 No FS indicated, using default:local
060109 014715 Plugins: looking in: /home/hkongsgaard/nutch-0.7.1/plugins
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/query-more
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-site/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-html/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-text/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-ext
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-pdf
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-rss
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/index-more
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-js
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-regex/plugin.xml
060109 014715 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-ftp
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-msword
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/creativecommons
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/ontology
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/nutch-extensionpoints/plugin.xml
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-file
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-http/plugin.xml
060109 014715 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/clustering-carrot2
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/language-identifier
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-prefix
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-url/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/index-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-httpclient
060109 014715 logging at INFO
060109 014715 fetching http://www.sourceforge.net/
060109 014715 fetching http://www.apache.org/
060109 014715 fetching http://www.nutch.org/
060109 014715 http.proxy.host = null
060109 014715 http.proxy.port = 8080
060109 014715 http.timeout = 10000
060109 014715 http.content.limit = -1
060109 014715 http.agent = NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected])
060109 014715 fetcher.server.delay = 5000
060109 014715 http.max.delays = 52
060109 014718 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060109 014724 status: segment 20060109014654, 3 pages, 0 errors, 51033 bytes, 8309 ms
060109 014724 status: 0.36105427 pages/s, 47.98355 kb/s, 17011.0 bytes/page
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general