Is the fetcher not supposed to fetch all the docs from the URLs provided
in the urls.txt file?
The fetch process only takes a few seconds, and the whole quick tutorial
is done in a minute.
Stefan Groschupf wrote:
I cannot see any problems in your log; it successfully fetched 3 pages.
Can you provide a more specific description of the problem?
On 09.12.2005 at 01:57, Håvard W. Kongsgård wrote:
I have followed the media-style.com quick tutorial, but when I try
to fetch my segment the fetch is killed!
I have tried setting the system clock forward 30 days, and no anti-virus
software is running on the systems.
Systems: SUSE 9.2 and SUSE 10
# bin/nutch fetch segments/20060109014654/
060109 014714 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-default.xml
060109 014715 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-site.xml
060109 014715 No FS indicated, using default:local
060109 014715 Plugins: looking in: /home/hkongsgaard/nutch-0.7.1/plugins
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/query-more
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-site/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-html/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-text/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-ext
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-pdf
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-rss
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/index-more
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-js
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-regex/plugin.xml
060109 014715 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-ftp
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-msword
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/creativecommons
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/ontology
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/nutch-extensionpoints/plugin.xml
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-file
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-http/plugin.xml
060109 014715 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/clustering-carrot2
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/language-identifier
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-prefix
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-url/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/index-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-httpclient
060109 014715 logging at INFO
060109 014715 fetching http://www.sourceforge.net/
060109 014715 fetching http://www.apache.org/
060109 014715 fetching http://www.nutch.org/
060109 014715 http.proxy.host = null
060109 014715 http.proxy.port = 8080
060109 014715 http.timeout = 10000
060109 014715 http.content.limit = -1
060109 014715 http.agent = NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected])
060109 014715 fetcher.server.delay = 5000
060109 014715 http.max.delays = 52
060109 014718 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060109 014724 status: segment 20060109014654, 3 pages, 0 errors, 51033 bytes, 8309 ms
060109 014724 status: 0.36105427 pages/s, 47.98355 kb/s, 17011.0 bytes/page
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general