When I fed my domain into the database, the segment fetch output looked like this:

-.-.-.-.-.-.-.-.-.-.-.-.-
060109 154622 fetching http://www.niap.no/magasinet/nyheter/nord_amerika/usa/israelsk_lobby_sparker_to_ansatte
060109 154622 fetching http://www.niap.no/magasinet/nyheter/afrika
060109 154622 fetching http://www.niap.no/magasinet/nyheter/asia_australia
060109 154622 fetching http://www.niap.no/magasinet/nyheter/midtoesten/libya/eu_oensker_aa_oppheve_forbudet_mot_vaapenhandel_med_libya
060109 154622 fetching http://www.niap.no/magasinet/rss/feed/magasinet_rss1
060109 154622 fetching http://www.niap.no/magasinet/content/search
060109 154622 fetching http://www.niap.no/magasinet/nyheter/europa/tyrkia/tyrkia_vil_innfoere_fengselstraff_for_utroskap
060109 154622 fetching http://www.niap.no/magasinet/nyheter/europa/russland/stalin_vender_tilbake
060109 154622 fetching http://www.niap.no/magasinet/nyheter/nord_amerika
060109 154626 fetch okay, but can't parse http://www.niap.no/magasinet/rss/feed/magasinet_rss1, reason: failed(2,203): Content-Type not text/html: text/xml
060109 154626 fetching http://www.niap.no/magasinet/nyheter/midtoesten/irak/al_queida
060109 154633 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060109 154633 fetching http://www.niap.no/magasinet/niap/test
060109 154639 fetching http://www.niap.no/magasinet/nyheter/europa/italia/pave_benedict_xvi
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/nord_amerika/usa/israelsk_lobby_sparker_to_ansatte failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/asia_australia failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/afrika failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetching http://www.niap.no/magasinet/nyheter/soer_amerika
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/europa/tyrkia/tyrkia_vil_innfoere_fengselstraff_for_utroskap failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/midtoesten/palestina_israel/israel_bekymret_for_landets_internasjonale_image failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetch of http://www.niap.no/magasinet/nyheter/midtoesten/libya/eu_oensker_aa_oppheve_forbudet_mot_vaapenhandel_med_libya failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetch of http://www.niap.no/magasinet/content/search failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154642 fetching http://www.niap.no/index.php/magasinet/nyheter/s_r_amerika

-.-.-.-.-.-.-
But then

-.-.-.-.-.-
060109 154714 fetch of http://phpadsnew.niap.no/adx.js failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154714 fetching http://www.niap.no/magasinet/nyheter/midtoesten/syria/russland_selger_luftforsvarssystem_til_syria
060109 154722 fetch of http://www.niap.org/ failed with: java.lang.Exception: java.net.SocketTimeoutException: connect timed out
060109 154724 fetch of http://www.niap.no/index.php/magasinet/nyheter/nord_amerika failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154724 fetch of http://www.niap.no/magasinet failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154724 fetch of http://www.niap.no/magasinet/kontakt_oss failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154724 fetch of http://www.niap.no/magasinet/magasinet/om_magasinet failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154724 fetch of http://www.niap.no/magasinet/layout/set/print failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154729 fetch of http://www.niap.no/magasinet/nyheter/midtoesten/syria/russland_selger_luftforsvarssystem_til_syria failed with: java.lang.Exception: org.apache.nutch.protocol.RetryLater: Exceeded http.max.delays: retry later.
060109 154730 status: segment 20060109154516, 12 pages, 31 errors, 181559 bytes, 68511 ms
060109 154730 status: 0.17515436 pages/s, 20.703678 kb/s, 15129.917 bytes/page

-.-.-.-.-.-
What is java.net.SocketTimeoutException?
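
For reference, java.net.SocketTimeoutException is the standard Java exception raised when a socket operation does not complete within its configured timeout. The "connect timed out" entry above means the timeout expired while the connection to www.niap.org was still being established. The sketch below is not Nutch code; it is a minimal, self-contained illustration in which the class name TimeoutDemo and the method readTimesOut are hypothetical. Because a connect timeout is hard to reproduce without an unreachable host, it shows the read-side variant of the same exception against a local server that never sends data.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Nutch.
final class TimeoutDemo {

    /** Returns true if a read from a silent local server times out. */
    static boolean readTimesOut(int timeoutMs) {
        // The server socket listens on a free loopback port but never
        // accepts or writes, so the client has nothing to read.
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {

            // Connect with a timeout; a remote host that does not answer
            // at this step is what yields "connect timed out".
            client.connect(
                new InetSocketAddress(server.getInetAddress(),
                                      server.getLocalPort()),
                timeoutMs);

            // Limit how long read() may block.
            client.setSoTimeout(timeoutMs);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived
            } catch (SocketTimeoutException e) {
                return true;  // no data within timeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

In the fetch log further down, the corresponding limit appears as http.timeout = 10000, which is presumably in milliseconds.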




Håvard W. Kongsgård wrote:

Isn't the fetcher supposed to fetch all the documents from the URLs provided in the urls.txt file? The fetch process takes only a few seconds, and the whole quick tutorial is done in a minute.



Stefan Groschupf wrote:

I cannot see any problems in your log; it successfully fetched 3 pages.
Can you provide a more specific problem description?

On 09.12.2005 at 01:57, Håvard W. Kongsgård wrote:

I have followed the media-style.com quick tutorial, but when I try to fetch my segment the fetch is killed!

I have tried setting the system clock forward 30 days; no antivirus is running on the systems.
Systems: SUSE 9.2 and SUSE 10

# bin/nutch fetch segments/20060109014654/
060109 014714 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-default.xml
060109 014715 parsing file:/home/hkongsgaard/nutch-0.7.1/conf/nutch-site.xml
060109 014715 No FS indicated, using default:local
060109 014715 Plugins: looking in: /home/hkongsgaard/nutch-0.7.1/plugins
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/query-more
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-site/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.site.SiteQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-html/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/parse-text/plugin.xml
060109 014715 impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.text.TextParser
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-ext
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-pdf
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-rss
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.basic.BasicQueryFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/index-more
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-js
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-regex/plugin.xml
060109 014715 impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.net.RegexURLFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-ftp
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/parse-msword
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/creativecommons
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/ontology
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/nutch-extensionpoints/plugin.xml
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-file
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-http/plugin.xml
060109 014715 impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/clustering-carrot2
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/language-identifier
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/urlfilter-prefix
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/query-url/plugin.xml
060109 014715 impl: point=org.apache.nutch.searcher.QueryFilter class=org.apache.nutch.searcher.url.URLQueryFilter
060109 014715 parsing: /home/hkongsgaard/nutch-0.7.1/plugins/index-basic/plugin.xml
060109 014715 impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
060109 014715 not including: /home/hkongsgaard/nutch-0.7.1/plugins/protocol-httpclient
060109 014715 logging at INFO
060109 014715 fetching http://www.sourceforge.net/
060109 014715 fetching http://www.apache.org/
060109 014715 fetching http://www.nutch.org/
060109 014715 http.proxy.host = null
060109 014715 http.proxy.port = 8080
060109 014715 http.timeout = 10000
060109 014715 http.content.limit = -1
060109 014715 http.agent = NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [email protected])
060109 014715 fetcher.server.delay = 5000
060109 014715 http.max.delays = 52
060109 014718 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060109 014724 status: segment 20060109014654, 3 pages, 0 errors, 51033 bytes, 8309 ms
060109 014724 status: 0.36105427 pages/s, 47.98355 kb/s, 17011.0 bytes/page











