Here is some more output from the log, now with DEBUG enabled. It seems to
slow down at mapred.LocalJobRunner:



2006-08-02 10:12:28,160 INFO  mapred.LocalJobRunner - 36 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:28,900 DEBUG http.Http - fetching http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:28,918 DEBUG http.Http - fetched 25812 bytes from http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:28,920 DEBUG parse.ParseUtil - Parsing [http://www.foo.com/internet_aplikacije.php] with [EMAIL PROTECTED]
2006-08-02 10:12:28,920 DEBUG parse.html - http://www.foo.com/internet_aplikacije.php: setting encoding to ISO-8859-2
2006-08-02 10:12:28,920 DEBUG parse.html - Parsing...
2006-08-02 10:12:28,932 DEBUG parse.html - Meta tags for http://www.foo.com/internet_aplikacije.php: base=null, noCache=false, noFollow=false, noIndex=false, refresh=false, refreshHref=null
* general tags:
  - keywords   =       cms,aplikacija,modul,vodenje,kontaktov,trgovina,upravljanje,vo??ilnice,anketa,internet trgovina,izdelava
  - author     =       Foo
  - description        =       Aplikacija za vodenje kontaktov. CMS - Sistem za upravljanje z vsebinami.
  - robots     =       INDEX,FOLLOW
* http-equiv tags:
  - content-type       =       text/html; charset=iso-8859-2

2006-08-02 10:12:28,932 DEBUG parse.html - Getting text...
2006-08-02 10:12:28,938 DEBUG parse.html - Getting title...
2006-08-02 10:12:28,938 DEBUG parse.html - Getting links...
2006-08-02 10:12:28,942 DEBUG parse.html - found 160 outlinks in http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:29,162 INFO  mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 52 kb/s,
2006-08-02 10:12:30,164 INFO  mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 52 kb/s,
2006-08-02 10:12:31,166 INFO  mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:32,168 INFO  mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:33,170 INFO  mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 50 kb/s,
2006-08-02 10:12:33,918 DEBUG http.Http - fetching http://www.foo.com/mediji.php
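
In case it helps anyone reading along, the pauses can be measured straight
from the log. The sketch below is only an illustration in Python: the helper
name fetch_gaps and the regular expression are made up for this example and
are not part of Nutch. It takes the fetcher.Fetcher lines from the 2006-08-01
log quoted further down, with the wrapped URLs rejoined, and prints the time
between consecutive fetches.

```python
import re
from datetime import datetime

# "fetching" lines copied from the 2006-08-01 log quoted in this thread
# (URLs rejoined onto the same line as their timestamp).
LOG = """\
2006-08-01 19:58:37,576 INFO  fetcher.Fetcher - fetching http://www.foo.com/faq.php
2006-08-01 19:58:38,145 INFO  fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php
2006-08-01 19:58:43,569 INFO  fetcher.Fetcher - fetching http://www.foo.com/kontakti.php
2006-08-01 19:58:48,624 INFO  fetcher.Fetcher - fetching http://www.foo.com/portfolio_mailing.php
2006-08-01 19:58:53,553 INFO  fetcher.Fetcher - fetching http://www.foo.com/online_katalogi.php
2006-08-01 19:58:58,597 INFO  fetcher.Fetcher - fetching http://www.foo.com/postavitev_sistemov.php
2006-08-01 19:59:03,592 INFO  fetcher.Fetcher - fetching http://www.foo.com/internet_aplikacije.php
2006-08-01 19:59:08,655 INFO  fetcher.Fetcher - fetching http://www.foo.com/gradivo.php
"""

# Hypothetical pattern: timestamp, log level, then the Fetcher "fetching" message.
FETCH_LINE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d,\d{3}) \w+\s+fetcher\.Fetcher - fetching"
)


def fetch_gaps(log_text):
    """Return the seconds elapsed between consecutive 'fetching' log lines."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        for m in map(FETCH_LINE.match, log_text.splitlines())
        if m
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


gaps = fetch_gaps(LOG)
for gap in gaps:
    print("%.3f s" % gap)
```

Apart from the first one, the gaps all come out at roughly five seconds. That
is at least consistent with the fetcher.server.delay = 5000 reported by
http.Http in that log (presumably milliseconds) and with every URL in these
excerpts being on the same host. Whether that delay explains the slowdown
compared to 0.7.2 is an assumption that would still need checking.
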

ATB,
Vasja

Zaheed Haque wrote:
>> One question, though: does anyone know how to set more verbose logging?
>
> You can edit log4j.properties under nutch/conf to enable DEBUG
> mode for both Hadoop and Nutch.
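
For the archive, a minimal sketch of what that change might look like. It
assumes log4j 1.x property syntax and the usual package names
org.apache.nutch and org.apache.hadoop; the logger lines already present in
your own conf/log4j.properties may differ, so treat this as an illustration
rather than the exact file contents.

```properties
# conf/log4j.properties (assumed layout, log4j 1.x syntax)
# Raise the level for all Nutch classes to DEBUG.
log4j.logger.org.apache.nutch=DEBUG
# Raise the level for all Hadoop classes (e.g. mapred.LocalJobRunner) to DEBUG.
log4j.logger.org.apache.hadoop=DEBUG
```

With both loggers at DEBUG, the log should include lines such as the
http.Http and parse.html DEBUG entries shown at the top of this message.
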
>
> Cheers
>
>> Thanks.
>>
>> 2006-08-01 19:58:37,576 INFO  fetcher.Fetcher - fetching http://www.foo.com/faq.php
>> 2006-08-01 19:58:37,599 INFO  http.Http - http.proxy.host = null
>> 2006-08-01 19:58:37,599 INFO  http.Http - http.proxy.port = 8080
>> 2006-08-01 19:58:37,599 INFO  http.Http - http.timeout = 10000
>> 2006-08-01 19:58:37,600 INFO  http.Http - http.content.limit = 65536
>> 2006-08-01 19:58:37,600 INFO  http.Http - http.agent = siBot/siBot-0.1 (http://www.foo.com/; [EMAIL PROTECTED])
>> 2006-08-01 19:58:37,600 INFO  http.Http - fetcher.server.delay = 5000
>> 2006-08-01 19:58:37,600 INFO  http.Http - http.max.delays = 100
>> 2006-08-01 19:58:38,103 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
>> 2006-08-01 19:58:38,145 INFO  fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php
>> 2006-08-01 19:58:43,569 INFO  fetcher.Fetcher - fetching http://www.foo.com/kontakti.php
>> 2006-08-01 19:58:48,624 INFO  fetcher.Fetcher - fetching http://www.foo.com/portfolio_mailing.php
>> 2006-08-01 19:58:53,553 INFO  fetcher.Fetcher - fetching http://www.foo.com/online_katalogi.php
>> 2006-08-01 19:58:58,597 INFO  fetcher.Fetcher - fetching http://www.foo.com/postavitev_sistemov.php
>> 2006-08-01 19:59:03,592 INFO  fetcher.Fetcher - fetching http://www.foo.com/internet_aplikacije.php
>> 2006-08-01 19:59:08,655 INFO  fetcher.Fetcher - fetching http://www.foo.com/gradivo.php
>>
>> ATB,
>> Vasja
>>
>> Stefan Groschupf wrote:
>> > Check:
>> > http://issues.apache.org/jira/browse/NUTCH-233
>> > and let us know if it helps.
>> > Stefan
>> >
>> >
>> > On 31.07.2006 at 07:46, Matthew Holt wrote:
>> >
>> >> The fetcher for one, and the mapreduce takes forever... i.e. the
>> >> mapreduce is kind of annoying... Is it possible to disable it if I'm
>> >> not running on a DFS?
>> >> Matt
>> >>
>> >> 06/07/25 20:59:12 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:14 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:19 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:23 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:29 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:33 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:34 INFO mapred.JobClient:  map 100%  reduce 96%
>> >> 06/07/25 20:59:40 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:41 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:42 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:47 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:48 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:52 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:53 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:05 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:22 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:29 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:39 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:01:07 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:01:08 INFO mapred.JobClient:  map 100%  reduce 97%
>> >> 06/07/25 21:01:16 INFO mapred.LocalJobRunner: reduce > reduce
>> >>
>> >>
>> >> Sami Siren wrote:
>> >>> Are you experiencing slowness in general or just in some parts of
>> >>> the process?
>> >>>
>> >>> The current fetcher is dead slow and it should be given immediate
>> >>> attention. There has been some talk about the issue but I haven't
>> >>> seen any code yet.
>> >>>
>> >>> -- Sami Siren
>> >>>
>> >>> Matthew Holt wrote:
>> >>>> I agree. Is there any way to disable something to speed it up? I.e.
>> >>>> is the map reduce currently needed if we're not on a DFS?
>> >>>>
>> >>>> Matt
>> >>>>
>> >>>> Vasja Ocvirk wrote:
>> >>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I'm wondering if anyone can help. We injected 1000 seed URLs into
>> >>>>> Nutch 0.7.2 (basic configuration + 1000 URLs in regexp filter) and
>> >>>>> it processed them in just a few hours. We just switched to 0.8 with
>> >>>>> the same configuration and the same URLs, but everything seems to
>> >>>>> have slowed down significantly. The crawl script has 60 threads --
>> >>>>> same as before -- but now it works much more slowly.
>> >>>>>
>> >>>>> Thanks!
>> >>>>>
>> >>>>> Best,
>> >>>>> Vasja
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
