> One question, though: does anyone know how to set more verbose logging?

You can edit log4j.properties under nutch/conf and set the log level
to DEBUG for both the hadoop and nutch loggers.
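
Something along these lines should do it. This is only a sketch: it
assumes log4j 1.x per-package logger syntax and that your conf file
does not already define these two loggers (if it does, change the
level on the existing lines instead of adding new ones). The package
names are the usual org.apache.nutch and org.apache.hadoop prefixes.

```properties
# Assumed additions to the log4j properties file under nutch/conf.
# Raise Nutch's own classes (fetcher, protocol plugins, crawl) to DEBUG.
log4j.logger.org.apache.nutch=DEBUG
# Raise Hadoop (mapred, LocalJobRunner, JobClient) to DEBUG as well.
log4j.logger.org.apache.hadoop=DEBUG
```

DEBUG on both packages can get very noisy, so you may want to turn on
only one of the two while you track down the slow part.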

Cheers

> Thanks.
>
> 2006-08-01 19:58:37,576 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/faq.php
> 2006-08-01 19:58:37,599 INFO  http.Http - http.proxy.host = null
> 2006-08-01 19:58:37,599 INFO  http.Http - http.proxy.port = 8080
> 2006-08-01 19:58:37,599 INFO  http.Http - http.timeout = 10000
> 2006-08-01 19:58:37,600 INFO  http.Http - http.content.limit = 65536
> 2006-08-01 19:58:37,600 INFO  http.Http - http.agent = siBot/siBot-0.1
> (http://www.foo.com/; [EMAIL PROTECTED])
> 2006-08-01 19:58:37,600 INFO  http.Http - fetcher.server.delay = 5000
> 2006-08-01 19:58:37,600 INFO  http.Http - http.max.delays = 100
> 2006-08-01 19:58:38,103 INFO  crawl.SignatureFactory - Using Signature
> impl: org.apache.nutch.crawl.MD5Signature
> 2006-08-01 19:58:38,145 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/izobrazevanje.php
> 2006-08-01 19:58:43,569 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/kontakti.php
> 2006-08-01 19:58:48,624 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/portfolio_mailing.php
> 2006-08-01 19:58:53,553 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/online_katalogi.php
> 2006-08-01 19:58:58,597 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/postavitev_sistemov.php
> 2006-08-01 19:59:03,592 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/internet_aplikacije.php
> 2006-08-01 19:59:08,655 INFO  fetcher.Fetcher - fetching
> http://www.foo.com/gradivo.php
>
> ATB,
> Vasja
>
> Stefan Groschupf wrote:
> > Check:
> > http://issues.apache.org/jira/browse/NUTCH-233
> > and let us know if it helps.
> > Stefan
> >
> >
> > On 31.07.2006 at 07:46, Matthew Holt wrote:
> >
> >> Fetcher for one, and the mapreduce takes forever... i.e. the mapreduce
> >> is kind of annoying... is it possible to disable it if I'm not
> >> running on a DFS?
> >> Matt
> >>
> >> 06/07/25 20:59:12 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:14 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:19 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:23 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:29 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:33 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:34 INFO mapred.JobClient:  map 100%  reduce 96%
> >> 06/07/25 20:59:40 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:41 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:42 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:47 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:48 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:52 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:53 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:05 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:22 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:29 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:39 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:01:07 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:01:08 INFO mapred.JobClient:  map 100%  reduce 97%
> >> 06/07/25 21:01:16 INFO mapred.LocalJobRunner: reduce > reduce
> >>
> >>
> >> Sami Siren wrote:
> >>> Are you experiencing slowness in general or just in some parts of
> >>> the process?
> >>>
> >>> The current fetcher is dead slow and it should be given immediate
> >>> attention. There has been some talk about the issue but I haven't
> >>> seen any code yet.
> >>>
> >>> -- Sami Siren
> >>>
> >>> Matthew Holt wrote:
> >>>> I agree. Is there any way to disable something to speed it up? I.e.,
> >>>> is the map reduce currently needed if we're not on a DFS?
> >>>>
> >>>> Matt
> >>>>
> >>>> Vasja Ocvirk wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I'm wondering if anyone can help. We injected 1000 seed URLs into
> >>>>> Nutch 0.7.2 (basic configuration + 1000 URLs in regexp filter) and
> >>>>> it processed them in just a few hours. We just switched to 0.8 with
> >>>>> the same configuration and the same URLs, but everything seems to
> >>>>> have slowed down significantly. The crawl script has 60 threads,
> >>>>> same as before, but now it works much slower.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Best,
> >>>>> Vasja
> >>>>>
> >>>>> __________ NOD32 1.1533 (20060512) Information __________
> >>>>>
> >>>>> This message was checked by NOD32 antivirus system.
> >>>>> http://www.eset.com
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
