> One question, though: does anyone know how to set more verbose logging?

You can edit the log4j properties file under nutch/conf to enable DEBUG mode for both Hadoop and Nutch.
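
A minimal sketch of that change, assuming the file is named log4j.properties and uses log4j 1.x syntax. The logger names below are only the package prefixes org.apache.nutch and org.apache.hadoop; the rest of the file (root logger, appenders) is not shown in this thread and may differ between versions.

```properties
# Hypothetical additions to nutch/conf/log4j.properties (file name assumed).
# Raise the level for all Nutch classes, e.g. org.apache.nutch.crawl.MD5Signature.
log4j.logger.org.apache.nutch=DEBUG
# Raise the level for all Hadoop classes, including the mapred job runner.
log4j.logger.org.apache.hadoop=DEBUG
```

If DEBUG lines still do not appear, check whether an appender in the same file sets a Threshold that is stricter than DEBUG.
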
Cheers

> Thanks.
>
> 2006-08-01 19:58:37,576 INFO fetcher.Fetcher - fetching http://www.foo.com/faq.php
> 2006-08-01 19:58:37,599 INFO http.Http - http.proxy.host = null
> 2006-08-01 19:58:37,599 INFO http.Http - http.proxy.port = 8080
> 2006-08-01 19:58:37,599 INFO http.Http - http.timeout = 10000
> 2006-08-01 19:58:37,600 INFO http.Http - http.content.limit = 65536
> 2006-08-01 19:58:37,600 INFO http.Http - http.agent = siBot/siBot-0.1 (http://www.foo.com/; [EMAIL PROTECTED])
> 2006-08-01 19:58:37,600 INFO http.Http - fetcher.server.delay = 5000
> 2006-08-01 19:58:37,600 INFO http.Http - http.max.delays = 100
> 2006-08-01 19:58:38,103 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
> 2006-08-01 19:58:38,145 INFO fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php
> 2006-08-01 19:58:43,569 INFO fetcher.Fetcher - fetching http://www.foo.com/kontakti.php
> 2006-08-01 19:58:48,624 INFO fetcher.Fetcher - fetching http://www.foo.com/portfolio_mailing.php
> 2006-08-01 19:58:53,553 INFO fetcher.Fetcher - fetching http://www.foo.com/online_katalogi.php
> 2006-08-01 19:58:58,597 INFO fetcher.Fetcher - fetching http://www.foo.com/postavitev_sistemov.php
> 2006-08-01 19:59:03,592 INFO fetcher.Fetcher - fetching http://www.foo.com/internet_aplikacije.php
> 2006-08-01 19:59:08,655 INFO fetcher.Fetcher - fetching http://www.foo.com/gradivo.php
>
> ATB,
> Vasja
>
> Stefan Groschupf wrote:
> > Check:
> > http://issues.apache.org/jira/browse/NUTCH-233
> > and let us know if it helps.
> > Stefan
> >
> > On 31.07.2006 at 07:46, Matthew Holt wrote:
> >
> >> Fetcher for one, and the mapreduce takes forever... IE the mapreduce
> >> is kind of annoying... is it possible to disable it if I'm not
> >> running on a DFS?
> >> Matt
> >>
> >> 06/07/25 20:59:12 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:14 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:19 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:23 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:29 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:33 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:34 INFO mapred.JobClient: map 100% reduce 96%
> >> 06/07/25 20:59:40 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:41 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:42 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:47 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:48 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:52 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 20:59:53 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:05 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:22 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:29 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:00:39 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:01:07 INFO mapred.LocalJobRunner: reduce > reduce
> >> 06/07/25 21:01:08 INFO mapred.JobClient: map 100% reduce 97%
> >> 06/07/25 21:01:16 INFO mapred.LocalJobRunner: reduce > reduce
> >>
> >> Sami Siren wrote:
> >>> Are you experiencing slowness in general or just in some parts of
> >>> the process?
> >>>
> >>> The current fetcher is dead slow and it should be given immediate
> >>> attention. There has been some talk about the issue but I haven't
> >>> seen any code yet.
> >>>
> >>> -- Sami Siren
> >>>
> >>> Matthew Holt wrote:
> >>>> I agree. Is there any way to disable something to speed it up? I.e., is
> >>>> the map reduce currently needed if we're not on a DFS?
> >>>>
> >>>> Matt
> >>>>
> >>>> Vasja Ocvirk wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I'm wondering if anyone can help. We injected 1000 seed URLs into
> >>>>> Nutch 0.7.2 (basic configuration + 1000 URLs in regexp filter) and
> >>>>> it processed them in just a few hours. We just switched to 0.8 with
> >>>>> the same configuration and the same URLs, but it seems everything
> >>>>> slowed down significantly. The crawl script has 60 threads -- same
> >>>>> as before, but now it works much slower.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Best,
> >>>>> Vasja

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
