Here is some more text from the log. It seems to slow down at mapred.LocalJobRunner:
2006-08-02 10:12:28,160 INFO mapred.LocalJobRunner - 36 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:28,900 DEBUG http.Http - fetching http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:28,918 DEBUG http.Http - fetched 25812 bytes from http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:28,920 DEBUG parse.ParseUtil - Parsing [http://www.foo.com/internet_aplikacije.php] with [EMAIL PROTECTED]
2006-08-02 10:12:28,920 DEBUG parse.html - http://www.foo.com/internet_aplikacije.php: setting encoding to ISO-8859-2
2006-08-02 10:12:28,920 DEBUG parse.html - Parsing...
2006-08-02 10:12:28,932 DEBUG parse.html - Meta tags for http://www.foo.com/internet_aplikacije.php: base=null, noCache=false, noFollow=false, noIndex=false, refresh=false, refreshHref=null
 * general tags:
   - keywords = cms,aplikacija,modul,vodenje,kontaktov,trgovina,upravljanje,vo??ilnice,anketa,internet trgovina,izdelava
   - author = Foo
   - description = Aplikacija za vodenje kontaktov. CMS - Sistem za upravljanje z vsebinami.
   - robots = INDEX,FOLLOW
 * http-equiv tags:
   - content-type = text/html; charset=iso-8859-2
2006-08-02 10:12:28,932 DEBUG parse.html - Getting text...
2006-08-02 10:12:28,938 DEBUG parse.html - Getting title...
2006-08-02 10:12:28,938 DEBUG parse.html - Getting links...
2006-08-02 10:12:28,942 DEBUG parse.html - found 160 outlinks in http://www.foo.com/internet_aplikacije.php
2006-08-02 10:12:29,162 INFO mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 52 kb/s,
2006-08-02 10:12:30,164 INFO mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 52 kb/s,
2006-08-02 10:12:31,166 INFO mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:32,168 INFO mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 51 kb/s,
2006-08-02 10:12:33,170 INFO mapred.LocalJobRunner - 37 pages, 0 errors, 0.3 pages/s, 50 kb/s,
2006-08-02 10:12:33,918 DEBUG http.Http - fetching http://www.foo.com/mediji.php

ATB,
Vasja

Zaheed Haque wrote:
>> One question, though: does anyone know how to set more verbose logging?
>
> You can edit your log4j properties under nutch/conf to enable DEBUG
> mode both for hadoop and nutch.
>
> Cheers
>
>> Thanks.
>>
>> 2006-08-01 19:58:37,576 INFO fetcher.Fetcher - fetching http://www.foo.com/faq.php
>> 2006-08-01 19:58:37,599 INFO http.Http - http.proxy.host = null
>> 2006-08-01 19:58:37,599 INFO http.Http - http.proxy.port = 8080
>> 2006-08-01 19:58:37,599 INFO http.Http - http.timeout = 10000
>> 2006-08-01 19:58:37,600 INFO http.Http - http.content.limit = 65536
>> 2006-08-01 19:58:37,600 INFO http.Http - http.agent = siBot/siBot-0.1 (http://www.foo.com/; [EMAIL PROTECTED])
>> 2006-08-01 19:58:37,600 INFO http.Http - fetcher.server.delay = 5000
>> 2006-08-01 19:58:37,600 INFO http.Http - http.max.delays = 100
>> 2006-08-01 19:58:38,103 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
>> 2006-08-01 19:58:38,145 INFO fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php
>> 2006-08-01 19:58:43,569 INFO fetcher.Fetcher - fetching http://www.foo.com/kontakti.php
>> 2006-08-01 19:58:48,624 INFO fetcher.Fetcher - fetching http://www.foo.com/portfolio_mailing.php
>> 2006-08-01 19:58:53,553 INFO fetcher.Fetcher - fetching http://www.foo.com/online_katalogi.php
>> 2006-08-01 19:58:58,597 INFO fetcher.Fetcher - fetching http://www.foo.com/postavitev_sistemov.php
>> 2006-08-01 19:59:03,592 INFO fetcher.Fetcher - fetching http://www.foo.com/internet_aplikacije.php
>> 2006-08-01 19:59:08,655 INFO fetcher.Fetcher - fetching http://www.foo.com/gradivo.php
>>
>> ATB,
>> Vasja
>>
>> Stefan Groschupf wrote:
>> > Check:
>> > http://issues.apache.org/jira/browse/NUTCH-233
>> > and let us know if it helps.
>> > Stefan
>> >
>> > On 31.07.2006 at 07:46, Matthew Holt wrote:
>> >
>> >> Fetcher for one, and the mapreduce takes forever... i.e. the mapreduce
>> >> is kind of annoying... is it possible to disable it if I'm not
>> >> running on a DFS?
>> >> Matt
>> >>
>> >> 06/07/25 20:59:12 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:14 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:19 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:23 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:29 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:33 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:34 INFO mapred.JobClient: map 100% reduce 96%
>> >> 06/07/25 20:59:40 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:41 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:42 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:47 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:48 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:52 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 20:59:53 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:05 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:22 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:29 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:00:39 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:01:07 INFO mapred.LocalJobRunner: reduce > reduce
>> >> 06/07/25 21:01:08 INFO mapred.JobClient: map 100% reduce 97%
>> >> 06/07/25 21:01:16 INFO mapred.LocalJobRunner: reduce > reduce
>> >>
>> >> Sami Siren wrote:
>> >>> Are you experiencing slowness in general or just on some parts of
>> >>> the process?
>> >>>
>> >>> Current fetcher is dead slow and it should be given immediate
>> >>> attention. There has been some talk about the issue but I haven't
>> >>> seen any code yet.
>> >>>
>> >>> -- Sami Siren
>> >>>
>> >>> Matthew Holt wrote:
>> >>>> I agree. Is there any way to disable something to speed it up? I.e. is
>> >>>> the map reduce currently needed if we're not on a DFS?
>> >>>>
>> >>>> Matt
>> >>>>
>> >>>> Vasja Ocvirk wrote:
>> >>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I'm wondering if anyone can help. We injected 1000 seed URLs into
>> >>>>> Nutch 0.7.2 (basic configuration + 1000 URLs in regexp filter) and
>> >>>>> it processed them in just a few hours. We just switched to 0.8 with
>> >>>>> the same configuration and the same URLs, but it seems everything
>> >>>>> slowed down significantly. Crawl script has 60 threads -- same as
>> >>>>> before, but now it works much slower.
>> >>>>>
>> >>>>> Thanks!
>> >>>>>
>> >>>>> Best,
>> >>>>> Vasja
>> >>>>>
>> >>>>> __________ NOD32 1.1533 (20060512) Information __________
>> >>>>>
>> >>>>> This message was checked by NOD32 antivirus system.
>> >>>>> http://www.eset.com
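
One way to check how the pauses in the logs above relate to the reported fetcher.server.delay = 5000 is to measure the gap between consecutive "fetching" lines. Below is a minimal Python sketch, assuming the log line layout shown in this thread. The names FETCH_LINE, fetch_gaps and SAMPLE are made up for illustration and are not part of Nutch; the sample lines are copied from the quoted 2006-08-01 log. It only measures spacing and does not by itself establish the cause of the slowdown.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2006-08-01 19:58:38,145 INFO fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php
# It also matches the DEBUG "http.Http - fetching ..." lines, but not "fetched ...".
FETCH_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+\s+\S+ - fetching (\S+)"
)

def fetch_gaps(lines):
    """Return (url, seconds since the previous fetch line) for every fetch after the first."""
    previous = None
    gaps = []
    for line in lines:
        match = FETCH_LINE.match(line.strip())
        if not match:
            continue  # skip configuration and status lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None:
            gaps.append((match.group(2), (stamp - previous).total_seconds()))
        previous = stamp
    return gaps

# Sample lines taken from the quoted log, with mail wrapping undone.
SAMPLE = [
    "2006-08-01 19:58:37,576 INFO fetcher.Fetcher - fetching http://www.foo.com/faq.php",
    "2006-08-01 19:58:37,600 INFO http.Http - fetcher.server.delay = 5000",
    "2006-08-01 19:58:38,145 INFO fetcher.Fetcher - fetching http://www.foo.com/izobrazevanje.php",
    "2006-08-01 19:58:43,569 INFO fetcher.Fetcher - fetching http://www.foo.com/kontakti.php",
    "2006-08-01 19:58:48,624 INFO fetcher.Fetcher - fetching http://www.foo.com/portfolio_mailing.php",
    "2006-08-01 19:58:53,553 INFO fetcher.Fetcher - fetching http://www.foo.com/online_katalogi.php",
]

for url, gap in fetch_gaps(SAMPLE):
    print(f"{gap:7.3f} s  {url}")
```

On these sample lines every gap after the first comes out at roughly five seconds, and all of the URLs are on the same host (www.foo.com). If fetcher.server.delay applies per host, as its name suggests, that spacing would be consistent with the configured delay rather than with time spent in mapred.LocalJobRunner, but that reading is an assumption to verify against the Nutch configuration documentation.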
