When I run nutch-0.9 following the procedure on the wiki page http://wiki.apache.org/nutch/RunNutchInEclipse, I get the following messages in my Eclipse console tab:
2007-06-29 18:05:47,390 INFO crawl.Crawl (Crawl.java:main(89)) - crawl started in: crawl-epw
2007-06-29 18:05:47,406 INFO crawl.Crawl (Crawl.java:main(90)) - rootUrlDir = urls
2007-06-29 18:05:47,406 INFO crawl.Crawl (Crawl.java:main(91)) - threads = 10
2007-06-29 18:05:47,406 INFO crawl.Crawl (Crawl.java:main(92)) - depth = 5
2007-06-29 18:05:47,406 INFO crawl.Crawl (Crawl.java:main(94)) - topN = 50
2007-06-29 18:05:48,140 INFO crawl.Injector (Injector.java:inject(138)) - Injector: starting
2007-06-29 18:05:48,140 INFO crawl.Injector (Injector.java:inject(139)) - Injector: crawlDb: crawl-epw/crawldb
2007-06-29 18:05:48,140 INFO crawl.Injector (Injector.java:inject(140)) - Injector: urlDir: urls
2007-06-29 18:05:48,140 INFO crawl.Injector (Injector.java:inject(150)) - Injector: Converting injected urls to crawl db entries.
2007-06-29 18:05:48,734 INFO mapred.InputFormatBase (InputFormatBase.java:validateInput(141)) - Total input paths to process : 1
2007-06-29 18:05:49,234 INFO mapred.JobClient (JobClient.java:runJob(545)) - Running job: job_qhjv4d
2007-06-29 18:05:49,593 INFO plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in: C:\JavaSearchEngine\nutch-0.9\plugins
2007-06-29 18:05:50,234 INFO mapred.JobClient (JobClient.java:runJob(562)) - map 0% reduce 0%
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation mode: [true]
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Creative Commons Plugins (creativecommons)
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Site Query Filter (query-site)
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Subcollection indexing and query filter (subcollection)
2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http / Https Protocol Plug-in (protocol-httpclient) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Html Parse Plug-in (parse-html) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Filter Framework (lib-regex-filter) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Lucene Analysers (lib-lucene-analyzers) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Indexing Filter (index-basic) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pdf Parse Plug-in (parse-pdf) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSExcel Parse Plug-in (parse-msexcel) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Summarizer Plug-in (summary-basic) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - JavaScript Parser (parse-js) 2007-06-29 18:05:50,453 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Filter (urlfilter-regex) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - HTTP Framework (lib-http) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - URL Query Filter (query-url) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - SWF Parse Plug-in (parse-swf) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Log4j (lib-log4j) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - External Parser Plug-in (parse-ext) 2007-06-29 
18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ontology Plug-in (ontology) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ftp Protocol Plug-in (protocol-ftp) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Zip Parse Plug-in (parse-zip) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http Protocol Plug-in (protocol-http) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Indexing Filter (index-more) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - the nutch core extension points (nutch-extensionpoints) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Suffix URL Filter (urlfilter-suffix) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Query Filter (query-more) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Online Search Results Clustering using Carrot2's Lingo component (clustering-carrot2) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Rel-Tag microformat Parser/Indexer/Querier (microformats-reltag) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Language Identification Parser/Filter (language-identifier) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Prefix URL Filter (urlfilter-prefix) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - CyberNeko HTML Parser (lib-nekohtml) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSPowerPoint Parse Plug-in (parse-mspowerpoint) 2007-06-29 
18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSWord Parse Plug-in (parse-msword) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic URL Normalizer (urlnormalizer-basic) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pass-through URL Normalizer (urlnormalizer-pass) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - File Protocol Plug-in (protocol-file) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Text Parse Plug-in (parse-text) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi) 2007-06-29 18:05:50,468 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Query Filter (query-basic) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - XML Libraries (lib-xml) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Normalizer (urlnormalizer-regex) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Parse MS Documents Framework (lib-parsems) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - RSS Parse Plug-in (parse-rss) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OPIC Scoring Plug-in (scoring-opic) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OpenOffice/OpenDocument Parse Plug-in (parse-oo) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Automaton URL Filter (urlfilter-automaton) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository 
(PluginRepository.java:displayStatus(316)) - Lucene Highlighter Summary Plug-in (summary-lucene) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(320)) - Registered Extension-Points: 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-06-29 18:05:50,484 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-06-29 18:05:50,484 INFO 
plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-06-29 18:05:50,937 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource crawl-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/crawl-urlfilter.txt
2007-06-29 18:05:51,015 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource suffix-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/suffix-urlfilter.txt
2007-06-29 18:05:51,218 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt not found
2007-06-29 18:05:51,296 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource automaton-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/automaton-urlfilter.txt
2007-06-29 18:05:51,765 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - file:/C:/JavaSearchEngine/nutch-0.9/urls/nutch:0+36
2007-06-29 18:05:51,796 WARN regex.RegexURLNormalizer (RegexURLNormalizer.java:regexNormalize(159)) - can't find rules for scope 'inject', using default
2007-06-29 18:05:52,109 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - file:/C:/JavaSearchEngine/nutch-0.9/urls/nutch:0+36
2007-06-29 18:05:52,250 INFO mapred.JobClient (JobClient.java:runJob(562)) - map 100% reduce 0%
2007-06-29 18:05:52,312 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - reduce > reduce
2007-06-29 18:05:53,250 INFO mapred.JobClient (JobClient.java:runJob(606)) - Job complete: job_qhjv4d
2007-06-29 18:05:53,250 INFO mapred.JobClient (Counters.java:log(357)) - Counters: 2
2007-06-29 18:05:53,265 INFO mapred.JobClient (Counters.java:log(361)) - Map-Reduce Framework
2007-06-29 18:05:53,265 INFO mapred.JobClient (Counters.java:log(363)) - Map input records=2
2007-06-29 18:05:53,265 INFO mapred.JobClient (Counters.java:log(363)) - Map input bytes=36
2007-06-29 18:05:53,265 INFO crawl.Injector (Injector.java:inject(166)) - Injector: Merging injected urls into crawl db.
2007-06-29 18:05:53,453 INFO mapred.InputFormatBase (InputFormatBase.java:validateInput(141)) - Total input paths to process : 1
2007-06-29 18:05:53,734 INFO mapred.JobClient (JobClient.java:runJob(545)) - Running job: job_tejhio
2007-06-29 18:05:53,843 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - file:/tmp/hadoop-Administrator/mapred/temp/inject-temp-111063224/part-00000:0+86
2007-06-29 18:05:53,984 WARN util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(51)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-06-29 18:05:54,031 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - reduce > reduce
2007-06-29 18:05:54,750 INFO mapred.JobClient (JobClient.java:runJob(606)) - Job complete: job_tejhio
2007-06-29 18:05:54,750 INFO mapred.JobClient (Counters.java:log(357)) - Counters: 2
2007-06-29 18:05:54,750 INFO mapred.JobClient (Counters.java:log(361)) - Map-Reduce Framework
2007-06-29 18:05:54,750 INFO mapred.JobClient (Counters.java:log(363)) - Map input records=1
2007-06-29 18:05:54,750 INFO mapred.JobClient (Counters.java:log(363)) - Map input bytes=0
2007-06-29 18:05:54,906 INFO crawl.Injector (Injector.java:inject(177)) - Injector: done
2007-06-29 18:05:55,906 INFO crawl.Generator (Generator.java:generate(376)) - Generator: Selecting best-scoring urls due for fetch.
2007-06-29 18:05:55,906 INFO crawl.Generator (Generator.java:generate(377)) - Generator: starting 2007-06-29 18:05:55,906 INFO crawl.Generator (Generator.java:generate(378)) - Generator: segment: crawl-epw/segments/20070629180555 2007-06-29 18:05:55,906 INFO crawl.Generator (Generator.java:generate(379)) - Generator: filtering: false 2007-06-29 18:05:55,906 INFO crawl.Generator (Generator.java:generate(381)) - Generator: topN: 50 2007-06-29 18:05:55,968 INFO crawl.Generator (Generator.java:generate(393)) - Generator: jobtracker is 'local', generating exactly one partition. 2007-06-29 18:05:56,093 INFO mapred.InputFormatBase (InputFormatBase.java:validateInput(141)) - Total input paths to process : 1 2007-06-29 18:05:56,265 INFO mapred.JobClient (JobClient.java:runJob(545)) - Running job: job_3txcrs 2007-06-29 18:05:56,421 INFO plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in: C:\JavaSearchEngine\nutch-0.9\plugins 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation mode: [true] 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(310)) - Registered Plugins: 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Creative Commons Plugins (creativecommons) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Site Query Filter (query-site) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Subcollection indexing and query filter (subcollection) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http / Https Protocol Plug-in (protocol-httpclient) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Html Parse Plug-in (parse-html) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository 
(PluginRepository.java:displayStatus(316)) - Regex URL Filter Framework (lib-regex-filter) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Lucene Analysers (lib-lucene-analyzers) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Indexing Filter (index-basic) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pdf Parse Plug-in (parse-pdf) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSExcel Parse Plug-in (parse-msexcel) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Summarizer Plug-in (summary-basic) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - JavaScript Parser (parse-js) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Filter (urlfilter-regex) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - HTTP Framework (lib-http) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - URL Query Filter (query-url) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - SWF Parse Plug-in (parse-swf) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Log4j (lib-log4j) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - External Parser Plug-in (parse-ext) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ontology Plug-in (ontology) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ftp Protocol Plug-in (protocol-ftp) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository 
(PluginRepository.java:displayStatus(316)) - Zip Parse Plug-in (parse-zip) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http Protocol Plug-in (protocol-http) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Indexing Filter (index-more) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - the nutch core extension points (nutch-extensionpoints) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Suffix URL Filter (urlfilter-suffix) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Query Filter (query-more) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Online Search Results Clustering using Carrot2's Lingo component (clustering-carrot2) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Rel-Tag microformat Parser/Indexer/Querier (microformats-reltag) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Language Identification Parser/Filter (language-identifier) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Prefix URL Filter (urlfilter-prefix) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - CyberNeko HTML Parser (lib-nekohtml) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSPowerPoint Parse Plug-in (parse-mspowerpoint) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSWord Parse Plug-in (parse-msword) 2007-06-29 18:05:56,734 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic URL Normalizer (urlnormalizer-basic) 2007-06-29 18:05:56,750 INFO 
plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pass-through URL Normalizer (urlnormalizer-pass) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - File Protocol Plug-in (protocol-file) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Text Parse Plug-in (parse-text) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Query Filter (query-basic) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - XML Libraries (lib-xml) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Normalizer (urlnormalizer-regex) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Parse MS Documents Framework (lib-parsems) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - RSS Parse Plug-in (parse-rss) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OPIC Scoring Plug-in (scoring-opic) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OpenOffice/OpenDocument Parse Plug-in (parse-oo) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Automaton URL Filter (urlfilter-automaton) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Lucene Highlighter Summary Plug-in (summary-lucene) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(320)) - Registered Extension-Points: 2007-06-29 18:05:56,750 INFO plugin.PluginRepository 
(PluginRepository.java:displayStatus(325)) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Protocol (org.apache.nutch.protocol.Protocol) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Filter (org.apache.nutch.net.URLFilter) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Content Parser (org.apache.nutch.parse.Parser) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Ontology Model Loader (org.apache.nutch.ontology.Ontology) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer) 2007-06-29 18:05:56,750 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter) 2007-06-29 18:05:56,750 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource crawl-urlfilter.txt at 
file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/crawl-urlfilter.txt
2007-06-29 18:05:56,765 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource suffix-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/suffix-urlfilter.txt
2007-06-29 18:05:56,765 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt not found
2007-06-29 18:05:56,765 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource automaton-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/automaton-urlfilter.txt
2007-06-29 18:05:57,031 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - file:/C:/JavaSearchEngine/nutch-0.9/crawl-epw/crawldb/current/part-00000/data:0+129
2007-06-29 18:05:57,125 INFO plugin.PluginRepository (PluginManifestParser.java:parsePluginFolder(86)) - Plugins: looking in: C:\JavaSearchEngine\nutch-0.9\plugins
2007-06-29 18:05:57,281 INFO mapred.JobClient (JobClient.java:runJob(562)) - map 100% reduce 0%
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(309)) - Plugin Auto-activation mode: [true]
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(310)) - Registered Plugins:
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Creative Commons Plugins (creativecommons)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Site Query Filter (query-site)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Subcollection indexing and query filter (subcollection)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http / Https Protocol Plug-in (protocol-httpclient)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Html Parse Plug-in
(parse-html) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Filter Framework (lib-regex-filter) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Lucene Analysers (lib-lucene-analyzers) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Indexing Filter (index-basic) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pdf Parse Plug-in (parse-pdf) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSExcel Parse Plug-in (parse-msexcel) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Summarizer Plug-in (summary-basic) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - JavaScript Parser (parse-js) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Filter (urlfilter-regex) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - HTTP Framework (lib-http) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - URL Query Filter (query-url) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - SWF Parse Plug-in (parse-swf) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Log4j (lib-log4j) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - External Parser Plug-in (parse-ext) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ontology Plug-in (ontology) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Ftp Protocol Plug-in (protocol-ftp) 2007-06-29 
18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Zip Parse Plug-in (parse-zip) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Http Protocol Plug-in (protocol-http) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Indexing Filter (index-more) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - the nutch core extension points (nutch-extensionpoints) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Suffix URL Filter (urlfilter-suffix) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - More Query Filter (query-more) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Online Search Results Clustering using Carrot2's Lingo component (clustering-carrot2) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Rel-Tag microformat Parser/Indexer/Querier (microformats-reltag) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Language Identification Parser/Filter (language-identifier) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Prefix URL Filter (urlfilter-prefix) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - CyberNeko HTML Parser (lib-nekohtml) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSPowerPoint Parse Plug-in (parse-mspowerpoint) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - MSWord Parse Plug-in (parse-msword) 2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic URL Normalizer (urlnormalizer-basic) 
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Pass-through URL Normalizer (urlnormalizer-pass)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - File Protocol Plug-in (protocol-file)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Text Parse Plug-in (parse-text)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Basic Query Filter (query-basic)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - XML Libraries (lib-xml)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Regex URL Normalizer (urlnormalizer-regex)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Parse MS Documents Framework (lib-parsems)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - RSS Parse Plug-in (parse-rss)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OPIC Scoring Plug-in (scoring-opic)
2007-06-29 18:05:57,421 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - OpenOffice/OpenDocument Parse Plug-in (parse-oo)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Automaton URL Filter (urlfilter-automaton)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(316)) - Lucene Highlighter Summary Plug-in (summary-lucene)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(320)) - Registered Extension-Points:
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Content Parser (org.apache.nutch.parse.Parser)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2007-06-29 18:05:57,437 INFO plugin.PluginRepository (PluginRepository.java:displayStatus(325)) - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2007-06-29 18:05:57,437 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource crawl-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/crawl-urlfilter.txt
2007-06-29 18:05:57,453 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource suffix-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/suffix-urlfilter.txt
2007-06-29 18:05:57,453 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(438)) - prefix-urlfilter.txt not found
2007-06-29 18:05:57,453 INFO conf.Configuration (Configuration.java:getConfResourceAsReader(441)) - found resource automaton-urlfilter.txt at file:/C:/JavaSearchEngine/nutch-0.9/tmp-build/automaton-urlfilter.txt
2007-06-29 18:05:57,718 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - reduce > reduce
2007-06-29 18:05:58,281 INFO mapred.JobClient (JobClient.java:runJob(606)) - Job complete: job_3txcrs
2007-06-29 18:05:58,281 INFO mapred.JobClient (Counters.java:log(357)) - Counters: 2
2007-06-29 18:05:58,281 INFO mapred.JobClient (Counters.java:log(361)) - Map-Reduce Framework
2007-06-29 18:05:58,281 INFO mapred.JobClient (Counters.java:log(363)) - Map input records=1
2007-06-29 18:05:58,281 INFO mapred.JobClient (Counters.java:log(363)) - Map input bytes=0
2007-06-29 18:05:58,281 WARN crawl.Generator (Generator.java:generate(425)) - Generator: 0 records selected for fetching, exiting ...
2007-06-29 18:05:58,312 INFO crawl.Crawl (Crawl.java:main(121)) - Stopping at depth=0 - no more URLs to fetch.
2007-06-29 18:05:58,312 WARN crawl.Crawl (Crawl.java:main(138)) - No URLs to fetch - check your seed list and URL filters.
2007-06-29 18:05:58,312 INFO crawl.Crawl (Crawl.java:main(140)) - crawl finished: crawl-epw

And I only get the crawldb folder, not all five folders.

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com

-----Original Message-----
From: Tsengtan A Shuy [mailto:[EMAIL PROTECTED]
Sent: Friday, June 29, 2007 2:55 PM
To: [EMAIL PROTECTED]
Subject: RE: windows eclipse run

Please ignore my last email. I ran both nutch-0.8.1 and nutch-0.9 in my Windows Eclipse environment. From nutch-0.8.1 I got all the result folders: crawldb, index, indexes, linkdb and segments, but from nutch-0.9 I got only the crawldb folder. Am I getting the right result?

Any feedback will be much appreciated.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
