Hello, I've been given the task of figuring out why Nutch runs slower on Solaris than on Linux with the same configuration. Looking at the log file, I see a big gap between the time the fetcher stops fetching and the time it says it is done, and I would love to know what is going on in between. Here is the log snippet.

2010-09-24 11:04:28,413 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2010-09-24 11:04:29,200 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2010-09-24 11:04:29,200 INFO fetcher.Fetcher - -activeThreads=0
2010-09-24 11:05:32,782 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2010-09-24 11:05:33,469 INFO plugin.PluginRepository - Plugins: looking in: /opt/nutch/build/plugins
2010-09-24 11:05:34,052 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - Registered Plugins:
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - More Indexing Filter (index-more)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - MSWord Parse Plug-in (parse-msword)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - More Query Filter (query-more)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2010-09-24 11:05:34,053 INFO plugin.PluginRepository - XML Libraries (lib-xml)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - MSExcel Parse Plug-in (parse-msexcel)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - XML Response Writer Plug-in (response-xml)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - Zip Parse Plug-in (parse-zip)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2010-09-24 11:05:34,054 INFO plugin.PluginRepository - URL Query Filter (query-url)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - Parse MS Documents Framework (lib-parsems)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - JSON Response Writer Plug-in (response-json)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - MSPowerPoint Parse Plug-in (parse-mspowerpoint)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - Basic Query Filter (query-basic)
2010-09-24 11:05:34,055 INFO plugin.PluginRepository - RSS Parse Plug-in (parse-rss)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Site Query Filter (query-site)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Text Parse Plug-in (parse-text)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - File Protocol Plug-in (protocol-file)
2010-09-24 11:05:34,056 INFO plugin.PluginRepository - Registered Extension-Points:
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2010-09-24 11:05:34,057 INFO plugin.PluginRepository - Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2010-09-24 11:47:04,995 INFO fetcher.Fetcher - Fetcher: done
2010-09-24 11:47:10,151 INFO crawl.CrawlDb - CrawlDb update: starting

So at 11:04, the fetcher winds down and has no more threads to run. At 11:05 it gives a warning about not having the native Hadoop libraries (I am going to build them today) and loads plugins. Then, at 11:47, Fetcher gives a message that it is done and CrawlDb starts. By the timestamps that is roughly 41 minutes after the plugins finished loading. What did Fetcher do for those 41 minutes?

Thanks,
Steve
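
P.S. To compare the Solaris and Linux runs, the quiet periods can be measured mechanically rather than by eye. Below is a minimal sketch, assuming every log entry starts with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` form seen in the snippet above. The helper names `parse_ts` and `largest_gaps` are my own, not part of Nutch or Hadoop, and the sample list is just a few entries copied from the snippet.

```python
import re
from datetime import datetime

# Timestamp at the start of an entry, e.g. "2010-09-24 11:04:28,413".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the entry's timestamp, or None if the line has none."""
    m = TS.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def largest_gaps(lines, top=3):
    """Return (seconds, entry_before, entry_after) for the longest silences."""
    stamped = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:  # skip continuation lines such as stack traces
            stamped.append((ts, line.rstrip("\n")))
    gaps = [
        ((t1 - t0).total_seconds(), l0, l1)
        for (t0, l0), (t1, l1) in zip(stamped, stamped[1:])
    ]
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


# A few entries copied from the snippet above.
sample = [
    "2010-09-24 11:04:29,200 INFO fetcher.Fetcher - -activeThreads=0",
    "2010-09-24 11:05:32,782 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable",
    "2010-09-24 11:05:34,058 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)",
    "2010-09-24 11:47:04,995 INFO fetcher.Fetcher - Fetcher: done",
    "2010-09-24 11:47:10,151 INFO crawl.CrawlDb - CrawlDb update: starting",
]

for seconds, before, after in largest_gaps(sample):
    print(f"{seconds:9.1f} s of silence before: {after}")
```

To run it on a real log, pass the open file instead of `sample`, for example `largest_gaps(open(path))` with `path` pointing at the log file. Running it on the logs from both machines should show whether the silence before "Fetcher: done" is the main difference between them.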

