Everyone, thanks for the help with this. I hope to return the assistance once I am more familiar with 0.8. I am using tail -f now to monitor my test crawls. It also looks like you can use conf/hadoop-env.sh to redirect log file output to a different location for each of your configurations.
One follow-up question: now that I can actually see the log, I am finding some of the output rather noisy. Specifically, I am referring to the Registered Plugins and Registered Extension-Points output. It's nice to see that once at crawl start, but not with every step of the crawl. Does anyone know if I can disable that output?

Here's the output to which I refer:

2006-09-14 14:03:42,852 INFO plugin.PluginRepository - Plugins: looking in: /var/nutch/nutch-0.8/plugins
2006-09-14 14:03:43,030 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-09-14 14:03:43,030 INFO plugin.PluginRepository - Registered Plugins:
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Site Query Filter (query-site)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
[snip]
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Registered Extension-Points:
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Nutch [snip] Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2006-09-14 14:03:43,032 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2006-09-14 14:03:43,032 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
[snip]

Jared-

-----Original Message-----
From: Jacob Brunson [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 14, 2006 1:24 AM
To: [EMAIL PROTECTED]
Subject: Re: 0.8 Intranet Crawl Output/Logging?

On my system, I run the crawl command in one shell while running this command in another shell to monitor the crawl:

tail -f log/hadoop.log

Of course this does about the same thing as listed below, but "tail -f" is a little easier to remember.

On 9/13/06, Tomi NA <[EMAIL PROTECTED]> wrote:
> On 9/13/06, wmelo <[EMAIL PROTECTED]> wrote:
> > I have the same original doubt. I know that the log shows informations,
> > but, how to see the things happening, real time, like in nutch 0.7.2, when
> > you use the crawl command in the terminal?
>
> try something like this (assuming you know what's good for you so you
> use a *n*x):
> watch -n 1 "tail -n 20 /home/wmelo/nutch-0.8/logs/hadoop.log"
>
> Please replace the path to your "logs" directory to match your
> environment and report back if there's a problem.
> Hope it helps.
>
> t.n.a.
>

--
http://JacobBrunson.com
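
On the noisy plugin.PluginRepository lines asked about at the top of this thread: one workaround that needs no configuration change is to filter those lines out while following the log. The sketch below is only an illustration, not a Nutch feature. The function names filter_plugin_noise and follow_crawl_log are made up for this example, and the log path is whatever your own setup uses. It hides the lines from view only; they are still written to the log file.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this example, not part of Nutch).

# Read log lines on stdin and drop every line logged by plugin.PluginRepository.
filter_plugin_noise() {
  grep -v 'plugin\.PluginRepository'
}

# Follow a log file live, without the PluginRepository lines.
# $1 is the path to the log file; it is an assumption about your layout.
follow_crawl_log() {
  tail -f "$1" | filter_plugin_noise
}
```

To use it, source the file in the monitoring shell and run, for example, follow_crawl_log logs/hadoop.log with the path adjusted to your environment.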
