RE: 0.8 Intranet Crawl Output/Logging?

jared.dunne Thu, 14 Sep 2006 13:40:12 -0700

Everyone, thanks for the help with this.  I hope to return the
assistance, once I am more familiar with 0.8.  I am using tail -f now to
monitor my test crawls.  It also looks like you can use
conf/hadoop-env.sh to redirect log file output to a different location
for each of your configurations.
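
If you only want to keep the plugin registry lines out of view while watching a crawl, you could filter the stream coming out of tail -f. Here is a minimal sketch. It assumes that every noisy line carries the plugin.PluginRepository category shown in the log excerpt below. The function name quiet_nutch_log is made up for illustration. The filter only changes what you see on screen, and hadoop.log itself still contains every line.

```shell
# Hypothetical helper: drop log lines whose category is
# plugin.PluginRepository (the plugin/extension-point listing)
# from whatever is piped in; all other lines pass through unchanged.
quiet_nutch_log() {
  grep -v 'plugin\.PluginRepository'
}

# Usage (path assumed; point it at your own logs directory):
#   tail -f logs/hadoop.log | quiet_nutch_log
```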


One follow up question:
Now that I can actually see the log, I am finding some of the output
rather annoying/noisy.  Specifically, I am referring to the "Registered
Plugins" and "Registered Extension-Points" output.  It's nice to see that
once at crawl start, but not with every step of the crawl.

So does anyone know if I can disable that output?  Here's the output to
which I refer:

2006-09-14 14:03:42,852 INFO  plugin.PluginRepository - Plugins: looking in: /var/nutch/nutch-0.8/plugins
2006-09-14 14:03:43,030 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2006-09-14 14:03:43,030 INFO  plugin.PluginRepository - Registered Plugins:
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
[snip]
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository - Registered Extension-Points:
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Nutch
[snip]
Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2006-09-14 14:03:43,032 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2006-09-14 14:03:43,032 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
[snip]
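
To stop these lines from being written at all, one option is to raise that logger's level above INFO. This sketch rests on two assumptions. First, it assumes Nutch 0.8 reads its logging setup from conf/log4j.properties using log4j 1.x property syntax. Second, it assumes the logger is named after the class, org.apache.nutch.plugin.PluginRepository, which would print as "plugin.PluginRepository" in the excerpt above. The helper silence_plugin_repository is hypothetical. There is a trade-off: this also hides the listing at crawl start. It should only affect commands started after the file is edited.

```shell
# Hypothetical helper: append a log4j 1.x logger setting that raises
# PluginRepository to WARN, so its INFO lines are no longer logged
# (WARN and ERROR messages from the plugin loader still get through).
# Argument: the conf directory (defaults to ./conf).
silence_plugin_repository() {
  props="${1:-conf}/log4j.properties"
  line='log4j.logger.org.apache.nutch.plugin.PluginRepository=WARN'
  # Append only if the exact line is not already present, so re-running is harmless.
  grep -qxF "$line" "$props" 2>/dev/null || echo "$line" >> "$props"
}

# Usage (install path assumed from the plugins path in the log above):
#   silence_plugin_repository /var/nutch/nutch-0.8/conf
```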

Jared-

-----Original Message-----
From: Jacob Brunson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 14, 2006 1:24 AM
To: [EMAIL PROTECTED]
Subject: Re: 0.8 Intranet Crawl Output/Logging?

On my system, I run the crawl command in one shell while running this
command in another shell to monitor the crawl:
tail -f log/hadoop.log
Of course this does about the same thing as the command quoted below,
but "tail -f" is a little easier to remember.

On 9/13/06, Tomi NA <[EMAIL PROTECTED]> wrote:
> On 9/13/06, wmelo <[EMAIL PROTECTED]> wrote:
> > I have the same original question.  I know that the log shows
> > information, but how can I see things happening in real time, as in
> > Nutch 0.7.2 when you run the crawl command in the terminal?
>
> try something like this (assuming you know what's good for you so you
> use a *n*x):
> watch -n 1 "tail -n 20 /home/wmelo/nutch-0.8/logs/hadoop.log"
>
> Please replace the path to your "logs" directory to match your
> environment and report back if there's a problem.
> Hope it helps.
>
> t.n.a.
>


-- 
http://JacobBrunson.com
