Hello Jared,
[EMAIL PROTECTED] wrote:
Everyone, thanks for the help with this. I hope to return the
assistance, once I am more familiar with 0.8. I am using tail -f now to
monitor my test crawls. It also looks like you can use
conf/hadoop-env.sh to redirect log file output to a different location
for each of your configurations.
One follow up question:
Now that I can actually see the log, I am finding some of the output
rather annoying/noisy. Specifically, I am referring to the Registered
Plugins and Registered Extension-Points output. It's nice to see that
once at crawl start, but not with every step of the crawl.
So does anyone know if I can disable that output?
Please see http://issues.apache.org/jira/browse/NUTCH-346
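Independently of that issue, one possible workaround is to filter those
lines out while you tail the log. This is only a minimal sketch: it assumes
every noisy line carries the logger name plugin.PluginRepository, as in your
output; quiet_plugins is a hypothetical helper name, and the log path in the
usage comment is an assumption to adjust for your setup.

```shell
# Hypothetical helper: drop every line that mentions the PluginRepository
# logger from whatever is piped in, and pass all other lines through.
quiet_plugins() {
  # -v inverts the match, -F treats the pattern as a fixed string
  grep -vF 'plugin.PluginRepository'
}

# Usage (assumed path, relative to the Nutch install directory):
#   tail -f logs/hadoop.log | quiet_plugins
```

This only hides the lines in your terminal; the log file itself is unchanged.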
HTH,
Renaud
Here's the output to which I refer:
2006-09-14 14:03:42,852 INFO plugin.PluginRepository - Plugins: looking
in: /var/nutch/nutch-0.8/plugins
2006-09-14 14:03:43,030 INFO plugin.PluginRepository - Plugin
Auto-activation mode: [true]
2006-09-14 14:03:43,030 INFO plugin.PluginRepository - Registered
Plugins:
2006-09-14 14:03:43,031 INFO plugin.PluginRepository -
CyberNeko HTML Parser (lib-nekohtml)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Site
Query Filter (query-site)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Html
Parse Plug-in (parse-html)
[snip]
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Registered
Extension-Points:
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Nutch
Summarizer (org.apache.nutch.searcher.Summarizer)
2006-09-14 14:03:43,031 INFO plugin.PluginRepository - Nutch
[snip]
Search Results Clustering Plugin
(org.apache.nutch.clustering.OnlineClusterer)
2006-09-14 14:03:43,032 INFO plugin.PluginRepository - Nutch
Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2006-09-14 14:03:43,032 INFO plugin.PluginRepository - Nutch
Content Parser (org.apache.nutch.parse.Parser)
[snip]
Jared-
-----Original Message-----
From: Jacob Brunson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 14, 2006 1:24 AM
To: [EMAIL PROTECTED]
Subject: Re: 0.8 Intranet Crawl Output/Logging?
On my system, I run the crawl command in one shell while running this
command in another shell to monitor the crawl:
tail -f logs/hadoop.log
Of course this does about the same thing as listed below, but "tail
-f" is a little easier to remember.
On 9/13/06, Tomi NA <[EMAIL PROTECTED]> wrote:
On 9/13/06, wmelo <[EMAIL PROTECTED]> wrote:
I have the same question as the original poster. I know that the log
shows the information, but how can I see things happening in real
time, as in Nutch 0.7.2, when I run the crawl command in the terminal?
try something like this (assuming you know what's good for you so you
use a *n*x):
watch -n 1 "tail -n 20 /home/wmelo/nutch-0.8/logs/hadoop.log"
Please replace the path to your "logs" directory to match your
environment and report back if there's a problem.
Hope it helps.
t.n.a.
--
Renaud Richardet
COO America
Wyona - Open Source Content Management - Apache Lenya
office +1 857 776-3195 mobile +1 617 230 9112
renaud.richardet <at> wyona.com http://www.wyona.com