Hi,
I read your mail.
Hi All,
I have gone through this
http://wiki.apache.org/nutch/Nutch0.9-Hadoop0.10-Tutorial and was able to set
up Hadoop with Nutch.
Running bin/hadoop namenode -format -- it formatted successfully.
Then bin/start-all.sh -- was able to start all the master and
Hi,
I assume that you are probably running this program in Eclipse or some other
IDE. However, you need to include the path-to-nutch/conf directory in your
classpath. Otherwise the configuration files are not parsed/found on
start-up. plugins.folder is a key from nutch-default.xml or
If running in Eclipse, do an ant build and add the build/nutch-1.0-dev
folder to the classpath, then edit that entry and exclude everything within
that folder except the plugins directory. If running outside of Eclipse,
you will need to include the parent of the plugins folder in the
classpath.
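One quick way to confirm that the conf directory really is on the classpath
is to check whether nutch-default.xml is visible to the classloader. A
minimal sketch (the ClasspathCheck class name is just an example):

    public class ClasspathCheck {
        public static void main(String[] args) {
            // Resolves nutch-default.xml against the classpath and prints where it was found.
            java.net.URL url = ClasspathCheck.class.getClassLoader()
                    .getResource("nutch-default.xml");
            System.out.println(url == null
                    ? "nutch-default.xml NOT found on the classpath"
                    : "Found: " + url);
        }
    }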
Hi,
I am trying to run this program from a bash terminal. I added the
$NUTCH_HOME/conf folder to the classpath as you suggested... still I don't
see it running...
Here's the error text:
08/02/05 21:29:30 INFO searcher.NutchBean: opening indexes in crawl/indexes
Exception in thread main
Hi,
Since I am running it in the terminal (which is outside of Eclipse, which I
haven't installed, by the way),
I added the parent of the plugins folder, which is the $NUTCH_HOME
directory, to the classpath...
But the problem is still there...
Dennis Kubes-2 wrote:
if running in eclipse, do an ant build
I found a few things in the org.apache.nutch.crawl package which I want
to ask about. I have three questions.
(1) In Injector.java, normalize() happens first and then filter()
happens, whereas in Generator.java filter() happens in the map phase and
normalize() happens in the reduce phase. Why is the order
The conf directory would need to be in the classpath. You would have a
nutch-site.xml file, among others, in the conf directory. That file
would need to specify the plugins.folder variable with a value of
plugins. Or you would need to have the nutch-default.xml file in the conf
directory which
You would need the parent of the plugin directory along with the conf
directory in your classpath.
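For illustration, the relevant property in nutch-site.xml would look
something like this (the value "plugins" matches the default shipped in
nutch-default.xml; adjust it if your plugins directory lives elsewhere):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>plugins.folder</name>
        <value>plugins</value>
        <description>Directory where Nutch looks for its plugins.</description>
      </property>
    </configuration>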
Dennis
Aled Rhys Jones wrote:
Hi all
I've implemented a NutchBean in my web application (based on AppFuse).
I'm pretty sure I've got all the prerequisites I need, but when I try to
build
I resolved the problem!!
In conf/context.xsl I changed
<?xml version="1.0" encoding="UTF-8"?>
to
<?xml version="1.0" encoding="iso-8859-1"?>
Is this correct?
I read this
http://www.openrdf.org/doc/sesame/users/ch09.html#d0e3707
Hi,
I am using the 0.9 version of Nutch.
The layout is:
$NUTCH_HOME is at /media/sda1/linux/java/nutch-0.9
conf folder: /media/sda1/linux/java/nutch-0.9/conf, which contains the XML
files
plugins: /media/sda1/linux/java/nutch-0.9/plugins
The conf directory is in the classpath, and the
You have not added nutch-default.xml and nutch-site.xml to your
Configuration object. Adding the following two lines to your code
should solve the problem:
conf.addDefaultResource("nutch-default.xml");
conf.addDefaultResource("nutch-site.xml");
Regards,
Susam Pal
On Feb 6, 2008 12:17 AM, devj
Good catch, Susam. Instead of including the files directly through the add
methods, an easier way would be this:
Configuration conf = NutchConfiguration.create();
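Putting the pieces together, a minimal sketch of opening the bean this way
(this assumes the Nutch 0.9 classes org.apache.nutch.util.NutchConfiguration
and org.apache.nutch.searcher.NutchBean, and the default searcher.dir of
"crawl"; the SearchSetup class name is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.searcher.NutchBean;
    import org.apache.nutch.util.NutchConfiguration;

    public class SearchSetup {
        public static void main(String[] args) throws Exception {
            // Loads nutch-default.xml and nutch-site.xml from the classpath.
            Configuration conf = NutchConfiguration.create();
            // Opens the indexes under searcher.dir ("crawl" by default).
            NutchBean bean = new NutchBean(conf);
            System.out.println("NutchBean created: " + bean);
        }
    }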
Dennis
Susam Pal wrote:
You have not added nutch-default.xml and nutch-site.xml to your
Configuration object. Adding the following
Susam Pal wrote:
I found a few things in the org.apache.nutch.crawl package which I want
to ask about. I have three questions.
(1) In Injector.java, normalize() happens first and then filter()
happens, whereas in Generator.java filter() happens in the map phase and
normalize() happens in the reduce phase.
devj wrote:
Hi,
So it finally worked. Thanks, Susam, for the two lines... now I know what the
conf object is for...
But now there's another problem.
I keep getting a class-not-found error for
'org/apache/commons/cli/ParseException'. I downloaded the commons source
package, built it, and added the
Thanks, Dennis.
The parent of the plugin directory is the nutch directory, isn't it?
(nutch-0.9 in my case).
Should I include these directories in my project? How do I reference the
required folders in my web application using Eclipse? If they're not
physically located within my project folder,
I am running Nutch 0.9.
I have run nutch mergesegs many times before.
The last couple of times I have run it, I get the following
errors:
-
Merging 14 segments to /var/nutch/crawl/mergesegs_dir/20080201220906
SegmentMerger: adding /var/nutch/crawl/segments/20080128132506
SegmentMerger: adding
Hi to all,
How can I configure hadoop-site.xml for these properties:
1.- fs.default.name
2.- mapred.job.tracker
3.- mapred.tasktracker.tasks.maximum
In general, for hadoop-site.xml, I am working with two machines: one as the
master node and one as a slave.
Thanks
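For illustration, a minimal two-node hadoop-site.xml sketch covering these
three properties (the hostname "master" and the port numbers are
placeholders, not values from this thread):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <!-- host:port of the NameNode, i.e. the master machine -->
        <value>master:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <!-- host:port of the JobTracker, usually also the master machine -->
        <value>master:9001</value>
      </property>
      <property>
        <name>mapred.tasktracker.tasks.maximum</name>
        <!-- how many tasks each TaskTracker (slave) runs concurrently -->
        <value>2</value>
      </property>
    </configuration>

The conf/slaves file would then list the slave machine, and the same
hadoop-site.xml would be copied to both nodes.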
Hi folks...
What is the best way to, say, limit crawling to perhaps 3-4 hours per day?
Is there a way to do this?
Right now, I have a crawl depth of 6 and a maximum of 100 per site. I
thought this would keep things pretty limited, but during some test crawls,
my last crawl took 2.5 days to complete:
Did you try specifying a topN value? -depth 3 -topN 1000 should be
close to what you want.
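For example, with the one-shot crawl command in Nutch 0.9 (the urls and
crawl directory names here are just the usual placeholders):

    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000

-topN caps how many of the top-scoring URLs are fetched in each round, so
depth x topN gives a rough upper bound on the pages fetched per crawl.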
On 2/6/08, Paul Stewart [EMAIL PROTECTED] wrote:
Hi folks...
What is the best way to, say, limit crawling to perhaps 3-4 hours per day?
Is there a way to do this?
Right now, I have a crawl depth of 6