Nutch - Hadoop (Bad Connection to FS)

2008-02-05 Thread Volkan Ebil
Hi, I read your mail. Hi all, I went through http://wiki.apache.org/nutch/Nutch0.9-Hadoop0.10-Tutorial and was able to set up Hadoop with Nutch. Running bin/hadoop namenode -format formatted successfully; then bin/start-all.sh was able to start all the master and
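For reference, a minimal sketch of the bring-up sequence that tutorial describes; the final sanity check is an assumption added here, not from the original mail:

    cd $NUTCH_HOME
    bin/hadoop namenode -format   # format the DFS namenode (run once)
    bin/start-all.sh              # start the namenode, datanodes, jobtracker, tasktrackers
    bin/hadoop dfs -ls            # sanity check: a "Bad connection to FS" error here
                                  # usually means the namenode is not reachable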

Re: Urgent help reqd.....plz

2008-02-05 Thread Martin Kuen
Hi, I assume that you are probably running this program in Eclipse or some other IDE. However, you need to include the path-to-nutch/conf directory in your classpath; otherwise the configuration files are not found/parsed on start-up. plugin.folders is a key from nutch-default.xml or
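A hedged sketch of what such a launch might look like outside an IDE; the main class my.search.Main and all paths are placeholders, not from the original mail:

    NUTCH_HOME=/path/to/nutch-0.9
    CLASSPATH=$NUTCH_HOME/conf:$NUTCH_HOME/nutch-0.9.jar
    for jar in $NUTCH_HOME/lib/*.jar; do CLASSPATH=$CLASSPATH:$jar; done
    java -cp "$CLASSPATH" my.search.Main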

Re: Urgent help reqd.....plz

2008-02-05 Thread Dennis Kubes
If running in Eclipse, do an ant build and add the build/nutch-1.0-dev folder to the classpath, then edit that entry and exclude everything within that folder except the plugins directory. If running outside of Eclipse, you will need to include the parent of the plugins folder in the classpath.

Re: Urgent help reqd.....plz

2008-02-05 Thread devj
Hi, I am trying to run this program from a bash terminal. I added the $NUTCH_HOME/conf folder to the classpath as you suggested... still I don't see it running... Here's the error text: 08/02/05 21:29:30 INFO searcher.NutchBean: opening indexes in crawl/indexes Exception in thread "main"

Re: Urgent help reqd.....plz

2008-02-05 Thread devj
Hi, since I am running it in the terminal (which is outside of Eclipse, and which I haven't installed, by the way), I added the parent of the plugins folder, which is $NUTCH_HOME, to the classpath... But the problem is still there... Dennis Kubes-2 wrote: If running in Eclipse, do an ant build

Questions on normalizer and filter related code in Crawl, Injector and Generator

2008-02-05 Thread Susam Pal
I found a few things in the org.apache.nutch.crawl package that I want to ask about. I have three questions. (1) In Injector.java, normalize() happens first and then filter() happens, whereas in Generator.java filter() happens in the map phase and normalize() happens in the reduce phase. Why is the order
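For context, a minimal sketch of the normalize-then-filter order used at inject time; this is an illustration built on the Nutch 0.9 URLNormalizers/URLFilters APIs, not the actual Injector source:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.net.URLFilters;
    import org.apache.nutch.net.URLNormalizers;
    import org.apache.nutch.util.NutchConfiguration;

    public class NormalizeThenFilter {
      public static void main(String[] args) throws Exception {
        Configuration conf = NutchConfiguration.create();
        URLNormalizers normalizers =
            new URLNormalizers(conf, URLNormalizers.SCOPE_INJECT);
        URLFilters filters = new URLFilters(conf);

        String url = "http://Example.COM/a/../b.html";
        url = normalizers.normalize(url, URLNormalizers.SCOPE_INJECT); // canonicalize first
        if (url != null) {
          url = filters.filter(url); // then drop URLs the filter rules disallow
        }
        System.out.println(url); // null if a filter rejected the URL
      }
    }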

Re: Urgent help reqd.....plz

2008-02-05 Thread Dennis Kubes
The conf directory would need to be in the classpath. You would have a nutch-site.xml file, among others, in the conf directory. That file would need to specify the plugin.folders variable with a value of plugins. Or you would need to have the nutch-default.xml file in the conf directory which
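An illustrative nutch-site.xml fragment for this; note the key as spelled in nutch-default.xml is plugin.folders, and the value plugins shown here is the usual default:

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>plugin.folders</name>
        <value>plugins</value>
      </property>
    </configuration>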

Re: QueryFilter runtime exception - incorrect plugins setup?

2008-02-05 Thread Dennis Kubes
You would need the parent of the plugin directory, along with the conf directory, in your classpath. Dennis Aled Rhys Jones wrote: Hi all, I've integrated a NutchBean into my web application (based on Appfuse). I'm pretty sure I've got all the prerequisites I need, but when I try to build

Re: Nutch and Hadoop

2008-02-05 Thread payo
I resolved the problem! In conf/context.xsl I changed <?xml version="1.0" encoding="UTF-8"?> to <?xml version="1.0" encoding="iso-8859-1"?>. Is this correct? I read this: http://www.openrdf.org/doc/sesame/users/ch09.html#d0e3707

Re: Urgent help reqd.....plz

2008-02-05 Thread devj
Hi, I am using the 0.9 version of Nutch. The layout is: $NUTCH_HOME is at /media/sda1/linux/java/nutch-0.9; the conf folder is /media/sda1/linux/java/nutch-0.9/conf, which contains the XML files; plugins are at /media/sda1/linux/java/nutch-0.9/plugins. The conf directory is in the classpath, and the

Re: Urgent help reqd.....plz

2008-02-05 Thread Susam Pal
You have not added nutch-default.xml and nutch-site.xml to your Configuration object. Adding the following two lines to your code should solve the problem: conf.addDefaultResource("nutch-default.xml"); conf.addDefaultResource("nutch-site.xml"); Regards, Susam Pal On Feb 6, 2008 12:17 AM, devj
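Put together, a minimal sketch of the fix in context; the class name SearchTest is a placeholder, and this assumes NutchBean reads the index location (e.g. crawl/indexes) from the searcher.dir property:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.searcher.NutchBean;

    public class SearchTest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.addDefaultResource("nutch-default.xml"); // both files must be on the
        conf.addDefaultResource("nutch-site.xml");    // classpath, i.e. in the conf dir
        NutchBean bean = new NutchBean(conf);         // opens the indexes
      }
    }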

Re: Urgent help reqd.....plz

2008-02-05 Thread Dennis Kubes
Good catch, Susam. Instead of including the files directly through the add methods, an easier way would be this: Configuration conf = NutchConfiguration.create(); Dennis Susam Pal wrote: You have not added nutch-default.xml and nutch-site.xml to your Configuration object. Adding the following
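That is, the two addDefaultResource calls collapse into one factory call; a short sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.util.NutchConfiguration;

    // returns a Configuration with nutch-default.xml and
    // nutch-site.xml already registered as resources
    Configuration conf = NutchConfiguration.create();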

Re: Questions on normalizer and filter related code in Crawl, Injector and Generator

2008-02-05 Thread Dennis Kubes
Susam Pal wrote: I found a few things in the org.apache.nutch.crawl package that I want to ask about. I have three questions. (1) In Injector.java, normalize() happens first and then filter() happens, whereas in Generator.java filter() happens in the map phase and normalize() happens in the reduce phase.

Re: Urgent help reqd.....plz

2008-02-05 Thread Dennis Kubes
devj wrote: Hi, so it finally worked. Thanks, Susam, for the two lines.. now I know what the conf object is for... but now there's another problem: I keep getting a class-not-found error for 'org/apache/commons/cli/ParseException'. I downloaded the commons source package, built it, and added the
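A hedged sketch of the usual fix: instead of building commons from source, put the commons-cli jar that ships in Nutch's lib directory on the classpath (the jar file name below is an assumption; check $NUTCH_HOME/lib for the exact name):

    CLASSPATH=$CLASSPATH:$NUTCH_HOME/lib/commons-cli-2.0-SNAPSHOT.jar
    java -cp "$CLASSPATH" my.search.Main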

RE: QueryFilter runtime exception - incorrect plugins setup?

2008-02-05 Thread Aled Rhys Jones
Thanks, Dennis. The parent of the plugin directory is the nutch directory, isn't it? (nutch-0.9 in my case.) Should I include these directories in my project? How do I reference the required folders in my web application using Eclipse? If they're not physically located within my project folder,

nutch 0.9, mergesegs error

2008-02-05 Thread John Mendenhall
I am running nutch 0.9. I have run nutch mergesegs many times before. The last couple of times I have run it, I get the following errors: - Merging 14 segments to /var/nutch/crawl/mergesegs_dir/20080201220906 SegmentMerger: adding /var/nutch/crawl/segments/20080128132506 SegmentMerger: adding
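For reference, an invocation that produces log lines like these would look roughly as follows; the -dir form merges every segment found under the given directory into a new timestamped segment created beneath the output directory:

    bin/nutch mergesegs /var/nutch/crawl/mergesegs_dir \
        -dir /var/nutch/crawl/segments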

Re: Nutch and Hadoop

2008-02-05 Thread payo
Hi all, how can I configure hadoop-site.xml for these properties: (1) fs.default.name, (2) mapred.job.tracker, (3) mapred.tasktracker.tasks.maximum, and hadoop-site.xml in general? I am working with two machines, one as master node and one as slave. Thanks.
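A hedged sketch of a two-machine hadoop-site.xml covering those three properties; the host name master and the port numbers are placeholders, not from the original mail:

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>master:9000</value>   <!-- namenode runs on the master -->
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>   <!-- jobtracker runs on the master -->
      </property>
      <property>
        <name>mapred.tasktracker.tasks.maximum</name>
        <value>2</value>             <!-- task slots per tasktracker -->
      </property>
    </configuration>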

Limiting Crawl Time

2008-02-05 Thread Paul Stewart
Hi folks... What is the best way to limit crawling to, say, 3-4 hours per day? Is there a way to do this? Right now, I have a crawl depth of 6 and a maximum of 100 pages per site. I thought this would keep things pretty small, but during some test crawls my last crawl took 2.5 days to complete:

Re: Limiting Crawl Time

2008-02-05 Thread Susam Pal
Did you try specifying a topN value? -depth 3 -topN 1000 should be close to what you want. On 2/6/08, Paul Stewart [EMAIL PROTECTED] wrote: Hi folks... What is the best way to limit crawling to, say, 3-4 hours per day? Is there a way to do this? Right now, I have a crawl depth of 6
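With the one-shot crawl command that would look roughly like this; the urls and crawl directory names are the conventional ones, assumed here:

    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000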