Hi,

I am configuring Nutch to crawl the web across several machines (for now
I want to test with only one).
Building Nutch with ant was successful. Then I ran:

   bin/hadoop namenode -format
   bin/start-all.sh

The logs from both commands look correct.

  bin/hadoop dfs -put urls urls
  bin/hadoop dfs -ls

This lists the urls directory correctly.

But when I launch the crawl, the fetcher starts but does not print any
parsing messages, and the crawl stops at the second depth. The
crawl-urlfilter and nutch-default files are configured correctly, because
they work fine with the local filesystem (instead of HDFS). I guess the
problem is that nutch-site is empty.

What should its content be?
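
As far as I understand, nutch-site.xml uses the same property format as the
Hadoop files below, and anything set in it overrides nutch-default.xml. A
minimal sketch of what it might look like follows. http.agent.name is a
property defined in nutch-default.xml; the value here is a made-up
placeholder, and I do not know whether this is what is missing in my setup.

nutch-site.xml (sketch, not my actual file):

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Sketch only: the value below is a hypothetical placeholder. -->

<configuration>

<property>
  <name>http.agent.name</name>
  <value>my-test-crawler</value>
  <description>
    Placeholder agent name (an assumption, not a recommended value).
  </description>
</property>

</configuration>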

core-site.xml:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000/</value>
  <description>
    The name of the default file system. Either the literal string 
    "local" or a host:port for NDFS.
  </description>
</property>

</configuration>


---------------------------------------

hdfs-site.xml:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>dfs.name.dir</name>
  <value>/root/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/root/filesystem/data</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

</configuration>


---------------------------------------


mapred-site.xml:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:9001/</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If 
    "local", then jobs are run in-process as a single map and 
    reduce task.
  </description>
</property>

<property> 
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description> 
</property> 

<property> 
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description> 
</property> 

<property>
  <name>mapred.system.dir</name>
  <value>/root/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/root/filesystem/mapreduce/local</value>
</property>

</configuration>
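
For comparison, the Hadoop quick start documentation writes
mapred.job.tracker as a plain host:port pair rather than a URI. A sketch of
that form, assuming the same host and port as above (whether this has any
effect on the crawl problem is not established here):

<property>
  <name>mapred.job.tracker</name>
  <!-- Assumption: same host and port as above, without the hdfs:// scheme. -->
  <value>localhost:9001</value>
</property>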
-- 
View this message in context: 
http://old.nabble.com/Configurin-nutch-site.xml-tp27245750p27245750.html
Sent from the Nutch - User mailing list archive at Nabble.com.
