I have a 3-node cluster. I moved the Solr server onto one of the nodes rather than have the master node do both the master work and serve Solr. I tried to crawl 100k URLs again last night and the job failed with too many fetch failures during the map phase and shuffle errors during the reduce phase. This just started happening - the only recent changes to the cluster are the dedicated Solr server and the addition of a Dell 2850 as a node. Here is my hadoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>


<property>
  <name>fs.default.name</name>
  <value>hdfs://opel:9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>opel:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>30</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/filesystem/name</value>
</property>

<property>
 <name>fs.checkpoint.dir</name>
 <value>/home/hadoop/filesystem/name2</value>
 <final>true</final>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

</configuration>
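
Since you mentioned it might be network related, one thing I was thinking of trying is raising the number of HTTP threads the tasktrackers use to serve map output to the reducers - I have read that shuffle fetch failures can sometimes be eased that way. Something along these lines in hadoop-site.xml (just a guess on my part, I have not applied it yet; 80 is an arbitrary bump over the default of 40):

<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
  <description>
    Number of worker threads for the tasktracker's HTTP server, which
    serves map output to the reduce tasks (the default is 40).
  </description>
</property>

I am also going to double-check that every node can resolve every other node's hostname consistently (e.g. via /etc/hosts), since I gather inconsistent hostname resolution is a common cause of "Too many fetch failures".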

Let me know if you need any other information - I have no idea how to fix this problem.

Thanks,

Eric

On Nov 20, 2009, at 1:30 AM, Julien Nioche wrote:

It was probably a one-off, network-related problem. Can you tell us a bit
more about your cluster configuration?

2009/11/19 Eric Osgood <e...@lakemeadonline.com>

Julien,

Thanks for your help. How would I go about fixing this error now that it is
diagnosed?


On Nov 19, 2009, at 1:50 PM, Julien Nioche wrote:

It could be a communication problem between the node and the master. It is
not a fetching problem in the Nutch sense of the term but a Hadoop-related
issue.

2009/11/19 Eric Osgood <e...@lakemeadonline.com>

This is the first time I have received this error while crawling. During a
crawl of 100K pages, one of the nodes had a task fail and cited "Too Many
Fetch Failures" as the reason. The job completed successfully but took
about 3 times longer than normal. Here is the log output:


2009-11-19 11:19:56,377 WARN mapred.TaskTracker - Error running child
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:197)
    at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:1575)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.hadoop.util.LineReader.close(LineReader.java:91)
    at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:169)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:198)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)
2009-11-19 11:19:56,380 WARN mapred.TaskRunner - Parent died. Exiting attempt_200911191100_0001_m_000029_1
2009-11-19 11:20:21,135 WARN mapred.TaskRunner - Parent died. Exiting attempt_200911191100_0001_r_000004_1

Can anyone tell me how to resolve this error?

Thanks,


Eric Osgood
---------------------------------------------
Cal Poly - Computer Engineering, Moon Valley Software
---------------------------------------------
eosg...@calpoly.edu, e...@lakemeadonline.com
---------------------------------------------
www.calpoly.edu/~eosgood, www.lakemeadonline.com




--
DigitalPebble Ltd
http://www.digitalpebble.com

