DC> What version of the mapred branch are you running? I fixed a bug a week
DC> and a half ago that could cause this. There was a filehandle leak that
DC> resulted in this error after a tasktracker had run more than around 800
DC> tasks. If you have not updated your code recently, please try that.
The new version seems to have fixed this problem: I haven't seen that
error since. However, a new problem has arisen during indexing (I'm
using mapred revision 291801).
I'm indexing via "./nutch index"; the segments were created by a slightly
modified version of the crawl.Crawl class. With 1-2 segments everything
works fine, but with about 20 segments the task tracker logs on both
servers show the following error block repeating:
050926 180831 task_r_o4tt4z Got 1 map output locations.
050926 180831 Client connection to 127.0.0.1:60218: starting
050926 180831 Server connection on port 60218 from 127.0.0.1: starting
050926 180831 Client connection to 127.0.0.1:60218 caught: java.lang.IndexOutOfBoundsException
java.lang.IndexOutOfBoundsException
at java.io.DataInputStream.readFully(DataInputStream.java:263)
at org.apache.nutch.mapred.MapOutputFile.readFields(MapOutputFile.java:123)
at org.apache.nutch.io.ObjectWritable.readObject(ObjectWritable.java:232)
at org.apache.nutch.io.ObjectWritable.readFields(ObjectWritable.java:60)
at org.apache.nutch.ipc.Client$Connection.run(Client.java:163)
050926 180831 Client connection to 127.0.0.1:60218: closing
050926 180831 Server handler on 60218 caught: java.net.SocketException: Connection reset
java.net.SocketException: Connection reset
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:106)
at java.io.DataOutputStream.write(DataOutputStream.java:85)
at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:98)
at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:117)
at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:64)
at org.apache.nutch.ipc.Server$Handler.run(Server.java:213)
050926 180831 Server connection on port 60218 from 127.0.0.1: exiting
050926 180931 task_r_o4tt4z copy failed: task_m_ypindn from goku1.deeptown.net/127.0.0.1:60218
java.io.IOException: timed out waiting for response
at org.apache.nutch.ipc.Client.call(Client.java:296)
at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
at $Proxy2.getFile(Unknown Source)
at org.apache.nutch.mapred.ReduceTaskRunner.prepare(ReduceTaskRunner.java:94)
at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:61)
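
For reference, the first exception in the block comes out of DataInputStream.readFully, which throws a bare IndexOutOfBoundsException when it is handed a length that does not fit the buffer, for example a negative one. The sketch below only reproduces that behaviour of the JDK call. It is an assumption on my part that something similar happens inside MapOutputFile.readFields; I have not confirmed it, and the ReadFullyDemo class and its method are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadFullyDemo {
    // Hypothetical helper: returns true if readFully rejects the given
    // length with an IndexOutOfBoundsException, false if the read succeeds.
    static boolean throwsIndexOutOfBounds(int len) throws IOException {
        byte[] buffer = new byte[8];
        // Eight bytes of source data, so a valid read of up to 8 bytes succeeds.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[8]));
        try {
            in.readFully(buffer, 0, len);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A length that fits the buffer is read normally.
        System.out.println(throwsIndexOutOfBounds(4));
        // A negative length is rejected before any bytes are read.
        System.out.println(throwsIndexOutOfBounds(-1));
    }
}
```

If the length passed to readFully in readFields is read from the stream itself, a corrupted or overflowed value there would be one way to end up with this exception, but that is a guess.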
Michael