Hi Manuel,

2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3
2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
This indicates you don't have enough space on HDFS. Can you check the cluster capacity used?

On Thu, Jul 4, 2013 at 12:14 AM, Manuel de Ferran <[email protected]> wrote:

> Greetings all,
>
> we are trying to import data into an HDFS cluster, but we hit a seemingly random exception. We are trying to figure out the root cause (misconfiguration, too much load, ...) and how to solve it.
>
> The client writes hundreds of files with a replication factor of 3. It sometimes crashes at the beginning, sometimes close to the end, and in rare cases it succeeds.
>
> On failure, we see on the client side:
>
> DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>         ...
>
> which seems to be a well-known error. We have followed the hints from the Troubleshooting page, but we're still stuck: lots of disk space available on the datanodes, free inodes, far below the open-files limit, and all datanodes are up and running.
>
> Note that we have other HDFS clients that are still able to write files while the import is running.
> Here is the corresponding extract of the namenode log file:
>
> 2013-07-03 15:03:15,951 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 46009 Total time for transactions(ms): 153 Number of transactions batched in Syncs: 5428 Number of syncs: 32889 SyncTimes(ms): 139555
> 2013-07-03 15:03:16,427 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 3
> 2013-07-03 15:03:16,427 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
> 2013-07-03 15:03:16,427 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9002, call addBlock(/log/1372863795616, DFSClient_1875494617, null) from 192.168.1.141:41376: error: java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File /log/1372863795616 could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
>         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>
> During the process, fsck reports about 300 open files. The cluster is running hadoop-1.0.3.
> Any advice on the configuration? We tried lowering dfs.heartbeat.interval and raised dfs.datanode.max.xcievers to 4k. Maybe raise dfs.datanode.handler.count?
>
> Thanks for your help
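For reference, the three settings mentioned above all live in hdfs-site.xml on Hadoop 1.x. A sketch of how they would be set; the values shown are illustrative defaults or the poster's own values, not tuning recommendations:

```xml
<!-- hdfs-site.xml (Hadoop 1.x); illustrative values, not recommendations -->
<property>
  <name>dfs.heartbeat.interval</name>
  <!-- datanode heartbeat period in seconds; 3 is the default -->
  <value>3</value>
</property>
<property>
  <!-- note: this property name really is spelled "xcievers" in Hadoop 1.x -->
  <name>dfs.datanode.max.xcievers</name>
  <!-- max concurrent data-transfer threads per datanode; 4096 is the 4k the poster set -->
  <value>4096</value>
</property>
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- IPC handler threads on each datanode; the 1.x default is 3 -->
  <value>10</value>
</property>
```

Datanodes need a restart to pick these up.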
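On the capacity question raised at the top of the thread: on Hadoop 1.x, `hadoop dfsadmin -report` prints cluster-wide and per-datanode capacity counters. A minimal sketch of extracting the headline numbers from such a report in Python; SAMPLE_REPORT and the capacity_stats helper are illustrative stand-ins, not part of Hadoop:

```python
import re

# Illustrative excerpt in the shape of a `hadoop dfsadmin -report` header;
# in practice you would feed in the real command output.
SAMPLE_REPORT = """\
Configured Capacity: 100000000000 (93.13 GB)
Present Capacity: 80000000000 (74.51 GB)
DFS Remaining: 8000000000 (7.45 GB)
DFS Used: 72000000000 (67.06 GB)
DFS Used%: 90%
"""

def capacity_stats(report):
    """Return the byte counters found in the report header as a dict."""
    stats = {}
    for key in ("Configured Capacity", "Present Capacity",
                "DFS Remaining", "DFS Used"):
        m = re.search(r"^%s: (\d+)" % re.escape(key), report, re.MULTILINE)
        if m:
            stats[key] = int(m.group(1))
    return stats

stats = capacity_stats(SAMPLE_REPORT)
used_pct = 100.0 * stats["DFS Used"] / stats["Present Capacity"]
print("DFS used: %.0f%% of present capacity" % used_pct)
```

A cluster this close to full would reproduce the "could only be replicated to 0 nodes" error even when individual disks look fine, since the namenode excludes nearly-full datanodes from block placement.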
