FYI, I recreated a new filesystem from scratch to hold the HDFS data and increased its size until the put operation succeeded. It took a filesystem of at least 650 MB to be able to copy a 100 KB file. I grew the space in 10 MB increments to narrow down that value.
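(For what it's worth, that ~650 MB floor seems to line up with how the default block placement policy sizes its free-space check: as far as I can tell, a DataNode is only chosen as a write target when it has roughly 5 x dfs.blocksize bytes remaining, i.e. about 640 MB with the 2.2.0 default block size of 128 MB. Below is a minimal sketch, not verified on this setup, of how to compare those numbers and, for a throw-away single node only, shrink the block size instead of growing the filesystem; the 1 MB value and the hdfs-site.xml location are assumptions, not something confirmed in this thread.)

# Sketch: compare the DataNode's remaining space against ~5 x dfs.blocksize,
# which appears to be the free-space threshold behind the observed ~650 MB floor.
hdfs dfsadmin -report | grep -E 'Configured Capacity|Non DFS Used|DFS Remaining'
hdfs getconf -confKey dfs.blocksize    # 134217728 (128 MB) by default in 2.2.0

# For a tiny sandbox node, a smaller block size lowers that threshold.
# The property below would go INSIDE the <configuration> element of
# etc/hadoop/hdfs-site.xml (assumed location); restart HDFS afterwards.
cat <<'EOF'
<property>
  <name>dfs.blocksize</name>
  <value>1048576</value>  <!-- 1 MB: for local experimentation only -->
</property>
EOF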
Here is the output of dfsadmin -report:

Configured Capacity: 684486656 (652.78 MB)
Present Capacity: 682922849 (651.29 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used: 136033 (132.84 KB)
DFS Used%: 0.02%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Live datanodes:
Name: 127.0.0.1:50010 (feynman.cids.ca)
Hostname: feynman.cids.ca
Decommission Status : Normal
Configured Capacity: 684486656 (652.78 MB)
DFS Used: 136033 (132.84 KB)
Non DFS Used: 1563807 (1.49 MB)
DFS Remaining: 682786816 (651.16 MB)
DFS Used%: 0.02%
DFS Remaining%: 99.75%
Last contact: Tue Dec 03 22:01:05 EST 2013

-----------------
Daniel Savard

2013/12/3 Daniel Savard <[email protected]>

> Adam and others,
>
> I solved my problem by increasing the filesystem holding the data by 3 GB. I didn't try to increase it in smaller steps, so I don't know exactly at which point I had enough space for HDFS to work properly. Is there anywhere in the documentation a list of guidelines and requirements for the filesystem(s)? And I suppose it is possible to use much less space, provided some parameter(s) is/are properly configured (namenode?). Are there any worksheets to plan disk space capacity for a given configuration (standalone single node or complete cluster)?
>
> -----------------
> Daniel Savard
>
>
> 2013/12/3 Daniel Savard <[email protected]>
>
>> Adam,
>>
>> here is the link:
>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
>>
>> Then, since it didn't work, I tried a number of things, but my configuration files are really skinny and there isn't much stuff in them.
>>
>> -----------------
>> Daniel Savard
>>
>>
>> 2013/12/3 Adam Kawa <[email protected]>
>>
>>> Could you please send me a link to the documentation that you followed to set up your single-node cluster?
>>> I will go through it and do it step by step, so hopefully at the end your issue will be solved and the documentation will be improved.
>>>
>>> If you have any non-standard settings in core-site.xml, hdfs-site.xml and hadoop-env.sh (that were not suggested by the documentation that you followed), then please share them.
>>>
>>>
>>> 2013/12/3 Daniel Savard <[email protected]>
>>>
>>>> Adam,
>>>>
>>>> that's not the issue, I did substitute the name in the first report. The actual hostname is feynman.cids.ca.
>>>>
>>>> -----------------
>>>> Daniel Savard
>>>>
>>>>
>>>> 2013/12/3 Adam Kawa <[email protected]>
>>>>
>>>>> Daniel,
>>>>>
>>>>> I see that in the previous hdfs report you had hosta.subdom1.tld1, but now you have feynman.cids.ca. What is the content of your /etc/hosts file, and the output of the hostname command?
>>>>>
>>>>>
>>>>> 2013/12/3 Daniel Savard <[email protected]>
>>>>>
>>>>>> I did that more than once; I just retried it from the beginning. I zapped the directories, recreated them with hdfs namenode -format, restarted HDFS, and I am still getting the very same error.
>>>>>>
>>>>>> I have posted the report previously. Is there anything in this report that indicates I do not have enough free space somewhere? That's the only thing I can see that may cause this problem, after everything I read on the subject.
>>>>>> I am new to Hadoop and I just want to set up a standalone node to start experimenting with it for a while before going ahead with a complete cluster.
>>>>>>
>>>>>> I repost the report for convenience:
>>>>>>
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> Present Capacity: 534421504 (509.66 MB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> DFS Used%: 0.00%
>>>>>> Under replicated blocks: 0
>>>>>> Blocks with corrupt replicas: 0
>>>>>> Missing blocks: 0
>>>>>>
>>>>>> -------------------------------------------------
>>>>>> Datanodes available: 1 (1 total, 0 dead)
>>>>>>
>>>>>> Live datanodes:
>>>>>> Name: 127.0.0.1:50010 (feynman.cids.ca)
>>>>>> Hostname: feynman.cids.ca
>>>>>> Decommission Status : Normal
>>>>>> Configured Capacity: 2939899904 (2.74 GB)
>>>>>> DFS Used: 4096 (4 KB)
>>>>>> Non DFS Used: 2405478400 (2.24 GB)
>>>>>> DFS Remaining: 534417408 (509.66 MB)
>>>>>> DFS Used%: 0.00%
>>>>>> DFS Remaining%: 18.18%
>>>>>> Last contact: Tue Dec 03 13:37:02 EST 2013
>>>>>>
>>>>>> -----------------
>>>>>> Daniel Savard
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Adam Kawa <[email protected]>
>>>>>>
>>>>>>> Daniel,
>>>>>>>
>>>>>>> It looks like you can only communicate with the NameNode to do "metadata-only" operations (e.g. listing, creating a dir, or an empty file)...
>>>>>>>
>>>>>>> Did you format the NameNode correctly?
>>>>>>> A quite similar issue is described here: http://www.manning-sandbox.com/thread.jspa?messageID=126741. The last reply says: "The most common is that you have reformatted the namenode leaving it in an inconsistent state. The most common solution is to stop dfs, remove the contents of the dfs directories on all the machines, run 'hadoop namenode -format' on the controller, then restart dfs. That consistently fixes the problem for me. This may be serious overkill but it works."
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/3 Daniel Savard <[email protected]>
>>>>>>>
>>>>>>>> Thanks Arun,
>>>>>>>>
>>>>>>>> I already read and did everything recommended at the referred URL. There isn't any error message in the logfiles. The only error message appears when I try to put a non-empty file on HDFS, as posted above. Besides that, absolutely nothing in the logs tells me something is wrong with the configuration so far.
>>>>>>>>
>>>>>>>> Is there some sort of diagnostic tool that can query/ping each server to make sure it responds properly to requests? When I try to put my file, I see nothing in the datanode log; the message appears in the namenode log. Is this the expected behavior, or should I see at least some kind of request message in the datanode logfile?
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Daniel Savard
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/12/2 Arun C Murthy <[email protected]>
>>>>>>>>
>>>>>>>>> Daniel,
>>>>>>>>>
>>>>>>>>> Apologies if you had a bad experience. If you can point the problems out to us, we'd be more than happy to fix them - alternately, we'd *love* it if you could help us improve the docs too.
>>>>>>>>>
>>>>>>>>> Now, for the problem at hand: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo is one place to look. Basically the NN cannot find any datanodes. Anything in your NN logs to indicate trouble?
>>>>>>>>>
>>>>>>>>> Also, please feel free to open JIRAs with the issues you find and we'll help.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Arun
>>>>>>>>>
>>>>>>>>> On Dec 2, 2013, at 8:44 AM, Daniel Savard <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> André,
>>>>>>>>>
>>>>>>>>> Good for you that the sparse instructions on the reference page were enough to set up your cluster. However, read them again and see how many assumptions are made in them about what you are supposed to already know, without anything more being said about it.
>>>>>>>>>
>>>>>>>>> I did try the single node setup; it is worse than the cluster setup as far as the instructions go. You are supposed to already have a nearly working system as far as I understand the instructions. It is assumed that HDFS is already set up and working properly. Try to find the instructions to set up HDFS for version 2.2.0 and you will end up with a lot of inappropriate instructions for previous versions (some properties were renamed).
>>>>>>>>>
>>>>>>>>> It may seem harsh to say this is toxic, but it is. The first place a newcomer will go is the single node setup. This will be his starting point and he will be left with a bunch of a priori assumptions and no clue.
>>>>>>>>>
>>>>>>>>> To go back to my actual problem at this point:
>>>>>>>>>
>>>>>>>>> 13/12/02 11:34:07 WARN hdfs.DFSClient: DataStreamer Exception
>>>>>>>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
>>>>>>>>> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>>>>>>>>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>>>>>>>>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>>>>>>>>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:387)
>>>>>>>>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59582)
>>>>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>>>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
>>>>>>>>>
>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>>>>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>>>>>>>>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>>>>>>> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>>>>>>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>>>>>>> at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>>>>>>>>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:330)
>>>>>>>>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1226)
>>>>>>>>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
>>>>>>>>> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
>>>>>>>>>
>>>>>>>>> I can copy an empty file, but as soon as its content is non-zero I get this message. Searching on the message has been of no help so far.
>>>>>>>>>
>>>>>>>>> I also skimmed through the cluster instructions and found nothing there that could help in any way either.
>>>>>>>>>
>>>>>>>>> -----------------
>>>>>>>>> Daniel Savard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Andre Kelpe <[email protected]>
>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> First of all, before posting to a mailing list, take a deep breath and let your frustrations out. Then write the email.
>>>>>>>>>> Using words like "crappy", "toxicware", and "nightmare" is not going to help you get useful responses.
>>>>>>>>>>
>>>>>>>>>> While I agree that the docs can be confusing, we should try to stay constructive. You haven't mentioned which documentation you are using. I found the cluster tutorial sufficient to get me started:
>>>>>>>>>>
>>>>>>>>>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
>>>>>>>>>>
>>>>>>>>>> If you are looking for an easy way to spin up a small cluster with hadoop 2.2, try the hadoop2 branch of this vagrant setup:
>>>>>>>>>>
>>>>>>>>>> https://github.com/fs111/vagrant-hadoop-cluster/tree/hadoop2
>>>>>>>>>>
>>>>>>>>>> - André
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 5:34 AM, Daniel Savard <[email protected]> wrote:
>>>>>>>>>> > I am trying to configure hadoop 2.2.0 from source code and I find the instructions really crappy and incomplete. It is as if they were written to prevent someone from doing the job himself, so that he must contract someone else to do it or buy a packaged version.
>>>>>>>>>> >
>>>>>>>>>> > I have been struggling with this stuff for about three days, with partial success. The documentation is less than clear, and most of the material out there applies to earlier versions and hasn't been updated for version 2.2.0.
>>>>>>>>>> >
>>>>>>>>>> > I was able to set up HDFS, however I am still unable to use it. I am doing a single node installation, and the instruction page doesn't explain anything besides telling you to do this and that, without documenting what each thing does, what choices are available, or what guidelines you should follow. There are even environment variables you are told to set, but nothing is said about what they mean or to which values they should be set. It seems to assume prior knowledge of everything about hadoop.
>>>>>>>>>> >
>>>>>>>>>> > Does anyone know a site with proper documentation about hadoop, or is it hopeless and this whole thing is just a piece of toxicware?
>>>>>>>>>> >
>>>>>>>>>> > I am already looking for alternative solutions to hadoop, which for sure will be a nightmare to manage and install each time a new version or release becomes available.
>>>>>>>>>> >
>>>>>>>>>> > TIA
>>>>>>>>>> > -----------------
>>>>>>>>>> > Daniel Savard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> André Kelpe
>>>>>>>>>> [email protected]
>>>>>>>>>> http://concurrentinc.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Arun C. Murthy
>>>>>>>>> Hortonworks Inc.
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> CONFIDENTIALITY NOTICE
>>>>>>>>> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited.
>>>>>>>>> If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
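For readers hitting the same "could only be replicated to 0 nodes instead of minReplication (=1)" error, the reset-and-retest sequence discussed in this thread (stop DFS, clear the dfs directories, reformat the NameNode, restart, retry the put) looks roughly like the sketch below. The paths, the HADOOP_HOME variable, and the sample file are assumptions based on a default 2.2.0 single-node setup, not commands taken from the thread; adjust them to your own dfs.namenode.name.dir and dfs.datanode.data.dir settings.

# Rough sketch only; this erases all HDFS data on the node. Paths assume the
# defaults (hadoop.tmp.dir = /tmp/hadoop-$USER) and that $HADOOP_HOME points
# at the Hadoop 2.2.0 install directory.
"$HADOOP_HOME"/sbin/stop-dfs.sh
rm -rf /tmp/hadoop-"$USER"/dfs        # default name/data dirs; adjust if overridden
hdfs namenode -format
"$HADOOP_HOME"/sbin/start-dfs.sh

# Verify the DataNode registered with block capacity, then retry the failing put:
hdfs dfsadmin -report
hdfs dfs -put /etc/hosts /test        # a non-empty file, like the one that was failing
hdfs dfs -cat /test | head            # read it back to confirm the write reached a DataNode
# A successful write should also leave block-receiving messages in the DataNode
# log, which speaks to the earlier question about what to expect there.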
