Also, does anyone know how I can "force" the rebalancer to move more data in one run? At the current settings, it will take about a week to rebalance the nodes ;)
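For reference, the two knobs that seem relevant here (assuming a Hadoop 2.x cluster; the values below are illustrative, not recommendations):

```shell
# Lower the balancer threshold (default 10) so a node counts as over- or
# under-utilized when it deviates more than 5% from the cluster-average
# usage -- more blocks become eligible to move in one run:
hadoop balancer -threshold 5

# The usual bottleneck is the per-datanode balancing bandwidth, which
# defaults to 1 MB/s (dfs.datanode.balance.bandwidthPerSec). It can be
# raised at runtime without a restart; 10485760 bytes = 10 MB/s:
hdfs dfsadmin -setBalancerBandwidth 10485760
```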
Ognen

On Wed, Jan 29, 2014 at 8:12 AM, Ognen Duzlevski <[email protected]> wrote:

> Ahh, OK :)
>
> However, this seems kind of silly - it may be stored on the datanode, but I
> find the need to "force" the balancing manually somewhat strange. I mean,
> why use hdfs://namenode:port/path/file if the copies end up being stored
> locally anyway? ;)
>
> Ognen
>
> On Wed, Jan 29, 2014 at 8:10 AM, Selçuk Şenkul <[email protected]> wrote:
>
>> Try running the command from the namenode, or from another node which is
>> not a datanode; the files should distribute. As far as I know, if you copy
>> a file to HDFS from a datanode, the first copy is stored on that datanode.
>>
>> On Wed, Jan 29, 2014 at 4:05 PM, Ognen Duzlevski <
>> [email protected]> wrote:
>>
>>> Hello (and thanks for replying!) :)
>>>
>>> On Wed, Jan 29, 2014 at 7:38 AM, java8964 <[email protected]> wrote:
>>>
>>>> Hi, Ognen:
>>>>
>>>> I noticed you were asking this question before under a different
>>>> subject line. I think you need to tell us where you see the unbalanced
>>>> space: is it on HDFS or on the local disk?
>>>>
>>>> 1) HDFS is independent of MR. They are not related to each other.
>>>
>>> OK, good to know.
>>>
>>>> 2) Without MR1 or MR2 (YARN), HDFS works by itself, which means all
>>>> HDFS commands and APIs will just work.
>>>
>>> Good to know. Does this also mean that when I put or distcp a file to
>>> hdfs://namenode:54310/path/file, it will "decide" how to split the file
>>> across all the datanodes so that the nodes are utilized equally in
>>> terms of space?
>>>
>>>> 3) But when you try to copy a file into HDFS using distcp, you need
>>>> the MR component (it doesn't matter whether it is MR1 or MR2), as
>>>> distcp uses MapReduce to do the massively parallel copying of files.
>>>
>>> Understood.
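If it helps, Selçuk's point about the first replica being stored locally can be checked directly: fsck reports which datanodes hold each block replica (the path below is just the one from this thread):

```shell
# List every block of the file and the datanodes holding its replicas;
# when the write came from a datanode, one replica of each block is
# typically local to that node:
hadoop fsck /test/file -files -blocks -locations
```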
>>>> 4) Your original problem is that when you ran the distcp command, you
>>>> hadn't started the MR component in your cluster, so distcp in fact
>>>> copied your files to the LOCAL file system, based on someone else's
>>>> reply to your original question. I haven't tested this myself, but I
>>>> am inclined to believe it.
>>>
>>> Sure. But even if distcp is running in one thread, its destination is
>>> hdfs://namenode:54310/path/file - should this not ensure an equal
>>> "split" of files across the whole HDFS cluster? Or am I delusional? :)
>>>
>>>> 5) If the above is true, then on the node where you were running the
>>>> distcp command, these files should exist on the local file system, in
>>>> the path you specified. You should check and verify that.
>>>
>>> OK - so the command is this:
>>>
>>> hadoop --config /etc/hadoop distcp s3n://<credentials>@bucket/file
>>> hdfs://10.10.0.198:54310/test/file
>>>
>>> where 10.10.0.198 is the HDFS namenode. I am running this on
>>> 10.10.0.200, which is one of the datanodes, and I am making no mention
>>> of the local datanode storage in this command. My expectation is that
>>> the files obtained this way from S3 will end up distributed somewhat
>>> evenly across all of the 16 datanodes in this HDFS cluster. Am I wrong
>>> to expect this?
>>>
>>>> 6) After you started YARN/the ResourceManager, you saw the imbalance
>>>> after you ran distcp again. Where is this imbalance? In HDFS or in the
>>>> local file system? List the commands and outputs here, so we can
>>>> understand your problem more clearly, instead of sometimes being
>>>> misled by your words.
>>>
>>> The imbalance is as follows: the machine I run the distcp command on
>>> (one of the datanodes) ends up with 70+% of the space it contributes to
>>> the HDFS cluster occupied with these files, while the rest of the
>>> datanodes in the cluster only get 10% of their contributed space
>>> occupied.
>>> Since HDFS is a distributed, parallel file system, I would expect the
>>> occupied file space to be spread evenly, or somewhat evenly, across all
>>> the datanodes.
>>>
>>> Thanks!
>>> Ognen
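For what it's worth, the per-node imbalance described above can be quantified from the namenode's report (a sketch using only standard dfsadmin output):

```shell
# The per-datanode "Name:" and "DFS Used%" lines make the skew obvious
# at a glance:
hadoop dfsadmin -report | grep -E 'Name:|DFS Used%'
```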
