Hi Ognen,

I noticed you asked this question before under a different subject line. First, you need to tell us where you see the unbalanced space: is it on HDFS or on the local disk?

1) HDFS is independent of MR. They are not related to each other.
2) Without MR1 or MR2 (YARN), HDFS should work by itself, which means all HDFS commands and APIs will just work.
3) But to copy files into HDFS using distcp, you need the MR component (it doesn't matter whether it is MR1 or MR2), because distcp uses MapReduce to do the massively parallel copying of files.
4) Your original problem is that when you ran the distcp command, you hadn't started the MR component in your cluster, so distcp in fact copied your files to the LOCAL file system, based on someone else's reply to your original question. I haven't tested this myself, but I'm inclined to believe it.
5) If the above is true, then on the node where you ran the distcp command, you should find these files in the local file system under the path you specified. You should check and verify that.
6) After you started YARN/ResourceManager, you saw the imbalance after running distcp again. Where is this imbalance, in HDFS or in the local file system? List the commands and their outputs here, so we can understand your problem clearly instead of being misled by a description in words.
7) My suggestion is that after you start YARN/ResourceManager, you run some of the example MR jobs that ship with Hadoop to make sure your cluster is working normally, then try your distcp command again.

Thanks,
Yong
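The checks suggested above can be sketched as commands. This is a hedged sketch for Hadoop 2.x: the example-jar path, the local path, and the bucket/target names are placeholders, not taken from the thread.

```shell
# Sanity-check MR/YARN with a bundled example job before retrying distcp
# (jar path/version varies by install; this is the usual 2.x layout):
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

# Check whether the earlier distcp wrote to the LOCAL filesystem
# on the node where the command was run (path is a placeholder):
ls -lh /path/you/specified

# Then retry the copy; distcp itself runs as a MapReduce job:
hadoop distcp s3n://your-bucket/prefix hdfs:///target/dir
```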
Date: Wed, 29 Jan 2014 06:38:54 -0600
Subject: Re: Configuring hadoop 2.2.0
From: [email protected]
To: [email protected]

So, the question is: do I or don't I need to run the YARN/ResourceManager/NodeManager combination in addition to HDFS? My impression was what you are saying: that HDFS is independent of the MR component. Thanks! :)

Ognen

On Wed, Jan 29, 2014 at 6:37 AM, Ognen Duzlevski <[email protected]> wrote:

Harsh,

Thanks for your reply. What happens is this: I have about 70 files, each about 20GB in size, in an Amazon S3 bucket. I fetched them from the bucket in a for loop, file by file, using the distcp command from a single node. When I look at the distribution of space consumed on the HDFS cluster now, the node I ran the command on has 70% of its space taken up while the rest of the nodes are at 10% local space usage. All of the nodes started out with the same local space of 1.6TB mounted in the same exact partition, /extra (ephemeral space on an Amazon instance put into a RAID0 array). Hence, the distribution of space is not balanced. However, I did discover the start-balancer.sh script and ran it with -threshold 5. It has been running since yesterday; maybe the 5% balancing threshold is too aggressive?

Ognen

On Wed, Jan 29, 2014 at 4:08 AM, Harsh J <[email protected]> wrote:

I don't believe what you've been told is correct (IIUC). HDFS is an independent component and does not require the presence of YARN (or MR) to function correctly. What exactly do you mean when you say "files are only stored on the node that uses the hdfs command"? Does your "hdfs dfs -ls /" show a local FS result list or does it show a true HDFS directory listing? Your problem may simply be one of configuring clients right, depending on this.

On Wed, Jan 29, 2014 at 12:52 AM, Ognen Duzlevski <[email protected]> wrote:
> Hello,
>
> I have set up an HDFS cluster by running a name node and a bunch of data
> nodes.
> I ran into a problem where the files are only stored on the node that
> uses the hdfs command, and was told that this is because I do not have a
> job tracker and task nodes set up.
>
> However, the documentation for 2.2.0 does not mention any of these (at
> least not this page:
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html).
> I browsed some of the earlier docs and they do mention job tracker nodes
> etc.
>
> So, for 2.2.0 - what is the way to set this up? Do I need a separate
> machine to be the "job tracker"? Did this job tracker node change its
> name to something else in the current docs?
>
> Thanks,
> Ognen

--
Harsh J
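For reference, the diagnostic steps discussed in this thread map onto a few standard commands. This is a sketch, not from the thread itself; /extra is the mount point Ognen described, and the threshold value is only an example.

```shell
# Does -ls show HDFS or the local FS? A true HDFS listing comes from the
# namenode configured as fs.defaultFS in core-site.xml:
hdfs dfs -ls /

# Per-datanode capacity/usage report -- shows whether the imbalance is in HDFS:
hdfs dfsadmin -report

# Compare against local disk usage on the node that ran distcp:
df -h /extra

# Rebalance; the threshold is the allowed spread (in percentage points) of
# per-node usage around the cluster average -- smaller values run longer:
start-balancer.sh -threshold 10
```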
