So, the question is: do I or don't I need to run the YARN ResourceManager/NodeManager combination in addition to HDFS? My impression matches what you are saying - that HDFS is independent of the MR component.
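To illustrate Harsh's point that HDFS runs without YARN: a minimal sketch of bringing up only the HDFS daemons on a configured 2.2.0 install. This assumes `HADOOP_HOME` points at the install and the `etc/hadoop` config files (`core-site.xml`, `hdfs-site.xml`, `slaves`) are already set up; it is not runnable without a live cluster.

```shell
# Assumption: HADOOP_HOME points at a configured Hadoop 2.2.0 install.
# Start only the HDFS daemons (NameNode, DataNodes, SecondaryNameNode).
# No YARN ResourceManager/NodeManager is required for HDFS itself.
$HADOOP_HOME/sbin/start-dfs.sh

# Sanity check from a client node: this should print a true HDFS
# directory listing, not the local filesystem's root. If it shows
# local paths, the client's fs.defaultFS is likely misconfigured.
$HADOOP_HOME/bin/hdfs dfs -ls /
```

If `hdfs dfs -ls /` mirrors the local filesystem, the client is falling back to `file:///`, which would also explain files appearing only on the node where the command was run.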
Thanks! :)
Ognen

On Wed, Jan 29, 2014 at 6:37 AM, Ognen Duzlevski <[email protected]> wrote:
> Harsh,
>
> Thanks for your reply. What happens is this: I have about 70 files, all
> about 20GB in size, in an Amazon S3 bucket. I got them from the bucket in
> a for loop, file by file, using the distcp command from a single node.
>
> When I look at the distribution of space consumed on the HDFS cluster
> now, the node I ran the command on has 70% of its space taken up, while
> the rest of the nodes are at 10% local space usage. All of the nodes
> started out with the same local space of 1.6TB, mounted in the same exact
> partition /extra (ephemeral space on an Amazon instance put into a RAID0
> array).
>
> Hence, the distribution of space is not balanced.
>
> However, I did discover the start-balancer.sh script and ran it with
> -threshold 5. It has been running since yesterday; maybe the 5% balancing
> threshold is too much?
>
> Ognen
>
> On Wed, Jan 29, 2014 at 4:08 AM, Harsh J <[email protected]> wrote:
>> I don't believe what you've been told is correct (IIUC). HDFS is an
>> independent component and does not require the presence of YARN (or MR)
>> to function correctly.
>>
>> What exactly do you mean when you say "files are only stored on the
>> node that uses the hdfs command"? Does your "hdfs dfs -ls /" show a
>> local FS / result list, or does it show a true HDFS directory listing?
>> Your problem may simply be configuring clients right, depending on
>> this.
>>
>> On Wed, Jan 29, 2014 at 12:52 AM, Ognen Duzlevski
>> <[email protected]> wrote:
>>> Hello,
>>>
>>> I have set up an HDFS cluster by running a name node and a bunch of
>>> data nodes. I ran into a problem where the files are only stored on
>>> the node that uses the hdfs command, and was told that this is because
>>> I do not have a job tracker and task nodes set up.
>>>
>>> However, the documentation for 2.2.0 does not mention any of these (at
>>> least not this page:
>>> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html).
>>> I browsed some of the earlier docs, and they do mention job tracker
>>> nodes etc.
>>>
>>> So, for 2.2.0 - what is the way to set this up? Do I need a separate
>>> machine to be the "job tracker"? Did this job tracker node change its
>>> name to something else in the current docs?
>>>
>>> Thanks,
>>> Ognen
>>
>> --
>> Harsh J
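For reference, the imbalance Ognen describes is expected behavior: when a client writing to HDFS runs on a datanode, the first replica of each block is placed on that local datanode, so copying everything from a single node fills that node first. The rebalancing step he mentions can be sketched as follows, assuming a running HDFS cluster with the `hdfs` CLI on the PATH; these commands need a live cluster and are shown here only as a sketch.

```shell
# Assumption: a running HDFS cluster, hdfs CLI on PATH.
# Show per-datanode capacity and "DFS Used%" to confirm the skew:
hdfs dfsadmin -report

# Run the balancer until each datanode's utilization is within 5
# percentage points of the cluster average. This is what
# start-balancer.sh -threshold 5 invokes under the hood; a large
# threshold finishes faster, a small one balances more tightly.
hdfs balancer -threshold 5
```

With ~70 files of ~20GB each, moving on the order of a terabyte between nodes can legitimately take many hours; the balancer also throttles itself via the `dfs.datanode.balance.bandwidthPerSec` setting, so a long run is not by itself a sign that the 5% threshold is wrong.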
