Re: Question on DFS Balancing

2014-03-05 Thread Harsh J
You can safely move block files between disks. Follow the instructions
here: 
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
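
For illustration only, here is a minimal sketch of that kind of manual move,
assuming two hypothetical dfs.data.dir volumes mounted at /data1 and /data2.
It is not the FAQ's own script; it only shows the shape of the procedure
(stop the datanode, move replica and checksum files together, restart):

#!/usr/bin/env python
# Rough sketch only: relocate some block replicas (and their .meta checksum
# files) from a full dfs.data.dir volume to an emptier one on the same
# datanode. The /data1 and /data2 paths and the pair count are hypothetical
# -- adjust them for the real cluster. Run ONLY while the datanode process
# is stopped; on restart it rescans its volumes and finds the moved replicas.
import glob
import os
import shutil

SRC = "/data1/dfs/data/current"   # full volume (hypothetical path)
DST = "/data2/dfs/data/current"   # emptier volume (hypothetical path)
PAIRS_TO_MOVE = 500               # number of block/meta pairs to relocate

def block_pairs(src_root):
    # Yield (block file, meta file) pairs found anywhere under src_root.
    for dirpath, _dirs, names in os.walk(src_root):
        for name in names:
            if name.startswith("blk_") and not name.endswith(".meta"):
                block = os.path.join(dirpath, name)
                metas = glob.glob(block + "_*.meta")
                if metas:
                    yield block, metas[0]

moved = 0
for block, meta in block_pairs(SRC):
    if moved >= PAIRS_TO_MOVE:
        break
    # Keep every replica together with its checksum file; never split a pair.
    shutil.move(block, os.path.join(DST, os.path.basename(block)))
    shutil.move(meta, os.path.join(DST, os.path.basename(meta)))
    moved += 1
print("moved %d block/meta pairs from %s to %s" % (moved, SRC, DST))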

On Tue, Mar 4, 2014 at 11:47 PM, divye sheth  wrote:
> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>> > have is related to balancing. I have a 5 datanode cluster and each node
>> > has
>> > 2 disks attached to it. The second disk was added when the first disk
>> > was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> > also
>> > faced an issue on the datanode where the first disk had 100%
>> > utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> > disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J


Re: Question on DFS Balancing

2014-03-05 Thread Azuryy Yu
It doesn't need any downtime. It works just like the Balancer, but this tool
moves blocks peer to peer: you specify the source node and the destination
node, then start it.
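
Whichever route you take, it is worth verifying afterwards that no replicas
went missing. A minimal sketch, assuming the hadoop command is on the PATH
(the exact wording of the fsck report varies between Hadoop versions, so
treat this as a starting point only):

#!/usr/bin/env python
# Minimal sketch: after moving blocks around (by hand or with a tool), run
# "hadoop fsck /" and flag anything that does not look healthy.
import subprocess
import sys

proc = subprocess.Popen(["hadoop", "fsck", "/"],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
report = proc.communicate()[0].decode("utf-8", "replace")

looks_bad = "CORRUPT" in report or "MISSING" in report
looks_healthy = "HEALTHY" in report

if looks_healthy and not looks_bad:
    print("fsck reports the filesystem as healthy")
    sys.exit(0)
print("fsck reports problems -- investigate before moving more blocks")
sys.exit(1)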


On Wed, Mar 5, 2014 at 5:12 PM, divye sheth  wrote:

> Does this require any downtime? I guess it should and any other
> precautions that I should take?
> Thanks Azuryy.
>
>
> On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu  wrote:
>
>> you can write a simple tool to move blocks peer to peer. I had such tool
>> before, but I cannot find it now.
>>
>> background: our cluster is not balanced, load balancer is very slow, so i
>> wrote this tool to move blocks from one node to another node.
>>
>>
>> On Wed, Mar 5, 2014 at 4:06 PM, divye sheth  wrote:
>>
>>> I wont be in a position to fix that depending on HDFS-1804 as we are
>>> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
>>> have read somewhere that manual movement of the blocks would help. Could
>>> some one guide me to the exact steps or precautions I should take while
>>> doing this? Data loss is a NO NO for me.
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu  wrote:
>>>
 Hi,
 That probably break something if you apply the patch from 2.x to
 0.20.x, but it depends on.

 AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it
 by yourself based on HDFS-1804.



 On Wed, Mar 5, 2014 at 3:47 PM, divye sheth wrote:

> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
> the short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth 
>> wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>> question I
>> > have is related to balancing. I have a 5 datanode cluster and each
>> node has
>> > 2 disks attached to it. The second disk was added when the first
>> disk was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added
>> hadoop
>> > automatically moved over some data to the new disk. But over the
>> time I
>> > notice that data is no longer being written to the second disk. I
>> have also
>> > faced an issue on the datanode where the first disk had 100%
>> utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance
>> the disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>>
>
>

>>>
>>
>


Re: Question on DFS Balancing

2014-03-05 Thread divye sheth
Does this require any downtime? I guess it should. Are there any other
precautions that I should take?
Thanks Azuryy.


On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu  wrote:

> you can write a simple tool to move blocks peer to peer. I had such tool
> before, but I cannot find it now.
>
> background: our cluster is not balanced, load balancer is very slow, so i
> wrote this tool to move blocks from one node to another node.
>
>
> On Wed, Mar 5, 2014 at 4:06 PM, divye sheth  wrote:
>
>> I wont be in a position to fix that depending on HDFS-1804 as we are
>> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
>> have read somewhere that manual movement of the blocks would help. Could
>> some one guide me to the exact steps or precautions I should take while
>> doing this? Data loss is a NO NO for me.
>>
>> Thanks
>> Divye Sheth
>>
>>
>> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu  wrote:
>>
>>> Hi,
>>> That probably break something if you apply the patch from 2.x to 0.20.x,
>>> but it depends on.
>>>
>>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it
>>> by yourself based on HDFS-1804.
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth wrote:
>>>
 Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
 Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
 the short term to balance the disk utilization? The patch in the Jira, if
 applied to the version that I am using, will it break anything?

 Thanks
 Divye Sheth


 On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:

> You're probably looking for
> https://issues.apache.org/jira/browse/HDFS-1804
>
> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth 
> wrote:
> > Hi,
> >
> > I am new to the mailing list.
> >
> > I am using Hadoop 0.20.2 with an append r1056497 version. The
> question I
> > have is related to balancing. I have a 5 datanode cluster and each
> node has
> > 2 disks attached to it. The second disk was added when the first
> disk was
> > reaching its capacity.
> >
> > Now the scenario that I am facing is, when the new disk was added
> hadoop
> > automatically moved over some data to the new disk. But over the
> time I
> > notice that data is no longer being written to the second disk. I
> have also
> > faced an issue on the datanode where the first disk had 100%
> utilization.
> >
> > How can I overcome such scenario, is it not hadoop's job to balance
> the disk
> > utilization between multiple disks on single datanode?
> >
> > Thanks
> > Divye Sheth
>
>
>
> --
> Harsh J
>


>>>
>>
>


Re: Question on DFS Balancing

2014-03-05 Thread Azuryy Yu
You can write a simple tool to move blocks peer to peer. I had such a tool
before, but I cannot find it now.

Background: our cluster was not balanced and the Balancer was very slow, so I
wrote this tool to move blocks from one node to another.


On Wed, Mar 5, 2014 at 4:06 PM, divye sheth  wrote:

> I wont be in a position to fix that depending on HDFS-1804 as we are
> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
> have read somewhere that manual movement of the blocks would help. Could
> some one guide me to the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu  wrote:
>
>> Hi,
>> That probably break something if you apply the patch from 2.x to 0.20.x,
>> but it depends on.
>>
>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
>> yourself based on HDFS-1804.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth  wrote:
>>
>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>> the short term to balance the disk utilization? The patch in the Jira, if
>>> applied to the version that I am using, will it break anything?
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>>>
 You're probably looking for
 https://issues.apache.org/jira/browse/HDFS-1804

 On Tue, Mar 4, 2014 at 5:54 AM, divye sheth 
 wrote:
 > Hi,
 >
 > I am new to the mailing list.
 >
 > I am using Hadoop 0.20.2 with an append r1056497 version. The
 question I
 > have is related to balancing. I have a 5 datanode cluster and each
 node has
 > 2 disks attached to it. The second disk was added when the first disk
 was
 > reaching its capacity.
 >
 > Now the scenario that I am facing is, when the new disk was added
 hadoop
 > automatically moved over some data to the new disk. But over the time
 I
 > notice that data is no longer being written to the second disk. I
 have also
 > faced an issue on the datanode where the first disk had 100%
 utilization.
 >
 > How can I overcome such scenario, is it not hadoop's job to balance
 the disk
 > utilization between multiple disks on single datanode?
 >
 > Thanks
 > Divye Sheth



 --
 Harsh J

>>>
>>>
>>
>


Re: Question on DFS Balancing

2014-03-05 Thread divye sheth
I won't be in a position to fix that based on HDFS-1804, as we are upgrading
to CDH4 in the coming month; I just wanted a short-term solution. I have read
somewhere that manually moving the blocks would help. Could someone guide me
through the exact steps, and the precautions I should take while doing this?
Data loss is a strict no-no for me.

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu  wrote:

> Hi,
> That probably break something if you apply the patch from 2.x to 0.20.x,
> but it depends on.
>
> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
> yourself based on HDFS-1804.
>
>
>
> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth  wrote:
>
>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>> the short term to balance the disk utilization? The patch in the Jira, if
>> applied to the version that I am using, will it break anything?
>>
>> Thanks
>> Divye Sheth
>>
>>
>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>>
>>> You're probably looking for
>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>
>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth 
>>> wrote:
>>> > Hi,
>>> >
>>> > I am new to the mailing list.
>>> >
>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question
>>> I
>>> > have is related to balancing. I have a 5 datanode cluster and each
>>> node has
>>> > 2 disks attached to it. The second disk was added when the first disk
>>> was
>>> > reaching its capacity.
>>> >
>>> > Now the scenario that I am facing is, when the new disk was added
>>> hadoop
>>> > automatically moved over some data to the new disk. But over the time I
>>> > notice that data is no longer being written to the second disk. I have
>>> also
>>> > faced an issue on the datanode where the first disk had 100%
>>> utilization.
>>> >
>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>> the disk
>>> > utilization between multiple disks on single datanode?
>>> >
>>> > Thanks
>>> > Divye Sheth
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>


Re: Question on DFS Balancing

2014-03-04 Thread Azuryy Yu
Hi,
Applying the patch from 2.x to 0.20.x will probably break something, but it
depends.

AFAIK, the Balancer had a major refactor in HDFS v2, so you'd better fix it
yourself based on HDFS-1804.



On Wed, Mar 5, 2014 at 3:47 PM, divye sheth  wrote:

> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:
>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>> > have is related to balancing. I have a 5 datanode cluster and each node
>> has
>> > 2 disks attached to it. The second disk was added when the first disk
>> was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> also
>> > faced an issue on the datanode where the first disk had 100%
>> utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>>
>
>


Re: Question on DFS Balancing

2014-03-04 Thread divye sheth
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop
0.20.2 (we are in the process of upgrading). Is there a short-term workaround
to balance the disk utilization? If the patch in the JIRA is applied to the
version that I am using, will it break anything?

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 11:28 AM, Harsh J  wrote:

> You're probably looking for
> https://issues.apache.org/jira/browse/HDFS-1804
>
> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
> > Hi,
> >
> > I am new to the mailing list.
> >
> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> > have is related to balancing. I have a 5 datanode cluster and each node
> has
> > 2 disks attached to it. The second disk was added when the first disk was
> > reaching its capacity.
> >
> > Now the scenario that I am facing is, when the new disk was added hadoop
> > automatically moved over some data to the new disk. But over the time I
> > notice that data is no longer being written to the second disk. I have
> also
> > faced an issue on the datanode where the first disk had 100% utilization.
> >
> > How can I overcome such scenario, is it not hadoop's job to balance the
> disk
> > utilization between multiple disks on single datanode?
> >
> > Thanks
> > Divye Sheth
>
>
>
> --
> Harsh J
>


Re: Question on DFS Balancing

2014-03-04 Thread Harsh J
You're probably looking for https://issues.apache.org/jira/browse/HDFS-1804

On Tue, Mar 4, 2014 at 5:54 AM, divye sheth  wrote:
> Hi,
>
> I am new to the mailing list.
>
> I am using Hadoop 0.20.2 with append (r1056497). My question is related to
> balancing. I have a 5-datanode cluster, and each node has 2 disks attached
> to it. The second disk was added when the first disk was reaching its
> capacity.
>
> The scenario I am facing is this: when the new disk was added, Hadoop
> automatically moved some data over to the new disk. But over time I have
> noticed that data is no longer being written to the second disk. I have
> also hit an issue on a datanode where the first disk reached 100%
> utilization.
>
> How can I overcome this scenario? Isn't it Hadoop's job to balance disk
> utilization between the multiple disks on a single datanode?
>
> Thanks
> Divye Sheth



-- 
Harsh J