Re: Question on DFS Balancing
You can safely move block files between disks. Follow the instructions here:
http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F

On Tue, Mar 4, 2014 at 11:47 PM, divye sheth wrote:
> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth

--
Harsh J
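For anyone who wants to script the manual move, here is a minimal sketch, not a tested tool. It assumes the DataNode is stopped first (as I understand the FAQ entry to require), and that each block file `blk_<id>` sits next to exactly one companion `blk_<id>_<genstamp>.meta` file. The function names `plan_moves` and `apply_moves` and the directory paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def plan_moves(src_dir, bytes_to_move):
    """Select (block, meta) pairs from src_dir until roughly bytes_to_move is covered."""
    src = Path(src_dir)
    moves, total = [], 0
    for blk in sorted(src.glob("blk_*")):
        if total >= bytes_to_move:
            break
        if blk.suffix == ".meta" or not blk.is_file():
            continue
        # A block must travel together with its checksum (.meta) file.
        metas = list(src.glob(blk.name + "_*.meta"))
        if len(metas) != 1:
            continue  # skip anything that does not look like a clean pair
        moves.append((blk, metas[0]))
        total += blk.stat().st_size + metas[0].stat().st_size
    return moves


def apply_moves(moves, dst_dir, dry_run=True):
    """Move each pair into dst_dir; with dry_run=True only print the plan."""
    dst = Path(dst_dir)
    for blk, meta in moves:
        for f in (blk, meta):
            target = dst / f.name
            if dry_run:
                print(f"would move {f} -> {target}")
            else:
                shutil.move(str(f), str(target))
```

A cautious way to use it is to call `apply_moves(plan_moves("/data1/dfs/data/current", 50 * 2**30), "/data2/dfs/data/current")` (hypothetical paths) with the default `dry_run=True`, review the printed plan, and only then rerun with `dry_run=False` before restarting the DataNode.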
Re: Question on DFS Balancing
It doesn't need any downtime. It works just like the Balancer, except that this tool moves blocks peer to peer: you specify the source node and the destination node, then start it.

On Wed, Mar 5, 2014 at 5:12 PM, divye sheth wrote:
> Does this require any downtime? I guess it should and any other
> precautions that I should take?
> Thanks Azuryy.
Re: Question on DFS Balancing
Does this require any downtime? I guess it should. Are there any other precautions that I should take?

Thanks Azuryy.

On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu wrote:
> you can write a simple tool to move blocks peer to peer. I had such tool
> before, but I cannot find it now.
>
> background: our cluster is not balanced, load balancer is very slow, so i
> wrote this tool to move blocks from one node to another node.
Re: Question on DFS Balancing
You can write a simple tool to move blocks peer to peer. I had such a tool before, but I cannot find it now.

Background: our cluster is not balanced and the balancer is very slow, so I wrote this tool to move blocks from one node to another.

On Wed, Mar 5, 2014 at 4:06 PM, divye sheth wrote:
> I wont be in a position to fix that depending on HDFS-1804 as we are
> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
> have read somewhere that manual movement of the blocks would help. Could
> some one guide me to the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
Re: Question on DFS Balancing
I won't be in a position to fix that based on HDFS-1804, as we are upgrading to CDH4 in the coming month. I just wanted a short-term solution. I have read somewhere that manually moving the blocks would help. Could someone guide me to the exact steps or precautions I should take while doing this? Data loss is a NO NO for me.

Thanks
Divye Sheth

On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu wrote:
> Hi,
> That probably break something if you apply the patch from 2.x to 0.20.x,
> but it depends on.
>
> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
> yourself based on HDFS-1804.
Re: Question on DFS Balancing
Hi,

Applying the patch from 2.x to 0.20.x will probably break something, but it depends.

AFAIK, the Balancer had a major refactor in HDFSv2, so you'd better fix it yourself based on HDFS-1804.

On Wed, Mar 5, 2014 at 3:47 PM, divye sheth wrote:
> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
Re: Question on DFS Balancing
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop 0.20.2 (we are in the process of upgrading). Is there a short-term workaround to balance the disk utilization? If the patch in the JIRA is applied to the version I am using, will it break anything?

Thanks
Divye Sheth

On Wed, Mar 5, 2014 at 11:28 AM, Harsh J wrote:
> You're probably looking for
> https://issues.apache.org/jira/browse/HDFS-1804
>
> --
> Harsh J
Re: Question on DFS Balancing
You're probably looking for
https://issues.apache.org/jira/browse/HDFS-1804

On Tue, Mar 4, 2014 at 5:54 AM, divye sheth wrote:
> Hi,
>
> I am new to the mailing list.
>
> I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> have is related to balancing. I have a 5 datanode cluster and each node has
> 2 disks attached to it. The second disk was added when the first disk was
> reaching its capacity.
>
> Now the scenario that I am facing is, when the new disk was added hadoop
> automatically moved over some data to the new disk. But over the time I
> notice that data is no longer being written to the second disk. I have also
> faced an issue on the datanode where the first disk had 100% utilization.
>
> How can I overcome such scenario, is it not hadoop's job to balance the disk
> utilization between multiple disks on single datanode?
>
> Thanks
> Divye Sheth

--
Harsh J
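To illustrate why a nearly full disk can reach 100% while a newly added disk still has room, here is a toy simulation, not Hadoop code. It assumes the DataNode picks volumes round-robin (my understanding of 0.20.x behaviour) and compares that with a simplified "most free space first" rule in the spirit of HDFS-1804; the actual policy added there is more nuanced. The function names and the free-space units are made up for the example.

```python
def round_robin(free, n_blocks, block_size=1):
    """Place blocks on volumes in turn, skipping volumes that are full."""
    free = list(free)
    nxt = 0
    for _ in range(n_blocks):
        for step in range(len(free)):
            vol = (nxt + step) % len(free)
            if free[vol] >= block_size:
                free[vol] -= block_size
                nxt = (vol + 1) % len(free)
                break
        else:
            raise RuntimeError("no volume has room for the block")
    return free


def most_free_first(free, n_blocks, block_size=1):
    """Simplified space-aware rule: always write to the volume with the most free space."""
    free = list(free)
    for _ in range(n_blocks):
        vol = max(range(len(free)), key=lambda v: free[v])
        if free[vol] < block_size:
            raise RuntimeError("no volume has room for the block")
        free[vol] -= block_size
    return free


if __name__ == "__main__":
    start = [10, 90]  # free space on the old disk and on the new disk
    print(round_robin(start, 20))      # [0, 80]: the old disk is now full
    print(most_free_first(start, 20))  # [10, 70]: the old disk is left alone
```

With round-robin placement the old disk takes half of the new writes and fills up long before the new disk does, which is the kind of imbalance a space-aware policy is meant to avoid.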