Re: HDFS out of space
Are you seeing any exceptions because of the disk being at 99% capacity? Hadoop should do something sane here and write new data to the disk with more capacity. That said, it is ideal to be balanced. As far as I know, there is no way to balance an individual DataNode's hard drives (Hadoop does round-robin scheduling when writing data).

Alex

On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo kjirapi...@biz360.com wrote:

Hi all,

How does one handle a mount running out of space for HDFS? We have two disks mounted on /mnt and /mnt2, respectively, on one of the machines that are used for HDFS, and /mnt is at 99% while /mnt2 is at 30%. Is there a way to tell the machine to balance itself out? I know that for the cluster you can balance it using start-balancer.sh, but I don't think that it will tell the individual machine to balance itself out.

Our hack right now would be just to delete the data on /mnt; since we have 3x replication, we should be OK. But I'd prefer not to do that. Any thoughts?
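To make the round-robin point concrete, here is a simplified Python model of how a DataNode might spread new blocks across its data directories. It assumes (this is not confirmed in the thread) that volumes are tried in a fixed cyclic order and that a volume without room for a whole block is skipped; the class and function names are hypothetical, and the 64 MB block size is an assumed default. What it illustrates: new blocks are split evenly regardless of free space, so the gap between /mnt and /mnt2 does not shrink on its own.

```python
from dataclasses import dataclass
from typing import List

GB = 1024 ** 3
BLOCK_SIZE = 64 * 1024 ** 2  # assumed 64 MB HDFS block size


@dataclass
class Volume:
    """One data directory (such as /mnt or /mnt2) on a single DataNode."""
    path: str
    capacity: int  # bytes
    used: int      # bytes

    @property
    def available(self) -> int:
        return self.capacity - self.used


class RoundRobinChooser:
    """Simplified, hypothetical model of round-robin volume selection.

    Volumes are tried in a fixed cyclic order; a volume that cannot hold a
    whole block is skipped, and an error is raised if none has room.
    """

    def __init__(self, volumes: List[Volume]) -> None:
        self.volumes = volumes
        self.next_index = 0

    def choose(self, block_size: int) -> Volume:
        for _ in range(len(self.volumes)):
            vol = self.volumes[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.volumes)
            if vol.available >= block_size:
                return vol
        raise OSError(f"no volume has room for a {block_size}-byte block")


def write_blocks(chooser: RoundRobinChooser, count: int, block_size: int) -> None:
    """Place `count` new blocks, charging each one to the chosen volume."""
    for _ in range(count):
        chooser.choose(block_size).used += block_size


if __name__ == "__main__":
    # Illustrative numbers only: two 414 GB disks, one nearly full.
    mnt = Volume("/mnt", 414 * GB, 409 * GB)
    mnt2 = Volume("/mnt2", 414 * GB, 124 * GB)
    write_blocks(RoundRobinChooser([mnt, mnt2]), 200, BLOCK_SIZE)
    for v in (mnt, mnt2):
        print(f"{v.path}: {100 * v.used / v.capacity:.1f}% used")
```

With the illustrative numbers in the `__main__` section (not measured from Kris's machine), /mnt fills to 100.0% and only then does every further block land on /mnt2, which ends at about 31.8%.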
Re: HDFS out of space
Hey Alex, Will Hadoop balancer utility work in this case?

Pankil

On Mon, Jun 22, 2009 at 4:30 PM, Alex Loddengaard a...@cloudera.com wrote:

As far as I know, there is no way to balance an individual DataNode's hard drives (Hadoop does round-robin scheduling when writing data).
Re: HDFS out of space
Pankil- I'd be interested to know the size of the /mnt and /mnt2 partitions. Are they the same? Can you run the following and report the output...

% df -h /mnt /mnt2

Thanks.
-Matt

On Jun 22, 2009, at 1:32 PM, Pankil Doshi wrote:

Hey Alex, Will Hadoop balancer utility work in this case? Pankil
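If you want the same per-mount percentages from a script rather than `df -h`, a small Python sketch follows. `mount_usage` is a hypothetical helper name; it uses the standard library's `shutil.disk_usage` and divides used space by used plus available space, which approximates df's Use% column (df also rounds up, so the last digit may differ).

```python
import os
import shutil
import sys
from typing import Dict, Iterable


def mount_usage(paths: Iterable[str]) -> Dict[str, float]:
    """Percent used per mount, computed roughly the way df's Use% is.

    df reports used / (used + available), where "available" excludes blocks
    reserved for root; on POSIX systems shutil.disk_usage's `free` is that
    same non-root figure.
    """
    usage = {}
    for path in paths:
        _total, used, free = shutil.disk_usage(path)
        denom = used + free
        usage[path] = 100.0 * used / denom if denom else 0.0
    return usage


if __name__ == "__main__":
    # Default to the two mounts from the thread; skip any that don't exist here.
    candidates = sys.argv[1:] or ["/mnt", "/mnt2"]
    existing = [p for p in candidates if os.path.exists(p)]
    for path, pct in mount_usage(existing).items():
        print(f"{path}: {pct:.1f}% used")
```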
Re: HDFS out of space
Decommission the entire node, wait for the data to be re-replicated, re-commission it, then do an HDFS rebalance. It blows, no doubt about it, but the admin tools in this space are... lacking.
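The decommission step usually means listing the node in the NameNode's exclude file and asking the NameNode to re-read it. The sketch below only handles the file bookkeeping, assuming the file is the one named by the `dfs.hosts.exclude` property with one hostname per line; `set_excluded` is a hypothetical helper, not part of Hadoop.

```python
from pathlib import Path


def set_excluded(exclude_file: str, host: str, excluded: bool) -> bool:
    """Add (excluded=True) or remove (excluded=False) `host` in an exclude file.

    Hypothetical helper: assumes `exclude_file` is the file named by the
    dfs.hosts.exclude property, with one hostname per line. Returns True if
    the file was changed, False if it already had the requested state.
    """
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if excluded:
        if host in hosts:
            return False
        hosts.append(host)
    else:
        if host not in hosts:
            return False
        hosts = [h for h in hosts if h != host]  # drop every copy of the host
    path.write_text("".join(h + "\n" for h in hosts))
    return True
```

One way to use it for the procedure above: call `set_excluded(path, host, True)` and run `hadoop dfsadmin -refreshNodes`; wait until `hadoop dfsadmin -report` shows the node as decommissioned; call `set_excluded(path, host, False)` and refresh again (the DataNode may also need a restart to rejoin); then run `start-balancer.sh`. `path` and `host` are placeholders for your exclude file and DataNode hostname.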
Re: HDFS out of space
I have used the balancer to balance the data in the cluster with the -threshold option. The bandwidth was set to 1 MB/sec (I think that's the default setting) in one of the config files, and I had to move 500 GB of data around. It did take some time, but eventually the data got spread out evenly. In my case I was using one of the machines as the master node and a DataNode at the same time, which is why this one machine consumed more space than the other DataNodes.

Thanks,
Usman
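Two pieces of arithmetic behind this experience, sketched in Python. `classify_nodes` is a simplified, hypothetical version of the balancer's `-threshold` test: a DataNode counts as over- or under-utilized when its percent used is more than `threshold` points away from the cluster-wide average (the real balancer distinguishes more categories). The unit here is the whole DataNode, which is why the balancer does not even out two disks inside one machine. `min_transfer_hours` gives a lower bound for data that must leave one node through a stream throttled to the balancer bandwidth (in releases of that era the property is likely `dfs.balance.bandwidthPerSec`; check your version).

```python
from typing import Dict, Tuple

MB = 1024 ** 2
GB = 1024 ** 3


def classify_nodes(nodes: Dict[str, Tuple[float, float]],
                   threshold: float = 10.0) -> Dict[str, str]:
    """Label each DataNode against the cluster-wide utilization.

    `nodes` maps a node name to (used, capacity) in any consistent unit.
    The cluster average is total used / total capacity, not a mean of
    per-node percentages.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    average = 100.0 * total_used / total_cap
    labels = {}
    for name, (used, cap) in nodes.items():
        pct = 100.0 * used / cap
        if pct > average + threshold:
            labels[name] = "over-utilized"
        elif pct < average - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "balanced"
    return labels


def min_transfer_hours(data_bytes: float, bytes_per_sec: float) -> float:
    """Lower bound on time to push data through one throttled stream."""
    return data_bytes / bytes_per_sec / 3600


if __name__ == "__main__":
    # Hypothetical cluster (GB used, GB capacity); "master" also runs a DataNode.
    print(classify_nodes({"master": (900, 1000), "dn1": (300, 1000), "dn2": (300, 1000)}))
    # Usman's figures: 500 GB through a 1 MB/s limit.
    print(f"{min_transfer_hours(500 * GB, 1 * MB):.1f} hours")
```

For 500 GB at 1 MB/s the bound is about 142.2 hours, close to six days, if all of it had to leave a single node; transfers from several nodes can overlap, so a real run may finish sooner.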
Re: HDFS out of space
Matt, Kris can give that info. I am one of the users from the mailing list.

Pankil

On Mon, Jun 22, 2009 at 4:37 PM, Matt Massie m...@cloudera.com wrote:

Pankil- I'd be interested to know the size of the /mnt and /mnt2 partitions. Are they the same?
Re: HDFS out of space
It's a typical Amazon EC2 Large instance, so 414G each.

-- Kris.
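With these numbers (two 414G disks at 99% and 30%), a quick Python calculation shows how much data would have to move to even the disks out, treating the percentages as exact. `disk_rebalance_plan` is a hypothetical helper.

```python
def disk_rebalance_plan(capacity_gb: float, pct_a: float, pct_b: float):
    """Return (used_a, used_b, target_each, to_move) in GB for two equal disks.

    A positive `to_move` means data would go from disk a to disk b.
    """
    used_a = capacity_gb * pct_a / 100
    used_b = capacity_gb * pct_b / 100
    target = (used_a + used_b) / 2  # equal share of the combined usage
    return used_a, used_b, target, used_a - target


if __name__ == "__main__":
    used_a, used_b, target, move = disk_rebalance_plan(414, 99, 30)
    print(f"/mnt {used_a:.2f}G, /mnt2 {used_b:.2f}G, target {target:.2f}G each "
          f"({100 * target / 414:.1f}%), move about {move:.2f}G")
```

That works out to roughly 143G moving from /mnt to /mnt2, leaving each disk about 64.5% full.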