Re: HDFS out of space

2009-06-22 Thread Alex Loddengaard
Are you seeing any exceptions because of the disk being at 99% capacity?

Hadoop should do something sane here and write new data to the disk with
more capacity.  That said, it is ideal to be balanced.  As far as I know,
there is no way to balance an individual DataNode's hard drives (Hadoop does
round-robin scheduling when writing data).
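
A manual workaround that is sometimes used on Hadoop of this vintage, as a
hedged sketch (the paths below are illustrative; match them to your actual
dfs.data.dir entries): block files are ordinary files and the DataNode
rescans its data directories at startup, so with the DataNode stopped you
can move whole block subdirectories from the full volume to the emptier
one and restart.

% bin/hadoop-daemon.sh stop datanode
# move complete subdir trees, keeping the current/subdirNN layout intact
% mv /mnt/hadoop/dfs/data/current/subdir10 /mnt2/hadoop/dfs/data/current/
% bin/hadoop-daemon.sh start datanode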

Alex

On Mon, Jun 22, 2009 at 10:12 AM, Kris Jirapinyo kjirapi...@biz360.com wrote:

 Hi all,
 How does one handle a mount running out of space for HDFS?  We have two
 disks mounted on /mnt and /mnt2 respectively on one of the machines used
 for HDFS, and /mnt is at 99% while /mnt2 is at 30%.  Is there a way to
 tell the machine to balance itself out?  I know you can balance the
 cluster using start-balancer.sh, but I don't think that will tell an
 individual machine to balance itself out.  Our hack right now would be
 to just delete the data on /mnt; since we have 3x replication, we should
 be OK.  But I'd prefer not to do that.  Any thoughts?



Re: HDFS out of space

2009-06-22 Thread Pankil Doshi
Hey Alex,

Will the Hadoop balancer utility work in this case?

Pankil



Re: HDFS out of space

2009-06-22 Thread Matt Massie

Pankil-

I'd be interested to know the size of the /mnt and /mnt2 partitions.  Are
they the same?  Can you run the following and report the output...


% df -h /mnt /mnt2

Thanks.

-Matt


Re: HDFS out of space

2009-06-22 Thread Allen Wittenauer



Decommission the entire node, wait for the data to be re-replicated,
re-commission it, then run the HDFS balancer.  It blows, no doubt about
it, but the admin tools in this space are... lacking.
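
A sketch of that sequence with the era's admin commands (the exclude-file
path is an example and must match whatever dfs.hosts.exclude points at in
your config):

# add the overfull node to the NameNode's exclude file
% echo full-node.example.com >> /etc/hadoop/excludes
% bin/hadoop dfsadmin -refreshNodes   # NameNode starts re-replicating
# when the NameNode web UI shows the node as "Decommissioned": clear its
# data directories, remove it from the exclude file, then
% bin/hadoop dfsadmin -refreshNodes
% bin/hadoop-daemon.sh start datanode   # on the node itself
% bin/start-balancer.sh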




Re: HDFS out of space

2009-06-22 Thread Usman Waheed
I have used the balancer to balance the data in the cluster with the
-threshold option.  The transfer bandwidth was set to 1 MB/sec (I think
that's the default) in one of the config files, and it had to move about
500GB of data around.  It took some time, but eventually the data got
spread out evenly.  In my case I was using one of the machines as both
the master node and a DataNode at the same time, which is why that one
machine consumed more than the other DataNodes.
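
For reference, a sketch of the setting and command described above
(property name as in the Hadoop docs of this era; the value is bytes per
second, so 1048576 is the 1 MB/s default mentioned):

# in hadoop-site.xml on every DataNode:
#   <property>
#     <name>dfs.balance.bandwidthPerSec</name>
#     <value>1048576</value>
#   </property>
% bin/start-balancer.sh -threshold 5   # run until every node is within
                                       # 5% of cluster-average utilization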


Thanks,
Usman





Re: HDFS out of space

2009-06-22 Thread Pankil Doshi
Matt,

Kris can give that info; I'm just one of the users on the mailing list.

Pankil


Re: HDFS out of space

2009-06-22 Thread Kris Jirapinyo
It's a typical Amazon EC2 Large instance, so 414G each.

-- Kris.
