Can you run the load from an "edge node" that is not a DataNode?

John
John Lilley
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 300 | Boulder, CO 80302
T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077
Skype: jlilley.redpoint | [email protected] | www.redpoint.net

-----Original Message-----
From: Georgi Ivanov [mailto:[email protected]]
Sent: Wednesday, September 03, 2014 1:56 AM
To: [email protected]
Subject: HDFS balance

Hi,

We have an 11-node cluster. Every hour a cron job uploads one file (~1 GB) to Hadoop from node1 (a plain hadoop fs -put). As a result node1 keeps filling up, because the first replica is always stored on the node where the command is executed. I run a rebalance every day, but that does not seem to be enough.

The effect is:
host1: 4.7TB/5.3TB
host[2-10]: 4.1TB/5.3TB

So I am always out of space on host1. What I can do is spread the job across all the nodes and execute it on a random host. I don't really like this solution, as it involves NFS mounts, security issues, etc.

Is there any better solution?

Thanks in advance.
George
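
For illustration only, here is a rough Python sketch of why the edge-node question matters. It is a toy model, not HDFS code: it assumes a single rack, one block per upload, and ignores free space and the balancer. The function name simulate_uploads and its parameters are made up for this example. The only behaviour it models is that the first replica goes to the writing host when that host is a DataNode, and to a randomly chosen DataNode otherwise.

```python
import random


def simulate_uploads(num_nodes=11, uploads=1000, replication=3,
                     writer=None, seed=0):
    """Count replicas per DataNode after `uploads` single-block writes.

    writer: index of the DataNode running the put, or None when the
            client is an edge node that is not a DataNode.
    """
    rng = random.Random(seed)
    replicas = [0] * num_nodes
    for _ in range(uploads):
        # First replica: local if the client is a DataNode, else random.
        first = writer if writer is not None else rng.randrange(num_nodes)
        # Remaining replicas: distinct nodes other than the first
        # (rack awareness is deliberately left out of this toy model).
        others = [n for n in range(num_nodes) if n != first]
        for node in [first] + rng.sample(others, replication - 1):
            replicas[node] += 1
    return replicas


local = simulate_uploads(writer=0)     # put always run on node1
edge = simulate_uploads(writer=None)   # put run from an edge node

print("put on node1:    ", local)
print("put on edge node:", edge)
```

In this model node1 receives a replica of every block when the put runs there, while from an edge node the replicas spread roughly evenly over all 11 DataNodes.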
