Hi,

We have an 11-node cluster. Every hour a cron job runs on node1 and uploads one file (~1GB) to Hadoop (a plain hadoop fs -put).
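For reference, the cron entry is essentially this (the paths are made up, but the shape is the same; note that % has to be escaped as \% inside a crontab):

    # runs on node1 at the top of every hour; uploads the hourly export to HDFS
    0 * * * * hadoop fs -put /data/export/hourly.dat /ingest/hourly-$(date +\%Y\%m\%d\%H).dat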
This is filling up node1, because HDFS always places the first replica on the datanode where the command is executed. I run the balancer every day, but that does not seem to be enough. The effect is:

    host1:      4.7TB / 5.3TB
    host[2-10]: 4.1TB / 5.3TB

So I am always out of space on host1.

What I could do is spread the job across all the nodes and execute it on a random host each time (see the sketch below), but I don't really like this solution as it involves NFS mounts, security issues, etc.
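To be concrete, this is the kind of thing I mean (hostnames and paths are hypothetical; it assumes passwordless ssh from the cron host to every node and that GNU shuf is available):

    #!/bin/sh
    # Pick a random node and run the upload there instead of always on node1,
    # so the first replica lands on a different datanode each hour.
    HOSTS="node1 node2 node3 node4 node5 node6 node7 node8 node9 node10 node11"
    TARGET=$(echo $HOSTS | tr ' ' '\n' | shuf -n 1)
    # The source file must be visible on $TARGET, e.g. via an NFS mount.
    ssh "$TARGET" "hadoop fs -put /data/export/hourly.dat /ingest/"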
Is there any better solution?

Thanks in advance,
George