It is better to create one client/gateway node (a machine where no DataNode is running) and schedule your cron job from there. When the client is not a DataNode, HDFS has no local node to favor, so the first replica is placed on a random DataNode and no single host fills up.
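As a minimal sketch, the hourly upload could live in the gateway node's crontab like this (the file path and target directory are assumptions, substitute your own):

```shell
# crontab on the gateway/client node (no DataNode running here).
# Because this host is not a DataNode, HDFS chooses a random DataNode
# for the first replica instead of always using the local node.
0 * * * * hadoop fs -put /data/export/hourly.dat /ingest/
```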
Thanks & Regards,
B Anil Kumar.

On Wed, Sep 3, 2014 at 1:25 PM, Georgi Ivanov <[email protected]> wrote:
> Hi,
> We have an 11-node cluster.
> Every hour a cron job is started to upload one file (~1GB) to Hadoop on
> node1 (plain hadoop fs -put).
>
> This way node1 is getting full, because the first replica is always
> stored on the node where the command is executed.
> Every day I run a re-balance, but this seems to be not enough.
> The effect of this is:
> host1: 4.7TB/5.3TB
> host[2-10]: 4.1TB/5.3TB
>
> So I am always out of space on host1.
>
> What I could do is spread the job across all the nodes and execute it
> on a random host.
> I don't really like this solution, as it involves some NFS mounts,
> security issues, etc.
>
> Is there any better solution?
>
> Thanks in advance.
> George
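As a side note on the daily re-balance mentioned above: the balancer only moves data until every DataNode is within a threshold of the cluster-average utilization, and the default threshold is 10 percentage points. A sketch of a tighter daily run (the schedule is an assumption):

```shell
# Daily crontab entry: run the HDFS balancer with a tighter threshold.
# -threshold 5 keeps each DataNode within 5 percentage points of the
# cluster-average utilization, instead of the default 10.
0 3 * * * hdfs balancer -threshold 5
```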
