Thank you very much. This is what I am trying to do, and this is the storage I have:
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda2      5.9G  5.3G  238M  96% /
/dev/xvda4      7.9G  147M  7.4G   2% /mnt

I have configured dfs.datanode.data.dir in hdfs-site.xml:

<name>dfs.datanode.data.dir</name>
<value>/mnt</value>

I have formatted the namenode and restarted, but it still copies to /, and once / is full it throws an error instead of copying to /mnt.

Error:

14/10/03 15:23:21 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hduser/getty/data4" - Aborting...
put: java.io.IOException: File /user/hduser/getty/data4 could only be replicated to 0 nodes, instead of 1
14/10/03 15:23:21 ERROR hdfs.DFSClient: Failed to close file /user/hduser/getty/data4

Am I doing anything wrong here?

Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX
Ph: 281-685-0388

From: ViSolve Hadoop Support <[email protected]>
Reply-To: <[email protected]>
Date: Friday, October 3, 2014 at 1:29 AM
To: <[email protected]>
Subject: Re: No space when running a hadoop job

Hello,

If you want to use drive /dev/xvda4 only, then add the directory location for /dev/xvda4 and remove the directory location for /dev/xvda2 under "dfs.datanode.data.dir". After the changes, restart the Hadoop services and check the available space using the below command:

# hadoop fs -df -h

Regards,
ViSolve Hadoop Team

On 10/3/2014 4:36 AM, Abdul Navaz wrote:
> Hello,
>
> As you suggested I have changed the hdfs-site.xml file of the datanodes and
> namenode as below and formatted the namenode.
>
> </property>
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>/mnt</value>
>   <description>Comma separated list of paths. Use the list of directories from
>   $DFS_DATA_DIR.
>   For example, /grid/hadoop/hdfs/dn,/grid1/hadoop/hdfs/dn.</description>
> </property>
>
> hduser@dn1:~$ df -h
> Filesystem                                       Size  Used Avail Use% Mounted on
> /dev/xvda2                                       5.9G  5.3G  258M  96% /
> udev                                              98M  4.0K   98M   1% /dev
> tmpfs                                             48M  196K   48M   1% /run
> none                                             5.0M     0  5.0M   0% /run/lock
> none                                             120M     0  120M   0% /run/shm
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  113G   70G  62% /groups/ch-geni-net/Hadoop-NET
> 172.17.253.254:/q/proj/ch-geni-net               198G  113G   70G  62% /proj/ch-geni-net
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
> hduser@dn1:~$
>
> Even after doing so, the file is copied only to /dev/xvda2 instead of /dev/xvda4.
>
> Once /dev/xvda2 is full I get the below error message:
>
> hduser@nn:~$ hadoop fs -put file.txtac /user/hduser/getty/file12.txt
> Warning: $HADOOP_HOME is deprecated.
>
> 14/10/02 16:52:52 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hduser/getty/file12.txt could only be replicated to 0 nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
>
> Let me put it like this: I don't want to use /dev/xvda2, as it has a capacity
> of only 5.9GB; I want to use only /dev/xvda4. How can I do this?
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
> From: Abdul Navaz <[email protected]>
> Date: Monday, September 29, 2014 at 1:53 PM
> To: <[email protected]>
> Subject: Re: No space when running a hadoop job
>
> Dear All,
>
> I am not doing load balancing here. I am just copying a file, and it is
> throwing me a "no space left on device" error.
> hduser@dn1:~$ df -h
> Filesystem                                       Size  Used Avail Use% Mounted on
> /dev/xvda2                                       5.9G  5.1G  533M  91% /
> udev                                              98M  4.0K   98M   1% /dev
> tmpfs                                             48M  196K   48M   1% /run
> none                                             5.0M     0  5.0M   0% /run/lock
> none                                             120M     0  120M   0% /run/shm
> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  116G   67G  64% /groups/ch-geni-net/Hadoop-NET
> 172.17.253.254:/q/proj/ch-geni-net               198G  116G   67G  64% /proj/ch-geni-net
> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
> hduser@dn1:~$
> hduser@dn1:~$ cp data2.txt data3.txt
> cp: writing `data3.txt': No space left on device
> cp: failed to extend `data3.txt': No space left on device
> hduser@dn1:~$
>
> I guess by default it is copying to the default location. Why am I getting
> this error? How can I fix this?
>
> Thanks & Regards,
>
> Abdul Navaz
> Research Assistant
> University of Houston Main Campus, Houston TX
> Ph: 281-685-0388
>
> From: Aitor Cedres <[email protected]>
> Reply-To: <[email protected]>
> Date: Monday, September 29, 2014 at 7:53 AM
> To: <[email protected]>
> Subject: Re: No space when running a hadoop job
>
> I think the way it works when HDFS has a list in dfs.datanode.data.dir is
> basically round robin between the disks. And yes, it may not be perfectly
> balanced because of differing file sizes.
>
> On 29 September 2014 13:15, Susheel Kumar Gadalay <[email protected]> wrote:
>
>> Thanks Aitor.
>>
>> That is my observation too.
>>
>> I added a new disk location and manually moved some files.
>>
>> But if 2 locations are given at the beginning itself for
>> dfs.datanode.data.dir, will Hadoop balance the disk usage, even if not
>> perfectly, given that file sizes may differ?
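[Editor's note: the round-robin placement Aitor describes can be sketched roughly as below. This is a hypothetical illustration, not Hadoop source code; the class name and directory paths are made up.]

```python
# Hypothetical sketch of how a DataNode might rotate over the directories
# listed in dfs.datanode.data.dir when placing new block replicas.
# Illustration only -- not the actual Hadoop implementation.
from itertools import cycle


class RoundRobinVolumePicker:
    """Hands out the configured data directories in strict rotation."""

    def __init__(self, volumes):
        self._volumes = cycle(volumes)

    def next_volume(self):
        # Each new block replica lands on the next directory in turn;
        # blocks differ in size, so disk usage balances only roughly.
        return next(self._volumes)


picker = RoundRobinVolumePicker(["/grid/hadoop/hdfs/dn", "/grid1/hadoop/hdfs/dn"])
placements = [picker.next_volume() for _ in range(4)]
print(placements)  # alternates between the two configured directories
```

Because replicas go to volumes in turn regardless of block size, usage across disks evens out only approximately, which matches the behaviour discussed above.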
>> On 9/29/14, Aitor Cedres <[email protected]> wrote:
>>> Hi Susheel,
>>>
>>> Adding a new directory to "dfs.datanode.data.dir" will not balance your
>>> disks straight away. Eventually, through HDFS activity (deleting/invalidating
>>> some blocks, writing new ones), the disks will become balanced. If you want
>>> to balance them right after adding the new disk and changing the
>>> "dfs.datanode.data.dir" value, you have to shut down the DN and manually
>>> move (mv) some files from the old directory to the new one.
>>>
>>> The balancer will try to balance the usage between HDFS nodes, but it won't
>>> care about "internal" node disk utilization. For your particular case, the
>>> balancer won't fix your issue.
>>>
>>> Hope it helps,
>>> Aitor
>>>
>>> On 29 September 2014 05:53, Susheel Kumar Gadalay <[email protected]> wrote:
>>>
>>>> You mean if multiple directory locations are given, Hadoop will
>>>> balance the distribution of files across these different directories.
>>>>
>>>> But normally we start with 1 directory location, and once it is
>>>> reaching the maximum, we add a new directory.
>>>>
>>>> In this case how can we balance the distribution of files?
>>>>
>>>> One way is to list the files and move them.
>>>>
>>>> Will the start-balancer script work?
>>>>
>>>> On 9/27/14, Alexander Pivovarov <[email protected]> wrote:
>>>>> It can read/write in parallel to all drives. More HDDs, more I/O speed.
>>>>> On Sep 27, 2014 7:28 AM, "Susheel Kumar Gadalay" <[email protected]> wrote:
>>>>>
>>>>>> Correct me if I am wrong.
>>>>>>
>>>>>> Adding multiple directories will not balance the file distribution
>>>>>> across these locations.
>>>>>>
>>>>>> Hadoop will exhaust the first directory and then start using the
>>>>>> next, next ...
>>>>>> How can I tell Hadoop to evenly balance across these directories?
>>>>>>
>>>>>> On 9/26/14, Matt Narrell <[email protected]> wrote:
>>>>>>> You can add a comma separated list of paths to the
>>>>>>> "dfs.datanode.data.dir" property in your hdfs-site.xml
>>>>>>>
>>>>>>> mn
>>>>>>>
>>>>>>> On Sep 26, 2014, at 8:37 AM, Abdul Navaz <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I am facing a space issue when saving files into HDFS and/or running a
>>>>>>>> map reduce job.
>>>>>>>>
>>>>>>>> root@nn:~# df -h
>>>>>>>> Filesystem                                       Size  Used Avail Use% Mounted on
>>>>>>>> /dev/xvda2                                       5.9G  5.9G     0 100% /
>>>>>>>> udev                                              98M  4.0K   98M   1% /dev
>>>>>>>> tmpfs                                             48M  192K   48M   1% /run
>>>>>>>> none                                             5.0M     0  5.0M   0% /run/lock
>>>>>>>> none                                             120M     0  120M   0% /run/shm
>>>>>>>> overflow                                         1.0M  4.0K 1020K   1% /tmp
>>>>>>>> /dev/xvda4                                       7.9G  147M  7.4G   2% /mnt
>>>>>>>> 172.17.253.254:/q/groups/ch-geni-net/Hadoop-NET  198G  108G   75G  59% /groups/ch-geni-net/Hadoop-NET
>>>>>>>> 172.17.253.254:/q/proj/ch-geni-net               198G  108G   75G  59% /proj/ch-geni-net
>>>>>>>> root@nn:~#
>>>>>>>>
>>>>>>>> I can see there is no space left on /dev/xvda2.
>>>>>>>>
>>>>>>>> How can I make Hadoop see the newly mounted /dev/xvda4? Or do I need
>>>>>>>> to move the files manually from /dev/xvda2 to /dev/xvda4?
>>>>>>>>
>>>>>>>> Thanks & Regards,
>>>>>>>>
>>>>>>>> Abdul Navaz
>>>>>>>> Research Assistant
>>>>>>>> University of Houston Main Campus, Houston TX
>>>>>>>> Ph: 281-685-0388
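[Editor's note: pulling the advice in this thread together, a minimal hdfs-site.xml sketch for storing blocks only on the /mnt mount (/dev/xvda4). The subdirectory path below is illustrative, not from the thread; pointing at a subdirectory of the mount rather than /mnt itself is a common precaution, so that an unmounted disk fails visibly instead of silently filling /. Note also that the "Warning: $HADOOP_HOME is deprecated" message above suggests a Hadoop 1.x build, where the datanode storage property is named dfs.data.dir; on 1.x, setting only dfs.datanode.data.dir is ignored and blocks go to the default location under hadoop.tmp.dir, which would explain data still landing on /.]

```xml
<!-- Sketch for this thread's setup; the path below is illustrative. -->
<!-- Hadoop 2.x property name: -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/hdfs/dn</value>
</property>
<!-- On Hadoop 1.x the equivalent property is dfs.data.dir: -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/hdfs/dn</value>
</property>
```

After editing, restart the DataNode services and verify capacity with `hadoop fs -df -h` as suggested earlier in the thread.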
