Using NFS without HDFS
Hello, I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed filesystem.

Box A: 192.168.2.3, this box is both the NFS server and a slave node.
Box B: 192.168.2.30, this box is only the JobTracker.
Box C: 192.168.2.31, this box is only a slave.

Obviously all three nodes can access the NFS share, and the path to the share is /home/slitz/warehouse on all three. My hadoop-site.xml file was copied to all nodes and looks like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
    <description>The name of the default file system. Either the literal
    string "local" or a host:port for NDFS.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.30:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce
    task.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/slitz/warehouse/hadoop_service/system</value>
    <description>omgrotfcopterlol.</description>
  </property>
</configuration>

As one can see, I'm not using HDFS at all. (All the free space I have is located on only one node, so using HDFS would be unnecessary overhead.)

I've copied the input folder from the Hadoop distribution to /home/slitz/warehouse/input. When I try to run the example with the line

bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/ /home/slitz/warehouse/output 'dfs[a-z.]+'

the job starts and finishes okay, but at the end I get this error:

org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist : /home/slitz/hadoop-0.15.3/grep-temp-141595661
        at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...the error stack continues...)

I don't know why the input path being looked up is under the local path /home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...). Maybe something is missing in my hadoop-site.xml?

slitz
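(The temp path in the error is the telling detail. Below is a minimal sketch of how a relative path resolves against the default filesystem, assuming the 0.15-era setting fs.default.name=local from the post above; the FileSystem calls are standard public API, but the class name is made up for the example and the printed path is an assumption, not output from this thread.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: where does a relative path land when the default
// filesystem is the node-local one?
public class RelativePathDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads hadoop-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);     // the local filesystem under fs.default.name=local
        // The grep example names its intermediate directory with a relative
        // path; this mirrors that (the number is copied from the error above).
        Path tempDir = new Path("grep-temp-141595661");
        // Qualifying the path resolves it against this process's working
        // directory, e.g. file:/home/slitz/hadoop-0.15.3/grep-temp-141595661
        // when the job was submitted from HADOOP_HOME.
        System.out.println(fs.makeQualified(tempDir));
    }
}

On that reading, each process resolves such relative paths against its own node-local working directory, which would explain why the intermediate directory ends up under HADOOP_HOME on one box while the other nodes cannot see it.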
Re: Using NFS without HDFS
I've read in the archives that it should be possible to use any distributed filesystem, since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archives that this should be possible...

slitz

On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi [EMAIL PROTECTED] wrote:

Hello,

To execute a Hadoop Map-Reduce job, the input data should be on HDFS, not on NFS.

Thanks
---
Peeyush

On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:
(...original message quoted in full above...)
Re: Using NFS without HDFS
slitz wrote:
I've read in the archives that it should be possible to use any distributed filesystem, since the data is available to all nodes, so it should be possible to use NFS, right? I've also read somewhere in the archives that this should be possible...

As far as I know, you can refer to any file on a mounted file system (visible from all compute nodes) using the prefix file:// before the full path, unless another prefix has been specified.

Cheers,
Luca

(...the rest of the quoted thread continues as above...)
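(A quick way to verify what Luca describes, as a sketch: the FileSystem calls below are standard public API, but the class name is made up for the example and nothing here comes from the thread itself.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative check: a file:// URI resolves to the local (here, NFS-mounted)
// filesystem regardless of what fs.default.name is set to.
public class FileUriCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("file:///home/slitz/warehouse/input");
        FileSystem fs = input.getFileSystem(conf); // local filesystem for file://
        System.out.println("input exists on this node: " + fs.exists(input));
    }
}

Run on each of the three boxes, this should print true everywhere if the share is mounted consistently.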
Re: Using NFS without HDFS
Thank you for the file:/// tip, I was not including it in the paths. I'm now running the example with this line:

bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input file:///home/slitz/warehouse/output 'dfs[a-z.]+'

But I'm getting the same error as before:

org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist : /home/slitz/hadoop-0.15.3/grep-temp-1030179831
        at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...the error stack continues...)

I think the problem may be the input path; it should be pointing to some path in the NFS share, right? The grep-temp-* dir is being created in the HADOOP_HOME of Box A (192.168.2.3).

slitz

On Fri, Apr 11, 2008 at 4:06 PM, Luca [EMAIL PROTECTED] wrote:
(...previous message quoted in full above...)
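(One hedged guess at a next step, consistent with slitz's last observation: if the grep example creates its intermediate directory as a relative path on the default filesystem, then the directories the framework itself writes to, not just the job input and output, have to live on the share. The property names below are real configuration keys of that era, but the values, and the fix itself, are assumptions rather than a confirmed resolution.)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
    <description>Guess: make scheme-less paths resolve on the local
    (NFS-backed) filesystem explicitly. If this 0.15.x build does not
    accept a file:/// URI here, the literal string "local" is the
    equivalent setting.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/slitz/warehouse/hadoop_tmp</value>
    <description>Assumption: hadoop_tmp is a made-up directory on the
    share; many framework paths default to locations under this
    key.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/slitz/warehouse/hadoop_service/system</value>
    <description>Unchanged from the original post; already on the
    share.</description>
  </property>
</configuration>

Even then, a relative grep-temp-* path would still resolve against each process's working directory, so submitting the job from a directory on the share is another variable worth testing.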