Using NFS without HDFS

2008-04-11 Thread slitz
Hello,
I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed
filesystem.

Box A: 192.168.2.3, this box is both the NFS server and a slave node
Box B: 192.168.2.30, this box is only the JobTracker
Box C: 192.168.2.31, this box is only a slave

All three nodes can access the NFS share, and the path to the share is
/home/slitz/warehouse on all three.

My hadoop-site.xml file was copied to all nodes and looks like this:

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>local</value>
    <description>
      The name of the default file system. Either the literal string
      local or a host:port for NDFS.
    </description>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.30:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If local,
      then jobs are run in-process as a single map and reduce task.
    </description>
  </property>

  <property>
    <name>mapred.system.dir</name>
    <value>/home/slitz/warehouse/hadoop_service/system</value>
    <description>omgrotfcopterlol.</description>
  </property>

</configuration>


As one can see, I'm not using HDFS at all (because all the free space I have
is located on only one node, so using HDFS would be unnecessary overhead).

I've copied the input folder from Hadoop to /home/slitz/warehouse/input.
When I try to run the example line

bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/
/home/slitz/warehouse/output 'dfs[a-z.]+'

the job starts and seems to finish okay, but at the end I get this error:

org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist :
/home/slitz/hadoop-0.15.3/grep-temp-141595661
at
org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...the error stack continues...)

I don't know why the input path being looked up is under the local path
/home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...).

Maybe something is missing in my hadoop-site.xml?



slitz
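
A minimal sketch of one possible reading of the error, assuming (this is my
reading, not something stated in the thread) that the grep example creates its
intermediate "grep-temp-<random>" directory as a relative path, and that with
fs.default.name set to local a relative path is resolved against the directory
bin/hadoop was launched from:

cd /home/slitz/hadoop-0.15.3          # directory the job was submitted from
tempdir="grep-temp-141595661"         # the example's relative intermediate path (from the error)
echo "resolves to: $PWD/$tempdir"     # /home/slitz/hadoop-0.15.3/grep-temp-141595661,
                                      # a node-local path, not a path on the NFS share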


Re: Using NFS without HDFS

2008-04-11 Thread slitz
I've read in the archive that it should be possible to use any distributed
filesystem as long as the data is available to all nodes, so it should be
possible to use NFS, right?
I've also read somewhere in the archive that this should be possible...


slitz


On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi [EMAIL PROTECTED]
wrote:

 Hello,

 To execute a Hadoop Map-Reduce job, the input data should be on HDFS, not
 on NFS.

 Thanks

 ---
 Peeyush



 On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:

  (...original message quoted in full; trimmed, see above...)



Re: Using NFS without HDFS

2008-04-11 Thread Luca

slitz wrote:

I've read in the archive that it should be possible to use any distributed
filesystem as long as the data is available to all nodes, so it should be
possible to use NFS, right?
I've also read somewhere in the archive that this should be possible...



As far as I know, you can refer to any file on a mounted file system 
(visible from all compute nodes) using the prefix file:// before the 
full path, unless another prefix has been specified.


Cheers,
Luca
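
Concretely, applying that prefix to the command from the original post gives
the form slitz tries in the next message:

bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input \
    file:///home/slitz/warehouse/output 'dfs[a-z.]+'

With fs.default.name still set to local, the explicit file:// scheme presumably
only spells out what was already the default interpretation of the input and
output paths; it would not change where the example writes its intermediate data.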



(...rest of quoted message trimmed...)

Re: Using NFS without HDFS

2008-04-11 Thread slitz
Thank you for the file:/// tip, I was not including it in the paths.
I'm running the example with this line:

bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input
file:///home/slitz/warehouse/output 'dfs[a-z.]+'

But I'm getting the same error as before:

org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist :
/home/slitz/hadoop-0.15.3/grep-temp-1030179831
at
org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...stack continues...)

I think the problem may be the input path; it should be pointing to some
path on the NFS share, right?

The grep-temp-* dir is being created in the HADOOP_HOME of Box A
(192.168.2.3).

slitz
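
One possible workaround, offered only as a sketch under an assumption this
thread does not confirm (that the intermediate grep-temp-* path is relative and
resolves against the directory the job is launched from): submit the job from a
directory on the NFS mount, so the intermediate directory lands on the share
where every node, and the later stage of the example that reads it back, can
see it.

cd /home/slitz/warehouse                                 # launch from the shared mount
/home/slitz/hadoop-0.15.3/bin/hadoop jar \
    /home/slitz/hadoop-0.15.3/hadoop-*-examples.jar grep \
    file:///home/slitz/warehouse/input \
    file:///home/slitz/warehouse/output 'dfs[a-z.]+'
# if the assumption holds, grep-temp-* is now created under /home/slitz/warehouse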

On Fri, Apr 11, 2008 at 4:06 PM, Luca [EMAIL PROTECTED] wrote:

 (...quoted message trimmed; see Luca's reply above...)