Re: distcp failing

2008-09-09 Thread Michael Di Domenico
i'm not sure that's the issue, i basically tarred up the hadoop directory
from the cluster and copied it over to the non-data node
but i do agree i've likely got a setting wrong, since i can run distcp from
the namenode and it works fine.  the question is which one

On Mon, Sep 8, 2008 at 7:04 PM, Aaron Kimball [EMAIL PROTECTED]wrote:

 It is likely that your mapred.system.dir and/or fs.default.name settings
 are incorrect on the non-datanode machine that you are launching the task
 from.  These two settings (in your conf/hadoop-site.xml file) must match
 the settings on the cluster itself.
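
 For illustration, a minimal conf/hadoop-site.xml sketch of those two
 properties (the host name, port, and path below are placeholders, not taken
 from your cluster):

 <property>
   <name>fs.default.name</name>
   <value>hdfs://namenode.example.com:9000</value>
 </property>
 <property>
   <name>mapred.system.dir</name>
   <value>/hadoop/mapred/system</value>
 </property>

 Both values on the submitting machine need to be identical to what the
 cluster itself uses.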

 - Aaron

 On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico
 [EMAIL PROTECTED]wrote:

  I'm attempting to load data into hadoop (version 0.17.1), from a
  non-datanode machine in the cluster.  I can run jobs and copyFromLocal
  works
   fine, but when i try to use distcp i get the error below.  I don't
   understand what the error means, can anyone help?
  Thanks
 
  blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite
  file:///Users/mdidomenico/hadoop/1gTestfile /user/mdidomenico/1gTestfile
  08/09/07 23:56:06 INFO util.CopyFiles:
  srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
  08/09/07 23:56:06 INFO util.CopyFiles:
  destPath=/user/mdidomenico/1gTestfile1
  08/09/07 23:56:07 INFO util.CopyFiles: srcCount=1
  With failures, global counters are inaccurate; consider running with -i
  Copy failed: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
   /tmp/hadoop-hadoop/mapred/system/job_200809072254_0005/job.xml: No such file or directory
 at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
 at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
 at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
 at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
 at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
 at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
 
 at org.apache.hadoop.ipc.Client.call(Client.java:557)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
 at $Proxy1.submitJob(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy1.submitJob(Unknown Source)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
 at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
 at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
 



Re: distcp failing

2008-09-09 Thread Michael Di Domenico
a little more digging and it appears i cannot run distcp as someone other
than hadoop on the namenode
 /tmp/hadoop-hadoop/mapred/system/job_200809091231_0005/job.xml

looking at the directory from the error, the system directory does not
exist on the namenode; i only have a local directory
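
(A rough way to check both sides, assuming the default hadoop.tmp.dir of
/tmp/hadoop-${user.name}:

  ls -ld /tmp/hadoop-hadoop/mapred/system              # local disk on the namenode
  bin/hadoop dfs -ls /tmp/hadoop-hadoop/mapred/system  # the same path on HDFS

the second command only matters if mapred.system.dir resolves against HDFS.)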

On Tue, Sep 9, 2008 at 12:41 PM, Michael Di Domenico [EMAIL PROTECTED]
 wrote:

 i'm not sure that's the issue, i basically tarred up the hadoop directory
 from the cluster and copied it over to the non-data node
 but i do agree i've likely got a setting wrong, since i can run distcp from
 the namenode and it works fine.  the question is which one

 On Mon, Sep 8, 2008 at 7:04 PM, Aaron Kimball [EMAIL PROTECTED]wrote:

 It is likely that your mapred.system.dir and/or fs.default.name settings
 are incorrect on the non-datanode machine that you are launching the task
 from.  These two settings (in your conf/hadoop-site.xml file) must match
 the settings on the cluster itself.

 - Aaron

 On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico
 [EMAIL PROTECTED]wrote:

  I'm attempting to load data into hadoop (version 0.17.1), from a
  non-datanode machine in the cluster.  I can run jobs and copyFromLocal
  works
   fine, but when i try to use distcp i get the error below.  I don't
   understand what the error means, can anyone help?
  Thanks
 
  blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite
  file:///Users/mdidomenico/hadoop/1gTestfile /user/mdidomenico/1gTestfile
  08/09/07 23:56:06 INFO util.CopyFiles:
  srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
  08/09/07 23:56:06 INFO util.CopyFiles:
  destPath=/user/mdidomenico/1gTestfile1
  08/09/07 23:56:07 INFO util.CopyFiles: srcCount=1
  With failures, global counters are inaccurate; consider running with -i
  Copy failed: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
   /tmp/hadoop-hadoop/mapred/system/job_200809072254_0005/job.xml: No such file or directory
 at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
 at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
 at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
 at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
 at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
 at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
 
 at org.apache.hadoop.ipc.Client.call(Client.java:557)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
 at $Proxy1.submitJob(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy1.submitJob(Unknown Source)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
 at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
 at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)
 





Re: distcp failing

2008-09-09 Thread Michael Di Domenico
manually creating the system directory gets me past the first error, but
now i get this.  i'm not necessarily sure it's a step forward though, because
the map task never shows up in the jobtracker
[EMAIL PROTECTED] hadoop-0.17.1]$ bin/hadoop distcp
file:///home/mdidomenico/1gTestfile 1gTestfile
08/09/09 13:12:06 INFO util.CopyFiles:
srcPaths=[file:/home/mdidomenico/1gTestfile]
08/09/09 13:12:06 INFO util.CopyFiles: destPath=1gTestfile
08/09/09 13:12:07 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:07 INFO dfs.DFSClient: Abandoning block
blk_5758513071638050362
08/09/09 13:12:13 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:13 INFO dfs.DFSClient: Abandoning block
blk_1691495306775808049
08/09/09 13:12:17 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:17 INFO dfs.DFSClient: Abandoning block
blk_1027634596973755899
08/09/09 13:12:19 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:19 INFO dfs.DFSClient: Abandoning block
blk_4535302510016050282
08/09/09 13:12:23 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:23 INFO dfs.DFSClient: Abandoning block
blk_7022658012001626339
08/09/09 13:12:25 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:25 INFO dfs.DFSClient: Abandoning block
blk_-4509681241839967328
08/09/09 13:12:29 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:29 INFO dfs.DFSClient: Abandoning block
blk_8318033979013580420
08/09/09 13:12:31 WARN dfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block.
08/09/09 13:12:31 WARN dfs.DFSClient: Error Recovery for block
blk_-4509681241839967328 bad datanode[0]
08/09/09 13:12:35 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:35 INFO dfs.DFSClient: Abandoning block
blk_2848354798649979411
08/09/09 13:12:41 WARN dfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block.
08/09/09 13:12:41 WARN dfs.DFSClient: Error Recovery for block
blk_2848354798649979411 bad datanode[0]
Exception in thread Thread-0 java.util.ConcurrentModificationException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(Unknown Source)
at java.util.TreeMap$KeyIterator.next(Unknown Source)
at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
at
org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
at
org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
at
org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/09/09 13:12:41 INFO dfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Could not read from stream
08/09/09 13:12:41 INFO dfs.DFSClient: Abandoning block
blk_9189111926428577428

On Tue, Sep 9, 2008 at 1:03 PM, Michael Di Domenico
[EMAIL PROTECTED]wrote:

 a little more digging and it appears i cannot run distcp as someone other
 than hadoop on the namenode
  /tmp/hadoop-hadoop/mapred/system/job_200809091231_0005/job.xml

 looking at the directory from the error, the system directory does not
 exist on the namenode; i only have a local directory


 On Tue, Sep 9, 2008 at 12:41 PM, Michael Di Domenico 
 [EMAIL PROTECTED] wrote:

 i'm not sure that's the issue, i basically tarred up the hadoop directory
 from the cluster and copied it over to the non-data node
 but i do agree i've likely got a setting wrong, since i can run distcp
 from the namenode and it works fine.  the question is which one

 On Mon, Sep 8, 2008 at 7:04 PM, Aaron Kimball [EMAIL PROTECTED]wrote:

  It is likely that your mapred.system.dir and/or fs.default.name settings
  are incorrect on the non-datanode machine that you are launching the task
  from.  These two settings (in your conf/hadoop-site.xml file) must match
  the settings on the cluster itself.

 - Aaron

 On Sun, Sep 7, 2008 at 8:58 PM, Michael Di Domenico
 [EMAIL PROTECTED]wrote:

  I'm attempting to load data into hadoop (version 0.17.1), from a
  non-datanode machine in the cluster.  I can run jobs and copyFromLocal
  works
   fine, but when i try to use distcp i get the error below.  I don't
   understand what the error means, can anyone help?
  Thanks
 
  blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite
  file:///Users/mdidomenico/hadoop/1gTestfile
 /user/mdidomenico/1gTestfile
  08/09/07 23:56:06 INFO util.CopyFiles:
  srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
  08/09/07 23:56:06 INFO

Re: distcp failing

2008-09-09 Thread Michael Di Domenico
Apparently, the fix for my original error is that hadoop is set up for a
single local machine out of the box, and i had to change these directory
properties
<property>
  <name>mapred.local.dir</name>
  <value>/hadoop/mapred/local</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
<property>
  <name>mapred.temp.dir</name>
  <value>/hadoop/mapred/temp</value>
</property>

to be on hdfs instead of under hadoop.tmp.dir
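
(As a hedged illustration of that split, not something stated in this thread:
mapred.system.dir is resolved against the default filesystem named by
fs.default.name, so with an hdfs:// default it ends up on HDFS, while
mapred.local.dir stays on each node's local disk, e.g.

<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>        <!-- on the default filesystem, i.e. HDFS -->
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/data/hadoop/mapred/local</value>    <!-- local disk on each tasktracker; placeholder path -->
</property>

The out-of-the-box mapred.system.dir falls under hadoop.tmp.dir, which is why
the earlier error pointed at /tmp/hadoop-hadoop/mapred/system.)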

So now distcp works as a non-hadoop user and mapred works as a non-hadoop
user from the namenode; however, from a workstation i now get this:

blue:hadoop-0.17.1 mdidomenico$ bin/hadoop distcp
file:///Users/mdidomenico/hadoop/1gTestfile 1gTestfile-1
08/09/09 13:44:19 INFO util.CopyFiles:
srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/09 13:44:19 INFO util.CopyFiles: destPath=1gTestfile-1
08/09/09 13:44:20 INFO util.CopyFiles: srcCount=1
08/09/09 13:44:22 INFO mapred.JobClient: Running job: job_200809091332_0004
08/09/09 13:44:23 INFO mapred.JobClient:  map 0% reduce 0%
08/09/09 13:44:31 INFO mapred.JobClient: Task Id :
task_200809091332_0004_m_00_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at
org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

08/09/09 13:44:50 INFO mapred.JobClient: Task Id :
task_200809091332_0004_m_00_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at
org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

08/09/09 13:45:07 INFO mapred.JobClient: Task Id :
task_200809091332_0004_m_00_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at
org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

08/09/09 13:45:26 INFO mapred.JobClient:  map 100% reduce 100%
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)


On Tue, Sep 9, 2008 at 1:14 PM, Michael Di Domenico
[EMAIL PROTECTED]wrote:

 manually creating the system directory gets me past the first error, but
 now i get this.  i'm not necessarily sure it's a step forward though, because
 the map task never shows up in the jobtracker
 [EMAIL PROTECTED] hadoop-0.17.1]$ bin/hadoop distcp
 file:///home/mdidomenico/1gTestfile 1gTestfile
 08/09/09 13:12:06 INFO util.CopyFiles:
 srcPaths=[file:/home/mdidomenico/1gTestfile]
 08/09/09 13:12:06 INFO util.CopyFiles: destPath=1gTestfile
 08/09/09 13:12:07 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:07 INFO dfs.DFSClient: Abandoning block
 blk_5758513071638050362
 08/09/09 13:12:13 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:13 INFO dfs.DFSClient: Abandoning block
 blk_1691495306775808049
 08/09/09 13:12:17 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:17 INFO dfs.DFSClient: Abandoning block
 blk_1027634596973755899
 08/09/09 13:12:19 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:19 INFO dfs.DFSClient: Abandoning block
 blk_4535302510016050282
 08/09/09 13:12:23 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:23 INFO dfs.DFSClient: Abandoning block
 blk_7022658012001626339
 08/09/09 13:12:25 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:25 INFO dfs.DFSClient: Abandoning block
 blk_-4509681241839967328
 08/09/09 13:12:29 INFO dfs.DFSClient: Exception in createBlockOutputStream
 java.io.IOException: Could not read from stream
 08/09/09 13:12:29 INFO dfs.DFSClient: Abandoning block
 blk_8318033979013580420
 08/09/09 13:12:31 WARN dfs.DFSClient

Re: distcp failing

2008-09-09 Thread Michael Di Domenico
Looking in the task tracker log, i see the error below.
The file does exist on my local workstation, but it does not exist on the
namenode/datanodes in my cluster.  So it begs the question: did i
misunderstand the use of distcp, or is there still something wrong?

I'm looking for something that will read a file from my workstation and load
it into the dfs, but instead of going through the namenode like
copyFromLocal seems to do, i'd like it to load the data via the datanodes
directly.  If distcp doesn't work this way, is there anything that will?

2008-09-09 14:00:54,418 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-09-09 14:00:54,662 INFO org.apache.hadoop.mapred.MapTask:
numReduceTasks: 0
2008-09-09 14:00:54,894 INFO org.apache.hadoop.util.CopyFiles: FAIL
1gTestfile : java.io.FileNotFoundException: File
file:/Users/mdidomenico/hadoop/1gTestfile does not exist.
at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:242)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:116)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:274)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:380)
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.copy(CopyFiles.java:366)
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:493)
at org.apache.hadoop.util.CopyFiles$CopyFilesMapper.map(CopyFiles.java:268)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

2008-09-09 14:01:03,950 WARN org.apache.hadoop.mapred.TaskTracker: Error
running child
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at
org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)


On Tue, Sep 9, 2008 at 1:47 PM, Michael Di Domenico
[EMAIL PROTECTED]wrote:

 Apparently, the fix for my original error is that hadoop is set up for a
 single local machine out of the box, and i had to change these directory
 properties
 <property>
   <name>mapred.local.dir</name>
   <value>/hadoop/mapred/local</value>
 </property>
 <property>
   <name>mapred.system.dir</name>
   <value>/hadoop/mapred/system</value>
 </property>
 <property>
   <name>mapred.temp.dir</name>
   <value>/hadoop/mapred/temp</value>
 </property>

 to be on hdfs instead of under hadoop.tmp.dir

 So now distcp works as a non-hadoop user and mapred works as a non-hadoop
 user from the namenode; however, from a workstation i now get this:

 blue:hadoop-0.17.1 mdidomenico$ bin/hadoop distcp
 file:///Users/mdidomenico/hadoop/1gTestfile 1gTestfile-1
 08/09/09 13:44:19 INFO util.CopyFiles:
 srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
 08/09/09 13:44:19 INFO util.CopyFiles: destPath=1gTestfile-1
 08/09/09 13:44:20 INFO util.CopyFiles: srcCount=1
 08/09/09 13:44:22 INFO mapred.JobClient: Running job: job_200809091332_0004
 08/09/09 13:44:23 INFO mapred.JobClient:  map 0% reduce 0%
 08/09/09 13:44:31 INFO mapred.JobClient: Task Id :
 task_200809091332_0004_m_00_0, Status : FAILED
 java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
 at
 org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
 at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

 08/09/09 13:44:50 INFO mapred.JobClient: Task Id :
 task_200809091332_0004_m_00_1, Status : FAILED
 java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
 at
 org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
 at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

 08/09/09 13:45:07 INFO mapred.JobClient: Task Id :
 task_200809091332_0004_m_00_2, Status : FAILED
 java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
 at
 org.apache.hadoop.util.CopyFiles$CopyFilesMapper.close(CopyFiles.java:527)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
 at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

 08/09/09 13:45:26 INFO mapred.JobClient:  map 100% reduce 100%
 With failures, global counters are inaccurate; consider running with -i
 Copy failed: java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
 at org.apache.hadoop.util.CopyFiles.copy

distcp failing

2008-09-07 Thread Michael Di Domenico
I'm attempting to load data into hadoop (version 0.17.1), from a
non-datanode machine in the cluster.  I can run jobs and copyFromLocal works
fine, but when i try to use distcp i get the error below.  I don't understand
what the error means, can anyone help?
Thanks

blue:hadoop-0.17.1 mdidomenico$ time bin/hadoop distcp -overwrite
file:///Users/mdidomenico/hadoop/1gTestfile /user/mdidomenico/1gTestfile
08/09/07 23:56:06 INFO util.CopyFiles:
srcPaths=[file:/Users/mdidomenico/hadoop/1gTestfile]
08/09/07 23:56:06 INFO util.CopyFiles:
destPath=/user/mdidomenico/1gTestfile1
08/09/07 23:56:07 INFO util.CopyFiles: srcCount=1
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
/tmp/hadoop-hadoop/mapred/system/job_200809072254_0005/job.xml: No such file or directory
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:215)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:149)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1155)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1136)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:175)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:1755)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at $Proxy1.submitJob(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.submitJob(Unknown Source)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:758)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
at org.apache.hadoop.util.CopyFiles.copy(CopyFiles.java:604)
at org.apache.hadoop.util.CopyFiles.run(CopyFiles.java:743)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.util.CopyFiles.main(CopyFiles.java:763)


Re: Hadoop installation folders in multiple nodes

2008-06-02 Thread Michael Di Domenico
Depending on your windows version, there is a dos command called subst
which you could use to virtualize a drive letter on your third machine
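
For example, something along the lines of

  subst E: D:\hadoop

run from a cmd.exe prompt, where D:\hadoop stands in for wherever the install
actually lives.  Keep in mind subst mappings are per-session, so services or
an ssh login may not see the virtual drive.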

On Fri, May 30, 2008 at 4:35 AM, Sridhar Raman [EMAIL PROTECTED]
wrote:

 Should the installation paths be the same in all the nodes?  Most
 documentation seems to suggest that it is _*recommended*_ to have the
 _*same
 *_ paths in all the nodes.  But what is the workaround, if, because of some
 reason, one isn't able to have the same path?

 That's the problem we are facing right now.  After making Hadoop work
 perfectly in a 2-node cluster, when we tried to accommodate a 3rd machine,
  we realised that this machine doesn't have an E: drive, which is where the
 installation of hadoop is in the other 2 nodes.  All our machines are
 Windows machines.  The possible solutions are:
  1) Move the installations in M1 & M2 to a drive that is present in M3.  We
 will keep this as the last option.
 2) Map a folder in M3's D: to E:.  We used the subst command to do this.
 But when we tried to start DFS, it wasn't able to find the hadoop
 installation.  Just to verify, we tried a ssh to the localhost, and were
 unable to find the mapped drive.  It's only visible as a folder of D:.
 Whereas, in the basic cygwin prompt, we are able to view E:.
 3) Partition M3's D drive and create an E.  This carries the risk of loss
 of
 data.

 So, what should we do?  Is there any way we can specify in the NameNode the
 installation paths of hadoop in each of the remaining nodes?  Or is there
 some environment variable that can be set, which can make the hadoop
 installation path specific to each machine?

 Thanks,
 Sridhar



Re: Hadoop installation folders in multiple nodes

2008-06-02 Thread Michael Di Domenico
Oops, missed the part where you already tried that.

On Mon, Jun 2, 2008 at 3:23 PM, Michael Di Domenico [EMAIL PROTECTED]
wrote:

 Depending on your windows version, there is a dos command called subst
 which you could use to virtualize a drive letter on your third machine


 On Fri, May 30, 2008 at 4:35 AM, Sridhar Raman [EMAIL PROTECTED]
 wrote:

 Should the installation paths be the same in all the nodes?  Most
 documentation seems to suggest that it is _*recommended*_ to have the
 _*same
 *_ paths in all the nodes.  But what is the workaround, if, because of
 some
 reason, one isn't able to have the same path?

 That's the problem we are facing right now.  After making Hadoop work
 perfectly in a 2-node cluster, when we tried to accommodate a 3rd machine,
  we realised that this machine doesn't have an E: drive, which is where the
 installation of hadoop is in the other 2 nodes.  All our machines are
 Windows machines.  The possible solutions are:
  1) Move the installations in M1 & M2 to a drive that is present in M3.  We
 will keep this as the last option.
 2) Map a folder in M3's D: to E:.  We used the subst command to do this.
 But when we tried to start DFS, it wasn't able to find the hadoop
 installation.  Just to verify, we tried a ssh to the localhost, and were
 unable to find the mapped drive.  It's only visible as a folder of D:.
 Whereas, in the basic cygwin prompt, we are able to view E:.
 3) Partition M3's D drive and create an E.  This carries the risk of loss
 of
 data.

 So, what should we do?  Is there any way we can specify in the NameNode
 the
 installation paths of hadoop in each of the remaining nodes?  Or is there
 some environment variable that can be set, which can make the hadoop
 installation path specific to each machine?

 Thanks,
 Sridhar