Problems running TestDFSIO to a non-default directory
Hi Konstantin (et al.);
A while ago you gave me the following trick to run TestDFSIO to an output directory other than the default: just use -Dtest.build.data=/output/dir to pass the new directory to the executable. I recall this working, but it is failing now under 0.18.1, and looking at it I can't see how it ever worked. The -D option sets the property on the Java virtual machine that runs as a direct child of bin/hadoop, but I see no way the property would get set on the mapper virtual machines. Should this still work?
Thanks,
-Joel

On Thu, 2008-09-04 at 13:05 -0700, Konstantin Shvachko wrote:
> Sure.
>
> bin/hadoop -Dtest.build.data=/bessemer/welling/hadoop_test/benchmarks/TestDFSIO/ org.apache.hadoop.fs.TestDFSIO -write -nrFiles 2*N -fileSize 360
>
> --Konst
>
> Joel Welling wrote:
> > With my setup, I need to change the file directory from /benchmarks/TestDFSIO/io_control to something like /bessemer/welling/hadoop_test/benchmarks/TestDFSIO/io_control . Is there a command line argument or parameter that will do this? Basically, I have to point it explicitly into my Lustre filesystem.
> >
> > -Joel
How to run sort900?
Hi folks; Is there a standard procedure for running the sort900 test? In particular, the timings show time for a verifier, but I can't find where that's implemented. Thanks, -Joel [EMAIL PROTECTED]
Re: reading input for a map function from 2 different files?
Amar, isn't there a problem with your method in that it gets a small result by subtracting very large numbers? Given a million inputs, won't A and B be so much larger than the standard deviation that there aren't enough bits left in the floating point number to represent it? I just thought I should mention that, before this thread goes into an archive somewhere and some student looks it up.
-Joel

On Wed, 2008-11-12 at 12:32 +0530, Amar Kamat wrote:
> some speed wrote:
> > Thanks for the response. What I am trying to do is find the average and then the standard deviation for a very large set (say a million) of numbers. The result would be used in further calculations.
> > I have got the average from the first map-reduce chain. Now I need to read this average as well as the set of numbers to calculate the standard deviation. So one file would have the input set and the other "resultant" file would have just the average.
> > Please do tell me in case there is a better way of doing things than what I am doing. Any input/suggestion is appreciated. :)
>
> std_dev^2 = sum_i((Xi - Xa)^2) / N, where Xa is the avg.
> Why don't you use the formula to compute it in one MR job?
> std_dev^2 = (sum_i(Xi^2) - N * (Xa^2)) / N
>           = (A - N*(avg^2)) / N
>
> For this your map would look like
>   map(key, val) : output.collect(key^2, key); // imagine your input as (k,v) = (Xi, null)
> Reduce should simply sum over the keys to find out sum_i(Xi^2), and sum over the values to find out Xa. You could use the close() api to finally dump these 2 values to a file.
> For example:
> input : 1,2,3,4
> Say the input is split into 2 groups, [1,2] and [3,4].
> Now there will be 2 maps with output as follows
> map1 output : (1,1) (4,2)
> map2 output : (9,3) (16,4)
>
> The reducer will maintain the sum over all keys and all values:
> A = sum(keys, i.e. input squared) = 1 + 4 + 9 + 16 = 30
> B = sum(values, i.e. input) = 1 + 2 + 3 + 4 = 10
>
> With A and B you can compute the standard deviation offline.
> So avg = B / N = 10/4 = 2.5
> Hence the std deviation would be
> sqrt((A - N * avg^2) / N) = sqrt((30 - 4*6.25)/4) = 1.11803399
>
> Using the main formula the answer is also 1.11803399.
> Amar
>
> On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> >> Amar Kamat wrote:
> >>> some speed wrote:
> >>> > I was wondering if it was possible to read the input for a map function from 2 different files:
> >>> > 1st file ---> user-input file from a particular location (path)
> >>> Is the input/user file sorted? If yes then you can use "map-side join" for performance reasons. See org.apache.hadoop.mapred.join for more details.
> >>> > 2nd file ---> A resultant file (has just one <key,value> pair) from a previous MapReduce job. (I am implementing a chain MapReduce function.)
> >>> Can you explain in more detail the contents of the 2nd file?
> >>> > Now, for every <key,value> pair in the user-input file, I would like to use the same <key,value> pair from the 2nd file for some calculations.
> >>> Can you explain this in more detail? Can you give some abstracted example of how file1 and file2 look and what operation/processing you want to do?
> >>> I guess you might need to do some kind of join on the 2 files. Look at contrib/data_join for more details.
> >>> Amar
> >>> > Is it possible for me to do so? Can someone guide me in the right direction please?
> >>> > Thanks!
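Joel's cancellation worry can be made concrete. The following is a small plain-Python sketch (not Hadoop code; the function names are mine) comparing the one-pass sum-of-squares formula quoted above against Welford's numerically stable online update:

```python
import math

def stddev_sum_of_squares(xs):
    # The thread's one-pass formula: sqrt((A - N*avg^2) / N),
    # with A = sum of squares and B = plain sum.
    n = len(xs)
    a = sum(x * x for x in xs)
    avg = sum(xs) / n
    return math.sqrt(max(a / n - avg * avg, 0.0))

def stddev_welford(xs):
    # Welford's online algorithm: one pass, no large-number subtraction.
    mean, m2 = 0.0, 0.0
    for i, x in enumerate(xs, 1):
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
    return math.sqrt(m2 / len(xs))

# The thread's worked example, input 1,2,3,4:
print(stddev_sum_of_squares([1, 2, 3, 4]))  # ~1.11803399
print(stddev_welford([1, 2, 3, 4]))         # ~1.11803399
```

With values tightly clustered around a huge mean (say 1e9 plus or minus a few units), the sum-of-squares version subtracts two numbers near 1e18 and loses most of the significant bits, while Welford's stays accurate; that is exactly the cancellation Joel is warning about.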
Is there a unique ID associated with each task?
Hi folks; I'm writing a Hadoop Pipes application, and I need to generate a bunch of integers that are unique across all map tasks. If each map task has a unique integer ID, I can make sure my integers are unique by including that integer ID. I have this theory that each map task has a unique identifier associated with some configuration parameter, but I don't know the name of that parameter. Is there an integer associated with each task? If so, how do I get it? While we're at it, is there a way to get the total number of map tasks? Thanks, -Joel
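For what it's worth, the counting scheme itself is simple once the two integers exist (whether and how a Pipes task can read its partition number and the task count from the job configuration is exactly the open question here). A sketch of the arithmetic, not of any Pipes API: if each task knows a distinct 0-based partition number p and the total number of map tasks N, then the strided stream p, p+N, p+2N, ... never collides across tasks.

```python
def unique_id_stream(task_partition, num_map_tasks):
    """Yield integers unique across all map tasks, assuming each task
    knows its own 0-based partition number and the total task count.
    Task p emits p, p + N, p + 2N, ... so the streams are disjoint."""
    i = task_partition
    while True:
        yield i
        i += num_map_tasks
```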
Any examples using Hadoop Pipes with binary SequenceFiles?
Hi folks; I'm interested in reading binary data, running it through some C++ code, and writing the result as binary data. It looks like SequenceFiles and Pipes are the way to do it, but I can't find any examples or docs beyond the API specification. Can someone point me to an example where this is done? Thanks, -Joel
Re: Problems increasing number of tasks per node- really a task management problem!
I think I've found my problem. At some point about a week ago, I must have tried to start new tasktracker processes on my worker nodes without killing the ones that were already there. The new processes died immediately because their sockets were already in use. The old processes then took over their roles, running happily with new JobTrackers and doing tasks as requested. The pid files that are supposed to point to the tasktrackers did not contain their pids, however, and 'bin/stop-mapred.sh' chooses its targets from the pid files. So I could run 'bin/stop-mapred.sh' all day long without killing them. I ended up killing them explicitly, one node at a time. These tasktrackers knew the *old* config values that were in force when they were started, so pushing the new values out to the worker nodes had no effect.

So: is there any mechanism for killing 'rogue' tasktrackers? I'm a little surprised that they are killed via their pids rather than by sending them a kill signal via the same mechanism whereby they learn of new work.
-Joel
[EMAIL PROTECTED]

On Tue, 2008-09-23 at 14:29 -0700, Arun C Murthy wrote:
> On Sep 23, 2008, at 2:21 PM, Joel Welling wrote:
> > Stopping and restarting the mapred service should push the new .xml file out, should it not? I've done 'bin/stop-mapred.sh',
>
> No, you need to run 'bin/stop-mapred.sh', push it out to all the machines, and then do 'bin/start-mapred.sh'.
>
> You do see it in your job's config - but that config isn't used by the TaskTrackers. They use the config in their HADOOP_CONF_DIR, which is why you'd need to push it to all machines.
>
> Arun
Re: Problems increasing number of tasks per node
Stopping and restarting the mapred service should push the new .xml file out, should it not? I've done 'bin/stop-mapred.sh' and 'bin/start-mapred.sh', and I can see my new values in the .../mapred/system/job_SomeNumber_SomeNumber/job.xml files associated with the jobs. The mapred.tasktracker.map.tasks.maximum values shown in those files are 8, but each worker node tasktracker still uses the value 2. What file should contain the xml for the tasktracker itself? Does the maximum map task number get set when the task tracker is spawned, or can a new job reset the number?
Thanks,
-Joel

On Tue, 2008-09-23 at 11:46 -0700, Arun C Murthy wrote:
> On Sep 23, 2008, at 11:41 AM, Joel Welling wrote:
> > Hi folks;
> > I have a small cluster, but each node is big - 8 cores each, with lots of IO bandwidth. I'd like to increase the number of simultaneous map and reduce tasks scheduled per node from the default of 2 to something like 8. My understanding is that I should be able to do this by increasing mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum, but doing so does not increase the number of tasks. I've been running gridmix with these parameters set to 4, but the average number of tasks per node stays at 4, with 2 reduce and 2 map. Am I missing something? Do I need to adjust something else as well?
>
> Please ensure that _all_ machines (tasktrackers) have this updated configuration file... the above config knobs are used by the TaskTrackers and hence they need to have the updated configs.
>
> Arun
>
> > Thanks,
> > -Joel
> > [EMAIL PROTECTED]
Problems increasing number of tasks per node
Hi folks; I have a small cluster, but each node is big- 8 cores each, with lots of IO bandwidth. I'd like to increase the number of simultaneous map and reduce tasks scheduled per node from the default of 2 to something like 8. My understanding is that I should be able to do this by increasing mapred.tasktracker.reduce.tasks.maximum and mapred.tasktracker.map.tasks.maximum , but doing so does not increase the number of tasks. I've been running gridmix with these parameters set to 4, but the average number of tasks per node stays at 4, with 2 reduce and 2 map. Am I missing something? Do I need to adjust something else as well? Thanks, -Joel [EMAIL PROTECTED]
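For reference, the two limits named above live in hadoop-site.xml; a sketch of the relevant fragment, with illustrative values:

```xml
<!-- Per-tasktracker caps on simultaneously running tasks. These are
     read by each TaskTracker from its own HADOOP_CONF_DIR at startup,
     so the file must be present on every worker node. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```

As the replies in this thread point out, changing the values on the submitting node alone has no effect: each tasktracker must receive the updated file and be restarted.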
gridmix on a small cluster?
Hi folks; I'd like to try the gridmix benchmark on my small cluster (3 nodes at 8 cores each, Lustre with IB interconnect). The documentation for gridmix suggests that it will take 4 hours on a 500 node cluster, which suggests it would take me something like a week to run. Is there a way to scale the problem size back? I don't mind the file size too much, but the running time would be excessive if things scale linearly with the number of nodes. Thanks, -Joel
Ordering of records in output files?
Hi folks; I have a simple Streaming job where the mapper produces output records beginning with a 16 character ascii string and passes them to IdentityReducer. When I run it, I get the same number of output files as I have mapred.reduce.tasks . Each one contains some of the strings, and within each file the strings are in sorted order. But there is no obvious ordering *across* the files. For example, I can see where the first few strings in the output went to files 0,1,3,4, and then back to 0, but none of them ended up in file 2. What's the algorithm that determines which strings end up in which files? Is there a way I can change it so that sequentially ordered strings end up in the same file rather than spraying off across all the files? Thanks, -Joel [EMAIL PROTECTED]
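The behavior described above is consistent with hash partitioning: Hadoop's default HashPartitioner sends a record to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so lexicographically adjacent keys scatter across output files. A small Python model of that arithmetic, reimplementing Java's String.hashCode for determinism (the function names are mine):

```python
def java_string_hashcode(s):
    # Java's String.hashCode(): h = 31*h + char, as a signed 32-bit int.
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def default_partition(key, num_reduces):
    # Model of Hadoop's HashPartitioner:
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_reduces

# Adjacent keys land in unrelated partitions:
for k in ["key0000", "key0001", "key0002", "key0003"]:
    print(k, default_partition(k, 4))
```

Keeping sequentially ordered keys in the same file means replacing the partitioner with one that maps key ranges to reducers; the within-file sorted order is still provided by the framework sort either way.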
Trouble doing two-step sort
Hi folks;
I'm trying to do a Hadoop streaming job which involves a two-step sort, of the type described at http://hadoop.apache.org/core/docs/r0.18.0/streaming.html#A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29 . I've got a python script that emits records like:

020700414.0140. 140 12 132475

Over a whole bunch of such records I expect to find about a dozen with each first number (020700414 in this case), with consecutive values of the second number (0140 in this case). I'm using IdentityReducer and KeyFieldBasedPartitioner. I'd like to partition them based only on the first number, because I definitely want all records with the same first number to end up in the same output file. Then I'd like those to be sorted on the second number, so each output file contains a set of ordered records for each first number, and all records for a given first number end up in the same file. I've tried setting these values:

stream.map.output.field.separator=.
map.output.key.field.separator=.
stream.num.map.output.key.fields=2
num.key.fields.for.partition=1

but I end up with each first number evenly distributed over all the output files! In each, the records for a given first number appear together and the second numbers are in the right order. This isn't the result I wanted. Am I misunderstanding this example somehow? What settings should give me the expected output order?
Thanks, I hope;
-Joel Welling
[EMAIL PROTECTED]
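To make the intended layout concrete, here is a small Python model (not Hadoop code; the hash stand-in and function names are mine) of what the settings above are meant to achieve: partition on the first dot-separated field only, then sort on the full two-field key within each partition:

```python
def bucket(field, num_files):
    # Deterministic stand-in for the partitioner's hash of field 1.
    return sum(ord(c) for c in field) % num_files

def two_step_sort(records, num_files):
    """Model of the desired streaming behavior: records with the same
    first dot-separated field go to the same output file, and within
    each file records are ordered by the (first, second) field pair."""
    files = [[] for _ in range(num_files)]
    for rec in records:
        first = rec.split(".")[0]
        files[bucket(first, num_files)].append(rec)
    for f in files:
        # Stand-in for the framework sort over the two-field key.
        f.sort(key=lambda r: tuple(r.split(".")[:2]))
    return files
```

This relies on zero-padded fields so string order matches numeric order, which the sample records above appear to have.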
Re: Hadoop over Lustre?
That seems to have done the trick! I am now running Hadoop 0.18 straight out of Lustre, without an intervening HDFS. The unusual things about my hadoop-site.xml are:

  <property>
    <name>fs.default.name</name>
    <value>file:///bessemer/welling</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>${fs.default.name}/hadoop_tmp/mapred/system</value>
    <description>The shared directory where MapReduce stores control files.</description>
  </property>

where /bessemer/welling is a directory on a mounted Lustre filesystem. I then do 'bin/start-mapred.sh' (without starting dfs), and I can run Hadoop programs normally. I do have to specify full input and output file paths - they don't seem to be relative to fs.default.name . That's not too troublesome, though.
Thanks very much!
-Joel
[EMAIL PROTECTED]

On Fri, 2008-08-29 at 10:52 -0700, Owen O'Malley wrote:
> Check the setting for mapred.system.dir. This needs to be a path that is on a distributed file system. In old versions of Hadoop, it had to be on the default file system, but that is no longer true. In recent versions, the system dir only needs to be configured on the JobTracker and it is passed to the TaskTrackers and clients.
Re: Hadoop over Lustre?
Sorry; I'm picking this thread up after a couple days' delay. Setting fs.default.name to the equivalent of file:///path/to/lustre and changing mapred.job.tracker to just a hostname and port does allow mapreduce to start up. However, test jobs fail with the exceptions below. It looks like TaskTracker.localizeJob is looking for job.xml in the local filesystem; I would have expected it to look in lustre. I can't find that particular job.xml anywhere on the system after the run aborts, I'm afraid. I guess it's getting cleaned up.
Thanks,
-Joel

08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:07 INFO mapred.FileInputFormat: Total input paths to process : 15
08/08/28 18:46:08 INFO mapred.JobClient: Running job: job_200808281828_0002
08/08/28 18:46:09 INFO mapred.JobClient: map 0% reduce 0%
08/08/28 18:46:12 INFO mapred.JobClient: Task Id : attempt_200808281828_0002_m_00_0, Status : FAILED
Error initializing attempt_200808281828_0002_m_00_0:
java.io.IOException: file:/tmp/hadoop-welling/mapred/system/job_200808281828_0002/job.xml: No such file or directory
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:216)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:150)
        at org.apache.hadoop.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:55)
        at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1193)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:668)
        at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1306)
        at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:946)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1343)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2354)
08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_00_0&filter=stdout
08/08/28 18:46:12 WARN mapred.JobClient: Error reading task output http://foo.psc.edu:50060/tasklog?plaintext=true&taskid=attempt_200808281828_0002_m_00_0&filter=stderr

On Mon, 2008-08-25 at 14:24 -0700, Konstantin Shvachko wrote:
> mapred.job.tracker is the address and port of the JobTracker - the main server that controls map-reduce jobs. Every task tracker needs to know the address in order to connect. Do you follow the docs, e.g. this one: http://wiki.apache.org/hadoop/GettingStartedWithHadoop
>
> Can you start a one node cluster?
>
> > Are there standard tests of hadoop performance?
> There is the sort benchmark. We also run the DFSIO benchmark for read and write throughputs.
>
> --Konstantin
>
> Joel Welling wrote:
> > So far no success, Konstantin - the hadoop job seems to start up, but fails immediately, leaving no logs. What is the appropriate setting for mapred.job.tracker? The generic value references hdfs, but it also has a port number - I'm not sure what that means.
> >
> > My cluster is small, but if I get this working I'd be very happy to run some benchmarks. Are there standard tests of hadoop performance?
> >
> > -Joel
> > [EMAIL PROTECTED]
> >
> > On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> >> I think the solution should be easier than Arun and Steve advise. Lustre is already mounted as a local directory on each cluster machine, right? Say, it is mounted on /mnt/lustre. Then you configure hadoop-site.xml and set
> >>
> >>    fs.default.name
> >>    file:///mnt/lustre
> >>
> >> And then you start map-reduce only, without hdfs, using start-mapred.sh. By this you basically redirect all FileSystem requests to Lustre and you don't need data-nodes or the name-node.
> >>
> >> Please let me know if that works.
> >>
> >> Also it would be very interesting to have your experience shared on this list. Problems, performance - everything is quite interesting.
> >>
> >> Cheers,
> >> --Konstantin
> >>
> >> Joel Welling wrote:
> >>>> 2. Could you set up symlinks from the local filesystem, to point every node at a local dir /tmp/hadoop with each node pointing to a different subdir in the big filesystem?
> >>> Yes, I could do that! Do I need to do it for the log directories as well, or can they be shared?
> >>>
> >>> -Joel
> >>>
> >>> On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
Re: Hadoop over Lustre?
So far no success, Konstantin - the hadoop job seems to start up, but fails immediately, leaving no logs. What is the appropriate setting for mapred.job.tracker? The generic value references hdfs, but it also has a port number - I'm not sure what that means. My cluster is small, but if I get this working I'd be very happy to run some benchmarks. Are there standard tests of hadoop performance?
-Joel
[EMAIL PROTECTED]

On Fri, 2008-08-22 at 15:59 -0700, Konstantin Shvachko wrote:
> I think the solution should be easier than Arun and Steve advise. Lustre is already mounted as a local directory on each cluster machine, right? Say, it is mounted on /mnt/lustre. Then you configure hadoop-site.xml and set
>
>    fs.default.name
>    file:///mnt/lustre
>
> And then you start map-reduce only, without hdfs, using start-mapred.sh.
>
> By this you basically redirect all FileSystem requests to Lustre and you don't need data-nodes or the name-node.
>
> Please let me know if that works.
>
> Also it would be very interesting to have your experience shared on this list. Problems, performance - everything is quite interesting.
>
> Cheers,
> --Konstantin
>
> Joel Welling wrote:
> >> 2. Could you set up symlinks from the local filesystem, to point every node at a local dir /tmp/hadoop with each node pointing to a different subdir in the big filesystem?
> >
> > Yes, I could do that! Do I need to do it for the log directories as well, or can they be shared?
> >
> > -Joel
> >
> > On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> >> Joel Welling wrote:
> >>> Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful. A lot of our calculations are going to be disordered stuff based on 3D spatial relationships like nearest-neighbor finding, so things will be in a random access pattern most of the time.
> >>>
> >>> Is there a way to set up the configuration for HDFS so that different datanodes keep their data in different directories? That would be a big help in the short term.
> >>
> >> Yes, but you'd have to push out a different config to each datanode.
> >>
> >> 1. I have some stuff that could help there, but it's not ready for production use yet [1].
> >>
> >> 2. Could you set up symlinks from the local filesystem, to point every node at a local dir /tmp/hadoop with each node pointing to a different subdir in the big filesystem?
> >>
> >> [1] http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
Re: Hadoop over Lustre?
> 2. Could you set up symlinks from the local filesystem, to point every node at a local dir /tmp/hadoop with each node pointing to a different subdir in the big filesystem?

Yes, I could do that! Do I need to do it for the log directories as well, or can they be shared?
-Joel

On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote:
> Joel Welling wrote:
> > Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful. A lot of our calculations are going to be disordered stuff based on 3D spatial relationships like nearest-neighbor finding, so things will be in a random access pattern most of the time.
> >
> > Is there a way to set up the configuration for HDFS so that different datanodes keep their data in different directories? That would be a big help in the short term.
>
> Yes, but you'd have to push out a different config to each datanode.
>
> 1. I have some stuff that could help there, but it's not ready for production use yet [1].
>
> 2. Could you set up symlinks from the local filesystem, to point every node at a local dir /tmp/hadoop with each node pointing to a different subdir in the big filesystem?
>
> [1] http://people.apache.org/~stevel/slides/deploying_hadoop_with_smartfrog.pdf
Re: Hadoop over Lustre?
Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful. A lot of our calculations are going to be disordered stuff based on 3D spatial relationships like nearest-neighbor finding, so things will be in a random access pattern most of the time.

Is there a way to set up the configuration for HDFS so that different datanodes keep their data in different directories? That would be a big help in the short term.
-Joel
[EMAIL PROTECTED]

On Fri, 2008-08-22 at 10:48 +0100, Steve Loughran wrote:
> Joel Welling wrote:
> > Hi folks;
> > I'm new to Hadoop, and I'm trying to set it up on a cluster for which almost all the disk is mounted via the Lustre filesystem. That filesystem is visible to all the nodes, so I don't actually need HDFS to implement a shared filesystem. (I know the philosophical reasons why people say local disks are better for Hadoop, but that's not the situation I've got.) My system is failing, and I think it's because the different nodes are tripping over each other when they try to run HDFS out of the same directory tree. Is there a way to turn off HDFS and just let Lustre do the distributed filesystem? I've seen discussion threads about Hadoop with NFS which said something like 'just specify a local filesystem and everything will be fine', but I don't know how to do that. I'm using Hadoop 0.17.2.
>
> I don't know enough about Lustre to be very useful.
>
> * You shouldn't have nodes trying to use the same directories. At the very least, point each datanode at a different bit of the filesystem.
>
> * If there is a specific API call to find out which rack has the data, that could be used to place work near the data. Someone (== you) would have to write a new filesystem back-end for Hadoop for this.
>
> -steve
Hadoop over Lustre?
Hi folks; I'm new to Hadoop, and I'm trying to set it up on a cluster for which almost all the disk is mounted via the Lustre filesystem. That filesystem is visible to all the nodes, so I don't actually need HDFS to implement a shared filesystem. (I know the philosophical reasons why people say local disks are better for Hadoop, but that's not the situation I've got). My system is failing, and I think it's because the different nodes are tripping over each other when they try to run HDFS out of the same directory tree. Is there a way to turn off HDFS and just let Lustre do the distributed filesystem? I've seen discussion threads about Hadoop with NFS which said something like 'just specify a local filesystem and everything will be fine', but I don't know how to do that. I'm using Hadoop 0.17.2. Thanks, I hope; -Joel Welling [EMAIL PROTECTED]
Problem with installation of 0.17.0: things start but tests fail
I'm new to Hadoop, and am doing an installation of 0.17.1 on a small system. The only thing that I know is unusual about the system is that the underlying filesystem is in fact shared - a single Lustre filesystem is visible to all the nodes, and individual nodes all seem to be storing their blocks in the same directory, under /current . I follow the installation instructions on http://hadoop.apache.org/core/docs/current/cluster_setup.html , and everything seems to start up just fine. I then try the test problem from http://hadoop.apache.org/core/docs/current/quickstart.html with:

bin/hadoop dfs -put conf input
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

This results in:

08/08/13 19:18:47 INFO mapred.FileInputFormat: Total input paths to process : 13
08/08/13 19:18:47 INFO mapred.JobClient: Running job: job_200808131916_0001
08/08/13 19:18:49 INFO mapred.JobClient: map 0% reduce 0%
08/08/13 19:18:51 INFO mapred.JobClient: map 14% reduce 0%
08/08/13 19:19:05 INFO mapred.JobClient: map 14% reduce 4%
08/08/13 19:19:05 INFO mapred.JobClient: Task Id : task_200808131916_0001_m_02_0, Status : FAILED
java.io.IOException: Could not obtain block: blk_-4001091320210055678 file=/user/welling/input/log4j.properties
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1430)

...and similar errors on a bunch of other files. When I look at the status of a particular file, like log4j.properties, I see:

java.io.IOException: Got error in response to OP_READ_BLOCK for file muir03.psc.edu/128.182.99.103:50010:-4001091320210055678 for block -4001091320210055678

However, I do see a block by that name in /current ! Is this some sort of configuration issue? Do I need to somehow arrange for the different datanodes to keep their files in separate directories?
Thanks, I hope,
-Joel Welling
[EMAIL PROTECTED]
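A note on the "separate directories" question above: the datanode's block store location comes from the dfs.data.dir property, which defaults to a path under hadoop.tmp.dir, so on a shared filesystem every node can end up resolving to the same directory. A hedged sketch of a per-node hadoop-site.xml override; the path and the NODE_NAME placeholder are illustrative, and each machine would need its own value:

```xml
<!-- Illustrative per-node setting: each datanode needs its own storage
     directory. On a shared filesystem this requires a node-specific
     config on each machine, or a symlink that resolves differently
     per node. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/shared/hadoop/data-NODE_NAME</value>
</property>
```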