Re: only one reducer running in a hadoop cluster

2009-02-09 Thread Owen O'Malley


On Feb 7, 2009, at 11:52 PM, Nick Cen wrote:


Hi,

I have a Hadoop cluster with 4 PCs, and I want to integrate Hadoop and
Lucene together, so I copied some of the source code from Nutch's Indexer
class. But when I run my job, I found that there is only 1 reducer running
on 1 PC, so the performance is not as good as expected.


Set mapred.reduce.tasks in your configuration to the number of reduces
you want your jobs to have by default. Typically this should be
0.99 * mapred.tasktracker.reduce.tasks.maximum * number of computers.
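As a rough sketch of that rule of thumb with the old mapred API; the node
count and slot default below are made-up numbers for illustration, not values
from this thread:

import org.apache.hadoop.mapred.JobConf;

public class ReduceCount {
  public static void main(String[] args) {
    JobConf conf = new JobConf(ReduceCount.class);
    // Hypothetical cluster: 4 nodes, 2 reduce slots per tasktracker.
    int nodes = 4;
    int slotsPerNode = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    // Owen's rule of thumb: 0.99 * reduce slots per node * number of nodes.
    conf.setNumReduceTasks((int) Math.floor(0.99 * slotsPerNode * nodes));
  }
}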


RE: Reduce won't start until Map stage reaches 100%?

2009-02-09 Thread zhuweimin
Hi

I think the number of reduce tasks in your job is 1,
because if there is only one reduce task, the reduce stage does not start
until the map stage is 100% complete.

zhuweimin

-Original Message-
From: Taeho Kang [mailto:tka...@gmail.com] 
Sent: Monday, February 09, 2009 4:26 PM
To: hadoop-u...@lucene.apache.org
Subject: Reduce won't start until Map stage reaches 100%?

Dear All,

With Hadoop 0.19.0, the Reduce stage does not start until the Map stage
reaches 100% completion.
Has anyone faced a similar situation?

 ... ...
 -  map 90% reduce 0%
-  map 91% reduce 0%
-  map 92% reduce 0%
-  map 93% reduce 0%
-  map 94% reduce 0%
-  map 95% reduce 0%
-  map 96% reduce 0%
-  map 97% reduce 0%
-  map 98% reduce 0%
-  map 99% reduce 0%
-  map 100% reduce 0%
-  map 100% reduce 1%
-  map 100% reduce 2%
-  map 100% reduce 3%
-  map 100% reduce 4%
-  map 100% reduce 5%
-  map 100% reduce 6%
-  map 100% reduce 7%
-  map 100% reduce 8%
-  map 100% reduce 9%

Thank you all in advance,

/Taeho




Re: lost TaskTrackers

2009-02-09 Thread Vadim Zaliva
yes, I can access DFS from the cluster. namenode status seems to be OK
and I see no errors in namenode log files.

Initially all trackers were visible, and 9433 maps completed
successfully. This was then followed by 65975 which were killed. In
the log they all show the same error:

Error initializing attempt_200902081049_0001_m_004499_1:
java.lang.NullPointerException
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
at org.apache.hadoop.ipc.Client.call(Client.java:686)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy5.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
at 
org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)

While this is happening, I can access the JobTracker web interface, but
it shows that there are 0 nodes in the cluster. I have tried to run
this task several times and the result is always the same. It works at
first and then starts failing.

Vadim

On Sun, Feb 8, 2009 at 22:19, Amar Kamat ama...@yahoo-inc.com wrote:
 Vadim Zaliva wrote:

 Hi!

 I am observing a strange situation in my Hadoop cluster. While running
 a task, it eventually gets into this strange mode where:

 1. JobTracker reports 0 task trackers.

 2. Task tracker processes are alive but log file is full of repeating
 messages like this:

 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner:
 attempt_200902
 081049_0001_m_017698_0 done; removing files.
 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.IndexCache: Map ID
 attempt
 _200902081049_0001_m_017698_0 not found in cache
 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner:
 attempt_200902
 081049_0001_m_021212_0 done; removing files.
 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.IndexCache: Map ID
 attempt
 _200902081049_0001_m_021212_0 not found in cache
 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.TaskRunner:
 attempt_200902
 081049_0001_m_022133_0 done; removing files.

 with a new one appearing every couple of seconds.

 In the task tracker log, the last 2 exceptions before these repeating
 messages are:

 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker:
 LaunchTaskAction (registerTask): attempt_200902081049_0001_m_075408_3
 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker:
 Trying to launch : attempt_200902081049_0001_m_075408_3
 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: In
 TaskLauncher, current free slots : 8 and trying to launch
 attempt_200902081049_0001_m_07
 5408_3
 2009-02-08 17:46:51,483 WARN org.apache.hadoop.mapred.TaskTracker:
 Error initializing attempt_200902081049_0001_m_075408_3:
 java.lang.NullPointerException
at
 org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
at org.apache.hadoop.ipc.Client.call(Client.java:686)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy5.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
at
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699)
at
 org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
at
 org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
at
 org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)



 Looks like an RPC issue. Can you tell us more about the cluster? Is there a
 task that finished successfully in this job? Can you access the DFS from the
 trackers?


Re: only one reducer running in a hadoop cluster

2009-02-09 Thread Nick Cen
Thanks everyone. I found the solution for this one: in my main method, I call
setNumReduceTasks() on JobConf with the value I want.
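For anyone searching the archives, a hedged sketch of such a driver with the
old mapred API; the class name, paths, and reduce count are hypothetical, and
the mapper/reducer setup is omitted:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class IndexDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(IndexDriver.class);
    conf.setJobName("index");                        // hypothetical job name
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // mapper, reducer and output classes omitted for brevity
    conf.setNumReduceTasks(4);                       // e.g. one reducer per node on a 4-PC cluster
    JobClient.runJob(conf);
  }
}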

2009/2/9 Owen O'Malley omal...@apache.org


 On Feb 7, 2009, at 11:52 PM, Nick Cen wrote:

  Hi,

 I have a Hadoop cluster with 4 PCs, and I want to integrate Hadoop and
 Lucene together, so I copied some of the source code from Nutch's Indexer
 class. But when I run my job, I found that there is only 1 reducer running
 on 1 PC, so the performance is not as good as expected.


 Set mapred.reduce.tasks in your configuration to the number of reduces you
 want your jobs to have by default. Typically this should be 0.99 *
 mapred.tasktracker.reduce.tasks.maximum * number of computers.




-- 
http://daily.appspot.com/food/


Re: can't read the SequenceFile correctly

2009-02-09 Thread Owen O'Malley


On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:


Hey Tom,

I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes()
kind of function?


It does it because continually resizing the array to the valid  
length is very expensive. It would be a reasonable patch to add a  
getValidBytes, but most methods in Java's libraries are aware of this  
and let you pass in byte[], offset, and length. So once you realize  
what the problem is, you can work around it.
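A minimal sketch of that workaround; the helper class and method names here
are made up, not part of the BytesWritable API:

import org.apache.hadoop.io.BytesWritable;

public class ValidBytes {
  // Copy only the valid portion of a BytesWritable into a right-sized array.
  public static byte[] copyValidBytes(BytesWritable w) {
    byte[] buf = w.getBytes();   // backing array, may be longer than the data
    int len = w.getLength();     // number of valid bytes
    byte[] copy = new byte[len];
    System.arraycopy(buf, 0, copy, 0, len);
    return copy;
  }
}

Passing getBytes(), 0 and getLength() straight to APIs that take a byte[],
offset and length avoids even this copy.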


-- Owen


Re: Reduce won't start until Map stage reaches 100%?

2009-02-09 Thread Matei Zaharia
I believe that in Hadoop 0.19, scheduling was changed so that reduces don't
start until 5% of maps have completed. The reasoning for this is that
reduces can't do anything until there is some map output to copy over the
network. So, if your job has very few map tasks, you won't see reduces start
until the end.

On Mon, Feb 9, 2009 at 12:51 AM, zhuweimin xim-...@tsm.kddilabs.jp wrote:

 Hi

 I think the number of your job's reduce task is 1
 because if the number of reduce task is 1 then reduce stage does not start
 until Map stage 100% completion.

 zhuweimin

 -Original Message-
 From: Taeho Kang [mailto:tka...@gmail.com]
 Sent: Monday, February 09, 2009 4:26 PM
 To: hadoop-u...@lucene.apache.org
 Subject: Reduce won't start until Map stage reaches 100%?

 Dear All,

 With Hadoop 0.19.0, Reduce stage does not start until Map stage gets to the
 100% completion.
 Has anyone faced the similar situation?

  ... ...
  -  map 90% reduce 0%
 -  map 91% reduce 0%
 -  map 92% reduce 0%
 -  map 93% reduce 0%
 -  map 94% reduce 0%
 -  map 95% reduce 0%
 -  map 96% reduce 0%
 -  map 97% reduce 0%
 -  map 98% reduce 0%
 -  map 99% reduce 0%
 -  map 100% reduce 0%
 -  map 100% reduce 1%
 -  map 100% reduce 2%
 -  map 100% reduce 3%
 -  map 100% reduce 4%
 -  map 100% reduce 5%
 -  map 100% reduce 6%
 -  map 100% reduce 7%
 -  map 100% reduce 8%
 -  map 100% reduce 9%

 Thank you all in advance,

 /Taeho





Re: Reduce won't start until Map stage reaches 100%?

2009-02-09 Thread Arun C Murthy


On Feb 8, 2009, at 11:26 PM, Taeho Kang wrote:


Dear All,

With Hadoop 0.19.0, Reduce stage does not start until Map stage gets  
to the

100% completion.
Has anyone faced the similar situation?



How many maps and reduces does your job have?

Arun


Using the Open Source Hadoop to Generate Data-Intensive Insights

2009-02-09 Thread Bonesata
Wednesday Feb 11, Mountain View, CA
info/registration:
http://www.meetup.com/CIO-IT-Executives/calendar/9528874/

Speaker:
Rob Weltman has been Director of Engineering in Enterprise Software at Netscape, 
Chief Architect at AOL, and Director of Engineering for Yahoo's data warehouse 
technology. He is currently Director of Grid Services at Yahoo.

Gaining and keeping a competitive edge in Internet offerings has increasingly 
become a matter of continuously processing enormous volumes of data about 
users, user activities, Web sites, ads, and Web searches. There is gold in the 
mountain of data but it is often impossible to extract in time to make use of 
it if you are constrained to a single (albeit powerful) computer or database. 
Hadoop (http://hadoop.apache.org) is open source software for creating a 
cluster of commodity computers from one node to several thousand nodes in size 
and internally managing petabytes of data. It provides a simple interface for 
attaching user-written code to be executed in parallel on some or all of the 
nodes in the cluster. As an option to creating your own Hadoop cluster, there 
are Hadoop AMIs (Amazon Machine Images - virtual machines) that allow you to 
create and run Hadoop programs on Amazon's EC2 infrastructure.

Rob will talk about what Hadoop is, options for writing programs that run on a 
Hadoop cluster, and Yahoo use cases where Hadoop has proved beneficial in 
dealing with very large data volumes.

--


Gourmet dinner and wine are included.


Re: using HDFS for a distributed storage system

2009-02-09 Thread Brian Bockelman

Hey Amit,

Your current thoughts on keeping the block size larger and removing the
very small files are along the right lines.  Why not choose the default
size of 64MB or larger?  You don't seem too concerned about the number
of replicas.


However, you're still fighting against the tide.  You've got enough  
files that you'll be pushing against block report and namenode  
limitations, especially with 20 - 50 million files.  We find that  
about 500k blocks per node is a good stopping point right now.


You really, really need to figure out how to organize your files in
such a way that the average file size is above 64MB.  Is there a
primary key for each file?  If so, maybe consider HBase?  If you
are just going to be sequentially scanning through all your files,
consider archiving them all to a single sequence file.
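A hedged sketch of that packing idea, assuming one SequenceFile per directory
keyed by file name; the class name, paths and I/O details are illustrative
only:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackDirectory {
  // Pack every file in a local directory into one SequenceFile on HDFS,
  // keyed by file name, valued by the raw bytes.
  public static void pack(File dir, Path out) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (File f : dir.listFiles()) {
        byte[] data = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try { in.readFully(data); } finally { in.close(); }
        writer.append(new Text(f.getName()), new BytesWritable(data));
      }
    } finally {
      writer.close();
    }
  }
}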


Your individual data nodes are quite large ... I hope you're not  
expecting to measure throughput in 10's of Gbps?


It's hard to give advice without knowing more about your application.   
I can tell you that you're going to run into a significant wall if you  
can't figure out a means for making your average file size at least  
greater than 64MB.


Brian

On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:


Hi Group,

I am planning to use HDFS as a reliable and distributed file system  
for
batch operations. No plans as of now to run any map reduce job on  
top of it,
but in future we will be having map reduce operations on files in  
HDFS.


The current (test) system has 3 machines:
NameNode: dual core CPU, 2GB RAM, 500GB HDD
2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and 1.5TB  
of

space with ext3 filesystem.

I just need to put and retrieve files from this system. The files which I
will put in HDFS vary from a few bytes to around 100MB, with the average
file size being 5MB, and the number of files would grow to around 20-50
million. To avoid hitting the limit on the number of files under a directory,
I store each file at a path derived from the SHA1 hash of its content (which
is 20 bytes long; I create a 10-level-deep path using 2 bytes for each
level). When I started the cluster a month back, I had kept the default
block size at 1MB.
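For illustration, a minimal sketch of how such a path could be derived from
the SHA1 digest; this is a reconstruction under the stated assumptions, not
the actual code from this setup:

import java.security.MessageDigest;

public class HashPath {
  // Derive a 10-level directory path from the SHA1 hash of a file's content,
  // using 2 bytes (4 hex characters) per level.
  public static String pathFor(byte[] content) throws Exception {
    byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);  // 20 bytes
    StringBuilder sb = new StringBuilder();
    for (int level = 0; level < 10; level++) {
      sb.append('/');
      sb.append(String.format("%02x%02x",
          digest[2 * level] & 0xff, digest[2 * level + 1] & 0xff));
    }
    return sb.toString();  // e.g. /ab12/34cd/... ten levels deep
  }
}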

The hardware specs mentioned at
http://wiki.apache.org/hadoop/MachineScaling consider running map
reduce operations, so I am not sure if my setup is good
enough. I would like to get input on this setup.
The final cluster would have each datanode with 8GB RAM, a quad core CPU,
and 25 TB attached storage.

I played with this setup a little and then planned to increase the disk
space on both the DataNodes. I started by increasing the disk capacity of
the first DataNode to 15TB and changing the underlying filesystem to XFS
(which made it a clean datanode), and put it back in the system. Before
performing this operation, I had inserted around 7 files in HDFS.
NameNode:50070/dfshealth.jsp showed 677323 files and directories,
332419 blocks = 1009742 total. I guess the way I create a 10-level-deep
path for each file results in ~10 times the number of actual files in HDFS.
Please correct me if I am wrong. I then ran the rebalancer on the cleaned-up
DataNode, which was too slow (writing 2 blocks per second, i.e. 2MBps) to
begin with and died after a few hours saying too many open files. I checked
all the machines, and all the DataNode and NameNode processes were running
fine on all the respective machines, but dfshealth.jsp showed both the
datanodes to be dead. Re-starting the cluster brought both of them up. I
guess this has to do with RAM requirements. My question is how to figure out
the RAM requirements of the DataNode and NameNode in this situation. The
documentation states that both the DataNode and NameNode store the block
index. It's not quite clear if all of the index is in memory. Once I have
figured that out, how can I instruct Hadoop to rebalance with high priority?

Another question is regarding the "Non DFS used:" statistic shown on
dfshealth.jsp: is it the space used to store the file and directory
metadata information (apart from the actual file content blocks)? Right now
it is 1/4th of the total space used by HDFS.

Some points which I have thought of over the last month to improve this
model are:
1. I should keep very small files (let's say smaller than 1KB) out of HDFS.
2. Reduce the directory depth of the file path created by the SHA1 hash
(instead of 10 levels, I can keep 3).
3. I should increase the block size to reduce the number of blocks in HDFS
(http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/4aa34eb70805180030u5de8efaam6f1e9a8832636...@mail.gmail.com
says it won't result in wasted disk space).

More improvement advice is appreciated.

Thanks,
Amit




TaskTrackers being double counted after restart job recovery

2009-02-09 Thread Stefan Will
Hi,

I'm using the new persistent job state feature in 0.19.0, and it's worked
really well so far. However, this morning my JobTracker died with an OOM
error (even though the heap size is set to 768M). So I killed it and all the
TaskTrackers. After starting everything up again, all my nodes were showing
up twice in the JobTracker web interface, with different port numbers. Also,
some of the jobs it restarted had already been completed when the job
tracker died.

Any idea what might be happening here? How can I fix this? Will temporarily
setting mapred.jobtracker.restart.recover=false clear things up?

-- Stefan


Re: TaskTrackers being double counted after restart job recovery

2009-02-09 Thread Owen O'Malley
There is a bug where restarted TaskTrackers get counted twice. The problem
is that the name is generated from the hostname and port number. When
TaskTrackers restart they get a new port number and get counted again. The
problem goes away when the old TaskTracker entries time out after 10 minutes
or you restart the JobTracker.

-- Owen


Re: can't read the SequenceFile correctly

2009-02-09 Thread Raghu Angadi


+1 on something like getValidBytes(). Just the existence of this would 
warn many programmers about getBytes().


Raghu.

Owen O'Malley wrote:


On Feb 6, 2009, at 8:52 AM, Bhupesh Bansal wrote:


Hey Tom,

I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes()
kind of function?


It does it because continually resizing the array to the valid length 
is very expensive. It would be a reasonable patch to add a 
getValidBytes, but most methods in Java's libraries are aware of this 
and let you pass in byte[], offset, and length. So once you realize what 
the problem is, you can work around it.


-- Owen




java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Scott Whitecross

Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write too many files to HDFS.  I
have a class extending
org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a
few file names, everything works.  However, if I output to thousands
of small files, the above error occurs.  I'm having trouble isolating
the problem, as it doesn't occur in the debugger, unfortunately.


Is this a memory issue, or is there an upper limit to the number of  
files HDFS can hold?  Any settings to adjust?


Thanks.

Re: using HDFS for a distributed storage system

2009-02-09 Thread Amit Chandel
Thanks Brian for your inputs.

I am eventually targeting to store 200k directories, each containing 75
files on average, with the average size of a directory being 300MB (ranging
from 50MB to 650MB), in this storage system.

It will mostly be an archival storage from where I should be able to access
any of the old files easily. But the recent directories would be accessed
frequently for a day or 2 as they are being added. They are added in batches
of 500-1000 per week, and there can be rare bursts of adding 50k directories
once in 3 months. One such burst is about to come in a month, and I want to
test the current test setup against that burst. We have upgraded our test
hardware a little bit from what I last mentioned. The test setup will have 3
DataNodes with 15TB space on each, 6G RAM, and a dual core processor, and a
NameNode with 500G storage, 6G RAM, and a dual core processor.

I am planning to add the individual files initially, and after a while (let's
say 2 days after insertion) will make a SequenceFile out of each directory
(I am currently looking into SequenceFile) and delete the previous files of
that directory from HDFS. That way, in future, I can access any file given
its directory without much effort.
Now that SequenceFile is in the picture, I can make the default block size
64MB or even 128MB. For replication, I am just replicating each file at 1
extra location (i.e. replication factor = 2, since a replication factor of 3
would leave me with only 33% of the usable storage). Regarding reading back
from HDFS, if I can read at ~50MBps (for recent files), that would be enough.
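A hedged sketch of how a later lookup against such a per-directory
SequenceFile might look, assuming entries are keyed by file name (the class
and key scheme are assumptions, not Amit's actual design):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class LookupFile {
  // Scan a per-directory SequenceFile for one entry, keyed by file name.
  public static byte[] find(Path seqFile, String fileName) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);
    try {
      Text key = new Text();
      BytesWritable value = new BytesWritable();
      while (reader.next(key, value)) {
        if (key.toString().equals(fileName)) {
          byte[] data = new byte[value.getLength()];
          System.arraycopy(value.getBytes(), 0, data, 0, value.getLength());
          return data;   // copy only the valid bytes
        }
      }
      return null;       // not found in this SequenceFile
    } finally {
      reader.close();
    }
  }
}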

Let me know if you see any more pitfalls in this setup, or have more
suggestions. I really appreciate it. Once I test this setup, I will put the
results back to the list.

Thanks,
Amit


On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.eduwrote:

 Hey Amit,

 Your current thoughts on keeping block size larger and removing the very
 small files are along the right line.  Why not chose the default size of
 64MB or larger?  You don't seem too concerned about the number of replicas.

 However, you're still fighting against the tide.  You've got enough files
 that you'll be pushing against block report and namenode limitations,
 especially with 20 - 50 million files.  We find that about 500k blocks per
 node is a good stopping point right now.

 You really, really need to figure out how to organize your files in such a
 way that the average file size is above 64MB.  Is there a primary key for
 each file?  If so, maybe consider HBASE?  If you just are going to be
 sequentially scanning through all your files, consider archiving them all to
 a single sequence file.

 Your individual data nodes are quite large ... I hope you're not expecting
 to measure throughput in 10's of Gbps?

 It's hard to give advice without knowing more about your application.  I
 can tell you that you're going to run into a significant wall if you can't
 figure out a means for making your average file size at least greater than
 64MB.

 Brian

 On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:

  Hi Group,

 I am planning to use HDFS as a reliable and distributed file system for
 batch operations. No plans as of now to run any map reduce job on top of
 it,
 but in future we will be having map reduce operations on files in HDFS.

 The current (test) system has 3 machines:
 NameNode: dual core CPU, 2GB RAM, 500GB HDD
 2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and 1.5TB of
 space with ext3 filesystem.

 I just need to put and retrieve files from this system. The files which I
 will put in HDFS varies from a few Bytes to a around 100MB, with the
 average
 file-size being 5MB. and the number of files would grow around 20-50
 million. To avoid hitting limit of number of files under a directory, I
 store each file at the path derived by the SHA1 hash of its content (which
 is 20bytes long, and I create a 10 level deep path using 2bytes for each
 level). When I started the cluster a month back, I had kept the default
 block size to 1MB.

 The hardware specs mentioned at
 http://wiki.apache.org/hadoop/MachineScalingconsiders running map

 reduce operations. So not sure if my setup is good
 enough. I would like to get input on this setup.
 The final cluster would have each datanode with 8GB RAM, a quad core CPU,
 and 25 TB attached storage.

 I played with this setup a little and then planned to increase the disk
 space on both the DataNodes. I started by  increasing its disk capacity of
 first dataNode to 15TB and changing the underlying filesystem to XFS
 (which
 made it a clean datanode), and put it back in the system. Before
 performing
 this operation, I had inserted around 7 files in HDFS.
 **NameNode:50070/dfshealth.jsp
 showd  *677323 files and directories, 332419 blocks = 1009742 total *. I
 guess the way I create a 10 level deep path for the file results in ~10
 times the number of actual files in HDFS. Please correct me if I am wrong.
 I
 then ran the 

Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread jason hadoop
You will have to increase the per-user file descriptor limit.
For most Linux machines the file /etc/security/limits.conf controls this on
a per-user basis.
You will need to log in to a fresh shell session after making the changes to
see them. Any login shells started before the change, and processes started
by those shells, will have the old limits.

If you are opening vast numbers of files you may need to increase the per
system limits, via the /etc/sysctl.conf file and the fs.file-max parameter.
This page seems to be a decent reference:
http://bloggerdigest.blogspot.com/2006/10/file-descriptors-vs-linux-performance.html


On Mon, Feb 9, 2009 at 1:01 PM, Scott Whitecross sc...@dataxu.com wrote:

 Hi all -

 I've been running into this error the past few days:
 java.io.IOException: Could not get block locations. Aborting...
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

 It seems to be related to trying to write to many files to HDFS.  I have a
 class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I
 output to a few file names, everything works.  However, if I output to
 thousands of small files, the above error occurs.  I'm having trouble
 isolating the problem, as the problem doesn't occur in the debugger
 unfortunately.

 Is this a memory issue, or is there an upper limit to the number of files
 HDFS can hold?  Any settings to adjust?

 Thanks.


Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Bryan Duxbury
Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards from  
the default of 256. We run with 8000 xcievers, and that seems to  
solve our problems. I think that if you have a lot of open files,  
this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 
(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run 
(DFSClient.java:1889)


It seems to be related to trying to write to many files to HDFS.  I  
have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in the  
debugger unfortunately.


Is this a memory issue, or is there an upper limit to the number of  
files HDFS can hold?  Any settings to adjust?


Thanks.




Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Scott Whitecross
This would be an addition to the hadoop-site.xml file, to up  
dfs.datanode.max.xcievers?


Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards  
from the default of 256. We run with 8000 xcievers, and that seems  
to solve our problems. I think that if you have a lot of open files,  
this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access 
$1400(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream 
$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write to many files to HDFS.  I  
have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in the  
debugger unfortunately.


Is this a memory issue, or is there an upper limit to the number of  
files HDFS can hold?  Any settings to adjust?


Thanks.







copyFromLocal *

2009-02-09 Thread S D
I'm using the Hadoop FS shell to move files into my data store (either HDFS
or S3Native). I'd like to use wildcards with copyFromLocal, but this doesn't
seem to work. Is there any way I can get that kind of functionality?

Thanks,
John


Re: using HDFS for a distributed storage system

2009-02-09 Thread Brian Bockelman

Hey Amit,

That plan sounds much better.  I think you will find the system much  
more scalable.


From our experience, it takes a while to get the right amount of  
monitoring and infrastructure in place to have a very dependable  
system with 2 replicas.  I would recommend using 3 replicas until you  
feel you've mastered the setup.


Brian

On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:


Thanks Brian for your inputs.

I am eventually targeting to store 200k directories each containing   
75
files on avg, with average size of directory being 300MB (ranging  
from 50MB

to 650MB) in this storage system.

It will mostly be an archival storage from where I should be able to  
access
any of the old files easily. But the recent directories would be  
accessed
frequently for a day or 2 as they are being added. They are added in  
batches
of 500-1000 per week, and there can be rare bursts of adding 50k  
directories
once in 3 months. One such burst is about to come in a month, and I  
want to
test the current test setup against that burst. We have upgraded our  
test
hardware a little bit from what I last mentioned. The test setup  
will have 3

DataNodes with 15TB space on each, 6G RAM, dual core processor, and a
NameNode 500G storage, 6G RAM, dual core processor.

I am planning to add the individual files initially, and after a  
while (lets
say 2 days after insertion) will make a SequenceFile out of each  
directory
(I am currently looking into SequenceFile) and delete the previous  
files of
that directory from HDFS. That way in future, I can access any file  
given

its directory without much effort.
Now that SequenceFile is in picture, I can make default block size  
to 64MB
or even 128MB. For replication, I am just replicating a file at 1  
extra
location (i.e. replication factor = 2, since a replication factor 3  
will
leave me with only 33% of the usable storage). Regarding reading  
back from
HDFS, if I can read at ~50MBps (for recent files), that would be  
enough.


Let me know if you see any more pitfalls in this setup, or have more
suggestions. I really appreciate it. Once I test this setup, I will  
put the

results back to the list.

Thanks,
Amit


On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman  
bbock...@cse.unl.eduwrote:



Hey Amit,

Your current thoughts on keeping block size larger and removing the  
very
small files are along the right line.  Why not chose the default  
size of
64MB or larger?  You don't seem too concerned about the number of  
replicas.


However, you're still fighting against the tide.  You've got enough  
files

that you'll be pushing against block report and namenode limitations,
especially with 20 - 50 million files.  We find that about 500k  
blocks per

node is a good stopping point right now.

You really, really need to figure out how to organize your files in  
such a
way that the average file size is above 64MB.  Is there a primary  
key for

each file?  If so, maybe consider HBASE?  If you just are going to be
sequentially scanning through all your files, consider archiving  
them all to

a single sequence file.

Your individual data nodes are quite large ... I hope you're not  
expecting

to measure throughput in 10's of Gbps?

It's hard to give advice without knowing more about your  
application.  I
can tell you that you're going to run into a significant wall if  
you can't
figure out a means for making your average file size at least  
greater than

64MB.

Brian

On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:

Hi Group,


I am planning to use HDFS as a reliable and distributed file  
system for
batch operations. No plans as of now to run any map reduce job on  
top of

it,
but in future we will be having map reduce operations on files in  
HDFS.


The current (test) system has 3 machines:
NameNode: dual core CPU, 2GB RAM, 500GB HDD
2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and  
1.5TB of

space with ext3 filesystem.

I just need to put and retrieve files from this system. The files  
which I

will put in HDFS varies from a few Bytes to a around 100MB, with the
average
file-size being 5MB. and the number of files would grow around 20-50
million. To avoid hitting limit of number of files under a  
directory, I
store each file at the path derived by the SHA1 hash of its  
content (which
is 20bytes long, and I create a 10 level deep path using 2bytes  
for each
level). When I started the cluster a month back, I had kept the  
default

block size to 1MB.

The hardware specs mentioned at
http://wiki.apache.org/hadoop/MachineScalingconsiders running map

reduce operations. So not sure if my setup is good
enough. I would like to get input on this setup.
The final cluster would have each datanode with 8GB RAM, a quad  
core CPU,

and 25 TB attached storage.

I played with this setup a little and then planned to increase the  
disk
space on both the DataNodes. I started by  increasing its disk  
capacity of

first dataNode to 15TB and 

Backing up HDFS?

2009-02-09 Thread Nathan Marz
How do people back up their data that they keep on HDFS? We have many  
TB of data which we need to get backed up but are unclear on how to do  
this efficiently/reliably.


Re: Backing up HDFS?

2009-02-09 Thread Amandeep Khurana
Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data, so the reliability of the system shouldn't be a
concern (if at all it is)...

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote:

 How do people back up their data that they keep on HDFS? We have many TB of
 data which we need to get backed up but are unclear on how to do this
 efficiently/reliably.



Re: Backing up HDFS?

2009-02-09 Thread Brian Bockelman


On Feb 9, 2009, at 6:41 PM, Amandeep Khurana wrote:


Why would you want to have another backup beyond HDFS? HDFS itself
replicates your data so if the reliability of the system shouldnt be a
concern (if at all it is)...



It should be.  HDFS is not an archival system.  Multiple replicas
do not equate to a backup, just like having RAID1 or RAID5 shouldn't
make you feel safe.


HDFS is actively developed with lots of new features.  Bugs creep in.   
Things can become inconsistent and mis-replicated.  Even though loss  
due to hardware failure is small, losses due to new bugs are still  
possible!


Brian


Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com  
wrote:


How do people back up their data that they keep on HDFS? We have  
many TB of

data which we need to get backed up but are unclear on how to do this
efficiently/reliably.





Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Bryan Duxbury

Correct.

+1 to Jason's more unix file handles suggestion. That's a must-have.

-Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

This would be an addition to the hadoop-site.xml file, to up  
dfs.datanode.max.xcievers?


Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards  
from the default of 256. We run with 8000 xcievers, and that seems  
to solve our problems. I think that if you have a lot of open  
files, this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400 
(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream 
$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write to many files to HDFS.   
I have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in  
the debugger unfortunately.


Is this a memory issue, or is there an upper limit to the number  
of files HDFS can hold?  Any settings to adjust?


Thanks.









Re: Backing up HDFS?

2009-02-09 Thread Allen Wittenauer
On 2/9/09 4:41 PM, Amandeep Khurana ama...@gmail.com wrote:
 Why would you want to have another backup beyond HDFS? HDFS itself
 replicates your data so if the reliability of the system shouldnt be a
 concern (if at all it is)...

I'm reminded of a previous job where a site administrator refused to make
tape backups (despite our continual harassment and pointing out that he was
in violation of the contract) because he said RAID was good enough.

Then the RAID controller failed. When we couldn't recover data from the
other mirror, he was fired.  Not sure how they ever recovered, especially
considering what data they lost.  Hopefully they had a paper trail.

To answer Nathan's question:

 On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz nat...@rapleaf.com wrote:
 
 How do people back up their data that they keep on HDFS? We have many TB of
 data which we need to get backed up but are unclear on how to do this
 efficiently/reliably.

The content of our HDFSes is loaded from elsewhere and is not considered
'the source of authority'.  It is the responsibility of the original sources
to maintain backups and we then follow their policies for data retention.
For user generated content, we provide *limited* (read: quota'ed) NFS space
that is backed up regularly.

Another strategy we take is multiple grids in multiple locations that get
the data loaded simultaneously.

The key here is to prioritize your data.  Impossible-to-replicate data gets
backed up using whatever means necessary; hard-to-regenerate data is the next
priority. Easy-to-regenerate and OK-to-nuke data doesn't get backed up.



Hadoop Workshop for College Teaching Faculty

2009-02-09 Thread Christophe Bisciglia
Hey Hadoop Fans, I wanted to call your attention to an event we're
putting on next month that would be great for your academic contacts.
Please take a moment and forward this to any faculty you think might
be interested.

http://www.cloudera.com/sigcse-2009-disc-workshop

One of the big challenges to Hadoop adoption is that it requires
thinking about data and computation in new ways. One of the best
things we can do as a community, long term, is help educators prepare
their students to work with big data using Hadoop. This is a chance to
help faculty impart skills that will continue to drive Hadoop adoption
for years to come.

Once a year, Computer Science educators from around the world gather
at the ACM's Special Interest Group for Computer Science Education:
SIGCSE.

This year, Cloudera is hosting a day-long workshop at SIGCSE to
introduce faculty to the MapReduce programming model, demonstrate how
to integrate material into various types of courses, and go over some
great sample projects for Hadoop. We'll also go over technical
logistics around spinning up clusters on EC2 and getting free credits
from Amazon for classroom use. A lot of this material is based on past
work we have done with the National Science Foundation.

That link again: http://www.cloudera.com/sigcse-2009-disc-workshop

There is no charge for this event, and we'd love to see all your
favorite computer science teachers there.

Cheers,
Christophe


Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Brian Bockelman


On Feb 9, 2009, at 7:50 PM, jason hadoop wrote:

The other issue you may run into, with many files in your HDFS is  
that you

may end up with more than a few 100k worth of blocks on each of your
datanodes. At present this can lead to instability due to the way the
periodic block reports to the namenode are handled. The more blocks  
per

datanode, the larger the risk of congestion collapse in your hdfs.


Of course, if you stay below, say, 500k, you don't have much of a risk  
of congestion.


In our experience, 500k blocks or less is going to be fine with decent  
hardware.  Between 500k and 750k, you will hit a wall somewhere  
depending on your hardware.  Good luck getting anything above 750k.


The recommendation is that you keep this number as low as possible --  
and explore the limits of your system and hardware in testing before  
you discover them in production :)


Brian




On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury br...@rapleaf.com  
wrote:



Correct.

+1 to Jason's more unix file handles suggestion. That's a must-have.

-Bryan


On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

This would be an addition to the hadoop-site.xml file, to up

dfs.datanode.max.xcievers?

Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for hadoop. You should avoid keeping a lot of  
small

files if possible.

That said, that error is something I've seen a lot. It usually  
happens
when the number of xcievers hasn't been adjusted upwards from the  
default of
256. We run with 8000 xcievers, and that seems to solve our  
problems. I
think that if you have a lot of open files, this problem happens  
a lot

faster.

-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:

Hi all -


I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
  at
org.apache.hadoop.dfs.DFSClient 
$DFSOutputStream.processDatanodeError(DFSClient.java:2143)

  at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access 
$1400(DFSClient.java:1735)

  at
org.apache.hadoop.dfs.DFSClient$DFSOutputStream 
$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write to many files to  
HDFS.  I have
a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I
output to a few file names, everything works.  However, if I  
output to
thousands of small files, the above error occurs.  I'm having  
trouble
isolating the problem, as the problem doesn't occur in the  
debugger

unfortunately.

Is this a memory issue, or is there an upper limit to the number  
of

files HDFS can hold?  Any settings to adjust?

Thanks.













Re: copyFromLocal *

2009-02-09 Thread lohit
Which version of Hadoop are you using?
I think from 0.18 or 0.19, copyFromLocal accepts multiple files as input, but
the destination should be a directory.
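If the shell won't do it, one workaround is to glob on the local filesystem
from Java and copy each match; a minimal sketch, where the source pattern and
destination directory are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WildcardCopy {
  // Copy all local files matching a glob pattern into an HDFS directory.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem local = FileSystem.getLocal(conf);
    FileSystem dfs = FileSystem.get(conf);
    Path pattern = new Path("/data/incoming/*.log");   // hypothetical source glob
    Path dest = new Path("/user/john/incoming");       // hypothetical HDFS directory
    FileStatus[] matches = local.globStatus(pattern);
    if (matches != null) {
      for (FileStatus stat : matches) {
        dfs.copyFromLocalFile(false, stat.getPath(), dest);  // false = keep the local copy
      }
    }
  }
}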

Lohit



- Original Message 
From: S D sd.codewarr...@gmail.com
To: Hadoop Mailing List core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 3:34:22 PM
Subject: copyFromLocal *

I'm using the Hadoop FS shell to move files into my data store (either HDFS
or S3Native). I'd like to use wildcard with copyFromLocal but this doesn't
seem to work. Is there any way I can get that kind of functionality?

Thanks,
John



Re: using HDFS for a distributed storage system

2009-02-09 Thread lohit
 I am planning to add the individual files initially, and after a while (lets
 say 2 days after insertion) will make a SequenceFile out of each directory
 (I am currently looking into SequenceFile) and delete the previous files of
 that directory from HDFS. That way in future, I can access any file given
 its directory without much effort.

Have you considered Hadoop archives?
http://hadoop.apache.org/core/docs/current/hadoop_archives.html
Depending on your access pattern, you could store files as archives in the
first place.



- Original Message 
From: Brian Bockelman bbock...@cse.unl.edu
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 4:00:42 PM
Subject: Re: using HDFS for a distributed storage system

Hey Amit,

That plan sounds much better.  I think you will find the system much more 
scalable.

From our experience, it takes a while to get the right amount of monitoring 
and infrastructure in place to have a very dependable system with 2 replicas.  
I would recommend using 3 replicas until you feel you've mastered the setup.

Brian

On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:

 Thanks Brian for your inputs.
 
 I am eventually targeting to store 200k directories each containing  75
 files on avg, with average size of directory being 300MB (ranging from 50MB
 to 650MB) in this storage system.
 
 It will mostly be an archival storage from where I should be able to access
 any of the old files easily. But the recent directories would be accessed
 frequently for a day or 2 as they are being added. They are added in batches
 of 500-1000 per week, and there can be rare bursts of adding 50k directories
 once in 3 months. One such burst is about to come in a month, and I want to
 test the current test setup against that burst. We have upgraded our test
 hardware a little bit from what I last mentioned. The test setup will have 3
 DataNodes with 15TB space on each, 6G RAM, dual core processor, and a
 NameNode 500G storage, 6G RAM, dual core processor.
 
 I am planning to add the individual files initially, and after a while (lets
 say 2 days after insertion) will make a SequenceFile out of each directory
 (I am currently looking into SequenceFile) and delete the previous files of
 that directory from HDFS. That way in future, I can access any file given
 its directory without much effort.
 Now that SequenceFile is in picture, I can make default block size to 64MB
 or even 128MB. For replication, I am just replicating a file at 1 extra
 location (i.e. replication factor = 2, since a replication factor 3 will
 leave me with only 33% of the usable storage). Regarding reading back from
 HDFS, if I can read at ~50MBps (for recent files), that would be enough.
 
 Let me know if you see any more pitfalls in this setup, or have more
 suggestions. I really appreciate it. Once I test this setup, I will put the
 results back to the list.
 
 Thanks,
 Amit
 
 
 On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.eduwrote:
 
 Hey Amit,
 
 Your current thoughts on keeping block size larger and removing the very
 small files are along the right line.  Why not chose the default size of
 64MB or larger?  You don't seem too concerned about the number of replicas.
 
 However, you're still fighting against the tide.  You've got enough files
 that you'll be pushing against block report and namenode limitations,
 especially with 20 - 50 million files.  We find that about 500k blocks per
 node is a good stopping point right now.
 
 You really, really need to figure out how to organize your files in such a
 way that the average file size is above 64MB.  Is there a primary key for
 each file?  If so, maybe consider HBASE?  If you just are going to be
 sequentially scanning through all your files, consider archiving them all to
 a single sequence file.
 
 Your individual data nodes are quite large ... I hope you're not expecting
 to measure throughput in 10's of Gbps?
 
 It's hard to give advice without knowing more about your application.  I
 can tell you that you're going to run into a significant wall if you can't
 figure out a means for making your average file size at least greater than
 64MB.
 
 Brian
 
 On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
 
 Hi Group,
 
 I am planning to use HDFS as a reliable and distributed file system for
 batch operations. No plans as of now to run any map reduce job on top of
 it,
 but in future we will be having map reduce operations on files in HDFS.
 
 The current (test) system has 3 machines:
 NameNode: dual core CPU, 2GB RAM, 500GB HDD
 2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and 1.5TB of
 space with ext3 filesystem.
 
 I just need to put and retrieve files from this system. The files which I
 will put in HDFS varies from a few Bytes to a around 100MB, with the
 average
 file-size being 5MB. and the number of files would grow around 20-50
 million. To avoid hitting limit of number of files under a directory, I
 store each file 

Re: using HDFS for a distributed storage system

2009-02-09 Thread Jeff Hammerbacher
Yo,

I don't want to sound all spammy, but Tom White wrote a pretty nice blog
post about small files in HDFS recently that you might find helpful. The
post covers some potential solutions, including Hadoop Archives:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.

Later,
Jeff

On Mon, Feb 9, 2009 at 6:14 PM, lohit lohit...@yahoo.com wrote:

  I am planning to add the individual files initially, and after a while
 (lets
  say 2 days after insertion) will make a SequenceFile out of each
 directory
  (I am currently looking into SequenceFile) and delete the previous files
 of
  that directory from HDFS. That way in future, I can access any file given
  its directory without much effort.

 Have you considered Hadoop archive?
 http://hadoop.apache.org/core/docs/current/hadoop_archives.html
 Depending on your access pattern, you could store files in archive step in
 the first place.



 - Original Message 
 From: Brian Bockelman bbock...@cse.unl.edu
 To: core-user@hadoop.apache.org
 Sent: Monday, February 9, 2009 4:00:42 PM
 Subject: Re: using HDFS for a distributed storage system

 Hey Amit,

 That plan sounds much better.  I think you will find the system much more
 scalable.

 From our experience, it takes a while to get the right amount of monitoring
 and infrastructure in place to have a very dependable system with 2
 replicas.  I would recommend using 3 replicas until you feel you've mastered
 the setup.

 Brian

 On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:

  Thanks Brian for your inputs.
 
  I am eventually targeting to store 200k directories each containing  75
  files on avg, with average size of directory being 300MB (ranging from
 50MB
  to 650MB) in this storage system.
 
  It will mostly be an archival storage from where I should be able to
 access
  any of the old files easily. But the recent directories would be accessed
  frequently for a day or 2 as they are being added. They are added in
 batches
  of 500-1000 per week, and there can be rare bursts of adding 50k
 directories
  once in 3 months. One such burst is about to come in a month, and I want
 to
  test the current test setup against that burst. We have upgraded our test
  hardware a little bit from what I last mentioned. The test setup will
 have 3
  DataNodes with 15TB space on each, 6G RAM, dual core processor, and a
  NameNode 500G storage, 6G RAM, dual core processor.
 
  I am planning to add the individual files initially, and after a while
 (lets
  say 2 days after insertion) will make a SequenceFile out of each
 directory
  (I am currently looking into SequenceFile) and delete the previous files
 of
  that directory from HDFS. That way in future, I can access any file given
  its directory without much effort.
  Now that SequenceFile is in picture, I can make default block size to
 64MB
  or even 128MB. For replication, I am just replicating a file at 1 extra
  location (i.e. replication factor = 2, since a replication factor 3 will
  leave me with only 33% of the usable storage). Regarding reading back
 from
  HDFS, if I can read at ~50MBps (for recent files), that would be enough.
 
  Let me know if you see any more pitfalls in this setup, or have more
  suggestions. I really appreciate it. Once I test this setup, I will put
 the
  results back to the list.
 
  Thanks,
  Amit
 
 
  On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu
 wrote:
 
  Hey Amit,
 
  Your current thoughts on keeping block size larger and removing the very
  small files are along the right line.  Why not chose the default size of
  64MB or larger?  You don't seem too concerned about the number of
 replicas.
 
  However, you're still fighting against the tide.  You've got enough
 files
  that you'll be pushing against block report and namenode limitations,
  especially with 20 - 50 million files.  We find that about 500k blocks
 per
  node is a good stopping point right now.
 
  You really, really need to figure out how to organize your files in such
 a
  way that the average file size is above 64MB.  Is there a primary key
 for
  each file?  If so, maybe consider HBASE?  If you just are going to be
  sequentially scanning through all your files, consider archiving them
 all to
  a single sequence file.
 
  Your individual data nodes are quite large ... I hope you're not
 expecting
  to measure throughput in 10's of Gbps?
 
  It's hard to give advice without knowing more about your application.  I
  can tell you that you're going to run into a significant wall if you
 can't
  figure out a means for making your average file size at least greater
 than
  64MB.
 
  Brian
 
  On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
 
  Hi Group,
 
  I am planning to use HDFS as a reliable and distributed file system for
  batch operations. No plans as of now to run any map reduce job on top
 of
  it,
  but in future we will be having map reduce operations on files in HDFS.
 
  The current (test) system has 3 machines:
  

Re: using HDFS for a distributed storage system

2009-02-09 Thread Mark Kerzner
It is a good and useful overview, thank you. It also mentions Stuart Sierra's
post, where Stuart mentions that the process is slow. Does anybody know why?

I have written code to write from the PC file system to HDFS, and I also
noticed that it is very slow. Instead of 40MB/sec, as promised by Tom
White's book, it seems to be 40 sec/MB. Stuart's tars would work about 5
times faster. But still, why is it so slow? Is there a way to speed this up?

Thanks!

Mark


On Mon, Feb 9, 2009 at 8:35 PM, Jeff Hammerbacher ham...@cloudera.comwrote:

 Yo,

 I don't want to sound all spammy, but Tom White wrote a pretty nice blog
 post about small files in HDFS recently that you might find helpful. The
 post covers some potential solutions, including Hadoop Archives:
 http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.

 Later,
 Jeff

 On Mon, Feb 9, 2009 at 6:14 PM, lohit lohit...@yahoo.com wrote:

   I am planning to add the individual files initially, and after a while
  (lets
   say 2 days after insertion) will make a SequenceFile out of each
  directory
   (I am currently looking into SequenceFile) and delete the previous
 files
  of
   that directory from HDFS. That way in future, I can access any file
 given
   its directory without much effort.
 
  Have you considered Hadoop archive?
  http://hadoop.apache.org/core/docs/current/hadoop_archives.html
  Depending on your access pattern, you could store files in archive step
 in
  the first place.
 
 
 
  - Original Message 
  From: Brian Bockelman bbock...@cse.unl.edu
  To: core-user@hadoop.apache.org
  Sent: Monday, February 9, 2009 4:00:42 PM
  Subject: Re: using HDFS for a distributed storage system
 
  Hey Amit,
 
  That plan sounds much better.  I think you will find the system much more
  scalable.
 
  From our experience, it takes a while to get the right amount of
 monitoring
  and infrastructure in place to have a very dependable system with 2
  replicas.  I would recommend using 3 replicas until you feel you've
 mastered
  the setup.
 
  Brian
 
  On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:
 
   Thanks Brian for your inputs.
  
   I am eventually targeting storing 200k directories, each containing 75
   files on average, with the average directory size being 300MB (ranging from 50MB
   to 650MB), in this storage system.
  
   It will mostly be archival storage from which I should be able to access
   any of the old files easily. But the recent directories would be accessed
   frequently for a day or 2 as they are being added. They are added in
  batches
   of 500-1000 per week, and there can be rare bursts of adding 50k
  directories
   once in 3 months. One such burst is about to come in a month, and I
 want
  to
   test the current test setup against that burst. We have upgraded our
 test
   hardware a little bit from what I last mentioned. The test setup will
  have 3
   DataNodes with 15TB of space each, 6G of RAM, and a dual-core processor, and a
   NameNode with 500G of storage, 6G of RAM, and a dual-core processor.
  
   I am planning to add the individual files initially, and after a while
  (let's
   say 2 days after insertion) will make a SequenceFile out of each
  directory
   (I am currently looking into SequenceFile) and delete the previous
 files
  of
   that directory from HDFS. That way in future, I can access any file
 given
   its directory without much effort.
   Now that SequenceFile is in the picture, I can make the default block size
   64MB or even 128MB. For replication, I am just replicating each file at 1 extra
   location (i.e. replication factor = 2, since a replication factor of 3 will
   leave me with only 33% of the usable storage). Regarding reading back
  from
   HDFS, if I can read at ~50MBps (for recent files), that would be
 enough.
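
A rough sketch of the per-directory roll-up described above, assuming file
names as Text keys and raw file contents as BytesWritable values; the class
name, paths, and the absence of compression are illustrative, not a
prescription:

// Packs one local directory of small files into a single SequenceFile on HDFS.
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DirToSequenceFile {
  public static void main(String[] args) throws Exception {
    File dir = new File(args[0]);                 // local directory of small files
    Path out = new Path(args[1]);                 // target SequenceFile in HDFS
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
    try {
      for (File f : dir.listFiles()) {
        byte[] contents = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        in.readFully(contents);                   // whole small file becomes one record value
        in.close();
        writer.append(new Text(f.getName()), new BytesWritable(contents));
      }
    } finally {
      writer.close();
    }
  }
}

SequenceFile.Reader can then scan the records back out; if keyed random access
to individual files matters later, a MapFile could be used instead.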
  
   Let me know if you see any more pitfalls in this setup, or have more
   suggestions. I really appreciate it. Once I test this setup, I will put
  the
   results back to the list.
  
   Thanks,
   Amit
  
  
   On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman bbock...@cse.unl.edu
  wrote:
  
   Hey Amit,
  
   Your current thoughts on keeping block size larger and removing the
 very
   small files are along the right line.  Why not choose the default size
 of
   64MB or larger?  You don't seem too concerned about the number of
  replicas.
  
   However, you're still fighting against the tide.  You've got enough
  files
   that you'll be pushing against block report and namenode limitations,
   especially with 20 - 50 million files.  We find that about 500k blocks
  per
   node is a good stopping point right now.
  
   You really, really need to figure out how to organize your files in
 such
  a
   way that the average file size is above 64MB.  Is there a primary
 key
  for
   each file?  If so, maybe consider HBase?  If you are just going to be
   sequentially scanning through all your files, consider archiving them
  all to
   a single sequence file.
  
   Your individual data nodes are 

Re: java.io.IOException: Could not get block locations. Aborting...

2009-02-09 Thread Scott Whitecross
I tried modifying the settings, and I'm still running into the same  
issue.  I increased the xceivers count (dfs.datanode.max.xcievers) in  
the hadoop-site.xml file.  I also checked to make sure the file  
handles were increased, but they were fairly high to begin with.
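
For anyone hitting this later, the datanode-side entry in question looks like
the following (the value is only an example; Bryan mentions running with 8000
further down the thread):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

It belongs in hadoop-site.xml on each datanode, and the datanodes have to be
restarted to pick it up.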


I don't think I'm dealing with anything out of the ordinary either.   
I'm processing three large 'log' files, totaling around 5 GB, and  
producing around 8000 output files after some data processing,  
probably totaling 6 or 7 gig.   In the past, I've produced a lot fewer  
files, and that has been fine.  When I change the process to output to  
just a few files, there's no problem again.


Anything else beyond the limits?  Is HDFS creating a substantial  
number of temp files as well?







On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote:


Correct.

+1 to Jason's more unix file handles suggestion. That's a must-have.
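
(On a typical Linux box that means checking ulimit -n for the user that runs
the datanode/tasktracker daemons and raising it if it is still at the usual
1024 default; the user name and limit below are only an example:)

# current per-process open-file limit for the daemon user
ulimit -n
# example /etc/security/limits.conf entries to raise it persistently
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768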

-Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:

This would be an addition to the hadoop-site.xml file, to up  
dfs.datanode.max.xcievers?


Thanks.



On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:

Small files are bad for hadoop. You should avoid keeping a lot of  
small files if possible.


That said, that error is something I've seen a lot. It usually  
happens when the number of xcievers hasn't been adjusted upwards  
from the default of 256. We run with 8000 xcievers, and that seems  
to solve our problems. I think that if you have a lot of open  
files, this problem happens a lot faster.


-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:


Hi all -

I've been running into this error the past few days:
java.io.IOException: Could not get block locations. Aborting...
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)


It seems to be related to trying to write too many files to HDFS.   
I have a class extending  
org.apache.hadoop.mapred.lib.MultipleOutputFormat and if I output  
to a few file names, everything works.  However, if I output to  
thousands of small files, the above error occurs.  I'm having  
trouble isolating the problem, as the problem doesn't occur in  
the debugger, unfortunately.


Is this a memory issue, or is there an upper limit to the number  
of files HDFS can hold?  Any settings to adjust?


Thanks.












Re: lost TaskTrackers

2009-02-09 Thread Vadim Zaliva
I am starting to wonder: is Hadoop 0.19 stable enough for production?

Vadim


On 2/9/09, Vadim Zaliva kroko...@gmail.com wrote:
 yes, I can access DFS from the cluster. namenode status seems to be OK
 and I see no errors in namenode log files.

 initially all trackers were visible, and 9433 maps completed
 successfully. Then, this was followed by 65975 which were killed. In
 log they all show same error:

 Error initializing attempt_200902081049_0001_m_004499_1:
 java.lang.NullPointerException
   at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
   at org.apache.hadoop.ipc.Client.call(Client.java:686)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at $Proxy5.getFileInfo(Unknown Source)
   at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
   at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
   at $Proxy5.getFileInfo(Unknown Source)
   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
   at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
   at 
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699)
   at
 org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
   at 
 org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
   at
 org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)

 While this is happening, I can access Job tracker web interface, but
 it shows that there is 0 nodes in the cluster. I have tried to run
 this task several times and the result is always the same. It works at
 first and then starts failing.

 Vadim

 On Sun, Feb 8, 2009 at 22:19, Amar Kamat ama...@yahoo-inc.com wrote:
 Vadim Zaliva wrote:

 Hi!

 I am observing a strange situation in my Hadoop cluster. While running a
 task, it eventually gets into
 this strange mode where:

 1. JobTracker reports 0 task trackers.

 2. Task tracker processes are alive but log file is full of repeating
 messages like this:

 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_017698_0 done; removing files.
 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_017698_0 not found in cache
 2009-02-08 19:16:47,761 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_021212_0 done; removing files.
 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.IndexCache: Map ID attempt_200902081049_0001_m_021212_0 not found in cache
 2009-02-08 19:16:47,762 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200902081049_0001_m_022133_0 done; removing files.

 with a new one appearing every couple of seconds.

 In the task tracker log, before these repeating messages last 2
 exceptions
 are:

 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_200902081049_0001_m_075408_3
 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_200902081049_0001_m_075408_3
 2009-02-08 17:46:51,482 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 8 and trying to launch attempt_200902081049_0001_m_075408_3
 2009-02-08 17:46:51,483 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_200902081049_0001_m_075408_3:
 java.lang.NullPointerException
at
 org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:459)
at org.apache.hadoop.ipc.Client.call(Client.java:686)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at $Proxy5.getFileInfo(Unknown Source)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
 Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy5.getFileInfo(Unknown Source)
at
 org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
at
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:390)
at
 org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:699)
at
 org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
at
 org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
at
 org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)



 Looks like an RPC issue. Can you tell more about the 

Re: TaskTrackers being double counted after restart job recovery

2009-02-09 Thread Amar Kamat

Stefan Will wrote:

Hi,

I'm using the new persistent job state feature in 0.19.0, and it's worked
really well so far. However, this morning my JobTracker died with an OOM
error (even though the heap size is set to 768M). So I killed it and all the
TaskTrackers. 
Any specific reason why you killed the task-trackers? Ideally only the 
JobTracker should be restarted, and the task-trackers will rejoin.

After starting everything up again, all my nodes were showing
up twice in the JobTracker web interface, with different port numbers. 
Owen is correct. Since the state is rebuilt from history, the old 
tracker information is obtained from the history as well, hence the double 
entries. Killing the tasktrackers after killing the jobtracker is like losing the 
tasktrackers while the jobtracker
is down. Upon restart the jobtracker assumes that the tasktracker 
mentioned in the history is still available and waits for it to 
re-connect. After the expiry interval (default 10 mins), the old tracker 
will be removed.
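
If memory serves, that window is controlled by mapred.tasktracker.expiry.interval 
(in milliseconds) in hadoop-site.xml, so roughly:

<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <value>600000</value>
</property>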

Also,
some of the jobs it restarted had already been completed when the job
tracker died.
  
Old job detection happens via the system directory. When a job is 
submitted, its info (job.xml etc.) is copied to the mapred system dir, and 
upon completion it is removed from there. Upon restart, the 
mapred-system-dir is checked to see which jobs need to be 
resumed/re-run. So if the job folder/info is not cleaned up from the 
system-dir, then the job will be resumed. But if the job was complete, 
then the job logs should mention that it is complete, and hence upon 
restart it will simply finish/complete the job without even running any 
task. Are you seeing something different here? Look at the jobtracker logs 
to see what is happening in the recovery. The line "Restoration 
complete" marks the end of recovery.

 Any idea what might be happening here ? How can I fix this ? Will
temporarily setting mapred.jobtracker.restart.recover=false clear things up
?
  
You can manually delete job files from mapred.system.dir to avoid 
resuming that job.
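
For completeness, turning recovery off as asked about above would be an entry 
along these lines in hadoop-site.xml on the JobTracker node (it is read when 
the JobTracker starts):

<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>false</value>
</property>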

Amar

-- Stefan

  




what's going on :( ?

2009-02-09 Thread Mark Kerzner
Hi,
Hi,

why is hadoop suddenly telling me

 Retrying connect to server: localhost/127.0.0.1:8020

with this configuration

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

and both this http://localhost:50070/dfshealth.jsp link and this
http://localhost:50030/jobtracker.jsp link work fine?

Thank you,
Mark


Re: what's going on :( ?

2009-02-09 Thread Amar Kamat

Mark Kerzner wrote:

Hi,
Hi,

why is hadoop suddenly telling me

 Retrying connect to server: localhost/127.0.0.1:8020

with this configuration

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  

Shouldn't this be

<value>hdfs://localhost:9001</value>

Amar

  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

and both this http://localhost:50070/dfshealth.jsp link and this
http://localhost:50030/jobtracker.jsp link work fine?

Thank you,
Mark