Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Will Maier
On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote: > what filesystem are they using and what is the size of each filesystem? It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs every ten mi
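The deadline-scheduler switch Will mentions is typically done with a small boot-time script over sysfs. A minimal sketch, assuming standard Linux `sd*` device naming (the glob and path are assumptions for a typical box):

```shell
# Minimal sketch, assuming Linux sysfs and sd* device names; run as root.
# Sets the deadline I/O scheduler on every matching disk queue.
set_deadline() {
    for q in "$1"/sd*/queue/scheduler; do
        # Skip unmatched globs and files we can't write (non-root, no disks).
        [ -w "$q" ] && echo deadline > "$q"
    done
}

set_deadline /sys/block
```

Reading the scheduler file back (`cat /sys/block/sda/queue/scheduler`) shows the active scheduler in brackets.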

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Rita
I keep asking because I wasn't able to use an XFS filesystem larger than 3-4 TB. If the XFS filesystem is larger than 4 TB, HDFS won't recognize the space. I am on a 64-bit RHEL 5.3 host. On Tue, May 10, 2011 at 6:30 AM, Will Maier wrote: > On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote: > >
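One quick way to narrow down whether the limit Rita hits is below HDFS (kernel/libc) or inside it is to check what df itself reports for large filesystems. A hedged sketch, with the 4 TB cutoff taken from her report:

```shell
# Sketch: flag mounted filesystems whose size exceeds 4 TB, the point at
# which Rita reports HDFS stops seeing capacity. df -k reports 1K blocks,
# so 4 TB = 4 * 1024^3 KB.
df -k | awk 'NR > 1 && $2 > 4 * 1024 * 1024 * 1024 { print $6, $2 " KB" }'
```

If df shows the full size but HDFS does not, the discrepancy is in how the DataNode computes capacity rather than in the filesystem itself.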

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Jonathan Disher
In a previous life, I've had extreme problems with XFS, including kernel panics and data loss under high load. Those were database servers, not Hadoop nodes, and it was a few years ago. But ext3/ext4 seem stable enough, and they're more widely supported, so they're my preference. -j On May

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Marcos Ortiz
On 05/10/2011 06:29 AM, Rita wrote: I keep asking because I wasn't able to use a XFS filesystem larger than 3-4TB. If the XFS file system is larger than 4TB hdfs won't recognize the space. I am on a 64bit RHEL 5.3 host. On Tue, May 10, 2011 at 6:30 AM, Will Maier

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Marcos Ortiz
On 05/10/2011 06:56 AM, Jonathan Disher wrote: In a previous life, I've had extreme problems with XFS, including kernel panics and data loss under high load. Those were database servers, not Hadoop nodes, and it was a few years ago. But, ext3/ext4 seems to be stable enough, and it's more wide

HDFS network bottleneck - namenode?

2011-05-10 Thread Jonathan Disher
I will preface this with a couple statements: a) it's almost 6am, and I've been up all night b) I'm drugged up from an allergic reaction, so I may not be firing on all 64 bits. Do I correctly understand the HDFS architecture in that the namenode is a network bottleneck into the system? I.e., i

Re: HDFS network bottleneck - namenode?

2011-05-10 Thread Will Maier
Hi Jonathan- On Tue, May 10, 2011 at 05:50:03AM -0700, Jonathan Disher wrote: > I will preface this with a couple statements: a) it's almost 6am, and I've > been up all night b) I'm drugged up from an allergic reaction, so I may not be > firing on all 64 bits. > > Do I correctly understand the HDF

Re: more replicas on a single node

2011-05-10 Thread Matthew Foley
Hi Ferdy, I'm not aware of anyone running this way in production, but for test purposes it is often useful to run two DataNodes on a single physical server. It works fine, you just need to give the two services different HADOOP_CONF_DIR values with modified port numbers and storage directories.
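Matthew's two-DataNodes-on-one-box setup can be sketched as below. The conf-dir paths are illustrative assumptions; each copy of hdfs-site.xml must give its instance its own storage directory and listen ports so the two daemons don't collide:

```shell
# Illustrative sketch (paths are assumptions, not a recommendation for
# production). Each conf dir carries its own hdfs-site.xml with distinct
# dfs.data.dir, dfs.datanode.address, dfs.datanode.http.address and
# dfs.datanode.ipc.address values.
HADOOP_CONF_DIR=/etc/hadoop/conf.dn1 hadoop-daemon.sh start datanode
HADOOP_CONF_DIR=/etc/hadoop/conf.dn2 hadoop-daemon.sh start datanode
```

Both DataNodes register with the same NameNode and show up as separate nodes in the report.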

Re: HDFS network bottleneck - namenode?

2011-05-10 Thread Matthew Foley
Will's right, meta-data transactions go through the Namenode, but all the content data read/write activity is directly between Clients and Datanodes, and replication activity is Datanode-to-Datanode. No bottlenecks, as long as your Namenode has enough RAM to hold the namespace in memory, and

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Allen Wittenauer
On May 9, 2011, at 11:46 PM, Jonathan Disher wrote: > I can't speak for Will, but I'm actually going against recommendations, my > systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per > array. > > The problems you will encounter keeping machines performing well after they

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Allen Wittenauer
On May 10, 2011, at 6:14 AM, Marcos Ortiz wrote: > My preferred filesystem is ZFS; it's a shame that Linux support is still very > immature. For that reason, I changed my PostgreSQL hosts to FreeBSD-8.0 to use > ZFS as the filesystem, and it really rocks. > > Had anyone tested a Hadoop cluster wi

Re: HDFS network bottleneck - namenode?

2011-05-10 Thread Jonathan Disher
Okay, thanks. I -hoped- it was this way. Sadly, all my files are small (the largest are around 40MB). But oh well! -j On May 10, 2011, at 10:46 AM, Matthew Foley wrote: > Will's right, meta-data transactions go through the Namenode, but all the > content data > read/write activity is direct

Re: our experiences with various filesystems and tuning options

2011-05-10 Thread Jonathan Disher
This cluster is specifically a near-line archive cluster, so storage density is more important than computational performance. Our primary production cluster (which actually does very little in the way of computation) is comprised of Dell R510's with 10 disks in JBOD and a two disk mirrored OS

Null pointer exception in Mapper initialization

2011-05-10 Thread Mapred Learn
Hi, I get an error like: java.lang.NullPointerException at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73) at org.apache.hadoop.mapred.MapTask$MapOu