Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-06 Thread TCK
... Bockelman bbock...@cse.unl.edu wrote:
From: Brian Bockelman bbock...@cse.unl.edu
Subject: Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?
To: core-user@hadoop.apache.org
Date: Wednesday, February 4, 2009, 1:50 PM
Sounds overly complicated. Complicated usually leads to mistakes

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-06 Thread Brian Bockelman
... Brian Bockelman bbock...@cse.unl.edu wrote:
From: Brian Bockelman bbock...@cse.unl.edu
Subject: Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?
To: core-user@hadoop.apache.org
Date: Wednesday, February 4, 2009, 1:50 PM
Sounds overly complicated. Complicated usually leads

Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread TCK
Hey guys, We have been using Hadoop to do batch processing of logs. The logs get written and stored on a NAS. Our Hadoop cluster periodically copies a batch of new logs from the NAS via NFS into Hadoop's HDFS, processes them, and copies the output back to the NAS. The HDFS is cleaned up at
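The batch cycle described above (stage new logs from the NAS into HDFS, process them, copy the output back, clean up HDFS) can be sketched as below. This is a minimal, hypothetical sketch: plain local directories stand in for the NFS-mounted NAS and for HDFS, `run_batch` and the `process` callback are illustrative names rather than Hadoop APIs, and the rule "a log is new if it has no `.out` file yet" is an assumption. On a real cluster the copies would go through `hadoop fs -put`/`-get` or the Hadoop `FileSystem` API, and step 2 would be a MapReduce job.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable


def run_batch(nas_logs: Path, nas_output: Path, staging: Path,
              process: Callable[[Path], str]) -> list[Path]:
    """Run one hypothetical batch cycle and return the files published to the NAS.

    Local directories stand in for the NAS (nas_logs, nas_output) and for
    HDFS (staging); this is an illustration, not Hadoop code.
    """
    hdfs_in = staging / "in"
    hdfs_out = staging / "out"
    hdfs_in.mkdir(parents=True, exist_ok=True)
    hdfs_out.mkdir(parents=True, exist_ok=True)
    nas_output.mkdir(parents=True, exist_ok=True)

    # Step 1: copy "new" logs into staging. ASSUMPTION: a log counts as new
    # if no matching .out file has been published for it yet.
    for log in sorted(nas_logs.glob("*.log")):
        if not (nas_output / f"{log.stem}.out").exists():
            shutil.copy2(log, hdfs_in / log.name)

    # Step 2: process each staged log, writing results inside staging.
    for log in sorted(hdfs_in.iterdir()):
        (hdfs_out / f"{log.stem}.out").write_text(process(log))

    # Step 3: copy the results back to the NAS.
    published = []
    for result in sorted(hdfs_out.iterdir()):
        dest = nas_output / result.name
        shutil.copy2(result, dest)
        published.append(dest)

    # Step 4: clean up staging so the next cycle starts empty.
    shutil.rmtree(staging)
    return published
```

For example, `run_batch(logs, out, staging, lambda p: str(len(p.read_text().splitlines())))` would publish one line count per new log. Deleting the whole staging tree at the end keeps each cycle independent, at the cost of re-copying any input a later cycle needs again.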

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread Brian Bockelman
Hey TCK, We use HDFS+FUSE solely as a storage solution for an application which doesn't understand MapReduce. We've scaled this solution to around 80 Gbps. For 300 processes reading from the same file, we get about 20 Gbps. Do consider your data retention policies -- I would say that
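As a back-of-the-envelope reading of the 300-reader figure: if the roughly 20 Gbps were shared evenly across the readers (an assumption; real per-process rates will vary), each process would see about 66.7 Mbit/s, or roughly 8.3 MB/s. The helper below is a hypothetical illustration of that arithmetic, using decimal units (1 Gbit/s = 1000 Mbit/s, 8 bits per byte).

```python
from __future__ import annotations


def per_reader_rate(aggregate_gbps: float, readers: int) -> tuple[float, float]:
    """Return (Mbit/s, MB/s) per reader, ASSUMING the aggregate is shared evenly.

    Uses decimal units: 1 Gbit/s = 1000 Mbit/s and 8 bits = 1 byte.
    """
    if readers <= 0:
        raise ValueError("readers must be positive")
    mbit_per_reader = aggregate_gbps * 1000 / readers
    return mbit_per_reader, mbit_per_reader / 8


mbit, mbyte = per_reader_rate(20, 300)
print(f"{mbit:.1f} Mbit/s, {mbyte:.2f} MB/s per reader")  # prints: 66.7 Mbit/s, 8.33 MB/s per reader
```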

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread TCK
... by most people?
Best Regards,
TCK
--- On Wed, 2/4/09, Brian Bockelman bbock...@cse.unl.edu wrote:
From: Brian Bockelman bbock...@cse.unl.edu
Subject: Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?
To: core-user@hadoop.apache.org
Date: Wednesday, February 4, 2009, 1:06 PM

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-02-04 Thread Brian Bockelman
... TCK
--- On Wed, 2/4/09, Brian Bockelman bbock...@cse.unl.edu wrote:
From: Brian Bockelman bbock...@cse.unl.edu
Subject: Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?
To: core-user@hadoop.apache.org
Date: Wednesday, February 4, 2009, 1:06 PM
Hey TCK, We use HDFS+FUSE