Hi Murali,

Overall, the description of HDFS doesn't seem all that compelling. It's targeted at certain write-once, read-many workloads, with a heavy emphasis on fault tolerance and load balancing. It pretty much tosses POSIX consistency for its workloads and assumes that a single client will create and write the file, after which the file is read-only, so it's able to solve replication and data caching pretty easily. The description talks about supporting an append-to-file operation at some point in the future. Data is striped (and replicated) over multiple IO nodes, but it uses blocks instead of objects to manipulate stripes. The IO nodes send heartbeat messages to the metadata node for failover.
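Just to make the failover idea concrete, here's a toy sketch of how the metadata node might track heartbeats and decide which IO nodes are dead. The class name, timeout value, and interface are all my own invention, not HDFS's actual protocol:

```python
import time

# Hypothetical sketch: the metadata node records the last heartbeat
# seen from each IO node and declares a node dead after a timeout.
HEARTBEAT_TIMEOUT = 10.0  # seconds; HDFS's real value differs

class HeartbeatTracker:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # io_node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id, now=None):
        # Called each time an IO node checks in.
        self.last_seen[node_id] = time.time() if now is None else now

    def dead_nodes(self, now=None):
        # Nodes whose blocks would need to be re-replicated elsewhere.
        now = time.time() if now is None else now
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

tracker = HeartbeatTracker(timeout=10.0)
tracker.heartbeat("io-node-1", now=0.0)
tracker.heartbeat("io-node-2", now=0.0)
tracker.heartbeat("io-node-1", now=8.0)
print(tracker.dead_nodes(now=12.0))  # io-node-2 missed its window
```

The real system would then pick replacement nodes and copy the lost replicas, which is where the rack-distance business below comes in.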

All metadata is stored on a single node, introducing a single point of failure, and although you can replicate the metadata on that node to avoid metadata corruption, all file metadata operations go through that node. Further, the metadata node communicates with the IO nodes for metadata operations (block allocations, etc.). They assume their workloads won't include heavy metadata activity or small accesses; all accesses are assumed to be in multiples of 64MB.
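The fixed 64MB block size is part of why the metadata stays small: mapping a byte offset to a block is trivial arithmetic, so the metadata node only has to remember one entry per 64MB. A quick sketch (the constant is from the description; the function name is mine):

```python
BLOCK_SIZE = 64 * 2**20  # fixed 64MB blocks, per the HDFS description

def locate(offset):
    # Which block holds this byte, and how far into that block it is.
    return offset // BLOCK_SIZE, offset % BLOCK_SIZE

block, within = locate(200 * 2**20)  # byte at the 200MB mark
print(block, within)  # block 3, 8MB into it
```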

Both HDFS and Ceph seem to focus on load balancing of data distribution _within_ a cluster, talking about distances between racks, rows of racks, and so on. I guess the clusters common in industry differ from the ones we're used to seeing, where we're trying to distribute data across pretty much all the storage hardware we can get our hands on to improve performance, rather than get the data as close to the computation as possible. I guess that has something to do with the IO patterns being independent rather than collective.

The metadata operations are stored in a transaction log, but it doesn't look like it's being used to perform rollbacks on failure. From the design doc:

"The Namenode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a Namenode with 4GB of RAM is plenty to support a huge number of files and directories."

Sounds like something Bill Gates would say...
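As far as I can tell, the log is just replayed on restart to rebuild that in-memory image, rather than used for transactional rollback. A toy sketch of what that replay might look like; the record format and operation names are invented for illustration, not HDFS's actual on-disk format:

```python
# Hypothetical sketch: rebuild the in-memory namespace image by
# replaying a sequence of logged metadata operations.
def replay(log):
    namespace = {}  # path -> list of block ids
    for op, path, block in log:
        if op == "create":
            namespace[path] = []
        elif op == "add_block":
            namespace[path].append(block)
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

log = [("create", "/a", None),
       ("add_block", "/a", 17),
       ("create", "/b", None),
       ("delete", "/b", None)]
print(replay(log))  # only /a survives, holding block 17
```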

Cheers,
-sam


On Oct 16, 2007, at 12:32 PM, Murali Vilayannur wrote:

Hi Folks
Have any of you guys looked at Hadoop and HDFS?
Hadoop is a distributed computing infrastructure with special
map&reduce constructs similar to what Google proposed in OSDI04.
HDFS is their backend cluster file system.

http://wiki.apache.org/lucene-hadoop-data/attachments/HadoopPresentations/attachments/HDFSDescription.pdf
The question I have is how many HPC apps can be rewritten using the
M&R programming model, and whether it makes sense to integrate with the
Hadoop API to get a larger sample space of apps that can run well on
pvfs2 beyond MPI-based ones.
Any thoughts?
If I recall, Avery or RobR had already done some research on this
aspect... or maybe I heard it from someone else? Anyhow, it would be
good to hear from someone who knows more about this.
thanks,
Murali
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

