Hi Murali,

Overall, the description of HDFS doesn't seem all that compelling. It's targeted at certain write-once, read-many workloads, with a heavy emphasis on fault tolerance and load balancing. It pretty much tosses POSIX consistency for its workloads and assumes that a single client will create and write the file, after which the file is read-only, so it's able to solve replication and data caching pretty easily. The description talks about supporting an append-to-file operation at some point in the future. Data is striped (and replicated) over multiple IO nodes, but it uses blocks instead of objects to manipulate stripes. The IO nodes send heartbeat messages to the metadata node for failover.
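Just to make the failover idea concrete, here's a toy sketch of how the metadata node might track heartbeats and decide which IO nodes are dead. The class name, timeout value, and interface are all my own invention, not HDFS's actual protocol:

```python
import time

# Hypothetical sketch: the metadata node records the last heartbeat
# seen from each IO node and declares a node dead after a timeout.
HEARTBEAT_TIMEOUT = 10.0  # seconds; HDFS's real value differs

class HeartbeatTracker:
    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # io_node_id -> timestamp of last heartbeat

    def heartbeat(self, node_id, now=None):
        # Called each time an IO node checks in.
        self.last_seen[node_id] = time.time() if now is None else now

    def dead_nodes(self, now=None):
        # Nodes whose blocks would need to be re-replicated elsewhere.
        now = time.time() if now is None else now
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

tracker = HeartbeatTracker(timeout=10.0)
tracker.heartbeat("io-node-1", now=0.0)
tracker.heartbeat("io-node-2", now=0.0)
tracker.heartbeat("io-node-1", now=8.0)
print(tracker.dead_nodes(now=12.0))  # io-node-2 missed its window
```

The real system would then pick replacement nodes and copy the lost replicas, which is where the rack-distance business below comes in.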

All metadata is stored on a single node, introducing a single point of failure, and although you can replicate the metadata on that node to avoid metadata corruption, all file metadata operations go through that node. Further, the metadata node communicates with the IO nodes for metadata operations (block allocations, etc.). They assume their workloads won't include heavy metadata activity or small accesses; all accesses are assumed to be in multiples of 64MB.
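The fixed 64MB block size is part of why the metadata stays small: mapping a byte offset to a block is trivial arithmetic, so the metadata node only has to remember one entry per 64MB. A quick sketch (the constant is from the description; the function name is mine):

```python
BLOCK_SIZE = 64 * 2**20  # fixed 64MB blocks, per the HDFS description

def locate(offset):
    # Which block holds this byte, and how far into that block it is.
    return offset // BLOCK_SIZE, offset % BLOCK_SIZE

block, within = locate(200 * 2**20)  # byte at the 200MB mark
print(block, within)  # block 3, 8MB into it
```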

Both HDFS and Ceph seem to focus on load balancing of data distribution _within_ a cluster, talking about distances between racks, rows of racks, and so on. I guess the clusters common in industry differ from the ones we're used to seeing, where we're trying to distribute data across pretty much all the storage hardware we can get our hands on to improve performance, rather than get the data as close to the computation as possible. I guess that has something to do with the IO patterns being independent rather than collective.

The metadata operations are stored in a transaction log, but it doesn't look like it's being used to perform rollbacks on failure. From the design doc:

"The Namenode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a Namenode with 4GB of RAM is plenty to support a huge number of files and directories."

Sounds like something Bill Gates would say...
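As far as I can tell, the log is just replayed on restart to rebuild that in-memory image, rather than used for transactional rollback. A toy sketch of what that replay might look like; the record format and operation names are invented for illustration, not HDFS's actual on-disk format:

```python
# Hypothetical sketch: rebuild the in-memory namespace image by
# replaying a sequence of logged metadata operations.
def replay(log):
    namespace = {}  # path -> list of block ids
    for op, path, block in log:
        if op == "create":
            namespace[path] = []
        elif op == "add_block":
            namespace[path].append(block)
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

log = [("create", "/a", None),
       ("add_block", "/a", 17),
       ("create", "/b", None),
       ("delete", "/b", None)]
print(replay(log))  # only /a survives, holding block 17
```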

Cheers,
-sam


On Oct 16, 2007, at 12:32 PM, Murali Vilayannur wrote:

Hi Folks
Have any of you guys looked at Hadoop and HDFS?
Hadoop is a distributed computing infrastructure with special
map&reduce constructs similar to what Google proposed in OSDI04.
HDFS is their backend cluster file system.

http://wiki.apache.org/lucene-hadoop-data/attachments/HadoopPresentations/attachments/HDFSDescription.pdf
The question I have is how many HPC apps can be rewritten using the
M&R programming model, and whether it makes sense to integrate with the
Hadoop API to get a larger sample space of apps that can run well on
pvfs2 beyond MPI-based ones.
Any thoughts?
If I recall, Avery or RobR had already done some research on this
aspect... or maybe I heard it from someone else? Anyhow, it would be
good to hear from someone who knows more about this.
thanks,
Murali
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

