Hi folks,

A few weeks ago I started a PHP project to build a highly scalable
distributed file system similar to MogileFS. I called it PHPDFS, and I
just completed a test of it on 20 EC2 instances. PHPDFS performed quite
well.

I set up 20 m1.large Amazon instances (5 clients, 15 servers), uploaded
about 250 GB of data, and downloaded about 1.5 TB. Total transfer was
about 1.8 TB.

The blog has more info, plus links to 870 graphs and the 300 MB of data
that was collected and analyzed:

http://phpdfs.blogspot.com

Here are some highlights:

   - 500 threads (5 java clients, 100 threads each)
   - 15 servers
   - ~250 GB uploaded (PUT requests) (individual files between 50 KB and 10 MB)
   - ~1.5 TB downloaded (GET requests)
   - ~1.8 TB transfer total
   - ~47mb / sec upload rate
   - ~201 requests / sec overall
   - The data was very evenly distributed across all nodes
   - 40 replicas were lost and totally unrecoverable, amounting to 0.030% data
   loss
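
As a quick sanity check on the figures above, here is a small Python sketch of the arithmetic. It assumes decimal units (1 TB = 1000 GB) and assumes the 0.030% loss figure is counted in replicas rather than bytes; neither assumption is stated in the test results, so treat the implied replica count as a rough estimate only.

```python
# Figures taken from the highlights above (all approximate).
uploaded_gb = 250
downloaded_gb = 1500
lost_replicas = 40
loss_fraction = 0.030 / 100  # 0.030% expressed as a fraction

# Assumption: decimal units, 1 TB = 1000 GB.
total_tb = (uploaded_gb + downloaded_gb) / 1000

# Assumption: the loss percentage is a share of the replica count.
implied_replicas = round(lost_replicas / loss_fraction)

print(f"total transfer: {total_tb:.2f} TB")
print(f"implied replica count: {implied_replicas}")
```

The upload and download totals add up to about 1.75 TB, which is consistent with the rounded ~1.8 TB figure.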

Overall, PHPDFS performed quite well. There was a small amount of data
loss because one of the servers got really hot: a few uploads to the hot
server were corrupted, and the corrupted objects were then replicated.
Better error handling and a checksum mechanism should prevent that from
happening again, so that is next on the development list.
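
To illustrate the kind of checksum check I have in mind, here is a minimal sketch. It is in Python rather than PHP purely for illustration and is not PHPDFS code. The helper names (`sha256_of`, `accept_upload`) and the idea that the client sends a digest along with each PUT are assumptions made for the example.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def accept_upload(data: bytes, client_digest: str) -> bool:
    """Accept an upload only if it matches the digest the client sent.

    Hypothetical helper: a server would run this check before storing
    the object and before replicating it to any other node.
    """
    return hmac.compare_digest(sha256_of(data), client_digest.lower())


original = b"example object body"
digest = sha256_of(original)

# An intact upload passes; a corrupted one is rejected, not replicated.
intact_ok = accept_upload(original, digest)
corrupt_ok = accept_upload(original[:-1] + b"X", digest)
```

With a check like this in front of replication, a corrupted upload fails at the first server and the client can retry, so the bad copy never spreads to the other replicas.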

Anyway, I just wanted to let the list know that this is coming along quite
nicely and actually has gotten some attention from the folks at the Storage
Systems Research Center at UC Santa Cruz.  http://www.ssrc.ucsc.edu/

If anyone has questions or wants to get involved, please let me know.

You can download PHPDFS here:

http://code.google.com/p/phpdfs/downloads/list

peace,

-Shane
