> On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey <[email protected]>
> wrote:
> The goal I'm trying to accomplish - it's expected that some amount of
> network traffic is required, but it should be minimal.  If a 1G file is
> created on some machine, then a few packets should fly across the
> network, just to say the file exists.  But 1G should not go anywhere.
> 
> I'm not sure I understand your goal then.  There's no FS I know of that
> will do what you're asking.

Until today, I only had one idea that came close - Google GFS (the Google File System, not global GFS) does exactly what I want, except that it always writes to 3 or more peers.  If Google GFS were available for use (could be installed and used on Linux), and if it were configurable to take that replica count down to 1, then it might do what I want.

Another person gave me a promising suggestion - export /scratch from each 
machine, and then use unionfs to mount all the various machines' scratches into 
a single scratch.  I haven't tried this yet - it may still go horribly wrong.
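For reference, a minimal sketch of what that setup might look like - the hostnames (node1, node2) and paths here are hypothetical, and the exact mount options depend on which unionfs implementation (kernel patch vs. unionfs-fuse) is installed:

```shell
# On each compute node: export the local scratch area over NFS.
# /etc/exports might contain a line like:
#   /scratch  *.example.com(rw,no_root_squash)

# On the machine that wants the combined view: mount each node's export...
mount -t nfs node1:/scratch /mnt/scratch1
mount -t nfs node2:/scratch /mnt/scratch2

# ...then union them into one tree.  Listing the local /scratch first as
# the rw branch means new files land on local disk; the NFS branches are
# read-only, so only reads of other machines' files cross the network.
mount -t unionfs -o dirs=/scratch=rw:/mnt/scratch1=ro:/mnt/scratch2=ro \
    unionfs /mnt/scratch
```

The branch ordering is the important part: writes stay local, and the network is only touched when a lookup falls through to one of the ro branches.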


> Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber
> channel) running GFS (global FS, not google FS), network file system
> (NFS) or distributed file system (lustre, GPFS, AFS).

For this purpose, I'm interested in a distributed filesystem that performs 
writes on a single host, so that any individual host can work on new files at 
local-disk speed.  Thanks to this thread, I have the names of some distributed 
filesystems (as you mentioned, Lustre etc.) ... but I haven't had a chance to 
research any of them more thoroughly yet.


> Anything beyond local disk will require communication over the
> network.  With GFS, you'll mostly be speaking with the lock manager
> over the network.  So that would accomplish your goal of "write locally
> and only send limited amount of data over the network".  However, GFS
> isn't one of the shining stars in distributed/parallel processing or
> HPC (high performance computing).

Plus, GFS (global GFS, not Google's) requires iSCSI / SAN / shared disk.


> I think you're saying - it writes across more than one machine, which
> would slow down the write operation
> 
> Actually, writing across multiple hosts speeds it up.  Much along the
> lines of a RAID 0 striping pattern, the data is spread across multiple
> destinations.

Clarification - Suppose I have local RAID 0+1, and I do random read/write to 
those disks.  It goes very fast, perhaps up to the 3 Gbit/s SATA bus speed.  Now 
suppose I write to the fastest NFS server in the world: I'm limited to 1 Gbit 
Ethernet, which also adds protocol overhead.  Not nearly as fast.
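To put rough numbers on that (back-of-the-envelope shell arithmetic, using nominal line rates and ignoring 8b/10b encoding and protocol overhead on both links):

```shell
# Convert nominal line rates from Mbit/s to MB/s: divide by 8.
sata_mbs=$((3000 / 8))   # SATA II: 3 Gbit/s -> 375 MB/s
gige_mbs=$((1000 / 8))   # gigabit Ethernet: 1 Gbit/s -> 125 MB/s
echo "SATA ~${sata_mbs} MB/s vs GigE ~${gige_mbs} MB/s"
```

So even the ideal NFS server is capped at roughly a third of the local bus, before any NFS or TCP overhead is counted.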

I understand that in Google GFS, for example, if most of your operations are 
reads, and most of them come across the network, then you gain performance by 
distributing writes across multiple hosts, so that later multiple hosts are 
available to handle the read requests.  But if your usage is (as it is in my 
case) mostly random read/write on a single host, with occasional reads from 
some other host, then the best performance will positively come from local disk.



_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/
