> You seem to have conflicting wants/needs. First you said this:
>
> --- On Wed, 7/1/09, Edward Ned Harvey <[email protected]> wrote:
> > I have a bunch of compute servers. They all have local disks
> > mounted as /scratch to use for computation scratch space. This ensures
> > maximum performance on all systems, and no competition for a shared
> > resource during crunch time. At present, all of their /scratch
> > directories are local, separate and distinct.
>
> Then this:
> > I think it would be awesome
> > if /scratch looked the same on all systems.
>
> Does "look the same" mean configured the same? You didn't really expand
> on this statement and clarify the goal, which I'm not sure is
> uniformity, accessibility, or a combo of both.
You're right - although it was clear in my mind, I see how that was confusing. Let me try again: if you go into some directory and run "ls" (or whatever), the results should be the same regardless of which machine you're on.

I do not want a centralized network file server, because of the bandwidth and disk-space bottleneck. I want a distributed filesystem, which would provide the aforementioned ubiquity of namespace, but would also allow heavy IO on some machine without necessitating heavy network traffic. A minimal amount of traffic is probably required, just so the other machines are all aware that a file exists, but the file contents themselves need not traverse the network until some other machine requests them.

> You named the storage "/scratch", implying it is just a temporary usage
> space. Are you possibly adding requirements here that are unnecessary?

I did not mean to imply scratch is temporary. You see, we already have an NFS server, which is backed up, so I named the local directory "scratch" so users know it's not backed up.

> We have similar HPC systems that write results to local disk space.
> When the computation is completely done, the results are rsynced to
> separate network accessible storage space; the local space is then
> reclaimed for the next job. The rsync is controlled by LSF scripts, but
> any job management system will have similar capabilities. The network
> available results can then be perused by engineers. If they want to
> keep the results around permanently, they move the results at their
> discretion to longer term storage. Anything that isn't moved by the
> engineers within 7 days is considered unimportant, and deleted.
>
> Would that paradigm work for you?

I have done exactly the same in the past - er - I should say the users have done the same. It's acceptable. In fact, what we have now is also acceptable. I'm just trying to make it better.
(And learn more.) The two downsides of the above are the limited disk space on the NFS server, and the fact that, with a bunch of machines all pushing their results up to the NFS server at once, performance does become an issue.

What we have now is as follows: each machine has a local disk mounted as /scratch, and each machine exports it. Each machine also has an automount directory, /scratches, so you can access any scratch directory from any machine. For example, /scratches/machinename is the NFS mountpoint for machinename:/scratch.

This is almost ideal - except that in order to access the data that was generated last night, the user must know on which machine that data was created. Acceptable, but still could be cooler. :-)

_______________________________________________
Tech mailing list
[email protected]
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
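In case it helps anyone replicate the setup above: on Linux the /scratches automount can be expressed as an autofs indirect map with a wildcard key. A sketch, assuming standard autofs file locations; the mount options are my guess, not necessarily what we run:

```
# /etc/auto.master - attach the indirect map at /scratches
/scratches  /etc/auto.scratches

# /etc/auto.scratches - wildcard map: the lookup key ("&") is the
# hostname, so "cd /scratches/machinename" NFS-mounts
# machinename:/scratch on demand
*  -fstype=nfs,rw,hard,intr  &:/scratch
```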
