On Thu, Jan 17, 2013 at 12:24 AM, Roberto Nunnari <[email protected]> wrote:
> Hi all.
>
> I'm writing to you to ask for advice or a hint to the right direction.
>
> In our department, more and more researchers ask us (IT administrators) to
> assemble (or to buy) GPGPU powered workstations to do parallel computing.
>
> As I already manage a small CPU cluster (resources managed using SGE),
> with my boss we talked about building a new GPU cluster. The problem is
> that I have no experience at all with GPU clusters.
>
> Apart from the already running GPU workstations, we already have some new
> HW that looks promising to me as a starting point for a GPU cluster.
>
> - 1x Dell PowerEdge R720
> - 1x Dell PowerEdge C410x
> - 1x NVIDIA M2090 PCIe x16
> - 1x NVIDIA iPASS Cable Kit
>   (Dell forgot to include the iPASS adapter for the R720!! :-D)
>
> I'd be grateful if you could kindly give me some advice and/or hint to the
> right direction.
>
> In particular I'm interested in your opinion on:
> 1) is the above HW suitable for a small (2 to 4/6 GPUs) GPU cluster?
> 2) is Apache Hadoop suitable (or what could we use?) as a queuing and
> resource management system? We would like the cluster to be usable by many
> users at once in a way that no user has to worry about resources, just like
> we do on the CPU cluster with SGE.
>

My understanding (although I could be wrong) is that only one task is going to be able to use the GPU at a time, so you're going to have to take that into account when configuring MapReduce.

> 3) What distribution of linux would be more appropriate?
>

Whatever NVIDIA's kernel module supports best, probably RHEL.

> 4) necessary stack of sw? (cuda, hadoop, other?)
>

You probably want to write the code in C or C++ and use Hadoop streaming plus whatever libraries you need in order to use CUDA. nvidia.com should have more information about that. CUDA is an NVIDIA-proprietary technology.

Colin
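
To illustrate the Hadoop streaming suggestion above, here is a minimal sketch of what a C++ mapper could look like. It is not taken from any existing setup: the record format (whitespace-separated numbers, one record per line), the output key `sumsq`, and the names `process_batch` and `run_mapper` are all assumptions made for the example. `process_batch` is a CPU stand-in for the place where a real mapper would hand the data to a CUDA kernel, so the sketch builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the GPU work. A real mapper would copy `values`
// to the device, launch a CUDA kernel, and copy the result back. Here it
// only sums the squares on the CPU.
double process_batch(const std::vector<double>& values) {
    assert(!values.empty());  // run_mapper only passes non-empty batches
    double sum = 0.0;
    for (double v : values) sum += v * v;
    return sum;
}

// Hadoop streaming hands input records to the mapper on stdin, one per line,
// and expects tab-separated key/value pairs on stdout.
// Assumed record format: whitespace-separated numbers.
void run_mapper(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> values;
        double v;
        while (fields >> v) values.push_back(v);
        if (values.empty()) continue;  // skip lines with no leading numbers
        out << "sumsq\t" << process_batch(values) << "\n";
    }
}
```

A `main` that simply calls `run_mapper(std::cin, std::cout)` would turn this into the executable passed to Hadoop streaming's `-mapper` option. Since probably only one task can use the GPU at a time, the number of concurrent map tasks per node would need to be limited accordingly.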
