[fonc] Re: [Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 TByte/s, 3 GByte, 198 W @ 500 EUR
- Forwarded message from Lux, Jim (337C) james.p@jpl.nasa.gov -

From: Lux, Jim (337C) james.p@jpl.nasa.gov
Date: Thu, 22 Dec 2011 08:27:46 -0800
To: Prentice Bisbal prent...@ias.edu, Beowulf Mailing List beow...@beowulf.org
Subject: Re: [Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 TByte/s, 3 GByte, 198 W @ 500 EUR
user-agent: Microsoft-MacOutlook/14.12.0.110505

The problem with FPGAs (and I use a fair number of them) is that you're never going to get the same picojoules-per-bit-transition kind of power consumption that you get with a purpose-designed processor. The extra logic needed to make it reconfigurable, and the physical junction sizes as well, make it so.

What you will find is that on certain kinds of problems, you can implement a more efficient algorithm in an FPGA than you can in a conventional processor or GPU. So, for that class of problem, the FPGA is a winner (things lending themselves to fixed-point, systolic-array type processes are good candidates).

Bear in mind also that while an FPGA may have, say, a 10-million-gate equivalent, any given practical design is going to use a small fraction of those gates. Fortunately, most of those unused gates aren't toggling, so they don't consume clock-related power, but they do consume leakage current, so the whole clock rate vs. core voltage trade winds up a bit different for FPGAs.

The biggest problem with FPGAs is that they are difficult to write high-performance software for. With FORTRAN on conventional, vectorized, and pipelined processors, we've got 50 years of compiler-writing expertise and real high-performance libraries. And there are literally millions of people who know how to code in FORTRAN or C or something, so if you're looking for the highest-performance coders, even at the 4 sigma level, you've got a fair number to choose from. For numerical computation in FPGAs, not so many.
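The "4 sigma" remark can be made concrete with a little arithmetic. The following is only a rough sketch: it assumes coding skill is normally distributed (an assumption made here, not something the message states), and the population sizes are hypothetical stand-ins for "literally millions".

```python
import math


def upper_tail_fraction(sigmas: float) -> float:
    """Fraction of a normal population lying more than `sigmas`
    standard deviations above the mean (one-sided tail)."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2.0))


def expected_count(population: int, sigmas: float) -> float:
    """Expected number of people beyond the given sigma level."""
    return population * upper_tail_fraction(sigmas)


if __name__ == "__main__":
    # Hypothetical population sizes, chosen only for illustration.
    for population in (1_000_000, 5_000_000):
        count = expected_count(population, 4.0)
        print(f"{population:>9,d} coders -> about {count:.0f} at 4 sigma or better")
```

Under these assumptions the one-sided 4 sigma tail is roughly 3 in 100,000, so a pool of a million still yields a few dozen people, while a pool a hundred times smaller would be expected to yield fewer than one.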
I'd guess that a large fraction of FPGA developers are doing one of two things: 1) digital signal processing, flow-through kinds of stuff (error correcting codes, compression/decompression, crypto); 2) bus interface and data handling (PCI bus, disk drive controls, etc.).

Interestingly, even with the relative scarcity of FPGA developers versus conventional CPU software developers, the average salaries aren't that far apart. The distribution for generic coders is wider (particularly on the low end: barriers to entry are lower for C, Java, whathaveyou code monkeys), but there are very, very few people making more than, say, 150-200k/yr doing either (except in a few anomalous industries where compensation is higher than normal in general, and leaving out equity participation type deals).

On 12/22/11 7:42 AM, Prentice Bisbal prent...@ias.edu wrote:

On 12/22/2011 09:57 AM, Eugen Leitl wrote:

On Thu, Dec 22, 2011 at 09:43:55AM -0500, Prentice Bisbal wrote:

Or if your German is rusty: http://www.zdnet.com/blog/computers/amd-radeon-hd-7970-graphics-card-launched-benchmarked-fastest-single-gpu-board-available/7204

Wonder what kind of response will be forthcoming from nVidia, given developments like http://www.theregister.co.uk/2011/11/14/arm_gpu_nvidia_supercomputer/

It does seem that x86 is dead, despite good Bulldozer performance in Interlagos http://www.heise.de/newsticker/meldung/AMDs-Serverprozessoren-mit-Bulldozer-Architektur-legen-los-1378230.html (engage dekrautizer of your choice).

At SC11, it was clear that everyone was looking for ways around the power wall. I saw 5 or 6 different booths touting the use of FPGAs for improved performance/efficiency. I don't remember there being a single FPGA booth in the past. Whether the accelerator is GPU, FPGA, GRAPE, Intel MIC, or something else, I think it's clear that the future of HPC architecture is going to change radically in the next couple of years, unless some major breakthrough occurs for commodity processors.
I think DE Shaw Research's Anton computer, which uses FPGAs and custom processors, is an excellent example of what the future of HPC might look like.

-- Prentice

___
Beowulf mailing list, beow...@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

- End forwarded message -

--
Eugen* Leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
[fonc] [Beowulf] personal HPC
- Forwarded message from Douglas Eadline deadl...@eadline.org -

From: Douglas Eadline deadl...@eadline.org
Date: Thu, 22 Dec 2011 11:51:17 -0500 (EST)
To: Beowulf Mailing List beow...@beowulf.org
Subject: [Beowulf] personal HPC
User-Agent: SquirrelMail/1.4.8-5.el4.centos.8

For those that don't know, I have been working on a commodity desk-side cluster for a while. I have been writing about the progress at:

http://limulus.basement-supercomputing.com/

Recently I was able to get 200 GFLOPS using Intel i5-2400S processors connected by GigE (58% of peak). Of course these are CPU FLOPS, not GPU FLOPS, and the design has a power/heat/performance/noise envelope that makes it suitable for true desk-side computing (for things like software development, education, small production work, and cloud staging).

You can find the raw HPC numbers and specifications here:

http://limulus.basement-supercomputing.com/wiki/CommercialLimulus

BTW, if you click the Nexlink Limulus link, you can take a survey for a chance to win one of these systems.

Happy holidays

-- Doug

- End forwarded message -
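The "200 GFLOPS (58% of peak)" figure implies a theoretical peak that the message does not state directly. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the result is only as precise as the two rounded figures quoted in the message.

```python
def implied_peak_gflops(measured_gflops: float, efficiency: float) -> float:
    """Theoretical peak implied by a measured rate and an efficiency.

    efficiency is the measured rate divided by peak, as a fraction in (0, 1].
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction in (0, 1]")
    return measured_gflops / efficiency


if __name__ == "__main__":
    measured = 200.0   # GFLOPS, from the message
    efficiency = 0.58  # 58% of peak, from the message
    peak = implied_peak_gflops(measured, efficiency)
    print(f"implied peak: about {peak:.0f} GFLOPS")
```

So the cluster's aggregate peak works out to roughly 345 GFLOPS, give or take the rounding in the quoted numbers.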
[fonc] Re: [Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 TByte/s, 3 GByte, 198 W @ 500 EUR
- Forwarded message from Prentice Bisbal prent...@ias.edu -

From: Prentice Bisbal prent...@ias.edu
Date: Thu, 22 Dec 2011 11:53:39 -0500
To: Beowulf Mailing List beow...@beowulf.org
Subject: Re: [Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 TByte/s, 3 GByte, 198 W @ 500 EUR
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.24) Gecko/2009 Red Hat/3.1.16-2.el6_1 Thunderbird/3.1.16

Just for the record - I'm only the messenger. I noticed a not-insignificant number of booths touting FPGAs at SC11 this year, so I reported on it. I also mentioned other forms of accelerators, like GPUs and Intel's MIC architecture.

The Anton computer architecture isn't just an FPGA - it also has custom-designed processors (ASICs). The ASICs handle the parts of the molecular dynamics (MD) algorithms that are well understood and unlikely to change, and the FPGAs handle the parts of the algorithms that may change or might have room for further optimization.

As far as I know, only 8 or 9 Antons have been built. One is at the Pittsburgh Supercomputing Center (PSC); the rest are for internal use at DE Shaw. A single Anton consists of 512 cores and takes up 6 or 8 racks. Despite its small size, it's orders of magnitude faster at doing MD calculations than even supercomputers like Jaguar and Roadrunner with hundreds of thousands of processors. So overall, Anton is several orders of magnitude faster than a general-purpose-processor-based supercomputer. And I'm sure it uses a LOT less power. I don't think the Antons are clustered together, so I'm pretty sure the published performance on MD simulations is for a single Anton with 512 cores.

Keep in mind that Anton was designed to do only 1 thing: MD, so it probably can't even run LinPack, and if it did, I'm sure its score would be awful. Also, the designers cut corners where they knew they safely could, like using fixed-precision (or is it fixed-point?) math, so the hardware design is only half the story in this example.
Prentice
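Prentice's message above mentions fixed-precision or fixed-point math. As a generic illustration of what fixed-point arithmetic looks like, here is a small sketch using a 16.16 format. It is not a description of Anton's actual number format, which the thread does not specify; the constant and helper names are hypothetical.

```python
FRACTION_BITS = 16            # hypothetical 16.16 format, chosen for illustration
SCALE = 1 << FRACTION_BITS    # 65536 fixed-point units per 1.0


def to_fixed(x: float) -> int:
    """Convert a float to the nearest fixed-point integer."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values, rescaling the double-width product.

    The right shift rounds toward negative infinity.
    """
    return (a * b) >> FRACTION_BITS


if __name__ == "__main__":
    product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
    print(to_float(product))

    # Integer addition is associative, so a fixed-point sum does not depend
    # on the order of accumulation; floating-point addition can.
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))
    fa, fb, fc = to_fixed(a), to_fixed(b), to_fixed(c)
    print((fa + fb) + fc == fa + (fb + fc))
```

Everything reduces to integer adds, multiplies, and shifts, which is the kind of arithmetic that maps cheaply onto custom or reconfigurable hardware. The cost is a fixed range and resolution that the designer has to choose up front.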
Re: [fonc] PARC founder Jacob Goldman dies at 90
Yes, Jack was a driving force and quite a character in so many ways.

Cheers,
Alan

From: Long Nguyen cgb...@gmail.com
To: fonc@vpri.org
Sent: Thursday, December 22, 2011 9:47 AM
Subject: [fonc] PARC founder Jacob Goldman dies at 90

http://www.nytimes.com/2011/12/22/business/jacob-e-goldman-founder-of-xerox-lab-dies-at-90.html?_r=2&pagewanted=all
[fonc] FASTRA II 12TFLOPS desktop
For just 6000 euros, you can have 12 TFLOPS of computing power at your fingertips.

http://fastra2.ua.ac.be/
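For a rough sense of scale, the price-performance of this figure can be set against the single-card numbers from the subject line of the earlier thread (3.79 TFlops sp, 198 W, 500 EUR). This is only a sketch: the message does not say whether the 12 TFLOPS is single or double precision, so the comparison assumes like for like, and the card price excludes the host system.

```python
def gflops_per_euro(tflops: float, euros: float) -> float:
    """Price-performance in GFLOPS per euro."""
    return tflops * 1000.0 / euros


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return tflops * 1000.0 / watts


if __name__ == "__main__":
    # Figures as quoted in the messages; precision of the 12 TFLOPS is unstated.
    print(f"FASTRA II:   {gflops_per_euro(12.0, 6000.0):.2f} GFLOPS/EUR")
    print(f"single card: {gflops_per_euro(3.79, 500.0):.2f} GFLOPS/EUR")
    print(f"single card: {gflops_per_watt(3.79, 198.0):.1f} GFLOPS/W")
```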