On Sun, Jun 22, 2014 at 3:12 AM, Reza Behjatmanesh-Ardakani <reza_b_m_a at yahoo.com> wrote:
> Dear Axel
> Thank you. It was very helpful for me.
> As you said, some new GTX cards have good DP floating-point performance,
> such as the GTX Titan Black or GTX Titan Z, for both of which DP is 1/3
> of SP.
> They are much cheaper than Tesla cards.
> I am not sure whether the Titan Black or Titan Z has ECC.
no, neither card has ECC.

> Quadro K6000 has it.

well, the quadro is practically a tesla with all graphics features
enabled. ...at a price.

> Thanks again.
>
> With the Best Regards
>
> Reza Behjatmanesh-Ardakani
> Associate Professor of Physical Chemistry
> Address:
> Department of Chemistry,
> School of Science,
> Payame Noor University (PNU),
> Ardakan,
> Yazd,
> Iran.
> E-mails:
> 1- reza_b_m_a at yahoo.com (preferred),
> 2- behjatmanesh at pnu.ac.ir,
> 3- reza.b.m.a at gmail.com.
>
> --------------------------------------------
> On Sat, 6/21/14, Axel Kohlmeyer <akohlmey at gmail.com> wrote:
>
> Subject: Re: [Pw_forum] A "relax" input runs on CPU (pw.x) but not on
> CPU-GPU (pw-gpu.x)
> To: "PWSCF Forum" <pw_forum at pwscf.org>
> Date: Saturday, June 21, 2014, 1:50 PM
>
> On Sat, Jun 21, 2014 at 4:20 AM, Reza Behjatmanesh-Ardakani
> <reza_b_m_a at yahoo.com> wrote:
> > Dear Axel
> > This was just a proposal. If I am right, the Terachem code can use
> > gaming cards for GPU calculations (I saw some of its authors'
> > papers).
>
> yes, but terachem was written from the ground up with new algorithms
> to avoid loss of precision. in quantum mechanics this is important,
> since a lot of calculations depend on comparing large numbers of equal
> sign and magnitude and looking at the difference. about the only part
> of a plane-wave DFT calculation that is "conservative" in terms of
> precision without a massive redesign are the FFTs. the loss of
> precision is fairly small when replacing double-precision FFTs with
> single-precision ones. for the many 3d-FFTs required, this is
> particularly beneficial when trying to scale out via MPI, as it cuts
> the number of bytes that need to be sent and copied around in half and
> also reduces the strain on memory bandwidth.
>
> > As you know, the main problem of GTX cards comes down to two
> > important things: one, single precision, and the other, lack of ECC.
>
> ECC is a lesser issue.
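the size of that single- vs. double-precision gap is easy to put a number on: binary32 carries about 2**-24 relative rounding error, binary64 about 2**-53, and a well-implemented FFT only amplifies this by a factor that grows slowly with the transform length. a minimal pure-python sketch (the struct round-trip emulating float32, and the sample value, are just an illustration, not QE code):

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a python float (binary64) to the nearest binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# relative error of storing one generic value in single precision
x = math.pi / 7.0
rel32 = abs(to_f32(x) - x) / x
print(f"float32 relative error: {rel32:.2e}")  # on the order of 2**-24 ~ 6e-8

# binary64 keeps ~2**-53 ~ 1.1e-16 instead, i.e. eight to nine more
# decimal digits of headroom
assert 0 < rel32 < 2**-23
```

losing ~8 digits sounds bad, but an FFT result accurate to ~1e-7 relative error is still far below the convergence thresholds of a typical SCF cycle, which is why the FFTs are the one "safe" place to trade precision for speed.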
> and it is not a problem of single precision, but of lacking
> double-precision performance, because consumer cards have only a
> fraction of the double-precision units. another issue is the lack of
> RAM. also, you have to distinguish between different GTX cards: a few
> of the most high-end consumer cards *do* have the full set of
> double-precision units and a large amount of RAM.
>
> ECC is mostly relevant for people running a large number of GPUs in a
> supercomputer environment.
>
> > It is not necessary to write a stand-alone code. We can test QE-GPU
> > with both TESLA and/or GTX and QE (CPU only), and compare the
> > outputs.
>
> but it is pointless to run on hardware that is not competitive. you'll
> already have a hard time getting a 2x speedup from a top-level tesla
> card vs. an all-CPU run on a decent machine. what would be the point
> of having the GPU _decelerate_ your calculation?
>
> in general, a lot of the GPU stuff is hype and misinformation. the
> following is a bit old, but still worth a read:
>
> http://www.hpcwire.com/2011/12/13/ten_ways_to_fool_the_masses_when_giving_performance_results_on_gpus/
>
> as a consequence of a very smart and successful PR strategy, there is
> now the impression that *any* kind of GPU will result in a *massive*
> speedup. even people with a laptop GPU with 2 SMs and no memory
> bandwidth are now expecting 100x speedups and more. however, except
> for a few corner cases and applications that map very well onto a GPU
> (not very complex) and badly onto a CPU, you will often get more like
> a 2x-5x speedup in a "best effort" comparison of a well-equipped host
> with a high-end GPU. in part, this situation has become worse with
> some choices made by nvidia hardware and software engineers.
> while 5 years back the difference between a consumer and a computing
> GPU was small, the consumer models have been systematically
> "downgraded" (by removing previously supported management features
> from the driver and basing consumer cards on a simplified design that
> mostly makes them mid-level GPUs).
>
> > I tested it for only one case (a rutile 3*3*2 supercell), and saw
> > that the GTX output is similar to the CPU one.
> >
> > However, it needs to be tested for different cases and different
> > clusters to be sure that the lack of ECC and double precision has no
> > effect on results.
>
> sorry, this statement doesn't make any sense. it looks to me like you
> need to spend some time learning what the technical implications of
> ECC and single-vs-double precision are (and the fact that it is the
> software that chooses which precision is used, not the hardware).
>
> this holds whether a card has ECC or not. broken memory is broken
> memory, and if it works, it works. so there is not much to test. if
> you want to find out whether your GPU has broken or borderline memory,
> run the GPU memtest. it is much more effective at finding issues than
> any other application.
>
> where ECC helps is for very long-running calculations, or calculations
> across a very large number of GPUs, when a single bit flip can render
> the entire effort useless and result in a crash. in a dense cluster
> environment or in badly cooled desktops, this is a high risk. in a
> well-set-up machine, it is less of a risk, but you have to keep in
> mind that running without ECC makes you "blind" to those errors. i run
> a cluster with a pile of Tesla GPUs and we have disabled ECC, since
> the machines run very reliably thanks to some hacking around
> restrictions that nvidia engineers placed in their drivers.
> https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
>
> we also run consumer-level GPUs, particularly in the login nodes,
> since they work fine for development and don't cost as outrageously
> much as the tesla models. for development, however, absolute
> performance is a lesser concern.
>
> > As Filippo said earlier, for GTX cards the output may not be
> > reproducible. However, I think that due to the nature of the SCF
> > algorithm, the code can be used at least
>
> when you have memory corruption due to bad/overheated memory, no SCF
> algorithm will save you. if you go back 10 years, when CPUs didn't
> have all those power-management and automatic self-protection features
> and memory modules in desktops were often of very low quality, people
> experienced a lot of problems. "signal 11" and "segmentation fault"
> were a common topic on many mailing lists for scientific (or other)
> software that caused a high CPU load.
>
> but the indication of broken memory was usually a crash due to a
> segfault, or bad data corruption leading to a massive change in the
> numbers and often to NaNs. once you have a single NaN in your data, it
> will spread like a highly infective virus and render the calculation
> invalid.
>
> a well-set-up consumer-level GPU will run as reliably as a tesla or
> better, only you cannot tell, since the nvidia tools will not show
> you. the main issues are performance and available memory.
>
> > for VC-RELAX, RELAX, and SCF types of calculations with GTX cards.
> > Of course, it should be tested. Thank you for your interest.
>
> you are not making much sense here either. but if it makes you feel
> better to do those tests, don't let me discourage you. sometimes
> people learn best this way.
>
> axel.
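the "NaN spreads like a virus" point above is trivial to demonstrate in any language; a toy python example (the array size and corrupted index are made up):

```python
import math

# one corrupted element -- e.g. from a bit flip in bad memory -- in an
# otherwise healthy array of "forces"
forces = [1.0] * 1000
forces[417] = float('nan')   # hypothetical corruption site

total = sum(forces)
print(total)                 # nan -- the whole reduction is lost
assert math.isnan(total)

# and every later operation that touches it stays nan
assert math.isnan(total * 0.0)
assert math.isnan(total - total)
```

note that not even `total - total` or multiplying by zero recovers a finite number: once a NaN enters the state of an iterative calculation, every subsequent iteration is invalid.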
> > --------------------------------------------
> > On Fri, 6/20/14, Axel Kohlmeyer <akohlmey at gmail.com> wrote:
> >
> > Subject: Re: [Pw_forum] A "relax" input runs on CPU (pw.x) but not
> > on CPU-GPU (pw-gpu.x)
> > To: "PWSCF Forum" <pw_forum at pwscf.org>
> > Date: Friday, June 20, 2014, 2:19 PM
> >
> > On Fri, Jun 20, 2014 at 4:22 AM, Reza Behjatmanesh-Ardakani
> > <reza_b_m_a at yahoo.com> wrote:
> > > Dear Filippo
> > >
> > > Due to the nature of QE, which is iterative, I think the lack of
> > > ECC and even of double-precision floating point in gaming cards
> > > (GTX), compared to tesla cards, is not a serious problem for
> > > QE-GPU. Some authors have checked this for the AMBER molecular
> > > dynamics simulation code. See the following site:
> >
> > classical MD is a very different animal than what you do with QE.
> > errors in some properties due to single precision are huge with
> > all-single-precision calculations. computing a force from a distance
> > will not be much affected, but summing up the forces can already be
> > a problem. "good" classical MD codes usually employ a
> > mixed-precision approach, where only the accuracy-insensitive parts
> > are done in single precision. for very large systems, even double
> > precision can show significant floating-point truncation errors.
> > usually you are dependent on error cancellation, too, i.e. when you
> > study a simple homogeneous system (as is quite common in those
> > tests).
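the "summing up the forces can already be a problem" point is the classic absorption failure of single precision: once the accumulator is large, small contributions round away entirely. a sketch, emulating float32 with the struct module (the numbers are illustrative, not taken from any MD code):

```python
import struct

def f32(x: float) -> float:
    """Round a python float (binary64) to the nearest binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

acc = f32(2.0 ** 24)     # 16777216.0: float32 can no longer resolve +1
acc = f32(acc + 1.0)     # the small contribution is rounded away
print(acc)               # 16777216.0 -- unchanged

assert acc == 2.0 ** 24
# binary64 handles the same sum exactly
assert 2.0 ** 24 + 1.0 == 16777217.0
```

this is exactly why mixed-precision MD codes compute individual pair forces in single precision but keep the force *accumulators* (and energies) in double precision, or use fixed-point accumulation.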
> > > http://www.hpcwire.com/2014/03/13/ecc-performance-price-worth-gpus
> > >
> > > and see the following paper:
> > >
> > > www.rosswalker.co.uk/papers/2014_03_ECC_AMBER_Paper_10.1002_cpe.3232.pdf
> > >
> > > I encourage the users of QE-GPU to test it for QE, and report the
> > > difference on the site.
> >
> > it is a waste of time and effort. people have done DFT and HF in
> > (partial) single precision before, and you will only succeed if you
> > write a new code from scratch and have an extremely skilled
> > programmer. have a look at the terachem software out of the group of
> > todd martinez, for example.
> >
> > axel.
> >
> > > PS: to be able to test the results for GTX and TESLA, the QE-GPU
> > > code needs to be run on GTX :-)

--
Dr. Axel Kohlmeyer  akohlmey at gmail.com  http://goo.gl/1wk0
College of Science & Technology, Temple University, Philadelphia PA, USA
International Centre for Theoretical Physics, Trieste. Italy.
_______________________________________________
Pw_forum mailing list
Pw_forum at pwscf.org
http://pwscf.org/mailman/listinfo/pw_forum
