[Beowulf] Is comparing HPC to Formula1 a bad idea?

2013-04-18 Thread Hearns, John
http://www.isc-events.com/isc13/isc_blog/items/is-comparing-hpc-to-formula1-a-bad-idea.html Thank you, Andrew! It combines the rapidly evolving hardware (the car is aggressively innovated throughout the race season) with a collaboration of many different high-end skills (e.g., driver, pit

Re: [Beowulf] Is comparing HPC to Formula1 a bad idea?

2013-04-18 Thread Mark Hahn
http://www.isc-events.com/isc13/isc_blog/items/is-comparing-hpc-to-formula1-a-bad-idea.html I think there probably is an F1-like HPC subculture, though I would argue that it's relatively fringe. the top fringe, of course, but fringe none the less. mostly bespoke hardware - even in cases where

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/17/2013 12:56 PM, Joe Landman wrote: Without naming names ... we had a cluster we had set up several years ago, with a particular cluster distribution compromised by an errant graduate student running windows on a compromised laptop. They couldn't break into the cluster, so they

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 10:37 AM, Ellis H. Wilson III wrote: [...] Please note: I NEVER run as root, I just tinker as root. I don't think there is ever a good reason to run as root. But having and using root is not so evil as you claim. In particular, I have NO doubt you require root to build

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 10:52 AM, Joe Landman wrote: On 04/18/2013 10:37 AM, Ellis H. Wilson III wrote: [...] Please note: I NEVER run as root, I just tinker as root. I don't think there is ever a good reason to run as root. But having and using root is not so evil as you claim. In particular, I

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 11:09 AM, Ellis H. Wilson III wrote: [...] I am guessing your work doesn't involve a great deal of support. Support in grad school? Nope. Not really. Not to do what I need to do, at least. If you are referring to if I have to support someone, well, that's also a big no. So

Re: [Beowulf] Register article on Linux State of the Union

2013-04-18 Thread Joshua Mora
Search on the web for instance PGAS over Ethernet to get an idea of where _some_ of those things are headed. Joshua -- Original Message -- Received: 05:50 PM CEST, 04/18/2013 From: Douglas Eadline deadl...@eadline.org To: Hearns, John john.hea...@mclaren.com Cc: beowulf@beowulf.org

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
snip Running as root? Yeah, it's that bad. Just say no. Are you setting yourself up as arbiter of who should and who should not run as root? Please - respect those of us who have the capabilities, experience, and juice to do so (when circumstances demand it). /snip I think that what

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 4/17/2013 5:32 PM, Max R. Dechantsreiter wrote: [...] Running as root? Yeah, it's that bad. Just say no. Are you setting yourself up as arbiter of who should and who should not run as root? Please - respect those of us who have the capabilities, experience, and juice to do so (when

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
On Thu, 18 Apr 2013, Nicholas M Glykos wrote: snip Running as root? Yeah, it's that bad. Just say no. Are you setting yourself up as arbiter of who should and who should not run as root? Please - respect those of us who have the capabilities, experience, and juice to do so (when

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 12:21 PM, Nicholas M Glykos wrote: In the same way you wouldn't allow a general user to override the safety interlocks of an X-ray generator, you shouldn't allow root access to the general users of a shared computing facility. Please describe what the grad student who pioneers

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
I run my X-ray generator with all safety interlocks off - personal responsibility are my watchwords. Does your radiation safety officer know that? ;-) It seems that some who decry the nanny state feel less libertarian when it comes to cluster management and use. This is not about

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
It seems that some who decry the nanny state feel less libertarian when it comes to cluster management and use. This is not about clusters. This is about dependable, responsible and This is a discussion in Beowulf.org - how could it not be about clusters? professional use of expensive

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
In the same way you wouldn't allow a general user to override the safety interlocks of an X-ray generator, you shouldn't allow root access to the general users of a shared computing facility. Please describe what the grad student who pioneers new tech for X-ray generators should do,

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 12:57 PM, Nicholas M Glykos wrote: In the same way you wouldn't allow a general user to override the safety interlocks of an X-ray generator, you shouldn't allow root access to the general users of a shared computing facility. Please describe what the grad student who

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Hearns, John
cluster and build my own storage at home, so I can do research without constantly having day or more delays in trying to flush a cache or do something similarly simple but requiring of root. As an aside, a normal user can trigger a drop of the caches before the start of a job. If you have

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Hearns, John
I run my X-ray generator with all safety interlocks off - personal responsibility are my watchwords. Does your radiation safety officer know that? ;-) First running down stairs with scissors, now taking the interlocks off X-ray sources. Who knew that Beowulfery really was a sport for

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
Let's say ... ah ... Safety is very important. Pretending it isn't, or saying bad things can't happen to me because I is smart isn't quite a safe strategy ... for computing, for x-ray interlocks, for driving on a highway ... So, following your logic, I take it you don't drive on the

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 01:07 PM, Hearns, John wrote: As an aside, a normal user can trigger a drop of the caches before the start of a job. If you have looked into it, sudo echo 3 > /proc/sys/vm/drop_caches is well nigh impossible. But you can run an suid C program which does effectively the same

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 02:35 PM, Joe Landman wrote: landman@metal:~$ sudo ./drop_caches.bash [sudo] password for landman: landman@metal:~$ # PROFIT!!! This is exactly how I do things. I've got a whole folder of special scripts that require su. But getting the script developed usually requires

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Adam DeConinck
Tying in another recent discussion on the list, root access is actually one of the places I've seen some success using Cloud for HPC. It costs more, it's virtualized, and you usually can't get HPC-specialized hardware, so it's obviously not a silver

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
If you have looked into it, sudo echo 3 > /proc/sys/vm/drop_caches is well nigh impossible. But you can run an suid C program which does effectively the same job. sudo sysctl -w vm.drop_caches=3 is the smarter way to do it, or a fixed executable with sudo. or a fixed executable with suid.
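A minimal sketch of the "fixed executable" idea from this exchange, written in Python purely for illustration. When a shell runs `sudo echo 3` with its output redirected to /proc/sys/vm/drop_caches, the redirection is performed by the calling user's unprivileged shell, not by the privileged `echo`, which is why that form fails for a normal user. The helper name `drop_caches` and its `path` parameter are assumptions of this sketch and were not posted to the list; writing to the real path still requires root, for example by running the script under sudo.

```python
import os

# Kernel interface discussed in the thread; writing here needs root.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches(mode=3, path=DROP_CACHES_PATH):
    """Ask the kernel to drop clean caches (hypothetical helper).

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    `path` is a parameter only so the function can be exercised
    against an ordinary file without root.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # Flush dirty pages first: drop_caches only frees clean objects.
    os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Saved as a script that calls `drop_caches()` and allowed through a single sudoers entry, this would give users one fixed, auditable action rather than a root shell. As noted in the replies, dropping caches is mainly useful for controlling hot/cold cache state in benchmarks.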

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 02:45 PM, Adam DeConinck wrote: Tying in another recent discussion on the list, root access is actually one of the places I've seen some success using Cloud for HPC. It costs more, it's virtualized, and you usually can't get

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
I term this article fun with sudo, or how to drive down I95 at 65mph while holding scissors transporting your x-ray device :- -- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68100

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 02:55 PM, Joe Landman wrote: [I am not BOFH ... I am not BOFH ... I am not BOFH ...] It should be against list policy for some of you somewhat more experienced guys to share insanely hilarious tomes of literature I managed to miss in the nineties. This is bound to suck down many

[Beowulf] nVidia Kepler GK110 GPU is incompatible w/Intel x86 hardware in PCI-E 3.0 mode ?

2013-04-18 Thread Mikhail Kuzminsky
I have a cluster node (w/Linux, of course) based on Supermicro X9SCA system board and Xeon E3-1230v2 having LGA1155 socket. Now I want to buy GPU nVidia Kepler GK110 w/PCI-E 3.0 (CK20 Compute Board from PNY ?) and install it into my node. Intel Xeon E3-1230v2 and Supermicro X9SCA both support

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
sudo sysctl -w vm.drop_caches=3 is the smarter way to do it, or a fixed executable with sudo. or a fixed executable with suid. or better yet: have the system do it when appropriate, since inappropriate drop_caches could cause problems. What problems? http://linux-mm.org/Drop_Caches

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
What problems? performance, of course. drop_caches is really only sane for benchmarking, where you want to control for hot/cold caches. otherwise, you're almost certainly better off either letting the kernel optimize global caching, and/or fix your application to avoid polluting the cache

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
On Thu, 18 Apr 2013, Mark Hahn wrote: What problems? performance, of course. drop_caches is really only sane for benchmarking, where you want to control for hot/cold caches. Indeed. I thought you might know of harmful instances of which I was unaware. otherwise, you're almost

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Craig Tierney - NOAA Affiliate
Only for benchmarking? We have done this for years on our production clusters (and SGI provides a tool that does this and more to clean up nodes). We have this in our epilogue so that we can clean out memory on our diskless nodes so there is nothing stale sitting around that can impact the next user's job.

[Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread mathog
High end SATA and SAS disks claim MTBF values that work out to over 100 years, and yet it is a common observation that certain models fail at rates entirely inconsistent with those values. For instance, 75% of all drives of one model dead in 6 years. (Cited by one poster in this thread:
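The mismatch described in this post can be put into numbers with a short calculation. The sketch below assumes a constant failure rate (an exponential lifetime model), which is what a single MTBF figure implies; the 1,000,000-hour rating is an illustrative value of roughly 114 years and is not taken from the thread, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # 8766 hours


def fraction_failed(mtbf_hours, years):
    """Expected fraction of drives dead after `years`,
    assuming a constant failure rate of 1/MTBF."""
    t = years * HOURS_PER_YEAR
    return 1.0 - math.exp(-t / mtbf_hours)


def implied_mtbf_years(fraction_dead, years):
    """MTBF (in years) that would produce the observed fraction
    of dead drives under the same constant-rate assumption."""
    return -years / math.log(1.0 - fraction_dead)


# Illustrative rating: 1,000,000 hours, about 114 years.
expected = fraction_failed(1_000_000, 6)
# Observation quoted in the post: 75% of one model dead in 6 years.
observed_mtbf = implied_mtbf_years(0.75, 6)
print(f"expected dead after 6 years: {expected:.1%}")
print(f"MTBF implied by 75% dead in 6 years: {observed_mtbf:.1f} years")
```

Under these assumptions the rating predicts about 5% of drives dead after six years, while 75% dead in six years corresponds to an MTBF of a little over four years. A constant-rate model ignores infant mortality and wear-out, so this only shows the size of the gap, not its cause.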

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Alex Chekholko
On Thu, Apr 18, 2013 at 4:01 PM, mathog mat...@caltech.edu wrote: How do they come up with the MTBF values for disks anyway? Clearly it is not based on watching a large sample of disks for countless years! Hi David, How would you do it? Regards, Alex

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Joe Landman
On 4/18/2013 7:01 PM, mathog wrote: High end SATA and SAS disks claim MTBF values that work out to over 100 years, and yet it is a common Amazing isn't it. Disks that never fail! observation that certain models fail at rates entirely inconsistent with those values. For instance, 75% of

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread mathog
On 18-Apr-2013 16:03, Alex Chekholko wrote: On Thu, Apr 18, 2013 at 4:01 PM, mathog mat...@caltech.edu wrote: How do they come up with the MTBF values for disks anyway? Clearly it is not based on watching a large sample of disks for countless years! How would you do it? On a brand new

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Lux, Jim (337C)
You set up 1000 drives, run them at high temperature (using a scaling factor developed by experience) and count how many fail after some length of time, then extrapolate to a failure rate which gets turned into a MTBF. It *is* fairly scientific and based on sound principles, although there are
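The procedure described in this post (run a large batch hot, count failures, scale back to normal conditions) amounts to a few lines of arithmetic. The sketch below is one common simplified form, not any manufacturer's actual method; the function name, the acceleration factor of 10, the 1000-hour test length and the failure count are all hypothetical values chosen for illustration.

```python
def estimate_mtbf_hours(drives, test_hours, failures, acceleration):
    """Point estimate of MTBF from an accelerated life test.

    drives        number of units on test
    test_hours    duration of the test at elevated temperature
    failures      units that failed during the test
    acceleration  assumed factor relating stress time to normal
                  use time (hypothetical here; in practice derived
                  from experience, as the post says)
    """
    if failures <= 0:
        raise ValueError("need at least one failure for a point estimate")
    # Equivalent device-hours at normal conditions, ignoring the
    # hours lost by units that failed part-way through the test.
    device_hours = drives * test_hours * acceleration
    return device_hours / failures


# Hypothetical test: 1000 drives, 1000 hours, 10 failures, factor 10.
mtbf = estimate_mtbf_hours(1000, 1000, 10, 10.0)
print(f"estimated MTBF: {mtbf:,.0f} hours")
```

With these made-up inputs the estimate is 1,000,000 hours even though no drive ran for more than about six weeks, which shows how a rating of over 100 years can come out of a short test and why it says little about wear-out years later.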

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Geoffrey Jacobs
On 04/18/2013 06:40 PM, Joe Landman wrote: Statistical analysis (called a bathtub analysis). MTBF's are WAGs at best, and not well matched against empirical observation. Is anyone familiar with QC steps taken by hard drive manufacturers to eliminate infant mortality problems? Do any third

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Lux, Jim (337C)
Jim Lux -Original Message- From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org] On Behalf Of mathog Sent: Thursday, April 18, 2013 4:21 PM To: Alex Chekholko Cc: Beowulf List Subject: Re: [Beowulf] Are disk MTBF ratings at all useful? On 18-Apr-2013 16:03, Alex

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
Only for benchmarking? We have done this for years on our production clusters (and SGI provides a tool that does this and more to clean up nodes). We have this in our epilogue so that we can clean out memory on our diskless nodes so there is nothing stale sitting around that can impact the next user's

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 07:01 PM, mathog wrote: How do they come up with the MTBF values for disks anyway? Clearly it is not based on watching a large sample of disks for countless years! I am not intimately familiar with how they come up with the values (else I probably would be at liberty to