Dear William,

On Sat, Jan 03, 2015 at 08:23:15PM -0800, William Stein wrote:
> The computer combinat.math.washington.edu is down... again.  Sort of.
> It responds to ping requests, but I can't ssh in.
> 
> I suspect that not a lot of people are actively using it lately, since
> this is the second time it has gone down for over a week in the last 3
> months, and nobody (except my student Hao Chen), seems to have
> noticed.

I had actually noticed that the web server was down a couple of weeks
ago, and again recently, but had not found the time to report it.

> I'm considering doing the following.  I'll shutdown combinat
> completely, reformat the disk, and set it up as a node of the
> cloud.sagemath.com (SMC).   It'll still have the amazing 64 cores and
> huge (192GB) RAM.  However, instead of logging in directly to it, people
> can email me to request that I move a particular SMC project to
> combinat.  It will then have access to expanded compute resources.
> The advantage of this, is that it is much easier for me to maintain.
> In particular, SMC has automated scripts to take care of using cgroups
> to explicitly limit usage of compute resources by a given project, I
> have extensive monitoring code in place so I know when things go down,
> and everything runs in virtual machines, so when there are problems
> I can easily fix them in a few minutes remotely.  Also, it's much
> easier to grant fair usage to projects.    As it is now with default
> linux on combinat, basically any user can just bring down the computer
> by using too much memory/disk/whatever, which is probably what
> happened in this case (I don't know).
> 
> Thoughts?
> 
> Obviously, this may be a bit slower and the max memory will be less
> (as things are in a VM) for specific research-level computations.
> However, a working computer is way better than a regularly-crashing
> computer, in my opinion.    Also, given the weeks of downtime that
> nobody (except Hao) notices, maybe people aren't using combinat at all
> anyways, due to it being only a remote linux box.   Personally, I
> think SMC makes using remote Linux boxes much easier.

I believe people are actually using combinat fairly intensively, but
in bursts, or for very long-running calculations. So there can indeed
be long periods when they don't connect.

We had already discussed running the calculations on combinat within
virtual machines, for safeguards, ease of access and
checkpoints. Going through SMC seems a reasonable way to achieve
this. So +1 on my side, especially now that it's open source!

Besides, it will give me an occasion to use SMC seriously, which I
should do for our VRE grant :-)

Cheers,
                                Nicolas
--
Nicolas M. Thiéry "Isil" <nthi...@users.sf.net>
http://Nicolas.Thiery.name/

