Come and try our Jenkins SGE Plugin, https://github.com/jenkinsci/sge-cloud-plugin.
It has been performing well for us in our enterprise application.

John McGehee
Wave Computing

On Sat, Apr 23, 2016 at 8:12 AM, Dr. Mark Asbach <mark.asb...@pixolus.de> wrote:

> Hi S(o)GE users,
>
> I need some advice :-)
>
> During my Ph.D. times, I discovered Sun Grid Engine and used it to run
> distributed machine learning jobs on a (then) medium-sized cluster (96
> CPUs). I liked it. Now, a couple of years later, I am again looking for a
> scheduling and resource allocation system like SGE for a similar purpose.
> Unfortunately, SGE seems to be pretty dead. In addition, I have similar but
> not identical needs stemming from continuous integration and from running
> (micro-)web services. Ideally, I would like a simple, integrated solution
> and not a complex monster built from many large parts.
>
> Here's what I'm trying to accomplish:
>
> - Run custom jobs for machine learning / data analysis. When I have an
> idea, I write a job and run it. Usually, the same job is only run a few
> times. Jobs will span multiple hosts and might require OpenMP + MPI. This
> is where SGE was really good in the past. The crowd seems to have shifted
> to run everything on Hadoop although this setup would be really ineffective
> for my purposes. I usually just need a couple of CPUs (< 100).
>
> - Run frequent identical jobs for continuous integration. We have a Jenkins
> running, but it is lacking in some regards. Resource allocation and
> scheduling are more or less non-existent. For example, I cannot define
> resources for things like attached mobile devices that can be used only by
> one job of a multi-core Mac at the same time. These are things already
> solved with SGE, but SGE itself does not cover the main aspects of CI, i.e.
> the collection and analysis of the build data.
>
> - Run (micro-)services. We have a couple of services that need to run
> continuously. Some need to be scaled up and down regarding the number of
> parallel instances. This is where people are now using Docker and (also
> quite complex) resource allocation and scheduling systems like Kubernetes.
>
> All three sorts of tasks compete for the same resources and suffer the
> same problem of provisioning/configuring the workers to fulfill a job's
> requirements. We're using Vagrant + Ansible to provision VMs for our
> machine learning tasks and I would like to extend this to the other
> problems as well. The resource allocation is still somewhat manual in our
> case. I would really like to cut down the complexity of our setup.
>
> It would be great if you can point me to any helpful information, ideas,
> projects that could help me solve this.
>
> Best,
> Mark
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users