DongInn,

> [...]
> My question is more like asking if we really want to manage the third party 
> packages but not asking how we want to manage them by ourselves.
> Of course, we have very fundamental third party packages to ship in OSCAR and 
> it is worth to maintain them but I do not think that we want to manage all 
> the packages by ourselves.  I think it would be good enough if the packages 
> are shipped to the OSCAR with very basic features.
> For example, as a resource manager, torque + maui is good enough to run 
> parallel jobs on the cluster. If people want to have C/R features, they would 
> import BLCR or any other solutions.

Unfortunately, it's not that easy.
- If you want C/R, you need to rebuild torque and openmpi, for example.
- If you need to run openmpi under torque, you need to rebuild torque (that's 
why we build torque instead of using the distro version).

> Some people prefer slurm to torque because it has much better management 
> interface. It seems to me that slurm has worked much better with IB than with 
> torque.

That would be a good addition in the future. slurm is supported by default in 
the rhel and fedora openmpi builds. It could be a good start for contributed 
packages. :-)

> Some people like to use just nagios rather than ganglia because they do not 
> care about the fancy GUI but they like more light weight and better 
> notification interface.
> Some people like mvapich / mpich better than openmpi because of the mpi3 
> implementation.
> There are a lot of other clustering third party packages that we do not ship 
> to OSCAR yet. We can not manage them all.

For sure.

> Instead, I would like to focus on setting up a system that any OSCAR users 
> can contribute to build the third party packages without any big learning 
> curve.

True, but there are some cases that need either to be dropped or to be 
enhanced. I have in mind openmpi and torque. Both are provided by rhel and 
fedora at least, but they cannot be used together, as the distro openmpi has 
no support for tm (torque manager). Thus, if a user needs openmpi, he will 
have to rebuild the package. So the question becomes: do we continue to ship 
openmpi rebuilt with the torque option (which needs maintenance), or do we 
drop support?
Same for torque: if a user needs C/R, he will need to rebuild both torque and 
openmpi at least. And when he rebuilds torque, he may omit the specific 
prefix, and that can lead to opkg script failures. So in the end, what we 
simplify on one side (less stuff to maintain), we'll lose in maintenance when 
users encounter problems (though I admit that those components are in the 
unsupported stuff).
If a user doesn't need C/R, even though torque or openmpi are built with the 
support enabled, he can choose not to install it, and it'll work.

I perfectly understand what you mean, and maybe we could have the following 
component classes: core, included, contrib or external. We could then move 
things like ganglia, gridengine, slurm, torque, blcr, ... to contrib.

> Once the system is ready, the OSCAR main frame work should be able to handle 
> the contributed packages without any issues. That is the one that we have to 
> make more effort on than building the packages by ourselves.

I agree, goal for OSCAR 7?

> [...]

> but without blcr, how can you stop a node with processes running? no way 
> without killing them, thus what would be the usefulness of a maintenance mode?
> Why do you think the C/R feature is so critical to the OSCAR cluster? We are 
> not a superman who can deal with all kinds of the sysadmin works on OSCAR.
> If necessary, they can manage it. It would be nice if OSCAR can provide a 
> system that the OSCAR users can easily build the packages and they can share 
> it with all the OSCAR users though.

I agree. I just wanted to highlight that getting the blcr feature is not just 
a matter of building the blcr package. To have C/R, aside from building blcr, 
you need to rebuild *mpi* and the *queue management software* at least.

Of course, for the moment, we need to do a release so that users stuck with 
RHEL5 can upgrade. Thus, IMHO, we should finish building by hand the external 
packages not handled by oscar-packager (systemimager and such), and once the 
repo is tested (successful deployment in step 8 for core components), I think 
we can validate a release.

> I really appreciate your effort and hard work. I believe that the current 
> OSCAR with the trunk version is much better than before and I would like to 
> see that OSCAR will continue to evolve with many OSCAR users' participation.

Thanks DongInn, I have the same wish. I think that if the next stable release 
is really stable, it can be a good boost for growing the OSCAR community.


Regards,
--
   Olivier LAHAYE
   CEA DRT/LIST/DCSI/DIR
_______________________________________________
Oscar-devel mailing list
Oscar-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-devel
