Re: [plug] Linux HPC question

Ariz Jacinto Sat, 13 Oct 2007 15:50:42 -0700

Can you be more specific about your setup? Is it HPC or HTC?
Can you also elaborate on the problem? Do the jobs stay
idle on the low-end nodes? The way you deal with the problem
(stop/kill/restart) is the typical response, but it can be done
automatically via the job scheduler. And since you've already identified
the problematic nodes, you might want to pull them out of the
cluster, place them in a sandbox, and then troubleshoot them
further.
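
To make the "identify, then pull out" step concrete, here is a rough
sketch, not tied to any particular scheduler. It assumes you can
collect, per node, the installed memory and the number of jobs left
stuck on it. The node names, the numbers, and both function names are
made up for illustration; how you gather the data and how you actually
take a node offline depend on your scheduler.

```python
# Hypothetical inventory: node name -> (memory in GB, jobs left stuck on it).
# These names and numbers are invented for illustration only.
NODES = {
    "node001": (4, 0),
    "node002": (2, 3),
    "node003": (2, 0),
    "node004": (4, 1),
}


def nodes_to_sandbox(nodes, min_mem_gb=4):
    """Return low-memory nodes that also have stuck jobs, sorted by name."""
    return sorted(
        name
        for name, (mem_gb, stuck_jobs) in nodes.items()
        if mem_gb < min_mem_gb and stuck_jobs > 0
    )


def stuck_jobs_by_memory(nodes):
    """Total stuck jobs per memory size, to see whether low-end nodes dominate."""
    totals = {}
    for mem_gb, stuck_jobs in nodes.values():
        totals[mem_gb] = totals.get(mem_gb, 0) + stuck_jobs
    return totals


if __name__ == "__main__":
    print(nodes_to_sandbox(NODES))
    print(stuck_jobs_by_memory(NODES))
```

If the per-memory totals show stuck jobs on the 4GB nodes as well, that
would point more toward configuration than toward the hardware.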

On 10/13/07, Michael Calizo <[EMAIL PROTECTED]> wrote:
>
> Hi Guys,
>
> A newbie here needs an expert opinion regarding Linux HPC.
>
> In my current company we have a Linux (Red Hat) cluster implementation, say
> 100 nodes per cluster.
> I noticed that on the problematic cluster, some nodes are low-end servers
> with, say, 2GB of memory, while the
> other nodes have 4GB of memory. Over the past few weeks I noticed that user
> problems keep growing, and
> based on my investigation, the leftover jobs are always on the compute nodes
> which are "low end".
> We managed to stop/kill/restart the jobs, but I know that this is only a
> temporary solution and I want a permanent one.
>
> 1. I suspect that this might be a hardware-related problem, but I am
> not 100% sure. I want to get opinions/suggestions from HPC gurus first, before I
> approach management and make my case that a hardware
> upgrade is needed.
>
> 2. Or can this problem be attributed to cluster misconfiguration?
>
> Thanks in advance.
>
> --
> Mike Calizo
> Registered Linux User # 365113
>

_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
[email protected] (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph
