On 3/29/13 10:12 PM, Paul Robert Marino wrote:
Well, openmpi is the app that executes it, so that's where the limitation
is probably coming from.
With a little time on Google you will find plenty of posts on the
subject of openmpi not being able to take advantage of all the
resources available to it.
The problem is I've never seen an answer as to why, not that I looked
all that long. Most of the suggestions talk about the ulimit settings,
which on the surface makes some sense, but those numbers aren't right
for an issue caused by a ulimit. Also, most of the openmpi users who
asked the question and were told it was ulimits said later that
adjusting the ulimits didn't fix their issues. So again it sounds like
a problem in the code of either openmpi or the code you are trying to
execute with it.
The only other possibility is that maybe SELinux is preventing
something and capping the memory as a side effect, but I doubt it.
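Either way, it is worth printing the limits the MPI ranks actually inherit, both from a login shell and under the launcher; a minimal sketch (the mpirun invocation is an assumption, adjust to your setup):

```shell
#!/bin/bash
# Print the limits relevant to per-process memory caps.
# Run it directly, and also under the launcher to compare, e.g.:
#   mpirun -np 8 bash showlimits.sh    (hypothetical invocation)
echo "pid $$ on $(hostname):"
ulimit -v   # max virtual memory in kbytes ('unlimited' if uncapped)
ulimit -m   # max resident set size
ulimit -d   # max data segment size
ulimit -s   # max stack size
```

If the numbers differ between the two runs, the launcher (or whatever starts it) is tightening the limits.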
Very useful comments, Paul. I am jumping to the openmpi forum to ask if
they can help. Anyway, is there a way of testing the total memory of the
system? Any simple bash program (no use of openmpi) that I can try on
all the cores, so that I can confirm that my system can take up to 8GB of RAM?
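Something along these lines might do as a rough test: fork one worker per core and have each hold a chunk of RAM in a shell variable. A sketch only (sizes kept small here for a demo; raise 16 to 1024 to push ~1 GB per worker on an 8-core/8GB box):

```shell
#!/bin/bash
# memtest NWORKERS MB_PER_WORKER
# Each worker reads MB_PER_WORKER megabytes of zeros and keeps them
# resident in a shell variable (tr makes the bytes printable so bash
# can store them), then reports success or failure.
memtest() {
  local n=$1 mb=$2 i
  for i in $(seq 1 "$n"); do
    (
      if data=$(head -c $((mb * 1024 * 1024)) /dev/zero | tr '\0' 'x'); then
        echo "worker $i: held ${mb} MB OK"
      else
        echo "worker $i: allocation failed"
      fi
    ) &
  done
  wait
}

# Demo: 8 workers at 16 MB each; use "memtest 8 1024" to try ~1 GB per core.
memtest 8 16
```

Watching `free -m` in another terminal while this runs shows whether the memory is really being handed out.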
Thanks,
D.
On Thu, Mar 28, 2013 at 12:39 PM, Duke Nguyen <[email protected]> wrote:
On 3/28/13 9:00 PM, Paul Robert Marino wrote:
kernel.shmmax does nothing if you don't bump up kernel.shmall
accordingly, but I can tell you the cap is something wrong with your
application, not the OS.
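For reference, shmmax is counted in bytes while shmall is counted in pages, so a consistent pair can be computed like this (a sketch; 64 GB is just an example value):

```shell
#!/bin/bash
# shmmax is in BYTES, shmall is in PAGES: a matching shmall is
# shmmax divided by the system page size (commonly 4096).
SHMMAX=$((64 * 1024 * 1024 * 1024))   # 64 GB, example value
PAGE_SIZE=$(getconf PAGE_SIZE)
SHMALL=$((SHMMAX / PAGE_SIZE))
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL   # in pages of $PAGE_SIZE bytes"
```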
At one time I supported an application that in normal operation used
64GB of resident memory per instance.
And currently my PostgreSQL servers often spike to as much as 2GB of
RAM per connection, and would use more if I didn't cap it there in the
configurations.
Interesting, I never knew of any server process that takes that much
memory. Anyway, it is good to know :).
I don't think the kernel settings are your problem. What language is
the application written in?
Is it executed by another process, like Apache or Tomcat, for example?
The app (a material simulation app) is just an input file which is run by
abinit (http://www.abinit.org/) using openmpi. So it is executed by
abinit. While the app runs, we make sure that no other process
(Apache, Tomcat, etc.) is running, so basically the app should be able to
take all available memory.
Thanks,
D.
On Wed, Mar 27, 2013 at 11:09 PM, Duke Nguyen <[email protected]> wrote:
On 3/27/13 11:52 PM, Attilio De Falco wrote:
Just a stab in the dark, but did you check the shared memory kernel
parameter (shmmax)? Type "cat /proc/sys/kernel/shmmax". We have it set very
high so that any process/thread can use as much memory as it needs. You can
set the limit to 1 GB without rebooting by typing "echo 1073741824 >
/proc/sys/kernel/shmmax", or modify /etc/sysctl.conf and add the line
"kernel.shmmax = 1073741824" so it remains after a reboot. I'm not sure
about abinit, but some Fortran programs need the shmmax limit to be set high…
Hi Attilio, we already had it at a very high value (not sure why; I never
changed/edited this value before):
[root@biobos:~]# sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
[root@biobos:~]# cat /proc/sys/kernel/shmmax
68719476736
Any other suggestions?
On Mar 26, 2013, at 9:59 PM, Duke Nguyen <[email protected]> wrote:
Hi folks,
We have SL6.3 64bit installed on a box with two quad-core CPUs and 8GB RAM.
We installed openmpi, Intel Studio XE and abinit to run some of our
applications in parallel (8 cores/processes). To our surprise, the system
usually takes only about half of the available memory (about 500MB per
core) and then the job/task is killed with a low-resource error.
We don't really understand why there is a cap of "512MB" (I guess it
would be 512MB rather than 500MB) on each of our cores, whereas in theory
each core should be able to use up to 1GB. Any
suggestions/comments/experience with this issue?
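While the job runs, the cap can be watched live with standard tools; a sketch (the process name abinit is an assumption about what actually appears in the process table):

```shell
#!/bin/bash
# Overall memory picture: total, used, free, swap (in MB)
free -m
# Per-process resident (RSS) and virtual (VSZ) sizes for the abinit
# ranks; prints a note instead of failing if none are running
ps -o pid,rss,vsz,comm -C abinit || echo "no abinit process found"
```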
Thanks in advance,
D.