Thanks for the clarification!
So assuming that on one node I have 16 jobs running the same program
(which uses shared libraries), the shared libraries will be loaded into
RAM only once (not 16 times)?

But then, as I understand it, all 16 processes will have this
shared-library memory included in their virtual memory measurement,
which SGE uses to check whether they exceed the memory limit for the
node. Yet the shared libraries are actually in RAM only once, so SGE
overestimates their usage.
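To see the two numbers side by side, here is a minimal sketch (Linux only, and just an illustration, not something from SGE itself): it reads VmSize (the virtual size, which is what a vmem limit counts) and VmRSS (the resident set, what is actually in RAM) from /proc/self/status. The helper name memory_kb is mine, not an existing API.

```python
# Minimal sketch: compare a process's virtual size (what vmem-style
# limits count) with its resident set (what actually occupies RAM).
# Linux-only; parses /proc/self/status.

def memory_kb(fields=("VmSize", "VmRSS")):
    """Return the selected memory fields (in kB) for the current process."""
    result = {}
    with open("/proc/self/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in fields:
                # Values look like "  114392 kB"
                result[key] = int(value.split()[0])
    return result

if __name__ == "__main__":
    mem = memory_kb()
    print("virtual size (counted by vmem limits): %d kB" % mem["VmSize"])
    print("resident set (actually in RAM):        %d kB" % mem["VmRSS"])
```

For a process linked against large shared libraries, VmSize is typically far larger than VmRSS, since the mappings are counted per process even though the library pages are shared.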

I guess I am wrong somewhere...

Jérémie



2012/9/26 Brendan Moloney <[email protected]>:
> Virtual memory includes things like shared libraries (even though these are 
> only loaded into memory once for all processes that use them).
>
> -Brendan
> ________________________________________
> From: [email protected] [[email protected]] On Behalf 
> Of Jérémie Dubois-Lacoste [[email protected]]
> Sent: Wednesday, September 26, 2012 3:10 AM
> To: [email protected]
> Subject: Re: [gridengine users] Memory values reported by SGE too high
>
> Oh! Thanks, my mistake.
> So it seems SGE is correct with the memory measurement: it reports
> the same values as we see when launching things directly on the
> nodes. However, these values are still surprisingly high.
> We'll investigate further whether something is wrong with our kernel.
>
> Thanks,
>
> Jérémie
>
>
> 2012/9/25 Reuti <[email protected]>:
>> Am 25.09.2012 um 14:26 schrieb Jérémie Dubois-Lacoste:
>>
>>> Hi All,
>>>
>>> We recently reinstalled our cluster and we have some serious issues.
>>> Contrary to our previous installation, we now installed a fully 64-bit
>>> system. We use Rocks Cluster 6 / CentOS 6.3,
>>> and SGE 6.2u5.
>>>
>>> The memory values reported by SGE are very high compared
>>> to the actual needs of every job, and many get killed because
>>> they exceed the limit when they should not.
>>> I found this thread about too low memory reports:
>>> http://comments.gmane.org/gmane.comp.clustering.gridengine.users/19303
>>>
>>> But I didn't find anything about too high memory reports...
>>>
>>>
>>> Here is a simple test to make it clear:
>>>
>>> I submit a very simple Python script "minimal.py", which is just:
>>> -----
>>> import time
>>>
>>> time.sleep(30)
>>> print("done")
>>> -----
>>>
>>> * I tried to run it directly to check the memory consumption with:
>>> $ /usr/bin/time -v python minimal.py
>>> And I get: Maximum resident set size (kbytes): 15376
>>>
>>>
>>> * Then, when submitting the jobs with:
>>> qsub -m ase -M <my_mail> -b y -N memTest -o test.out -e test.err -cwd
>>> "python minimal.py"
>>> I go checking on the computation node where it gets scheduled and I "top":
>>> PID   USER    PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+   COMMAND
>>> 20240 myName  23   3  114m 3844 1832 S  0.0  0.0   0:00.14 python minimal.py
>>
>> The virtual size is listed here as 114m as well.
>>
>> -- Reuti
>>
>>
>>> So I understand it uses 3.8 MB of RAM.
>>>
>>>
>>> * But from the e-mail I get when the jobs terminate:
>>> Job 1879536 (memTest) Complete
>>> User = myName
>>> Queue = [email protected]
>>> Host = compute-3-0.local
>>> Start Time = 09/25/2012 13:46:45
>>> End Time = 09/25/2012 13:47:15
>>> User Time = 00:00:00
>>> System Time = 00:00:00
>>> Wallclock Time = 00:00:30
>>> CPU = 00:00:00
>>> Max vmem = 114.441M
>>> Exit Status = 0
>>>
>>>
>>> It says 114 MB; I don't understand this huge difference.
>>>
>>>
>>> The consequence is that most of the jobs get killed for falsely (I presume)
>>> exceeding the hard memory limit. Any clue is welcome!
>>>
>>>
>>> Sincerely,
>>>
>>>    Jérémie
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>>
>>
>

