Wow... Good to know.
Does anyone have an estimate of how much memory can be
"overestimated" this way in an extreme situation (small code, large
shared library)? Not exact numbers, I just want to know whether it can be
really significant or only minor.

Jérémie
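One rough way to see how large the gap gets on a given node is to compare a process's virtual size with its resident set size. This is a minimal, Linux-only sketch that assumes the `/proc/<pid>/status` format (`VmSize` and `VmRSS` fields in kB); the helper names `read_status_kb` and `vm_vs_rss` are hypothetical.

```python
import os


def read_status_kb(pid="self"):
    """Return the kB-valued fields of /proc/<pid>/status as ints (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, value = line.partition(":")
            parts = value.split()
            # Memory fields look like "VmSize:     114441 kB".
            if len(parts) == 2 and parts[1] == "kB":
                fields[key] = int(parts[0])
    return fields


def vm_vs_rss(pid="self"):
    """Return (VmSize kB, VmRSS kB, VmSize/VmRSS) for one process."""
    status = read_status_kb(pid)
    vmsize, vmrss = status["VmSize"], status["VmRSS"]
    return vmsize, vmrss, vmsize / vmrss


if __name__ == "__main__":
    # Pass the PID of a running job (e.g. taken from top) instead of os.getpid().
    vmsize, vmrss, ratio = vm_vs_rss(os.getpid())
    print(f"VmSize={vmsize} kB  VmRSS={vmrss} kB  ratio={ratio:.1f}")
```

For the `python minimal.py` job in the original report, `top` showed VIRT 114m against RES 3844, i.e. a ratio of roughly 30.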


2012/9/27 Brendan Moloney <[email protected]>:
> You have it right. Virtual memory will almost always be an overestimate
> of the memory needed. Another common issue is memory-mapped files adding to
> the virtual memory size.
>
> This issue isn't specific to SGE.
>
> Brendan
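The memory-mapped-file effect is easy to reproduce. This Linux-only sketch reads `VmSize`/`VmRSS` from `/proc/self/status` and maps a sparse temporary file without touching its pages; `MAP_SIZE` (256 MiB) is an arbitrary illustrative value and the helper names are hypothetical. The virtual size grows by about the size of the mapping while the resident size barely moves.

```python
import mmap
import tempfile

MAP_SIZE = 256 * 1024 * 1024  # 256 MiB, an arbitrary illustrative size


def status_kb(field):
    """Read one kB-valued field (e.g. "VmSize") from /proc/self/status."""
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)


def map_untouched_file(size=MAP_SIZE):
    """Map a sparse file without reading it; return (VmSize, VmRSS) growth in kB."""
    with tempfile.TemporaryFile() as fh:
        fh.truncate(size)  # sparse: extends the file without writing data
        before_vm, before_rss = status_kb("VmSize"), status_kb("VmRSS")
        with mmap.mmap(fh.fileno(), size, access=mmap.ACCESS_READ):
            # The mapping exists, but no page of it has been touched yet.
            after_vm, after_rss = status_kb("VmSize"), status_kb("VmRSS")
    return after_vm - before_vm, after_rss - before_rss


if __name__ == "__main__":
    dvm, drss = map_untouched_file()
    print(f"VmSize grew by {dvm} kB, VmRSS grew by {drss} kB")
```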
> ________________________________________
> From: [email protected] [[email protected]] On Behalf 
> Of Jérémie Dubois-Lacoste [[email protected]]
> Sent: Wednesday, September 26, 2012 2:56 PM
> To: [email protected]
> Subject: Re: [gridengine users] Memory values reported by SGE too high
>
> Thanks for the clarification!
> So, assuming I have 16 jobs on one node running the same program (which
> uses shared libraries), will the shared libraries be loaded into RAM only
> once (not 16 times)?
>
> But then, as far as I understand, all 16 processes will have this
> shared-library memory included in their "virtual memory measurement",
> which SGE uses to check whether they exceed the memory limit for the
> node. Yet this shared-library memory is only in RAM once, so SGE
> overestimates their usage.
>
> I guess I am wrong somewhere...
>
> Jérémie
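The reasoning above can be checked on Linux through the `Pss` (proportional set size) lines in `/proc/<pid>/smaps`: each shared page is charged divided by the number of processes mapping it, so summing `Pss` across the 16 jobs counts a shared library roughly once rather than 16 times. A minimal sketch, assuming a kernel whose `smaps` reports `Pss`; the helper name `smaps_totals` is hypothetical.

```python
def smaps_totals(pid="self"):
    """Sum the Rss and Pss fields (kB) over every mapping in /proc/<pid>/smaps."""
    totals = {"Rss": 0, "Pss": 0}
    with open(f"/proc/{pid}/smaps") as fh:
        for line in fh:
            # Only exact "Rss:" / "Pss:" keys match; "SwapPss:" etc. are skipped.
            key, _, value = line.partition(":")
            if key in totals:
                totals[key] += int(value.split()[0])
    return totals


if __name__ == "__main__":
    t = smaps_totals()
    # Pss <= Rss: a page mapped by N processes is charged 1/N here.
    print(f"Rss={t['Rss']} kB  Pss={t['Pss']} kB")
```

Summing `Pss` over the job PIDs approximates their combined RAM footprint; neither `Rss` nor the virtual size has that property.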
>
>
>
> 2012/9/26 Brendan Moloney <[email protected]>:
>> Virtual memory includes things like shared libraries (even though these are 
>> only loaded into memory once for all processes that use them).
>>
>> -Brendan
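To see how much of a process's virtual size comes from shared libraries, one can total the mappings listed in `/proc/<pid>/maps` that are backed by a shared object. This is a Linux-only sketch; treating any path containing `.so` as a shared library is a simplifying assumption, and `shared_library_kb` is a hypothetical helper.

```python
def shared_library_kb(pid="self"):
    """Return (total mapped kB, kB mapped from shared objects) from /proc/<pid>/maps."""
    total = shared = 0
    with open(f"/proc/{pid}/maps") as fh:
        for line in fh:
            # Columns: address perms offset dev inode [pathname]
            fields = line.split(None, 5)
            start, end = (int(addr, 16) for addr in fields[0].split("-"))
            size_kb = (end - start) // 1024
            total += size_kb
            path = fields[5].strip() if len(fields) == 6 else ""
            # Simplifying assumption: any path containing ".so" is a shared library.
            if ".so" in path:
                shared += size_kb
    return total, shared


if __name__ == "__main__":
    total, shared = shared_library_kb()
    print(f"{shared} of {total} kB of address space is mapped from shared libraries")
```

Each process that loads the same libraries reports these mappings in its own virtual size, even though the underlying pages are shared.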
>> ________________________________________
>> From: [email protected] [[email protected]] On Behalf 
>> Of Jérémie Dubois-Lacoste [[email protected]]
>> Sent: Wednesday, September 26, 2012 3:10 AM
>> To: [email protected]
>> Subject: Re: [gridengine users] Memory values reported by SGE too high
>>
>> Oh! Thanks, my mistake.
>> So it seems SGE's memory measurement is correct: it reports the
>> same values we see when we launch things directly on the
>> nodes. However, these values are still surprisingly high.
>> We'll investigate further whether something is wrong with our kernel.
>>
>> Thanks,
>>
>> Jérémie
>>
>>
>> 2012/9/25 Reuti <[email protected]>:
>>> Am 25.09.2012 um 14:26 schrieb Jérémie Dubois-Lacoste:
>>>
>>>> Hi All,
>>>>
>>>> We recently reinstalled our cluster and we have some serious issues.
>>>> Unlike our previous installation, we now run a fully 64-bit
>>>> system: Rocks cluster 6 / CentOS 6.3,
>>>> and SGE 6.2u5.
>>>>
>>>> The memory values reported by SGE are very high compared
>>>> to the actual needs of each job, and many jobs get killed because
>>>> they exceed the limit, even though they should not.
>>>> I found this thread about memory reports that are too low:
>>>> http://comments.gmane.org/gmane.comp.clustering.gridengine.users/19303
>>>>
>>>> But I didn't find anything about memory reports that are too high...
>>>>
>>>>
>>>> Here is a simple test to make it clear:
>>>>
>>>> I submit a trivial Python script, "minimal.py", which is just:
>>>> -----
>>>> import time
>>>>
>>>> time.sleep(30)
>>>> print("done")
>>>> -----
>>>>
>>>> * I tried to run it directly to check the memory consumption with:
>>>> $ /usr/bin/time -v python minimal.py
>>>> And I get: Maximum resident set size (kbytes): 15376
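A comparable peak-RSS figure can be read from Python with the standard `resource` module. On Linux, `ru_maxrss` is in kilobytes (other platforms may use other units); for `RUSAGE_CHILDREN` it is the peak RSS of the largest child waited for so far. The helper name `child_peak_rss_kb` is hypothetical, and the child command is a stand-in for `python minimal.py` that exits immediately.

```python
import resource
import subprocess
import sys


def child_peak_rss_kb(argv):
    """Run argv to completion, then return RUSAGE_CHILDREN's ru_maxrss.

    On Linux this is in kilobytes and covers the largest child waited for
    so far, comparable to the figure /usr/bin/time -v prints for its child.
    """
    subprocess.run(argv, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Stand-in for "python minimal.py" that exits instead of sleeping 30 s.
    print(child_peak_rss_kb([sys.executable, "-c", "pass"]))
```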
>>>>
>>>>
>>>> * Then, I submit the job with:
>>>> qsub -m ase -M <my_mail> -b y -N memTest -o test.out -e test.err -cwd "python minimal.py"
>>>> and check with top on the compute node where it gets scheduled:
>>>> PID  USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>>> 20240 myName   23   3  114m 3844 1832 S  0.0  0.0   0:00.14 python minimal.py
>>>
>>> The virtual size is listed here as 114m as well.
>>>
>>> -- Reuti
>>>
>>>
>>>> So I understand it uses about 3.8 MB of RAM.
>>>>
>>>>
>>>> * But from the e-mail I get when the jobs terminate:
>>>> Job 1879536 (memTest) Complete
>>>> User = myName
>>>> Queue = [email protected]
>>>> Host = compute-3-0.local
>>>> Start Time = 09/25/2012 13:46:45
>>>> End Time = 09/25/2012 13:47:15
>>>> User Time = 00:00:00
>>>> System Time = 00:00:00
>>>> Wallclock Time = 00:00:30
>>>> CPU = 00:00:00
>>>> Max vmem = 114.441M
>>>> Exit Status = 0
>>>>
>>>>
>>>> It says 114 MB; I don't understand this huge difference.
>>>>
>>>>
>>>> The consequence is that most of the jobs get killed for (falsely, I presume)
>>>> exceeding the hard memory limit. Any clue is welcome!
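Assuming Grid Engine enforces `h_vmem` at least partly as an address-space limit (`setrlimit` with `RLIMIT_AS` on Linux), that limit counts every mapping, resident or not. The sketch below imitates this with Python's `resource` module: it temporarily lowers the soft `RLIMIT_AS` to the current virtual size plus 64 MiB of headroom (an arbitrary choice), then tries to map a 256 MiB sparse file that would cost almost no RAM. The helper names are illustrative.

```python
import errno
import mmap
import resource
import tempfile

HEADROOM = 64 * 1024 * 1024   # arbitrary slack above the current virtual size
MAP_SIZE = 256 * 1024 * 1024  # larger than the headroom, but never touched


def current_vmsize_bytes():
    """Current virtual size from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith("VmSize:"):
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmSize not found")


def map_under_as_limit():
    """Return True if mapping MAP_SIZE succeeds under a tightened RLIMIT_AS."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = current_vmsize_bytes() + HEADROOM
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    with tempfile.TemporaryFile() as fh:
        fh.truncate(MAP_SIZE)  # sparse file: costs no RAM and no data blocks
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
        try:
            with mmap.mmap(fh.fileno(), MAP_SIZE, access=mmap.ACCESS_READ):
                return True
        except OSError as exc:
            if exc.errno != errno.ENOMEM:
                raise
            return False
        finally:
            # Raising the soft limit back up to the hard limit is always allowed.
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

On a typical Linux system this returns False: the mapping is refused with ENOMEM although resident memory never grew, so the failure is driven by address space alone.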
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>>    Jérémie
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>>>
>>>
>>
