Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-25 Thread Sven Oehme
Hi,

I guess you saw that in some of my presentations about the communication
code overhaul. We started in 4.2.X and have since added more and more NUMA
awareness to GPFS. Version 5.0 also has enhancements in this space.

sven

On Sun, Feb 25, 2018 at 8:54 AM Aaron Knister <aaron.s.knis...@nasa.gov>
wrote:

> Hi Stijn,
>
> Thanks for sharing your experiences-- I'm glad I'm not the only one
> who's had the idea (and come up empty-handed).
>
> About the pagepool and NUMA awareness, I'd remembered seeing something
> about that somewhere and I did some googling and found there's a
> parameter called numaMemoryInterleave that "starts mmfsd with numactl
> --interleave=all". Do you think that provides the kind of NUMA awareness
> you're looking for?
>
> -Aaron
>
> On 2/23/18 9:44 AM, Stijn De Weirdt wrote:
> > hi all,
> >
> > we had the same idea long ago, afaik the issue we had was due to the
> > pinned memory the pagepool uses when RDMA is enabled.
> >
> > at some point we restarted gpfs on the compute nodes for each job,
> > similar to the way we do swapoff/swapon; but in certain scenarios gpfs
> > really did not like it; so we gave up on it.
> >
> > the other issue that needs to be resolved is that the pagepool needs to
> > be numa aware, so the pagepool is nicely allocated across all numa
> > domains, instead of using the first ones available. otherwise compute
> > jobs might start that only do non-local domain memory access.
> >
> > stijn
> >
> > On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote:
> >> AFAIK you can increase the pagepool size dynamically but you cannot shrink
> >> it dynamically.  To shrink it you must restart the GPFS daemon.  Also,
> >> could you please provide the actual pmap commands you executed?
> >>
> >> Regards, The Spectrum Scale (GPFS) team
> >>
> >>
> --
> >> If you feel that your question can benefit other users of Spectrum Scale
> >> (GPFS), then please post it to the public IBM developerWorks Forum at
> >> https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
> >> .
> >>
> >> If your query concerns a potential software error in Spectrum Scale (GPFS)
> >> and you have an IBM software maintenance contract please contact
> >> 1-800-237-5511 in the United States or your local IBM Service Center in
> >> other countries.
> >>
> >> The forum is informally monitored as time permits and should not be used
> >> for priority messages to the Spectrum Scale (GPFS) team.
> >>
> >>
> >>
> >> From:   Aaron Knister <aaron.s.knis...@nasa.gov>
> >> To: <gpfsug-discuss@spectrumscale.org>
> >> Date:   02/22/2018 10:30 PM
> >> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all
> >> memory
> >> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> >>
> >>
> >>
> >> This is also interesting (although I don't know what it really means).
> >> Looking at pmap run against mmfsd I can see what happens after each step:
> >>
> >> # baseline
> >> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> >> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> >> 0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
> >> Total:   1613580K 1191020K 1189650K 1171836K  0K
> >>
> >> # tschpool 64G
> >> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> >> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> >> 0200 67108864K 67108864K 67108864K 67108864K  0K rwxp
> >> [anon]
> >> Total:   67706636K 67284108K 67282625K 67264920K  0K
> >>
> >> # tschpool 1G
> >> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> >> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> >> 02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
> >> 020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
> >> 020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
> >> Total:   67706636K 1223820K 1222451K 1204632K  0K
> >>
> >> Even though mmfsd has that 64G chunk allocated there's none of it
> >> *used*. I

Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-25 Thread Aaron Knister

Hi Stijn,

Thanks for sharing your experiences-- I'm glad I'm not the only one 
who's had the idea (and come up empty-handed).


About the pagepool and NUMA awareness, I'd remembered seeing something
about that somewhere and I did some googling and found there's a
parameter called numaMemoryInterleave that "starts mmfsd with numactl
--interleave=all". Do you think that provides the kind of NUMA awareness
you're looking for?
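
For what it's worth, here's a minimal sketch (generic libnuma code, not
GPFS internals) of what interleaving buys you at the allocation level;
numaMemoryInterleave / "numactl --interleave=all" presumably just applies
the same round-robin placement policy to all of mmfsd's allocations
without mmfsd having to call libnuma itself. Build with -lnuma.

/* Stand-alone illustration of interleaved vs. default NUMA placement.
 * With the default first-touch policy a large buffer tends to land on
 * whichever node the allocating thread runs on; numa_alloc_interleaved()
 * spreads the pages round-robin across all configured nodes. */
#include <stdio.h>
#include <string.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this node\n");
        return 1;
    }
    printf("configured NUMA nodes: %d\n", numa_num_configured_nodes());

    size_t len = 1UL << 30;                  /* 1 GiB stand-in for a pagepool */

    void *buf = numa_alloc_interleaved(len); /* pages spread across all nodes */
    if (buf == NULL) {
        fprintf(stderr, "numa_alloc_interleaved failed\n");
        return 1;
    }
    memset(buf, 0, len);                     /* fault the pages in */

    /* ... use the buffer ... */

    numa_free(buf, len);
    return 0;
}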


-Aaron

On 2/23/18 9:44 AM, Stijn De Weirdt wrote:

hi all,

we had the same idea long ago, afaik the issue we had was due to the
pinned memory the pagepool uses when RDMA is enabled.

at some point we restarted gpfs on the compute nodes for each job,
similar to the way we do swapoff/swapon; but in certain scenarios gpfs
really did not like it; so we gave up on it.

the other issue that needs to be resolved is that the pagepool needs to
be numa aware, so the pagepool is nicely allocated across all numa
domains, instead of using the first ones available. otherwise compute
jobs might start that only do non-local domain memory access.

stijn

On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote:

AFAIK you can increase the pagepool size dynamically but you cannot shrink
it dynamically.  To shrink it you must restart the GPFS daemon.   Also,
could you please provide the actual pmap commands you executed?

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
.

If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:   Aaron Knister <aaron.s.knis...@nasa.gov>
To: <gpfsug-discuss@spectrumscale.org>
Date:   02/22/2018 10:30 PM
Subject:    Re: [gpfsug-discuss] pagepool shrink doesn't release all
memory
Sent by: gpfsug-discuss-boun...@spectrumscale.org



This is also interesting (although I don't know what it really means).
Looking at pmap run against mmfsd I can see what happens after each step:

# baseline
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
Total:   1613580K 1191020K 1189650K 1171836K  0K

# tschpool 64G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 67108864K 67108864K 67108864K 67108864K  0K rwxp
[anon]
Total:   67706636K 67284108K 67282625K 67264920K  0K

# tschpool 1G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
Total:   67706636K 1223820K 1222451K 1204632K  0K

Even though mmfsd has that 64G chunk allocated there's none of it
*used*. I wonder why Linux seems to be accounting it as allocated.

-Aaron

On 2/22/18 10:17 PM, Aaron Knister wrote:

I've been exploring the idea for a while of writing a SLURM SPANK plugin
to allow users to dynamically change the pagepool size on a node. Every
now and then we have some users who would benefit significantly from a
much larger pagepool on compute nodes but by default keep it on the
smaller side to make as much physmem available as possible to batch work.

In testing, though, it seems as though reducing the pagepool doesn't
quite release all of the memory. I don't really understand it because
I've never before seen memory that was previously resident become
un-resident but still maintain the virtual memory allocation.

Here's what I mean. Let's take a node with 128G and a 1G pagepool.

If I do the following to simulate what might happen as various jobs
tweak the pagepool:

- tschpool 64G
- tschpool 1G
- tschpool 32G
- tschpool 1G
- tschpool 32G

I end up with this:

mmfsd thinks there's 32G resident but 64G virt
# ps -o vsz,rss,comm -p 24397
 VSZ   RSS COMMAND
67589400 33723236 mmfsd

however, linux thinks there's ~100G used

# free -g
             total       used       free     shared    buffers     cached
Mem:           125        100         25          0          0          0
-/+ buffers/cache:         98         26
Swap:            7          0          7

Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-25 Thread Aaron Knister

Hmm...interesting. It sure seems to try :)

The pmap command was this:

pmap $(pidof mmfsd) | sort -n -k3 | tail

-Aaron

On 2/23/18 9:35 AM, IBM Spectrum Scale wrote:
AFAIK you can increase the pagepool size dynamically but you cannot 
shrink it dynamically.  To shrink it you must restart the GPFS daemon.   
Also, could you please provide the actual pmap commands you executed?


Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum 
Scale (GPFS), then please post it to the public IBM developerWorks Forum 
at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479. 



If your query concerns a potential software error in Spectrum Scale 
(GPFS) and you have an IBM software maintenance contract please contact 
  1-800-237-5511 in the United States or your local IBM Service Center 
in other countries.


The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.




From: Aaron Knister <aaron.s.knis...@nasa.gov>
To: <gpfsug-discuss@spectrumscale.org>
Date: 02/22/2018 10:30 PM
Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all memory
Sent by: gpfsug-discuss-boun...@spectrumscale.org




This is also interesting (although I don't know what it really means).
Looking at pmap run against mmfsd I can see what happens after each step:

# baseline
7fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
7fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
0200 1048576K 1048576K 1048576K 1048576K      0K rwxp [anon]
Total:           1613580K 1191020K 1189650K 1171836K      0K

# tschpool 64G
7fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
7fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
0200 67108864K 67108864K 67108864K 67108864K  0K rwxp [anon]
Total:           67706636K 67284108K 67282625K 67264920K      0K

# tschpool 1G
7fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
7fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
02000140 139264K 139264K 139264K 139264K      0K rwxp [anon]
020fc940 897024K 897024K 897024K 897024K      0K rwxp [anon]
020009c0 66052096K      0K      0K      0K      0K rwxp [anon]
Total:           67706636K 1223820K 1222451K 1204632K      0K

Even though mmfsd has that 64G chunk allocated there's none of it
*used*. I wonder why Linux seems to be accounting it as allocated.

-Aaron

On 2/22/18 10:17 PM, Aaron Knister wrote:
 > I've been exploring the idea for a while of writing a SLURM SPANK plugin
 > to allow users to dynamically change the pagepool size on a node. Every
 > now and then we have some users who would benefit significantly from a
 > much larger pagepool on compute nodes but by default keep it on the
 > smaller side to make as much physmem available as possible to batch work.
 >
 > In testing, though, it seems as though reducing the pagepool doesn't
 > quite release all of the memory. I don't really understand it because
 > I've never before seen memory that was previously resident become
 > un-resident but still maintain the virtual memory allocation.
 >
 > Here's what I mean. Let's take a node with 128G and a 1G pagepool.
 >
 > If I do the following to simulate what might happen as various jobs
 > tweak the pagepool:
 >
 > - tschpool 64G
 > - tschpool 1G
 > - tschpool 32G
 > - tschpool 1G
 > - tschpool 32G
 >
 > I end up with this:
 >
 > mmfsd thinks there's 32G resident but 64G virt
 > # ps -o vsz,rss,comm -p 24397
 >     VSZ   RSS COMMAND
 > 67589400 33723236 mmfsd
 >
 > however, linux thinks there's ~100G used
 >
 > # free -g
 >              total       used       free     shared    buffers     cached
 > Mem:           125        100         25          0          0          0
 > -/+ buffers/cache:         98         26
 > Swap:            7          0          7
 >
 > I can jump back and forth between 1G and 32G *after* allocating 64G
 > pagepool and the overall amount of memory in use doesn't balloon but I
 > can't seem to shed that original 64G.
 >
 > I don't understand what's going on... :) Any ideas? This is with Scale
 > 4.2.3.6.
 >
 > -Aaron
 >

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-23 Thread Stijn De Weirdt
hi all,

we had the same idea long ago, afaik the issue we had was due to the
pinned memory the pagepool uses when RDMA is enabled.
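
As a concrete (non-GPFS) illustration of what "pinned" means here: pages
locked with mlock() are counted under VmLck in /proc/<pid>/status and
cannot be paged out or reclaimed until they are unlocked or the process
exits, which is presumably why an RDMA-registered pagepool can only
really be handed back by restarting the daemon. Rough sketch; note that
an unprivileged run is limited by RLIMIT_MEMLOCK (ulimit -l), so the
mlock() below may fail for larger sizes.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;            /* 64 MiB stand-in for pinned memory */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    if (mlock(p, len) != 0) {           /* pin; subject to RLIMIT_MEMLOCK */
        perror("mlock");
        return 1;
    }

    /* Show the VmLck line from /proc/self/status (~65536 kB while locked). */
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    while (f && fgets(line, sizeof(line), f))
        if (strncmp(line, "VmLck", 5) == 0)
            fputs(line, stdout);
    if (f) fclose(f);

    munlock(p, len);
    munmap(p, len);
    return 0;
}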

at some point we restarted gpfs on the compute nodes for each job,
similar to the way we do swapoff/swapon; but in certain scenarios gpfs
really did not like it; so we gave up on it.

the other issue that needs to be resolved is that the pagepool needs to
be numa aware, so the pagepool is nicely allocated across all numa
domains, instead of using the first ones available. otherwise compute
jobs might start that only do non-local domain memory access.

stijn

On 02/23/2018 03:35 PM, IBM Spectrum Scale wrote:
> AFAIK you can increase the pagepool size dynamically but you cannot shrink 
> it dynamically.  To shrink it you must restart the GPFS daemon.   Also, 
> could you please provide the actual pmap commands you executed?
> 
> Regards, The Spectrum Scale (GPFS) team
> 
> --
> If you feel that your question can benefit other users of  Spectrum Scale 
> (GPFS), then please post it to the public IBM developerWorks Forum at 
> https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
> . 
> 
> If your query concerns a potential software error in Spectrum Scale (GPFS) 
> and you have an IBM software maintenance contract please contact 
> 1-800-237-5511 in the United States or your local IBM Service Center in 
> other countries. 
> 
> The forum is informally monitored as time permits and should not be used 
> for priority messages to the Spectrum Scale (GPFS) team.
> 
> 
> 
> From:   Aaron Knister <aaron.s.knis...@nasa.gov>
> To: <gpfsug-discuss@spectrumscale.org>
> Date:   02/22/2018 10:30 PM
> Subject: Re: [gpfsug-discuss] pagepool shrink doesn't release all
> memory
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> 
> 
> 
> This is also interesting (although I don't know what it really means). 
> Looking at pmap run against mmfsd I can see what happens after each step:
> 
> # baseline
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
> Total:   1613580K 1191020K 1189650K 1171836K  0K
> 
> # tschpool 64G
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 0200 67108864K 67108864K 67108864K 67108864K  0K rwxp 
> [anon]
> Total:   67706636K 67284108K 67282625K 67264920K  0K
> 
> # tschpool 1G
> 7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
> 7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
> 02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
> 020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
> 020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
> Total:   67706636K 1223820K 1222451K 1204632K  0K
> 
> Even though mmfsd has that 64G chunk allocated there's none of it 
> *used*. I wonder why Linux seems to be accounting it as allocated.
> 
> -Aaron
> 
> On 2/22/18 10:17 PM, Aaron Knister wrote:
>> I've been exploring the idea for a while of writing a SLURM SPANK plugin
>> to allow users to dynamically change the pagepool size on a node. Every
>> now and then we have some users who would benefit significantly from a
>> much larger pagepool on compute nodes but by default keep it on the
>> smaller side to make as much physmem available as possible to batch work.
>>
>> In testing, though, it seems as though reducing the pagepool doesn't 
>> quite release all of the memory. I don't really understand it because 
>> I've never before seen memory that was previously resident become 
>> un-resident but still maintain the virtual memory allocation.
>>
>> Here's what I mean. Let's take a node with 128G and a 1G pagepool.
>>
>> If I do the following to simulate what might happen as various jobs 
>> tweak the pagepool:
>>
>> - tschpool 64G
>> - tschpool 1G
>> - tschpool 32G
>> - tschpool 1G
>> - tschpool 32G
>>
>> I end up with this:
>>
>> mmfsd thinks there's 32G resident but 64G virt
>> # ps -o vsz,rss,comm -p 24397
>> VSZ   RSS COMMAND
>> 67589400 33723236 mmfsd
>>
>> however, linux thinks there's ~100G used
>>
>> # free -g
>>              total       used       free     shared    buffers     cached

Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-23 Thread IBM Spectrum Scale
AFAIK you can increase the pagepool size dynamically but you cannot shrink 
it dynamically.  To shrink it you must restart the GPFS daemon.   Also, 
could you please provide the actual pmap commands you executed?

Regards, The Spectrum Scale (GPFS) team

--
If you feel that your question can benefit other users of  Spectrum Scale 
(GPFS), then please post it to the public IBM developerWorks Forum at 
https://www.ibm.com/developerworks/community/forums/html/forum?id=----0479
. 

If your query concerns a potential software error in Spectrum Scale (GPFS) 
and you have an IBM software maintenance contract please contact 
1-800-237-5511 in the United States or your local IBM Service Center in 
other countries. 

The forum is informally monitored as time permits and should not be used 
for priority messages to the Spectrum Scale (GPFS) team.



From:   Aaron Knister <aaron.s.knis...@nasa.gov>
To: <gpfsug-discuss@spectrumscale.org>
Date:   02/22/2018 10:30 PM
Subject:    Re: [gpfsug-discuss] pagepool shrink doesn't release all 
memory
Sent by: gpfsug-discuss-boun...@spectrumscale.org



This is also interesting (although I don't know what it really means). 
Looking at pmap run against mmfsd I can see what happens after each step:

# baseline
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
Total:   1613580K 1191020K 1189650K 1171836K  0K

# tschpool 64G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 67108864K 67108864K 67108864K 67108864K  0K rwxp 
[anon]
Total:   67706636K 67284108K 67282625K 67264920K  0K

# tschpool 1G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
Total:   67706636K 1223820K 1222451K 1204632K  0K

Even though mmfsd has that 64G chunk allocated there's none of it 
*used*. I wonder why Linux seems to be accounting it as allocated.

-Aaron

On 2/22/18 10:17 PM, Aaron Knister wrote:
> I've been exploring the idea for a while of writing a SLURM SPANK plugin
> to allow users to dynamically change the pagepool size on a node. Every
> now and then we have some users who would benefit significantly from a
> much larger pagepool on compute nodes but by default keep it on the
> smaller side to make as much physmem available as possible to batch work.
> 
> In testing, though, it seems as though reducing the pagepool doesn't 
> quite release all of the memory. I don't really understand it because 
> I've never before seen memory that was previously resident become 
> un-resident but still maintain the virtual memory allocation.
> 
> Here's what I mean. Let's take a node with 128G and a 1G pagepool.
> 
> If I do the following to simulate what might happen as various jobs 
> tweak the pagepool:
> 
> - tschpool 64G
> - tschpool 1G
> - tschpool 32G
> - tschpool 1G
> - tschpool 32G
> 
> I end up with this:
> 
> mmfsd thinks there's 32G resident but 64G virt
> # ps -o vsz,rss,comm -p 24397
> VSZ   RSS COMMAND
> 67589400 33723236 mmfsd
> 
> however, linux thinks there's ~100G used
> 
> # free -g
>              total       used       free     shared    buffers     cached
> Mem:           125        100         25          0          0          0
> -/+ buffers/cache:         98         26
> Swap:            7          0          7
> 
> I can jump back and forth between 1G and 32G *after* allocating 64G 
> pagepool and the overall amount of memory in use doesn't balloon but I 
> can't seem to shed that original 64G.
> 
> I don't understand what's going on... :) Any ideas? This is with Scale 
> 4.2.3.6.
> 
> -Aaron
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwIGaQ=jf_iaSHvJObTbx-siA1ZOg=IbxtjdkPAM2Sbon4Lbbi4w=OrZQeEmI6chBdguG-h4YPHsxXZ4gTU3CtIuN4e3ijdY=hvVIRG5kB1zom2Iql2_TOagchsgl99juKiZfJt5S1tM=






___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


Re: [gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-22 Thread Aaron Knister
This is also interesting (although I don't know what it really means). 
Looking at pmap run against mmfsd I can see what happens after each step:


# baseline
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 1048576K 1048576K 1048576K 1048576K  0K rwxp [anon]
Total:   1613580K 1191020K 1189650K 1171836K  0K

# tschpool 64G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
0200 67108864K 67108864K 67108864K 67108864K  0K rwxp [anon]
Total:   67706636K 67284108K 67282625K 67264920K  0K

# tschpool 1G
7fffe4639000  59164K  0K  0K  0K  0K ---p [anon]
7fffd837e000  61960K  0K  0K  0K  0K ---p [anon]
02000140 139264K 139264K 139264K 139264K  0K rwxp [anon]
020fc940 897024K 897024K 897024K 897024K  0K rwxp [anon]
020009c0 66052096K  0K  0K  0K  0K rwxp [anon]
Total:   67706636K 1223820K 1222451K 1204632K  0K

Even though mmfsd has that 64G chunk allocated there's none of it 
*used*. I wonder why Linux seems to be accounting it as allocated.
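
One plausible mechanism behind that (an assumption on my part -- I don't
know what mmfsd actually does when the pagepool shrinks) is
madvise(MADV_DONTNEED): the kernel takes the physical pages back, but the
mapping itself is kept, so pmap and ps keep counting the full range in
the virtual size while the resident set collapses. A small stand-alone
demo:

/* Map 1 GiB, touch it, then madvise(MADV_DONTNEED) it: VmRSS rises and
 * falls while VmSize stays put, much like the pmap output above. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void show(const char *tag)
{
    /* Print this process's VmSize/VmRSS lines from /proc/self/status. */
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return;
    while (fgets(line, sizeof(line), f))
        if (!strncmp(line, "VmSize", 6) || !strncmp(line, "VmRSS", 5))
            printf("%-10s %s", tag, line);
    fclose(f);
}

int main(void)
{
    size_t len = 1UL << 30;             /* 1 GiB stand-in for a pagepool */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    show("mapped:");                    /* VmSize up, VmRSS unchanged */
    memset(p, 1, len);                  /* touch every page */
    show("touched:");                   /* VmRSS up by ~1 GiB */
    madvise(p, len, MADV_DONTNEED);     /* hand the pages back */
    show("madvised:");                  /* VmRSS back down, VmSize unchanged */

    munmap(p, len);
    return 0;
}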


-Aaron

On 2/22/18 10:17 PM, Aaron Knister wrote:
I've been exploring the idea for a while of writing a SLURM SPANK plugin 
to allow users to dynamically change the pagepool size on a node. Every 
now and then we have some users who would benefit significantly from a 
much larger pagepool on compute nodes but by default keep it on the 
smaller side to make as much physmem available as possible to batch work.


In testing, though, it seems as though reducing the pagepool doesn't 
quite release all of the memory. I don't really understand it because 
I've never before seen memory that was previously resident become 
un-resident but still maintain the virtual memory allocation.


Here's what I mean. Let's take a node with 128G and a 1G pagepool.

If I do the following to simulate what might happen as various jobs 
tweak the pagepool:


- tschpool 64G
- tschpool 1G
- tschpool 32G
- tschpool 1G
- tschpool 32G

I end up with this:

mmfsd thinks there's 32G resident but 64G virt
# ps -o vsz,rss,comm -p 24397
    VSZ   RSS COMMAND
67589400 33723236 mmfsd

however, linux thinks there's ~100G used

# free -g
             total       used       free     shared    buffers     cached
Mem:           125        100         25          0          0          0
-/+ buffers/cache:         98         26
Swap:            7          0          7

I can jump back and forth between 1G and 32G *after* allocating 64G 
pagepool and the overall amount of memory in use doesn't balloon but I 
can't seem to shed that original 64G.


I don't understand what's going on... :) Any ideas? This is with Scale 
4.2.3.6.


-Aaron



--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


[gpfsug-discuss] pagepool shrink doesn't release all memory

2018-02-22 Thread Aaron Knister
I've been exploring the idea for a while of writing a SLURM SPANK plugin 
to allow users to dynamically change the pagepool size on a node. Every 
now and then we have some users who would benefit significantly from a 
much larger pagepool on compute nodes but by default keep it on the 
smaller side to make as much physmem available as possible to batch work.
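
For the curious, here is a very rough sketch of what such a plugin might
look like -- an assumption about the shape of it, not working code. It
registers a --pagepool=SIZE option and, in a privileged remote-side
callback, shells out to tschpool (assumed to be in root's PATH) before
the tasks start. A real plugin would validate the requested size, restore
the original value from an epilog/exit hook, and handle SPANK contexts
and failures properly.

#include <stdio.h>
#include <stdlib.h>
#include <slurm/spank.h>

SPANK_PLUGIN(pagepool, 1);

static char requested[64] = "";         /* size string from --pagepool */

/* Option callback: remember whatever size the user asked for. */
static int pagepool_opt_cb(int val, const char *optarg, int remote)
{
    (void) val; (void) remote;
    if (optarg)
        snprintf(requested, sizeof(requested), "%s", optarg);
    return 0;
}

static struct spank_option pagepool_opt = {
    "pagepool", "SIZE",
    "Request a GPFS pagepool of SIZE (e.g. 32G) on the compute node",
    1, 0, pagepool_opt_cb
};

int slurm_spank_init(spank_t sp, int ac, char **av)
{
    (void) ac; (void) av;
    if (spank_option_register(sp, &pagepool_opt) != ESPANK_SUCCESS)
        return ESPANK_ERROR;
    return ESPANK_SUCCESS;
}

/* Runs as root in the remote (slurmstepd) context before privileges drop. */
int slurm_spank_task_init_privileged(spank_t sp, int ac, char **av)
{
    (void) sp; (void) ac; (void) av;
    char cmd[128];

    if (requested[0] == '\0')
        return ESPANK_SUCCESS;          /* option not used; leave pool alone */

    snprintf(cmd, sizeof(cmd), "tschpool %s", requested);
    if (system(cmd) != 0)
        slurm_error("pagepool: '%s' failed", cmd);
    return ESPANK_SUCCESS;
}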


In testing, though, it seems as though reducing the pagepool doesn't 
quite release all of the memory. I don't really understand it because 
I've never before seen memory that was previously resident become 
un-resident but still maintain the virtual memory allocation.


Here's what I mean. Let's take a node with 128G and a 1G pagepool.

If I do the following to simulate what might happen as various jobs 
tweak the pagepool:


- tschpool 64G
- tschpool 1G
- tschpool 32G
- tschpool 1G
- tschpool 32G

I end up with this:

mmfsd thinks there's 32G resident but 64G virt
# ps -o vsz,rss,comm -p 24397
   VSZ   RSS COMMAND
67589400 33723236 mmfsd

however, linux thinks there's ~100G used

# free -g
             total       used       free     shared    buffers     cached
Mem:           125        100         25          0          0          0
-/+ buffers/cache:         98         26
Swap:            7          0          7

I can jump back and forth between 1G and 32G *after* allocating 64G 
pagepool and the overall amount of memory in use doesn't balloon but I 
can't seem to shed that original 64G.


I don't understand what's going on... :) Any ideas? This is with Scale 
4.2.3.6.


-Aaron

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss