Re: [racket-users] Memory usage on Linux

2018-08-26 Thread Jonathan Simpson
Yes, it is running on a Linode and the OOM killer eventually kills the process. 
There is some swap space available, but I probably should increase it. I 
started by trying to understand my Racket program because it is a lot 
more fun for me than sysadmin-type stuff :)

-- Jonathan

On Sunday, August 26, 2018 at 9:02:47 PM UTC-4, gneuner2 wrote:
>
>
>
> On 8/26/2018 6:43 PM, Jonathan Simpson wrote: 
> > The fact that Racket isn't releasing the memory back to the OS appears 
> > to be causing the system to eventually run out of physical pages. 
>
> Is this a cloud server?  Is the problem that the "out-of-memory" (OOM) 
> handler is killing processes?  That won't happen if you provide a swap 
> space to cover over-commit spikes  (or prevent them a priori with 
> ulimit).  In any event you probably should disable the OOM killer and 
> adjust the kernel's over-commit behavior to something reasonable. 
>
> see: 
> https://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
> http://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-captun
>
> George 
>



Re: [racket-users] Memory usage on Linux

2018-08-26 Thread George Neuner




On 8/26/2018 6:43 PM, Jonathan Simpson wrote:
The fact that Racket isn't releasing the memory back to the OS appears 
to be causing the system to eventually run out of physical pages.


Is this a cloud server?  Is the problem that the "out-of-memory" (OOM) 
handler is killing processes?  That won't happen if you provide a swap 
space to cover over-commit spikes  (or prevent them a priori with 
ulimit).  In any event you probably should disable the OOM killer and 
adjust the kernel's over-commit behavior to something reasonable.


see:
https://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
http://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-captun

George



Re: [racket-users] Memory usage on Linux

2018-08-26 Thread Jonathan Simpson


On Sunday, August 26, 2018 at 7:43:20 PM UTC-4, Matthew Flatt wrote:
>
>
> Racket's memory manager does not immediately release pages back to the 
> OS (i.e., unmap them) after a GC. In its current configuration, the GC 
> releases a page at the beginning of a major GC only if the page was 
> unused at the *start* of the previous GC. So, there's roughly a lag of 
> 2 GCs to unmap a page that becomes unused as a result of a GC. That 
> policy reduces back-and-forth with the OS on page allocation, which can 
> be relatively costly. 
>
>
I think this lag explains exactly what I'm seeing. I ran (collect-garbage 
'major) twice in a row and it freed up several hundred MBs that it wasn't 
freeing after just one collection. This should work for my needs!
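
In case it's useful to anyone else, the experiment was roughly the 
following at a REPL (just a sketch; I'm watching top's RES column 
alongside Racket's own numbers):

  ;; Force two major collections back-to-back and report Racket's own
  ;; accounting after each.  Per Matthew's explanation, top's RES only
  ;; drops after the *second* major collection, because a page is unmapped
  ;; only if it was already unused at the start of the previous GC.
  (define (mb n) (quotient n (* 1024 1024)))
  (printf "before: ~a MB\n" (mb (current-memory-use)))
  (collect-garbage 'major)
  (printf "after 1st major GC: ~a MB\n" (mb (current-memory-use)))
  (collect-garbage 'major)
  (printf "after 2nd major GC: ~a MB\n" (mb (current-memory-use)))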

> Then again, at the end of a GC, if there are more than 4 times as many 
> mapped pages as currently in use, then all unused pages are immediately 
> released. That rule is an attempt to keep actual use and mapped pages 
> from getting too far out of sync. 
>
>
I think I am probably close to, but not quite at, the 4x threshold. I'm 
happy to hear this, because it means I don't need to worry overmuch about 
mapped memory growing much larger than what is actually in use without 
ever being freed. 

Thanks again for the explanation. I'm assuming there aren't any GC 
parameters for this that I can tweak from application code, but please let 
me know if there are!
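
(For what it's worth, the memory-related hooks I'm aware of from 
application code are sketched below; as far as I can tell none of them 
changes the unmapping policy itself, so I may well be missing something.)

  ;; Memory-related hooks reachable from application code (names as in
  ;; the Racket reference); none of these appears to control when freed
  ;; pages are handed back to the OS.
  (collect-garbage)              ; request a major collection
  (collect-garbage 'minor)       ; request only a minor collection
  (collect-garbage 'incremental) ; ask for incremental major collections
  (current-memory-use)           ; bytes in use, by Racket's own accounting
  ;; custodian-limit-memory installs a soft cap, checked at GC time:
  (define c (make-custodian))
  (custodian-limit-memory c (* 256 1024 1024) c)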

-- Jonathan



Re: [racket-users] Memory usage on Linux

2018-08-26 Thread George Neuner



On 8/26/2018 7:43 PM, Matthew Flatt wrote:

Racket's memory manager does not immediately release pages back to the
OS (i.e., unmap them) after a GC. In its current configuration, the GC
releases a page at the beginning of a major GC only if the page was
unused at the *start* of the previous GC. So, there's roughly a lag of
2 GCs to unmap a page that becomes unused as a result of a GC. That
policy reduces back-and-forth with the OS on page allocation, which can
be relatively costly.

Then again, at the end of a GC, if there are more than 4 times as many
mapped pages as currently in use, then all unused pages are immediately
released. That rule is an attempt to keep actual use and mapped pages
from getting too far out of sync.

Racket doesn't reserve pages without mapping them, so you don't need to
worry about that potential difference.


Good to know.  Thanks for the peek inside.
George



Re: [racket-users] Memory usage on Linux

2018-08-26 Thread Matthew Flatt
At Sun, 26 Aug 2018 09:55:25 -0700 (PDT), Jonathan Simpson wrote:
> Then if I run (collect-garbage 'major), 
> current-memory-use reports only about 300MB in use, but the VIRT/RES values 
> reported by top do not change. The VIRT/RES values don't actually decrease 
> until I unlink the variable pointing to the data structure and re-run 
> collect-garbage. At that point top reports about 300MB when according to 
> current-memory-use only about 80MB of memory is reachable.
> 
> So, it seems that the memory used by Linux is lagging behind Racket's 
> garbage collection. Is there an explanation for this? I need a way to fully 
> release resources back to the OS as quickly as possible.

I'm not sure exactly how it maps to what you're seeing, but here are
some things I can report.

Racket's memory manager does not immediately release pages back to the
OS (i.e., unmap them) after a GC. In its current configuration, the GC
releases a page at the beginning of a major GC only if the page was
unused at the *start* of the previous GC. So, there's roughly a lag of
2 GCs to unmap a page that becomes unused as a result of a GC. That
policy reduces back-and-forth with the OS on page allocation, which can
be relatively costly.

Then again, at the end of a GC, if there are more than 4 times as many
mapped pages as currently in use, then all unused pages are immediately
released. That rule is an attempt to keep actual use and mapped pages
from getting too far out of sync.

Racket doesn't reserve pages without mapping them, so you don't need to
worry about that potential difference.
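
A minimal, Linux-only sketch for watching this from inside a program: it 
prints Racket's own accounting next to the OS-reported resident size 
around each major collection. Reading VmRSS out of /proc/self/status is 
an assumption about that file's layout, not something Racket provides.

  ;; Compare Racket's accounting with the OS resident-set size (Linux only).
  (define (vmrss-kb)
    (call-with-input-file "/proc/self/status"
      (lambda (in)
        (for/or ([line (in-lines in)])
          (cond
            [(regexp-match #rx"VmRSS:[ \t]+([0-9]+) kB" line)
             => (lambda (m) (string->number (cadr m)))]
            [else #f])))))

  (define (report label)
    (printf "~a: racket ~a MB, os rss ~a MB\n"
            label
            (quotient (current-memory-use) (* 1024 1024))
            (quotient (or (vmrss-kb) 0) 1024)))

  ;; Racket's number should drop after the first major GC; the OS number
  ;; should catch up roughly one major GC later, per the policy above.
  (report "before")
  (collect-garbage 'major)
  (report "after 1st major GC")
  (collect-garbage 'major)
  (report "after 2nd major GC")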



Re: [racket-users] Memory usage on Linux

2018-08-26 Thread Jonathan Simpson
Thanks for the response; it confirms my suspicions. For my purposes the 
resident memory reported by top should be accurate enough. I'm talking 
about hundreds of MB that I'd like to free immediately, so I'm not too 
concerned about CODE pages or the GC taking a few MBs. All that is dwarfed 
by the amount of heap data that my application is creating.

Restricting resident pages with ulimit isn't really what I'm after because 
my application could benefit from a large amount of memory initially. I 
just don't want it to hold onto it. The fact that Racket isn't releasing 
the memory back to the OS appears to be causing the system to eventually 
run out of physical pages.

Nevertheless, I can experiment with ulimit to see if it will help. Perhaps 
it won't affect startup time too much. I will experiment with 
custodian-limit-memory as well. Maybe that will give Racket the hint it 
needs to fully release the unused memory from the heap.

If anyone is knowledgeable about how Racket decides to release physical 
pages back to the OS, I'd be very interested in hearing about it, 
especially if there is a way to customize the behavior.

Thanks,
Jonathan

On Sunday, August 26, 2018 at 5:52:59 PM UTC-4, gneuner2 wrote:
>
>
>
> On 8/26/2018 12:55 PM, Jonathan Simpson wrote:
>
> I have a Racket application that I need to run in a fairly memory 
> constrained environment (1 GB RAM) and I've run into something I don't quite 
> understand. The application is deserializing a fairly large data structure 
> from disk on startup. After loading, current-memory-use reports 
> about 1GB of memory in use, which matches the values reported by top (the 
> application is running on Linux and top's VIRT/RES values are about 
> 1GB/800MB for the process). 
>
> Then if I run (collect-garbage 'major), 
> current-memory-use reports only about 300MB in use, but the VIRT/RES values 
> reported by top do not change. The VIRT/RES values don't actually decrease 
> until I unlink the variable pointing to the data structure and re-run 
> collect-garbage. At that point top reports about 300MB when according to 
> current-memory-use only about 80MB of memory is reachable.
>
> So, it seems that the memory used by Linux is lagging behind Racket's 
> garbage collection. Is there an explanation for this? I need a way to fully 
> release resources back to the OS as quickly as possible.
>
> Thanks,
> Jonathan
>
>
> GC does not necessarily return freed heap space to the operating system - 
> it expects the program is going to use it again.  Unfortunately, I don't 
> know under what circumstances Racket actually does return space to the OS.  
> Processes can reserve blocks of address space without having VMM pages 
> actually mapped to those addresses, and mapped pages may not be resident in 
> RAM.
>
> Consequently top's results can be misleading - you have to distinguish 
> between "address space" use and actual RAM use, and what each actually 
> means.  RES shows total RAM residency, but it includes both code and data.  
> To double-check GC behavior (so far as you can) you should look at DATA, 
> but be aware that it includes both the heap and the stack.  Also remember 
> that the GC code may have to be swapped in to perform the collection, and 
> that may *increase* both CODE and RES, without necessarily reducing DATA.
>
> VIRT is practically useless on a modern 64-bit system - it shows the total 
> address space of the process, which includes RAM resident code + data, VMM 
> code file mappings for the program, the Racket framework and any libraries 
> in use, mapped data files, shared memory segments, and process data pages 
> that are currently swapped out to disk.  VIRT was (only slightly) more 
> relevant when processes had to fit into 3GB on 32-bit systems.
>
>
> If you really need to keep the process residency under a certain size, you 
> have to use *ulimit*.  Racket itself provides no controls to restrict 
> process residency - the code is whatever size it is and the heap will 
> expand whenever necessary subject to OS limits.  You can place soft limits 
> on the program's heap usage with *custodian-limit-memory*, but the limits 
> are enforced only at collection time ... if the heap happens to be larger 
> than the current limit, the program can use all of it before noticing that 
> the limit has been passed.
>
> At the OS level you can use "ulimit -m <size>" to restrict total process 
> RAM use.  This does not affect expanding heap address space when necessary 
> - it only limits the RAM residency of the process:  program code and data 
> will be swapped in/out as necessary to keep the program running.  Note that 
> there is no way to individually control the sizes of the resident code, 
> data and stack segments.

Re: [racket-users] Memory usage on Linux

2018-08-26 Thread George Neuner



On 8/26/2018 12:55 PM, Jonathan Simpson wrote:
I have a Racket application that I need to run in a fairly memory 
constrained environment (1 GB RAM) and I've run into something I don't 
quite understand. The application is deserializing a fairly large data 
structure from disk on startup. After loading, current-memory-use reports 
about 1GB of memory in use, which matches the values reported by 
top (the application is running on Linux and top's VIRT/RES values are 
about 1GB/800MB for the process).


Then if I run (collect-garbage 'major), 
current-memory-use reports only about 300MB in use, but the VIRT/RES 
values reported by top do not change. The VIRT/RES values don't 
actually decrease until I unlink the variable pointing to the data 
structure and re-run collect-garbage. At that point top reports about 
300MB when according to current-memory-use only about 80MB of memory 
is reachable.


So, it seems that the memory used by Linux is lagging behind Racket's 
garbage collection. Is there an explanation for this? I need a way to 
fully release resources back to the OS as quickly as possible.


Thanks,
Jonathan


GC does not necessarily return freed heap space to the operating system 
- it expects the program is going to use it again. Unfortunately, I 
don't know under what circumstances Racket actually does return space to 
the OS.  Processes can reserve blocks of address space without having 
VMM pages actually mapped to those addresses, and mapped pages may not 
be resident in RAM.


Consequently top's results can be misleading - you have to distinguish 
between "address space" use and actual RAM use, and what each actually 
means.  RES shows total RAM residency, but it includes both code and 
data.  To double-check GC behavior (so far as you can) you should look 
at DATA, but be aware that it includes both the heap and the stack.  
Also remember that the GC code may have to be swapped in to perform the 
collection, and that may *increase* both CODE and RES, without 
necessarily reducing DATA.


VIRT is practically useless on a modern 64-bit system - it shows the 
total address space of the process, which includes RAM resident code + 
data, VMM code file mappings for the program, the Racket framework and 
any libraries in use, mapped data files, shared memory segments, and 
process data pages that are currently swapped out to disk.  VIRT was 
(only slightly) more relevant when processes had to fit into 3GB on 
32-bit systems.



If you really need to keep the process residency under a certain size, 
you have to use *ulimit*.  Racket itself provides no controls to 
restrict process residency - the code is whatever size it is and the 
heap will expand whenever necessary subject to OS limits.  You can place 
soft limits on the program's heap usage with *custodian-limit-memory*, 
but the limits are enforced only at collection time ... if the heap 
happens to be larger than the current limit, the program can use all of 
it before noticing that the limit has been passed.
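
A minimal sketch of that kind of soft limit (the 200 MB figure and the 
work done in the thread are placeholders; this relies on custodian memory 
accounting, which the standard Racket build provides as far as I know):

  ;; If allocation charged to worker-cust exceeds ~200 MB at a collection,
  ;; the custodian - and the work running under it - is shut down.
  ;; The check happens only at GC time, as noted above.
  (define worker-cust (make-custodian))
  (custodian-limit-memory worker-cust (* 200 1024 1024) worker-cust)

  (define worker
    (parameterize ([current-custodian worker-cust])
      (thread
       (lambda ()
         ;; placeholder for the real work, e.g. deserializing the structure
         (define big (make-vector (* 10 1024 1024) 0))
         (void big)))))
  (thread-wait worker)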


At the OS level you can use "ulimit -m <size>" to restrict total 
process RAM use.  This does not affect expanding heap address space when 
necessary - it only limits the RAM residency of the process:  program 
code and data will be swapped in/out as necessary to keep the program 
running.  Note that there is no way to individually control the sizes of 
the resident code, data and stack segments.  You can put upper bound 
limits individually on the process's data and/or stack segments, but if 
the total of code+data+stack exceeds the OS limit (whether ulimit -m or 
the OS default), Linux will swap things in/out as necessary to keep the 
whole of the process below the residency limit.


Also note that unless you are loading large binary images, your 
deserialization likely will work fine using much less total space.


Hope this helps,
George
