Re: Multiprocessing and memory management

2019-07-04 Thread Thomas Jollans
On 03/07/2019 18.37, Israel Brewster wrote:
> I have a script that benefits greatly from multiprocessing (it’s generating a 
> bunch of images from data). Of course, as expected each process uses a chunk 
> of memory, and the more processes there are, the more memory used. The amount 
> used per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 
> GB, depending on the amount of data being processed (usually closer to 10GB, 
> the 40/50 is fairly rare). This puts me in a position of needing to balance 
> the number of processes with memory usage, such that I maximize resource 
> utilization (running one process at a time would simply take WAY too long)
> while not overloading RAM (which at best would slow things down due to swap). 
>
> Obviously this process will be run on a machine with lots of RAM, but as I 
> don’t know how large the datasets that will be fed to it are, I wanted to see 
> if I could build some intelligence into the program such that it doesn’t 
> overload the memory. A couple of approaches I thought of:
>
> 1) Determine the total amount of RAM in the machine (how?), assume an average 
> of 10GB per process, and only launch as many processes as calculated to fit. 
> Easy, but would run the risk of under-utilizing the processing capabilities 
> and taking longer to run if most of the processes were using significantly 
> less than 10GB
>
> 2) Somehow monitor the memory usage of the various processes, and if one 
> process needs a lot, pause the others until that one is complete. Of course, 
> I’m not sure if this is even possible.
>
> 3) Other approaches?
>

Are you familiar with Dask? 

I don't know it myself other than through hearsay, but I have a feeling
it may have a ready-to-go solution to your problem. You'd have to look
into Dask in more detail than I have...
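
For what it's worth, dask.distributed lets you cap both the number of
worker processes and the memory each worker may use, which sounds close
to what you're after. A rough, untested sketch (make_image and datasets
stand in for your own image-generation function and inputs):

from dask.distributed import Client, LocalCluster

# Four worker processes, one thread each, each capped at roughly 10 GB.
# Dask pauses or restarts workers that blow past their memory_limit
# rather than letting them drag the whole machine into swap.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit='10GB')
client = Client(cluster)

futures = [client.submit(make_image, ds) for ds in datasets]
results = client.gather(futures)

You'd still have to choose the n_workers/memory_limit split yourself,
but at least Dask enforces it for you.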




Re: Multiprocessing and memory management

2019-07-03 Thread Peter J. Holzer
On 2019-07-03 08:37:50 -0800, Israel Brewster wrote:
> 1) Determine the total amount of RAM in the machine (how?), assume an
> average of 10GB per process, and only launch as many processes as
> calculated to fit. Easy, but would run the risk of under-utilizing the
> processing capabilities and taking longer to run if most of the
> processes were using significantly less than 10GB
> 
> 2) Somehow monitor the memory usage of the various processes, and if
> one process needs a lot, pause the others until that one is complete.
> Of course, I’m not sure if this is even possible.

If you use Linux or another unixoid OS, you can pause and resume
processes with the STOP and CONT signals. Just keep in mind that the
paused processes still need virtual memory and cannot complete - so you
need enough swap space.
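
A minimal sketch (POSIX only, untested), with pid being the child's
process id, e.g. multiprocessing.Process.pid:

import os
import signal

def pause(pid):
    # The stopped process keeps its (virtual) memory but gets no CPU time.
    os.kill(pid, signal.SIGSTOP)

def resume(pid):
    os.kill(pid, signal.SIGCONT)

If you are already using psutil, its Process.suspend()/resume() methods
send the same signals.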


> 3) Other approaches?

Is the memory usage at all predictable? I.e. can you estimate the usage
from the size of the input data? Or after the process has been running
for a short time? In that case you could monitor the free space and use
your estimates to determine whether you can start another process now or
need to wait until later (until a process terminates or maybe only until
you get a better estimate).
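
A rough sketch of that idea, using psutil for the free-memory number
(estimate_bytes, start_worker and datasets are placeholders for your
own estimate, launch code and inputs):

import time
import psutil

def wait_for_room(needed_bytes, margin=0.1):
    # Block until the estimate (plus a safety margin) fits into the
    # RAM that is currently available.
    while psutil.virtual_memory().available < needed_bytes * (1 + margin):
        time.sleep(5)

for dataset in datasets:
    wait_for_room(estimate_bytes(dataset))
    start_worker(dataset)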

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ |                -- Ross Anderson




Re: Multiprocessing and memory management

2019-07-03 Thread Gary Herron


On 7/3/19 9:37 AM, ijbrews...@alaska.edu wrote:

> I have a script that benefits greatly from multiprocessing (it’s generating a
> bunch of images from data). Of course, as expected each process uses a chunk of
> memory, and the more processes there are, the more memory used. The amount used
> per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB,
> depending on the amount of data being processed (usually closer to 10GB, the
> 40/50 is fairly rare). This puts me in a position of needing to balance the
> number of processes with memory usage, such that I maximize resource
> utilization (running one process at a time would simply take WAY too long) while
> not overloading RAM (which at best would slow things down due to swap).
>
> Obviously this process will be run on a machine with lots of RAM, but as I
> don’t know how large the datasets that will be fed to it are, I wanted to see
> if I could build some intelligence into the program such that it doesn’t
> overload the memory. A couple of approaches I thought of:
>
> 1) Determine the total amount of RAM in the machine (how?), assume an average
> of 10GB per process, and only launch as many processes as calculated to fit.
> Easy, but would run the risk of under-utilizing the processing capabilities and
> taking longer to run if most of the processes were using significantly less
> than 10GB



Try psutil to get information about memory (and cpu usage and lots 
more).  For example:


>>> import psutil
>>> psutil.virtual_memory()
svmem(total=16769519616, available=9151971328, percent=45.4, 
used=7031549952, free=4486520832, active=9026158592, 
inactive=2238566400, buffers=312815616, cached=4938633216, 
shared=234295296, slab=593375232)


Home page: https://github.com/giampaolo/psutil
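

For your option 1, that gives you a starting process count along these
lines (using available rather than total so memory already in use
elsewhere is excluded; the 10 GB per process is just your own estimate):

>>> import psutil
>>> per_process = 10 * 1024**3   # assumed average of ~10 GB per worker
>>> n_procs = max(1, psutil.virtual_memory().available // per_process)

and then pass n_procs to multiprocessing.Pool(processes=n_procs).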




> 2) Somehow monitor the memory usage of the various processes, and if one
> process needs a lot, pause the others until that one is complete. Of course,
> I’m not sure if this is even possible.
>
> 3) Other approaches?
>
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145


--
Dr. Gary Herron
Professor of Computer Science
DigiPen Institute of Technology
(425) 895-4418



Multiprocessing and memory management

2019-07-03 Thread Israel Brewster
I have a script that benefits greatly from multiprocessing (it’s generating a 
bunch of images from data). Of course, as expected each process uses a chunk of 
memory, and the more processes there are, the more memory used. The amount used 
per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB, 
depending on the amount of data being processed (usually closer to 10GB, the 
40/50 is fairly rare). This puts me in a position of needing to balance the 
number of processes with memory usage, such that I maximize resource 
utilization (running one process at a time would simply take WAY too long) while
not overloading RAM (which at best would slow things down due to swap). 

Obviously this process will be run on a machine with lots of RAM, but as I 
don’t know how large the datasets that will be fed to it are, I wanted to see 
if I could build some intelligence into the program such that it doesn’t 
overload the memory. A couple of approaches I thought of:

1) Determine the total amount of RAM in the machine (how?), assume an average 
of 10GB per process, and only launch as many processes as calculated to fit. 
Easy, but would run the risk of under-utilizing the processing capabilities and 
taking longer to run if most of the processes were using significantly less 
than 10GB

2) Somehow monitor the memory usage of the various processes, and if one 
process needs a lot, pause the others until that one is complete. Of course, 
I’m not sure if this is even possible.

3) Other approaches?


---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

-- 
https://mail.python.org/mailman/listinfo/python-list