Re: Multiprocessing performance question

DL Neil Thu, 21 Feb 2019 10:42:48 -0800

George: apologies for misidentifying you as the OP.


Israel:

On 22/02/19 6:04 AM, Israel Brewster wrote:

Actually not a ’toy example’ at all. It is simply the first step in gridding some data I am working with - a problem that is solved by tools like SatPy, but unfortunately I can’t use SatPy because it doesn’t recognize my file format, and you can’t load data directly. Writing a custom file importer for SatPy is probably my next step.

Not to focus on the word "toy": the governing issue is the setup cost cf. the acceleration afforded by the parallel processing. In this case, the former is/was more-or-less as high as the latter, and your efforts were insufficiently rewarded.
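To put rough numbers on "setup cost cf. acceleration", here is a deliberately simple cost model. It assumes a one-off pool start-up cost, a per-task dispatch cost (pickling arguments and results and moving them between processes), and useful work that divides perfectly across workers. The function names and the start-up and dispatch figures are hypothetical illustrations, not measurements; only the 800,000 boxes and the roughly 10 seconds come from the messages in this thread.

```python
def serial_time(n_tasks, work):
    """Wall time when every task runs in the parent process."""
    return n_tasks * work


def parallel_time(n_tasks, work, workers, startup, dispatch):
    """Simplified wall time with a process pool.

    Assumes a one-off `startup` cost, a per-task `dispatch` cost paid in the
    parent (pickling and queueing), and perfect division of `work` across workers.
    """
    return startup + n_tasks * dispatch + n_tasks * work / workers


def break_even_work(workers, dispatch, startup=0.0, n_tasks=1):
    """Smallest per-task work for which the pool is no slower than serial.

    Solving n*w = s + n*d + n*w/k for w gives w = (s/n + d) * k / (k - 1).
    """
    if workers <= 1:
        raise ValueError("a pool needs at least two workers to beat serial code")
    return (startup / n_tasks + dispatch) * workers / (workers - 1)


if __name__ == "__main__":
    n = 800_000        # boxes in the 1000 x 800 grid
    work = 10 / n      # ~10 s for the whole grid, as reported in the thread
    # startup=0.1 s and dispatch=20 us are hypothetical values
    print("serial  :", serial_time(n, work))
    print("parallel:", parallel_time(n, work, workers=4, startup=0.1, dispatch=20e-6))
    print("break-even work per task:", break_even_work(workers=4, dispatch=20e-6))
```

With those assumed values the model says each task would need roughly 27 µs of work to break even, about twice the ~12.5 µs per box implied by 10 seconds for 800,000 boxes, which is consistent with the disappointing result reported here. Real dispatch costs depend heavily on what is being pickled, so treat the model as a guide for what to measure.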

That said, if the computer was concurrently performing this task and a number of others, the number of cores available to you would decrease. At which point, speeds start heading backwards!
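One way to allow for other work on the machine is to size the pool from the CPUs the process is actually allowed to use, minus a reserve. This is a sketch: `available_cpus` and `pool_size` are hypothetical helpers and the one-core reserve is an arbitrary assumption. `os.sched_getaffinity` exists only on some platforms (e.g. Linux), hence the fallback to `os.cpu_count()`; neither can tell how busy those cores currently are.

```python
import os


def available_cpus():
    """CPUs this process may run on, falling back to the machine total."""
    if hasattr(os, "sched_getaffinity"):   # Linux and some other Unixes
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1             # cpu_count() may return None


def pool_size(reserve=1):
    """Hypothetical policy: leave `reserve` cores for other work, keep at least one."""
    return max(1, available_cpus() - reserve)


if __name__ == "__main__":
    # e.g. multiprocessing.Pool(processes=pool_size())
    print("usable CPUs:", available_cpus(), "-> pool size:", pool_size())
```

On Python 3.13 and later, `os.process_cpu_count()` gives the affinity-aware count directly.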

This is largely speculation because only you know the task, objectives, and circumstances - however, for those 'playing along at home' and learning from your experiment...

That said, the entire process took around 60 seconds to run. As this step was taking 10, I figured it would be low-hanging fruit for speeding up the process. Obviously I was wrong. For what it’s worth, I did manage to re-factor the code, so instead of generating the entire grid up-front, I generate the boxes as needed to calculate the overlap with the data grid. This brought the processing time down to around 40 seconds, so a definite improvement there.
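The on-demand idea can be sketched without building the grid at all: for each data cell, generate only the grid boxes it can touch and compute the overlaps straight away. This sketch assumes data cells are axis-aligned rectangles and uses plain `(minx, miny, maxx, maxy)` tuples instead of shapely objects so it runs without third-party packages; `overlap_area`, `boxes_touching` and `grid_weights` are hypothetical names. The grid follows the thread's 1000 x 800 unit boxes, with box `(x, y)` spanning `[x-1, x]` by `[y-1, y]`.

```python
import math
from typing import Dict, Iterator, Tuple

# A box is (minx, miny, maxx, maxy): a plain-tuple stand-in for shapely.geometry.box
Box = Tuple[float, float, float, float]

NX, NY = 1000, 800  # grid size from the thread


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def boxes_touching(cell: Box) -> Iterator[Tuple[Tuple[int, int], Box]]:
    """Yield, on demand, only the grid boxes whose interiors can overlap `cell`."""
    x_lo = max(1, math.floor(cell[0]) + 1)
    x_hi = min(NX, math.ceil(cell[2]))
    y_lo = max(1, math.floor(cell[1]) + 1)
    y_hi = min(NY, math.ceil(cell[3]))
    for x in range(x_lo, x_hi + 1):
        for y in range(y_lo, y_hi + 1):
            yield (x, y), (x - 1, y - 1, x, y)


def grid_weights(cell: Box) -> Dict[Tuple[int, int], float]:
    """Overlap area between one data cell and each grid box it touches."""
    weights = {}
    for key, box in boxes_touching(cell):
        area = overlap_area(cell, box)
        if area > 0:
            weights[key] = area
    return weights


if __name__ == "__main__":
    # A data cell straddling four grid boxes
    print(grid_weights((2.5, 2.5, 3.5, 3.5)))
```

With shapely, `geometry.box(*cell).intersection(geometry.box(*box)).area` could play the role of `overlap_area`. The saving comes from looping over a handful of nearby boxes per data cell instead of all 800,000.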

Doing it on-demand. Now you're talking! Plus, if you're able to 'fit' the data into each box as it is created, that will help justify the setup/tear-down overhead cost for each async process.
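A hypothetical layout along those lines: each pool task does the gridding for a whole band of rows and sends back only per-box results, never the box objects, so the pickling cost is paid on small dictionaries. The sample format `(px, py, value)`, the 50-row band height and the "fit" itself (a per-box mean) are all assumptions standing in for the real analysis.

```python
import math
import random
from collections import defaultdict
from multiprocessing import Pool

NX, NY = 1000, 800     # grid size from the thread
ROWS_PER_TASK = 50     # hypothetical band height; bigger bands mean fewer, larger tasks


def make_tasks(samples, rows_per_task=ROWS_PER_TASK):
    """Parent side: group (px, py, value) samples into bands of grid rows."""
    bands = defaultdict(list)
    for px, py, value in samples:
        if 0 <= px < NX and 0 <= py < NY:   # drop samples outside the grid
            bands[int(py) // rows_per_task].append((px, py, value))
    return [bands[b] for b in sorted(bands)]


def fit_band(samples):
    """Worker side: per-box mean of the sample values in one band.

    Box (x, y) covers [x-1, x) by [y-1, y). Only a small dict of results is
    returned, so no box objects are pickled back to the parent.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for px, py, value in samples:
        key = (math.floor(px) + 1, math.floor(py) + 1)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def grid_parallel(samples, processes=None):
    result = {}
    with Pool(processes=processes) as pool:
        for partial in pool.imap_unordered(fit_band, make_tasks(samples)):
            result.update(partial)   # bands are disjoint, so keys never collide
    return result


if __name__ == "__main__":   # required where workers are started with "spawn"
    random.seed(0)
    samples = [(random.uniform(0, NX), random.uniform(0, NY), random.random())
               for _ in range(100_000)]
    gridded = grid_parallel(samples)
    print(len(gridded), "boxes received at least one sample")
```

The same `fit_band` can be called directly on the whole sample list for a serial baseline, which makes it easy to time both paths against each other.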


Well done!

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145
On Feb 20, 2019, at 4:30 PM, DL Neil wrote:
George

On 21/02/19 1:15 PM, george trojan wrote:
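import itertools
from shapely import geometry

def create_box(x_y):
    # shapely.geometry.box(minx, miny, maxx, maxy): unit cell with upper-right corner (x, y)
    return geometry.box(x_y[0] - 1, x_y[1] - 1, x_y[0], x_y[1])
x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))
grid = list(map(create_box, x_y_range))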
Which creates and populates an 800x1000 “grid” (represented as a flat list
at this point) of “boxes”, where a box is a shapely.geometry.box(). This
takes about 10 seconds to run.
Looking at this, I am thinking it would lend itself well to
parallelization. Since the box at each “coordinate” is independent of all
others, it seems I should be able to simply split the list up into chunks
and process each chunk in parallel on a separate core. To that end, I
created a multiprocessing pool:
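The pool code itself isn't reproduced in this quote. Purely as a hypothetical illustration of the chunking idea described above (not the code that was actually run), this sketch hands each worker a block of x columns and lets it build that block's boxes. Plain tuples stand in for shapely boxes, and `chunks`, `build_boxes` and the 50-column chunk size are assumptions. Every box a worker builds still has to be pickled and sent back to the parent, so this shape of problem pays the transfer cost on the entire result.

```python
import itertools
from multiprocessing import Pool

X_RANGE = range(1, 1001)   # ranges from the thread
Y_RANGE = range(1, 801)


def chunks(seq, size):
    """Split a sequence into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_boxes(xs):
    """Worker: build the unit boxes for a chunk of x columns.

    (minx, miny, maxx, maxy) tuples stand in for shapely boxes so the
    sketch has no third-party dependency.
    """
    return [(x - 1, y - 1, x, y) for x, y in itertools.product(xs, Y_RANGE)]


def build_grid_parallel(processes=None, columns_per_chunk=50):
    with Pool(processes=processes) as pool:
        parts = pool.map(build_boxes, chunks(X_RANGE, columns_per_chunk))
    # Each part was pickled in a worker and unpickled here before being joined
    return list(itertools.chain.from_iterable(parts))


if __name__ == "__main__":
    grid = build_grid_parallel()
    print(len(grid), "boxes")
```

`Pool.map` also accepts a `chunksize` argument that batches a flat iterable in much the same way, if you would rather keep passing `x_y_range` directly.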
I recall a similar discussion when folk were being encouraged to move away from monolithic and straight-line processing to modular functions - it is more (CPU-time) efficient to run in a straight line than it is to repeatedly call, set-up, execute, and return-from a function or sub-routine! ie there is an overhead to many/all constructs!
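The per-call overhead is easy to see for yourself with `timeit`: the two hypothetical functions below do identical arithmetic, one inline and one through a function call per item. No timings are quoted because they depend on the interpreter and machine; the same effect, at a much larger scale, applies to each task handed to a process pool.

```python
import timeit


def add_one(v):
    return v + 1


def inline_loop(n):
    total = 0
    for v in range(n):
        total += v + 1          # arithmetic done inline
    return total


def call_loop(n):
    total = 0
    for v in range(n):
        total += add_one(v)     # same arithmetic, plus one function call per item
    return total


if __name__ == "__main__":
    n = 100_000
    for fn in (inline_loop, call_loop):
        # best of 5 repeats, each running fn(n) 10 times
        seconds = min(timeit.repeat(lambda: fn(n), number=10, repeat=5))
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```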
Isn't the 'problem' that it is a 'toy example'? That the amount of computing within each parallel process is small in relation to the inherent 'overhead'.
Thus, if the code performed a reasonable analytical task within each box after it had been defined (increased CPU load), would you then notice the expected difference between the single- and multi-process implementations?
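One way to answer that empirically is a small harness that runs the same function serially and through a pool, once with trivially cheap per-item work and once with something heavier. `light` and `heavy` are hypothetical stand-ins (the "heavy" one is just a truncated sine series) and the item counts are arbitrary. The pool timing deliberately includes pool start-up, since that is part of the cost under discussion.

```python
import math
import time
from multiprocessing import Pool


def light(n):
    """Stand-in for creating one box: almost no work per item."""
    return n + 1


def heavy(n, iterations=2_000):
    """Hypothetical stand-in for 'a reasonable analytical task' per box."""
    total = 0.0
    for i in range(1, iterations):
        total += math.sin(n * i) / i
    return total


def compare(label, fn, items, processes=None):
    """Time fn over items serially and with a pool (pool start-up included)."""
    start = time.perf_counter()
    serial = list(map(fn, items))
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        parallel = pool.map(fn, items, chunksize=max(1, len(items) // 64))
    t_pool = time.perf_counter() - start

    assert serial == parallel   # both paths must compute the same results
    print(f"{label}: serial {t_serial:.2f} s, pool {t_pool:.2f} s")
    return t_serial, t_pool


if __name__ == "__main__":
    compare("light", light, list(range(200_000)))
    compare("heavy", heavy, list(range(2_000)))
```

If the reasoning above holds, the pool should lose on `light` and pull ahead on `heavy` once each item carries enough CPU work, with the crossover depending on core count and machine.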
From AKL to AK
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list