I have the following code running in Python 3.7:

import itertools
from shapely import geometry

def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)

x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))

grid = list(map(create_box, x_y_range))

This creates and populates an 800x1000 “grid” of “boxes” (represented at this 
point as a flat list of 800,000 shapely.geometry.box objects), which takes 
about 10 seconds to run.
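
The 10 seconds is a rough wall-clock number, measured along these lines (the 
exact timing harness isn’t important):

import time

start = time.perf_counter()
grid = list(map(create_box, x_y_range))
print(f"{time.perf_counter() - start:.1f} seconds")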

Looking at this, I am thinking it would lend itself well to parallelization. 
Since the box at each “coordinate” is independent of all others, it seems I 
should be able to simply split the list up into chunks and process each chunk 
in parallel on a separate core. To that end, I created a multiprocessing pool:

pool = multiprocessing.Pool()

And then called pool.map() rather than the plain built-in map(). Somewhat to 
my surprise, the execution time was virtually identical. Given the simplicity 
of the code, and how naturally the problem seems to parallelize, what could 
explain why the performance did not improve at all when moving from the 
single-process map() to the multiprocess pool.map()?
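
Concretely, the parallel version looked essentially like this (a minimal 
sketch; the explicit close()/join() calls are just standard cleanup):

import multiprocessing

pool = multiprocessing.Pool()            # defaults to one worker per core
grid = pool.map(create_box, x_y_range)   # pool.map already returns a list
pool.close()
pool.join()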

I am aware that in Python 3 the built-in map() is lazy and doesn’t actually 
produce its results until they are consumed, but that’s why I wrapped 
everything in calls to list(), at least for testing.
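
For example, with the sequential version:

grid_iter = map(create_box, x_y_range)  # returns an iterator immediately; no boxes built yet
grid = list(grid_iter)                  # consuming the iterator here is where the time goes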

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145
