[web2py] Re: Help understanding memory use and selects

Massimo Di Pierro Sun, 15 Sep 2019 17:30:13 -0700

I would rewrite

p_here = [p for p in cpaths if loc_id in db.steps[int(p['steps'][0])].
> locations]
>
>
as


ids = [int(p['steps][0]) for p in p in cpaths])
rpaths = 
db(db.steps.id.belongs(ids)(db.steps.locations.contains(loc_id)).select(db.steps.id)
p_here = [p for p in cpaths if int(p['steps'][0]) in rpaths.as_dict()]

(if locations is searchable else)

ids = [int(p['steps][0]) for p in p in cpaths])
rpaths = db(db.steps.id.belongs(ids).select(db.steps.id, 
db.steps.locations).as_dict()
p_here = [p for p in cpaths if int(p['steps'][0]) in rpaths and loc_id in 
paths[int(p['steps'][0])].locations]

In both case you do a single select. In the second case it returns more 
records because the filter is done in python, not in db

 
>
> The refactored version looks like this:
>
>     pid_here = [p['path2steps']['path_id'] for p
>                 in db((db.path2steps.step_id == db.steps.id) &
>                       (db.steps.locations.contains(loc_id))
>                       ).iterselect(db.path2steps.path_id, db.steps.
> locations)
>                       if loc_id in p['steps']['locations']
>                       ]
>     p_here = [p for p in cpaths if p['id'] in pid_here]
>
> It looks less elegant, but it's *much* lighter on memory. Let me break 
> down the changes I made. 
>
>    1. I removed the db access (select) from the "if" condition which is 
>    called on every iteration of the loop. Instead I access the db once and 
>    iterate over the result.
>    2. I use iterselect instead of select.
>    3. In the iterselect I specified just the fields I'm actually going to 
>    use, so that useless data doesn't go into memory.
>
> In order to make these changes I had to reorganize the logic 
> significantly. Rather than trying to pinpoint my desired data set in one 
> pass through the list comprehension, I first gather a larger (!) data set 
> and then refine it in a second step that doesn't require db access. The 
> details aren't important here. What surprised me, though, is that it was 
> far more memory efficient to iterate over a single, stripped down 
> iterselect than to make multiple selects. This is true even though the 
> resulting list is larger and has to be pared down in a second stage.
>
> The larger takeaway for me is that db access is generally very expensive 
> in terms of memory. It's worth it for me to organize my logic around 
> minimizing db calls, even if the result is less elegant code.
>
> On Monday, September 2, 2019 at 3:44:18 PM UTC-4, Ian W. Scott wrote:
>>
>> I'm trying to lower the memory use of an app and have some general 
>> questions about how memory is used in DAL selects:
>>
>>
>>    1. Am I right that the memory used while performing the select isn't 
>>    released right away, even if the select isn't assigned to a variable? 
>>    2. I'm aware of iterselect. Am I right that with iterselect the 
>>    memory used is just enough to store one row of data (instead of the whole 
>>    selected set)?
>>    3. Does this mean that, generally, you want to perform as few 
>>    separate selects as possible, unless you can use iterselect?
>>    4. Selects seem to occupy a significant amount of memory, even when 
>>    the result set is only one row. Is the memory use for the select 
>> determined 
>>    by the table size or the result size?
>>
>> Here's an example of the kind of situation I'm working with. I'm using a 
>> list comprehension to loop through a list, performing a select in each loop:
>>
>>     p_here = [p for p in cpaths if loc_id in db.steps[int(p['steps'][0
>> ])].locations]
>>
>>
>> When I run a memory profiler, this line results in over 1MB of memory 
>> being occupied, and that memory isn't released for at least several 
>> minutes. The table "steps" has about 3000 rows, so it's not enormous. The 
>> result for each select is a single row and doesn't include a huge amount of 
>> data (a few strings, ints, etc.). The "cpaths" list might have 50 or so 
>> items. So is the memory issue emerging because (a) the memory use for each 
>> select is determined by the table size, and (b) memory is being occupied 
>> (and not released) separately for each iteration of the loop? Is there a 
>> way to rewrite this so that it uses less memory?
>>
>

-- 
Resources:
- http://web2py.com
- http://web2py.com/book (Documentation)
- http://github.com/web2py/web2py (Source code)
- https://code.google.com/p/web2py/issues/list (Report Issues)
--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/web2py/7916869a-fe94-42a7-8283-f88bbf11b4d2%40googlegroups.com.

[web2py] Re: Help understanding memory use and selects

Reply via email to