
As has been implied elsewhere on this thread, this is likely down to memory 
fragmentation. By and large, Python requests memory from the operating 
system via malloc, which in turn obtains memory from the kernel in chunks 
(more or less) with either the brk or mmap system calls. These chunks are 
normally large relative to a single Python object: at least 4k (a page) for 
mmap, or a single extending block in the case of brk.

Because the OS layer manages memory in bigger blocks than Python does, if 
even one small object (say, 100 bytes) is still allocated on a 4k block 
obtained from the OS, that block naturally can't be released back to the 
OS, and Python can't optimise the memory usage (i.e. compact the heap) to 
make such a release possible.
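To see this effect in action, here's a rough sketch. It's Linux-only (it reads VmRSS from /proc, which is an assumption about your platform), and the object count and stride are just illustrative; pymalloc's pool and arena sizes vary by build:

```python
import gc

def vm_rss_kb():
    # Linux-only (an assumption): current resident set size from /proc.
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

start = vm_rss_kb()
# ~200k small strings; pymalloc packs many such objects per 4 KiB pool.
blocks = [("x" * 100) + str(i) for i in range(200_000)]
peak = vm_rss_kb()

# Keep 1 object in 40: nearly every pool still hosts a survivor, so
# those pages cannot be handed back to the OS even though ~97% of the
# objects are gone.
survivors = blocks[::40]
del blocks
gc.collect()
after = vm_rss_kb()

print("start=%dK peak=%dK after-free=%dK (%d survivors pin the pages)"
      % (start, peak, after, len(survivors)))
```

On a typical glibc/Linux box the "after-free" figure stays much closer to the peak than to the starting size, which is the same shape as the numbers you're seeing from top.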

So, with the results you're seeing, fragmentation is the likely cause.

Also, it's worth bearing in mind that heapy only records objects allocated 
and tracked by Python, which is of course a subset of the total memory the 
OS has allocated to the process (which is what the virtual size (VIRT) 
measures). I'd guess that if you took heapy snapshots part way through 
creating the XML file, you would see a bigger allocation of objects.
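As an aside, you can keep that peak allocation down by streaming rows to the tempfile instead of materialising them all with .all() (in SQLAlchemy, query.yield_per(n) gives you a lazy iterable). A small stand-alone sketch of the shape of that view, with toy stand-ins for your XMLModel and query (those names and callbacks are hypothetical, not your actual API):

```python
import tempfile

def write_xml(rows, begin, end, render_row):
    # Stream rows straight to a temp file so only one row's worth of
    # Python objects is alive at a time. `rows` should be a lazy
    # iterable, e.g. query.yield_per(500) rather than query.all().
    f = tempfile.TemporaryFile()
    f.write(begin)
    for row in rows:
        f.write(render_row(row))
    f.write(end)
    f.seek(0)
    return f

# Toy stand-ins for the real model and query (hypothetical):
rows = (str(i) for i in range(3))          # lazy, like yield_per()
f = write_xml(rows,
              b'<?xml version="1.0"?><root>',
              b"</root>",
              lambda r: ("<row>%s</row>" % r).encode())
print(f.read())
```

That won't undo fragmentation that has already happened, but it lowers the high-water mark so there are fewer pinned pages in the first place.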

I hope this is somehow useful,

Ceri.

On Tuesday, May 15, 2012 2:54:09 PM UTC+1, Vlad K. wrote:
>
>  
> Okay, followup on this problem. 
>
> I've now replaced one large lxml.etree root with chunked writing into a 
> tempfile. The XML view basically does this (debug heapy output included):
>
>
>
> hp = guppy.hpy()
> print "================ BEFORE TEMPFILE"
> print hp.heap()
>
> xml_model = XMLModel()
> xmlfile = tempfile.TemporaryFile()
> xmlfile.write(xml_model.begin())        # write XML "doctype" and start root tag
>
> for row in session.query(DataModel).filter(....).all():
>     # turn a single row into its XML representation, using an lxml Element
>     # to construct it and lxml.tostring(element) to return it as a string,
>     # here to be written to the tempfile
>     xmlfile.write(xml_model.process_row(row))
>
> xmlfile.write(xml_model.end())          # write closing root tag
>
> print "================ BEFORE ITERATOR RETURN"
> print hp.heap()
>
> # Return file as app iterator
> xmlfile.seek(0)
> return Response(app_iter=xmlfile, content_type="application/xml")
>
>
>
> The problems I've encountered:
>
> 1. If I use SpooledTemporaryFile, it eats up memory that is apparently 
> Python's and thus not returned to the OS, so I'll just use tmpfs to 
> minimize disk writes on tens of thousands of rows.
> 2. The whole application apparently uses 121M of memory as shown by top, 
> while guppy/heapy says 18M of heap is used. What follows is the output of 
> uwsgi.log; I've restarted the app and there was no request to it except 
> this single request for a large XML "file". The resulting output "file" 
> is 7M.
>
>
> ================ BEFORE TEMPFILE
> Partition of a set of 192674 objects. Total size = 16970368 bytes.
>  Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
>      0  80163  42  6728540  40   6728540  40 str
>      1  44155  23  1673300  10   8401840  50 tuple
>      2   7068   4  1554336   9   9956176  59 dict (no owner)
>      3  10675   6   725900   4  10682076  63 types.CodeType
>      4   1539   1   687936   4  11370012  67 type
>      5    475   0   654424   4  12024436  71 dict of module
>      6  11259   6   630504   4  12654940  75 function
>      7   1539   1   614808   4  13269748  78 dict of type
>      8   3544   2   262048   2  13531796  80 list
>      9   1817   1   235916   1  13767712  81 unicode
> <593 more rows. Type e.g. '_.more' to view.>
> ================ BEFORE ITERATOR RETURN
> Partition of a set of 194294 objects. Total size = 18709172 bytes.
>  Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
>      0  80558  41  6739812  36   6739812  36 str
>      1  44243  23  1676316   9   8416128  45 tuple
>      2   7214   4  1596080   9  10012208  54 dict (no owner)
>      3      1   0  1573008   8  11585216  62 sqlalchemy.orm.identity.WeakInstanceDict
>      4  10675   5   725900   4  12311116  66 types.CodeType
>      5   1539   1   687936   4  12999052  69 type
>      6    475   0   654424   3  13653476  73 dict of module
>      7  11264   6   630784   3  14284260  76 function
>      8   1539   1   614808   3  14899068  80 dict of type
>      9   3679   2   268960   1  15168028  81 list
> <628 more rows. Type e.g. '_.more' to view.>
>
>
> The idle, just started application is reported by top to use:
>
> VIRT: 36216
> RES: 23m
> SHR: 5276
>
>
> After single request to the above XML view:
>
> VIRT: 135m
> RES: 121m
> SHR: 3692
>
> I'm reporting the worker uwsgi process because the master parent process 
> is unchanged (23m).
>
>
> I am obviously doing something wrong or not understanding the processes 
> involved: heapy shows little change before and after processing (a total 
> heap of 18M), top says 120M is eaten by that process, and the resulting 
> XML "file" is only 7M large.
>
>
> .oO V Oo.
>
>
>  

-- 
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/pylons-discuss/-/BfJEIzNBq6oJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en.
