2008/5/19  <[EMAIL PROTECTED]>:
> Shawn,
>
>> So I reworked the filelist operation again to use StringIO initially,
>> and it worked this time. Going further, I discovered I could get a
>> little better performance using cStringIO instead.
>>
>> As such, I believe I've managed to find an acceptable solution with
>> cStringIO. After each file is written to the tar stream, I stream the
>> cStringIO contents and then truncate and add the next file to the tar
>> stream.
>
> This is an improvement.  Would you comment the code you added in
> repository.py?  I had to take a look at the source for cStringIO to
> figure out some of the details.

Sure. I actually never looked at the source for cStringIO; I just used
the pydoc.
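
For the archives, the pattern is roughly this (a minimal sketch, not
the literal repository.py code -- stream_filelist and write_chunk are
names I'm making up here):

    import cStringIO
    import tarfile

    def stream_filelist(paths, write_chunk):
        # Build the tar stream in a single reusable in-memory buffer.
        buf = cStringIO.StringIO()
        tar = tarfile.open(mode="w", fileobj=buf)
        for path in paths:
            # tarfile appends the member (header + data) to buf.
            tar.add(path)
            # Stream out whatever has accumulated so far...
            write_chunk(buf.getvalue())
            # ...then rewind and empty the buffer so it never holds
            # more than one member at a time.
            buf.reset()
            buf.truncate()
        tar.close()
        # close() writes the end-of-archive padding; flush that too.
        write_chunk(buf.getvalue())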

>> The memory footprint is going to be a little bigger than what we
>> currently have, though acceptably so in my view.
>
> Let's validate this empirically.  The cStringIO code allocates from the
> heap.  Even when we free a buffer, the memory will stay allocated in
> Python's address space.  This means that over time, the memory allocated
> by Python may gradually grow.  I'm going to run a test where we pull a
> bunch of large files from the depot. This should show us how the change
> affects our overall memory footprint.

Argh!

I had also checked pmap's output yesterday, but apparently I had
checked the *wrong* depot process.

The final pmap output today (after rechecking) shows about 43MiB at
the end of a large operation with a heap of about 34MiB.

It doesn't grow any larger on subsequent operations, but it looks like
you were right to have me check again.

Sorry about that.

> If this turns out to be a large number, we may want to consider writing
> a custom cStringIO-like object that has a hard limit on the buffer size.
> This may be a bit tricky, though.  The Python code in tarfile.py assumes
> that all operations can complete synchronously.  If we run out of space
> and block, the op will never complete.

Yes, I saw the note about that.

I'll have to experiment some with a wrapper object.
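
Something along these lines, maybe (just a sketch, assuming that
draining the buffer immediately through a callback -- rather than
blocking -- is acceptable; SpoolBuffer and flush_cb are names I'm
inventing here):

    import cStringIO

    class SpoolBuffer(object):
        """File-like object with a hard cap on buffered data.

        Instead of blocking when the cap is hit (which would deadlock
        tarfile, since it assumes every write completes synchronously),
        it drains the buffer through flush_cb and keeps accepting
        writes.
        """

        def __init__(self, flush_cb, limit=64 * 1024):
            self.flush_cb = flush_cb    # called with each drained chunk
            self.limit = limit          # hard cap on buffered bytes
            self.buf = cStringIO.StringIO()

        def write(self, data):
            self.buf.write(data)
            if self.buf.tell() >= self.limit:
                self.flush()

        def flush(self):
            data = self.buf.getvalue()
            if data:
                self.flush_cb(data)
            # Rewind and empty the buffer for reuse.
            self.buf.reset()
            self.buf.truncate()

        def tell(self):
            # tarfile tracks its own offset internally, so answering
            # as a fresh stream should be enough here.
            return 0

        def close(self):
            self.flush()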

>> By comparison, the old depot code allocated and freed a total of about
>> 1MiB *every* time the operation was performed, since it started and
>> killed a thread for every transaction.
>>
>> I obtained that information using the anonprofile.d DTrace script that
>> Brendan Gregg wrote.
>
> cStringIO allocates from the heap.  Does anonprofile track those
> allocations, or just mmap(MAP_ANON) ones?

anonprofile.d actually hooks into the kernel's
anon_resvmem/anon_unresvmem functions, so it should see both heap
growth and mmap(MAP_ANON) mappings -- both reserve anon memory.

http://www.brendangregg.com/DTrace/anonprofile.d

>> > I'd be interested to see the example that the cherrypy guys gave you, if
>> > it's handy.
>>
>> This is the example they pointed me to:
>> http://www.cherrypy.org/browser/trunk/cherrypy/test/test_conn.py?rev=1956#L282
>
> I took a look at this example.  Unless I misread the code, it looks like
> they're keeping the connection open and sending a request, reading a
> response, and then sending another request.  This doesn't fit my
> definition of pipelining, since we want to send multiple requests at
> once and then receive the responses.

I thought you might say that :-)

When I get the time, I'm going to try out their approach, but using
the pattern you want: send all of the requests up front, then read
the responses.
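
Roughly something like this (a toy sketch only -- pipelined_get is a
name I'm making up, and a real client would parse each response's
headers and body instead of just draining the socket):

    import socket

    def pipelined_get(host, paths, port=80):
        # Send every request before reading anything back; the server
        # must answer them in request order.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        for i, path in enumerate(paths):
            if i == len(paths) - 1:
                conn = "close"          # server closes when done
            else:
                conn = "keep-alive"
            sock.sendall("GET %s HTTP/1.1\r\nHost: %s\r\n"
                "Connection: %s\r\n\r\n" % (path, host, conn))
        # Drain the concatenated responses until the server closes.
        chunks = []
        while True:
            data = sock.recv(8192)
            if not data:
                break
            chunks.append(data)
        sock.close()
        return "".join(chunks)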

Cheers,
-- 
Shawn Walker

"To err is human -- and to blame it on a computer is even more so." -
Robert Orben