2008/5/19 <[EMAIL PROTECTED]>:
> Shawn,
>
>> So I reworked the filelist operation again to use StringIO initially,
>> and it worked this time. Going further, I discovered I could get a
>> little better performance using cStringIO instead.
>>
>> As such, I believe I've managed to find an acceptable solution with
>> cStringIO. After each file is written to the tar stream, I stream the
>> cStringIO contents and then truncate and add the next file to the tar
>> stream.
>
> This is an improvement. Would you comment the code you added in
> repository.py? I had to take a look at the source for cStringIO to
> figure out some of the details.
Sure. I actually never looked at the source for cStringIO; I just used
the pydoc.

>> The memory footprint is going to be a little bigger than what we
>> currently have; though acceptably so in my view.
>
> Let's validate this empirically. The cStringIO code allocates from the
> heap. Even when we free a buffer, the memory will stay allocated in
> Python's address space. This means that over time, the memory allocated
> by Python may gradually grow. I'm going to run a test where we pull a
> bunch of large files from the depot. This should show us how the change
> affects our overall memory footprint.

Argh! I had also checked pmap's output yesterday, but apparently I had
checked the *wrong* depot process. The final pmap output today (after
rechecking) shows about 43MiB at the end of a large operation, with a
heap of about 34MiB. It doesn't grow any larger on subsequent operations,
but it looks like you were right to have me check again. Sorry about
that.

> If this turns out to be a large number, we may want to consider writing
> a custom cStringIO-like object that has a hard limit on the buffer size.
> This may be a bit tricky, though. The Python code in tarfile.py assumes
> that all operations can complete synchronously. If we run out of space
> and block, the op will never complete.

Yes, I saw the note about that. I'll have to experiment some with a
wrapper object.

>> By comparison, the old depot code allocated and freed a total of about
>> 1MiB *every* time the operation is performed, since it starts and kills
>> a thread for every transaction.
>>
>> I obtained that information using the anonprofile.d DTrace script that
>> Brendan Gregg wrote.
>
> cStringIO allocates from the heap. Does anonprofile track those
> allocations, or just mmap(MAP_ANON) ones?

anonprofile.d actually hooks into the kernel's
anon_resvmem/anon_unresvmem functions:
http://www.brendangregg.com/DTrace/anonprofile.d

>> > I'd be interested to see the example that the cherrypy guys gave you,
>> > if it's handy.
>>
>> This is the example they pointed me to:
>> http://www.cherrypy.org/browser/trunk/cherrypy/test/test_conn.py?rev=1956#L282
>
> I took a look at this example. Unless I misread the code, it looks like
> they're keeping the connection open and sending a request, reading a
> response, and then sending another request. This doesn't fit my
> definition of pipelining, since we want to send multiple requests at
> once and then receive the responses.

I thought you might say that :-) When I get the time, I'll try out their
approach, except using the pattern you wanted.

Cheers,
--
Shawn Walker
"To err is human -- and to blame it on a computer is even more so." -
Robert Orben
_______________________________________________
pkg-discuss mailing list
pkg-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
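To make the distinction concrete, the pipelined pattern described above -- build all the requests up front, send them in one go, then read the responses in order -- would start roughly like this. build_pipelined_gets() is a made-up helper, not anything from cherrypy or the gate:

```python
def build_pipelined_gets(host, paths):
    # Hypothetical sketch: concatenate several GET requests into one
    # byte string so they can all be sent with a single sendall()
    # before any response is read. The last request asks the server to
    # close the connection when it's done.
    out = []
    for i, path in enumerate(paths):
        headers = ["GET %s HTTP/1.1" % path, "Host: %s" % host]
        if i == len(paths) - 1:
            headers.append("Connection: close")
        out.append("\r\n".join(headers) + "\r\n\r\n")
    return "".join(out).encode("ascii")
```

After sock.sendall(build_pipelined_gets(...)), HTTP/1.1 requires the server to return responses in request order, so the client can parse them sequentially off the same socket.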