Shawn,

> So I reworked the filelist operation again to use StringIO initially,
> and it worked this time. Going further, I discovered I could get a
> little better performance using cStringIO instead.
>
> As such, I believe I've managed to find an acceptable solution with
> cStringIO. After each file is written to the tar stream, I stream the
> cStringIO contents and then truncate and add the next file to the tar
> stream.
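If I'm reading that right, the loop looks roughly like the sketch
below. The names here (paths, write_out) are placeholders I've made up,
not the actual repository.py code:

    # Rough sketch of the buffering scheme as described above; paths and
    # write_out are made-up placeholders.
    import cStringIO
    import tarfile

    def stream_filelist(paths, write_out):
        buf = cStringIO.StringIO()
        tar = tarfile.open(mode="w", fileobj=buf)
        for path in paths:
            # tarfile appends this member's header and data to buf.
            tar.add(path)
            # Send whatever has accumulated, then reclaim the space
            # before the next member is written.
            write_out(buf.getvalue())
            buf.seek(0)
            buf.truncate()
        tar.close()
        # close() writes the end-of-archive blocks; send those as well.
        write_out(buf.getvalue())

As far as I can tell, tarfile never seeks the fileobj in write mode,
which is what makes it safe to reclaim the buffer between members.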
This is an improvement. Would you comment the code you added in
repository.py? I had to take a look at the source for cStringIO to
figure out some of the details.

> The memory footprint is going to be a little bigger than what we
> currently have; though acceptably so in my view.

Let's validate this empirically. The cStringIO code allocates from the
heap, and even when we free a buffer, the memory stays allocated in
Python's address space. This means that over time, the memory allocated
by Python may gradually grow. I'm going to run a test where we pull a
bunch of large files from the depot; that should show us how the change
affects our overall memory footprint.

If this turns out to be a large number, we may want to consider writing
a custom cStringIO-like object that has a hard limit on the buffer
size. This may be a bit tricky, though: the Python code in tarfile.py
assumes that all operations complete synchronously, so if we run out of
space and block, the operation will never complete.

> By comparison, the old depot code allocated and freed a total of about
> 1MiB *every* time the operation is performed since it starts and kills
> a thread for every transaction.
>
> I obtained that information using the anonprofile.d DTrace script that
> Brendan Gregg wrote.

cStringIO allocates from the heap. Does anonprofile track those
allocations, or just mmap(MAP_ANON) ones?

> > I'd be interested to see the example that the cherrypy guys gave
> > you, if it's handy.
>
> This is the example they pointed me to:
> http://www.cherrypy.org/browser/trunk/cherrypy/test/test_conn.py?rev=1956#L282

I took a look at this example. Unless I misread the code, they're
keeping the connection open, sending a request, reading the response,
and then sending another request. That doesn't fit my definition of
pipelining, since we want to send multiple requests at once and then
receive the responses (see the sketch in the postscript below).

-j
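P.S. To make sure we mean the same thing, here is roughly the
distinction I'm drawing. This is a hypothetical sketch, not the cherrypy
test code; the host, port, and response handling are placeholders.

    # Hypothetical sketch: serial keep-alive requests (what the cherrypy
    # example appears to do) versus pipelined requests (what we want).
    import httplib
    import socket

    HOST = "localhost"   # placeholder depot host
    PORT = 80            # placeholder depot port

    def serial_keepalive(paths):
        # One request, one response, then the next request, all on the
        # same connection.
        conn = httplib.HTTPConnection(HOST, PORT)
        for p in paths:
            conn.request("GET", p)
            conn.getresponse().read()
        conn.close()

    def pipelined(paths):
        # All requests go out before any response is read; responses
        # then come back in order.  Response parsing is omitted here.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((HOST, PORT))
        for p in paths:
            s.sendall("GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (p, HOST))
        data = s.recv(65536)
        s.close()
        return data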