Shawn,

> So I reworked the filelist operation again to use StringIO initially,
> and it worked this time. Going further, I discovered I could get a
> little better performance using cStringIO instead.
> 
> As such, I believe I've managed to find an acceptable solution with
> cStringIO. After each file is written to the tar stream, I stream the
> cStringIO contents and then truncate and add the next file to the tar
> stream.

This is an improvement.  Would you comment the code you added in
repository.py?  I had to take a look at the source for cStringIO to
figure out some of the details.
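
For reference, here's roughly the pattern as I understand it (a quick
sketch in Python 2 with cStringIO; the names are illustrative, not the
actual repository.py code):

    import cStringIO
    import tarfile

    def stream_files(paths, out):
        # The tar stream is written into an in-memory buffer rather
        # than a real file.
        buf = cStringIO.StringIO()
        tar = tarfile.open(mode="w", fileobj=buf)
        for path in paths:
            f = open(path, "rb")
            # tarfile writes this member's header and data into buf.
            tar.addfile(tar.gettarinfo(path), f)
            f.close()
            # Send what has accumulated so far downstream, then reset
            # the buffer so it never holds more than one member.
            out.write(buf.getvalue())
            buf.truncate(0)
        # close() appends the trailing NUL blocks; flush those too.
        tar.close()
        out.write(buf.getvalue())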

> The memory footprint is going to be a little bigger than what we
> currently have; though acceptably so in my view.

Let's validate this empirically.  The cStringIO code allocates from the
heap.  Even when we free a buffer, the memory will stay allocated in
Python's address space.  This means that over time, the memory allocated
by Python may gradually grow.  I'm going to run a test where we pull a
bunch of large files from the depot.  This should show us how the change
affects our overall memory footprint.
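
Something along these lines is what I have in mind, as a rough sketch
(the real test will pull large files through the depot server; the
sizes here are made up, and ru_maxrss may not be populated on every
platform, so pmap against the depot process is a reasonable fallback):

    import cStringIO
    import resource

    def peak_rss():
        # Peak resident set size of this process; units vary by
        # platform (commonly kilobytes).
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    print "before:", peak_rss()
    buf = cStringIO.StringIO()
    for i in range(50):
        # Simulate buffering a large file, then truncating it away.
        buf.write("x" * (10 * 1024 * 1024))
        buf.truncate(0)
    print "after:", peak_rss()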

If this turns out to be a large number, we may want to consider writing
a custom cStringIO-like object that has a hard limit on the buffer size.
This may be a bit tricky, though.  The Python code in tarfile.py assumes
that all operations can complete synchronously.  If we run out of space
and block, the op will never complete.  
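
If we go that route, I'm picturing something like the following (a
rough, untested sketch; because of the synchronous assumption in
tarfile.py, the write has to fail rather than block when the cap is
hit):

    import cStringIO

    class BoundedBuffer(object):
        """A cStringIO-like object with a hard cap on buffered bytes."""

        def __init__(self, limit):
            self._buf = cStringIO.StringIO()
            self._limit = limit

        def write(self, data):
            # tarfile.py expects write() to complete synchronously, so
            # overflowing the cap raises rather than waiting for the
            # consumer to drain the buffer.
            if self._buf.tell() + len(data) > self._limit:
                raise IOError("buffer limit exceeded")
            self._buf.write(data)

        def tell(self):
            return self._buf.tell()

        def getvalue(self):
            return self._buf.getvalue()

        def truncate(self, size=0):
            self._buf.truncate(size)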

> By comparison, the old depot code allocated and freed a total of about
> 1MiB *every* time the operation is performed since it starts and kills
> a thread for every transaction.
> 
> I obtained that information using the anonprofile.d DTrace script that
> Brendan Gregg wrote.

cStringIO allocates from the heap.  Does anonprofile track those
allocations, or just mmap(MAP_ANON) ones?

> > I'd be interested to see the example that the cherrypy guys gave you, if
> > it's handy.
> 
> This is the example they pointed me to:
> http://www.cherrypy.org/browser/trunk/cherrypy/test/test_conn.py?rev=1956#L282

I took a look at this example.  Unless I misread the code, it looks like
they're keeping the connection open and sending a request, reading a
response, and then sending another request.  This doesn't fit my
definition of pipelining, since we want to send multiple requests at
once and then receive the responses.
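
To be concrete about the distinction (a bare-bones sketch; the host
and paths are placeholders, and real client code would parse the
responses properly):

    import socket

    # Pipelining: both requests are written before anything is read.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("depot.example.org", 80))
    sock.sendall("GET /first HTTP/1.1\r\nHost: depot.example.org\r\n\r\n"
                 "GET /second HTTP/1.1\r\nHost: depot.example.org\r\n\r\n")
    # Only now do we start reading; the server should return both
    # responses, in order, on the same connection.
    data = sock.recv(65536)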

-j
_______________________________________________
pkg-discuss mailing list
pkg-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss