2008/5/16  <[EMAIL PROTECTED]>:
>> I did post a question to the cherrypy users group though, so I'll see
>> if someone knows a better way.
>
> Okay, great.  Let me know if you'd like me to participate in the
> discussion on their list.
>
> I spent some time this afternoon looking at the CherryPy code and some
> of the Tools() that they've implemented.  The code that handles the
> gzipping of a stream seems awfully close to what we want to do, except
> that we actually want to bypass the stack of transforms they employ in
> the before_finalizer hooks.
>
> We may end up having to write some kind of file-like object that holds
> the lines it gets from tarfile and then writes them out to the response
> body.  I'd be interested to see if the cherrypy users group knows of a
> good way to do this.
>
> The response object has an attribute named stream (or streaming, maybe?)
> and if that gets set, the framework omits key content-length headers
> that could cause mischief.

I did post to the list, and the response I got was that what I am doing
now is really the only way.

As for the file-like object that holds the lines, I had already
experimented with that using StringIO.  However, due to an idiotic
mistake on my part, it never worked quite right.  So I reworked the
filelist operation again to use StringIO, and it worked this time.
Going further, I discovered I could get slightly better performance
using cStringIO instead.

As such, I believe I've found an acceptable solution with cStringIO.
After each file is written to the tar stream, I stream the cStringIO
contents to the client, truncate the buffer, and then add the next file
to the tar stream.  This isn't as efficient as the old approach, but it
does avoid creating a temporary file.

>> I think, unfortunately, that this will be an issue I'll have to
>> resolve somehow before this gets putback.
>
> We'll need to figure something out before putback.  The streaming is
> necessary for the production servers handling pkg.opensolaris.org.
> It'll be a problem if their memory footprint suddenly balloons.  I'll let
> Stephen comment on this more, if it's appropriate.

The memory footprint is going to be a little bigger than what we
currently have, though acceptably so in my view.  The increase in
footprint, from what I can tell, is only about one to two megabytes for
an idle depot process (meaning before and after an operation).

During an operation (such as pkg install SUNWfirefox into an *empty*
user image), anonymous memory profiling shows a total of about 30MiB
being allocated and *freed* by the new depot code.  That is only for the
first time a client does this.  Repeating the same operation (without
stopping the depot server) shows only about 500KiB being allocated and
freed.  By comparison, the old depot code allocated and freed a total of
about 1MiB *every* time the operation was performed, since it starts and
kills a thread for every transaction.
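
For anyone following along, the buffer-and-truncate approach described
earlier in this message can be sketched roughly as below.  This is only
an illustration, not the depot code: it assumes Python 3 and io.BytesIO
in place of cStringIO, and the names stream_tar and files are
hypothetical.

```python
import io
import tarfile


def stream_tar(files):
    """Yield pieces of an uncompressed tar stream as members are added.

    `files` is an iterable of (name, data) pairs, with `data` as bytes.
    Both the function and its argument are hypothetical stand-ins for
    whatever a real server would read from disk.
    """
    buf = io.BytesIO()
    # "w|" is tarfile's stream mode: it only ever writes forward and
    # never seeks in `buf`, so emptying the buffer between members is safe.
    tar = tarfile.open(mode="w|", fileobj=buf)
    for name, data in files:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        chunk = buf.getvalue()
        # Empty the buffer so it never holds more than one member's output.
        buf.seek(0)
        buf.truncate()
        # tarfile buffers internally in stream mode, so a small member
        # may not have produced any output yet.
        if chunk:
            yield chunk
    # Closing flushes tarfile's internal buffer and the end-of-archive blocks.
    tar.close()
    tail = buf.getvalue()
    if tail:
        yield tail
```

A generator like this could be handed to the framework as the response
body, so that no temporary file is needed and only the current buffer
contents are held in memory at any one time.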

I obtained the memory figures above using the anonprofile.d DTrace
script that Brendan Gregg wrote.

>> httplib2 can supposedly handle pipelined requests.
>>
>> The cherrypy guys also have an example of doing it "by hand" using the
>> existing python libraries.
>
> I took a look at httplib2, but didn't see how it handled pipelined
> requests.  It looks like it employs keep-alive, so that you can send
> multiple requests over the same socket; however, I didn't see any way to
> send multiple requests at the same time.  What did I miss?

According to the authors, you should be able to perform multiple GET
requests and then receive the data.  However, I have not found any
examples.

> I'd be interested to see the example that the cherrypy guys gave you, if
> it's handy.

This is the example they pointed me to:

http://www.cherrypy.org/browser/trunk/cherrypy/test/test_conn.py?rev=1956#L282

>> >> One of the things I struggled with while making these changes was
>> >> whether it was more efficient to pass the request and response object
>> >> around (and cleaner) or whether it was better to simply use the
>> >> singleton object to access them.
>> >
>> > My guess is that it might be faster to pass the request and response
>> > object; however, the difference probably isn't enough to be appreciable.
>>
>> I'll take a look back at the code and see how big of a change it would
>> be to do this.
>
> Unless you want to make this change, I wouldn't worry about it.

I have further reduced the imports of cherrypy.  Now only depot.py,
server/repository.py and server/transaction.py use it.

I'm going to write up a small summary of the changes I've made since I
first posted the depot code changes for review, and then post a link to
the new webrev to make it easier for others to look at.

Cheers,
--
Shawn Walker

"To err is human -- and to blame it on a computer is even more so."
- Robert Orben
_______________________________________________
pkg-discuss mailing list
pkg-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss