Hi,

Just a few quick notes on running Shindig in a high-volume production environment. As you may know, hi5 launched our OpenSocial platform on March 31st, and we've been at 100% since last Thursday. Shindig has been doing a great job for us.
* On a single server instance we were able to push Shindig into the 500 req/sec range. Beyond that, we saw request latency climb higher than we found acceptable.
* If you're careful about your iframe URLs you can get a near-100% cache hit ratio. Our iframes only vary on country/lang/userprefs, and we have low usage of userprefs. Shindig has good support for generating and processing URLs with a version param (v=...).
* Using a CDN or caching reverse proxy in front of your Shindig server gives good results, even with all the uncacheable requests for /gadgets/socialdata and /gadgets/proxy. We are seeing a 30% hit rate on our cache. Making sure we gzipped requests outbound and inbound helped too. We had an outbound HTTP proxy for Shindig requests to outside hosts for a while, but we've removed it for now: we were only seeing a 2% hit rate on requests, and it added a lot of latency, slowing everything down. We'll probably add it back at some point, since a proxy of this sort can hold onto stale content when the originating site is down.
* We wrote our own implementations of the social data interfaces, the content fetch interface, and the gadget signer interface. This all predates the Guice patch. For ContentFetch we retrieve the appXML from a local memcache instance, since we already grab and parse the appXML in our developer console. For GadgetSigner we use our custom RSA key. For the SocialData API we integrated the various calls with our existing User/Friends/Activity services. We spent a bit of time optimizing the People fetches to only convert users that fall into the result set. The trick is to convert filters into appropriate backend calls to reduce the data set size, and then to intelligently select only the actual users to convert based on first/max/sort. We also converted a custom hi5 extension to use a new socialdata call for photo albums, and added our experimental 'presence' field to Person to show ONLINE/OFFLINE status.
* The built-in Fetch memory cache is mostly useless (and should probably be removed). Because of it, we spent a couple of days chasing red herrings on why apps would not get refreshed.
* The patches to add REFRESH_INTERVAL to gadgets.io.makeRequest() helped our cacheability a lot. It appears that more and more people are using signed requests, which will really hurt cacheability. We regenerate the security token on every request; that might have to change...
* Prefetch of data would go a long way towards making Shindig scale better. Right now requests for /gadgets/socialdata dominate, and for profile pages with multiple apps you'll see the same data fetched multiple times.
* The default five-second timeout on proxy requests scares me. If one of our major partners starts getting slow we'll stack up a lot of threads...

You can see the hi5 modifications to Shindig at http://www.hi5networks.com/platform/browser/shindig/
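
For anyone curious about the People fetch point above, here is a minimal Java sketch of the general idea, not the actual hi5 or Shindig code. It assumes the backend has already applied the filters and returned only a list of matching user ids, so sorting and paging happen on cheap ids and the expensive conversion runs only for the users on the requested page. The names PeoplePager, Person, fetchPage, and the converter callback are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: page over filtered user ids before converting them. */
final class PeoplePager {
    private PeoplePager() {}

    /** Stand-in for an OpenSocial Person; not Shindig's real class. */
    public static final class Person {
        public final long id;
        public final String displayName;

        public Person(long id, String displayName) {
            this.id = id;
            this.displayName = displayName;
        }
    }

    /**
     * Returns the requested page of people.
     *
     * @param filteredIds ids already narrowed down by backend filter calls (assumption)
     * @param order       sort order applied to the ids before paging
     * @param first       zero-based index of the first result wanted
     * @param max         maximum number of results wanted
     * @param converter   expensive id-to-Person conversion (hypothetical callback)
     */
    public static List<Person> fetchPage(List<Long> filteredIds,
                                         Comparator<Long> order,
                                         int first,
                                         int max,
                                         Function<Long, Person> converter) {
        // Sort a copy so the caller's list is left untouched.
        List<Long> sorted = new ArrayList<>(filteredIds);
        sorted.sort(order);

        // Clamp the window to the available range; use long math to avoid overflow.
        int from = Math.min(Math.max(first, 0), sorted.size());
        int to = (int) Math.min((long) from + Math.max(max, 0), (long) sorted.size());

        // Convert only the ids inside the window, never the whole result set.
        List<Person> page = new ArrayList<>(to - from);
        for (Long id : sorted.subList(from, to)) {
            page.add(converter.apply(id));
        }
        return page;
    }
}
```

With five matching ids, first=1 and max=2, the converter runs exactly twice instead of five times. The saving grows with the size of the friend list, which is where most of the People fetch cost tends to sit.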

