(Migrated to user@)

On Thu, Aug 25, 2011 at 4:05 AM, Chris Stockton <[email protected]> wrote:
> Hello,
>
> On Wed, Aug 24, 2011 at 1:53 PM, Paul Davis <[email protected]> wrote:
>> I bet you're hitting a bug we just recently fixed on trunk. Basically,
>> there was a possibility that errors in some of the JS functions would
>> end up causing a couchjs process to be come unusable without removing
>> it from the list. Eventually there wouldn't be any spots left and
>> clients would get timeouts like you see.
>>
>> Patch is at [1]. If it doesn't apply cleanly, you really only need the
>> bits from couch_os_process.erl and couch_query_servers.erl. The rest
>> is just test code.
>>
>> https://github.com/apache/couchdb/commit/95da6f6f4246d2e8e86a3cf92ddf6487e46c10e9
>>
>
> Right after sending this I finally saw what the issue was, our bug
> report here:
> https://issues.apache.org/jira/browse/COUCHDB-1257?focusedCommentId=13090484#comment-13090484
> as a side effect was leaving lingering processes ultimately leading to
> instability of couch.
>
> We are working to patch our reduce issue, and will look at applying
> that commit perhaps once it hits mainline couch?

Chris,

I'm glad those bugs are identified and will be fixed soon. But in the
meantime, perhaps you can change your code to add robustness? I can
think of two ideas.

1. CouchDB has an odd, idiosyncratic feature where GET queries produce
side effects. As far as HTTP is concerned, there are no side effects,
but as you can see, GETting a view can spawn couchjs processes and write
files to disk. Perhaps you could add ?stale=ok to all of the queries you
use in production. To my knowledge, stale=ok guarantees that couchjs
will not be involved in servicing that query. This protects your users
from seeing map/reduce errors. The downside is that you must of course
query the views yourself (without stale=ok) to keep them current.

2. Design documents should never be published (used in production)
until their views are fully built. This is not a CouchDB bug, but rather
a lack of tooling. The technique is pretty simple. Publish your design
document under an alternative id: _design/example_staging. Next, query
the views (which you are conveniently already doing in step 1!). When
the views are fresh, with no bugs and everything looks good, use HTTP
COPY to promote _design/example_staging to _design/example. That is an
atomic software upgrade, and the views will be ready instantly. Not bad!

Perhaps these ideas will help you work around your bugs. IMO, they are
good general policies anyway.

-- 
Iris Couch
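
P.S. In case it helps, here is a minimal sketch of the two ideas in Python, using only the standard library. It only builds the URLs and the COPY request; nothing is sent. The helper names, the server address, the database name "mydb" and the view name "by_date" are all made up for illustration. The Destination header format (a document id, plus ?rev=... when the target already exists) reflects my understanding of CouchDB's COPY support, so please check it against your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def view_url(base, db, ddoc, view, stale_ok=True, **params):
    """Build a view query URL (hypothetical helper).

    With stale_ok=True the query reads whatever index already exists,
    so it should not trigger an index update. With stale_ok=False it is
    the "warm-up" query that brings the view up to date.
    """
    query = dict(params)
    if stale_ok:
        query["stale"] = "ok"
    path = "/".join(quote(part, safe="") for part in (db, "_design", ddoc, "_view", view))
    url = f"{base}/{path}"
    return f"{url}?{urlencode(query)}" if query else url


def promote_request(base, db, staging, live, live_rev=None):
    """Build (but do not send) the COPY request that promotes a staging
    design document to its live id (hypothetical helper).

    live_rev is the current revision of the live design document; it is
    assumed to be required when the live document already exists.
    """
    destination = f"_design/{live}"
    if live_rev is not None:
        destination += f"?rev={live_rev}"
    url = f"{base}/{quote(db, safe='')}/_design/{quote(staging, safe='')}"
    return Request(url, method="COPY", headers={"Destination": destination})


if __name__ == "__main__":
    base = "http://localhost:5984"  # assumed local CouchDB address

    # Idea 1: production reads never wait on (or break in) couchjs.
    print(view_url(base, "mydb", "example", "by_date"))

    # Idea 2: warm the staging views, then promote with COPY.
    print(view_url(base, "mydb", "example_staging", "by_date", stale_ok=False, limit=1))
    req = promote_request(base, "mydb", "example_staging", "example")
    print(req.get_method(), req.full_url, req.get_header("Destination"))
    # To actually send it: urllib.request.urlopen(req)
```

The warm-up query uses limit=1 only to keep the response small; the point is that a query without stale=ok makes CouchDB bring the staging views up to date before you promote.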
