We're having a bit of a problem with couchdb views 'disappearing', and I've not been able to make headway diagnosing the problem. I'd love to have suggestions on how to isolate this.

After some time running, we start seeing errors in the log:
2010-07-12_17:59:04.82351 [info] [<0.8900.246>] 10.192.210.79 - - 'GET' /authorization/_design/objects/_view/by_type?key=%2245c4ef2de7981991a5aaf23cd7fb0bbf%22 404

Running curl against the view gives the same result:
% curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'
{"error":"not_found","reason":"missing_named_view"}
This would normally succeed.

The design documents are present and can be fetched.
% curl 'localhost:5984//authorization/_design/objects'
{"_id":"_design/objects","_rev":"1-3598677772","views":{"by_type":{"map":"function(doc) {emit(doc._id,doc.type)}"}}}
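
For anyone wanting to watch for this state automatically, here is a minimal sketch of the two curl checks above as a script. It is only an illustration: the function names (`fetch`, `classify`, `probe`) and the result labels are made up for this example, and it assumes CouchDB answers with JSON bodies on both success and error, as in the output shown above.

```python
import json
import urllib.error
import urllib.request


def fetch(url):
    """GET url and return (status, parsed JSON body).

    HTTP error responses (such as 404) are returned rather than raised.
    Assumes the server always answers with a JSON body.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read().decode("utf-8"))


def classify(view_status, view_body, ddoc_status, ddoc_body, view_name):
    """Label the combination of view response and design-document response.

    The labels are hypothetical names chosen for this sketch.
    """
    if view_status == 200:
        return "ok"
    # The design document is fetchable and still defines the view.
    view_defined = ddoc_status == 200 and view_name in ddoc_body.get("views", {})
    if (
        view_status == 404
        and view_body.get("reason") == "missing_named_view"
        and view_defined
    ):
        # The symptom described in this post.
        return "view_disappeared"
    return "other_failure"


def probe(base, db, ddoc, view):
    """Run both checks against a live server and classify the result."""
    view_status, view_body = fetch(f"{base}/{db}/_design/{ddoc}/_view/{view}?limit=0")
    ddoc_status, ddoc_body = fetch(f"{base}/{db}/_design/{ddoc}")
    return classify(view_status, view_body, ddoc_status, ddoc_body, view)
```

Calling `probe("http://localhost:5984", "authorization", "objects", "by_type")` from a cron job or a loop would report `view_disappeared` when the view returns `missing_named_view` while the design document still defines it, which could help pin down when the failure starts relative to the log entries.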

CouchDB never recovers on its own; restarting it fixes the problem. The problem recurs fairly consistently, but we've had trouble reproducing it cleanly; synthetic CouchDB workloads do not seem to trigger it.

The only pattern we've been able to spot is that the failures seem to happen right after a series of rapid updates to a document indexed by the view in question. We see a bunch of 'PUT' entries, followed by 'checkpointing view update at seq XXX for authorization'. This is happening across all of our databases, but seems connected with load.
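
The update pattern described above could be approximated with something like the sketch below. This is an assumption-laden illustration, not a known reproduction: synthetic workloads have not triggered the problem so far, so this may well not trigger it either. The helper names and the `burst_counter` field are hypothetical; the only CouchDB behaviour relied on is that a document PUT returns a JSON body containing the new `rev`.

```python
import json
import urllib.parse
import urllib.request


def put_doc(base, db, doc):
    """PUT one revision of doc and return the new revision from CouchDB's reply."""
    doc_id = urllib.parse.quote(doc["_id"], safe="")
    req = urllib.request.Request(
        f"{base}/{db}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["rev"]


def rapid_updates(put, doc, count):
    """Apply `count` back-to-back updates to one document.

    `put` is any callable that stores the document and returns its new
    revision; each returned revision is threaded into the next update.
    """
    current = dict(doc)
    for i in range(count):
        # Hypothetical field, present only so that every revision differs.
        current["burst_counter"] = i
        current["_rev"] = put(current)
    return current
```

For example, `rapid_updates(lambda d: put_doc("http://localhost:5984", "authorization", d), {"_id": "burst-test", "type": "test"}, 50)` would write 50 revisions of a new test document in quick succession; querying the `by_type` view immediately afterwards would show whether the 404 appears. To update an existing document, its current `_rev` would need to be included in the starting dictionary.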

* Using native erlang views does not seem to prevent the problem, just defer it a bit.
* There is plenty of disk space; 30GB used in a 100GB partition. None of the databases are larger than 1GB, but some of the views get very large (12GB or more).

We're running CouchDB 0.11.0; Ubuntu 10.04. I've not yet been able to repro the problem in 1.0.0, and will try 0.11.1 as soon as I give up on breaking 1.0.0.

I have an EC2 machine in this state, so if anyone has suggestions for diagnostics to run, I'd be glad to poke at it a bit.


