We're having a problem with CouchDB views 'disappearing', and I've not
been able to make headway diagnosing it. I'd love to have suggestions
on how to isolate this.
After some time running, we start seeing errors in the log:
2010-07-12_17:59:04.82351 [info] [<0.8900.246>] 10.192.210.79 - - 'GET'
/authorization/_design/objects/_view/by_type?key=%2245c4ef2de7981991a5aaf23cd7fb0bbf%22
404
Running curl against the same view returns the same error:
% curl 'localhost:5984//authorization/_design/objects/_view/by_type?limit=0'
{"error":"not_found","reason":"missing_named_view"}
This request normally succeeds.
The design documents are still present and can be fetched:
% curl 'localhost:5984//authorization/_design/objects'
{"_id":"_design/objects","_rev":"1-3598677772","views":{"by_type":{"map":"function(doc)
{emit(doc._id,doc.type)}"}}}
CouchDB never recovers on its own; restarting CouchDB fixes the problem.
The problem recurs fairly consistently, but we've had trouble
reproducing it cleanly: synthetic CouchDB workloads do not seem to
trigger it.
The only pattern we've been able to spot is that the errors seem to
start right after a series of rapid updates to a document indexed by
the view in question. In the log we see a bunch of 'PUT' entries, and
'checkpointing view update at seq XXX for authorization'. This is
happening across all of our databases, but seems connected with load.
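
For anyone who wants to try to trigger this, here is a rough sketch of
the pattern described above: update one document rapidly and query the
view after each PUT, stopping as soon as the view query returns the
'missing_named_view' 404. This is not the workload we run, only an
approximation of it. The function names, the "probe" document contents
and the rounds parameter are made up for illustration; the URL layout
and the error body are taken from the curl output above.

```python
import json
import urllib.error
import urllib.request


def view_url(base, db, ddoc, view, limit=0):
    """Build the same view URL as the curl check (limit=0 by default)."""
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?limit={limit}"


def is_missing_view(status, body):
    """True if the response is the 404 'missing_named_view' failure."""
    if status != 404:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("reason") == "missing_named_view"


def http(method, url, data=None):
    """Send a request; return (status, body) even for HTTP error statuses."""
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def hammer_and_check(base, db, doc_id, ddoc, view, rounds=1000):
    """Rapidly update one document, querying the view after every PUT.

    Returns the round at which the view went missing, or None if it
    never did. The document body is a hypothetical placeholder.
    """
    doc_url = f"{base}/{db}/{doc_id}"
    rev = None
    # Pick up the current revision if the document already exists.
    status, body = http("GET", doc_url)
    if status == 200:
        rev = json.loads(body)["_rev"]
    for i in range(rounds):
        doc = {"type": "probe", "counter": i}
        if rev is not None:
            doc["_rev"] = rev
        status, body = http("PUT", doc_url, json.dumps(doc).encode("utf-8"))
        if status in (200, 201, 202):
            rev = json.loads(body)["rev"]
        status, body = http("GET", view_url(base, db, ddoc, view))
        if is_missing_view(status, body):
            return i
    return None
```

Something like hammer_and_check("http://localhost:5984", "authorization",
"probe-doc", "objects", "by_type") against a scratch copy of a database
would exercise it. Given that our own synthetic workloads have not
triggered the problem, there is no guarantee this does either.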
* Using native Erlang views does not seem to prevent the problem; it
only defers it a bit.
* There is plenty of disk space; 30GB used in a 100GB partition. None of
the databases are larger than 1GB, but some of the views get very large
(12GB or more).
We're running CouchDB 0.11.0 on Ubuntu 10.04. I've not yet been able to
reproduce the problem in 1.0.0, and will try 0.11.1 as soon as I give
up on breaking 1.0.0.
I have an EC2 machine in this state, so if anyone has suggestions of
diagnostics to run or the like, I'd be glad to poke at it a bit.