We are seeing a strange error in CouchDB that makes Chef unusable and 
unrecoverable. Both knife commands and the Chef web UI stop responding, and 
/var/log/couch.log shows an os_process_error with exit status 0.

This is the second time this has happened. The first time, it hit our 
chef-server, which had been running properly for several weeks. On Monday, at 
about 11 AM EST, the error occurred and our chef-server became unrecoverable. 
We spent about a day researching the issue and trying to recover.

We then rebuilt the chef-server this morning. During setup, we ran into 
CHEF-2346 (http://tickets.opscode.com/browse/CHEF-2346), which we had 
encountered before, so we applied the fix by increasing maxFieldLength in the 
mainIndex section of the Chef Solr config file.

Very shortly after that, while we were doing a chef run on a lab node, running 
a knife command, and accessing the web UI all at the same time, the 
os_process_error occurred again and the chef-server became unusable.

Our chef-server is running on a vSphere VM with 2 cores (in 1 socket) and 2GB 
of RAM. It's running Ubuntu 10.04 LTS, Chef 0.10.8 and CouchDB 0.10. The VM 
was generated from a pre-existing VM that originally had only 1 core.

Another detail about our environment that may be important: we use Centrify on 
our Linux servers for Active Directory integration, which is why we were 
affected by CHEF-2346. Because Chef collects every authorized user on a node as 
an automatic attribute, that list can contain thousands of users.
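To get a rough sense of how a large user list inflates the node document, here 
is a sketch that serializes a hypothetical node whose automatic etc/passwd 
attribute holds N users. The per-user fields (dir, gid, uid, shell, gecos) are 
an assumption loosely modeled on what Ohai's passwd plugin collects; the node 
name, user names and helper functions are made up for illustration, and real 
Centrify/AD entries may well be larger.

```python
import json


def simulated_passwd(num_users):
    """Build a hypothetical etc/passwd attribute with num_users entries.

    Assumption: each entry carries dir, gid, uid, shell and gecos, roughly
    like Ohai's passwd data. Names and values here are invented.
    """
    return {
        f"user{i:05d}": {
            "dir": f"/home/user{i:05d}",
            "gid": 10000 + i,
            "uid": 10000 + i,
            "shell": "/bin/bash",
            "gecos": f"Example User {i}",
        }
        for i in range(num_users)
    }


def node_json_bytes(num_users):
    """Return the UTF-8 size in bytes of a minimal node document as JSON."""
    node = {
        "name": "lab-node",  # hypothetical node name
        "chef_type": "node",
        "automatic": {"etc": {"passwd": simulated_passwd(num_users)}},
    }
    return len(json.dumps(node).encode("utf-8"))


if __name__ == "__main__":
    # Print how the serialized node grows as the user list grows.
    for n in (10, 1000, 5000):
        print(f"{n:>5} users -> {node_json_bytes(n) / 1024:.1f} KiB of JSON")
```

With roughly 100+ bytes per user, a few thousand users puts a single node 
document in the hundreds of kilobytes, and every chef run saves that document 
again.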

couch.log is 125,000 lines long, so I've included the beginning 
(http://pastie.org/3602674) and the end (http://pastie.org/3602677).
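Since the full log is too long to paste, a small script can report how many 
lines mention the error and where the first and last ones fall, which shows 
whether the pasted excerpts capture where the failures begin. This is a sketch 
that assumes only that each relevant line contains the literal string 
os_process_error; it makes no assumption about CouchDB's log line format, and 
the function name is hypothetical.

```python
import os
import sys


def summarize_os_process_errors(lines, needle="os_process_error"):
    """Return (count, first_line_no, last_line_no) for lines containing needle.

    Line numbers are 1-based; first and last are None when nothing matches.
    """
    count = 0
    first = last = None
    for line_no, line in enumerate(lines, start=1):
        if needle in line:
            count += 1
            if first is None:
                first = line_no
            last = line_no
    return count, first, last


if __name__ == "__main__":
    # Default to the path mentioned above; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/couch.log"
    if os.path.isfile(path):
        with open(path, encoding="utf-8", errors="replace") as fh:
            count, first, last = summarize_os_process_errors(fh)
        print(f"{count} matching lines; first at line {first}, last at line {last}")
    else:
        print(f"log file not found: {path}")
```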

I should also mention that we have since rebuilt our chef-server on Ubuntu 
11.10, which includes CouchDB 1.0.1. We have had no issues since, but we are 
very interested in getting to the root cause of this problem because we are 
still nervous.

Could CouchDB be dying because of the size of the node data we are asking Chef 
to gather? Has anyone else encountered this error? Many thanks for any help, 
and let me know if I can provide any more information.

Ian D. Rossi
