Hello,

I am using CouchDB 0.9.0 to store logs from my mail server. Logs are
sent from the mail servers to a RabbitMQ queue server. A Python program
fetches each entry from RabbitMQ, converts it to JSON, and inserts it
into CouchDB using the couchdb module (from couchdb import *). The
entire history of each email transaction is stored in a single
document. I also have multiple RabbitMQ clients, each pulling from the
same queue and updating the same CouchDB database. This means the same
document has to be updated from different clients several times during
the lifetime of an email message.

To do this I use the message ID of each mail transaction (it appears in
every log entry) as the document's key. When the first log entry for a
message arrives, I check whether a document with that key exists in
the database, and if not, I create one. When the second entry arrives,
I fetch the document, convert it to a hash table in my program, merge
the new log entry into the hash table, and update the document with the
hash table's JSON. If a conflict occurs, the program retries: it
fetches the document again, merges, and stores it until the update goes
through without a conflict.
This means for every write there is a corresponding read.
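
The read-merge-write loop described above can be sketched as follows.
This is not the Python program itself; it is a minimal JavaScript
illustration that assumes a hypothetical in-memory store standing in
for the database. Like CouchDB, the store rejects an update whose _rev
does not match the current revision. The names InMemoryStore,
ConflictError and recordLogEntry, the shallow-merge rule, the simplified
"N-x" revision strings, and the retry limit are all assumptions made for
the sketch.

```javascript
// Hypothetical stand-in for a CouchDB database. Like CouchDB, it rejects
// an update whose _rev does not match the stored revision.
class ConflictError extends Error {}

class InMemoryStore {
  constructor() {
    this.docs = new Map();
  }
  get(id) {
    const doc = this.docs.get(id);
    // Return a shallow copy so callers cannot replace stored fields directly.
    return doc ? { ...doc } : null;
  }
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      throw new ConflictError(`conflict on ${doc._id}`);
    }
    // Simplified revision string; real CouchDB revisions are "N-<hash>".
    const revNum = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const stored = { ...doc, _rev: `${revNum}-x` };
    this.docs.set(doc._id, stored);
    return stored._rev;
  }
}

// Fetch the document for this message ID (or start a new one), merge the
// log entry's fields into it, and retry from a fresh read on conflict.
function recordLogEntry(store, messageId, entry, maxRetries = 10) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = store.get(messageId) || { _id: messageId };
    // Shallow merge (assumption): later entries overwrite earlier values.
    const merged = { ...doc, ...entry, _id: messageId, _rev: doc._rev };
    try {
      return store.put(merged);
    } catch (err) {
      if (!(err instanceof ConflictError)) throw err;
      // Another client updated the document first; loop to re-read it.
    }
  }
  throw new Error(`gave up on ${messageId} after ${maxRetries} conflicts`);
}
```

Bounding the retries with maxRetries is an added assumption; the loop
described in the post keeps retrying until the write succeeds.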

Currently I am running it as a pilot, with just a single server logging
to CouchDB. That amounts to about 0.75 GB per day right now, with
GET/PUT requests happening almost continuously (roughly 1-2 per second).

Previously I had a test server running, and I tested a couple of
map/reduce views against that database (about 5 MB).

Now, after logging from a single production machine, I have not been
able to run a single view. If I wait long enough, I get the following
error:

    Error: case_clause
    {{bad_return_value,{os_process_error,"OS process timed out."}},
     {gen_server,call,
      [<0.436.0>,
       {prompt,[<<"rereduce">>,
                [<<"function(keys, values)\n{\n return values;\n}">>],.....

I have raised os_process_timeout to 50000 and removed the reduce
function, but even after about 6 hours the map has not finished. The
database is currently 3.6 GB.
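
The rereduce function in the trace above returns values unchanged. A
reduce is meant to return something much smaller than its input;
passing the list through means each rereduce call hands back every map
value in the range it covers, so nothing is ever reduced. As an
illustration only, assuming a count of rows per client IP is the kind
of summary wanted (the post does not say what the reduce was for), a
reduce that actually shrinks its input might look like this. The third
rereduce argument is part of CouchDB's JavaScript reduce signature; the
name countReduce is hypothetical and exists only so the function can be
called from a test. In a design document, only the function itself
would be stored as the reduce source. The code sticks to ES5 syntax
because the view server in this era is an old SpiderMonkey.

```javascript
// Hypothetical counting reduce: returns one number per key range instead
// of passing the list of values through unchanged.
var countReduce = function (keys, values, rereduce) {
  if (rereduce) {
    // Here values are counts returned by earlier reduce calls; sum them.
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  // First pass: one value per map row, so the count is values.length.
  return values.length;
};
```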

The map function I am using is:

    function(doc) {
      if ("msgtype" in doc) {
        if (doc.msgtype == "allow") {
          if ((doc.event == "action_allowed_ip") ||
              (doc.event == "action_allow_new")) {
            // Declared with var so result and ip stay local; without var
            // they are created as globals in the view server.
            var result = {};
            var ip = doc.parameters.client_address;
            result["helo"] = doc.parameters.helo_name;
            result["event"] = doc.event;
            result["timestamp"] = doc.timestamp;
            result["id"] = doc._id;
            result["from"] = doc.parameters.sender;
            result["to"] = doc.parameters.recipient;
            // One row per allowed message, keyed by client IP.
            emit(ip, result);
          }
        }
      }
    }
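
Before pointing a view at the full 3.6 GB database, one way to check
the map logic is to run it outside CouchDB on a handful of documents.
The sketch below is hypothetical: it defines a stub emit that only
collects rows, a condensed copy of the map function above (named
allowMap purely so it can be called), and made-up sample documents. It
checks what the function emits, not how fast CouchDB indexes.

```javascript
// Stub emit(): collects rows instead of writing to a view index.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as the map function above, with locals declared via var.
var allowMap = function (doc) {
  if ("msgtype" in doc && doc.msgtype == "allow" &&
      (doc.event == "action_allowed_ip" || doc.event == "action_allow_new")) {
    var p = doc.parameters;
    emit(p.client_address, {
      helo: p.helo_name,
      event: doc.event,
      timestamp: doc.timestamp,
      id: doc._id,
      from: p.sender,
      to: p.recipient
    });
  }
};

// Made-up sample documents: only msg-1 should produce a row.
var sampleDocs = [
  {
    _id: "msg-1", msgtype: "allow", event: "action_allow_new", timestamp: "t1",
    parameters: { client_address: "192.0.2.10", helo_name: "mx.example.org",
                  sender: "a@example.org", recipient: "b@example.net" }
  },
  { _id: "msg-2", msgtype: "allow", event: "some_other_event" },
  { _id: "msg-3" }
];

sampleDocs.forEach(function (doc) { allowMap(doc); });
```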

According to top, couchjs is the most active process; its line
currently reads:

      PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
    11410 root  20   0 90752  27m  752 R   76  0.7   1235:05 couchjs

My hardware is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz with 4 GB
of RAM and a single SATA hard disk. I don't think this is the
performance CouchDB is expected to deliver, so is there something I am
doing wrong? Any tips for bringing performance up to acceptable levels?

Thanks and best regards,
raj