Hi,
The total number of docs is 341948, growing at a rate of about a couple
per second. I know that first-time indexing takes some time, but I have
not yet been able to complete even a single map run so far. I am now
trying a simpler map with no reduce:
function(doc) {
  if ("msgtype" in doc) {
    if (doc.msgtype == "allow") {
      if ((doc.event == "action_allowed_ip") ||
          (doc.event == "action_allow_new")) {
        // key on the client IP, as in the original map quoted below
        emit(doc.parameters.client_address, 1);
      }
    }
  }
}
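
A map like the one above can be sanity-checked outside CouchDB by
running it against a few hand-written docs with a stubbed emit(). This
is only a minimal sketch, assuming Node.js: the runMap helper and the
sample docs are made up for illustration, and it assumes the key should
be doc.parameters.client_address, as in the original map quoted below.

```javascript
// Hypothetical local harness, not part of CouchDB.
var rows = [];

// CouchDB's view server supplies emit() to map functions; stub it here.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same filter as the simpler map, keyed on the client IP (assumption).
function simpleMap(doc) {
  if ("msgtype" in doc) {
    if (doc.msgtype == "allow") {
      if ((doc.event == "action_allowed_ip") ||
          (doc.event == "action_allow_new")) {
        emit(doc.parameters.client_address, 1);
      }
    }
  }
}

// Run a map function over an array of docs and return the emitted rows.
function runMap(mapFn, docs) {
  rows = [];
  docs.forEach(function (doc) {
    mapFn(doc);
  });
  return rows;
}

// Made-up sample docs; real log docs carry more fields.
var sampleDocs = [
  { _id: "a", msgtype: "allow", event: "action_allowed_ip",
    parameters: { client_address: "10.0.0.1" } },
  { _id: "b", msgtype: "deny", event: "action_denied",
    parameters: { client_address: "10.0.0.2" } },
  { _id: "c", msgtype: "allow", event: "action_allow_new",
    parameters: { client_address: "10.0.0.3" } },
  { _id: "d" }
];

var result = runMap(simpleMap, sampleDocs);
console.log(JSON.stringify(result));
```

Only docs "a" and "c" should produce rows. A map that throws on a
sample doc here would most likely fail the same way inside couchjs.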
On Wed, Oct 21, 2009 at 8:25 PM, Alex P <[email protected]> wrote:
> how many docs is that, and have you run the view incrementally? first time
> index builds are painful...
>
> On Wed, Oct 21, 2009 at 9:46 AM, Rajkumar S <[email protected]> wrote:
>
>> Hello,
>>
>> I am using CouchDB 0.9.0 for storing logs from my mail server.
>>
>> Logs are sent from the mail servers to a RabbitMQ queue server. Log
>> insertion into couchdb is done by a python program, after fetching
>> each entry from RabbitMQ and converting it to JSON, using the couchdb
>> module (from couchdb import *). I keep a single document per email
>> message, storing the entire history of its transactions. I also have
>> multiple RabbitMQ clients, each pulling from the same queue and
>> updating the same couchdb. This means I have to update the same
>> document from different clients several times during the lifetime of
>> an email message.
>>
>> To do this I use the message id of each mail transaction (which
>> appears in every log entry) as its key. When the first log entry
>> arrives I check whether a doc with that key is present in the db; if
>> not, I create a new doc with that key. When the second log entry
>> arrives I fetch the doc, convert it to a hash table in my program,
>> merge the new log entry into the hash table, and update the doc with
>> the updated hash table's JSON. If a conflict occurs, the program
>> retries, fetching the doc, updating it, and storing it again until
>> the conflict is resolved.
>>
>> This means for every write there is a corresponding read.
>>
>> Currently I am running it as a pilot and have just a single server
>> logging to couchdb. I have about 0.75 GB per day right now, with
>> GET/PUT happening almost continuously (say 1 - 2 per second).
>> Previously I had a test server running, and I tested a couple of
>> map/reduce views using that DB (about 5 MB).
>>
>> Now after logging from a single production machine I am not able to
>> run a single view so far. I get the following error if I wait long
>> enough:
>>
>> Error: case_clause
>>
>> {{bad_return_value,{os_process_error,"OS process timed out."}},
>> {gen_server,call,
>> [<0.436.0>,
>> {prompt,[<<"rereduce">>,
>> [<<"function(keys, values)\n{\n return
>> values;\n}">>],.....
>>
>> I have changed os_process_timeout to 50000, removed the reduce part
>> but even after about 6 hours my map is not yet finished. Currently the
>> db size is 3.6G
>>
>> The map function I am using is:
>>
>> function(doc) {
>>   if ("msgtype" in doc) {
>>     if (doc.msgtype == "allow") {
>>       if ((doc.event == "action_allowed_ip") ||
>>           (doc.event == "action_allow_new")) {
>>         // declare locals with var to avoid implicit globals
>>         var result = {};
>>         var ip = doc.parameters.client_address;
>>         result["helo"] = doc.parameters.helo_name;
>>         result["event"] = doc.event;
>>         result["timestamp"] = doc.timestamp;
>>         result["id"] = doc._id;
>>         result["from"] = doc.parameters.sender;
>>         result["to"] = doc.parameters.recipient;
>>         emit(ip, result);
>>       }
>>     }
>>   }
>> }
>>
>> top shows that couchjs is the most active process; right now it
>> shows the following line:
>> 11410 root 20 0 90752 27m 752 R 76 0.7 1235:05 couchjs
>>
>> My hardware is an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz, 4 GB RAM
>> and one SATA hard disk. I do not think this is the expected
>> performance of couchdb, so is there something I am doing wrong? Any
>> tips to enhance the performance to acceptable levels?
>>
>> thanks and much regards,
>>
>> raj
>>
>