Hi all,
I have been evaluating CouchDB for a project and was running some performance
tests; one of them measures how fast a document can be fetched by id.
I wrote a small Python script that loads about a million documents
(very simple - {_id, value}). For the test I assigned the ids myself instead of
using the uuids CouchDB generates; the ids run from 0 up to a million.
After the loading was done, I ran another Python script that tries to get each
of the million documents - it ran for a few hours before I killed it.
I then tried running the same script in four simultaneous instances, each over
a different key range - that completed in about 3 hrs at a minimum, over
multiple runs.
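For the simultaneous run, the id space was split into four contiguous ranges,
one per script instance - roughly like this (a sketch, not the exact script):

```python
def split_ranges(total, parts):
    """Split [0, total) into `parts` contiguous (start, stop) ranges."""
    step = total // parts
    ranges = []
    for i in range(parts):
        start = i * step
        # the last range absorbs any remainder
        stop = total if i == parts - 1 else start + step
        ranges.append((start, stop))
    return ranges

# One instance of the fetch script works each (start, stop) range.
print(split_ranges(1000000, 4))
```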
That works out to 1000000/(3*60*60) ~ 93 gets per second. Is this the expected
performance, or am I doing something stupid? Either way, this is far too slow
for the application I was exploring CouchDB for.
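Spelling out that arithmetic, as a quick sanity check of the figure:

```python
# One million gets spread over roughly three hours.
total_docs = 1000000
elapsed_seconds = 3 * 60 * 60
gets_per_second = total_docs / float(elapsed_seconds)
print(round(gets_per_second))  # ~93
```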
Please let me know if there are any suggestions.
Environment:
python-couchdb lib - latest version 0.5.x
python - 2.5.3
ubuntu - 8.04
laptop with 4G of RAM, dual core and about 80G of HDD
insertion - python-couchdb bulk docs (db.update)
get by id - multiple options tried, e.g. db[doc_id] and db.get(doc_id)
I also tried creating a view on id (just experimenting - AFAIU an index on _id
should already exist) - building it ran for hours and hours and I killed it.
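For reference, the view I was experimenting with looked roughly like this
(names like _design/test and by_id are just placeholders; the map function is
JavaScript stored inside a design document):

```python
# Sketch of a design document holding a by-id view (placeholder names).
design_doc = {
    "_id": "_design/test",
    "views": {
        "by_id": {
            "map": "function(doc) { emit(doc._id, null); }"
        }
    }
}
# Saving it (e.g. db["_design/test"] = design_doc) and then querying the
# view for the first time triggers a full index build over every document,
# which is why that first request can run for a very long time.
```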
Sample code:
### insertion ###
import sys

# db = couchdb.Server()['...']  # connection setup not shown in this excerpt
count = 0
id = 0
lineCount = 0
batch = []
# input comes from STDIN (standard input)
for line in sys.stdin:
    lineCount += 1
    # remove leading and trailing whitespace
    line = line.strip()
    values = []
    # parse the fixed-width fields of the input line
    # (sizesLength and absIndex are defined earlier in the full script)
    for size in range(sizesLength):
        value = line[absIndex[size]:absIndex[size + 1]].strip()
        values.append(value)
        if size == sizesLength - 2:
            break
    idS = '%s' % (id)
    batch.append({"partnerCode": values[1],
                  "_id": idS})
    count += 1
    id += 1
    if count % 10000 == 0:
        db.update(batch)
        batch = []  # reset, otherwise earlier docs get re-posted
if batch:  # flush the final partial batch
    db.update(batch)
### insertion ###
### fetch ###
for i in range(1000000):
    idS = '%s' % (i)
    tx = db[idS]
### fetch ###
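One alternative I have been meaning to try: instead of one HTTP GET per
document, CouchDB can return many docs in a single request by POSTing a JSON
body of keys to _all_docs with include_docs=true, which should amortize the
per-request overhead. A minimal sketch of the batching side (the actual HTTP
call is only indicated in comments, and I have not benchmarked this):

```python
import json

def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

ids = ['%s' % i for i in range(1000000)]
for batch in chunks(ids, 10000):
    body = json.dumps({"keys": batch})
    # POST `body` to /<dbname>/_all_docs?include_docs=true (e.g. with
    # httplib/urllib2); each response carries up to 10000 full documents.
    break  # sketch only: build the first request body and stop
```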
Thanks
Manju