Hi all,
I have been evaluating CouchDB for a project and was running some performance
tests; one of them measures how fast a document can be fetched by id.
I wrote a small Python script that loads about a million documents
(very simple - {_id, value}). For the test I assigned the ids myself instead of
using the uuids CouchDB generates; the ids run from 0 up to a million.
After the loading was done, I ran another Python script that tries to get each
of the million documents - it ran for a few hours before I killed it.
I then tried running the same script in four simultaneous instances, each over
a different key range - that completed in about 3 hrs at a minimum, over
multiple runs.
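For the simultaneous run, the id space was split into four contiguous ranges,
one per script instance - roughly like this (a sketch, not the exact script):

```python
def split_ranges(total, parts):
    """Split [0, total) into `parts` contiguous (start, stop) ranges."""
    step = total // parts
    ranges = []
    for i in range(parts):
        start = i * step
        # the last range absorbs any remainder
        stop = total if i == parts - 1 else start + step
        ranges.append((start, stop))
    return ranges

# One instance of the fetch script works each (start, stop) range.
print(split_ranges(1000000, 4))
```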
That works out to 1000000/(3*60*60) ~ 93 gets per second. Is this the expected
performance, or am I doing something stupid? Either way, this is far too slow
for the application I was exploring CouchDB for.
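Spelling out that arithmetic, as a quick sanity check of the figure:

```python
# One million gets spread over roughly three hours.
total_docs = 1000000
elapsed_seconds = 3 * 60 * 60
gets_per_second = total_docs / float(elapsed_seconds)
print(round(gets_per_second))  # ~93
```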
Please let me know if there are any suggestions.
Environment:
python-couchdb lib - latest version 0.5.x
python - 2.5.3
ubuntu - 8.04
laptop with 4G of RAM, dual core and about 80G of HDD
insertion - python-couchdb bulk docs (db.update)
get by id - multiple options tried, e.g. db[doc_id] and db.get(doc_id)
I also tried creating a view on id (just experimenting - AFAIU an index on _id
should already exist) - building it ran for hours and hours and I killed it.
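For reference, the view I was experimenting with looked roughly like this
(names like _design/test and by_id are just placeholders; the map function is
JavaScript stored inside a design document):

```python
# Sketch of a design document holding a by-id view (placeholder names).
design_doc = {
    "_id": "_design/test",
    "views": {
        "by_id": {
            "map": "function(doc) { emit(doc._id, null); }"
        }
    }
}
# Saving it (e.g. db["_design/test"] = design_doc) and then querying the
# view for the first time triggers a full index build over every document,
# which is why that first request can run for a very long time.
```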
Sample code:
### insertion ###
import sys

# db = couchdb.Server()['...']  # connection setup not shown in this excerpt
count = 0
id = 0
lineCount = 0
batch = []
# input comes from STDIN (standard input)
for line in sys.stdin:
    lineCount += 1
    # remove leading and trailing whitespace
    line = line.strip()
    values = []
    # parse the fixed-width fields of the input line
    # (sizesLength and absIndex are defined earlier in the full script)
    for size in range(sizesLength):
        value = line[absIndex[size]:absIndex[size + 1]].strip()
        values.append(value)
        if size == sizesLength - 2:
            break
    idS = '%s' % (id)
    batch.append({"partnerCode": values[1],
                  "_id": idS})
    count += 1
    id += 1
    if count % 10000 == 0:
        db.update(batch)
        batch = []  # reset, otherwise earlier docs get re-posted
if batch:  # flush the final partial batch
    db.update(batch)
### insertion ###
### fetch ###
for i in range(1000000):
    idS = '%s' % (i)
    tx = db[idS]
### fetch ###
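One alternative I have been meaning to try: instead of one HTTP GET per
document, CouchDB can return many docs in a single request by POSTing a JSON
body of keys to _all_docs with include_docs=true, which should amortize the
per-request overhead. A minimal sketch of the batching side (the actual HTTP
call is only indicated in comments, and I have not benchmarked this):

```python
import json

def chunks(seq, size):
    """Yield successive slices of seq, each at most `size` long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

ids = ['%s' % i for i in range(1000000)]
for batch in chunks(ids, 10000):
    body = json.dumps({"keys": batch})
    # POST `body` to /<dbname>/_all_docs?include_docs=true (e.g. with
    # httplib/urllib2); each response carries up to 10000 full documents.
    break  # sketch only: build the first request body and stop
```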
Thanks
Manju