Tilgovi on IRC asked me to open an issue: https://issues.apache.org/jira/browse/COUCHDB-1192
Cheers,
Eli

On Wed, Jun 8, 2011 at 1:36 AM, Eli Stevens (Gmail) <[email protected]> wrote:
> Running the following code on a MacBook Pro, using CouchDBX 1.0.2
> (everything local), we're seeing the following output when trying to
> attach a file with 10MB of random data:
>
> Code: https://gist.github.com/bc0c36f36be0c85e2a36 (included in full below)
>
> Output:
>
> Using curl: 0.168450117111
> Using put_attachment: 0.309157133102
> post time: 2.5557808876
> Using multipart: 2.61283898354
> Encoding base64: 0.0497629642487
> Updating: 5.0550069809
>
> Server log: https://gist.github.com/a80a495fd35049ff871f (there's a
> HEAD/DELETE/PUT/GET cycle that's just cleanup)
>
> The calls in question are:
>
> Using curl: 0.168450117111
> [info] [<0.27828.7>] 127.0.0.1 - - 'PUT'
> /benchmark_entity/bigfile/bigfile/bigfile.gz?rev=78-db58ded2899c5546e349feb5a8c0eee4 201
>
> Using put_attachment: 0.309157133102
> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT'
> /benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa 201
>
> Using multipart: 2.61283898354 (post time: 2.5557808876)
> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201
>
> Updating: 5.0550069809
> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201
>
> Profiling our code shows 1.5 sec of CPU usage in our own code (which
> covers setup/cleanup not included in the times above) and 11.8 sec of
> total run time, which roughly matches the PUT/POST times above. So I'm
> fairly confident that the bulk of the times above is spent not in our
> client code but in CouchDB's handling of the requests.
>
> Why is the form/multipart handler so much slower than a bare PUT of
> the attachment? Why is the base64 approach slower still? Is it due to
> bandwidth issues, CouchDB CPU usage...?
>
> Thanks for any help,
> Eli
>
> Full code from: https://gist.github.com/bc0c36f36be0c85e2a36
>
> import base64
> import contextlib
> import cStringIO
> import subprocess
> import time
>
> import couchdb
> import couchdb.json
> import couchdb.multipart
>
> @contextlib.contextmanager
> def stopwatch(m=''):
>     t0 = time.time()
>     yield
>     tdiff = time.time() - t0
>     if m:
>         print '{}: {}'.format(m, tdiff)
>     else:
>         print tdiff
>
> def reset(d):
>     try:
>         del d['bigfile']
>     except couchdb.http.ResourceNotFound:
>         pass
>     d['bigfile'] = {'foo': 'bar'}
>     return d['bigfile']
>
> s = couchdb.Server()
> d = s['benchmark_entity']
>
> fn = '/tmp/bigfile.gz'
> fn = '/tmp/smallfile'
>
> doc = reset(d)
> with stopwatch('Using curl'):
>     p = subprocess.Popen([
>         'curl',
>         '-X', 'PUT',
>         'http://localhost:5984/benchmark_entity/{}/bigfile/bigfile.gz?rev={}'.format(doc.id, doc.rev),
>         '-d', '@{}'.format(fn),
>         '-H', 'Content-Type: application/gzip',
>     ])
>     p.wait()
>
> doc = reset(d)
> with open(fn, 'r') as f:
>     with stopwatch('Using put_attachment'):
>         d.put_attachment(doc, f)
>
> doc = reset(d)
> with open(fn, 'r') as f:
>     content_name = 'bigfile.gz'
>     content = f.read()
>     content_type = 'application/gzip'
>     with stopwatch('Using multipart'):
>         fileobj = cStringIO.StringIO()
>
>         with couchdb.multipart.MultipartWriter(fileobj, headers=None,
>                                                subtype='form-data') as mpw:
>             mime_headers = {'Content-Disposition': 'form-data; name="_doc"'}
>             mpw.add('application/json', couchdb.json.encode(doc), mime_headers)
>
>             mime_headers = {'Content-Disposition':
>                 'form-data; name="_attachments"; filename="{}"'.format(content_name)}
>             mpw.add(content_type, content, mime_headers)
>
>         header_str, blank_str, body = fileobj.getvalue().split('\r\n', 2)
>
>         http_headers = {'Referer': d.resource.url,
>                         'Content-Type': header_str[len('Content-Type: '):]}
>         params = {}
>         t0 = time.time()
>         status, msg, data = d.resource.post(doc['_id'], body,
>                                             http_headers, **params)
>         print 'post time: {}'.format(time.time() - t0)
>
> doc = reset(d)
> with open(fn, 'r') as f:
>     content_name = 'bigfile.gz'
>     content = f.read()
>     content_type = 'application/gzip'
>     with stopwatch('Encoding base64'):
>         doc['_attachments'] = {content_name: {'content_type': content_type,
>                                               'data': base64.b64encode(content)}}
>     with stopwatch('Updating'):
>         d.update([doc])
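One measurable factor in the base64 path is wire size: base64 encodes every 3 input bytes as 4 ASCII characters, so an inline `data` attachment ships roughly 33% more bytes, and the server has to decode it all again before writing the attachment. A minimal sketch of the inflation (Python 3 here; zero bytes stand in for the 10MB random file):

```python
import base64

# 10 MB of payload, like bigfile in the script above.
raw = b"\x00" * (10 * 1024 * 1024)

encoded = base64.b64encode(raw)

# Every 3-byte group becomes 4 output characters, so the encoded
# attachment is ~1.33x the raw size before JSON overhead is counted.
print(len(raw))      # 10485760
print(len(encoded))  # 13981016
```

This only accounts for part of the gap, since the `_bulk_docs` timing also includes JSON parsing and the decode on the server side.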
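By contrast, multipart framing adds almost nothing to the payload, so the multipart slowdown cannot be explained by extra bytes on the wire. A rough Python 3 sketch of a form-data body (the `multipart_form_data` helper and the boundary string are made up for illustration; the real script uses `couchdb.multipart.MultipartWriter`):

```python
import io

def multipart_form_data(parts, boundary='----benchmarkboundary'):
    # Hypothetical helper: frames (headers, payload) pairs roughly the
    # way MultipartWriter does in the script above.
    buf = io.BytesIO()
    for headers, payload in parts:
        buf.write(b'--' + boundary.encode('ascii') + b'\r\n')
        for name, value in headers.items():
            buf.write('{0}: {1}\r\n'.format(name, value).encode('ascii'))
        buf.write(b'\r\n')
        buf.write(payload)
        buf.write(b'\r\n')
    buf.write(b'--' + boundary.encode('ascii') + b'--\r\n')
    return buf.getvalue()

doc_json = b'{"foo": "bar"}'
attachment = b'\x00' * (10 * 1024 * 1024)  # stand-in for the 10MB file

body = multipart_form_data([
    ({'Content-Disposition': 'form-data; name="_doc"',
      'Content-Type': 'application/json'}, doc_json),
    ({'Content-Disposition': 'form-data; name="_attachments"; filename="bigfile.gz"',
      'Content-Type': 'application/gzip'}, attachment),
])

# Framing overhead (boundaries plus part headers) is a few hundred bytes
# on top of a 10 MB payload.
print(len(body) - len(doc_json) - len(attachment))
```

Since the framing is negligible, the multipart vs. bare-PUT gap in the numbers above presumably comes from how the server parses and buffers the multipart body, not from transfer size.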
