Erhm... just replace:

> db.put_attachment(doc, content, content_type='text/plain')

with

> db.put_attachment(doc, content, content_type='text/plain;charset=utf-8')

and CouchDB will remember it:

$ http HEAD http://localhost:5984/b/testing/test
HTTP/1.1 200 OK
Accept-Ranges: none
Cache-Control: must-revalidate
Content-Encoding: gzip
Content-Length: 102
Content-MD5: 7y85tiUeF/UX9kqpKAzQEw==
Content-Type: text/plain; charset=utf-8
Date: Fri, 03 Jan 2014 14:14:27 GMT
ETag: "7y85tiUeF/UX9kqpKAzQEw=="
Server: CouchDB/1.6.0+build.0bf1856 (Erlang OTP/R16B01)

It will also be available in the attachment stub info. So before
decoding, just read the content type value, get the attachment's
charset from it, and decode the content accordingly.
--
,,,^..^,,,


On Fri, Jan 3, 2014 at 7:43 PM, Daniel Gonzalez wrote:
> No, what I mean is "how can I keep track of the encoding used for each of
> the attachments, so that I can decode them correctly whenever I want to"
>
>
> On Fri, Jan 3, 2014 at 4:23 PM, Alexander Shorin wrote:
>
>> Not sure if I follow your idea. Do you mean, how can you set such
>> charset info for existing attachments? In that case you have to
>> reupload them.
>> --
>> ,,,^..^,,,
>>
>>
>> On Fri, Jan 3, 2014 at 6:40 PM, Daniel Gonzalez wrote:
>> > Thanks but, how do you set that on a per-attachment basis in a couchdb
>> > document? If this is not supported, I guess I will have to add a mapping
>> > "attachments-encoding" to the document where I can associate each
>> > attachment with its encoding. Any comments on this?
>> >
>> >
>> > On Fri, Jan 3, 2014 at 3:18 PM, Alexander Shorin wrote:
>> >
>> >> You can set MIME type as text/plain;charset=utf-8 to help browsers
>> >> detect the correct content encoding.
>> >> See http://tools.ietf.org/html/rfc2068#section-3.4 for more info
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Fri, Jan 3, 2014 at 5:52 PM, Daniel Gonzalez wrote:
>> >> > Hi,
>> >> >
>> >> > I have the following test script:
>> >> >
>> >> > # -*- coding: utf-8 -*-
>> >> >
>> >> > import os
>> >> > import couchdb
>> >> >
>> >> > GREEK = u'ΑΒΓΔ ΕΖΗΘ ΙΚΛΜ ΝΞΟΠ ΡΣΤΥ ΦΧΨΩ αβγδ εζηθ ικλμ νξοπ ρςτυ φχψω'
>> >> >
>> >> > # Prepare a unicode file, encoded using ENCODING
>> >> > ENCODING = 'utf-8'
>> >> > filename = '/tmp/test'
>> >> > open(filename, 'w').write(GREEK.encode(ENCODING))
>> >> >
>> >> > # Create an empty document
>> >> > server = couchdb.Server()
>> >> > db = server['cdb-tests']
>> >> > doc_id = 'testing'
>> >> > doc = { }
>> >> > db[doc_id] = doc
>> >> >
>> >> > # Attach the file to the document
>> >> > content = open(filename, 'rb') # Open the file for reading
>> >> > db.put_attachment(doc, content, content_type='text/plain')
>> >> >
>> >> > As you can see, the file is utf-8 encoded, but when I attach that file to
>> >> > couchdb, I have no way to specify this encoding. Thus, requesting the
>> >> > attachment at http://localhost:5984/cdb-tests/testing/test returns the
>> >> > following Response Headers:
>> >> >
>> >> > HTTP/1.1 200 OK
>> >> > Server: CouchDB/1.2.0 (Erlang OTP/R15B01)
>> >> > ETag: "7y85tiUeF/UX9kqpKAzQEw=="
>> >> > Date: Fri, 03 Jan 2014 13:43:36 GMT
>> >> > Content-Type: text/plain
>> >> > Content-MD5: 7y85tiUeF/UX9kqpKAzQEw==
>> >> > Content-Length: 102
>> >> > Content-Encoding: gzip
>> >> > Cache-Control: must-revalidate
>> >> > Accept-Ranges: none
>> >> >
>> >> > Seeing the attachment with a browser shows complete gibberish. How can I
>> >> > store the encoding for couchdb attachments?
>> >> >
>> >> > Thanks and regards,
>> >> >
>> >> > Daniel
>> >> >
>> >> > PS: SO reference link: http://stackoverflow.com/q/20905157/647991
>> >>
>>
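
For anyone finding this thread later: the advice in the reply at the top (read the content type, get the charset, decode accordingly) could look roughly like the sketch below. It is only an illustration, not anything from the thread itself. It assumes Python 3, the helper names charset_from_content_type and decode_attachment are made up for this example, and falling back to utf-8 when no charset is present is an assumption you may want to change.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a MIME content type string.

    Hypothetical helper: returns `default` (an assumed fallback) when the
    content type carries no charset parameter, e.g. plain 'text/plain'.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent
    return msg.get_content_charset() or default


def decode_attachment(raw_bytes, content_type, default="utf-8"):
    """Decode attachment bytes using the charset named in its content type."""
    return raw_bytes.decode(charset_from_content_type(content_type, default))


# Possible use with couchdb-python (not executed here; needs a live server):
#   stub = db[doc_id]['_attachments']['test']      # attachment stub info
#   raw = db.get_attachment(doc_id, 'test').read() # attachment body as bytes
#   text = decode_attachment(raw, stub['content_type'])
```

The charset is parsed with the standard library's email.message.Message rather than by splitting the string by hand, so variations such as 'text/plain;charset=utf-8' and 'text/plain; charset=UTF-8' are handled the same way.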
