Re: Compact not completing

Simon Eisenmann Fri, 12 Aug 2011 08:58:19 -0700

Woot! The script works and I was able to free around 100GB of space. It
perfectly finds the broken docs, recreates them empty and deletes them
again. Afterwards the databases become compactable again.
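
For anybody who only wants the core check: a minimal sketch of how a single
document lookup is classified. The function name needs_repair is made up for
this example and the code uses the standard json module instead of cjson, so
it is not a copy of the attached script. It assumes the 404 body carries a
"reason" field, as described in the quoted messages.

```python
import json


def needs_repair(status, body):
    """Return True if a lookup result points at a missing ID index entry.

    status: HTTP status code of GET /<db>/<docid>
    body:   raw JSON response body as a string
    (Hypothetical helper, for illustration only.)
    """
    if status != 404:
        # The document was found, so there is nothing to repair.
        return False
    reason = json.loads(body).get("reason")
    # "deleted" is an ordinary deleted document. Any other reason
    # (typically "missing") means the ID appears in the changes feed
    # but cannot be looked up directly.
    return reason != "deleted"
```

Only documents for which this returns True get recreated empty and deleted
again.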


Thanks a lot, Adam, for your support.

Attached is the final version of the repair script, which might be of
help to anybody who has this problem.

Best regards
Simon

On Friday, 12.08.2011, at 10:56 -0400, Adam Kocoloski wrote:
> Hi Simon, progress!  Yes, go ahead and re-create and delete documents with 
> those IDs and see if you can successfully compact.  Regards,
> 
> Adam
> 
> On Aug 12, 2011, at 6:00 AM, Simon Eisenmann wrote:
> 
> > Hi Adam,
> > 
> > All right, it looks like I am able to find the troublemakers when also
> > looking at the deleted documents:
> > 
> > ~# python repaircompact.py http://localhost:5984/gangstercluster_1
> > Changes feed is
> > http://localhost:5984/gangstercluster_1/_changes?since=0.
> > Fetched feed into /tmp/tmpLjRkZY.
> > Not found error: missing!=deleted
> > (/gangstercluster_1/alive_1dcc4efdb2a411e0acd3003048679e10)
> > Not found error: missing!=deleted
> > (/gangstercluster_1/alive_1dcc523bb2a411e0acd4003048679e10)
> > Not found error: missing!=deleted
> > (/gangstercluster_1/alive_41ed63c1b2a411e0acd4003048679e10)
> > Not found error: missing!=deleted
> > (/gangstercluster_1/alive_41ed678bb2a411e0acd5003048679e10)
> > Processed 8225 entries (last sequence: 5972567)
> > Document count: 8222 (count: 74, deleted: 8148)
> > Document commited sequence is now: 5972569
> > 
> > 
> > It's hard to say whether the number of changes feed rows matches the
> > document count from the database info, as it is constantly changing. It is
> > very close, though, so it would probably match if nothing changed.
> > 
> > So now that I have found the missing documents, should I just create an
> > empty document with each ID and delete it again?
> > 
> > Thank you and best regards
> > Simon
> > 
> > ps: updated script attached
> > 
> > 
> > On Thursday, 11.08.2011, at 12:30 -0400, Adam Kocoloski wrote:
> >> Hi Simon, I wouldn't skip the deleted documents, a deleted doc could just 
> >> as easily be the one missing in the ID index.  When you lookup a deleted 
> >> document you should see "reason":"deleted" instead of the 
> >> "reason":"missing" that you get if the ID is not in the index at all.  
> >> Then again, if you only ever see deleted docs after retrieving 
> >> the full _changes then a deleted doc is probably not the source of your 
> >> troubles.
> >> 
> >> Can you confirm that the number of rows in the changes feed is equal to 
> >> doc_count + doc_del_count from db.info()?  Best,
> >> 
> >> Adam
> >> 
> >> On Aug 11, 2011, at 12:08 PM, Simon Eisenmann wrote:
> >> 
> >>> Hi Adam,
> >>> 
> >>> I wrote a short Python script which loads the complete changes feed and
> >>> requests all of the documents not marked "deleted" using HTTP HEAD
> >>> requests. However, the only missing documents I find are ones that were
> >>> deleted after I retrieved the complete changes feed (which takes a while
> >>> for large databases), and never the same ones.
> >>> 
> >>> So again no luck here in finding the problems.
> >>> 
> >>> Any more suggestions? Is there a way to rebuild the complete database
> >>> file (maybe offline?).
> >>> 
> >>> Thank you and best regards
> >>> Simon
> >>> 
> >>> 
> >>> ps: The script i have been using is attached.
> >>> 
> >>> 
> >>> 
> >>> On Tuesday, 09.08.2011, at 16:10 -0400, Adam Kocoloski wrote:
> >>>> Hi Simon, CouchDB 1.1.0 includes a recent optimization to 
> >>>> _changes?include_docs=true which allows it to skip a lookup in the id 
> >>>> tree and instead load the document body from the pointer in the sequence 
> >>>> tree.  In that case you wouldn't notice any missing entry in the id 
> >>>> tree.  You would notice it, however, if you did direct lookups for each 
> >>>> document. Apologies for the outdated instructions.  Can you try looking 
> >>>> up the documents in a separate request and see if the results change?
> >>>> 
> >>>> Adam
> >>>> 
> >>>> On Aug 9, 2011, at 5:49 AM, Simon Eisenmann wrote:
> >>>> 
> >>>>> Hi Adam,
> >>>>> 
> >>>>> I just checked the whole _changes feed (since=0) and could not find any
> >>>>> document "missing" when using "include_docs=true".
> >>>>> 
> >>>>> The database itself has only around 30 documents, so it should be quite
> >>>>> small. There are lots of creations and deletions happening all the
> >>>>> time, though, so it is purged and compacted daily.
> >>>>> 
> >>>>> So - the changes feed is of no help. Any other idea?
> >>>>> 
> >>>>> Thank you and best regards
> >>>>> Simon
> >>>>> 
> >>>>> On Monday, 08.08.2011, at 13:25 -0400, Adam Kocoloski wrote:
> >>>>>> Hi Simon, I think my amended instructions to Mike are still a sensible 
> >>>>>> way to debug/workaround the problem.  Reiterating (96282148 was the 
> >>>>>> last seq Mike observed in the Futon status for the compaction):
> >>>>>> 
> >>>>>>> 1) What you really want are the last 1000 Ids in the seq_tree prior 
> >>>>>>> to the compactor crash.  So maybe something like
> >>>>>>> 
> >>>>>>> GET /iris/_changes?descending=true&limit=1000&since=96282148
> >>>>>> 
> >>>>>>> 2) Figure out which of those entries are missing from the id tree, 
> >>>>>>> e.g. lookup the document and see if the response is 
> >>>>>>> {"not_found":"missing"}.  You could also try using include_docs=true 
> >>>>>>> on the _changes feed to accomplish the same.
> >>>>>> 
> >>>>>>> 3) Once you've identified the problematic IDs, try creating them 
> >>>>>>> again.  You might end up introducing duplicates in the _changes feed, 
> >>>>>>> but if you do there's a procedure to fix that.
> >>>>>> 
> >>>>>> Regards, Adam
> >>>>>> 
> >>>>>> On Aug 8, 2011, at 12:31 PM, Simon Eisenmann wrote:
> >>>>>> 
> >>>>>>> Hi Guys,
> >>>>>>> 
> >>>>>>> I have a couple of CouchDB instances which can no longer be
> >>>>>>> compacted. Compaction fails with this error:
> >>>>>>> 
> >>>>>>> [Mon, 08 Aug 2011 16:16:27 GMT] [info] [<0.10808.123>] Starting
> >>>>>>> compaction for db "database1"
> >>>>>>> [Mon, 08 Aug 2011 16:16:45 GMT] [error] [<0.10808.123>] ** Generic
> >>>>>>> server <0.10808.123> terminating 
> >>>>>>> ** Last message in was {'EXIT',<0.30396.143>,
> >>>>>>>                     {function_clause,
> >>>>>>>                      [{couch_db_updater,'-copy_docs/4-fun-2-',
> >>>>>>>                        [not_found,
> >>>>>>>                         {db,<0.10807.123>,<0.10808.123>,nil,
> >>>>>>>                          <<"1312767347007568">>,<0.10805.123>,
> >>>>>>>                          <0.10809.123>,
> >>>>>>>                          {db_header,5,6340261,0,
> >>>>>>>                           {7198006895,{65952,10145}},
> >>>>>>>                           {7198010315,76813},
> >>>>>>>                           {7198051016,[]},
> >>>>>>>                           364,7050915618,nil,1000},
> >>>>>>>                          6340261,
> >>>>>>>                          {btree,<0.10805.123>,
> >>>>>>>                           {7198006895,{65952,10145}},
> >>>>>>>                           #Fun<couch_db_updater.10.19222179>,
> >>>>>>>                           #Fun<couch_db_updater.11.21515767>,
> >>>>>>>                           #Fun<couch_btree.5.124754102>,
> >>>>>>>                           #Fun<couch_db_updater.12.93888648>},
> >>>>>>>                          {btree,<0.10805.123>,
> >>>>>>>                           {7198010315,76813},
> >>>>>>>                           #Fun<couch_db_updater.13.40165027>,
> >>>>>>>                           #Fun<couch_db_updater.14.82810239>,
> >>>>>>>                           #Fun<couch_btree.5.124754102>,
> >>>>>>>                           #Fun<couch_db_updater.15.104121193>},
> >>>>>>>                          {btree,<0.10805.123>,
> >>>>>>>                           {7198051016,[]},
> >>>>>>>                           #Fun<couch_btree.0.83553141>,
> >>>>>>>                           #Fun<couch_btree.1.30790806>,
> >>>>>>>                           #Fun<couch_btree.2.124754102>,nil},
> >>>>>>>                          6340261,<<"spreedcom_accounts_1">>,
> >>>>>>>                          "/var/lib/couchdb/database1.couch",[],
> >>>>>>>                          [],nil,
> >>>>>>>                          {user_ctx,null,[],undefined},
> >>>>>>>                          nil,1000,
> >>>>>>>                          [before_header,after_header,on_file_open],
> >>>>>>>                          false},
> >>>>>>>                         <0.30397.143>]},
> >>>>>>>                       {lists,map,2},
> >>>>>>>                       {lists,map,2},
> >>>>>>>                       {couch_db_updater,copy_docs,4},
> >>>>>>>                       {couch_db_updater,'-copy_compact/3-fun-0-',6},
> >>>>>>>                       {couch_btree,stream_kv_node2,8},
> >>>>>>>                       {couch_btree,stream_kp_node,7},
> >>>>>>>                       {couch_btree,fold,4}]}}
> >>>>>>> 
> >>>>>>> 
> >>>>>>> ... many more similar errors follow.
> >>>>>>> 
> >>>>>>> In this mailing list I have found a similar issue from the beginning of
> >>>>>>> this year (Fri, 31 Dec 2010 12:38:18), though without a solution.
> >>>>>>> 
> >>>>>>> This database got purged a lot and has constant changes, so it is pretty
> >>>>>>> similar to the older topic. I assume there is something wrong in
> >>>>>>> the database file related to previous purges, so this looks like a bug.
> >>>>>>> 
> >>>>>>> So - any hints on how to fix this? The database is getting pretty large
> >>>>>>> and has to be compacted from time to time.
> >>>>>>> 
> >>>>>>> CouchDB is running version 1.1.0 on 64-bit Linux. The database was
> >>>>>>> initially created with CouchDB 1.0.1, though the issue first appeared a
> >>>>>>> couple of weeks ago (compaction had been working with 1.1.0 before that).
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Thank you and best regards
> >>>>>>> Simon
> >>>>>>> 
> >>>>>>> 
> >>>>>>> -- 
> >>>>>>> Simon Eisenmann
> >>>>>>> 
> >>>>>>> [ mailto:[email protected] ]
> >>>>>>> 
> >>>>>>> [ struktur AG | Kronenstraße 22a | D-70173 Stuttgart ]
> >>>>>>> [ T. +49.711.896656.0 | F.+49.711.89665610 ]
> >>>>>>> [ http://www.struktur.de | mailto:[email protected] ]
> >>>>>> 
> >>>>> 
> >>>> 
> >>> 
> >>> <repaircompact.py>
> >> 
> > 
> > <repaircompact.py>
> 

-- 
Simon Eisenmann

[ mailto:[email protected] ]

[ struktur AG | Kronenstraße 22a | D-70173 Stuttgart ]
[ T. +49.711.896656.68 | F.+49.711.89665610 ]
[ http://www.struktur.de | mailto:[email protected] ]

#!/usr/bin/python
"""
Simple script to load all documents returned by a CouchDB _changes feed with 
(since=0) to find if some of the documents are missing (checks HTTP status).

This script can help when you cannot compact your CouchDB database any more
because of documents which are missing from the ID index. It finds all such
missing documents in a given database, creates them empty and immediately
deletes them again. This brings back the reference in the ID index and
thus the database can be compacted again.

Big thanks to Adam Kocoloski of the CouchDB user mailing list for the technical
insight to get this script done.

(c)2011 Simon Eisenmann - mailto:[email protected]
"""

import sys
import cjson
import urllib
import tempfile
import urlparse
import httplib

class Repairer(object):

    def __init__(self, database_url, since=0, batchsize=1000):

        if database_url.endswith("/"):
            database_url = database_url[:-1]
    
        self.since = since
        self.database_url = database_url
        self.database_url_parts = urlparse.urlparse(self.database_url)
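        # Use the parsed hostname and port so that URLs without an explicit
        # port (which would break a plain split on ":") fall back to port 80.
        self.database_host = self.database_url_parts.hostname
        self.database_port = self.database_url_parts.port or 80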
        self.batchsize = batchsize
        
    def repair(self):

        url = "%s/_changes?since=%s" % (self.database_url, self.since)
        print "Changes feed is %s." % url
        
        # Without a filename urlretrieve() creates its own temporary file,
        # which avoids the race-prone tempfile.mktemp().
        changes, headers = urllib.urlretrieve(url)
        print "Fetched feed into %s." % changes

        # Load and print stats.
        conn = httplib.HTTPConnection(self.database_host, int(self.database_port))
        conn.request("GET", "%s/" % self.database_url_parts[2])
        res = conn.getresponse()
        info = cjson.decode(res.read())
        conn.close()

        fp = open(changes, "rb")
        fp.readline()
        line = fp.readline().strip()
        
        count = 0
        buf = []
        failed = {}
        seq = None
        
        while line:

            if line.endswith(","):
                line = line[:-1]
                
            if line.strip() == "]":
                break

            buf.append(cjson.decode(line))
                
            if len(buf) >= self.batchsize:
                count, seq = self._check(buf, count=count, failed=failed)
                buf = []

            line = fp.readline().strip()
                            
        fp.close()
                    
        if len(buf):
            count, seq = self._check(buf, count=count, failed=failed)
            buf = []

        print "Processed %d entries (last sequence: %s)" % (count, seq)
        print "Document count: %d (count: %d, deleted: %d)" % (info["doc_count"]+info["doc_del_count"], info["doc_count"], info["doc_del_count"])
        print "Document commited sequence is now: %d" % info["committed_update_seq"]

    def _check(self, results, count=0, failed=None):

        if failed is None:
            failed = {}

        seq = None

        conn = httplib.HTTPConnection(self.database_host, int(self.database_port))
        
        for c in results:
            seq = c.get("seq", 0)
            # Check every entry, including deleted ones: a deleted document
            # can just as easily be the one missing from the ID index.
            i = c.get("id")
            quoted_i = urllib.quote_plus(i)
            if quoted_i.startswith("_design%2F"):
                # FIXME(longsleep) waah what a cheap hack
                quoted_i = "_design/%s" % quoted_i[10:]
            path = "%s/%s" % (self.database_url_parts[2], quoted_i)
            conn.request("GET", path)
            res = conn.getresponse()
            data = res.read()

            if res.status == 404:
                # Make sure that it says reason=="deleted".
                data = cjson.decode(data)
                if data.get("reason", None) != "deleted":

                    print "Not found error: %s!=deleted (%s)" % (data.get("reason"), path)

                    # Recreate the missing document.
                    conn.request("PUT", path, "{}", headers={"Content-Type": "application/json", "Content-Length": "2"})
                    res = conn.getresponse()
                    data = cjson.decode(res.read())
                    assert res.status == 201, "Failed to PUT new document. (%r) (%r)" % (res.status, data)
                    rev = data.get("rev", None)
                    assert rev is not None, "PUT result has no rev. (%r)" % data

                    # Delete it again.
                    conn.request("DELETE", path, headers={"If-Match": "%s" % rev})
                    res = conn.getresponse()
                    res.read()
                    assert res.status == 200, "Failed to DELETE new document."

            count = count + 1
        
        conn.close()
        
        return count, seq


if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args) != 1:
        print "Usage: %s database_url" % sys.argv[0]
        sys.exit(1)

    repairer = Repairer(args[0])
    repairer.repair()
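
The script above only prints the counts. The comparison Adam asked for (rows in
the changes feed versus doc_count + doc_del_count) could be sketched roughly as
follows. The function names and the sample data are invented for this example,
and the code uses the standard json module instead of cjson. It also loads the
whole feed at once, which is only reasonable for small feeds; the script reads
the feed line by line for that reason.

```python
import json


def change_row_count(feed_text):
    """Number of rows in a complete (non-continuous) _changes response."""
    return len(json.loads(feed_text)["results"])


def expected_row_count(db_info):
    """Live documents plus deleted documents, as reported by the database info."""
    return db_info["doc_count"] + db_info["doc_del_count"]


# Illustrative data only, not taken from a real database.
sample_feed = (
    '{"results":['
    '{"seq":1,"id":"a","changes":[{"rev":"1-x"}]},'
    '{"seq":3,"id":"b","changes":[{"rev":"2-y"}],"deleted":true}'
    '],"last_seq":3}'
)
sample_info = {"doc_count": 1, "doc_del_count": 1}

rows_match = change_row_count(sample_feed) == expected_row_count(sample_info)
```

On a database that is being written to, the two numbers can differ slightly
even when nothing is wrong: in the run quoted above they were 8225 and 8222
because documents kept changing while the feed was fetched.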
