If you're OK posting it somewhere and it's not too big, that might be the easiest way to debug this.

On Thu, Oct 7, 2010 at 3:01 PM, Alexey Loshkarev wrote:
> Can't reproduce it with this script.
> If I send you a copy of my buggy database, will it help to improve couchdb?
>
> 2010/10/7 Paul Davis:
>> Alexey,
>>
>> I tried writing a script that'd hammer a single document and make
>> random view requests to see if I could reproduce the issue. Currently
>> the test doc is on revision 6,000 or so and I've not reproduced your
>> issue. I've included the script below, can you try it or see anything
>> that I should add to try and get closer to your situation?
>>
>> Paul
>>
>> #! /usr/bin/env python
>>
>> import random
>> import couchdbkit
>>
>> def main():
>>     server = couchdbkit.Server("http://127.0.0.1:5984/")
>>     db = server.get_or_create_db("foo")
>>
>>     # Create the design doc with a single map view if it is missing.
>>     ddocid = "_design/baz"
>>     if ddocid not in db:
>>         db[ddocid] = {"views": {"foo": {"map": """
>>             function(doc) {emit(doc._id, doc.value);}
>>         """}}}
>>
>>     assert 0 <= len(db.view("baz/foo")) <= 1
>>
>>     docid = "foo"
>>
>>     if docid in db:
>>         doc = db[docid]
>>     else:
>>         doc = {"_id": "foo", "value": 1}
>>         db[docid] = doc
>>
>>     # Rewrite the same doc over and over, querying the view at random.
>>     for i in range(1200):
>>         db[docid] = doc
>>         doc = db[docid]
>>         if random.random() < 0.33:
>>             assert len(db.view("baz/foo")) == 1
>>
>> if __name__ == '__main__':
>>     main()
>>
>>
>> On Thu, Oct 7, 2010 at 1:46 PM, Alexey Loshkarev wrote:
>>> If it helps..
>>>
>>> These q_* documents are a sort of state data. They are changed very
>>> frequently.
>>> I have 12 q_* documents and they may be changed 10-30 times per minute.
>>> Maybe there is a race condition in couchdb in view creation?
>>>
>>>
>>> 2010/10/7 Alexey Loshkarev:
>>>> I just tried moving the view function to a separate design doc, with no
>>>> success - duplicates (with the same revision) in the view response.
>>>>
>>>>
>>>> 2010/10/7 Paul Davis:
>>>>> Alexey,
>>>>>
>>>>> Can you show the other views you have in your design doc?
>>>>> Or alternatively, try moving this view to its own design doc?
>>>>>
>>>>> Paul
>>>>>
>>>>> On Thu, Oct 7, 2010 at 1:07 PM, Alexey Loshkarev wrote:
>>>>>> The same problem appeared again.
>>>>>> What was done till yesterday:
>>>>>> 1. Created a new database at node2
>>>>>> 2. Replicated from node1 to node2
>>>>>> 3. Checked. _all_docs returns only unique rows. queue/all returns only
>>>>>> unique rows
>>>>>>
>>>>>> After a few hours of stable work, couchdb produced duplicates again.
>>>>>> This time there are no duplicate documents (_all_docs has only unique
>>>>>> strings), but there are duplicates in the view response.
>>>>>> Removing the view index (between couchdb restarts) doesn't help. Couchdb
>>>>>> produces the same duplicates in the view every time.
>>>>>>
>>>>>> View function:
>>>>>> function(doc) {
>>>>>>   if (doc.type == "queue") {
>>>>>>     log("BUG TEST id:" + doc._id + ", rev:" + doc._rev);
>>>>>>     emit(doc.ordering, doc);
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Response:
>>>>>> $ curl http://localhost:5984/exhaust/_design/queues/_view/all
>>>>>> {"total_rows":15,"offset":0,"rows":[
>>>>>> ....
>>>>>> {"id":"q_nikolaevka","key":10,"value":{"_id":"q_nikolaevka","_rev":"16181-ae5e5cca96b0491f266bc97c37a88f47","name":"\u041d\u0418\u041a\u041e\u041b\u0410\u0415\u0412\u041a\u0410","default":false,"cars":[],"drivers":[],"ordering":10,"type":"queue"}},
>>>>>> {"id":"q_nikolaevka","key":10,"value":{"_id":"q_nikolaevka","_rev":"16176-3a7bbd128bfb257fd746dfd80769b6fc","name":"\u041d\u0418\u041a\u041e\u041b\u0410\u0415\u0412\u041a\u0410","default":false,"cars":[],"ordering":10,"type":"queue","drivers":[]}},
>>>>>> ...
>>>>>> ]}
>>>>>>
>>>>>>
>>>>>> See that? Two rows for one document, with different revisions!
>>>>>>
>>>>>> Also, couch.log contains 3 (!)
>>>>>> calls of this function for one document:
>>>>>> [Thu, 07 Oct 2010 16:53:51 GMT] [info] [<0.180.0>] OS Process #Port<0.2132> Log :: BUG TEST id:q_nikolaevka, rev:16175-11cedeb529991cf60193d436d1a567e9
>>>>>> [Thu, 07 Oct 2010 16:53:51 GMT] [info] [<0.180.0>] OS Process #Port<0.2132> Log :: BUG TEST id:q_nikolaevka, rev:16176-3a7bbd128bfb257fd746dfd80769b6fc
>>>>>> [Thu, 07 Oct 2010 16:53:51 GMT] [info] [<0.180.0>] OS Process #Port<0.2132> Log :: BUG TEST id:q_nikolaevka, rev:16181-ae5e5cca96b0491f266bc97c37a88f47
>>>>>>
>>>>>>
>>>>>> Then I ran compaction to eliminate old revisions.
>>>>>> And now I have 3 duplicates of q_nikolaevka with the same revision!
>>>>>>
>>>>>> I think I found the problem. This document has 1000 revisions in the
>>>>>> database, and here (http://wiki.apache.org/couchdb/HTTP_database_API) a
>>>>>> default maximum of 1000 revisions per document is described.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2010/10/7 Alexey Loshkarev:
>>>>>>> Haha!
>>>>>>> Fresh replication (into a new database) eliminates the duplicates and I
>>>>>>> can sleep quietly.
>>>>>>>
>>>>>>>
>>>>>>> 2010/10/7 Alexey Loshkarev:
>>>>>>>> P.S. dmesg doesn't show any hardware problems (bad blocks, segfaults
>>>>>>>> and so on).
>>>>>>>> P.P.S. I think I migrated 0.10.1 -> 1.0.1 without database
>>>>>>>> replication, so it may be my fault.
>>>>>>>>
>>>>>>>> 2010/10/7 Alexey Loshkarev:
>>>>>>>>> I think this is database file corruption. Querying _all_docs returns
>>>>>>>>> a lot of duplicates (about 3,000 duplicates in a ~350,000-document
>>>>>>>>> database).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [12:17:48 r...@node2 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
>>>>>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>>>>>> 100 37.7M    0 37.7M    0     0  1210k      0 --:--:--  0:00:31 --:--:--  943k
>>>>>>>>> [12:18:23 r...@node2 (~)]# wc -l all_docs
>>>>>>>>> 325102 all_docs
>>>>>>>>> [12:18:27 r...@node2 (~)]# uniq all_docs |wc -l
>>>>>>>>> 322924
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Node1 has duplicates too, but a very small number:
>>>>>>>>> [12:18:48 r...@node1 (~)]# curl http://localhost:5984/exhaust/_all_docs > all_docs
>>>>>>>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>>>>>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>>>>>>>> 100 38.6M    0 38.6M    0     0   693k      0 --:--:--  0:00:57 --:--:-- 55809
>>>>>>>>> [12:19:57 r...@node1 (~)]# wc -l all_docs
>>>>>>>>> 332714 all_docs
>>>>>>>>> [12:20:54 r...@node1 (~)]# uniq all_docs |wc -l
>>>>>>>>> 332523
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2010/10/7 Alexey Loshkarev:
>>>>>>>>>> I can't say what specifically it may be, so let's dive into the
>>>>>>>>>> history of this database(s).
>>>>>>>>>>
>>>>>>>>>> First (5-6 weeks ago) there was the node2 server with couchdb v0.10.1.
>>>>>>>>>> There was a testing database on it. There were a lot of structural
>>>>>>>>>> changes, view updates and so on.
>>>>>>>>>> Then it became production and started working OK.
>>>>>>>>>> Then we realized we needed backup, and best of all online backup (as
>>>>>>>>>> we have couchdb we can do this).
>>>>>>>>>> So the node1 server appeared, with couchdb 1.0.1. I replicated node2
>>>>>>>>>> to node1, then initiated continuous replication node1 -> node2 and
>>>>>>>>>> node2 -> node1. All clients worked with node2 only. All worked fine
>>>>>>>>>> for about a month.
>>>>>>>>>> A few days ago we were at peak load, so I wanted to use node1 and
>>>>>>>>>> node2 simultaneously. This was done by round-robin on DNS (host db
>>>>>>>>>> returns 2 different IPs - node1's IP and node2's IP). All worked fine
>>>>>>>>>> for about 5 minutes, then I got the first conflict (view queues/all
>>>>>>>>>> returned two identical documents, one - actual version, second -
>>>>>>>>>> conflicted revision, document with field _conflict="....."). Document
>>>>>>>>>> ID was q_tsentr.
>>>>>>>>>> As I don't have a conflict resolver yet, I resolved the conflict
>>>>>>>>>> manually by deleting the conflicted revision. I also disabled
>>>>>>>>>> round-robin and moved all load to node2 to avoid conflicts for a
>>>>>>>>>> while, so I could write a conflict resolver.
>>>>>>>>>>
>>>>>>>>>> It worked OK (node1 and node2 in mutual replication, active load on
>>>>>>>>>> node2) till yesterday.
>>>>>>>>>> Yesterday an operator called me to say he had duplicate data in the
>>>>>>>>>> program. At this point queues/all returned 1 duplicated document - the
>>>>>>>>>> same as a few days before (id = q_tsentr). One row consists of the
>>>>>>>>>> actual document version, another row consists of an old revision with
>>>>>>>>>> field _conflicted_revision="some old revision".
>>>>>>>>>>
>>>>>>>>>> I tried to delete this revision but without success. GET for
>>>>>>>>>> q_tsentr?rev="some old revision" returns a valid document. DELETE
>>>>>>>>>> q_tsentr?rev="some old revision" gave me a 409 error.
>>>>>>>>>> Here are log files (node2):
>>>>>>>>>>
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:19 GMT] [info] [<0.7239.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:30 GMT] [info] [<0.7245.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:35 GMT] [info] [<0.7287.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:43 GMT] [info] [<0.7345.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:18:02 GMT] [info] [<0.7864.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
>>>>>>>>>> [Wed, 06 Oct 2010 12:18:29 GMT] [info] [<0.8331.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:18:39 GMT] [info] [<0.8363.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 409
>>>>>>>>>> [Wed, 06 Oct 2010 12:38:19 GMT] [info] [<0.16765.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:40:40 GMT] [info] [<0.17337.1462>] 10.0.0.41 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:40:45 GMT] [info] [<0.17344.1462>] 10.0.0.41 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>>>>>>>>>>
>>>>>>>>>> Logs at node1:
>>>>>>>>>>
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:46 GMT] [info] [<0.25979.462>] 10.20.20.13 - - 'GET'
>>>>>>>>>> /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:17:56 GMT] [info] [<0.26002.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:21:25 GMT] [info] [<0.27133.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=all 404
>>>>>>>>>> [Wed, 06 Oct 2010 12:21:49 GMT] [info] [<0.27179.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
>>>>>>>>>> [Wed, 06 Oct 2010 12:24:41 GMT] [info] [<0.28959.462>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?revs=true 404
>>>>>>>>>> [Wed, 06 Oct 2010 12:38:07 GMT] [info] [<0.10362.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?revs=all 404
>>>>>>>>>> [Wed, 06 Oct 2010 12:38:23 GMT] [info] [<0.10534.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:40:25 GMT] [info] [<0.12014.463>] 10.20.20.13 - - 'GET' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 200
>>>>>>>>>> [Wed, 06 Oct 2010 12:40:33 GMT] [info] [<0.12109.463>] 10.20.20.13 - - 'DELETE' /exhaust/q_tsentr?rev=27144-f516ac68e697874eef9c7562f3e2e229 404
>>>>>>>>>>
>>>>>>>>>> So I deleted this document and created a new one (id - q_tsentr2).
>>>>>>>>>> It worked fine for about an hour.
>>>>>>>>>>
>>>>>>>>>> Node2 had an undeletable duplicate, so I moved all clients to node1.
>>>>>>>>>> There was no such problem there, the view response was correct.
>>>>>>>>>>
>>>>>>>>>> Then I tried to recover the database at node2. I stopped couchdb,
>>>>>>>>>> deleted the view index files and started couchdb again. Then I pinged
>>>>>>>>>> all views to recreate the index.
>>>>>>>>>> At the end of this procedure, I saw duplicates of identical rows (see
>>>>>>>>>> the first message in this thread). Node1 has no such problems, so I
>>>>>>>>>> stopped replication, left the load on node1 and went crying to this
>>>>>>>>>> mailing list.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2010/10/6 Paul Davis:
>>>>>>>>>>> It was noted on IRC that I should give a bit more explanation.
>>>>>>>>>>>
>>>>>>>>>>> With the information that you've provided there are two possible
>>>>>>>>>>> explanations. Either your client code is not doing what you expect
>>>>>>>>>>> or you've triggered a really crazy bug in the view indexer that
>>>>>>>>>>> caused it to reindex a database without invalidating a view and not
>>>>>>>>>>> removing keys for docs when it reindexed.
>>>>>>>>>>>
>>>>>>>>>>> Given that no one has reported anything remotely like this and I
>>>>>>>>>>> can't immediately see a code path that would violate so many
>>>>>>>>>>> behaviours in the view updater, I'm leaning towards this being an
>>>>>>>>>>> issue in the client code.
>>>>>>>>>>>
>>>>>>>>>>> If there was something specific that changed since the view worked,
>>>>>>>>>>> that might illuminate what could cause this sort of behaviour if it
>>>>>>>>>>> is indeed a bug in CouchDB.
>>>>>>>>>>>
>>>>>>>>>>> HTH,
>>>>>>>>>>> Paul Davis
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 6, 2010 at 12:24 PM, Alexey Loshkarev wrote:
>>>>>>>>>>>> I have this view function (map only, without reduce)
>>>>>>>>>>>>
>>>>>>>>>>>> function(doc) {
>>>>>>>>>>>>   if (doc.type == "queue") {
>>>>>>>>>>>>     emit(doc.ordering, doc.drivers);
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> It worked perfectly till yesterday, but today it started returning
>>>>>>>>>>>> duplicates.
>>>>>>>>>>>> Example:
>>>>>>>>>>>> $ curl http://node2:5984/exhaust/_design/queues/_view/all
>>>>>>>>>>>>
>>>>>>>>>>>> {"total_rows":46,"offset":0,"rows":[
>>>>>>>>>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>>>>>>>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>>>>>>>>>> {"id":"q_mashinyi-v-gorode","key":0,"value":["d_mironets_ivan","d_smertin_ivan","d_kasyanenko_sergej","d_chabotar_aleksandr","d_martyinenko_yurij","d_krikunenko_aleksandr"]},
>>>>>>>>>>>> ......
>>>>>>>>>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>>>>>>>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>>>>>>>>>> {"id":"q_oblasnaya","key":2,"value":["d_kramarenko_viktor","d_skorodzievskij_eduard"]},
>>>>>>>>>>>> ........
>>>>>>>>>>>> {"id":"q_otstoj","key":11,"value":["d_gavrilenko_aleksandr","d_klishnev_sergej"]}
>>>>>>>>>>>> ]}
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I tried restarting the server, recreating the view (removing the view
>>>>>>>>>>>> index file), and compacting the view and database, and none of this
>>>>>>>>>>>> helps; it still returns duplicates.
>>>>>>>>>>>> What is happening? How can I avoid it in the future?
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ----------------
>>>>>>>>>>>> Best regards
>>>>>>>>>>>> Alexey Loshkarev
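
For anyone repeating the duplicate checks from this thread: `uniq` only collapses adjacent, byte-identical lines, so it will not flag two view rows for the same document that carry different revisions (as in the q_nikolaevka example). Below is a minimal sketch that groups rows by document id and key instead. The function names `find_duplicate_rows` and `revisions_in` are made up for illustration, the sample data is trimmed from the response quoted above, and it assumes the map function emits at most one row per document and key, as the views in this thread do.

```python
import json
from collections import defaultdict


def find_duplicate_rows(view_response):
    """Return {(doc_id, key_as_json): rows} for every (id, key) seen more than once.

    `view_response` is the decoded JSON body of a view or _all_docs request,
    i.e. a dict with a "rows" list whose items carry "id" and "key".
    """
    groups = defaultdict(list)
    for row in view_response.get("rows", []):
        # View keys can be any JSON value (lists, objects), so serialise
        # them to obtain something hashable.
        group_key = (row["id"], json.dumps(row["key"], sort_keys=True))
        groups[group_key].append(row)
    return {k: rows for k, rows in groups.items() if len(rows) > 1}


def revisions_in(rows):
    """Collect distinct _rev values from rows whose value is a full document.

    Only useful for views that emit the whole doc as the value; rows with
    other value shapes are ignored.
    """
    revs = set()
    for row in rows:
        value = row.get("value")
        if isinstance(value, dict) and "_rev" in value:
            revs.add(value["_rev"])
    return sorted(revs)


if __name__ == "__main__":
    # Trimmed sample modelled on the queues/all response in this thread.
    sample = {
        "total_rows": 3,
        "offset": 0,
        "rows": [
            {"id": "q_nikolaevka", "key": 10,
             "value": {"_id": "q_nikolaevka",
                       "_rev": "16181-ae5e5cca96b0491f266bc97c37a88f47"}},
            {"id": "q_nikolaevka", "key": 10,
             "value": {"_id": "q_nikolaevka",
                       "_rev": "16176-3a7bbd128bfb257fd746dfd80769b6fc"}},
            {"id": "q_otstoj", "key": 11,
             "value": {"_id": "q_otstoj", "_rev": "1-abc"}},
        ],
    }
    for (doc_id, key), rows in find_duplicate_rows(sample).items():
        print(doc_id, key, len(rows), revisions_in(rows))
```

To run it against a real database, decode the saved curl output with `json.load` and pass the result to `find_duplicate_rows`. If the reported rows carry different revisions, that points at stale index entries rather than at rows being written twice.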
