Re: Testing Upgrade to 1.6.x

2014-04-27 Thread Stefan Kögl
On Sun, Apr 27, 2014 at 1:38 AM, Robert Samuel Newson rnew...@apache.org wrote:


 Did you 'make clean' first? This looks like you have a module with the old
 #changes_args record and other modules with the new one.


I ran 'make distclean' before ./bootstrap and ./configure.


Testing Upgrade to 1.6.x

2014-04-26 Thread Stefan Kögl
Hi,

As I see that 1.6.x is coming along, I've tested upgrading a 1.5 db to the
current 1.6.x branch (26cff2d51473cbc8c0dfc655b2dbb986493bbcb8).

I stopped couch, compiled 1.6.x with R15B03 and installed the result on
an Ubuntu 12.04.4 LTS system.

The instance always failed with the logging output shown in

https://gist.github.com/stefankoegl/11332148

Is this a bug in the 1.6.x branch, or is my upgrade process somehow broken?


-- Stefan


Re: [ANN] Dave Cottlehuber joins the PMC

2012-10-31 Thread Stefan Kögl
Congratulations! Well deserved, Dave \°/

On Tue, Oct 30, 2012 at 11:55 PM, Noah Slater nsla...@apache.org wrote:
 Hey folks,

 I am delighted to announce that Dave Cottlehuber joins the Apache CouchDB
 Project Management Committee today.

 Dave has made outstanding, sustained contributions, and this appointment is
 an official acknowledgement of his position within the community, and our
 trust in his ability to provide oversight for the project.

 Everybody, please join me in congratulating Dave!

 I would also like to take this opportunity to bid a fond farewell to J.
 Chris Anderson, who has resigned from the PMC. J. Chris was instrumental to
 much of CouchDB's early success, and we wish him all the best with his
 future projects.

 Thanks,

 --
 NS


Re: COUCHDB-1444 (missing_named_view error on existing javascript design doc and view)

2012-08-03 Thread Stefan Kögl
Hi,

On Fri, Aug 3, 2012 at 1:50 PM, Robert Newson robert.new...@gmail.com wrote:
 Yes, I've been looking into it but I can't induce the condition locally, 
 which hinders my progress considerably.

 I have made a related fix on couchdb master 
 (ce7204b7eb64ac98d4445130fc4e647ed5181da9) which might have fixed this but I 
 can't confirm.

 If you are able to induce this reliably, I would like to work with you to 
 resolve this issue. IRC is best.

Unfortunately I can't reproduce this reliably. The problem only
appears on one of my production systems, but there it occurs rather
often (at least every 2nd day, sometimes even 5 times a day). However,
if you could prepare a patch against 1.2.0, I'd try to patch this
system and see what happens.


-- Stefan




COUCHDB-1444 (missing_named_view error on existing javascript design doc and view)

2012-08-02 Thread Stefan Kögl
Hi,

I just wanted to ask if any of the devs would be willing to help with
tracking down and solving COUCHDB-1444 (missing_named_view error on
existing javascript design doc and view), which has been open since
March.

Seven people affected by this issue (some on production
systems) have already commented on it, confirming it for combinations
of 1.1.1, 1.2.0, 1.3.0@master (f0d6f19bc8) and R13B03, R14, R15B01,
but apparently no one could provide any useful debugging output.

I lack the Erlang skills to tackle this issue myself, but I'd like to
help with debugging as much as I can.

Any feedback, thoughts, etc would be very much appreciated!


Thanks,

-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-28 Thread Stefan Kögl
Hi everybody,

I just wanted to draw attention to the DB compaction bug discovered today:

 https://issues.apache.org/jira/browse/COUCHDB-1451

While I initially discovered the bug with a 1.1.2 instance, this could
also affect 1.2. I think this issue should be resolved before closing
the vote. Filipe already provided a patch, so it shouldn't take too
long.


-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-28 Thread Stefan Kögl
On 03/28/2012 07:59 PM, Jan Lehnardt wrote:
 Hi Stefan, thanks for bringing this up. This is just to explain the procedure.
 A vote happens on a fixed package and no changes to the package can be made
 during the vote. If we find a flaw significant enough to include a fix into
 the release that is currently being voted on, we can abort the vote, fix,
 re-package and start a new vote, like we've done before.

Thanks for the explanation, that's exactly what I expected ;) What I
meant was that we should either

* rule out that this affects 1.2.x (but apparently it does)
* fix it and start another round of votes

Anyway, I wanted to raise this in time, before somebody closes the vote
and 1.2 ships with a bug.


-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-28 Thread Stefan Kögl
On 03/28/2012 08:08 PM, Robert Newson wrote:
 Can you clarify the conditions under which this bug occurs? I'm
 inclined to agree with Filipe that it's release blocking, but if
 the conditions to induce it are very rare, I might change my mind. Given
 that you induced it without apparent effort, I don't think it's likely
 to be rare.

I have a setup where I redirect live read traffic to test instances,
while replicating writes from a stable master. This is the first such
incident in a few weeks of doing this (with maybe 5 to 10 db
compactions), so I can't judge how often it occurs. From my point of
view there was no special workload on the instance when it happened.


-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-26 Thread Stefan Kögl
On Sun, Mar 25, 2012 at 1:04 PM, Paul Davis paul.joseph.da...@gmail.com wrote:
 Try adding:

 --with-js-lib-name=mozjs185 to your ./configure invocation

That worked.

+1 from me :)


 Also, why is libjs installed on this system? Was that a custom thing
 you did or is it included for some reason in a package?

I can't remember why, tbh. Is that important somehow?


-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-25 Thread Stefan Kögl
Hi,

On Sat, Mar 24, 2012 at 6:39 PM, Noah Slater nsla...@tumbolia.org wrote:
 Please follow the test procedure before voting:

 http://wiki.apache.org/couchdb/Test_procedure

It's probably related to my system (Ubuntu precise beta 1 with current
updates), but I'm getting the following errors during make check (in
the make part).

http://friendpaste.com/3cmPcgwU1hXHYjXhixaLFk

Any ideas whether I need to fix my system or it's a bug?


-- Stefan


Re: [VOTE] Apache CouchDB 1.2.0 release, third round

2012-03-25 Thread Stefan Kögl
Hi,

On Sun, Mar 25, 2012 at 9:27 AM, Paul Davis paul.joseph.da...@gmail.com wrote:
 Also the contents of mozjs185.pc

 This is probably in either /usr/lib/pkgconfig or /usr/local/lib/pkgconfig.

http://friendpaste.com/1UJ8uaGpYJRSpfhVZqrNvF


 Can you paste me the contents of config.log?

http://friendpaste.com/1AXWuMuTUzCMCXePuSJijA


-- Stefan


Re: {error,emfile} on CouchDB 1.2.x

2012-03-19 Thread Stefan Kögl
On Mon, Mar 19, 2012 at 9:31 AM, Randall Leeds randall.le...@gmail.com wrote:
 Fixed on 1.2.x and 1.1.x. Need to sleep and take a look at how I want to
 handle it on master.
 Thanks again for picking up on this one, Stefan. It's been in there since
 forever and I'd definitely seen the symptom without knowing the cause.

Thanks for taking care of this so quickly -- looking forward to a
great release :)


-- Stefan


{error,emfile} on CouchDB 1.2.x

2012-03-18 Thread Stefan Kögl
Hi,

Another thing I noticed during my tests of CouchDB 1.2.x. I redirected
live traffic to the instance and after a rather short time, requests
were failing with the following information in the logs:


[Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
{error_report,<0.31.0>,
{<0.27554.2>,std_error,
 [{application,mochiweb},
  "Accept failed error",
  {error,emfile}]}}
[Sun, 18 Mar 2012 16:39:24 GMT] [error] [<0.27554.2>]
{error_report,<0.31.0>,
  {<0.27554.2>,crash_report,
   [[{initial_call,
 {mochiweb_acceptor,init,
 ['Argument__1','Argument__2',
  'Argument__3']}},
 {pid,<0.27554.2>},
 {registered_name,[]},
 {error_info,
 {exit,
 {error,accept_failed},
 [{mochiweb_acceptor,init,3},
  {proc_lib,init_p_do_apply,3}]}},
 {ancestors,
 [couch_httpd,couch_secondary_services,
  couch_server_sup,<0.32.0>]},
 {messages,[]},
 {links,[<0.129.0>]},
 {dictionary,[]},
 {trap_exit,false},
 {status,running},
 {heap_size,233},
 {stack_size,24},
 {reductions,244}],
[]]}}


I think emfile means that CouchDB (or mochiweb?) couldn't open any
more files / connections. I've set the (hard and soft) nofile limit for
user couchdb to 4096, but didn't raise the ERL_MAX_PORTS accordingly.
Anyway, as soon as the error occurred, CouchDB started writing most of my
view files from scratch, rendering the instance unusable.

I'd expect CouchDB to fail more gracefully when the maximum number of
open files is reached. Is this a bug or expected behaviour?


-- Stefan
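
For the limits discussed in the message above, a minimal Python sketch can report the open-file limit of the current process next to `ERL_MAX_PORTS` and flag when the two differ. It assumes a Unix-like system (the `resource` module) and that `ERL_MAX_PORTS` is passed via the environment; the function names are hypothetical and not part of CouchDB.

```python
import os
import resource


def limits_differ(nofile_soft, erl_max_ports):
    """Return True if both limits are known and have different values.

    erl_max_ports is None when the variable is unset or not a number;
    in that case nothing can be compared and False is returned.
    """
    if erl_max_ports is None:
        return False
    return nofile_soft != erl_max_ports


def read_limits(environ=None):
    """Read the soft nofile limit and ERL_MAX_PORTS for this process."""
    environ = os.environ if environ is None else environ
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    raw = environ.get("ERL_MAX_PORTS", "")
    max_ports = int(raw) if raw.isdigit() else None
    return soft, max_ports


if __name__ == "__main__":
    soft, max_ports = read_limits()
    print("nofile (soft):", soft, "ERL_MAX_PORTS:", max_ports)
    if limits_differ(soft, max_ports):
        print("note: nofile and ERL_MAX_PORTS are set to different values")
```

Run it as the user that starts CouchDB, since limits are per user and per process.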



Re: Update Conflict for PUT/DELETE in _replicator

2012-03-12 Thread Stefan Kögl
In the meantime I tried copying the _replicator database to another
instance, where I could delete the entry without problems. However it
still doesn't work on the initial instance. If one of the committers
is interested, I could organize either remote access via HTTP, or
shell access to the machine it is running on.

-- Stefan





Re: Update Conflict for PUT/DELETE in _replicator

2012-03-12 Thread Stefan Kögl
On Mon, Mar 12, 2012 at 2:01 PM, Jason Smith j...@iriscouch.com wrote:
 On Mon, Mar 12, 2012 at 12:57 PM, Stefan Kögl koeglste...@gmail.com wrote:
 In the meantime I tried copying the _replicator database to another
 instance, where I could delete the entry without problems. However it
 still doesn't work on the initial instance. If one of the committers
 is interested, I could organize either remote access via HTTP, or
 shell access to the machine it is running on.

 Hm, if you copied the _replicator.couch file from the 1.2.x prerelease
 to version 1.1 then it will not support the newer file format.

 You could replicate to it, or since replication docs have no
 attachments, just query _all_docs, massage that into a _bulk_docs, and
 post it to the other couch.

That's not the problem. I copied the database to another 1.2.x
CouchDB, which could read and update it correctly.
The problem is the original 1.2.x instance (both are on the current
1.2.x branch, btw) where I created the entry but can not update /
delete it anymore.

-- Stefan
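
The `_all_docs` to `_bulk_docs` route suggested in the quoted reply could look roughly like the sketch below. It assumes the response was fetched with `?include_docs=true`, and it drops `_rev` so that the target assigns its own revisions; the function name is hypothetical, and the HTTP calls themselves are left out.

```python
def all_docs_to_bulk_docs(all_docs_response):
    """Turn an _all_docs?include_docs=true response into a _bulk_docs body."""
    docs = []
    for row in all_docs_response.get("rows", []):
        doc = row.get("doc")
        if not doc:
            continue  # row without an embedded document
        # Drop _rev so the target database assigns fresh revisions.
        docs.append({k: v for k, v in doc.items() if k != "_rev"})
    return {"docs": docs}
```

The returned dictionary can be serialized with `json.dumps` and posted to `/<db>/_bulk_docs` on the other instance.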


Re: Update Conflict for PUT/DELETE in _replicator

2012-03-12 Thread Stefan Kögl
On Mon, Mar 12, 2012 at 2:20 PM, Robert Newson rnew...@apache.org wrote:
 I'd welcome a chance to access this database. couchdb admin level
 access is sufficient for now, email me directly at rnew...@apache.org.

For general information: The problem spontaneously disappeared when I
tried to reproduce it with the admin I created for rnewson... I'll
report back in case it happens again.


-- Stefan


Re: Crash of CouchDB 1.2.x

2012-03-11 Thread Stefan Kögl
Hi,

I had my CouchDB 1.2.x (fb72251bc7114b07f0667867226ec9e200732dac)
crash again twice today.

The first one was while the instance was pull replicating (which
failed due to the source being unreachable), and compacting a rather
large view (from ~216G disk size to about 57G data size, if that's
relevant).

Here's the log that shows the crash

http://friendpaste.com/41Idie3gGdQRxJPEyVHpTR

After the crash the view compaction stopped, and I tried to restart it

$ curl -H "Content-Type: application/json" -X POST \
  "http://stefan:@localhost:5984/mygpo/_compact/users-tmp"
{"error":"timeout","reason":"{gen_server,call,\n
[<0.19783.69>,\n
{start_compact,#Fun<couch_view_compactor.0.15011741>}]}"}

http://friendpaste.com/2A086gHN8dNEJHPpMkDrPO

I assume this is because deleting the .compact.view file took too
long. The compaction started anyway, though. Besides the replication,
there were no other activities on the server.

Please let me know if I can assist with debugging somehow.


-- Stefan
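
The view compaction request from the message above can also be built in Python. This is only a sketch: the base URL is an assumption, credentials are omitted, and the function name is hypothetical. It builds the request without sending it.

```python
import urllib.request
from urllib.parse import quote


def build_view_compaction_request(base_url, db, design_doc):
    """Build a POST request for /<db>/_compact/<design_doc>."""
    url = "{}/{}/_compact/{}".format(
        base_url.rstrip("/"), quote(db, safe=""), quote(design_doc, safe="")
    )
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the Content-Type header is still needed
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; the timeout reported in the thread came from the server side, so a longer client timeout would not necessarily avoid it.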



On Fri, Mar 2, 2012 at 11:51 AM, Jan Lehnardt j...@apache.org wrote:

 On Mar 2, 2012, at 11:29 , Stefan Kögl wrote:

 On Thu, Mar 1, 2012 at 9:39 PM, Jan Lehnardt j...@apache.org wrote:
 Where in there did you do the git pull? And was a make clean or git clean
 involved?

 IIRC I did not pull in between, only applied the patch I mentioned
 earlier ( http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w ). And I
 probably did a make clean && make && make install.


 I think you should be in the clear with that procedure, but just to be
 sure, I think it'd be worth rm'ing all .beam files you find manually
 after the uninstall.

 Done, I'll report back if the problem appears again.

 Thanks Stefan, your help here is really appreciated :)

 Cheers
 Jan
 --



Re: Crash of CouchDB 1.2.x

2012-03-11 Thread Stefan Kögl

On 03/11/2012 01:33 PM, Bob Dionne wrote:

At a glance I would suspect you've run out of disk space and the
error thrown is not caught resulting in the badmatch,


At the time of the crash there were about 70G of free space left, which 
is enough for the compaction to finish.



-- Stefan
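
The free-space reasoning above (about 70G free against roughly 57G of expected compacted data) can be checked before starting a compaction. The sketch below is an assumption-laden helper, not a CouchDB feature: the 10% safety margin and the function names are made up for illustration.

```python
import shutil

GIB = 1024 ** 3


def has_room(free_bytes, needed_bytes, margin=1.1):
    """Return True if free space covers the expected size plus a margin."""
    return free_bytes >= needed_bytes * margin


def free_bytes(path):
    """Free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free
```

For the numbers in this thread, `has_room(70 * GIB, 57 * GIB)` evaluates to true with the default margin.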


Re: Crash of CouchDB 1.2.x

2012-03-11 Thread Stefan Kögl

On 03/11/2012 02:32 PM, Jason Smith wrote:

Longshot, but is it possible that couch had a file handle to an
unlinked file, so once the (OS) process crashed, the space was
freed?


Hmm.. that might be possible. I ran a database compaction before that. 
When I noticed the crash I saw that the db compaction finished, but it 
might be possible that it still had a handle to the old db file.


How should we proceed from here? Is it possible for me to provide 
further information about that in retrospect?



-- Stefan
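
One way to look for the situation Jason describes, while the process is still running, is to list file descriptors that point at unlinked files. The sketch below is Linux-specific (it reads `/proc/<pid>/fd`), returns an empty list where that directory is unavailable, and uses a hypothetical function name. It cannot recover this information after the process has exited.

```python
import os


def deleted_open_files(pid="self"):
    """List targets of open file descriptors whose files were unlinked."""
    fd_dir = "/proc/{}/fd".format(pid)
    held = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return held  # no /proc, no such pid, or no permission
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed in the meantime
        # The kernel marks unlinked targets with a " (deleted)" suffix.
        if target.endswith(" (deleted)"):
            held.append(target)
    return held
```

Called with the pid of the Erlang VM (and sufficient permissions), it would show an old database file that is still held open after compaction.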


Re: Crash of CouchDB 1.2.x

2012-03-02 Thread Stefan Kögl
On Thu, Mar 1, 2012 at 9:39 PM, Jan Lehnardt j...@apache.org wrote:
 Where in there did you do the git pull? And was a make clean or git clean
 involved?

IIRC I did not pull in between, only applied the patch I mentioned
earlier ( http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w ). And I
probably did a make clean && make && make install.


 I think you should be in the clear with that procedure, but just to be
 sure, I think it'd be worth rm'ing all .beam files you find manually
 after the uninstall.

Done, I'll report back if the problem appears again.


-- Stefan
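
To find leftover `.beam` files after an uninstall, as suggested in the quoted advice, a small search like the following sketch may help. The install prefix is whatever was passed to `./configure`; the function name is hypothetical, and the code only lists files rather than deleting them.

```python
from pathlib import Path


def find_beam_files(prefix):
    """Return all *.beam files below `prefix`, sorted by path."""
    return sorted(Path(prefix).rglob("*.beam"))
```

Reviewing the list before removing anything avoids deleting the `.beam` files of an unrelated Erlang installation under the same prefix.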


Re: Update Conflict for PUT/DELETE in _replicator

2012-03-02 Thread Stefan Kögl
On Fri, Mar 2, 2012 at 1:39 PM, Jan Lehnardt j...@apache.org wrote:
 On Mar 2, 2012, at 13:25 , Stefan Kögl wrote:
 Again something I noticed during my 1.2.x experiments: It seems I
 can't update or remove a document from the _replicator database, which
 I use for pull-replication into my 1.2.x instance.

 # get current _rev
 $ curl http://127.0.0.1:5984/_replicator/mygpo
 {"_id":"mygpo","_rev":"131-57b4da8d3163468cb0bbf4fd30c87832","source":"","target":"http://127.0.0.1:5984/mygpo","create_target":false,"continuous":true,"user_ctx":{"name":"stefan","roles":["admin"]},"owner":"stefan","_replication_state":"triggered","_replication_state_time":"2012-03-02T02:56:12+00:00","_replication_id":"f9fc5457b278d3cdb1ba2f1881253b04"}

 # try to delete
 $ curl -X DELETE \
   "http://127.0.0.1:5984/_replicator/mygpo?_rev=131-57b4da8d3163468cb0bbf4fd30c87832"
 {"error":"conflict","reason":"Document update conflict."}

 this should be ?rev=... (no underscore)

Of course, I also tried that but it didn't work either... see below.



 $ curl -X PUT "http://127.0.0.1:5984/_replicator/mygpo" -d @replication.json

 Can you try adding the ?rev= in the URL as well?

# get the rev first
$ curl http://127.0.0.1:5984/_replicator/mygpo
{"_id":"mygpo","_rev":"131-57b4da8d3163468cb0bbf4fd30c87832","source":"","target":"http://127.0.0.1:5984/mygpo","create_target":false,"continuous":true,"user_ctx":{"name":"stefan","roles":["admin"]},"owner":"stefan","_replication_state":"triggered","_replication_state_time":"2012-03-02T02:56:12+00:00","_replication_id":"f9fc5457b278d3cdb1ba2f1881253b04"}

# delete with rev instead of _rev
$ curl -X DELETE \
  "http://127.0.0.1:5984/_replicator/mygpo?rev=131-57b4da8d3163468cb0bbf4fd30c87832"
{"error":"conflict","reason":"Document update conflict."}

# now PUT with rev in URL
$ curl -X PUT \
  "http://127.0.0.1:5984/_replicator/mygpo?rev=131-57b4da8d3163468cb0bbf4fd30c87832" \
  -d @replication.json
{"error":"conflict","reason":"Document update conflict."}

# still the same rev (no other change in between)
$ curl http://127.0.0.1:5984/_replicator/mygpo
{"_id":"mygpo","_rev":"131-57b4da8d3163468cb0bbf4fd30c87832","source":"","target":"http://127.0.0.1:5984/mygpo","create_target":false,"continuous":true,"user_ctx":{"name":"stefan","roles":["admin"]},"owner":"stefan","_replication_state":"triggered","_replication_state_time":"2012-03-02T02:56:12+00:00","_replication_id":"f9fc5457b278d3cdb1ba2f1881253b04"}


-- Stefan
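
As the exchange above points out, the revision goes into the query string as `rev` (no underscore) and without surrounding quotes. A minimal sketch of building such a URL in Python, with a hypothetical function name, might be:

```python
from urllib.parse import quote, urlencode


def doc_url_with_rev(base_url, db, doc_id, rev):
    """Build a document URL whose query string carries the revision as `rev`."""
    path = "/".join(quote(part, safe="") for part in (db, doc_id))
    return "{}/{}?{}".format(base_url.rstrip("/"), path, urlencode({"rev": rev}))
```

Building the URL in code also sidesteps shell quoting, which was suspected earlier in the thread as a possible source of a mangled revision.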


Re: Update Conflict for PUT/DELETE in _replicator

2012-03-02 Thread Stefan Kögl
On Fri, Mar 2, 2012 at 2:54 PM, Robert Newson rnew...@apache.org wrote:
 could you redo the DELETE or PUT with '-sv' so we can see what's
 really being sent? Perhaps there's a weird shell thing happening
 causing the rev to be sent incorrectly.


$ curl -sv -X DELETE \
  "http://stefan:*@127.0.0.1:5984/_replicator/mygpo?rev=131-57b4da8d3163468cb0bbf4fd30c87832"
* About to connect() to 127.0.0.1 port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
* Server auth using Basic with user 'stefan'
> DELETE /_replicator/mygpo?rev=131-57b4da8d3163468cb0bbf4fd30c87832 HTTP/1.1
> Authorization: Basic **
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:5984
> Accept: */*
>
< HTTP/1.1 409 Conflict
< Server: CouchDB/1.2.0 (Erlang OTP/R14B04)
< Date: Fri, 02 Mar 2012 13:57:01 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 58
< Cache-Control: must-revalidate
<
{"error":"conflict","reason":"Document update conflict."}
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

I verified the rev before and after the DELETE to make sure it hasn't
changed in between.


 On 2 March 2012 13:51, Robert Newson rnew...@apache.org wrote:
 tbh you should be getting this error: 'Only the replicator can edit
 replication documents that are in the triggered state'. I do.

Shouldn't I be able to DELETE the replication document?


-- Stefan


Re: Update Conflict for PUT/DELETE in _replicator

2012-03-02 Thread Stefan Kögl
On Fri, Mar 2, 2012 at 4:06 PM, Jan Lehnardt j...@apache.org wrote:
 I just created a replication doc under 1.1.1 and then copied the
 _replicator.couch file to a 1.2.x. On update I got the expected result
 Robert also got ("Only the replicator can edit replication documents
 that are in the triggered state."). A curl -X DELETE on the doc with
 ?rev=4-abcd... (no quotes) also worked.

The document was created with 1.2.x, from around the time of the second RC.

I also tried with quotes and got

$ curl -sv -X DELETE \
  "http://stefan:*@127.0.0.1:5984/_replicator/mygpo?rev=\"131-57b4da8d3163468cb0bbf4fd30c87832\""
* About to connect() to 127.0.0.1 port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
* Server auth using Basic with user 'stefan'
> DELETE /_replicator/mygpo?rev="131-57b4da8d3163468cb0bbf4fd30c87832" HTTP/1.1
> Authorization: Basic **
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:5984
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Server: CouchDB/1.2.0 (Erlang OTP/R14B04)
< Date: Fri, 02 Mar 2012 15:18:31 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 44
< Cache-Control: must-revalidate
<
{"error":"unknown_error","reason":"badarg"}
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

After that I also tried compacting the _replicator database, but
that didn't change anything either.


-- Stefan


Crash of CouchDB 1.2.x

2012-03-01 Thread Stefan Kögl
Hello,

My experiments with replicating some live data / traffic to a CouchDB
1.2.x (running the current 1.2.x branch + the patch from [1]), which
sparked the indexing speed discussions, also turned up another
(potential) problem. First, sorry for not reporting back any further
performance measurements; I haven't yet found the time to run the
tests on my machines.

Anyway, I found the following stack traces in my log (after noticing
that some requests failed and compaction of a view stopped)

http://skoegl.net/~stefan/tmp/couchdb-1.2.x-crash.txt

The file starts at the first failed request. Every request before
that returned a positive (i.e. 2xx) status code. The crash might have
some natural cause (such as timeouts, lack of RAM, etc.), but I'm
not sure how to interpret Erlang stack traces. Can somebody point me
in the right direction for diagnosing the problem?


Thanks,

-- Stefan


[1] http://friendpaste.com/178nPFgfyyeGf2vtNRpL0w


Re: Crash of CouchDB 1.2.x

2012-03-01 Thread Stefan Kögl
On Thu, Mar 1, 2012 at 4:52 PM, Jan Lehnardt j...@apache.org wrote:
 Can you tell us how you installed 1.2.x? Is it a fresh installation, or
 did you do an in-place update from an earlier installation (earlier
 1.2.x or 1.1.x or 1.0.x)?

I first did a fresh install of 1.2.x using R15B. I then removed R15B,
installed R14B04 (both from source), compiled 1.2.x with the patch I
mentioned earlier, and did an in-place update.

If this is a problem, I could remove CouchDB first and do a fresh
install instead. What would be the preferred way to do a clean
uninstall?


-- Stefan





Re: Crash of CouchDB 1.2.x

2012-03-01 Thread Stefan Kögl
On 03/01/2012 07:38 PM, Jan Lehnardt wrote:
 On Mar 1, 2012, at 19:18 , Stefan Kögl wrote:
 If this is a problem, I could remove CouchDB first and do a fresh
 install instead. What would be the preferred way to do a clean
 uninstall?
 
 I don't want to claim that this is definitely the cause for your
 problem, but it'd be great if you could do a clean, fresh, empty
 install to make sure we can rule that out :)

I just did

/etc/init.d/couchdb stop
make uninstall
make install
# edit local.ini -- why does that get removed anyway?
/etc/init.d/couchdb start

Is that enough to count as a fresh install, or should I do anything
else? I'll continue monitoring the instance. Previously the error
happened after a few days, so I can't say yet if the re-install changed
anything.


-- Stefan
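
Since the reinstall above removed local.ini, keeping a copy before running `make uninstall` avoids having to re-edit it by hand. A minimal sketch, where the path to local.ini depends on the install prefix and the function name is hypothetical:

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy `path` to `path + suffix` (contents and metadata), return the copy."""
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    shutil.copy2(src, dst)
    return dst
```

After `make install`, the backup can be copied back over the freshly installed file.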


Re: Crash of CouchDB 1.2.x

2012-03-01 Thread Stefan Kögl
Hi,

On Fri, Mar 2, 2012 at 4:33 AM, Nathan Vander Wilt
nate-li...@calftrail.com wrote:
 Was your server under heavy load? Did you end up with a bunch of zombie 
 couchjs processes?

The crash occurred under load, but there are no zombies - at least not anymore.

If the crash happens again I'll try to inspect it more closely.


-- Stefan


Re: couchdb conflict resolution

2011-02-15 Thread Stefan Kögl
On Tue, Feb 15, 2011 at 5:14 PM, Aaron Boxer boxe...@gmail.com wrote:
 I am very interested in understanding how conflict resolution works in 
 couchdb:

 Is there a technical overview, somewhere, of how a node decides which
 revision wins?
 After a conflict is resolved, are old revisions discarded?

 Any technical details, short of slogging through the code, would be
 very welcome.

I'm not sure if this is the right level of detail for your purposes, but I found

 http://guide.couchdb.org/draft/conflicts.html

very interesting.

-- Stefan
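
As a rough illustration of the rule described in the linked guide (the revision with the longest history wins, and ties are broken by comparing `_rev` values in ASCII order), here is a small Python sketch. Preferring non-deleted revisions over deleted ones is an assumption added here, and the function names are hypothetical; this is not CouchDB's actual code.

```python
def parse_rev(rev):
    """Split a revision like '2-abc' into (2, 'abc')."""
    pos, _, digest = rev.partition("-")
    return int(pos), digest


def pick_winner(leaves):
    """Pick the winning revision from (rev, deleted) leaf pairs.

    Ordering: non-deleted first (assumption), then the higher revision
    number, then the higher revision hash in plain string order.
    """
    return max(leaves, key=lambda leaf: (not leaf[1], *parse_rev(leaf[0])))[0]
```

Because every node applies the same deterministic ordering to the same set of leaf revisions, all nodes arrive at the same winner without coordinating.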