On Apr 4, 2009, at 6:53 PM, Patric Fors wrote:


On 5 Apr 2009, at 00:23, Adam Kocoloski wrote:

On Apr 4, 2009, at 5:44 PM, Patric Fors wrote:

Hi,

Should I be worried that the "view_conflicts" test fails in the Test Suite? I mean, is it the test that fails, or is it CouchDB that fails the test? :-)

Hi Patric, did you happen to run that test with Safari? view_conflicts fails for me in Safari 4, but passes in Firefox 3 and in the command-line runner. In other words, I think it's the test that fails, not Couch :-)

Aha, thanks!
And, yes, Safari was the browser I used, I confess :-)
Ran it again with Firefox and it's all good: 44 of 44 test(s) run, 0 failures (55178 ms)

Hm...Command-line runner? Must have missed that one.
Well, while we are on the command line, I guess these errors are also part of the Test Suite's tests?

[info] [<0.10759.0>] 127.0.0.1 - - 'POST' /test_suite_db/_ensure_full_commit 201

<snipped file descriptor traceback>

[info] [<0.10759.0>] 127.0.0.1 - - 'POST' /_restart 200


/Patric

Hi Patric, funny you should bring that up. I've been trying to understand the source of those tracebacks myself. Short answer is that you probably don't have anything to worry about. Long answer follows ...

CouchDB uses a single file on disk for each database it creates, and all access to that file goes through a reference-counted gen_server using couch_file as the callback module. The tracebacks in the logs occur when a couch_file gen_server terminates abnormally, where "abnormally" just means that the reason given in the exit signal is something other than "normal". It happens rarely, and only when a database is deleted or the server is restarted, both of which occur much more frequently in the test suite than they do in normal operation. It's not necessarily indicative of a problem.

I believe the issue is one of message ordering. In normal operation couch_ref_counter is supposed to stop couch_file when the DB is deleted or the server restarted. In your log, couch_ref_counter is the neighbour at <0.10708.0>. Take a look at the ref_counter's message queue:

{messages, [{'DOWN',#Ref<0.0.0.128931>,process,<0.10644.0>,killed}]},

When the ref_counter processes that message I believe it will trigger a normal shutdown of the couch_file. Unfortunately, couch_file got the message about the couch_server at <0.10644.0> going down first, so you see what looks like a crash. The reason this is not a problem is that couch_file doesn't do anything differently for a normal or abnormal termination. The only difference is that the Erlang logger pukes out this stacktrace if it's an abnormal termination.
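
The ordering race described above can be sketched as a toy model. This is Python rather than Erlang, it is not taken from the CouchDB source, and the event names and the `deliver` function are made up purely for illustration. It assumes, as the explanation does, that whichever message reaches couch_file first decides its exit reason, and that only a reason other than "normal" produces a traceback.

```python
# Toy model only: the event names and deliver() are hypothetical,
# not CouchDB or OTP APIs.
REF_COUNTER_STOPS_FILE = "ref_counter_handles_down"  # ref_counter processes the 'DOWN' message
FILE_SEES_SERVER_EXIT = "file_sees_server_exit"      # couch_file learns the server died


def deliver(events):
    """Deliver events in order; return (exit_reason, log) for the file process."""
    exit_reason = None
    log = []
    for event in events:
        if exit_reason is not None:
            break  # the file process is already gone; later messages change nothing
        if event == REF_COUNTER_STOPS_FILE:
            exit_reason = "normal"
        elif event == FILE_SEES_SERVER_EXIT:
            exit_reason = "killed"
        else:
            raise ValueError("unknown event: " + event)
        log.append("file closed")  # the cleanup is the same for either reason
        if exit_reason != "normal":
            log.append("traceback logged")  # only the logging differs
    return exit_reason, log


# Intended ordering: the ref_counter gets there first, so the exit is quiet.
print(deliver([REF_COUNTER_STOPS_FILE, FILE_SEES_SERVER_EXIT]))
# Ordering seen in the log above: same cleanup, plus a traceback.
print(deliver([FILE_SEES_SERVER_EXIT, REF_COUNTER_STOPS_FILE]))
```

In both orderings the file is closed exactly once; the second ordering only adds the log noise.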

I think we should look into refactoring the couch_file/couch_ref_counter stuff a bit; the current workflow (server spawns file and unlinks, server spawns ref_counter, ref_counter links to file) is pretty tough to follow and opens us up to these occasional tracebacks in the logs. Anyway, thanks for listening. Cheers,

Adam

P.S. Any Erlangers out there might have noticed something odd here. couch_server spawn_links a couch_file and then unlinks it, so why does couch_file terminate when the server does? Chandru Mullaparthi (of ibrowse fame) pointed out an undocumented OTP feature that seems to be responsible in this thread:

http://groups.google.com/group/erlang-programming/browse_thread/thread/8ab392fedcad19b6
