On Apr 4, 2009, at 6:53 PM, Patric Fors wrote:
On Apr 5, 2009, at 00:23, Adam Kocoloski wrote:
On Apr 4, 2009, at 5:44 PM, Patric Fors wrote:
Hi,
Should I be worried that the "view_conflicts" test fails in the
Test Suite?
I mean, is it the test that fails, or is it couchdb that fails the
test? :-)
Hi Patric, did you happen to run that test with Safari?
view_conflicts fails for me in Safari 4, but passes in Firefox 3
and in the command-line runner. In other words, I think it's the
test that fails, not Couch :-)
Aha, thanks!
And, yes, Safari was the browser I used, I confess :-)
Ran it again with Firefox and it's all good: 44 of 44 test(s) run, 0
failures (55178 ms)
Hm...Command-line runner? Must have missed that one.
Well, while we are on the command line, I guess these errors are
also part of the Test Suite's tests?
[info] [<0.10759.0>] 127.0.0.1 - - 'POST' /test_suite_db/_ensure_full_commit 201
<snipped file descriptor traceback>
[info] [<0.10759.0>] 127.0.0.1 - - 'POST' /_restart 200
/Patric
Hi Patric, funny you should bring that up. I've been trying to
understand the source of those tracebacks myself. Short answer is
that you probably don't have anything to worry about. Long answer
follows ...
CouchDB uses a single file on disk for each database it creates, and
all access to that file goes through a reference-counted gen_server
using couch_file as the callback module. The tracebacks in the logs
occur when a couch_file gen_server terminates abnormally, where
"abnormally" just means that the reason given in the exit signal is
something other than "normal". It happens rarely, and only when a
database is deleted or the server is restarted, both of which occur
much more frequently in the test suite than they do in normal
operation. It's not necessarily indicative of a problem.
I believe the issue is one of message ordering. In normal operation
couch_ref_counter is supposed to stop couch_file when the DB is
deleted or the server restarted. In your log, couch_ref_counter is
the neighbour at <0.10708.0>. Take a look at the ref_counter's
message queue:
{messages, [{'DOWN',#Ref<0.0.0.128931>,process,<0.10644.0>,killed}]},
When the ref_counter processes that message I believe it will trigger
a normal shutdown of the couch_file. Unfortunately, couch_file got
the message about the couch_server at <0.10644.0> going down first, so
you see what looks like a crash. The reason this is not a problem is
that couch_file doesn't do anything differently for a normal or
abnormal termination. The only difference is that the Erlang logger
pukes out this stacktrace if it's an abnormal termination.
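
To make the ordering argument concrete, here is a toy model in Python. It is not CouchDB or Erlang code: the function name, the "file closed" log line, and the wording of the error report are all made up for illustration, and it only assumes what is described above (the ref_counter stops the file with reason "normal", the file otherwise goes down with the server's reason, and the cleanup is the same either way).

```python
def simulate(order):
    """Toy model of the race described above (hypothetical, not CouchDB code).

    `order` lists which process handles the server's death first:
    "ref_counter" or "file". Returns (exit_reason, log_lines).
    """
    log = []
    state = {"alive": True, "reason": None}

    def stop_file(reason):
        # Only the first stop request has any effect.
        if not state["alive"]:
            return
        state["alive"] = False
        state["reason"] = reason
        # Cleanup is identical for every exit reason.
        log.append("file closed")
        # Only a non-normal reason produces a report in the log.
        if reason != "normal":
            log.append("error report: couch_file terminated with reason " + reason)

    for who in order:
        if who == "ref_counter":
            # The ref_counter handles its 'DOWN' message and stops the file cleanly.
            stop_file("normal")
        elif who == "file":
            # The file hears about the dead server before the ref_counter acts.
            stop_file("killed")
        else:
            raise ValueError("unknown process: " + who)

    return state["reason"], log


if __name__ == "__main__":
    print(simulate(["ref_counter", "file"]))
    print(simulate(["file", "ref_counter"]))
```

In both orderings the file ends up closed; the second ordering just adds the noisy report, which is the traceback you are seeing.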
I think we should look into refactoring the couch_file/
couch_ref_counter stuff a bit; the current workflow (server spawns
file and unlinks, server spawns ref_counter, ref_counter links to
file) is pretty tough to follow and opens us up to these occasional
tracebacks in the logs. Anyway, thanks for listening. Cheers,
Adam
P.S. Any Erlangers out there might have noticed something odd here.
couch_server spawn_links a couch_file and then unlinks it, so why does
couch_file terminate when the server does? Chandru Mullaparthi (of
ibrowse fame) pointed out an undocumented OTP feature that seems to be
responsible in this thread:
http://groups.google.com/group/erlang-programming/browse_thread/thread/8ab392fedcad19b6