[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits
nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1688590578

I managed to find a way to seemingly detect the connection state change after the local client (curl) disconnects, by querying the equivalent of `getsockopt(fd(), IPPROTO_TCP, TCP_CONNECTION_INFO, ..., _size)`:

```
(node1@127.0.0.1)22> rp(inet:getopts(#Port<0.24>, [{raw, 6, 262, 1000}])).
{ok,[{raw,6,262,
          <<4,6,6,0,7,0,0,0,0,0,0,0,0,0,0,0,204,63,0,0,0,192,255,
            63,18,0,2,0,0,57,6,0,0,0,0,0,88,58,6,0,1,0,0,0,1,0,0,
            0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,177,1,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,148,0,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}]}
ok
```

Ctrl+C...

```
(node1@127.0.0.1)24> inet:info(#Port<0.24>).
#{counters =>
      #{recv_avg => 24,recv_cnt => 6,recv_dvi => 10,recv_max => 51,
        recv_oct => 148,send_avg => 216,send_cnt => 2,
        send_max => 427,send_oct => 433,send_pend => 0},
  input => 0,
  links => [<0.12859.0>],
  memory => 40,monitors => [],output => 433,
  owner => <0.12859.0>,
  states => [connected,open]}
(node1@127.0.0.1)25> rp(inet:getopts(#Port<0.24>, [{raw, 6, 262, 1000}])).
{ok,[{raw,6,262,
          <<5,6,6,0,7,0,0,0,0,0,0,0,0,0,0,0,204,63,0,0,0,192,255,
            63,18,0,2,0,0,57,6,0,0,0,0,0,88,58,6,0,59,58,0,0,1,0,
            0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,177,1,0,0,0,0,0,
            0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,148,0,0,0,0,0,0,0,
            0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}]}
ok
```

Looks like Linux has a similar facility, so there is some hope, though it does look very ugly.
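A sketch of how that raw query could be wrapped into a check. The module and function names here are made up, and the constants are macOS specific: level `6` is `IPPROTO_TCP`, option `262` (16#106) is `TCP_CONNECTION_INFO`, and the first byte of the result is the kernel TCP state, which flips from 4 (ESTABLISHED) to 5 (CLOSE_WAIT) in the shell session above. Linux would need `TCP_INFO` and a different struct layout.

```erlang
%% Sketch only (hypothetical module): probe the kernel-level TCP state of a
%% connected socket with a raw getsockopt, without reading or writing data.
-module(sock_probe).
-export([peer_closed/1]).

-define(IPPROTO_TCP, 6).
-define(TCP_CONNECTION_INFO, 262).  %% macOS only; Linux uses TCP_INFO instead

peer_closed(Sock) ->
    case inet:getopts(Sock, [{raw, ?IPPROTO_TCP, ?TCP_CONNECTION_INFO, 1000}]) of
        {ok, [{raw, _, _, <<State, _/binary>>}]} ->
            %% First byte is the TCP state: 4 = ESTABLISHED, 5 = CLOSE_WAIT,
            %% matching the 4 -> 5 flip seen after Ctrl+C above.
            State =/= 4;
        _ ->
            %% Option not supported (other OS, TLS socket, etc.); we can't
            %% tell, so report the peer as still connected.
            false
    end.
```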
nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686663150

> it ought to kill the process/port for the socket, which I think will kill the mochiweb process, but subject to testing. It doesn't hurt to try

I don't see how that could plausibly work. There is no active socket monitoring or polling in mochiweb. If we were in active mode, or were in C and used select/poll/epoll, we'd probably find out about it.
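Purely to illustrate what active mode would buy (mochiweb keeps the request socket passive while the handler runs, so none of this applies today): with `{active, once}` set, the owning process is told about a disconnect without touching the socket. The `wait_for_close/1` helper below is a made-up example.

```erlang
%% Sketch only: how an active-mode gen_tcp socket learns about a peer
%% disconnect. The controlling process simply receives a message.
wait_for_close(Sock) ->
    ok = inet:setopts(Sock, [{active, once}]),
    receive
        {tcp_closed, Sock} ->
            %% Peer closed the connection; this is where workers could be
            %% cleaned up immediately.
            closed;
        {tcp, Sock, Data} ->
            %% Unexpected pipelined data from the client.
            {data, Data}
    after 5000 ->
        timeout
    end.
```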
nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686633468

On localhost `keepalive` won't behave any differently for disconnects, as network handling is just a direct callback into the other peer's socket code at the kernel level. In other words, the socket will find out it's closed anyway. For remote connections it's a different story, and keepalive will be useful there. So keepalive is useful as step 0 - to allow the socket at the kernel level to become closed.

However, that is not enough for us. What is missing is that at the HTTP protocol stage in mochiweb we have no reason to do any socket operations (and so far it seems we can't sneak in any extra ones outside of mochiweb, which is what I was getting at above). We're forced to wait for the `mango` application to return a response. Without any attempted writes or reads to the socket, the mochiweb level won't have a way to find out the socket is closed - which could come after days of waiting in pathological cases, like we saw already.

The only solutions I see so far are to either modify mango to periodically handle a timeout and push a `\n` between rows (after forcing it to emit headers on the first timeout), or find some possibly non-portable raw socket getopts query to detect that the socket is closed out-of-band, as it were. We could enhance the existing `fabric_stream` monitor to periodically check it, or invent some new thing (maybe even something new in mochiweb).
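To make the first option concrete, a rough sketch of the periodic `\n` push. `stream_rows/2` is a hypothetical helper, not mango's actual response loop, and it assumes `chttpd:send_chunk/2` is used to write chunks:

```erlang
%% Sketch only: stream rows, but never sit longer than HeartbeatMs without
%% touching the socket. mochiweb exits the request process when a chunk
%% write hits a closed socket, which unwinds the linked fabric workers.
stream_rows(Resp, HeartbeatMs) ->
    receive
        {row, Row} ->
            chttpd:send_chunk(Resp, [Row, "\n"]),
            stream_rows(Resp, HeartbeatMs);
        complete ->
            %% Caller closes out the chunked response after this returns.
            {ok, Resp}
    after HeartbeatMs ->
        %% No row ready; push a bare newline purely to exercise the socket
        %% and notice a client that went away.
        chttpd:send_chunk(Resp, "\n"),
        stream_rows(Resp, HeartbeatMs)
    end.
```

The point is only that every `HeartbeatMs` the request process touches the socket, so a disconnected client turns into a failed write and a process exit instead of days of silent waiting.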
nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686478247

In principle at least, the experimental `socket` module does allow `peek`-ing with `recv` since OTP 24.0: https://www.erlang.org/doc/man/socket#type-msg_flag

> ```
> msg_flag() =
>     cmsg_cloexec | confirm | ctrunc | dontroute | eor | errqueue |
>     more | oob | peek | trunc
> ```

But we can't use `socket` for non-socket connections, and it might be a while if/until mochiweb starts using that module, especially since in some cases `socket` has been performing only as well as the old `inet` driver.
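If mochiweb (or something above it) did hand us a `socket`-backed connection, a peek-based liveness check might look roughly like this. `peer_alive/1` is a made-up helper, and the exact error reasons can differ by platform:

```erlang
%% Sketch only: use the peek flag with the experimental `socket` module to
%% check whether the peer has closed, without consuming buffered data.
peer_alive(Socket) ->
    case socket:recv(Socket, 1, [peek], 0) of
        {ok, _Byte} ->
            %% Readable data exists (peeked, not consumed): still alive.
            true;
        {error, timeout} ->
            %% Nothing to read right now, but the connection is still open.
            true;
        {error, closed} ->
            false;
        {error, _Other} ->
            %% e.g. econnreset; treat any other error as gone.
            false
    end.
```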
nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1684533075

> I confirmed directly and via haproxy that a client that disconnects (curl and CTRL-C in the first case and simply doing something that hits haproxy's server timeout like `_changes?feed=continuous=1000`) that the mochiweb process is killed, and this triggers the new code above to take down workers.

Cleanup already happens without this patch, from what I've observed. Here is how I tested on main:

```diff
diff --git a/src/chttpd/src/chttpd_db.erl b/src/chttpd/src/chttpd_db.erl
index e2de301b2..6c2beba83 100644
--- a/src/chttpd/src/chttpd_db.erl
+++ b/src/chttpd/src/chttpd_db.erl
@@ -127,6 +127,8 @@ handle_changes_req1(#httpd{} = Req, Db) ->
         db_open_options = [{user_ctx, couch_db:get_user_ctx(Db)}]
     },
     Max = chttpd:chunked_response_buffer_size(),
+couch_log:error(" +++TRACING _changes ~p:~p@~B REQPID:~p", [?MODULE, ?FUNCTION_NAME, ?LINE, self()]),
+dbg:tracer(), dbg:p(self(), p),
     case ChangesArgs#changes_args.feed of
         "normal" ->
             T0 = os:timestamp(),
diff --git a/src/fabric/src/fabric_db_update_listener.erl b/src/fabric/src/fabric_db_update_listener.erl
index 78ccf5a4d..fb508294a 100644
--- a/src/fabric/src/fabric_db_update_listener.erl
+++ b/src/fabric/src/fabric_db_update_listener.erl
@@ -37,6 +37,8 @@
 }).

 go(Parent, ParentRef, DbName, Timeout) ->
+couch_log:error(" +++TRACING UPDATE NOTIFIER+++ ~p:~p@~B ~p Parent:~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbName, Parent]),
+dbg:tracer(), dbg:p(self(), p),
     Shards = mem3:shards(DbName),
     Notifiers = start_update_notifiers(Shards),
     MonRefs = lists:usort([rexi_utils:server_pid(N) || #worker{node = N} <- Notifiers]),
@@ -82,6 +84,8 @@ start_update_notifiers(Shards) ->

 % rexi endpoint
 start_update_notifier(DbNames) ->
+couch_log:error(" +++TRACING UPDATE NOTIFIER WORKER+++ ~p:~p@~B~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbNames]),
+dbg:tracer(), dbg:p(self(), p),
     {Caller, Ref} = get(rexi_from),
     Notify = config:get("couchdb", "maintenance_mode", "false") /= "true",
     State = #cb_state{client_pid = Caller, client_ref = Ref, notify = Notify},
diff --git a/src/fabric/src/fabric_rpc.erl b/src/fabric/src/fabric_rpc.erl
index fa6ea5116..64fdbf4b5 100644
--- a/src/fabric/src/fabric_rpc.erl
+++ b/src/fabric/src/fabric_rpc.erl
@@ -69,6 +69,8 @@ changes(DbName, Args, StartSeq) ->
 changes(DbName, #changes_args{} = Args, StartSeq, DbOptions) ->
     changes(DbName, [Args], StartSeq, DbOptions);
 changes(DbName, Options, StartVector, DbOptions) ->
+couch_log:error(" ++TRACING CHANGES WORKER+ ~p:~p@~B~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbName]),
+dbg:tracer(), dbg:p(self(), p),
     set_io_priority(DbName, DbOptions),
     Args0 = lists:keyfind(changes_args, 1, Options),
     #changes_args{dir = Dir, filter_fun = Filter} = Args0,
```

That's setting up process event traces on the request process, the changes worker process, the db update listener process and the db update notifier worker process. The goal is that these should be cleaned up as soon as we attempt any write to the closed socket.

(`db` is a Q=1 empty db)

```
% curl -i $DB/db/_changes'?feed=continuous&since=now'
HTTP/1.1 200 OK
...
^C
```

```
[error] 2023-08-18T22:54:36.654375Z node1@127.0.0.1 <0.949.0> cc6744542c +++TRACING _changes chttpd_db:handle_changes_req1@130 REQPID:<0.949.0>
(<0.949.0>) spawn <0.974.0> as erlang:apply(#Fun,[])
(<0.949.0>) link <0.974.0>
(<0.949.0>) getting_unlinked <0.974.0>
[error] 2023-08-18T22:54:36.654949Z node1@127.0.0.1 <0.976.0> +++TRACING UPDATE NOTIFIER+++ fabric_db_update_listener:go@40 <<"db">> Parent:<0.949.0>
(<0.949.0>) spawn <0.976.0> as fabric_db_update_listener:go(<0.949.0>,#Ref<0.830586435.801374217.85503>,<<"db">>,6)
(<0.949.0>) link <0.976.0>
(<0.949.0>) spawn <0.978.0> as erlang:apply(#Fun,[])
(<0.949.0>) link <0.978.0>
(<0.949.0>) spawn <0.979.0> as erlang:apply(#Fun,[])
[error] 2023-08-18T22:54:36.655018Z node1@127.0.0.1 <0.977.0> cc6744542c ++TRACING CHANGES WORKER+ fabric_rpc:changes@72 <<"shards/-/db.1692392006">>
(<0.976.0>) spawn <0.980.0> as erlang:apply(#Fun,[])
[error] 2023-08-18T22:54:36.655123Z node1@127.0.0.1 <0.982.0> +++TRACING UPDATE NOTIFIER WORKER+++ fabric_db_update_listener:start_update_notifier@87 [<<"shards/-/db.1692392006">>]
(<0.976.0>) link <0.980.0>
(<0.976.0>) spawn <0.981.0> as erlang:apply(#Fun,[])
(<0.977.0>) exit normal
(<0.949.0>) getting_unlinked <0.978.0>
(<0.976.0>) exit normal
(<0.982.0>) exit killed
(<0.949.0>) exit shutdown
```

* `<0.949.0>`, the request process, is killed with a shutdown as expected.
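One reason the "any write to the closed socket" part holds: mochiweb's request send path exits the process on a failed socket send, roughly as in the paraphrased sketch below (not the verbatim mochiweb source), and everything linked to or monitored from that request process is then torn down with it.

```erlang
%% Sketch only: the shape of mochiweb's send wrapper. Any chunk written
%% after the client has disconnected fails at gen_tcp level and makes the
%% request process exit, taking linked fabric processes down as well.
send(Socket, Data) ->
    case mochiweb_socket:send(Socket, Data) of
        ok ->
            ok;
        _Error ->
            %% Client is gone; exit instead of propagating the error.
            exit(normal)
    end.
```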