[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits

2023-08-22 Thread via GitHub


nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1688590578

   I managed to find a way to seemingly detect the connection state change after the local client (curl) disconnects, by querying the equivalent of `getsockopt(fd, IPPROTO_TCP, TCP_CONNECTION_INFO, &info, &info_size)`:
   
   ```
   (node1@127.0.0.1)22> rp(inet:getopts(#Port<0.24>, [{raw, 6, 262, 1000}])).
   {ok,[{raw,6,262,
 <<4,6,6,0,7,0,0,0,0,0,0,0,0,0,0,0,204,63,0,0,0,192,255,
   63,18,0,2,0,0,57,6,0,0,0,0,0,88,58,6,0,1,0,0,0,1,0,0,
   0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,177,1,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,148,0,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}]}
   ok
   ```
   
   Ctrl+C...
   
   ```
   (node1@127.0.0.1)24> inet:info(#Port<0.24>).
   #{counters =>
 #{recv_avg => 24,recv_cnt => 6,recv_dvi => 10,recv_max => 51,
   recv_oct => 148,send_avg => 216,send_cnt => 2,
   send_max => 427,send_oct => 433,send_pend => 0},
 input => 0,
 links => [<0.12859.0>],
 memory => 40,monitors => [],output => 433,
 owner => <0.12859.0>,
 states => [connected,open]}
   
   (node1@127.0.0.1)25> rp(inet:getopts(#Port<0.24>, [{raw, 6, 262, 1000}])).
   {ok,[{raw,6,262,
 <<5,6,6,0,7,0,0,0,0,0,0,0,0,0,0,0,204,63,0,0,0,192,255,
   63,18,0,2,0,0,57,6,0,0,0,0,0,88,58,6,0,59,58,0,0,1,0,
   0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,177,1,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,148,0,0,0,0,0,0,0,
   0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>>}]}
   ok
   ```
   
   Looks like Linux has a similar facility, so there is some hope, though it does look very ugly.
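   The Linux analogue of that probe would be `TCP_INFO`. A minimal sketch in Python (my illustration, not CouchDB code; the `tcp_state` helper and the state table are assumptions based on `struct tcp_info`, whose first byte is `tcpi_state`):

```python
import socket

# Hypothetical helper: Linux analogue of the macOS TCP_CONNECTION_INFO probe
# above. getsockopt(IPPROTO_TCP, TCP_INFO) returns struct tcp_info; its first
# byte is tcpi_state (1 = ESTABLISHED, 8 = CLOSE_WAIT, i.e. the peer has
# disconnected but the EOF has not been consumed yet).
TCP_STATES = {1: "established", 8: "close_wait"}

def tcp_state(sock: socket.socket) -> str:
    # Ask for a generously sized buffer (mirroring the 1000 in the raw
    # inet:getopts query above); the kernel fills in what it has.
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 1000)
    return TCP_STATES.get(info[0], "other")
```

   Polling something like this from the request process would mirror the raw `inet:getopts/2` query above, with the same non-portability caveat.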


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits

2023-08-21 Thread via GitHub


nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686663150

   > it ought to kill the process/port for the socket, which I think will kill 
the mochiweb process, but subject to testing
   
   It doesn't hurt to try, but I don't see how that could plausibly work. There is no active socket monitoring or polling in mochiweb. If we were in active mode, or were in C using select/poll/epoll, we'd probably find out about it.
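   For illustration only (a Python sketch of the readiness-based approach, not mochiweb code): with `poll`, a disconnected peer flags the fd immediately, without any application-level read or write in flight.

```python
import select
import socket

def poll_events(sock: socket.socket, timeout_ms: int = 0) -> list[int]:
    # Register for readability and hangup. A peer that closed the connection
    # makes the fd readable (EOF) right away -- the notification that a
    # passive-mode socket owner never receives unless it touches the socket.
    p = select.poll()
    p.register(sock, select.POLLIN | select.POLLHUP)
    return [events for _fd, events in p.poll(timeout_ms)]
```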
   





[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits

2023-08-21 Thread via GitHub


nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686633468

   On localhost `keepalive` won't behave any differently for disconnects, as network handling is just a direct callback into the other peer's socket code at the kernel level; in other words, the socket will find out it's closed anyway. For remote connections it's a different story, and keepalive will be useful there.
   
   So keepalive is useful as step 0: it allows the socket to become closed at the kernel level. However, that is not enough for us. What is missing is that at the HTTP protocol stage in mochiweb we have no reason to do any socket operations (and so far it seems we can't do any extra ones outside of mochiweb in a sneaky way, which is what I was getting at above). We're forced to wait for the `mango` application to return a response. Without any attempted writes or reads on the socket, the mochiweb level won't have a way to find out the socket is closed, which could come after days of waiting in pathological cases like we saw already.
   
   The only solutions I see so far are to either modify mango to periodically handle a timeout and push a `\n` between rows (after forcing it to emit headers on the first timeout), or find some possibly non-portable raw socket `getopts` query to detect that the socket is closed out-of-band, as it were. We could enhance the existing `fabric_stream` monitor to periodically check it, or invent some new thing (maybe even something new in mochiweb).
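   As a sketch of "step 0" (Python for brevity; the knob names are the Linux ones, and the values are illustrative assumptions, not CouchDB defaults):

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, probes: int = 6) -> None:
    # With these settings the kernel declares a dead remote peer after
    # roughly idle + interval * probes seconds, moving the socket into a
    # closed state that a later read/write (or an out-of-band state query
    # like the ones discussed above) can observe.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```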





[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits

2023-08-21 Thread via GitHub


nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1686478247

   In principle at least, the experimental `socket` module does allow `peek`-ing with `recv` since OTP 24.0: https://www.erlang.org/doc/man/socket#type-msg_flag
   
   ```
   msg_flag() =
       cmsg_cloexec | confirm | ctrunc | dontroute | eor | errqueue |
       more | oob | peek | trunc
   ```
   
   But we can't use `socket` on connections that weren't created with that module, and it might be a while if/until mochiweb starts using it, especially since in some cases `socket` has not been performing as well as the old `inet` driver.
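   What `peek` buys is checking for EOF without consuming buffered request data. A Python sketch of the same idea (my illustration, not the Erlang `socket` API):

```python
import socket

def peer_closed(sock: socket.socket) -> bool:
    # MSG_PEEK looks at pending data without consuming it; MSG_DONTWAIT
    # keeps the call non-blocking. A zero-byte result means the peer sent
    # EOF, i.e. the client went away.
    try:
        data = sock.recv(1, socket.MSG_PEEK | socket.MSG_DONTWAIT)
    except BlockingIOError:
        return False  # nothing pending; connection still looks open
    return data == b""
```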





[GitHub] [couchdb] nickva commented on pull request #4729: Fabric workers should exit if the client exits

2023-08-18 Thread via GitHub


nickva commented on PR #4729:
URL: https://github.com/apache/couchdb/pull/4729#issuecomment-1684533075

   > I confirmed directly and via haproxy that a client that disconnects (curl and CTRL-C in the first case, and simply doing something that hits haproxy's server timeout like `_changes?feed=continuous=1000`) that the mochiweb process is killed, and this triggers the new code above to take down workers.
   
   Cleanup happens without this patch already, from what I've observed. Here is how I tested on main:
   
   ```diff
   diff --git a/src/chttpd/src/chttpd_db.erl b/src/chttpd/src/chttpd_db.erl
   index e2de301b2..6c2beba83 100644
   --- a/src/chttpd/src/chttpd_db.erl
   +++ b/src/chttpd/src/chttpd_db.erl
   @@ -127,6 +127,8 @@ handle_changes_req1(#httpd{} = Req, Db) ->
            db_open_options = [{user_ctx, couch_db:get_user_ctx(Db)}]
        },
        Max = chttpd:chunked_response_buffer_size(),
   +    couch_log:error(" +++TRACING _changes ~p:~p@~B REQPID:~p", [?MODULE, ?FUNCTION_NAME, ?LINE, self()]),
   +    dbg:tracer(), dbg:p(self(), p),
        case ChangesArgs#changes_args.feed of
            "normal" ->
                T0 = os:timestamp(),
   diff --git a/src/fabric/src/fabric_db_update_listener.erl b/src/fabric/src/fabric_db_update_listener.erl
   index 78ccf5a4d..fb508294a 100644
   --- a/src/fabric/src/fabric_db_update_listener.erl
   +++ b/src/fabric/src/fabric_db_update_listener.erl
   @@ -37,6 +37,8 @@
    }).
   
    go(Parent, ParentRef, DbName, Timeout) ->
   +    couch_log:error(" +++TRACING UPDATE NOTIFIER+++ ~p:~p@~B ~p Parent:~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbName, Parent]),
   +    dbg:tracer(), dbg:p(self(), p),
        Shards = mem3:shards(DbName),
        Notifiers = start_update_notifiers(Shards),
        MonRefs = lists:usort([rexi_utils:server_pid(N) || #worker{node = N} <- Notifiers]),
   @@ -82,6 +84,8 @@ start_update_notifiers(Shards) ->
   
    % rexi endpoint
    start_update_notifier(DbNames) ->
   +    couch_log:error(" +++TRACING UPDATE NOTIFIER WORKER+++ ~p:~p@~B~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbNames]),
   +    dbg:tracer(), dbg:p(self(), p),
        {Caller, Ref} = get(rexi_from),
        Notify = config:get("couchdb", "maintenance_mode", "false") /= "true",
        State = #cb_state{client_pid = Caller, client_ref = Ref, notify = Notify},
   diff --git a/src/fabric/src/fabric_rpc.erl b/src/fabric/src/fabric_rpc.erl
   index fa6ea5116..64fdbf4b5 100644
   --- a/src/fabric/src/fabric_rpc.erl
   +++ b/src/fabric/src/fabric_rpc.erl
   @@ -69,6 +69,8 @@ changes(DbName, Args, StartSeq) ->
    changes(DbName, #changes_args{} = Args, StartSeq, DbOptions) ->
        changes(DbName, [Args], StartSeq, DbOptions);
    changes(DbName, Options, StartVector, DbOptions) ->
   +    couch_log:error(" ++TRACING CHANGES WORKER+ ~p:~p@~B~p", [?MODULE, ?FUNCTION_NAME, ?LINE, DbName]),
   +    dbg:tracer(), dbg:p(self(), p),
        set_io_priority(DbName, DbOptions),
        Args0 = lists:keyfind(changes_args, 1, Options),
        #changes_args{dir = Dir, filter_fun = Filter} = Args0,
   ```
   
   That's setting up process event traces on the request process, the changes worker process, the db update listener process and the db update notifier worker process. The goal is that these should be cleaned up as soon as we attempt any write to the closed socket.
   
   (`db` is an empty Q=1 db)
   
   ```
   % curl -i $DB/db/_changes'?feed=continuous&since=now'
   HTTP/1.1 200 OK
   ...
   
   ^C
   ```
   
   ```
   [error] 2023-08-18T22:54:36.654375Z node1@127.0.0.1 <0.949.0> cc6744542c  +++TRACING _changes chttpd_db:handle_changes_req1@130 REQPID:<0.949.0>
   (<0.949.0>) spawn <0.974.0> as erlang:apply(#Fun,[])
   (<0.949.0>) link <0.974.0>
   (<0.949.0>) getting_unlinked <0.974.0>
   [error] 2023-08-18T22:54:36.654949Z node1@127.0.0.1 <0.976.0>   +++TRACING UPDATE NOTIFIER+++ fabric_db_update_listener:go@40 <<"db">> Parent:<0.949.0>
   (<0.949.0>) spawn <0.976.0> as fabric_db_update_listener:go(<0.949.0>,#Ref<0.830586435.801374217.85503>,<<"db">>,6)
   (<0.949.0>) link <0.976.0>
   (<0.949.0>) spawn <0.978.0> as erlang:apply(#Fun,[])
   (<0.949.0>) link <0.978.0>
   (<0.949.0>) spawn <0.979.0> as erlang:apply(#Fun,[])
   [error] 2023-08-18T22:54:36.655018Z node1@127.0.0.1 <0.977.0> cc6744542c  ++TRACING CHANGES WORKER+ fabric_rpc:changes@72<<"shards/-/db.1692392006">>
   (<0.976.0>) spawn <0.980.0> as erlang:apply(#Fun,[])
   [error] 2023-08-18T22:54:36.655123Z node1@127.0.0.1 <0.982.0>   +++TRACING UPDATE NOTIFIER WORKER+++ fabric_db_update_listener:start_update_notifier@87[<<"shards/-/db.1692392006">>]
   (<0.976.0>) link <0.980.0>
   (<0.976.0>) spawn <0.981.0> as erlang:apply(#Fun,[])
   (<0.977.0>) exit normal
   (<0.949.0>) getting_unlinked <0.978.0>
   
   (<0.976.0>) exit normal
   (<0.982.0>) exit killed
   (<0.949.0>) exit shutdown
   ```
   
 * `<0.949.0>`, the request process, is killed with a `shutdown` exit as expected.