Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22882 )

Change subject: [socket] Fix test failure caused due to unsupported socket 
option
......................................................................


Patch Set 1:

(2 comments)

> > Patch Set 1:
 > >
 > > > > Patch Set 1:
 > >  > >
 > >  > > > Patch Set 1:
 > >  > > >
 > >  > > > (1 comment)
 > >  > >
 > >  > > The two tests are consistently failing on my local Ubuntu
 > 20.04
 > >  > machine for debug build (rebase 9e661cb34).
 > >  > >
 > >  > > It appears that in my case, for some reason, the socket
 > shutdown
 > >  > is failing consistently due to which subsequent operation of
 > >  > setting SO_REUSEPORT option fails on unix domain socket. If
 > socket
 > >  > shutdown completes successfully, setting SO_REUSEPORT option
 > may
 > >  > not fail. I will need to validate this theory though, by
 > running
 > >  > some tests.
 > >  > >
 > >  > > Considering above explanation holds, we need to address two
 > >  > issues:
 > >  > > 1. Ensure proper socket cleanup is done successfully in
 > order to
 > >  > avoid reuseport failure.
 > >  > > 2. Ensure unix domain sockets are not unknowingly used for
 > >  > SO_REUSEPORT socket option and expected to work fine.
 > >  >
 > >  >
 > >  > I ran a sample routine, that sets socket option(SO_REUSEPORT)
 > on a
 > >  > AF_UNIX type socket, on different linux kernel versions
 > (5.15.0 and
 > >  > 6.15-rc7) and it became clear that kernel versions containing
 > >  > commit [1] are gracefully erroring out on non-inet sockets.
 > The
 > >  > change was introduced to fix a different issue however.
 > >  >
 > >  > [1]: 
 > > https://github.com/torvalds/linux/commit/5b0af621c3f6ef9261cf6067812f2fd9943acb4b
 > >  >
 > >  > Socket cleanup issue (mentioned above) turns out to be
 > something
 > >  > different and could be related to C++ version installed on my
 > >  > ubuntu linux variant. I haven't figured that out yet.
 > >  >
 > >  > I still believe, we need to make sure we are not unknowingly
 > using
 > >  > SO_REUSEPORT option while setting sockops on a AF_UNIX type
 > socket.
 > >  > Current change should help in doing that.
 > >
 > > OK, but why to change socket.cc?
 > >
 > > Also, maybe update the description with your findings, since the
 > statement that SO_REUSEPORT for a Unix domain socket isn't
 > supported isn't true.
 >
 > SO_REUSEPORT macro has been available there since kernel 3.9 and it
 > would be safe to assume that it is available in all the linux
 > distros today, so there is no need to have this macro availability
 > check, hence the change in socket.cc.

This doesn't explain why this change is needed to fix the tests that you 
reported as failing.  As I understand, the change in socket.cc in PS1 is 
unrelated to changes needed to address the failing tests.  What benefit does 
the change in socket.cc bring in?  Did it fix or improve anything, even in a 
theoretical use case?  I don't see much of that, but I might be missing 
something.  However, I have one concern with the change in socket.cc: it might 
break compilation at some of the platforms listed as supported at 
https://kudu.apache.org/docs/installation.html.

Did you verify that Kudu source code still compiles with this update on all the 
OSes listed there?  Even if you did or you know there is nothing to break, I'd 
rather see the update on socket.cc published in a separate changelist since 
it's not related to fixing the failing tests.  If you feel strongly about this 
and prefer lumping it with the rest of the changes in this patch, then I'm fine 
with keeping the update on socket.cc here as well, but otherwise I encourage 
you to post the changes in socket.cc in a separate patch.

 >
 > SO_REUSEPORT for a Unix domain socket is not supported is true
 > because there is no proper handling of such a case in kernel. Also,
 > there is no equivalent of sin_port in sockaddr_un structure. For
 > the kernel versions (with [1] commit), setsockopt fails with
 > appropriate error which is good because we will get to know the
 > error before bind operation. However, if there is a wrong
 > expectation of successfully setting reuseport option on unix domain
 > socket, we can certainly correct that expectation with this check.
 > The check will also handle the cases where [1] is not present in
 > the kernel, potentially leading to undefined behaviour.

The statement the commit description for PS1 is literally this:
  > Socket API (SetSockOpt) doesn't support SO_REUSEPORT for a Unix domain 
socket.

SetSockOpt() is Kudu's API, so maybe that was the source of my confusion.  As 
one can see, SetSockOpt() simply forwards everything down to the setsockopt() 
syscall, but there is nothing specific about Unix domain sockets regarding 
particular socket options in SetSockOpt() itself.  It seems you rather meant 
setsockopt() syscall.  Indeed -- the port number isn't applicable for Unix 
socket family, so SO_REUSEPORT isn't relevant for AF_UNIX sockets.  Please 
consider updating the commit description accordingly.

Also, please add the kernel version information for the platform where you saw 
those two tests failing.  Adding the relevant information regarding the change 
in setsockopt() behavior for SO_REUSEPORT in case of AF_UNIX sockets should be 
great as well.

Thank you!

http://gerrit.cloudera.org:8080/#/c/22882/1/src/kudu/rpc/messenger.cc
File src/kudu/rpc/messenger.cc:

http://gerrit.cloudera.org:8080/#/c/22882/1/src/kudu/rpc/messenger.cc@262
PS1, Line 262: unix
nit: UNIX


http://gerrit.cloudera.org:8080/#/c/22882/1/src/kudu/rpc/messenger.cc@263
PS1, Line 263: (reuseport_ && !accept_addr.is_unix())
Instead of silently ignoring the 'reuseport_' option for UNIX sockets, I'd 
rather think of reporting about the misconfiguration in this case, returning 
Status::ConfigurationError() when 'reuseport_' is set 'true' and the listening 
socket is of AF_UNIX family.

In addition, to catch such misconfiguration errors early, consider adding a 
group flag validator to check for the consistency of settings for 
--rpc_reuseport and --rpc_listen_on_unix_domain_socket flags.



--
To view, visit http://gerrit.cloudera.org:8080/22882
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ice00b1c4fd1df78fa84c3dd2a79c968a4a91cc21
Gerrit-Change-Number: 22882
Gerrit-PatchSet: 1
Gerrit-Owner: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 21 May 2025 22:30:09 +0000
Gerrit-HasComments: Yes

Reply via email to