On 04/01, Stefan Metzmacher wrote:
> On 01.04.25 at 17:45, Stanislav Fomichev wrote:
> > On 04/01, Breno Leitao wrote:
> > > On Tue, Apr 01, 2025 at 03:48:58PM +0200, Stefan Metzmacher wrote:
> > > > On 01.04.25 at 15:37, Stefan Metzmacher wrote:
> > > > > On 01.04.25 at 10:19, Stefan Metzmacher wrote:
> > > > > > On 31.03.25 at 23:04, Stanislav Fomichev wrote:
> > > > > > > On 03/31, Stefan Metzmacher wrote:
> > > > > > > > The motivation for this is to remove the SOL_SOCKET limitation
> > > > > > > > from io_uring_cmd_getsockopt().
> > > > > > > >
> > > > > > > > The reason for this limitation is that io_uring_cmd_getsockopt()
> > > > > > > > passes a kernel pointer as optlen to do_sock_getsockopt()
> > > > > > > > and can't reach the ops->getsockopt() path.
> > > > > > > >
> > > > > > > > The first idea would be to change the optval and optlen arguments
> > > > > > > > to the protocol specific hooks also to sockptr_t, as that
> > > > > > > > is already used for setsockopt() and also by do_sock_getsockopt()
> > > > > > > > sk_getsockopt() and BPF_CGROUP_RUN_PROG_GETSOCKOPT().
> > > > > > > >
> > > > > > > > But as Linus don't like 'sockptr_t' I used a different approach.
> > > > > > > >
> > > > > > > > @Linus, would that optlen_t approach fit better for you?
> > > > > > >
> > > > > > > [..]
> > > > > > >
> > > > > > > > Instead of passing the optlen as user or kernel pointer,
> > > > > > > > we only ever pass a kernel pointer and do the
> > > > > > > > translation from/to userspace in do_sock_getsockopt().
> > > > > > >
> > > > > > > At this point why not just fully embrace iov_iter? You have the size
> > > > > > > now + the user (or kernel) pointer. Might as well do
> > > > > > > s/sockptr_t/iov_iter/ conversion?
> > > > > >
> > > > > > I think that would only be possible if we introduce
> > > > > > proto[_ops].getsockopt_iter() and then convert the implementations
> > > > > > step by step. Doing it all in one go has a lot of potential to break
> > > > > > the uapi. I could try to convert things like socket, ip and tcp myself, but
> > > > > > the rest needs to be converted by the maintainer of the specific protocol,
> > > > > > as it needs to be tested. As there are crazy things happening in the existing
> > > > > > implementations, e.g. some getsockopt() implementations use optval as in and out
> > > > > > buffer.
> > > > > >
> > > > > > I first tried to convert both optval and optlen of getsockopt to sockptr_t,
> > > > > > and that showed that touching the optval part starts to get complex very soon,
> > > > > > see
> > > > > > https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=141912166473bf8843ec6ace76dc9c6945adafd1
> > > > > > (note it didn't converted everything, I gave up after hitting
> > > > > > sctp_getsockopt_peer_addrs and sctp_getsockopt_local_addrs.
> > > > > > sctp_getsockopt_context, sctp_getsockopt_maxseg,
> > > > > > sctp_getsockopt_associnfo and maybe
> > > > > > more are the ones also doing both copy_from_user and copy_to_user
> > > > > > on optval)
> > > > > >
> > > > > > I come also across one implementation that returned -ERANGE because *optlen was
> > > > > > too short and put the required length into *optlen, which means the returned
> > > > > > *optlen is larger than the optval buffer given from userspace.
> > > > > >
> > > > > > Because of all these strange things I tried to do a minimal change
> > > > > > in order to get rid of the io_uring limitation and only converted
> > > > > > optlen and leave optval as is.
> > > > > >
> > > > > > In order to have a patchset that has a low risk to cause regressions.
> > > > > >
> > > > > > But as alternative introducing a prototype like this:
> > > > > >
> > > > > >         int (*getsockopt_iter)(struct socket *sock, int level, int optname,
> > > > > >                                struct iov_iter *optval_iter);
> > > > > >
> > > > > > That returns a non-negative value which can be placed into *optlen
> > > > > > or negative value as error and *optlen will not be changed on error.
> > > > > > optval_iter will get direction ITER_DEST, so it can only be written to.
> > > > > >
> > > > > > Implementations could then opt in for the new interface and
> > > > > > allow do_sock_getsockopt() work also for the io_uring case,
> > > > > > while all others would still get -EOPNOTSUPP.
> > > > > >
> > > > > > So what should be the way to go?
> > > > >
> > > > > Ok, I've added the infrastructure for getsockopt_iter, see below,
> > > > > but the first part I wanted to convert was
> > > > > tcp_ao_copy_mkts_to_user() and that also reads from userspace before
> > > > > writing.
> > > > >
> > > > > So we could go with the optlen_t approach, or we need
> > > > > logic for ITER_BOTH or pass two iov_iters one with ITER_SRC and one
> > > > > with ITER_DEST...
> > > > >
> > > > > So who wants to decide?
> > > >
> > > > I just noticed that it's even possible in same cases
> > > > to pass in a short buffer to optval, but have a longer value in optlen,
> > > > hci_sock_getsockopt() with SOL_BLUETOOTH completely ignores optlen.
> > > >
> > > > This makes it really hard to believe that trying to use iov_iter for this
> > > > is a good idea :-(
> > >
> > > That was my finding as well a while ago, when I was planning to get the
> > > __user pointers converted to iov_iter. There are some weird ways of
> > > using optlen and optval, which makes them non-trivial to covert to
> > > iov_iter.
> >
> > Can we ignore all non-ip/tcp/udp cases for now? This should cover +90%
> > of useful socket opts. See if there are any obvious problems with them
> > and if not, try converting. The rest we can cover separately when/if
> > needed.
>
> That's what I tried, but it fails with
> tcp_getsockopt ->
>    do_tcp_getsockopt ->
>      tcp_ao_get_mkts ->
>         tcp_ao_copy_mkts_to_user ->
>            copy_struct_from_sockptr
>      tcp_ao_get_sock_info ->
>         copy_struct_from_sockptr
>
> That's not possible with a ITER_DEST iov_iter.
>
> metze

Can we create two iterators over the same memory? One with ITER_SOURCE
and another with ITER_DEST. Then getsockopt_iter could accept optval_in
and optval_out. We could also use the optval_out position (iov_offset) as
the optlen output value. I don't see why it wouldn't work, but I agree
it's going to be a messy conversion, so let's see if someone else has
better suggestions.
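
To make the two-iterator idea a bit more concrete, here is a rough userspace
sketch, not kernel code. Everything prefixed with demo_ is made up for
illustration: struct demo_iter only stands in for struct iov_iter, and
demo_getsockopt_iter() is a hypothetical handler that reads a request from
optval_in and writes its reply to optval_out. The caller sets up both
iterators over the same buffer and takes the optlen result from how far
optval_out advanced.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: one flat buffer, one direction. */
enum demo_dir { DEMO_SOURCE, DEMO_DEST };

struct demo_iter {
	enum demo_dir dir;
	unsigned char *buf;
	size_t count;	/* bytes still available */
	size_t offset;	/* bytes consumed so far (plays the role of iov_offset) */
};

static void demo_iter_init(struct demo_iter *it, enum demo_dir dir,
			   void *buf, size_t len)
{
	assert(buf != NULL);
	it->dir = dir;
	it->buf = buf;
	it->count = len;
	it->offset = 0;
}

/* Read from a SOURCE iterator; a DEST iterator yields nothing. */
static size_t demo_copy_from_iter(void *dst, size_t len, struct demo_iter *it)
{
	if (it->dir != DEMO_SOURCE)
		return 0;
	if (len > it->count)
		len = it->count;
	memcpy(dst, it->buf + it->offset, len);
	it->offset += len;
	it->count -= len;
	return len;
}

/* Write to a DEST iterator; a SOURCE iterator accepts nothing. */
static size_t demo_copy_to_iter(const void *src, size_t len, struct demo_iter *it)
{
	if (it->dir != DEMO_DEST)
		return 0;
	if (len > it->count)
		len = it->count;
	memcpy(it->buf + it->offset, src, len);
	it->offset += len;
	it->count -= len;
	return len;
}

/*
 * Hypothetical handler: reads "how many entries" from optval_in, then
 * writes that many 32-bit entries to optval_out. All input is consumed
 * before the first write, because both iterators alias the same memory.
 */
static int demo_getsockopt_iter(struct demo_iter *optval_in,
				struct demo_iter *optval_out)
{
	uint32_t want, i;

	if (demo_copy_from_iter(&want, sizeof(want), optval_in) != sizeof(want))
		return -EINVAL;

	for (i = 0; i < want; i++) {
		uint32_t entry = 100 + i;

		if (demo_copy_to_iter(&entry, sizeof(entry), optval_out) != sizeof(entry))
			return -EINVAL;
	}
	return 0;
}

/* Caller side: two iterators over one buffer, optlen taken from optval_out. */
int demo_do_getsockopt(void *optval, int *optlen)
{
	struct demo_iter in, out;
	int ret;

	if (*optlen < 0)
		return -EINVAL;

	demo_iter_init(&in, DEMO_SOURCE, optval, (size_t)*optlen);
	demo_iter_init(&out, DEMO_DEST, optval, (size_t)*optlen);

	ret = demo_getsockopt_iter(&in, &out);
	if (ret < 0)
		return ret;	/* *optlen is left unchanged on error */

	*optlen = (int)out.offset;
	return 0;
}
```

The catch this sketch glosses over is ordering: a handler that interleaves
reads and writes on the aliased buffer can clobber its own input, which is
part of why the conversion would be messy.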