[no subject]
David Miller wrote: > > Is there going to be a merge of net into net-next before the merge > > window opens? Or do you have a sample merge that I can rebase my > > afs-next branch on? > > I'll be doing a net to net-next merge some time today. Excellent, thanks! David
[no subject]
Hi Dave, Is there going to be a merge of net into net-next before the merge window opens? Or do you have a sample merge that I can rebase my afs-next branch on? The problem I have is that there's a really necessary patch in net that's not in net-next: d7b4c24f45d2efe51b8f213da4593fefd49240ba rxrpc: Fix an uninitialised variable (it fixes a fix that went in net just before you last merged it into net-next). So I would like to base my branch on both net and net-next, but the merge is non-trivial, and I'd rather not hand Linus a merge that conflicts with yours. The issues are: (*) net/sched/cls_api.c I think nlmsg_parse() needs to take both rtm_tca_policy and cb->extack as its last two arguments. Each branch fills in one argument and leaves the other NULL. (*) net/ipv4/ipmr_base.c mr_rtm_dumproute() got a piece abstracted out and modified in one branch, but the unabstracted branch has a fix in the same area. I think the thing to do is to apply the fix (removing the same two lines) from the abstracted-out branch. Thanks, David
Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6
David Miller wrote: > > Nit : Correct attribution would require a Reported-by: tag > > Right. And I've posted a new version with that and the reviewed-by from Arnd. David
Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6
Eric Dumazet wrote: > Nit : Correct attribution would require a Reported-by: tag Point. I'll repost it with a To: line also, I'm waiting to see if Arnd will ack it. David
Re: [PATCH net-next] rxrpc: Remove set but not used variable 'ioc'
Dan Carpenter wrote: > I told him to do that because it wasn't a bugfix... Probably just Fixes > is the right thing to use though. But the correct fix *is* a bug fix. David
Re: [PATCH net-next] rxrpc: Remove set but not used variable 'ioc'
YueHaibing wrote: > net/rxrpc/output.c: In function 'rxrpc_reject_packets': > net/rxrpc/output.c:527:11: warning: > variable 'ioc' set but not used [-Wunused-but-set-variable] > > It never used since introduction in I wonder why my compiler doesn't show this warning. Anyway, NAK: just removing the variable is the wrong fix - you need to look at the code more closely. The actual fix is to pass it to kernel_sendmsg() instead of 2. But thanks anyway! Do you want to respin your patch? > commit ece64fec164f ("rxrpc: Emit BUSY packets when supposed to rather than > ABORTs") Btw, this should be a 'Fixes: ("subject")' line and the patch needs to go to net, not net-next. David
[PATCH net-next 2/9] rxrpc: Emit the data Tx trace line before transmitting
Print the data Tx trace line before transmitting so that it appears before the trace lines indicating success or failure of the transmission. This makes the trace log less confusing. Signed-off-by: David Howells --- net/rxrpc/output.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index e8fb8922bca8..993d4cd247f9 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -378,11 +378,13 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, if ((lose++ & 7) == 7) { ret = 0; lost = true; - goto done; } } - _proto("Tx DATA %%%u { #%u }", serial, sp->hdr.seq); + trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags, + retrans, lost); + if (lost) + goto done; /* send the packet with the don't fragment bit set if we currently * think it's small enough */ @@ -415,8 +417,6 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, goto send_fragmentable; done: - trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags, - retrans, lost); if (ret >= 0) { if (whdr.flags & RXRPC_REQUEST_ACK) { call->peer->rtt_last_req = skb->tstamp;
Re: How to post rxrpc patches for net-next with deps on net?
David Miller wrote: > Just FYI, net has been merged into net-next. I saw, thanks. I've rebased my rxrpc-next on it, tagged it and pushed. Will mail a request soon. Thanks, David
How to post rxrpc patches for net-next with deps on net?
Hi Dave, I have some rxrpc patches to post for your net-next/master branch, but there's a dependency in them on the rxrpc-fixes-20180928 tag you pulled into your net/master branch. What's the best way to handle this? (1) Wait for you to merge net into net-next? (2) Base it on my own merge of rxrpc-fixes-20180928 into net-next? I had to fix up drivers/net/ethernet/netronome/nfp/nfp_net_common.c in the merge I'm currently using. (3) Let you fix up the discrepency between the two branches? Two lines that got removed in the rxrpc-fixes-20180928 tag cause a merge failure in net/rxrpc/conn_object.c when applying it to raw net-next by their unexpected presence. It's nothing too bad. Thanks, David
Re: r8169 tx batching(?) causing performance problems
David Miller wrote: > Probably you are seeing some interrupt mitigation. > > It seems there is a difference in how the interrupt mitigation is > programmed on for 8168 chips vs. others by default. Most get > all zeros in the IntrMitigate register, whilst for 8168 chips > a value of 0x5151 is programmed. I'm not sure what that means. I can't seem to find a programmer's manual for the chip. > You can play with ethtool to mess with the coalescing settings > to see if this is part of the problem. These bits from "ethtool -c enp3s0"? rx-usecs: 200 rx-frames: 4 rx-usecs-irq: 0 rx-frames-irq: 0 tx-usecs: 200 tx-frames: 4 tx-usecs-irq: 0 tx-frames-irq: 0 > I bet this might explain the behavior you see after including > even Heiner's TXCFG_AUTO_FIFO patch. Thanks, David
Re: r8169 tx batching(?) causing performance problems
David Howells wrote: > Can someone help me figure out a performance issue that seems to be caused by > an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at > least, not starting immediately that it's given something to transmit? I've been told that: commit ad5f97faff4231e72b96bd96adbe1b6e977a9b86 Author: Heiner Kallweit Date: Fri Sep 28 23:51:54 2018 +0200 r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO might fix the problem, however it doesn't seem to help entirely. Whilst it does now seem to be transmitting whilst I'm generating up the queue, it still seems that I'm able to load up the queue faster the packets are being cleared from the queue. So in the following excerpt: id-0[001] d.h241.284702: net_rtl8169_interrupt: enp3s0 st=85 id-0[001] ..s241.284715: net_rtl8169_poll: enp3s0 st=85 dd-3186 [003] ...341.284741: net_rtl8169_tx: enp3s0 p=213-216 skb=2e2b0c3e dd-3186 [003] ...341.284777: net_rtl8169_tx: enp3s0 p=213-217 skb=3950deca dd-3186 [003] ...341.284790: net_rtl8169_tx: enp3s0 p=213-218 skb=471b2bc2 dd-3186 [003] ...341.284826: net_rtl8169_tx: enp3s0 p=213-219 skb=7c25ae16 dd-3186 [003] ...341.284839: net_rtl8169_tx: enp3s0 p=213-220 skb=cfbf719f dd-3186 [003] ...341.284870: net_rtl8169_tx: enp3s0 p=213-221 skb=d34a1f67 dd-3186 [003] ...341.284883: net_rtl8169_tx: enp3s0 p=213-222 skb=466e20e8 dd-3186 [003] ...341.284914: net_rtl8169_tx: enp3s0 p=213-223 skb=3d36cb1c dd-3186 [003] ...341.284927: net_rtl8169_tx: enp3s0 p=213-224 skb=399c06ea id-0[001] ..s241.284938: net_rtl8169_tx_done: enp3s0 p=213 skb=c797fea6 id-0[001] ..s241.284939: net_rtl8169_tx_done: enp3s0 p=214 skb=c0e4d6f0 id-0[001] ..s241.284940: net_rtl8169_tx_done: enp3s0 p=215 skb=2e2b0c3e id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=216 skb=3950deca id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=217 skb=471b2bc2 id-0[001] ..s241.284942: net_rtl8169_tx_done: enp3s0 p=218 skb=7c25ae16 id-0[001] ..s241.284943: net_rtl8169_tx_done: enp3s0 p=219 skb=cfbf719f id-0[001] ..s241.284944: net_rtl8169_tx_done: enp3s0 p=220 skb=d34a1f67 id-0[001] ..s241.284945: net_rtl8169_tx_done: enp3s0 p=221 skb=466e20e8 id-0[001] ..s241.284946: net_rtl8169_tx_done: enp3s0 p=222 skb=3d36cb1c id-0[001] ..s241.284947: net_rtl8169_tx_done: enp3s0 p=223 skb=399c06ea id-0[001] d.h241.284954: net_rtl8169_interrupt: enp3s0 st=85 packets are being queued something like 11-13uS apart, but there seems like a big gap of about 200uS in the idle thread between the poll and the first tx_done that might be masking things. David
r8169 tx batching(?) causing performance problems
the update of the Tx ring descriptors and does this right before interrupting the CPU. (5) Reading/writing the Tx descriptor ring from the device is slow. But again note that tshark on the server shows the ACKs arriving as a batch - I've also instrumented the NIC driver for that machine and observed the NIC apparently receiving ACK packets in batches. Also, what does the TxDescUnavail status flag signify? The NIC is setting it, but nothing in the driver takes any notice of it or clears it. Thanks, David --- commit 4ce09fa309d8fb4b1b910091607c2290c45541f1 Author: David Howells Date: Tue Oct 2 21:14:04 2018 +0100 Add tracepoints for Realtek r8169 diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index ab30aaeac6d3..8d3a86a674b7 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -31,6 +31,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + #define MODULENAME "r8169" #define FIRMWARE_8168D_1 "rtl_nic/rtl8168d-1.fw" @@ -6197,6 +6200,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, RTL_W8(tp, TxPoll, NPQ); mmiowb(); + trace_net_rtl8169_tx(tp->dev, tp->dirty_tx, tp->cur_tx, skb); if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) { /* Avoid wrongly optimistic queue wake-up: rtl_tx thread must @@ -6302,6 +6306,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp) rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb, tp->TxDescArray + entry); if (status & LastFrag) { + trace_net_rtl8169_tx_done(dev, dirty_tx, tx_skb->skb); u64_stats_update_begin(>tx_stats.syncp); tp->tx_stats.packets++; tp->tx_stats.bytes += tx_skb->skb->len; @@ -6453,6 +6458,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget dev->stats.multicast++; napi_gro_receive(>napi, skb); + trace_net_rtl8169_napi_rx(dev, skb); u64_stats_update_begin(>rx_stats.syncp); tp->rx_stats.packets++; @@ -6466,7 +6472,6 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget count = cur_rx - tp->cur_rx; tp->cur_rx = cur_rx; - return count; } @@ -6478,6 +6483,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) if (status == 0x || !(status & (RTL_EVENT_NAPI | tp->event_slow))) return IRQ_NONE; + trace_net_rtl8169_interrupt(tp->napi.dev, status); + rtl_irq_disable(tp); napi_schedule_irqoff(>napi); @@ -6559,6 +6566,7 @@ static int rtl8169_poll(struct napi_struct *napi, int budget) status = rtl_get_events(tp); rtl_ack_events(tp, status & ~tp->event_slow); + trace_net_rtl8169_poll(dev, status); if (status & RTL_EVENT_NAPI_RX) work_done = rtl_rx(dev, tp, (u32) budget); diff --git a/include/trace/events/net_rtl8169.h b/include/trace/events/net_rtl8169.h new file mode 100644 index ..92049078b3fa --- /dev/null +++ b/include/trace/events/net_rtl8169.h @@ -0,0 +1,125 @@ +/* Realtek RTL8169 tracepoints + * + * Copyright (C) 2018 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowe...@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM net_rtl8169 + +#if !defined(_TRACE_NET_RTL8169_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NET_RTL8169_H + +#include +#include + + +TRACE_EVENT(net_rtl8169_interrupt, + TP_PROTO(struct net_device *netdev, u16 status), + + TP_ARGS(netdev, status), + + TP_STRUCT__entry( + __field(u16,status ) + __array(char, name, IFNAMSIZ ) +), + + TP_fast_assign( + __entry->status = status; + memcpy(__entry->name, netdev->name, IFNAMSIZ); + ), + + TP_printk("%s st=%x", __entry->name, __entry->status) + ); + +TRACE_EVENT(net_rtl8169_poll, + TP_PROTO(struct net_device *netdev, u16 status), + + TP_ARGS(netdev, status), + + TP_STRUCT__entry( + __field(u16,status ) + __array(char, nam
Massive delays between packet rx and UDP ->data_ready handoff
Hi, I'm seeing occasional massive delays between packets being received in the r8169 driver and them turning up in the rxrpc_data_ready() function called by the UDP or UDP6 ->data_ready() handler. By massive, I mean I've seen anything up to about 4s. I *think* that the effect gets rapidly worse over time, then plateaus. I stuck some tracepoints in the r8169 driver, including one in the rtl_rx() function, right after it calls napi_gro_receive() to queue the packet. rxrpc has tracepoints that give useful information here also, the first indicating that we've been given a new packet by skb_recv_udp(), the second that we've checksummed it and decoded the header and the third that the packet is about to be destroyed. All four lines print the (mangled) socket buffer address. So, for example: -0 [001] .Ns2 169.274008: net_rtl8169_napi_rx: enp3s0 skb=4f91d06d ... -0 [001] .Ns4 172.014880: rxrpc_skb: s=4f91d06d Rx RCV u=1 m=1 p=rxrpc_data_ready+0xed/0x195b -0 [001] .Ns4 172.014883: rxrpc_rx_packet: d3ea7470:8f7b61a0:0003:09c5 000c 0001 01 05 DATA 2740874925 skb=4f91d06d ... kworker/... [001] ...1 172.015007: rxrpc_skb: s=4f91d06d Rx FRE u=1 m=0 p=rxrpc_recvmsg_data.isra.0+0x840/0xdc1 This packet was received at 169.274008, came into possession of rxrpc at 172.014880 and had been checksummed and had its header processed by 172.014883; rxrpc has finished with the packet by 172.015007. On the third trace line, the "2740874925" is the difference between skb->tstamp and the current time in nanoseconds - ie. 2.74s. That's way too long and causes rxrpc to retransmit. Note that the two machines involved are on an uncontended gigE network with about 4m of cable separating them and neither the client (the machine on which this logging was captured) nor the server are particularly busy (the client was also running a tshark capture to a file). The base kernel is between current -rc5 and -rc6 plus a bunch of my own rxrpc, afs and iter patches and a patch to add a few tracepoints to the r8169 driver. Does anyone have any suggestions as to how I can work out what's going on or how to fix it? I have a more complete trace and matching pcap file if anyone wishes to look at them. I've also attached my config below in case that shines any light. Thanks, David --- # # Automatically generated file; DO NOT EDIT. # Linux/x86_64 4.19.0-rc5 Kernel Configuration # # # Compiler: x86_64-linux-gnu-gcc (GCC) 8.1.1 20180626 (Red Hat Cross 8.1.1-3) # CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=80101 CONFIG_CLANG_VERSION=0 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="-fscache" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set CONFIG_KERNEL_BZIP2=y # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_CROSS_MEMORY_ATTACH=y # CONFIG_USELIB is not set CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y CONFIG_AUDITSYSCALL=y CONFIG_AUDIT_WATCH=y CONFIG_AUDIT_TREE=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_MIGRATION=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y CONFIG_GENERIC_MSI_IRQ=y CONFIG_GENERIC_MSI_IRQ_DOMAIN=y CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y CONFIG_GENERIC_IRQ_RESERVATION_MODE=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # CONFIG_GENERIC_IRQ_DEBUGFS is not set CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set # CONFIG_NO_HZ_IDLE is not set CONFIG_NO_HZ_FULL=y # CONFIG_NO_HZ is not set CONFIG_HIGH_RES_TIMERS=y CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set CONFIG_PREEMPT_COUNT=y # # CPU/Task time and stats accounting # CONFIG_VIRT_CPU_ACCOUNTING=y CONFIG_VIRT_CPU_ACCOUNTING_GEN=y # CONFIG_IRQ_TIME_ACCOUNTING is not set CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_CPU_ISOLATION=y # # RCU Subsystem # CONFIG_TREE_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y CONFIG_TREE_SRCU=y CONFIG_RCU_STALL_COMMON=y
Re: Is it possible to get Rx timestamps in skb->tstamp?
Toke Høiland-Jørgensen wrote: > > Is it possible to tell a UDP socket that you'd like it to put > > reception timestamps in skb->tstamp? > > I think you probably want SO_TIMESTAMP*? > > See > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt That seems to work. I thought that only affected recvmsg() putting the timestamp into the ancillary data buffer, but apparently not. Thanks, David
Is it possible to get Rx timestamps in skb->tstamp?
Hi, Is it possible to tell a UDP socket that you'd like it to put reception timestamps in skb->tstamp? For some reason, I seem to remember that the kernel used to put something in there - and AF_RXRPC makes use of it. David
[PATCH net-next] rxrpc: Remove set but not used variable 'nowj'
From: Wei Yongjun Fixes gcc '-Wunused-but-set-variable' warning: net/rxrpc/proc.c: In function 'rxrpc_call_seq_show': net/rxrpc/proc.c:66:29: warning: variable 'nowj' set but not used [-Wunused-but-set-variable] unsigned long timeout = 0, nowj; ^ Signed-off-by: Wei Yongjun Signed-off-by: David Howells --- net/rxrpc/proc.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index 163d05df339d..9805e3b85c36 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -63,7 +63,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) struct rxrpc_peer *peer; struct rxrpc_call *call; struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq)); - unsigned long timeout = 0, nowj; + unsigned long timeout = 0; rxrpc_seq_t tx_hard_ack, rx_hard_ack; char lbuff[50], rbuff[50]; @@ -97,7 +97,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v) if (call->state != RXRPC_CALL_SERVER_PREALLOC) { timeout = READ_ONCE(call->expect_rx_by); - nowj = jiffies; timeout -= jiffies; }
Re: [lkp-robot] [bisect done] ace45bec6d [ 52.056290] EIP: lock_release
kernel test robot wrote: > 0day kernel testing robot got the below dmesg and the first bad commit is > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master > > commit ace45bec6d77bc061c3c3d8ad99e298ea9800c2b > Author: David Howells > AuthorDate: Fri Mar 30 21:04:43 2018 +0100 > Commit: David Howells > CommitDate: Fri Mar 30 21:04:43 2018 +0100 > > rxrpc: Fix firewall route keepalive Are you actually making AF_RXRPC or the AFS filesystem do anything? Or is it just happening spontaneously? David
Re: [PATCH net] KEYS: DNS: fix parsing multiple options
Simon Horman wrote: > > - eq = memchr(opt, '=', opt_len) ?: end; > > + eq = memchr(opt, '=', opt_len) ?: next_opt; > > opt_nlen = eq - opt; > > eq++; > > It seems risky to advance eq++ in the case there the value is empty. > Its not not pointing to the value but it may be accessed twice further on > in this loop. > > > opt_vlen = next_opt - eq; /* will be -1 if no value */ Yes, but note the next line ^^^ and the comment thereon. This is followed later by a check: if (opt_vlen <= 0) goto bad_option_value; in the dnserror option handler. Note, also, there is guaranteed to be a NUL char included at the end of the payload data, and that this is checked: if (datalen <= 1 || !data || data[datalen - 1] != '\0') { David
Re: [PATCH net] KEYS: DNS: fix parsing multiple options
The fix seems to work, but the use of kstrtoul(): ret = kstrtoul(eq, 10, ); is incorrect since the buffer can't been modified to block out the next argument if there is one, so the following fails: perl -e 'print "#dnserror=1#", "\x00" x 1' | keyctl padd dns_resolver desc @s (Note this is preexisting and nothing to do with your patch). I'm not sure how best to handle this. Anyway, Dave, can you take Eric's patch into the net tree with: Acked-by: David Howells David
'Adding' a writable proc file under /proc/net/afs/ [was [PATCH 3/3] afs: Implement namespacing]
Christoph Hellwigwrote: > So the real question here is why afs needs a write callback as the first > user of /proc/net ever in a day and time were we've deprecated adding > new proc files. This is a discussion that should be had with linux-fsdevel > and netdev in Cc. /proc/fs/afs/cells and /proc/fs/afs/rootcell are not new proc files, nor is the ability to configure the in-kernel afs filesystem by writing to them new. The cells file has been there since 2002 and the rootcell file before the start of the git history in 2005 - it's just that they've been in /proc/fs/afs/ not /proc/net/afs/. However, the afs procfiles need to be somewhere under /proc/net/ for the network namespacing, so I moved the directory over and emplaced a symlink at /proc/fs/afs. David
[PATCH net 0/5] rxrpc: Fixes
Here are three fixes for AF_RXRPC and two tracepoints that were useful for finding them: (1) Fix missing start of expect-Rx-by timeout on initial packet transmission so that calls will time out if the peer doesn't respond. (2) Fix error reception on AF_INET6 sockets by using the correct family of sockopts on the UDP transport socket. (3) Fix setting the minimum security level on kernel calls so that they can be encrypted. (4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the transport socket. (5) Add a tracepoint to log UDP sendmsg failure so that we can find out if transmission failure occurred on the UDP socket. The patches are tagged here: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-fixes-20180510 and can also be found on the following branch: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes David --- David Howells (5): rxrpc: Fix missing start of call timeout rxrpc: Fix error reception on AF_INET6 sockets rxrpc: Fix the min security level for kernel calls rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages rxrpc: Trace UDP transmission failure include/trace/events/rxrpc.h | 85 ++ net/rxrpc/af_rxrpc.c |2 - net/rxrpc/ar-internal.h |1 net/rxrpc/conn_event.c | 11 - net/rxrpc/input.c|2 - net/rxrpc/local_event.c |3 + net/rxrpc/local_object.c | 57 +--- net/rxrpc/output.c | 34 - net/rxrpc/peer_event.c | 46 +++ net/rxrpc/rxkad.c|6 ++- net/rxrpc/sendmsg.c | 10 + 11 files changed, 209 insertions(+), 48 deletions(-)
[PATCH net 1/5] rxrpc: Fix missing start of call timeout
The expect_rx_by call timeout is supposed to be set when a call is started to indicate that we need to receive a packet by that point. This is currently put back every time we receive a packet, but it isn't started when we first send a packet. Without this, the call may wait forever if the server doesn't deign to reply. Fix this by setting the timeout upon a successful UDP sendmsg call for the first DATA packet. The timeout is initiated only for initial transmission and not for subsequent retries as we don't want the retry mechanism to extend the timeout indefinitely. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Reported-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/ar-internal.h |1 + net/rxrpc/input.c |2 +- net/rxrpc/output.c | 11 +++ net/rxrpc/sendmsg.c | 10 ++ 4 files changed, 23 insertions(+), 1 deletion(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 90d7079e0aa9..19975d2ca9a2 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -476,6 +476,7 @@ enum rxrpc_call_flag { RXRPC_CALL_SEND_PING, /* A ping will need to be sent */ RXRPC_CALL_PINGING, /* Ping in process */ RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout occurred */ + RXRPC_CALL_BEGAN_RX_TIMER, /* We began the expect_rx_by timer */ }; /* diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 0410d2277ca2..b5fd6381313d 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -971,7 +971,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call, if (timo) { unsigned long now = jiffies, expect_rx_by; - expect_rx_by = jiffies + timo; + expect_rx_by = now + timo; WRITE_ONCE(call->expect_rx_by, expect_rx_by); rxrpc_reduce_call_timer(call, expect_rx_by, now, rxrpc_timer_set_for_normal); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 7f1fc04775b3..6b9d27f0d7ec 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -414,6 +414,17 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, rxrpc_timer_set_for_lost_ack); } } + + if (sp->hdr.seq == 1 && + !test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER, + >flags)) { + unsigned long nowj = jiffies, expect_rx_by; + + expect_rx_by = nowj + call->next_rx_timo; + WRITE_ONCE(call->expect_rx_by, expect_rx_by); + rxrpc_reduce_call_timer(call, expect_rx_by, nowj, + rxrpc_timer_set_for_normal); + } } rxrpc_set_keepalive(call); diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 206e802ccbdc..be01f9c5d963 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -223,6 +223,15 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, ret = rxrpc_send_data_packet(call, skb, false); if (ret < 0) { + switch (ret) { + case -ENETUNREACH: + case -EHOSTUNREACH: + case -ECONNREFUSED: + rxrpc_set_call_completion(call, + RXRPC_CALL_LOCAL_ERROR, + 0, ret); + goto out; + } _debug("need instant resend %d", ret); rxrpc_instant_resend(call, ix); } else { @@ -241,6 +250,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, rxrpc_timer_set_for_send); } +out: rxrpc_free_skb(skb, rxrpc_skb_tx_freed); _leave(""); }
[PATCH net 2/5] rxrpc: Fix error reception on AF_INET6 sockets
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket it just opened for communications with the outside world, regardless of the type of socket. Unfortunately, this doesn't work with an AF_INET6 socket. Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the socket is of the AF_INET6 family. Without this, kAFS server and address rotation doesn't work correctly because the algorithm doesn't detect received network errors. Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/local_object.c | 57 ++ 1 file changed, 42 insertions(+), 15 deletions(-) diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c index 8b54e9531d52..b493e6b62740 100644 --- a/net/rxrpc/local_object.c +++ b/net/rxrpc/local_object.c @@ -134,22 +134,49 @@ static int rxrpc_open_socket(struct rxrpc_local *local, struct net *net) } } - /* we want to receive ICMP errors */ - opt = 1; - ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR, - (char *) , sizeof(opt)); - if (ret < 0) { - _debug("setsockopt failed"); - goto error; - } + switch (local->srx.transport.family) { + case AF_INET: + /* we want to receive ICMP errors */ + opt = 1; + ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR, + (char *) , sizeof(opt)); + if (ret < 0) { + _debug("setsockopt failed"); + goto error; + } - /* we want to set the don't fragment bit */ - opt = IP_PMTUDISC_DO; - ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER, - (char *) , sizeof(opt)); - if (ret < 0) { - _debug("setsockopt failed"); - goto error; + /* we want to set the don't fragment bit */ + opt = IP_PMTUDISC_DO; + ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER, + (char *) , sizeof(opt)); + if (ret < 0) { + _debug("setsockopt failed"); + goto error; + } + break; + + case AF_INET6: + /* we want to receive ICMP errors */ + opt = 1; + ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_RECVERR, + (char *) , sizeof(opt)); + if (ret < 0) { + _debug("setsockopt failed"); + goto error; + } + + /* we want to set the don't fragment bit */ + opt = IPV6_PMTUDISC_DO; + ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_MTU_DISCOVER, + (char *) , sizeof(opt)); + if (ret < 0) { + _debug("setsockopt failed"); + goto error; + } + break; + + default: + BUG(); } /* set the socket up */
[PATCH net 3/5] rxrpc: Fix the min security level for kernel calls
Fix the kernel call initiation to set the minimum security level for kernel initiated calls (such as from kAFS) from the sockopt value. Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 9a2c8e7c000e..2b463047dd7b 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -313,7 +313,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, memset(, 0, sizeof(cp)); cp.local= rx->local; cp.key = key; - cp.security_level = 0; + cp.security_level = rx->min_sec_level; cp.exclusive= false; cp.upgrade = upgrade; cp.service_id = srx->srx_service;
[PATCH net 4/5] rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
Add a tracepoint to log received ICMP/ICMP6 events and other error messages. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 30 +++ net/rxrpc/peer_event.c | 46 +- 2 files changed, 53 insertions(+), 23 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 9e96c2fe2793..497d0b67f421 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -15,6 +15,7 @@ #define _TRACE_RXRPC_H #include +#include /* * Define enums for tracing information. @@ -1374,6 +1375,35 @@ TRACE_EVENT(rxrpc_resend, __entry->anno) ); +TRACE_EVENT(rxrpc_rx_icmp, + TP_PROTO(struct rxrpc_peer *peer, struct sock_extended_err *ee, +struct sockaddr_rxrpc *srx), + + TP_ARGS(peer, ee, srx), + + TP_STRUCT__entry( + __field(unsigned int, peer) + __field_struct(struct sock_extended_err,ee ) + __field_struct(struct sockaddr_rxrpc, srx ) +), + + TP_fast_assign( + __entry->peer = peer->debug_id; + memcpy(&__entry->ee, ee, sizeof(__entry->ee)); + memcpy(&__entry->srx, srx, sizeof(__entry->srx)); + ), + + TP_printk("P=%08x o=%u t=%u c=%u i=%u d=%u e=%d %pISp", + __entry->peer, + __entry->ee.ee_origin, + __entry->ee.ee_type, + __entry->ee.ee_code, + __entry->ee.ee_info, + __entry->ee.ee_data, + __entry->ee.ee_errno, + &__entry->srx.transport) + ); + #endif /* _TRACE_RXRPC_H */ /* This part must be outside protection */ diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c index 78c2f95d1f22..0ed8b651cec2 100644 --- a/net/rxrpc/peer_event.c +++ b/net/rxrpc/peer_event.c @@ -28,39 +28,39 @@ static void rxrpc_store_error(struct rxrpc_peer *, struct sock_exterr_skb *); * Find the peer associated with an ICMP packet. */ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local, -const struct sk_buff *skb) +const struct sk_buff *skb, +struct sockaddr_rxrpc *srx) { struct sock_exterr_skb *serr = SKB_EXT_ERR(skb); - struct sockaddr_rxrpc srx; _enter(""); - memset(, 0, sizeof(srx)); - srx.transport_type = local->srx.transport_type; - srx.transport_len = local->srx.transport_len; - srx.transport.family = local->srx.transport.family; + memset(srx, 0, sizeof(*srx)); + srx->transport_type = local->srx.transport_type; + srx->transport_len = local->srx.transport_len; + srx->transport.family = local->srx.transport.family; /* Can we see an ICMP4 packet on an ICMP6 listening socket? and vice * versa? */ - switch (srx.transport.family) { + switch (srx->transport.family) { case AF_INET: - srx.transport.sin.sin_port = serr->port; + srx->transport.sin.sin_port = serr->port; switch (serr->ee.ee_origin) { case SO_EE_ORIGIN_ICMP: _net("Rx ICMP"); - memcpy(_addr, + memcpy(>transport.sin.sin_addr, skb_network_header(skb) + serr->addr_offset, sizeof(struct in_addr)); break; case SO_EE_ORIGIN_ICMP6: _net("Rx ICMP6 on v4 sock"); - memcpy(_addr, + memcpy(>transport.sin.sin_addr, skb_network_header(skb) + serr->addr_offset + 12, sizeof(struct in_addr)); break; default: - memcpy(_addr, _hdr(skb)->saddr, + memcpy(>transport.sin.sin_addr, _hdr(skb)->saddr, sizeof(struct in_addr)); break; } @@ -68,25 +68,25 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local, #ifdef CONFIG_AF_RXRPC_IPV6 case AF_INET6: - srx.transport.sin6.sin6_port = serr->port; + srx->transport.sin6.sin6_port = serr->port; switch (serr->ee.ee_origin) {
[PATCH net 5/5] rxrpc: Trace UDP transmission failure
Add a tracepoint to log transmission failure from the UDP transport socket being used by AF_RXRPC. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 55 ++ net/rxrpc/conn_event.c | 11 ++-- net/rxrpc/local_event.c |3 ++ net/rxrpc/output.c | 23 -- net/rxrpc/rxkad.c|6 +++-- 5 files changed, 90 insertions(+), 8 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 497d0b67f421..077e664ac9a2 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -211,6 +211,20 @@ enum rxrpc_congest_change { rxrpc_cong_saw_nack, }; +enum rxrpc_tx_fail_trace { + rxrpc_tx_fail_call_abort, + rxrpc_tx_fail_call_ack, + rxrpc_tx_fail_call_data_frag, + rxrpc_tx_fail_call_data_nofrag, + rxrpc_tx_fail_call_final_resend, + rxrpc_tx_fail_conn_abort, + rxrpc_tx_fail_conn_challenge, + rxrpc_tx_fail_conn_response, + rxrpc_tx_fail_reject, + rxrpc_tx_fail_version_keepalive, + rxrpc_tx_fail_version_reply, +}; + #endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */ /* @@ -438,6 +452,19 @@ enum rxrpc_congest_change { EM(RXRPC_CALL_LOCAL_ERROR, "LocalError") \ E_(RXRPC_CALL_NETWORK_ERROR,"NetError") +#define rxrpc_tx_fail_traces \ + EM(rxrpc_tx_fail_call_abort,"CallAbort") \ + EM(rxrpc_tx_fail_call_ack, "CallAck") \ + EM(rxrpc_tx_fail_call_data_frag,"CallDataFrag") \ + EM(rxrpc_tx_fail_call_data_nofrag, "CallDataNofrag") \ + EM(rxrpc_tx_fail_call_final_resend, "CallFinalResend") \ + EM(rxrpc_tx_fail_conn_abort,"ConnAbort") \ + EM(rxrpc_tx_fail_conn_challenge,"ConnChall") \ + EM(rxrpc_tx_fail_conn_response, "ConnResp") \ + EM(rxrpc_tx_fail_reject,"Reject") \ + EM(rxrpc_tx_fail_version_keepalive, "VerKeepalive") \ + E_(rxrpc_tx_fail_version_reply, "VerReply") + /* * Export enum symbols via userspace. */ @@ -461,6 +488,7 @@ rxrpc_propose_ack_traces; rxrpc_propose_ack_outcomes; rxrpc_congest_modes; rxrpc_congest_changes; +rxrpc_tx_fail_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -1404,6 +1432,33 @@ TRACE_EVENT(rxrpc_rx_icmp, &__entry->srx.transport) ); +TRACE_EVENT(rxrpc_tx_fail, + TP_PROTO(unsigned int debug_id, rxrpc_serial_t serial, int ret, +enum rxrpc_tx_fail_trace what), + + TP_ARGS(debug_id, serial, ret, what), + + TP_STRUCT__entry( + __field(unsigned int, debug_id) + __field(rxrpc_serial_t, serial ) + __field(int,ret ) + __field(enum rxrpc_tx_fail_trace, what) +), + + TP_fast_assign( + __entry->debug_id = debug_id; + __entry->serial = serial; + __entry->ret = ret; + __entry->what = what; + ), + + TP_printk("c=%08x r=%x ret=%d %s", + __entry->debug_id, + __entry->serial, + __entry->ret, + __print_symbolic(__entry->what, rxrpc_tx_fail_traces)) + ); + #endif /* _TRACE_RXRPC_H */ /* This part must be outside protection */ diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index c717152070df..1350f1be8037 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -40,7 +40,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, } __attribute__((packed)) pkt; struct rxrpc_ackinfo ack_info; size_t len; - int ioc; + int ret, ioc; u32 serial, mtu, call_id, padding; _enter("%d", conn->debug_id); @@ -135,10 +135,13 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, break; } - kernel_sendmsg(conn->params.local->socket, , iov, ioc, len); + ret = kernel_sendmsg(conn->params.local->socket, , iov, ioc, len); conn->params.peer->last_tx_at = ktime_get_real(); + if (ret < 0) + trace_rxrpc_tx_fail(conn->debug_id, serial, ret, + rxrpc_tx_fail_call_final_resend); + _leave(""); - return; } /* @@ -236,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn,
Re: simplify procfs code for seq_file instances V2
Note that your kernel hits the: inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (ptrval) (fs_reclaim){?.+.}, at: fs_reclaim_acquire+0x12/0x35 {HARDIRQ-ON-W} state was registered at: fs_reclaim_acquire+0x32/0x35 kmem_cache_alloc_node_trace+0x49/0x2cf alloc_worker+0x1d/0x49 init_rescuer.part.7+0x19/0x8f workqueue_init+0xc0/0x1fe kernel_init_freeable+0xdc/0x433 kernel_init+0xa/0xf5 ret_from_fork+0x24/0x30 bug, as described here: https://groups.google.com/forum/#!msg/syzkaller-bugs/sJC3Y3hOM08/aO3z9JXoAgAJ David
Re: [PATCH 20/39] afs: simplify procfs code
Christoph Hellwigwrote: > I don't think you should need any of these. seq_file_net or > seq_file_single_net will return you the net_ns based on a struct > seq_file. And even from your write routines you can reach the > seq_file in file->private pretty easily. You've taken away things like single_open/release_net() which means I can't supply my own fops and use the proc_net stuff. I wonder if I should add a write op to struct proc_dir_entry. David
Re: [PATCH 04/40] proc: introduce proc_create_seq{,_data}
Christoph Hellwigwrote: > + > +struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode, > + struct proc_dir_entry *parent, const struct seq_operations *ops, > + void *data) > +{ > ... > +EXPORT_SYMBOL(proc_create_seq_data); Please add documentation comments to exported functions when you add them. David
[PATCH 02/02] afs: Implement namespacing
Implement namespacing within AFS, but don't yet let mounts occur outside the init namespace. An additional patch will be required propagate the network namespace across automounts. Signed-off-by: David Howells <dhowe...@redhat.com> --- cell.c |4 +- internal.h | 39 +++--- main.c | 33 ++ proc.c | 89 +++-- super.c| 20 - 5 files changed, 121 insertions(+), 64 deletions(-) diff --git a/fs/afs/cell.c b/fs/afs/cell.c index fdf4c36cff79..a98a8a3d5544 100644 --- a/fs/afs/cell.c +++ b/fs/afs/cell.c @@ -528,7 +528,7 @@ static int afs_activate_cell(struct afs_net *net, struct afs_cell *cell) NULL, 0, cell, 0, true); #endif - ret = afs_proc_cell_setup(net, cell); + ret = afs_proc_cell_setup(cell); if (ret < 0) return ret; spin_lock(>proc_cells_lock); @@ -544,7 +544,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct afs_cell *cell) { _enter("%s", cell->name); - afs_proc_cell_remove(net, cell); + afs_proc_cell_remove(cell); spin_lock(>proc_cells_lock); list_del_init(>proc_link); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index f8086ec95e24..9863dac6ff6a 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -22,6 +22,8 @@ #include #include #include +#include +#include #include #include "afs.h" @@ -40,7 +42,8 @@ struct afs_mount_params { afs_voltype_t type; /* type of volume requested */ int volnamesz; /* size of volume name */ const char *volname; /* name of volume to mount */ - struct afs_net *net; /* Network namespace in effect */ + struct net *net_ns;/* Network namespace in effect */ + struct afs_net *net; /* the AFS net namespace stuff */ struct afs_cell *cell; /* cell in which to find volume */ struct afs_volume *volume;/* volume record */ struct key *key; /* key to use for secure mounting */ @@ -189,7 +192,7 @@ struct afs_read { * - there's one superblock per volume */ struct afs_super_info { - struct afs_net *net; /* Network namespace */ + struct net *net_ns;/* Network namespace */ struct afs_cell *cell; /* The cell in which the volume resides */ struct afs_volume *volume;/* volume record */ booldyn_root; /* True if dynamic root */ @@ -218,6 +221,7 @@ struct afs_sysnames { * AFS network namespace record. */ struct afs_net { + struct net *net; /* Backpointer to the owning net namespace */ struct afs_uuid uuid; boollive; /* F if this namespace is being removed */ @@ -280,7 +284,6 @@ struct afs_net { }; extern const char afs_init_sysname[]; -extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace with real netns enum afs_cell_state { AFS_CELL_UNSET, @@ -787,34 +790,36 @@ extern int afs_drop_inode(struct inode *); * main.c */ extern struct workqueue_struct *afs_wq; +extern int afs_net_id; -static inline struct afs_net *afs_d2net(struct dentry *dentry) +static inline struct afs_net *afs_net(struct net *net) { - return &__afs_net; + return net_generic(net, afs_net_id); } -static inline struct afs_net *afs_i2net(struct inode *inode) +static inline struct afs_net *afs_sb2net(struct super_block *sb) { - return &__afs_net; + return afs_net(AFS_FS_S(sb)->net_ns); } -static inline struct afs_net *afs_v2net(struct afs_vnode *vnode) +static inline struct afs_net *afs_d2net(struct dentry *dentry) { - return &__afs_net; + return afs_sb2net(dentry->d_sb); } -static inline struct afs_net *afs_sock2net(struct sock *sk) +static inline struct afs_net *afs_i2net(struct inode *inode) { - return &__afs_net; + return afs_sb2net(inode->i_sb); } -static inline struct afs_net *afs_get_net(struct afs_net *net) +static inline struct afs_net *afs_v2net(struct afs_vnode *vnode) { - return net; + return afs_i2net(>vfs_inode); } -static inline void afs_put_net(struct afs_net *net) +static inline struct afs_net *afs_sock2net(struct sock *sk) { + return net_generic(sock_net(sk), afs_net_id); } static inline void __afs_stat(atomic_t *s) @@ -849,8 +854,8 @@ extern int afs_get_ipv4_interfaces(struct afs_interface *, size_t, bool); */ extern int __net_init afs_proc_init(struct afs_net *); extern void __net_exit afs_p
[PATCH 01/02] net: Export get_proc_net()
Export get_proc_net() so that write() routines attached to net-namespaced proc files can find the network namespace that they're in. Currently, this is only accessible via seqfile routines. This will permit AFS to have a separate cell database per-network namespace and to manage each one independently of the others. Signed-off-by: David Howells <dhowe...@redhat.com> --- fs/proc/proc_net.c |3 ++- include/linux/proc_fs.h |2 ++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c index 1763f370489d..23d56146e38f 100644 --- a/fs/proc/proc_net.c +++ b/fs/proc/proc_net.c @@ -33,10 +33,11 @@ static inline struct net *PDE_NET(struct proc_dir_entry *pde) return pde->parent->data; } -static struct net *get_proc_net(const struct inode *inode) +struct net *get_proc_net(const struct inode *inode) { return maybe_get_net(PDE_NET(PDE(inode))); } +EXPORT_SYMBOL_GPL(get_proc_net); int seq_open_net(struct inode *ino, struct file *f, const struct seq_operations *ops, int size) diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h index 928ef9e4d912..47a87121d30c 100644 --- a/include/linux/proc_fs.h +++ b/include/linux/proc_fs.h @@ -79,6 +79,8 @@ static inline struct proc_dir_entry *proc_net_mkdir( return proc_mkdir_data(name, 0, parent, net); } +extern struct net *get_proc_net(const struct inode *inode); + struct ns_common; int open_related_ns(struct ns_common *ns, struct ns_common *(*get_ns)(struct ns_common *ns));
Re: [PATCH 20/39] afs: simplify procfs code
David Howells <dhowe...@redhat.com> wrote: > > Use remove_proc_subtree to remove the whole subtree on cleanup, and > > unwind the registration loop into individual calls. Switch to use > > proc_create_seq where applicable. > > Note that this is likely going to clash with my patch to net-namespace all of > the afs proc files: > > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context=f60c26c827c073583107ebf19d87bc5c0e71c3d2 > > If it helps, I should be able to disentangle this from the mount-api changes > since the subsequent patch connects the dots to propagate the network > namespace over automount using the new fs_context to do it. Okay, I'll follow up this mail with a pair of patches to just use network namespacing in AFS. The first exports a function from core code only; the second is the actual modifications to AFS. Note that this doesn't actually yet permit AFS to use any other namespace. The propagation-over-automount issue is handled by a separate patch. David
Re: [PATCH 20/39] afs: simplify procfs code
Christoph Hellwigwrote: > Use remove_proc_subtree to remove the whole subtree on cleanup, and > unwind the registration loop into individual calls. Switch to use > proc_create_seq where applicable. Note that this is likely going to clash with my patch to net-namespace all of the afs proc files: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context=f60c26c827c073583107ebf19d87bc5c0e71c3d2 If it helps, I should be able to disentangle this from the mount-api changes since the subsequent patch connects the dots to propagate the network namespace over automount using the new fs_context to do it. David
Re: linux-next: build failure after merge of the tip tree
Stephen Rothwellwrote: > + wait_var_event(>nr_calls, !atomic_read(>nr_calls)); I would prefer == 0 to ! as it's not really a true/false value. But apart from that, it's looks okay and you can add my Reviewed-by. David
Re: linux-next: build failure after merge of the tip tree
Peter Zijlstrawrote: > I figured that since there were only a handful of users it wasn't a > popular API, also David very much knew of those patches changing it so > could easily have pulled in the special tip/sched/wait branch :/ I'm not sure I could, since I have to base on net-next. I'm not sure what DaveM's policy on that is. Also, it might've been better not to simply erase the atomic_t wait API immediately, but substitute wrappers for it to be removed one iteration hence. David
[PATCH net-next] rxrpc: Fix undefined packet handling
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11 should just be discarded rather than being aborted like other undefined packet types. Reported-by: Jeffrey Altman <jalt...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/input.c|6 ++ net/rxrpc/protocol.h |6 ++ 2 files changed, 12 insertions(+) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 21800e6f5019..0410d2277ca2 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1200,6 +1200,12 @@ void rxrpc_data_ready(struct sock *udp_sk) !rxrpc_validate_jumbo(skb)) goto bad_message; break; + + /* Packet types 9-11 should just be ignored. */ + case RXRPC_PACKET_TYPE_PARAMS: + case RXRPC_PACKET_TYPE_10: + case RXRPC_PACKET_TYPE_11: + goto discard; } rcu_read_lock(); diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h index 4bddcf3face3..93da73bf7098 100644 --- a/net/rxrpc/protocol.h +++ b/net/rxrpc/protocol.h @@ -46,6 +46,9 @@ struct rxrpc_wire_header { #define RXRPC_PACKET_TYPE_CHALLENGE6 /* connection security challenge (SRVR->CLNT) */ #define RXRPC_PACKET_TYPE_RESPONSE 7 /* connection secutity response (CLNT->SRVR) */ #define RXRPC_PACKET_TYPE_DEBUG8 /* debug info request */ +#define RXRPC_PACKET_TYPE_PARAMS 9 /* Parameter negotiation (unspec'd, ignore) */ +#define RXRPC_PACKET_TYPE_10 10 /* Ignored */ +#define RXRPC_PACKET_TYPE_11 11 /* Ignored */ #define RXRPC_PACKET_TYPE_VERSION 13 /* version string request */ #define RXRPC_N_PACKET_TYPES 14 /* number of packet types (incl type 0) */ @@ -78,6 +81,9 @@ struct rxrpc_wire_header { (1 << RXRPC_PACKET_TYPE_CHALLENGE) |\ (1 << RXRPC_PACKET_TYPE_RESPONSE) | \ /*(1 << RXRPC_PACKET_TYPE_DEBUG) | */ \ + (1 << RXRPC_PACKET_TYPE_PARAMS) | \ + (1 << RXRPC_PACKET_TYPE_10) | \ + (1 << RXRPC_PACKET_TYPE_11) | \ (1 << RXRPC_PACKET_TYPE_VERSION)) /*/
[PATCH net-next 00/12] rxrpc: Fixes and more traces
Here are some patches that add some more tracepoints to AF_RXRPC and fix some issues therein: (1) Fix the use of VERSION packets to keep firewall routes open. (2) Fix the incorrect current time usage in a tracepoint. (3) Fix Tx ring annotation corruption. (4) Fix accidental conversion of call-level abort into connection-level abort. (5) Fix calculation of resend time. (6) Remove a couple of unused variables. (7) Fix a bunch of checker warnings and an error. Note that not all warnings can be quashed as checker doesn't seem to correctly handle seqlocks. (8) Fix a potential race between call destruction and socket/net destruction. (9) Add a tracepoint to track rxrpc_local refcounting. (10) Fix an apparent leak of rxrpc_local objects. (11) Add a tracepoint to track rxrpc_peer refcounting. (12) Fix a leak of rxrpc_peer objects. The patches are tagged here: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-next-20180330 and can also be found on this branch: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next David --- David Howells (10): rxrpc: Fix firewall route keepalive rxrpc: Fix a bit of time confusion rxrpc: Fix Tx ring annotation after initial Tx failure rxrpc: Don't treat call aborts as conn aborts rxrpc: Fix checker warnings and errors rxrpc: Fix potential call vs socket/net destruction race rxrpc: Add a tracepoint to track rxrpc_local refcounting rxrpc: Fix apparent leak of rxrpc_local objects rxrpc: Add a tracepoint to track rxrpc_peer refcounting rxrpc: Fix leak of rxrpc_peer objects Marc Dionne (1): rxrpc: Fix resend event time calculation Sebastian Andrzej Siewior (1): rxrpc: remove unused static variables include/trace/events/rxrpc.h | 85 net/rxrpc/af_rxrpc.c |6 +++ net/rxrpc/ar-internal.h | 68 +++-- net/rxrpc/call_accept.c |9 +++- net/rxrpc/call_event.c |4 +- net/rxrpc/call_object.c | 17 ++- net/rxrpc/conn_client.c |3 + net/rxrpc/conn_event.c |3 + net/rxrpc/conn_object.c | 10 net/rxrpc/conn_service.c |1 net/rxrpc/input.c| 17 +-- net/rxrpc/local_object.c | 65 +++- net/rxrpc/net_ns.c | 24 ++ net/rxrpc/output.c | 59 + net/rxrpc/peer_event.c | 98 ++ net/rxrpc/peer_object.c | 93 +++- net/rxrpc/proc.c |6 +++ net/rxrpc/rxkad.c|2 + net/rxrpc/security.c |3 - net/rxrpc/sendmsg.c |7 +++ 20 files changed, 509 insertions(+), 71 deletions(-)
[PATCH net-next 01/12] rxrpc: Fix firewall route keepalive
Fix the firewall route keepalive part of AF_RXRPC which is currently function incorrectly by replying to VERSION REPLY packets from the server with VERSION REQUEST packets. Instead, send VERSION REPLY packets to the peers of service connections to act as keep-alives 20s after the latest packet was transmitted to that peer. Also, just discard VERSION REPLY packets rather than replying to them. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c|4 ++ net/rxrpc/ar-internal.h | 14 ++- net/rxrpc/conn_event.c |3 + net/rxrpc/input.c |2 + net/rxrpc/net_ns.c | 21 ++ net/rxrpc/output.c | 59 - net/rxrpc/peer_event.c | 96 +++ net/rxrpc/peer_object.c |7 +++ net/rxrpc/rxkad.c |2 + 9 files changed, 204 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index ec5ec68be1aa..0b3026b8fa40 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -762,6 +762,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket *sock, static int rxrpc_create(struct net *net, struct socket *sock, int protocol, int kern) { + struct rxrpc_net *rxnet; struct rxrpc_sock *rx; struct sock *sk; @@ -801,6 +802,9 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol, rwlock_init(>call_lock); memset(>srx, 0, sizeof(rx->srx)); + rxnet = rxrpc_net(sock_net(>sk)); + timer_reduce(>peer_keepalive_timer, jiffies + 1); + _leave(" = 0 [%p]", rx); return 0; } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 21cf164b6d85..8a348e0a9d95 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -97,8 +97,16 @@ struct rxrpc_net { struct list_headlocal_endpoints; struct mutexlocal_mutex;/* Lock for ->local_endpoints */ - spinlock_t peer_hash_lock; /* Lock for ->peer_hash */ DECLARE_HASHTABLE (peer_hash, 10); + spinlock_t peer_hash_lock; /* Lock for ->peer_hash */ + +#define RXRPC_KEEPALIVE_TIME 20 /* NAT keepalive time in seconds */ + u8 peer_keepalive_cursor; + ktime_t peer_keepalive_base; + struct hlist_head peer_keepalive[RXRPC_KEEPALIVE_TIME + 1]; + struct hlist_head peer_keepalive_new; + struct timer_list peer_keepalive_timer; + struct work_struct peer_keepalive_work; }; /* @@ -285,6 +293,8 @@ struct rxrpc_peer { struct hlist_head error_targets; /* targets for net error distribution */ struct work_struct error_distributor; struct rb_root service_conns; /* Service connections */ + struct hlist_node keepalive_link; /* Link in net->peer_keepalive[] */ + time64_tlast_tx_at; /* Last time packet sent here */ seqlock_t service_conn_lock; spinlock_t lock; /* access lock */ unsigned intif_mtu; /* interface MTU for this peer */ @@ -1026,6 +1036,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *, bool, rxrpc_serial_t *); int rxrpc_send_abort_packet(struct rxrpc_call *); int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *, bool); void rxrpc_reject_packets(struct rxrpc_local *); +void rxrpc_send_keepalive(struct rxrpc_peer *); /* * peer_event.c @@ -1034,6 +1045,7 @@ void rxrpc_error_report(struct sock *); void rxrpc_peer_error_distributor(struct work_struct *); void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace, rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t); +void rxrpc_peer_keepalive_worker(struct work_struct *); /* * peer_object.c diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index d2ec3fd593e8..c717152070df 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -136,6 +136,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, } kernel_sendmsg(conn->params.local->socket, , iov, ioc, len); + conn->params.peer->last_tx_at = ktime_get_real(); _leave(""); return; } @@ -239,6 +240,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn, return -EAGAIN; } + conn->params.peer->last_tx_at = ktime_get_real(); + _leave(" = 0"); return 0; } diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 2a868fdab0ae..d4f2509e018b 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1183,6 +1183,8 @@ void rxrpc_data_ready(struct sock *udp_sk) switch (sp->hdr.type) { case RXRPC_PACKET_TYPE_VERSION: +
[PATCH net-next 02/12] rxrpc: Fix a bit of time confusion
The rxrpc_reduce_call_timer() function should be passed the 'current time' in jiffies, not the current ktime time. It's confusing in rxrpc_resend because that has to deal with both. Pass the correct current time in. Note that this only affects the trace produced and not the functioning of the code. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 6a62e42e1d8d..3dee89c7a06e 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -238,7 +238,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) * retransmitting data. */ if (!retrans) { - rxrpc_reduce_call_timer(call, resend_at, now, + rxrpc_reduce_call_timer(call, resend_at, now_j, rxrpc_timer_set_for_resend); spin_unlock_bh(>lock); ack_ts = ktime_sub(now, call->acks_latest_ts);
[PATCH net-next 03/12] rxrpc: Fix Tx ring annotation after initial Tx failure
rxrpc calls have a ring of packets that are awaiting ACK or retransmission and a parallel ring of annotations that tracks the state of those packets. If the initial transmission of a packet on the underlying UDP socket fails then the packet annotation is marked for resend - but the setting of this mark accidentally erases the last-packet mark also stored in the same annotation slot. If this happens, a call won't switch out of the Tx phase when all the packets have been transmitted. Fix this by retaining the last-packet mark and only altering the packet state. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/sendmsg.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 8503f279b467..783c777fc6e7 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -130,7 +130,9 @@ static inline void rxrpc_instant_resend(struct rxrpc_call *call, int ix) spin_lock_bh(>lock); if (call->state < RXRPC_CALL_COMPLETE) { - call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS; + call->rxtx_annotations[ix] = + (call->rxtx_annotations[ix] & RXRPC_TX_ANNO_LAST) | + RXRPC_TX_ANNO_RETRANS; if (!test_and_set_bit(RXRPC_CALL_EV_RESEND, >events)) rxrpc_queue_call(call); }
[PATCH net-next 04/12] rxrpc: Don't treat call aborts as conn aborts
If a call-level abort is received for the previous call to complete on a connection channel, then that abort is queued for the connection processor to handle. Unfortunately, the connection processor then assumes without checking that the abort is connection-level (ie. callNumber is 0) and distributes it over all active calls on that connection, thereby incorrectly aborting them. Fix this by discarding aborts aimed at a completed call. Further, discard all packets aimed at a call that's complete if there's currently an active call on a channel, since the DATA packets associated with the new call automatically terminate the old call. Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor") Reported-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/input.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index d4f2509e018b..21800e6f5019 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1242,16 +1242,19 @@ void rxrpc_data_ready(struct sock *udp_sk) goto discard_unlock; if (sp->hdr.callNumber == chan->last_call) { - /* For the previous service call, if completed successfully, we -* discard all further packets. + if (chan->call || + sp->hdr.type == RXRPC_PACKET_TYPE_ABORT) + goto discard_unlock; + + /* For the previous service call, if completed +* successfully, we discard all further packets. */ if (rxrpc_conn_is_service(conn) && - (chan->last_type == RXRPC_PACKET_TYPE_ACK || -sp->hdr.type == RXRPC_PACKET_TYPE_ABORT)) + chan->last_type == RXRPC_PACKET_TYPE_ACK) goto discard_unlock; - /* But otherwise we need to retransmit the final packet from -* data cached in the connection record. + /* But otherwise we need to retransmit the final packet +* from data cached in the connection record. */ rxrpc_post_packet_to_conn(conn, skb); goto out_unlock;
[PATCH net-next 07/12] rxrpc: Fix checker warnings and errors
Fix various issues detected by checker. Errors: (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set call->socket. Warnings: (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to trace_rxrpc_conn() as the where argument. (*) rxrpc_disconnect_client_call() should get its net pointer via the call->conn rather than call->sock to avoid a warning about accessing an RCU pointer without protection. (*) Proc seq start/stop functions need annotation as they pass locks between the functions. False positives: (*) Checker doesn't correctly handle of seq-retry lock context balance in rxrpc_find_service_conn_rcu(). (*) Checker thinks execution may proceed past the BUG() in rxrpc_publish_service_conn(). (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in rxkad.c. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_accept.c |3 ++- net/rxrpc/call_object.c |1 + net/rxrpc/conn_client.c |2 +- net/rxrpc/conn_object.c |2 +- net/rxrpc/proc.c|6 ++ net/rxrpc/sendmsg.c |2 ++ 6 files changed, 13 insertions(+), 3 deletions(-) diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 92ebd1d7e0bb..4ce24c000653 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -225,7 +225,7 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx) tail = b->call_backlog_tail; while (CIRC_CNT(head, tail, size) > 0) { struct rxrpc_call *call = b->call_backlog[tail]; - call->socket = rx; + rcu_assign_pointer(call->socket, rx); if (rx->discard_new_call) { _debug("discard %lx", call->user_call_ID); rx->discard_new_call(call, call->user_call_ID); @@ -456,6 +456,7 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx, unsigned long user_call_ID, rxrpc_notify_rx_t notify_rx) __releases(>sk.sk_lock.slock) + __acquires(call->user_mutex) { struct rxrpc_call *call; struct rb_node *parent, **pp; diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 147657dfe757..85b12c472522 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -219,6 +219,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, gfp_t gfp, unsigned int debug_id) __releases(>sk.sk_lock.slock) + __acquires(>user_mutex) { struct rxrpc_call *call, *xcall; struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk)); diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 064175068059..041da40dbf93 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -776,7 +776,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) unsigned int channel = call->cid & RXRPC_CHANNELMASK; struct rxrpc_connection *conn = call->conn; struct rxrpc_channel *chan = >channels[channel]; - struct rxrpc_net *rxnet = rxrpc_net(sock_net(>socket->sk)); + struct rxrpc_net *rxnet = conn->params.local->rxnet; trace_rxrpc_client(conn, channel, rxrpc_client_chan_disconnect); call->conn = NULL; diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index ccbac190add1..bfc46fd69a62 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -418,7 +418,7 @@ void rxrpc_service_connection_reaper(struct work_struct *work) */ if (atomic_cmpxchg(>usage, 1, 0) != 1) continue; - trace_rxrpc_conn(conn, rxrpc_conn_reap_service, 0, 0); + trace_rxrpc_conn(conn, rxrpc_conn_reap_service, 0, NULL); if (rxrpc_conn_is_client(conn)) BUG(); diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c index f79f260c6ddc..7e45db058823 100644 --- a/net/rxrpc/proc.c +++ b/net/rxrpc/proc.c @@ -29,6 +29,8 @@ static const char *const rxrpc_conn_states[RXRPC_CONN__NR_STATES] = { * generate a list of extant and dead calls in /proc/net/rxrpc_calls */ static void *rxrpc_call_seq_start(struct seq_file *seq, loff_t *_pos) + __acquires(rcu) + __acquires(rxnet->call_lock) { struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq)); @@ -45,6 +47,8 @@ static void *rxrpc_call_seq_next(struct seq_file *seq, void *v, loff_t *pos) } static void rxrpc_call_seq_stop(struct seq_file *seq, void *v) + __releases(rxnet->call_lock) + __releases(rcu) { struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq)); @@ -135,6 +139,7 @@ const struct file_operations rxrpc_call_seq_fops = { * generate a list of extant virtual connect
[PATCH net-next 06/12] rxrpc: remove unused static variables
From: Sebastian Andrzej Siewior <bige...@linutronix.de> The rxrpc_security_methods and rxrpc_security_sem user has been removed in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was noticed by kbuild test robot for the -RT tree but is also true for !RT. Reported-by: kbuild test robot <fengguang...@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/security.c |3 --- 1 file changed, 3 deletions(-) diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c index e9f428351293..c4479afe8ae7 100644 --- a/net/rxrpc/security.c +++ b/net/rxrpc/security.c @@ -19,9 +19,6 @@ #include #include "ar-internal.h" -static LIST_HEAD(rxrpc_security_methods); -static DECLARE_RWSEM(rxrpc_security_sem); - static const struct rxrpc_security *rxrpc_security_types[] = { [RXRPC_SECURITY_NONE] = _no_security, #ifdef CONFIG_RXKAD
[PATCH net-next 08/12] rxrpc: Fix potential call vs socket/net destruction race
rxrpc_call structs don't pin sockets or network namespaces, but may attempt to access both after their refcount reaches 0 so that they can detach themselves from the network namespace. However, there's no guarantee that the socket still exists at this point (so sock_net(>socket->sk) may be invalid) and the namespace may have gone away if the call isn't pinning a peer. Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b) waiting for all calls to be destroyed when the network namespace goes away. This was detected by checker: net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 (different address spaces) net/rxrpc/call_object.c:634:57:expected struct sock const *sk net/rxrpc/call_object.c:634:57:got struct sock [noderef] * Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/ar-internal.h |2 ++ net/rxrpc/call_accept.c |1 + net/rxrpc/call_object.c | 16 +--- net/rxrpc/net_ns.c |1 + 4 files changed, 17 insertions(+), 3 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 8a348e0a9d95..2a2b0fdfb157 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -75,6 +75,7 @@ struct rxrpc_net { u32 epoch; /* Local epoch for detecting local-end reset */ struct list_headcalls; /* List of calls active in this namespace */ rwlock_tcall_lock; /* Lock for ->calls */ + atomic_tnr_calls; /* Count of allocated calls */ struct list_headconn_proc_list; /* List of conns in this namespace for proc */ struct list_headservice_conns; /* Service conns in this namespace */ @@ -528,6 +529,7 @@ struct rxrpc_call { struct rxrpc_connection *conn; /* connection carrying call */ struct rxrpc_peer *peer; /* Peer record for remote address */ struct rxrpc_sock __rcu *socket;/* socket responsible */ + struct rxrpc_net*rxnet; /* Network namespace to which call belongs */ struct mutexuser_mutex; /* User access mutex */ unsigned long ack_at; /* When deferred ACK needs to happen */ unsigned long ack_lost_at;/* When ACK is figured as lost */ diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 4ce24c000653..493545033e42 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -138,6 +138,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx, write_unlock(>call_lock); + rxnet = call->rxnet; write_lock(>call_lock); list_add_tail(>link, >calls); write_unlock(>call_lock); diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 85b12c472522..f721c2b7e234 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -103,6 +103,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, unsigned int debug_id) { struct rxrpc_call *call; + struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk)); call = kmem_cache_zalloc(rxrpc_call_jar, gfp); if (!call) @@ -153,6 +154,9 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp, call->cong_cwnd = 2; call->cong_ssthresh = RXRPC_RXTX_BUFF_SIZE - 1; + + call->rxnet = rxnet; + atomic_inc(>nr_calls); return call; nomem_2: @@ -222,7 +226,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, __acquires(>user_mutex) { struct rxrpc_call *call, *xcall; - struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk)); + struct rxrpc_net *rxnet; struct rb_node *parent, **pp; const void *here = __builtin_return_address(0); int ret; @@ -272,6 +276,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, write_unlock(>call_lock); + rxnet = call->rxnet; write_lock(>call_lock); list_add_tail(>link, >calls); write_unlock(>call_lock); @@ -617,7 +622,7 @@ void rxrpc_release_calls_on_socket(struct rxrpc_sock *rx) */ void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op) { - struct rxrpc_net *rxnet; + struct rxrpc_net *rxnet = call->rxnet; const void *here = __builtin_return_address(0); int n; @@ -631,7 +636,6 @@ void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op) ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE); if (!list_empty(>link)) { - rxnet = rxrpc_net(sock_net(>socket->sk)); write_lock(>call_lock);
[PATCH net-next 05/12] rxrpc: Fix resend event time calculation
From: Marc Dionne <marc.dio...@auristor.com> Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation for the next resend event. For this calculation, "oldest" will be before "now", so ktime_sub(oldest, now) will yield a negative value. When passed to nsecs_to_jiffies which expects an unsigned value, the end result will be a very large value, and a resend event scheduled far into the future. This could cause calls to stall if some packets were lost. Fix by ordering the arguments to ktime_sub correctly. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Signed-off-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 3dee89c7a06e..6e0d788b4dc4 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -226,7 +226,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) ktime_to_ns(ktime_sub(skb->tstamp, max_age))); } - resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(oldest, now))); + resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(now, oldest))); resend_at += jiffies + rxrpc_resend_timeout; WRITE_ONCE(call->resend_at, resend_at);
[PATCH net-next 11/12] rxrpc: Add a tracepoint to track rxrpc_peer refcounting
Add a tracepoint to track reference counting on the rxrpc_peer struct. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 42 +++ net/rxrpc/ar-internal.h | 23 +++ net/rxrpc/peer_event.c |2 + net/rxrpc/peer_object.c | 65 +- 4 files changed, 110 insertions(+), 22 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 0410dfeb79c6..9e96c2fe2793 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -50,6 +50,14 @@ enum rxrpc_local_trace { rxrpc_local_queued, }; +enum rxrpc_peer_trace { + rxrpc_peer_got, + rxrpc_peer_new, + rxrpc_peer_processing, + rxrpc_peer_put, + rxrpc_peer_queued_error, +}; + enum rxrpc_conn_trace { rxrpc_conn_got, rxrpc_conn_new_client, @@ -230,6 +238,13 @@ enum rxrpc_congest_change { EM(rxrpc_local_put, "PUT") \ E_(rxrpc_local_queued, "QUE") +#define rxrpc_peer_traces \ + EM(rxrpc_peer_got, "GOT") \ + EM(rxrpc_peer_new, "NEW") \ + EM(rxrpc_peer_processing, "PRO") \ + EM(rxrpc_peer_put, "PUT") \ + E_(rxrpc_peer_queued_error, "QER") + #define rxrpc_conn_traces \ EM(rxrpc_conn_got, "GOT") \ EM(rxrpc_conn_new_client, "NWc") \ @@ -482,6 +497,33 @@ TRACE_EVENT(rxrpc_local, __entry->where) ); +TRACE_EVENT(rxrpc_peer, + TP_PROTO(struct rxrpc_peer *peer, enum rxrpc_peer_trace op, +int usage, const void *where), + + TP_ARGS(peer, op, usage, where), + + TP_STRUCT__entry( + __field(unsigned int, peer) + __field(int,op ) + __field(int,usage ) + __field(const void *, where ) +), + + TP_fast_assign( + __entry->peer = peer->debug_id; + __entry->op = op; + __entry->usage = usage; + __entry->where = where; + ), + + TP_printk("P=%08x %s u=%d sp=%pSR", + __entry->peer, + __print_symbolic(__entry->op, rxrpc_peer_traces), + __entry->usage, + __entry->where) + ); + TRACE_EVENT(rxrpc_conn, TP_PROTO(struct rxrpc_connection *conn, enum rxrpc_conn_trace op, int usage, const void *where), diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index d40d54b78567..c46583bc255d 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -1041,25 +1041,10 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *, struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t); struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *, struct rxrpc_peer *); - -static inline struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *peer) -{ - atomic_inc(>usage); - return peer; -} - -static inline -struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *peer) -{ - return atomic_inc_not_zero(>usage) ? peer : NULL; -} - -extern void __rxrpc_put_peer(struct rxrpc_peer *peer); -static inline void rxrpc_put_peer(struct rxrpc_peer *peer) -{ - if (peer && atomic_dec_and_test(>usage)) - __rxrpc_put_peer(peer); -} +struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *); +struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *); +void rxrpc_put_peer(struct rxrpc_peer *); +void __rxrpc_queue_peer_error(struct rxrpc_peer *); /* * proc.c diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c index d01eb9a06448..78c2f95d1f22 100644 --- a/net/rxrpc/peer_event.c +++ b/net/rxrpc/peer_event.c @@ -192,7 +192,7 @@ void rxrpc_error_report(struct sock *sk) rxrpc_free_skb(skb, rxrpc_skb_rx_freed); /* The ref we obtained is passed off to the work item */ - rxrpc_queue_work(>error_distributor); + __rxrpc_queue_peer_error(peer); _leave(""); } diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c index 94a6dbfcf129..a4a750aea1e5 100644 --- a/net/rxrpc/peer_object.c +++ b/net/rxrpc/peer_object.c @@ -386,9 +386,54 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *local, } /* - * Discard a ref on a remote peer record. + * Get a ref on a peer record. */ -void __rxrpc_put_peer(struct rxrpc_peer *peer) +struc
[PATCH net-next 10/12] rxrpc: Fix apparent leak of rxrpc_local objects
rxrpc_local objects cannot be disposed of until all the connections that point to them have been RCU'd as a connection object holds refcount on the local endpoint it is communicating through. Currently, this can cause an assertion failure to occur when a network namespace is destroyed as there's no check that the RCU destructors for the connections have been run before we start trying to destroy local endpoints. The kernel reports: rxrpc: AF_RXRPC: Leaked local 36a41bc1 {5} [ cut here ] kernel BUG at ../net/rxrpc/local_object.c:439! Fix this by keeping a count of the live connections and waiting for it to go to zero at the end of rxrpc_destroy_all_connections(). Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/ar-internal.h |1 + net/rxrpc/call_accept.c |2 ++ net/rxrpc/conn_client.c |1 + net/rxrpc/conn_object.c |8 net/rxrpc/conn_service.c |1 + net/rxrpc/net_ns.c |1 + 6 files changed, 14 insertions(+) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index cc51d3eb0548..d40d54b78567 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -77,6 +77,7 @@ struct rxrpc_net { rwlock_tcall_lock; /* Lock for ->calls */ atomic_tnr_calls; /* Count of allocated calls */ + atomic_tnr_conns; struct list_headconn_proc_list; /* List of conns in this namespace for proc */ struct list_headservice_conns; /* Service conns in this namespace */ rwlock_tconn_lock; /* Lock for ->conn_proc_list, ->service_conns */ diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 5a9b1d916124..f67017dcb25e 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -219,6 +219,8 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx) list_del(>proc_link); write_unlock(>conn_lock); kfree(conn); + if (atomic_dec_and_test(>nr_conns)) + wake_up_atomic_t(>nr_conns); tail = (tail + 1) & (size - 1); } diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 041da40dbf93..5736f643c516 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -207,6 +207,7 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters *cp, gfp_t gfp) if (ret < 0) goto error_2; + atomic_inc(>nr_conns); write_lock(>conn_lock); list_add_tail(>proc_link, >conn_proc_list); write_unlock(>conn_lock); diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index bfc46fd69a62..0950ee3d26f5 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -365,6 +365,9 @@ static void rxrpc_destroy_connection(struct rcu_head *rcu) key_put(conn->params.key); key_put(conn->server_key); rxrpc_put_peer(conn->params.peer); + + if (atomic_dec_and_test(>params.local->rxnet->nr_conns)) + wake_up_atomic_t(>params.local->rxnet->nr_conns); rxrpc_put_local(conn->params.local); kfree(conn); @@ -458,6 +461,7 @@ void rxrpc_destroy_all_connections(struct rxrpc_net *rxnet) _enter(""); + atomic_dec(>nr_conns); rxrpc_destroy_all_client_connections(rxnet); del_timer_sync(>service_conn_reap_timer); @@ -475,5 +479,9 @@ void rxrpc_destroy_all_connections(struct rxrpc_net *rxnet) ASSERT(list_empty(>conn_proc_list)); + /* We need to wait for the connections to be destroyed by RCU as they +* pin things that we still need to get rid of. +*/ + wait_on_atomic_t(>nr_conns, atomic_t_wait, TASK_UNINTERRUPTIBLE); _leave(""); } diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c index f6fcdb3130a1..80773a50c755 100644 --- a/net/rxrpc/conn_service.c +++ b/net/rxrpc/conn_service.c @@ -132,6 +132,7 @@ struct rxrpc_connection *rxrpc_prealloc_service_connection(struct rxrpc_net *rxn conn->state = RXRPC_CONN_SERVICE_PREALLOC; atomic_set(>usage, 2); + atomic_inc(>nr_conns); write_lock(>conn_lock); list_add_tail(>link, >service_conns); list_add_tail(>proc_link, >conn_proc_list); diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c index 101019b0be34..fa9ce60e7bfa 100644 --- a/net/rxrpc/net_ns.c +++ b/net/rxrpc/net_ns.c @@ -57,6 +57,7 @@ static __net_init int rxrpc_init_net(struct net *net) rwlock_init(>call_lock); atomic_set(>nr_calls, 1); + atomic_set(>nr_conns, 1); INIT_LIST_HEAD(>conn_proc_list); INIT_LIST_HEAD(>service_conns); rwlock_init(>conn_lock);
[PATCH net-next 09/12] rxrpc: Add a tracepoint to track rxrpc_local refcounting
Add a tracepoint to track reference counting on the rxrpc_local struct. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 43 net/rxrpc/ar-internal.h | 27 +++-- net/rxrpc/call_accept.c |3 +- net/rxrpc/local_object.c | 65 +- 4 files changed, 111 insertions(+), 27 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 2ea788f6f95d..0410dfeb79c6 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -42,6 +42,14 @@ enum rxrpc_skb_trace { rxrpc_skb_tx_seen, }; +enum rxrpc_local_trace { + rxrpc_local_got, + rxrpc_local_new, + rxrpc_local_processing, + rxrpc_local_put, + rxrpc_local_queued, +}; + enum rxrpc_conn_trace { rxrpc_conn_got, rxrpc_conn_new_client, @@ -215,6 +223,13 @@ enum rxrpc_congest_change { EM(rxrpc_skb_tx_rotated,"Tx ROT") \ E_(rxrpc_skb_tx_seen, "Tx SEE") +#define rxrpc_local_traces \ + EM(rxrpc_local_got, "GOT") \ + EM(rxrpc_local_new, "NEW") \ + EM(rxrpc_local_processing, "PRO") \ + EM(rxrpc_local_put, "PUT") \ + E_(rxrpc_local_queued, "QUE") + #define rxrpc_conn_traces \ EM(rxrpc_conn_got, "GOT") \ EM(rxrpc_conn_new_client, "NWc") \ @@ -416,6 +431,7 @@ enum rxrpc_congest_change { #define E_(a, b) TRACE_DEFINE_ENUM(a); rxrpc_skb_traces; +rxrpc_local_traces; rxrpc_conn_traces; rxrpc_client_traces; rxrpc_call_traces; @@ -439,6 +455,33 @@ rxrpc_congest_changes; #define EM(a, b) { a, b }, #define E_(a, b) { a, b } +TRACE_EVENT(rxrpc_local, + TP_PROTO(struct rxrpc_local *local, enum rxrpc_local_trace op, +int usage, const void *where), + + TP_ARGS(local, op, usage, where), + + TP_STRUCT__entry( + __field(unsigned int, local ) + __field(int,op ) + __field(int,usage ) + __field(const void *, where ) +), + + TP_fast_assign( + __entry->local = local->debug_id; + __entry->op = op; + __entry->usage = usage; + __entry->where = where; + ), + + TP_printk("L=%08x %s u=%d sp=%pSR", + __entry->local, + __print_symbolic(__entry->op, rxrpc_local_traces), + __entry->usage, + __entry->where) + ); + TRACE_EVENT(rxrpc_conn, TP_PROTO(struct rxrpc_connection *conn, enum rxrpc_conn_trace op, int usage, const void *where), diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 2a2b0fdfb157..cc51d3eb0548 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -981,31 +981,12 @@ extern void rxrpc_process_local_events(struct rxrpc_local *); * local_object.c */ struct rxrpc_local *rxrpc_lookup_local(struct net *, const struct sockaddr_rxrpc *); -void __rxrpc_put_local(struct rxrpc_local *); +struct rxrpc_local *rxrpc_get_local(struct rxrpc_local *); +struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *); +void rxrpc_put_local(struct rxrpc_local *); +void rxrpc_queue_local(struct rxrpc_local *); void rxrpc_destroy_all_locals(struct rxrpc_net *); -static inline void rxrpc_get_local(struct rxrpc_local *local) -{ - atomic_inc(>usage); -} - -static inline -struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local) -{ - return atomic_inc_not_zero(>usage) ? local : NULL; -} - -static inline void rxrpc_put_local(struct rxrpc_local *local) -{ - if (local && atomic_dec_and_test(>usage)) - __rxrpc_put_local(local); -} - -static inline void rxrpc_queue_local(struct rxrpc_local *local) -{ - rxrpc_queue_work(>processor); -} - /* * misc.c */ diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index 493545033e42..5a9b1d916124 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -296,8 +296,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct rxrpc_sock *rx, b->conn_backlog[conn_tail] = NULL; smp_store_release(>conn_backlog_tail, (conn_tail + 1) & (RXRPC_BACKLOG_MAX - 1)); - rxrpc_get_local(local); - conn->params.local = local; + conn->params.local =
[PATCH net-next 12/12] rxrpc: Fix leak of rxrpc_peer objects
When a new client call is requested, an rxrpc_conn_parameters struct object is passed in with a bunch of parameters set, such as the local endpoint to use. A pointer to the target peer record is also placed in there by rxrpc_get_client_conn() - and this is removed if and only if a new connection object is allocated. Thus it leaks if a new connection object isn't allocated. Fix this by putting any peer object attached to the rxrpc_conn_parameters object in the function that allocated it. Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol info") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c|2 ++ net/rxrpc/ar-internal.h |1 + net/rxrpc/net_ns.c |1 + net/rxrpc/peer_object.c | 21 + net/rxrpc/sendmsg.c |1 + 5 files changed, 26 insertions(+) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 0b3026b8fa40..9a2c8e7c000e 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -324,6 +324,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, mutex_unlock(>user_mutex); } + rxrpc_put_peer(cp.peer); _leave(" = %p", call); return call; } @@ -447,6 +448,7 @@ int rxrpc_kernel_retry_call(struct socket *sock, struct rxrpc_call *call, ret = rxrpc_retry_client_call(rx, call, , srx, GFP_KERNEL); mutex_unlock(>user_mutex); + rxrpc_put_peer(cp.peer); _leave(" = %d", ret); return ret; } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index c46583bc255d..90d7079e0aa9 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -1041,6 +1041,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *, struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t); struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *, struct rxrpc_peer *); +void rxrpc_destroy_all_peers(struct rxrpc_net *); struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *); struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *); void rxrpc_put_peer(struct rxrpc_peer *); diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c index fa9ce60e7bfa..c7a023fb22d0 100644 --- a/net/rxrpc/net_ns.c +++ b/net/rxrpc/net_ns.c @@ -118,6 +118,7 @@ static __net_exit void rxrpc_exit_net(struct net *net) cancel_work_sync(>peer_keepalive_work); rxrpc_destroy_all_calls(rxnet); rxrpc_destroy_all_connections(rxnet); + rxrpc_destroy_all_peers(rxnet); rxrpc_destroy_all_locals(rxnet); proc_remove(rxnet->proc_net); } diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c index a4a750aea1e5..1b7e8107b3ae 100644 --- a/net/rxrpc/peer_object.c +++ b/net/rxrpc/peer_object.c @@ -463,6 +463,27 @@ void rxrpc_put_peer(struct rxrpc_peer *peer) } } +/* + * Make sure all peer records have been discarded. + */ +void rxrpc_destroy_all_peers(struct rxrpc_net *rxnet) +{ + struct rxrpc_peer *peer; + int i; + + for (i = 0; i < HASH_SIZE(rxnet->peer_hash); i++) { + if (hlist_empty(>peer_hash[i])) + continue; + + hlist_for_each_entry(peer, >peer_hash[i], hash_link) { + pr_err("Leaked peer %u {%u} %pISp\n", + peer->debug_id, + atomic_read(>usage), + >srx.transport); + } + } +} + /** * rxrpc_kernel_get_peer - Get the peer address of a call * @sock: The socket on which the call is in progress. diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index a62980a80151..206e802ccbdc 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -586,6 +586,7 @@ rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, atomic_inc_return(_debug_id)); /* The socket is now unlocked */ + rxrpc_put_peer(cp.peer); _leave(" = %p\n", call); return call; }
Re: [PATCH net-next 0/6] rxrpc: Fixes
David Millerwrote: > David, this GIT URL has tons of unrelated changes. It seems to bring in > the parts of Linus's tree that haven't proagated to 'net' yet. Sorry about that, I rebased on the wrong branch by accident. I've got some more fixes. Should I just give the lot to you to pull into your net-next tree, given that the merge window may well open Sunday? David
[PATCH net-next 0/3] rxrpc: Tracing updates
Here are some patches that update tracing in AF_RXRPC and AFS: (1) Add a tracepoint for tracking resend events. (2) Use debug_ids in traces rather than pointers (as pointers are now hashed) and allow use of the same debug_id in AFS calls as in the corresponding AF_RXRPC calls. This makes filtering the trace output much easier. (3) Add a tracepoint for tracking call completion. The patches are tagged here: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-next-20180327 and can also be found on this branch: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next David --- David Howells (3): rxrpc: Trace resend rxrpc, afs: Use debug_ids rather than pointers in traces rxrpc: Trace call completion fs/afs/internal.h|1 fs/afs/rxrpc.c | 12 ++ include/net/af_rxrpc.h | 11 ++ include/trace/events/afs.h | 69 ++ include/trace/events/rxrpc.h | 206 +++--- net/rxrpc/af_rxrpc.c |7 + net/rxrpc/ar-internal.h |9 +- net/rxrpc/call_accept.c | 18 ++-- net/rxrpc/call_event.c |1 net/rxrpc/call_object.c | 15 ++- net/rxrpc/conn_event.c |3 - net/rxrpc/input.c|6 + net/rxrpc/sendmsg.c |3 - 13 files changed, 219 insertions(+), 142 deletions(-)
[PATCH net-next 1/3] rxrpc: Trace resend
Add a tracepoint to trace packet resend events and to dump the Tx annotation buffer for added illumination. Signed-off-by: David Howells <dhowe...@rdhat.com> --- include/trace/events/rxrpc.h | 24 net/rxrpc/call_event.c |1 + 2 files changed, 25 insertions(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 36cb50c111a6..41979f907575 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -420,6 +420,7 @@ rxrpc_rtt_rx_traces; rxrpc_timer_traces; rxrpc_propose_ack_traces; rxrpc_propose_ack_outcomes; +rxrpc_congest_modes; rxrpc_congest_changes; /* @@ -1229,6 +1230,29 @@ TRACE_EVENT(rxrpc_connect_call, __entry->call_id) ); +TRACE_EVENT(rxrpc_resend, + TP_PROTO(struct rxrpc_call *call, int ix), + + TP_ARGS(call, ix), + + TP_STRUCT__entry( + __field(struct rxrpc_call *,call) + __field(int,ix ) + __array(u8, anno, 64) +), + + TP_fast_assign( + __entry->call = call; + __entry->ix = ix; + memcpy(__entry->anno, call->rxtx_annotations, 64); + ), + + TP_printk("c=%p ix=%u a=%64phN", + __entry->call, + __entry->ix, + __entry->anno) + ); + #endif /* _TRACE_RXRPC_H */ /* This part must be outside protection */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index ad2ab1103189..6a62e42e1d8d 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -195,6 +195,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) * the packets in the Tx buffer we're going to resend and what the new * resend timeout will be. */ + trace_rxrpc_resend(call, (cursor + 1) & RXRPC_RXTX_BUFF_MASK); oldest = now; for (seq = cursor + 1; before_eq(seq, top); seq++) { ix = seq & RXRPC_RXTX_BUFF_MASK;
[PATCH net-next 2/3] rxrpc, afs: Use debug_ids rather than pointers in traces
In rxrpc and afs, use the debug_ids that are monotonically allocated to various objects as they're allocated rather than pointers as kernel pointers are now hashed making them less useful. Further, the debug ids aren't reused anywhere nearly as quickly. In addition, allow kernel services that use rxrpc, such as afs, to take numbers from the rxrpc counter, assign them to their own call struct and pass them in to rxrpc for both client and service calls so that the trace lines for each will have the same ID tag. Signed-off-by: David Howells <dhowe...@redhat.com> --- fs/afs/internal.h|1 fs/afs/rxrpc.c | 12 ++- include/net/af_rxrpc.h | 11 ++- include/trace/events/afs.h | 69 --- include/trace/events/rxrpc.h | 155 +- net/rxrpc/af_rxrpc.c |7 +- net/rxrpc/ar-internal.h |8 +- net/rxrpc/call_accept.c | 18 +++-- net/rxrpc/call_object.c | 15 ++-- net/rxrpc/conn_event.c |3 + net/rxrpc/input.c|6 +- net/rxrpc/sendmsg.c |3 + 12 files changed, 163 insertions(+), 145 deletions(-) diff --git a/fs/afs/internal.h b/fs/afs/internal.h index f38d6a561a84..72217170b155 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -118,6 +118,7 @@ struct afs_call { boolret_reply0; /* T if should return reply[0] on success */ boolupgrade;/* T to request service upgrade */ u16 service_id; /* Actual service ID (after upgrade) */ + unsigned intdebug_id; /* Trace ID */ u32 operation_ID; /* operation ID for an incoming call */ u32 count; /* count for use in unmarshalling */ __be32 tmp;/* place to extract temporary data */ diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index e1126659f043..b819900916e6 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -131,6 +131,7 @@ static struct afs_call *afs_alloc_call(struct afs_net *net, call->type = type; call->net = net; + call->debug_id = atomic_inc_return(_debug_id); atomic_set(>usage, 1); INIT_WORK(>async_work, afs_process_async_call); init_waitqueue_head(>waitq); @@ -169,11 +170,12 @@ void afs_put_call(struct afs_call *call) afs_put_server(call->net, call->cm_server); afs_put_cb_interest(call->net, call->cbi); kfree(call->request); - kfree(call); - o = atomic_dec_return(>nr_outstanding_calls); trace_afs_call(call, afs_call_trace_free, 0, o, __builtin_return_address(0)); + kfree(call); + + o = atomic_dec_return(>nr_outstanding_calls); if (o == 0) wake_up_atomic_t(>nr_outstanding_calls); } @@ -378,7 +380,8 @@ long afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, (async ? afs_wake_up_async_call : afs_wake_up_call_waiter), -call->upgrade); +call->upgrade, +call->debug_id); if (IS_ERR(rxcall)) { ret = PTR_ERR(rxcall); goto error_kill_call; @@ -727,7 +730,8 @@ void afs_charge_preallocation(struct work_struct *work) afs_wake_up_async_call, afs_rx_attach, (unsigned long)call, - GFP_KERNEL) < 0) + GFP_KERNEL, + call->debug_id) < 0) break; call = NULL; } diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h index 2b3a6eec4570..8ae8ee004258 100644 --- a/include/net/af_rxrpc.h +++ b/include/net/af_rxrpc.h @@ -31,6 +31,11 @@ enum rxrpc_call_completion { NR__RXRPC_CALL_COMPLETIONS }; +/* + * Debug ID counter for tracing. + */ +extern atomic_t rxrpc_debug_id; + typedef void (*rxrpc_notify_rx_t)(struct sock *, struct rxrpc_call *, unsigned long); typedef void (*rxrpc_notify_end_tx_t)(struct sock *, struct rxrpc_call *, @@ -50,7 +55,8 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *, s64, gfp_t, rxrpc_notify_rx_t, -
[PATCH net-next 3/3] rxrpc: Trace call completion
Add a tracepoint to track rxrpc calls moving into the completed state and to log the completion type and the recorded error value and abort code. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 33 + net/rxrpc/ar-internal.h |1 + 2 files changed, 34 insertions(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 4d2c2d35c5cb..2ea788f6f95d 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -400,6 +400,13 @@ enum rxrpc_congest_change { EM(RXRPC_ACK_IDLE, "IDL") \ E_(RXRPC_ACK__INVALID, "-?-") +#define rxrpc_completions \ + EM(RXRPC_CALL_SUCCEEDED,"Succeeded") \ + EM(RXRPC_CALL_REMOTELY_ABORTED, "RemoteAbort") \ + EM(RXRPC_CALL_LOCALLY_ABORTED, "LocalAbort") \ + EM(RXRPC_CALL_LOCAL_ERROR, "LocalError") \ + E_(RXRPC_CALL_NETWORK_ERROR,"NetError") + /* * Export enum symbols via userspace. */ @@ -624,6 +631,32 @@ TRACE_EVENT(rxrpc_abort, __entry->abort_code, __entry->error, __entry->why) ); +TRACE_EVENT(rxrpc_call_complete, + TP_PROTO(struct rxrpc_call *call), + + TP_ARGS(call), + + TP_STRUCT__entry( + __field(unsigned int, call) + __field(enum rxrpc_call_completion, compl ) + __field(int,error ) + __field(u32,abort_code ) +), + + TP_fast_assign( + __entry->call = call->debug_id; + __entry->compl = call->completion; + __entry->error = call->error; + __entry->abort_code = call->abort_code; + ), + + TP_printk("c=%08x %s r=%d ac=%d", + __entry->call, + __print_symbolic(__entry->compl, rxrpc_completions), + __entry->error, + __entry->abort_code) + ); + TRACE_EVENT(rxrpc_transmit, TP_PROTO(struct rxrpc_call *call, enum rxrpc_transmit_trace why), diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 9c9817ddafc5..21cf164b6d85 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -778,6 +778,7 @@ static inline bool __rxrpc_set_call_completion(struct rxrpc_call *call, call->error = error; call->completion = compl, call->state = RXRPC_CALL_COMPLETE; + trace_rxrpc_call_complete(call); wake_up(>waitq); return true; }
Re: [PATCH net-next 0/6] rxrpc: Fixes
Sorry, that should be net, not net-next, in the subject line. David
[PATCH net-next 0/6] rxrpc: Fixes
Here are six patches for AF_RXRPC: (1) Fix the use of VERSION packets to keep firewall routes open. (2) Fix the incorrect current time usage in a tracepoint. (3) Fix Tx ring annotation corruption. (4) Fix accidental conversion of call-level abort into connection-level abort. (5) Fix calculation of resend time. (6) Remove a couple of unused variables. The patches are tagged here: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-fixes-20180327 and can also be found on the following branch: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes David --- David Howells (4): rxrpc: Fix firewall route keepalive rxrpc: Fix a bit of time confusion rxrpc: Fix Tx ring annotation after initial Tx failure rxrpc: Don't treat call aborts as conn aborts Marc Dionne (1): rxrpc: Fix resend event time calculation Sebastian Andrzej Siewior (1): rxrpc: remove unused static variables net/rxrpc/af_rxrpc.c|4 ++ net/rxrpc/ar-internal.h | 14 ++- net/rxrpc/call_event.c |4 +- net/rxrpc/conn_event.c |3 + net/rxrpc/input.c | 17 +--- net/rxrpc/net_ns.c | 21 ++ net/rxrpc/output.c | 59 - net/rxrpc/peer_event.c | 96 +++ net/rxrpc/peer_object.c |7 +++ net/rxrpc/rxkad.c |2 + net/rxrpc/security.c|3 - net/rxrpc/sendmsg.c |4 +- 12 files changed, 218 insertions(+), 16 deletions(-)
[PATCH net 1/6] rxrpc: Fix firewall route keepalive
Fix the firewall route keepalive part of AF_RXRPC which is currently function incorrectly by replying to VERSION REPLY packets from the server with VERSION REQUEST packets. Instead, send VERSION REPLY packets to the peers of service connections to act as keep-alives 20s after the latest packet was transmitted to that peer. Also, just discard VERSION REPLY packets rather than replying to them. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c|4 ++ net/rxrpc/ar-internal.h | 14 ++- net/rxrpc/conn_event.c |3 + net/rxrpc/input.c |2 + net/rxrpc/net_ns.c | 21 ++ net/rxrpc/output.c | 59 - net/rxrpc/peer_event.c | 96 +++ net/rxrpc/peer_object.c |7 +++ net/rxrpc/rxkad.c |2 + 9 files changed, 204 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 0c9c18aa7c77..eca50b495021 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -759,6 +759,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket *sock, static int rxrpc_create(struct net *net, struct socket *sock, int protocol, int kern) { + struct rxrpc_net *rxnet; struct rxrpc_sock *rx; struct sock *sk; @@ -798,6 +799,9 @@ static int rxrpc_create(struct net *net, struct socket *sock, int protocol, rwlock_init(>call_lock); memset(>srx, 0, sizeof(rx->srx)); + rxnet = rxrpc_net(sock_net(>sk)); + timer_reduce(>peer_keepalive_timer, jiffies + 1); + _leave(" = 0 [%p]", rx); return 0; } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 416688381eb7..352ef8d546f9 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -97,8 +97,16 @@ struct rxrpc_net { struct list_headlocal_endpoints; struct mutexlocal_mutex;/* Lock for ->local_endpoints */ - spinlock_t peer_hash_lock; /* Lock for ->peer_hash */ DECLARE_HASHTABLE (peer_hash, 10); + spinlock_t peer_hash_lock; /* Lock for ->peer_hash */ + +#define RXRPC_KEEPALIVE_TIME 20 /* NAT keepalive time in seconds */ + u8 peer_keepalive_cursor; + ktime_t peer_keepalive_base; + struct hlist_head peer_keepalive[RXRPC_KEEPALIVE_TIME + 1]; + struct hlist_head peer_keepalive_new; + struct timer_list peer_keepalive_timer; + struct work_struct peer_keepalive_work; }; /* @@ -285,6 +293,8 @@ struct rxrpc_peer { struct hlist_head error_targets; /* targets for net error distribution */ struct work_struct error_distributor; struct rb_root service_conns; /* Service connections */ + struct hlist_node keepalive_link; /* Link in net->peer_keepalive[] */ + time64_tlast_tx_at; /* Last time packet sent here */ seqlock_t service_conn_lock; spinlock_t lock; /* access lock */ unsigned intif_mtu; /* interface MTU for this peer */ @@ -1025,6 +1035,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *, bool, rxrpc_serial_t *); int rxrpc_send_abort_packet(struct rxrpc_call *); int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *, bool); void rxrpc_reject_packets(struct rxrpc_local *); +void rxrpc_send_keepalive(struct rxrpc_peer *); /* * peer_event.c @@ -1033,6 +1044,7 @@ void rxrpc_error_report(struct sock *); void rxrpc_peer_error_distributor(struct work_struct *); void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace, rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t); +void rxrpc_peer_keepalive_worker(struct work_struct *); /* * peer_object.c diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index b1dfae107431..5f06dc8266bf 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -136,6 +136,7 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, } kernel_sendmsg(conn->params.local->socket, , iov, ioc, len); + conn->params.peer->last_tx_at = ktime_get_real(); _leave(""); return; } @@ -238,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection *conn, return -EAGAIN; } + conn->params.peer->last_tx_at = ktime_get_real(); + _leave(" = 0"); return 0; } diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 6fc61400337f..31bf954480b6 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1183,6 +1183,8 @@ void rxrpc_data_ready(struct sock *udp_sk) switch (sp->hdr.type) { case RXRPC_PACKET_TYPE_VERSION: +
[PATCH net 4/6] rxrpc: Don't treat call aborts as conn aborts
If a call-level abort is received for the previous call to complete on a connection channel, then that abort is queued for the connection processor to handle. Unfortunately, the connection processor then assumes without checking that the abort is connection-level (ie. callNumber is 0) and distributes it over all active calls on that connection, thereby incorrectly aborting them. Fix this by discarding aborts aimed at a completed call. Further, discard all packets aimed at a call that's complete if there's currently an active call on a channel, since the DATA packets associated with the new call automatically terminate the old call. Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor") Reported-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/input.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 31bf954480b6..846ec0938953 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -1242,16 +1242,19 @@ void rxrpc_data_ready(struct sock *udp_sk) goto discard_unlock; if (sp->hdr.callNumber == chan->last_call) { - /* For the previous service call, if completed successfully, we -* discard all further packets. + if (chan->call || + sp->hdr.type == RXRPC_PACKET_TYPE_ABORT) + goto discard_unlock; + + /* For the previous service call, if completed +* successfully, we discard all further packets. */ if (rxrpc_conn_is_service(conn) && - (chan->last_type == RXRPC_PACKET_TYPE_ACK || -sp->hdr.type == RXRPC_PACKET_TYPE_ABORT)) + chan->last_type == RXRPC_PACKET_TYPE_ACK) goto discard_unlock; - /* But otherwise we need to retransmit the final packet from -* data cached in the connection record. + /* But otherwise we need to retransmit the final packet +* from data cached in the connection record. */ rxrpc_post_packet_to_conn(conn, skb); goto out_unlock;
[PATCH net 2/6] rxrpc: Fix a bit of time confusion
The rxrpc_reduce_call_timer() function should be passed the 'current time' in jiffies, not the current ktime time. It's confusing in rxrpc_resend because that has to deal with both. Pass the correct current time in. Note that this only affects the trace produced and not the functioning of the code. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index ad2ab1103189..27f77b7d0ead 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -237,7 +237,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) * retransmitting data. */ if (!retrans) { - rxrpc_reduce_call_timer(call, resend_at, now, + rxrpc_reduce_call_timer(call, resend_at, now_j, rxrpc_timer_set_for_resend); spin_unlock_bh(>lock); ack_ts = ktime_sub(now, call->acks_latest_ts);
[PATCH net 3/6] rxrpc: Fix Tx ring annotation after initial Tx failure
rxrpc calls have a ring of packets that are awaiting ACK or retransmission and a parallel ring of annotations that tracks the state of those packets. If the initial transmission of a packet on the underlying UDP socket fails then the packet annotation is marked for resend - but the setting of this mark accidentally erases the last-packet mark also stored in the same annotation slot. If this happens, a call won't switch out of the Tx phase when all the packets have been transmitted. Fix this by retaining the last-packet mark and only altering the packet state. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/sendmsg.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 09f2a3e05221..7a94ce92ffdc 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -130,7 +130,9 @@ static inline void rxrpc_instant_resend(struct rxrpc_call *call, int ix) spin_lock_bh(>lock); if (call->state < RXRPC_CALL_COMPLETE) { - call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS; + call->rxtx_annotations[ix] = + (call->rxtx_annotations[ix] & RXRPC_TX_ANNO_LAST) | + RXRPC_TX_ANNO_RETRANS; if (!test_and_set_bit(RXRPC_CALL_EV_RESEND, >events)) rxrpc_queue_call(call); }
[PATCH net 5/6] rxrpc: Fix resend event time calculation
From: Marc Dionne <marc.dio...@auristor.com> Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation for the next resend event. For this calculation, "oldest" will be before "now", so ktime_sub(oldest, now) will yield a negative value. When passed to nsecs_to_jiffies which expects an unsigned value, the end result will be a very large value, and a resend event scheduled far into the future. This could cause calls to stall if some packets were lost. Fix by ordering the arguments to ktime_sub correctly. Fixes: a158bdd3247b ("rxrpc: Fix call timeouts") Signed-off-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 27f77b7d0ead..87ec697ca117 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -225,7 +225,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) ktime_to_ns(ktime_sub(skb->tstamp, max_age))); } - resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(oldest, now))); + resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(now, oldest))); resend_at += jiffies + rxrpc_resend_timeout; WRITE_ONCE(call->resend_at, resend_at);
[PATCH net 6/6] rxrpc: remove unused static variables
From: Sebastian Andrzej Siewior <bige...@linutronix.de> The rxrpc_security_methods and rxrpc_security_sem user has been removed in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was noticed by kbuild test robot for the -RT tree but is also true for !RT. Reported-by: kbuild test robot <fengguang...@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/security.c |3 --- 1 file changed, 3 deletions(-) diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c index e9f428351293..c4479afe8ae7 100644 --- a/net/rxrpc/security.c +++ b/net/rxrpc/security.c @@ -19,9 +19,6 @@ #include #include "ar-internal.h" -static LIST_HEAD(rxrpc_security_methods); -static DECLARE_RWSEM(rxrpc_security_sem); - static const struct rxrpc_security *rxrpc_security_types[] = { [RXRPC_SECURITY_NONE] = _no_security, #ifdef CONFIG_RXKAD
Re: [PATCH net-next nfs 6/6] net: Convert rxrpc_net_ops
Kirill Tkhai <ktk...@virtuozzo.com> wrote: > These pernet_operations modifies rxrpc_net_id-pointed > per-net entities. There is external link to AF_RXRPC > in fs/afs/Kconfig, but it seems there is no other > pernet_operations interested in that per-net entities. The fs/afs/ namespacing stuff isn't active yet as I'm trying to decide on how to deal with the DNS cache which also needs namespacing. > Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com> DaveM: Can you take this into net-next and add: Acked-by: David Howells <dhowe...@redhat.com>
Re: [PATCH 00/16] remove eight obsolete architectures
Do we have anything left that still implements NOMMU? David
Re: [PATCH v2] KEYS: DNS: limit the length of option strings
Eric Biggerswrote: > Fix it by limiting option strings (combined name + value) to a much more > reasonable 128 bytes. The exact limit is arbitrary, but currently the > only recognized option is formatted as "dnserror=%lu" which fits well > within this limit. There will be more options coming ("ipv4", "ipv6") but they shouldn't overrun this limit and we can always extend the limit if need be. David
[PATCH net] rxrpc: Fix send in rxrpc_send_data_packet()
All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send both parts of the iov[] buffer, but one of them does not. Fix it so that it does. Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc header included, but no payload. Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs") Reported-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/output.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 42410e910aff..cf73dc006c3b 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -445,7 +445,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb, (char *), sizeof(opt)); if (ret == 0) { ret = kernel_sendmsg(conn->params.local->socket, , -iov, 1, iov[0].iov_len); +iov, 2, len); opt = IPV6_PMTUDISC_DO; kernel_setsockopt(conn->params.local->socket,
[PATCH net] rxrpc: Work around usercopy check
Due to a check recently added to copy_to_user(), it's now not permitted to copy from slab-held data to userspace unless the slab is whitelisted. This affects rxrpc_recvmsg() when it attempts to place an RXRPC_USER_CALL_ID control message in the userspace control message buffer. A warning is generated by usercopy_warn() because the source is the copy of the user_call_ID retained in the rxrpc_call struct. Work around the issue by copying the user_call_ID to a variable on the stack and passing that to put_cmsg(). The warning generated looks like: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'dmaengine-unmap-128' (offset 680, size 8)! WARNING: CPU: 0 PID: 1401 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0 ... RIP: 0010:usercopy_warn+0x7e/0xa0 ... Call Trace: __check_object_size+0x9c/0x1a0 put_cmsg+0x98/0x120 rxrpc_recvmsg+0x6fc/0x1010 [rxrpc] ? finish_wait+0x80/0x80 ___sys_recvmsg+0xf8/0x240 ? __clear_rsb+0x25/0x3d ? __clear_rsb+0x15/0x3d ? __clear_rsb+0x25/0x3d ? __clear_rsb+0x15/0x3d ? __clear_rsb+0x25/0x3d ? __clear_rsb+0x15/0x3d ? __clear_rsb+0x25/0x3d ? __clear_rsb+0x15/0x3d ? finish_task_switch+0xa6/0x2b0 ? trace_hardirqs_on_caller+0xed/0x180 ? _raw_spin_unlock_irq+0x29/0x40 ? __sys_recvmsg+0x4e/0x90 __sys_recvmsg+0x4e/0x90 do_syscall_64+0x7a/0x220 entry_SYSCALL_64_after_hwframe+0x26/0x9b Reported-by: Jonathan Billings <jsbilli...@jsbillings.org> Signed-off-by: David Howells <dhowe...@redhat.com> Acked-by: Kees Cook <keesc...@chromium.org> Tested-by: Jonathan Billings <jsbilli...@jsbillings.org> --- net/rxrpc/recvmsg.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index cc21e8db25b0..9d45d8b56744 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -517,9 +517,10 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID, sizeof(unsigned int), ); } else { + unsigned long idl = call->user_call_ID; + ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID, - sizeof(unsigned long), - >user_call_ID); + sizeof(unsigned long), ); } if (ret < 0) goto error_unlock_call;
[PATCH net] rxrpc: Don't put crypto buffers on the stack
Don't put buffers of data to be handed to crypto on the stack as this may cause an assertion failure in the kernel (see below). Fix this by using an kmalloc'd buffer instead. kernel BUG at ./include/linux/scatterlist.h:147! ... RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc] RSP: 0018:be2fc06cfca8 EFLAGS: 00010246 RAX: RBX: 989277d59900 RCX: 0028 RDX: 259dc06cfd88 RSI: 0025 RDI: be30406cfd88 RBP: be2fc06cfd60 R08: be2fc06cfd08 R09: be2fc06cfd08 R10: R11: R12: 17c5f80d9f95 R13: be2fc06cfd88 R14: 98927a3f7aa0 R15: be2fc06cfd08 FS: () GS:98927fc0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 55b1ff28f0f8 CR3: 1b412003 CR4: 003606f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: rxkad_respond_to_challenge+0x297/0x330 [rxrpc] rxrpc_process_connection+0xd1/0x690 [rxrpc] ? process_one_work+0x1c3/0x680 ? __lock_is_held+0x59/0xa0 process_one_work+0x249/0x680 worker_thread+0x3a/0x390 ? process_one_work+0x680/0x680 kthread+0x121/0x140 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x3a/0x50 Reported-by: Jonathan Billings <jsbilli...@jsbillings.org> Reported-by: Marc Dionne <marc.dio...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> Tested-by: Jonathan Billings <jsbilli...@jsbillings.org> --- net/rxrpc/conn_event.c |1 + net/rxrpc/rxkad.c | 92 +++- 2 files changed, 52 insertions(+), 41 deletions(-) diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 4ca11be6be3c..b1dfae107431 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -460,6 +460,7 @@ void rxrpc_process_connection(struct work_struct *work) case -EKEYEXPIRED: case -EKEYREJECTED: goto protocol_error; + case -ENOMEM: case -EAGAIN: goto requeue_and_leave; case -ECONNABORTED: diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c index c38b3a1de56c..77cb23c7bd0a 100644 --- a/net/rxrpc/rxkad.c +++ b/net/rxrpc/rxkad.c @@ -773,8 +773,7 @@ static int rxkad_respond_to_challenge(struct rxrpc_connection *conn, { const struct rxrpc_key_token *token; struct rxkad_challenge challenge; - struct rxkad_response resp - __attribute__((aligned(8))); /* must be aligned for crypto */ + struct rxkad_response *resp; struct rxrpc_skb_priv *sp = rxrpc_skb(skb); const char *eproto; u32 version, nonce, min_level, abort_code; @@ -818,26 +817,29 @@ static int rxkad_respond_to_challenge(struct rxrpc_connection *conn, token = conn->params.key->payload.data[0]; /* build the response packet */ - memset(, 0, sizeof(resp)); - - resp.version= htonl(RXKAD_VERSION); - resp.encrypted.epoch= htonl(conn->proto.epoch); - resp.encrypted.cid = htonl(conn->proto.cid); - resp.encrypted.securityIndex= htonl(conn->security_ix); - resp.encrypted.inc_nonce= htonl(nonce + 1); - resp.encrypted.level= htonl(conn->params.security_level); - resp.kvno = htonl(token->kad->kvno); - resp.ticket_len = htonl(token->kad->ticket_len); - - resp.encrypted.call_id[0] = htonl(conn->channels[0].call_counter); - resp.encrypted.call_id[1] = htonl(conn->channels[1].call_counter); - resp.encrypted.call_id[2] = htonl(conn->channels[2].call_counter); - resp.encrypted.call_id[3] = htonl(conn->channels[3].call_counter); + resp = kzalloc(sizeof(struct rxkad_response), GFP_NOFS); + if (!resp) + return -ENOMEM; + + resp->version = htonl(RXKAD_VERSION); + resp->encrypted.epoch = htonl(conn->proto.epoch); + resp->encrypted.cid = htonl(conn->proto.cid); + resp->encrypted.securityIndex = htonl(conn->security_ix); + resp->encrypted.inc_nonce = htonl(nonce + 1); + resp->encrypted.level = htonl(conn->params.security_level); + resp->kvno = htonl(token->kad->kvno); + resp->ticket_len= htonl(token->kad->ticket_len); + resp->encrypted.call_id[0] = htonl(conn->channels[0].call_counter); + resp->encrypted.call_id[1] = htonl(conn->channels[1].call_counter); + resp->encrypted.call_id[2] = htonl(conn->channels[2].call_counter); + resp->encrypted.call_id[3] = htonl(conn->
[PATCH net] rxrpc: Fix received abort handling
AF_RXRPC is incorrectly sending back to the server any abort it receives for a client connection. This is due to the final-ACK offload to the connection event processor patch. The abort code is copied into the last-call information on the connection channel and then the event processor is set. Instead, the following should be done: (1) In the case of a final-ACK for a successful call, the ACK should be scheduled as before. (2) In the case of a locally generated ABORT, the ABORT details should be cached for sending in response to further packets related to that call and no further action scheduled at call disconnect time. (3) In the case of an ACK received from the peer, the call should be considered dead, no ABORT should be transmitted at this time. In response to further non-ABORT packets from the peer relating to this call, an RX_USER_ABORT ABORT should be transmitted. (4) In the case of a call killed due to network error, an RX_USER_ABORT ABORT should be cached for transmission in response to further packets, but no ABORT should be sent at this time. Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call") Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/conn_client.c |3 ++- net/rxrpc/conn_object.c | 16 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 7f74ca3059f8..064175068059 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -834,7 +834,8 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) * can be skipped if we find a follow-on call. The first DATA packet * of the follow on call will implicitly ACK this call. */ - if (test_bit(RXRPC_CALL_EXPOSED, >flags)) { + if (call->completion == RXRPC_CALL_SUCCEEDED && + test_bit(RXRPC_CALL_EXPOSED, >flags)) { unsigned long final_ack_at = jiffies + 2; WRITE_ONCE(chan->final_ack_at, final_ack_at); diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index c628351eb900..ccbac190add1 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -177,13 +177,21 @@ void __rxrpc_disconnect_call(struct rxrpc_connection *conn, * through the channel, whilst disposing of the actual call record. */ trace_rxrpc_disconnect_call(call); - if (call->abort_code) { - chan->last_abort = call->abort_code; - chan->last_type = RXRPC_PACKET_TYPE_ABORT; - } else { + switch (call->completion) { + case RXRPC_CALL_SUCCEEDED: chan->last_seq = call->rx_hard_ack; chan->last_type = RXRPC_PACKET_TYPE_ACK; + break; + case RXRPC_CALL_LOCALLY_ABORTED: + chan->last_abort = call->abort_code; + chan->last_type = RXRPC_PACKET_TYPE_ABORT; + break; + default: + chan->last_abort = RX_USER_ABORT; + chan->last_type = RXRPC_PACKET_TYPE_ABORT; + break; } + /* Sync with rxrpc_conn_retransmit(). */ smp_wmb(); chan->last_call = chan->call_id;
[PATCH net] rxrpc: Fix the MAINTAINERS record
Fix the MAINTAINERS record so that it's more obvious who the maintainer for AF_RXRPC is. Reported-by: Joe Perches <j...@perches.com> Reported-by: David Miller <da...@davemloft.net> Signed-off-by: David Howells <dhowe...@redhat.com> --- MAINTAINERS | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 77d819b458a9..511b858405bc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -554,13 +554,13 @@ S:Orphan F: Documentation/filesystems/affs.txt F: fs/affs/ -AFS FILESYSTEM & AF_RXRPC SOCKET DOMAIN +AFS FILESYSTEM M: David Howells <dhowe...@redhat.com> L: linux-...@lists.infradead.org S: Supported F: fs/afs/ -F: include/net/af_rxrpc.h -F: net/rxrpc/af_rxrpc.c +F: include/trace/events/afs.h +F: Documentation/filesystems/afs.txt W: https://www.infradead.org/~dhowells/kafs/ AGPGART DRIVER @@ -11777,6 +11777,18 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git rtl8xxxu-deve S: Maintained F: drivers/net/wireless/realtek/rtl8xxxu/ +RXRPC SOCKETS (AF_RXRPC) +M: David Howells <dhowe...@redhat.com> +L: linux-...@lists.infradead.org +S: Supported +F: net/rxrpc/ +F: include/keys/rxrpc-type.h +F: include/net/af_rxrpc.h +F: include/trace/events/rxrpc.h +F: include/uapi/linux/rxrpc.h +F: Documentation/networking/rxrpc.txt +W: https://www.infradead.org/~dhowells/kafs/ + S3 SAVAGE FRAMEBUFFER DRIVER M: Antonino Daplas <adap...@gmail.com> L: linux-fb...@vger.kernel.org
[PATCH net] rxrpc: Use correct netns source in rxrpc_release_sock()
In rxrpc_release_sock() there may be no rx->local value to access, so we can't unconditionally follow it to the rxrpc network namespace information to poke the connection reapers. Instead, use the socket's namespace pointer to find the namespace. This unfixed code causes the following static checker warning: net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock() error: we previously assumed 'rx->local' could be null (see line 887) Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers") Reported-by: Dan Carpenter <dan.carpen...@oracle.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 8f7cf4c042be..dcd818fa837e 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -860,6 +860,7 @@ static void rxrpc_sock_destructor(struct sock *sk) static int rxrpc_release_sock(struct sock *sk) { struct rxrpc_sock *rx = rxrpc_sk(sk); + struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk)); _enter("%p{%d,%d}", sk, sk->sk_state, refcount_read(>sk_refcnt)); @@ -895,8 +896,8 @@ static int rxrpc_release_sock(struct sock *sk) rxrpc_release_calls_on_socket(rx); flush_workqueue(rxrpc_workqueue); rxrpc_purge_queue(>sk_receive_queue); - rxrpc_queue_work(>local->rxnet->service_conn_reaper); - rxrpc_queue_work(>local->rxnet->client_conn_reaper); + rxrpc_queue_work(>service_conn_reaper); + rxrpc_queue_work(>client_conn_reaper); rxrpc_put_local(rx->local); rx->local = NULL;
Re: [PATCH] rxrpc: Neaten logging macros and add KERN_DEBUG logging level
Joe Perches <j...@perches.com> wrote: > There is no listed rxrpc maintainer. There's a script in the kernel called get_maintainer.pl which you might find of use: warthog>./scripts/get_maintainer.pl net/rxrpc/ "David S. Miller" <da...@davemloft.net> (maintainer:NETWORKING [GENERAL],commit_signer:23/74=31%) David Howells <dhowe...@redhat.com> (commit_signer:64/74=86%,authored:60/74=81%) netdev@vger.kernel.org (open list:NETWORKING [GENERAL]) linux-ker...@vger.kernel.org (open list) It would seem a good idea to cc me, as the one with the largest "authored" also. David
Re: [PATCH net-next 0/3] rxrpc: Fixes
David Millerwrote: > This email says "net-next", yet your patches say "net". Sorry about that - it should be 'net'. I copied an old cover note. All the patches have a macro substitution, but the cover note does not. Do you want me to repost? David
[PATCH net-next 0/3] rxrpc: Fixes
Here are three patches for AF_RXRPC. One removes some whitespace, one fixes terminal ACK generation and the third makes a couple of places actually use the timeout value just determined rather than ignoring it. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes Tagged thusly: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-fixes-20171129 David --- David Howells (2): rxrpc: Clean up whitespace rxrpc: Fix ACK generation from the connection event processor Gustavo A. R. Silva (1): rxrpc: Fix variable overwrite net/rxrpc/call_event.c |4 ++-- net/rxrpc/conn_event.c | 50 +++ net/rxrpc/conn_object.c |2 +- net/rxrpc/input.c |4 ++-- net/rxrpc/sendmsg.c |2 +- 5 files changed, 35 insertions(+), 27 deletions(-)
[PATCH net 2/3] rxrpc: Fix ACK generation from the connection event processor
Repeat terminal ACKs and now terminal ACKs are now generated from the connection event processor rather from call handling as this allows us to discard client call structures as soon as possible and free up the channel for a follow on call. However, in ACKs so generated, the additional information trailer is malformed because the padding that's meant to be in the middle isn't included in what's transmitted. Fix it so that the 3 bytes of padding are included in the transmission. Further, the trailer is misaligned because of the padding, so assigment to the u16 and u32 fields inside it might cause problems on some arches, so fix this by breaking the padding and the trailer out of the packed struct. (This also deals with potential compiler weirdies where some of the nested structs are packed and some aren't). The symptoms can be seen in wireshark as terminal DUPLICATE or IDLE ACK packets in which the Max MTU, Interface MTU and rwind fields have weird values and the Max Packets field is apparently missing. Reported-by: Jeffrey Altman <jalt...@auristor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/conn_event.c | 50 1 file changed, 29 insertions(+), 21 deletions(-) diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 9e9a8db1bc9c..4ca11be6be3c 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -30,22 +30,18 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, struct rxrpc_skb_priv *sp = skb ? rxrpc_skb(skb) : NULL; struct rxrpc_channel *chan; struct msghdr msg; - struct kvec iov; + struct kvec iov[3]; struct { struct rxrpc_wire_header whdr; union { - struct { - __be32 code; - } abort; - struct { - struct rxrpc_ackpacket ack; - u8 padding[3]; - struct rxrpc_ackinfo info; - }; + __be32 abort_code; + struct rxrpc_ackpacket ack; }; } __attribute__((packed)) pkt; + struct rxrpc_ackinfo ack_info; size_t len; - u32 serial, mtu, call_id; + int ioc; + u32 serial, mtu, call_id, padding; _enter("%d", conn->debug_id); @@ -66,6 +62,13 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, msg.msg_controllen = 0; msg.msg_flags = 0; + iov[0].iov_base = + iov[0].iov_len = sizeof(pkt.whdr); + iov[1].iov_base = + iov[1].iov_len = 3; + iov[2].iov_base = _info; + iov[2].iov_len = sizeof(ack_info); + pkt.whdr.epoch = htonl(conn->proto.epoch); pkt.whdr.cid= htonl(conn->proto.cid); pkt.whdr.callNumber = htonl(call_id); @@ -80,8 +83,10 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, len = sizeof(pkt.whdr); switch (chan->last_type) { case RXRPC_PACKET_TYPE_ABORT: - pkt.abort.code = htonl(chan->last_abort); - len += sizeof(pkt.abort); + pkt.abort_code = htonl(chan->last_abort); + iov[0].iov_len += sizeof(pkt.abort_code); + len += sizeof(pkt.abort_code); + ioc = 1; break; case RXRPC_PACKET_TYPE_ACK: @@ -94,13 +99,19 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, pkt.ack.serial = htonl(skb ? sp->hdr.serial : 0); pkt.ack.reason = skb ? RXRPC_ACK_DUPLICATE : RXRPC_ACK_IDLE; pkt.ack.nAcks = 0; - pkt.info.rxMTU = htonl(rxrpc_rx_mtu); - pkt.info.maxMTU = htonl(mtu); - pkt.info.rwind = htonl(rxrpc_rx_window_size); - pkt.info.jumbo_max = htonl(rxrpc_rx_jumbo_max); + ack_info.rxMTU = htonl(rxrpc_rx_mtu); + ack_info.maxMTU = htonl(mtu); + ack_info.rwind = htonl(rxrpc_rx_window_size); + ack_info.jumbo_max = htonl(rxrpc_rx_jumbo_max); pkt.whdr.flags |= RXRPC_SLOW_START_OK; - len += sizeof(pkt.ack) + sizeof(pkt.info); + padding = 0; + iov[0].iov_len += sizeof(pkt.ack); + len += sizeof(pkt.ack) + 3 + sizeof(ack_info); + ioc = 3; break; + + default: + return; } /* Resync with __rxrpc_disconnect_call() and check that the last call @@ -110,9 +121,6 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn, if (READ_ONCE(chan->
[PATCH net 1/3] rxrpc: Clean up whitespace
Clean up some whitespace from rxrpc. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- net/rxrpc/conn_object.c |2 +- net/rxrpc/input.c |4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index bda952ffe6a6..555274ddc514 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -426,7 +426,7 @@ void rxrpc_process_call(struct work_struct *work) next = call->expect_rx_by; #define set(T) { t = READ_ONCE(T); if (time_before(t, next)) next = t; } - + set(call->expect_req_by); set(call->expect_term_by); set(call->ack_at); diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c index 1aad04a32d5e..c628351eb900 100644 --- a/net/rxrpc/conn_object.c +++ b/net/rxrpc/conn_object.c @@ -424,7 +424,7 @@ void rxrpc_service_connection_reaper(struct work_struct *work) if (earliest != now + MAX_JIFFY_OFFSET) { _debug("reschedule reaper %ld", (long)earliest - (long)now); ASSERT(time_after(earliest, now)); - rxrpc_set_service_reap_timer(rxnet, earliest); + rxrpc_set_service_reap_timer(rxnet, earliest); } while (!list_empty()) { diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 23a5e61d8f79..6fc61400337f 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -976,7 +976,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call, rxrpc_reduce_call_timer(call, expect_rx_by, now, rxrpc_timer_set_for_normal); } - + switch (sp->hdr.type) { case RXRPC_PACKET_TYPE_DATA: rxrpc_input_data(call, skb, skew); @@ -1213,7 +1213,7 @@ void rxrpc_data_ready(struct sock *udp_sk) goto reupgrade; conn->service_id = sp->hdr.serviceId; } - + if (sp->hdr.callNumber == 0) { /* Connection-level packet */ _debug("CONN %p {%d}", conn, conn->debug_id);
[PATCH net 3/3] rxrpc: Fix variable overwrite
From: Gustavo A. R. Silva <garsi...@embeddedor.com> Values assigned to both variable resend_at and ack_at are overwritten before they can be used. The correct fix here is to add 'now' to the previously computed value in resend_at and ack_at. Addresses-Coverity-ID: 1462262 Addresses-Coverity-ID: 1462263 Addresses-Coverity-ID: 1462264 Fixes: beb8e5e4f38c ("rxrpc: Express protocol timeouts in terms of RTT") Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.uk Signed-off-by: Gustavo A. R. Silva <garsi...@embeddedor.com> Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- net/rxrpc/sendmsg.c|2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 555274ddc514..ad2ab1103189 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -123,7 +123,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason, else ack_at = expiry; - ack_at = jiffies + expiry; + ack_at += now; if (time_before(ack_at, call->ack_at)) { WRITE_ONCE(call->ack_at, ack_at); rxrpc_reduce_call_timer(call, ack_at, now, diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index a1c53ac066a1..09f2a3e05221 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -233,7 +233,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, if (resend_at < 1) resend_at = 1; - resend_at = now + rxrpc_resend_timeout; + resend_at += now; WRITE_ONCE(call->resend_at, resend_at); rxrpc_reduce_call_timer(call, resend_at, now, rxrpc_timer_set_for_send);
Re: [PATCH] rxrpc: call_event: Fix variable overwrite in __rxrpc_propose_ACK
Gustavo A. R. Silvawrote: > - ack_at = jiffies + expiry; Same issue as with the other patch. Can you just combine the two please? David
Re: [PATCH] rxrpc: sendmsg: Fix variable overwrite in rxrpc_queue_packet
Gustavo A. R. Silvawrote: > - resend_at = now + rxrpc_resend_timeout; > + resend_at += now; Yep. :-) David
Re: [PATCH] rxrpc: sendmsg: Fix variable overwrite in rxrpc_queue_packet
Gustavo A. R. Silvawrote: > Value assigned to variable resend_at is overwritten before it can be used. > > Fix this by removing the value overwrite as it seems that this is a > leftover code. NAK. Your fix will actually cause the code to break. The resend_at value used for the timer must be based on the current time (ie. jiffies in this case), so we can't simply remove that line as the previously calculated resend_at value is a duration, not a time. What you need to do is to instead modify the line you wanted to remove to add 'now' to the previously-computed value. David
[PATCH net-next 00/12] rxrpc: Fixes and improvements
Hi David, Is it too late for this to go to Linus in this merge window? --- Here's a set of patches that fix and improve some stuff in the AF_RXRPC protocol: The patches are: (1) Unlock mutex returned by rxrpc_accept_call(). (2) Don't set connection upgrade by default. (3) Differentiate the call->user_mutex used by the kernel from that used by userspace calling sendmsg() to avoid lockdep warnings. (4) Delay terminal ACK transmission to a work queue so that it can be replaced by the next call if there is one. (5) Split the call parameters from the connection parameters so that more call-specific parameters can be passed through. (6) Fix the call timeouts to work the same as for other RxRPC/AFS implementations. (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly so that can be discarded or can represent more packets. (8) Use RTT to calculate certain protocol timeouts. (9) Add a timeout to detect lost ACK/DATA packets. (10) Add a keepalive function so that we ping the peer if we haven't transmitted for a short while, thereby keeping intervening firewall routes open. (11) Make service endpoints expire like they're supposed to so that the UDP port can be reused. (12) Fix connection expiry timers to make cleanup happen in a more timely fashion. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes Tagged thusly: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-fixes-20171124 David --- David Howells (12): rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing rxrpc: Don't set upgrade by default in sendmsg() rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls rxrpc: Delay terminal ACK transmission on a client call rxrpc: Split the call params from the operation params rxrpc: Fix call timeouts rxrpc: Don't transmit DELAY ACKs immediately on proposal rxrpc: Express protocol timeouts in terms of RTT rxrpc: Add a timeout for detecting lost ACKs/lost DATA rxrpc: Add keepalive for a call rxrpc: Fix service endpoint expiry rxrpc: Fix conn expiry timers include/trace/events/rxrpc.h | 86 include/uapi/linux/rxrpc.h |1 net/rxrpc/af_rxrpc.c | 23 net/rxrpc/ar-internal.h | 103 --- net/rxrpc/call_accept.c |2 net/rxrpc/call_event.c | 229 -- net/rxrpc/call_object.c | 62 +++ net/rxrpc/conn_client.c | 54 -- net/rxrpc/conn_event.c | 74 +++--- net/rxrpc/conn_object.c | 76 +- net/rxrpc/input.c| 74 +- net/rxrpc/misc.c | 19 +-- net/rxrpc/net_ns.c | 33 +- net/rxrpc/output.c | 43 net/rxrpc/recvmsg.c | 12 +- net/rxrpc/sendmsg.c | 126 ++- net/rxrpc/sysctl.c | 60 +-- 17 files changed, 752 insertions(+), 325 deletions(-)
[PATCH net-next 01/12] rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
The caller of rxrpc_accept_call() must release the lock on call->user_mutex returned by that function. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/sendmsg.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 7d2595582c09..3a99b1a908df 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -619,8 +619,8 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) /* The socket is now unlocked. */ if (IS_ERR(call)) return PTR_ERR(call); - rxrpc_put_call(call, rxrpc_call_put); - return 0; + ret = 0; + goto out_put_unlock; } call = rxrpc_find_call_by_user_ID(rx, p.user_call_ID); @@ -689,6 +689,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) ret = rxrpc_send_data(rx, call, msg, len, NULL); } +out_put_unlock: mutex_unlock(>user_mutex); error_put: rxrpc_put_call(call, rxrpc_call_put);
[PATCH net-next 03/12] rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
0060f720 R08: 0001 R09: R10: 0001 R11: 8800b5459b68 R12: 8800ce150e00 R13: 0060f720 R14: 006127a8 R15: padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:7fff566d9728 EFLAGS: 0246 ORIG_RAX: 003b RAX: ffda RBX: 55ba57280900 RCX: 7fdb6009ee07 RDX: 55ba5727f270 RSI: 55ba5727cac0 RDI: 55ba57280900 RBP: 55ba57280900 R08: 7fff566d9700 R09: R10: 55ba5727cac0 R11: 0246 R12: R13: 55ba5727cac0 R14: 000055ba5727f270 R15: Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/ar-internal.h |2 +- net/rxrpc/call_accept.c |2 +- net/rxrpc/call_object.c | 19 +++ 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index b2151993d384..a972887b3f5d 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -672,7 +672,7 @@ extern unsigned int rxrpc_max_call_lifetime; extern struct kmem_cache *rxrpc_call_jar; struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *, unsigned long); -struct rxrpc_call *rxrpc_alloc_call(gfp_t); +struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *, gfp_t); struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *, struct rxrpc_conn_parameters *, struct sockaddr_rxrpc *, diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c index cbd1701e813a..3028298ca561 100644 --- a/net/rxrpc/call_accept.c +++ b/net/rxrpc/call_accept.c @@ -94,7 +94,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx, /* Now it gets complicated, because calls get registered with the * socket here, particularly if a user ID is preassigned by the user. */ - call = rxrpc_alloc_call(gfp); + call = rxrpc_alloc_call(rx, gfp); if (!call) return -ENOMEM; call->flags |= (1 << RXRPC_CALL_IS_SERVICE); diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 4c7fbc6dcce7..1f141dc08ad2 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -55,6 +55,8 @@ static void rxrpc_call_timer_expired(unsigned long _call) rxrpc_set_timer(call, rxrpc_timer_expired, ktime_get_real()); } +static struct lock_class_key rxrpc_call_user_mutex_lock_class_key; + /* * find an extant server call * - called in process context with IRQs enabled @@ -95,7 +97,7 @@ struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *rx, /* * allocate a new call */ -struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp) +struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp) { struct rxrpc_call *call; @@ -114,6 +116,14 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp) goto nomem_2; mutex_init(>user_mutex); + + /* Prevent lockdep reporting a deadlock false positive between the afs +* filesystem and sys_sendmsg() via the mmap sem. +*/ + if (rx->sk.sk_kern_sock) + lockdep_set_class(>user_mutex, + _call_user_mutex_lock_class_key); + setup_timer(>timer, rxrpc_call_timer_expired, (unsigned long)call); INIT_WORK(>processor, _process_call); @@ -151,7 +161,8 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp) /* * Allocate a new client call. */ -static struct rxrpc_call *rxrpc_alloc_client_call(struct sockaddr_rxrpc *srx, +static struct rxrpc_call *rxrpc_alloc_client_call(struct rxrpc_sock *rx, + struct sockaddr_rxrpc *srx, gfp_t gfp) { struct rxrpc_call *call; @@ -159,7 +170,7 @@ static struct rxrpc_call *rxrpc_alloc_client_call(struct sockaddr_rxrpc *srx, _enter(""); - call = rxrpc_alloc_call(gfp); + call = rxrpc_alloc_call(rx, gfp); if (!call) return ERR_PTR(-ENOMEM); call->state = RXRPC_CALL_CLIENT_AWAIT_CONN; @@ -210,7 +221,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, _enter("%p,%lx", rx, user_call_ID); - call = rxrpc_alloc_client_call(srx, gfp); + call = rxrpc_alloc_client_call(rx, srx, gfp); if (IS_ERR(call)) { release_sock(>sk); _leave(" = %ld", PTR_ERR(call));
[PATCH net-next 05/12] rxrpc: Split the call params from the operation params
When rxrpc_sendmsg() parses the control message buffer, it places the parameters extracted into a structure, but lumps together call parameters (such as user call ID) with operation parameters (such as whether to send data, send an abort or accept a call). Split the call parameters out into their own structure, a copy of which is then embedded in the operation parameters struct. The call parameters struct is then passed down into the places that need it instead of passing the individual parameters. This allows for extra call parameters to be added. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c|8 ++- net/rxrpc/ar-internal.h | 31 - net/rxrpc/call_object.c | 15 ++ net/rxrpc/sendmsg.c | 51 --- 4 files changed, 60 insertions(+), 45 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 9b5c46b052fd..c0cdcf980ffc 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -285,6 +285,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, bool upgrade) { struct rxrpc_conn_parameters cp; + struct rxrpc_call_params p; struct rxrpc_call *call; struct rxrpc_sock *rx = rxrpc_sk(sock->sk); int ret; @@ -302,6 +303,10 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, if (key && !key->payload.data[0]) key = NULL; /* a no-security key */ + memset(, 0, sizeof(p)); + p.user_call_ID = user_call_ID; + p.tx_total_len = tx_total_len; + memset(, 0, sizeof(cp)); cp.local= rx->local; cp.key = key; @@ -309,8 +314,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, cp.exclusive= false; cp.upgrade = upgrade; cp.service_id = srx->srx_service; - call = rxrpc_new_client_call(rx, , srx, user_call_ID, tx_total_len, -gfp); + call = rxrpc_new_client_call(rx, , srx, , gfp); /* The socket has been unlocked. */ if (!IS_ERR(call)) { call->notify_rx = notify_rx; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index d1213d503f30..ba63f2231107 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -643,6 +643,35 @@ struct rxrpc_ack_summary { u8 cumulative_acks; }; +/* + * sendmsg() cmsg-specified parameters. + */ +enum rxrpc_command { + RXRPC_CMD_SEND_DATA,/* send data message */ + RXRPC_CMD_SEND_ABORT, /* request abort generation */ + RXRPC_CMD_ACCEPT, /* [server] accept incoming call */ + RXRPC_CMD_REJECT_BUSY, /* [server] reject a call as busy */ +}; + +struct rxrpc_call_params { + s64 tx_total_len; /* Total Tx data length (if send data) */ + unsigned long user_call_ID; /* User's call ID */ + struct { + u32 hard; /* Maximum lifetime (sec) */ + u32 idle; /* Max time since last data packet (msec) */ + u32 normal; /* Max time since last call packet (msec) */ + } timeouts; + u8 nr_timeouts;/* Number of timeouts specified */ +}; + +struct rxrpc_send_params { + struct rxrpc_call_params call; + u32 abort_code; /* Abort code to Tx (if abort) */ + enum rxrpc_command command : 8;/* The command to implement */ + boolexclusive; /* Shared or exclusive call */ + boolupgrade;/* If the connection is upgradeable */ +}; + #include /* @@ -687,7 +716,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *, gfp_t); struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *, struct rxrpc_conn_parameters *, struct sockaddr_rxrpc *, -unsigned long, s64, gfp_t); +struct rxrpc_call_params *, gfp_t); int rxrpc_retry_client_call(struct rxrpc_sock *, struct rxrpc_call *, struct rxrpc_conn_parameters *, diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index 1f141dc08ad2..c3e1fa854471 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -208,8 +208,7 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call) struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx, struct rxrpc_conn_parameters *cp,
[PATCH net-next 04/12] rxrpc: Delay terminal ACK transmission on a client call
Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/ar-internal.h | 17 +++ net/rxrpc/conn_client.c | 18 +++ net/rxrpc/conn_event.c | 74 +++ net/rxrpc/conn_object.c | 10 ++ net/rxrpc/recvmsg.c |2 + 5 files changed, 108 insertions(+), 13 deletions(-) diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a972887b3f5d..d1213d503f30 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -338,8 +338,17 @@ enum rxrpc_conn_flag { RXRPC_CONN_DONT_REUSE, /* Don't reuse this connection */ RXRPC_CONN_COUNTED, /* Counted by rxrpc_nr_client_conns */ RXRPC_CONN_PROBING_FOR_UPGRADE, /* Probing for service upgrade */ + RXRPC_CONN_FINAL_ACK_0, /* Need final ACK for channel 0 */ + RXRPC_CONN_FINAL_ACK_1, /* Need final ACK for channel 1 */ + RXRPC_CONN_FINAL_ACK_2, /* Need final ACK for channel 2 */ + RXRPC_CONN_FINAL_ACK_3, /* Need final ACK for channel 3 */ }; +#define RXRPC_CONN_FINAL_ACK_MASK ((1UL << RXRPC_CONN_FINAL_ACK_0) | \ + (1UL << RXRPC_CONN_FINAL_ACK_1) |\ + (1UL << RXRPC_CONN_FINAL_ACK_2) |\ + (1UL << RXRPC_CONN_FINAL_ACK_3)) + /* * Events that can be raised upon a connection. */ @@ -393,6 +402,7 @@ struct rxrpc_connection { #define RXRPC_ACTIVE_CHANS_MASK((1 << RXRPC_MAXCALLS) - 1) struct list_headwaiting_calls; /* Calls waiting for channels */ struct rxrpc_channel { + unsigned long final_ack_at; /* Time at which to issue final ACK */ struct rxrpc_call __rcu *call; /* Active call */ u32 call_id;/* ID of current call */ u32 call_counter; /* Call ID counter */ @@ -404,6 +414,7 @@ struct rxrpc_connection { }; } channels[RXRPC_MAXCALLS]; + struct timer_list timer; /* Conn event timer */ struct work_struct processor; /* connection event processor */ union { struct rb_node client_node;/* Node in local->client_conns */ @@ -861,6 +872,12 @@ static inline void rxrpc_put_connection(struct rxrpc_connection *conn) rxrpc_put_service_conn(conn); } +static inline void rxrpc_reduce_conn_timer(struct rxrpc_connection *conn, + unsigned long expire_at) +{ + timer_reduce(>timer, expire_at); +} + /* * conn_service.c */ diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 5f9624bd311c..cfb997593da9 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -554,6 +554,11 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn, trace_rxrpc_client(conn, channel, rxrpc_client_chan_activate); + /* Cancel the final ACK on the previous call if it hasn't been sent yet +* as the DATA packet will implicitly ACK it. +*/ + clear_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags); + write_lock_bh(>state_lock); if (!test_bit(RXRPC_CALL_TX_LASTQ, >flags)) call->state = RXRPC_CALL_CLIENT_SEND_REQUEST; @@ -813,6 +818,19 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) goto out_2; } + /* Schedule the final ACK to be transmitted in a short while so that it +* can be skipped if we find a follow-on call. The first DATA packet +* of the follow on call will implicitly ACK this call. +*/ + if (test_bit(RXRPC_CALL_EXPOSED, >flags)) { + unsigned long final_ack_at = jiffies + 2; + + WRITE_ONCE(chan->final_ack_at, final_ack_at); + smp_wmb(); /* vs rxrpc_process_delayed_final_acks() */ + set_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags); + rxrpc_reduce_conn_timer(conn, final_ack_at); + } + /* Things are more complex and we need the cache lock. We might be * able to simply idle the conn or it might now be lurking on the wait * list. It might even get moved back to the active list whilst we're diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 59a51a56e7c8..9e9a8db1bc9c 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -24,9 +24,10 @@ * Retransmit terminal ACK or ABORT of the previous call. */ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *c
[PATCH net-next 02/12] rxrpc: Don't set upgrade by default in sendmsg()
Don't set upgrade by default when creating a call from sendmsg(). This is a holdover from when I was testing the code. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/sendmsg.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 3a99b1a908df..94555c94b2d8 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -602,7 +602,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg, size_t len) .abort_code = 0, .command= RXRPC_CMD_SEND_DATA, .exclusive = false, - .upgrade= true, + .upgrade= false, }; _enter("");
[PATCH net-next 07/12] rxrpc: Don't transmit DELAY ACKs immediately on proposal
Don't transmit a DELAY ACK immediately on proposal when the Rx window is rotated, but rather defer it to the work function. This means that we have a chance to queue/consume more received packets before we actually send the DELAY ACK, or even cancel it entirely, thereby reducing the number of packets transmitted. We do, however, want to continue sending other types of packet immediately, particularly REQUESTED ACKs, as they may be used for RTT calculation by the other side. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/recvmsg.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index 0b6609da80b7..fad5f42a3abd 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -219,9 +219,9 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call) after_eq(top, call->ackr_seen + 2) || (hard_ack == top && after(hard_ack, call->ackr_consumed))) rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial, - true, false, + true, true, rxrpc_propose_ack_rotate_rx); - if (call->ackr_reason) + if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY) rxrpc_send_ack_packet(call, false); } }
[PATCH net-next 06/12] rxrpc: Fix call timeouts
Fix the rxrpc call expiration timeouts and make them settable from userspace. By analogy with other rx implementations, there should be three timeouts: (1) "Normal timeout" This is set for all calls and is triggered if we haven't received any packets from the peer in a while. It is measured from the last time we received any packet on that call. This is not reset by any connection packets (such as CHALLENGE/RESPONSE packets). If a service operation takes a long time, the server should generate PING ACKs at a duration that's substantially less than the normal timeout so is to keep both sides alive. This is set at 1/6 of normal timeout. (2) "Idle timeout" This is set only for a service call and is triggered if we stop receiving the DATA packets that comprise the request data. It is measured from the last time we received a DATA packet. (3) "Hard timeout" This can be set for a call and specified the maximum lifetime of that call. It should not be specified by default. Some operations (such as volume transfer) take a long time. Allow userspace to set/change the timeouts on a call with sendmsg, using a control message: RXRPC_SET_CALL_TIMEOUTS The data to the message is a number of 32-bit words, not all of which need be given: u32 hard_timeout; /* sec from first packet */ u32 idle_timeout; /* msec from packet Rx */ u32 normal_timeout; /* msec from data Rx */ This can be set in combination with any other sendmsg() that affects a call. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 69 +++- include/uapi/linux/rxrpc.h |1 net/rxrpc/ar-internal.h | 37 ++--- net/rxrpc/call_event.c | 179 -- net/rxrpc/call_object.c | 27 -- net/rxrpc/conn_client.c |4 - net/rxrpc/input.c| 34 +++- net/rxrpc/misc.c | 19 ++-- net/rxrpc/recvmsg.c |2 net/rxrpc/sendmsg.c | 59 +++--- net/rxrpc/sysctl.c | 60 +++--- 11 files changed, 290 insertions(+), 201 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index ebe96796027a..01dcbc2164b5 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -138,10 +138,20 @@ enum rxrpc_rtt_rx_trace { enum rxrpc_timer_trace { rxrpc_timer_begin, + rxrpc_timer_exp_ack, + rxrpc_timer_exp_hard, + rxrpc_timer_exp_idle, + rxrpc_timer_exp_normal, + rxrpc_timer_exp_ping, + rxrpc_timer_exp_resend, rxrpc_timer_expired, rxrpc_timer_init_for_reply, rxrpc_timer_init_for_send_reply, + rxrpc_timer_restart, rxrpc_timer_set_for_ack, + rxrpc_timer_set_for_hard, + rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, rxrpc_timer_set_for_resend, rxrpc_timer_set_for_send, @@ -296,12 +306,22 @@ enum rxrpc_congest_change { #define rxrpc_timer_traces \ EM(rxrpc_timer_begin, "Begin ") \ EM(rxrpc_timer_expired, "*EXPR*") \ + EM(rxrpc_timer_exp_ack, "ExpAck") \ + EM(rxrpc_timer_exp_hard,"ExpHrd") \ + EM(rxrpc_timer_exp_idle,"ExpIdl") \ + EM(rxrpc_timer_exp_normal, "ExpNml") \ + EM(rxrpc_timer_exp_ping,"ExpPng") \ + EM(rxrpc_timer_exp_resend, "ExpRsn") \ EM(rxrpc_timer_init_for_reply, "IniRpl") \ EM(rxrpc_timer_init_for_send_reply, "SndRpl") \ + EM(rxrpc_timer_restart, "Restrt") \ EM(rxrpc_timer_set_for_ack, "SetAck") \ + EM(rxrpc_timer_set_for_hard,"SetHrd") \ + EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ EM(rxrpc_timer_set_for_resend, "SetRTx") \ - E_(rxrpc_timer_set_for_send,"SetTx ") + E_(rxrpc_timer_set_for_send,"SetSnd") #define rxrpc_propose_ack_traces \ EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \ @@ -932,39 +952,44 @@ TRACE_EVENT(rxrpc_rtt_rx, TRACE_EVENT(rxrpc_timer, TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why, -ktime_t now, unsigned long now_j), +unsigned long now), - TP_ARGS(call, why, now, now_j), + TP_ARGS(call, why, now), TP_STRUCT__entr
[PATCH net-next 09/12] rxrpc: Add a timeout for detecting lost ACKs/lost DATA
Add an extra timeout that is set/updated when we send a DATA packet that has the request-ack flag set. This allows us to detect if we don't get an ACK in response to the latest flagged packet. The ACK packet is adjudged to have been lost if it doesn't turn up within 2*RTT of the transmission. If the timeout occurs, we schedule the sending of a PING ACK to find out the state of the other side. If a new DATA packet is ready to go sooner, we cancel the sending of the ping and set the request-ack flag on that instead. If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what we had at the time of the ping transmission, we adjudge all the DATA packets sent between the response tx_top and the ping-time tx_top to have been lost and retransmit immediately. Rather than sending a PING ACK, we could just pick a DATA packet and speculatively retransmit that with request-ack set. It should result in either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the a PING-RESPONSE ACK mentioned above. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h | 11 +-- net/rxrpc/ar-internal.h |6 +- net/rxrpc/call_event.c | 26 ++ net/rxrpc/call_object.c |1 + net/rxrpc/input.c| 40 net/rxrpc/output.c | 20 ++-- net/rxrpc/recvmsg.c |4 ++-- net/rxrpc/sendmsg.c |2 +- 8 files changed, 98 insertions(+), 12 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 01dcbc2164b5..84ade8b76a19 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -141,6 +141,7 @@ enum rxrpc_timer_trace { rxrpc_timer_exp_ack, rxrpc_timer_exp_hard, rxrpc_timer_exp_idle, + rxrpc_timer_exp_lost_ack, rxrpc_timer_exp_normal, rxrpc_timer_exp_ping, rxrpc_timer_exp_resend, @@ -151,6 +152,7 @@ enum rxrpc_timer_trace { rxrpc_timer_set_for_ack, rxrpc_timer_set_for_hard, rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_lost_ack, rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, rxrpc_timer_set_for_resend, @@ -309,6 +311,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_exp_ack, "ExpAck") \ EM(rxrpc_timer_exp_hard,"ExpHrd") \ EM(rxrpc_timer_exp_idle,"ExpIdl") \ + EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \ EM(rxrpc_timer_exp_normal, "ExpNml") \ EM(rxrpc_timer_exp_ping,"ExpPng") \ EM(rxrpc_timer_exp_resend, "ExpRsn") \ @@ -318,6 +321,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_set_for_ack, "SetAck") \ EM(rxrpc_timer_set_for_hard,"SetHrd") \ EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \ EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ EM(rxrpc_timer_set_for_resend, "SetRTx") \ @@ -961,6 +965,7 @@ TRACE_EVENT(rxrpc_timer, __field(enum rxrpc_timer_trace, why ) __field(long, now ) __field(long, ack_at ) + __field(long, ack_lost_at ) __field(long, resend_at ) __field(long, ping_at ) __field(long, expect_rx_by ) @@ -974,6 +979,7 @@ TRACE_EVENT(rxrpc_timer, __entry->why= why; __entry->now= now; __entry->ack_at = call->ack_at; + __entry->ack_lost_at= call->ack_lost_at; __entry->resend_at = call->resend_at; __entry->expect_rx_by = call->expect_rx_by; __entry->expect_req_by = call->expect_req_by; @@ -981,10 +987,11 @@ TRACE_EVENT(rxrpc_timer, __entry->timer = call->timer.expires; ), - TP_printk("c=%p %s a=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld", + TP_printk("c=%p %s a=%ld la=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld", __entry->call, __print_symbolic(__entry->why, rxrpc_timer_traces),
[PATCH net-next 11/12] rxrpc: Fix service endpoint expiry
RxRPC service endpoints expire like they're supposed to by the following means: (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the global service conn timeout, otherwise the first rxrpc_net struct to die will cause connections on all others to expire immediately from then on. (2) Mark local service endpoints for which the socket has been closed (->service_closed) so that the expiration timeout can be much shortened for service and client connections going through that endpoint. (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage count reaches 1, not 0, as idle conns have a 1 count. (4) The accumulator for the earliest time we might want to schedule for should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as the comparison functions use signed arithmetic. (5) Simplify the expiration handling, adding the expiration value to the idle timestamp each time rather than keeping track of the time in the past before which the idle timestamp must go to be expired. This is much easier to read. (6) Ignore the timeouts if the net namespace is dead. (7) Restart the service reaper work item rather the client reaper. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h |2 ++ net/rxrpc/af_rxrpc.c | 13 + net/rxrpc/ar-internal.h |3 +++ net/rxrpc/conn_client.c |2 ++ net/rxrpc/conn_object.c | 42 -- net/rxrpc/net_ns.c |3 +++ 6 files changed, 47 insertions(+), 18 deletions(-) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index e98fed6de497..36cb50c111a6 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -49,6 +49,7 @@ enum rxrpc_conn_trace { rxrpc_conn_put_client, rxrpc_conn_put_service, rxrpc_conn_queued, + rxrpc_conn_reap_service, rxrpc_conn_seen, }; @@ -221,6 +222,7 @@ enum rxrpc_congest_change { EM(rxrpc_conn_put_client, "PTc") \ EM(rxrpc_conn_put_service, "PTs") \ EM(rxrpc_conn_queued, "QUE") \ + EM(rxrpc_conn_reap_service, "RPs") \ E_(rxrpc_conn_seen, "SEE") #define rxrpc_client_traces \ diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index c0cdcf980ffc..abb524c2b8f8 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -867,6 +867,19 @@ static int rxrpc_release_sock(struct sock *sk) sock_orphan(sk); sk->sk_shutdown = SHUTDOWN_MASK; + /* We want to kill off all connections from a service socket +* as fast as possible because we can't share these; client +* sockets, on the other hand, can share an endpoint. +*/ + switch (sk->sk_state) { + case RXRPC_SERVER_BOUND: + case RXRPC_SERVER_BOUND2: + case RXRPC_SERVER_LISTENING: + case RXRPC_SERVER_LISTEN_DISABLED: + rx->local->service_closed = true; + break; + } + spin_lock_bh(>sk_receive_queue.lock); sk->sk_state = RXRPC_CLOSE; spin_unlock_bh(>sk_receive_queue.lock); diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index cdcbc798f921..a0082c407005 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -84,6 +84,7 @@ struct rxrpc_net { unsigned intnr_client_conns; unsigned intnr_active_client_conns; boolkill_all_client_conns; + boollive; spinlock_t client_conn_cache_lock; /* Lock for ->*_client_conns */ spinlock_t client_conn_discard_lock; /* Prevent multiple discarders */ struct list_headwaiting_client_conns; @@ -265,6 +266,7 @@ struct rxrpc_local { rwlock_tservices_lock; /* lock for services list */ int debug_id; /* debug ID for printks */ booldead; + boolservice_closed; /* Service socket closed */ struct sockaddr_rxrpc srx;/* local address */ }; @@ -881,6 +883,7 @@ void rxrpc_process_connection(struct work_struct *); * conn_object.c */ extern unsigned int rxrpc_connection_expiry; +extern unsigned int rxrpc_closed_conn_expiry; struct rxrpc_connection *rxrpc_alloc_connection(gfp_t); struct rxrpc_connection *rxrpc_find_connection_rcu(struct rxrpc_local *, diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 97f6a8de4845..785dfdb9fef1 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -1079,6 +1079,8 @@ void rxrpc_discard_expired_client_conns(struct work_struct *work) expiry = rxrpc
[PATCH net-next 10/12] rxrpc: Add keepalive for a call
We need to transmit a packet every so often to act as a keepalive for the peer (which has a timeout from the last time it received a packet) and also to prevent any intervening firewalls from closing the route. Do this by resetting a timer every time we transmit a packet. If the timer ever expires, we transmit a PING ACK packet and thereby also elicit a PING RESPONSE ACK from the other side - which prevents our last-rx timeout from expiring. The timer is set to 1/6 of the last-rx timeout so that we can detect the other side going away if it misses 6 replies in a row. This is particularly necessary for servers where the processing of the service function may take a significant amount of time. Signed-off-by: David Howells <dhowe...@redhat.com> --- include/trace/events/rxrpc.h |6 ++ net/rxrpc/ar-internal.h |1 + net/rxrpc/call_event.c | 10 ++ net/rxrpc/output.c | 23 +++ 4 files changed, 40 insertions(+) diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 84ade8b76a19..e98fed6de497 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -141,6 +141,7 @@ enum rxrpc_timer_trace { rxrpc_timer_exp_ack, rxrpc_timer_exp_hard, rxrpc_timer_exp_idle, + rxrpc_timer_exp_keepalive, rxrpc_timer_exp_lost_ack, rxrpc_timer_exp_normal, rxrpc_timer_exp_ping, @@ -152,6 +153,7 @@ enum rxrpc_timer_trace { rxrpc_timer_set_for_ack, rxrpc_timer_set_for_hard, rxrpc_timer_set_for_idle, + rxrpc_timer_set_for_keepalive, rxrpc_timer_set_for_lost_ack, rxrpc_timer_set_for_normal, rxrpc_timer_set_for_ping, @@ -162,6 +164,7 @@ enum rxrpc_timer_trace { enum rxrpc_propose_ack_trace { rxrpc_propose_ack_client_tx_end, rxrpc_propose_ack_input_data, + rxrpc_propose_ack_ping_for_keepalive, rxrpc_propose_ack_ping_for_lost_ack, rxrpc_propose_ack_ping_for_lost_reply, rxrpc_propose_ack_ping_for_params, @@ -311,6 +314,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_exp_ack, "ExpAck") \ EM(rxrpc_timer_exp_hard,"ExpHrd") \ EM(rxrpc_timer_exp_idle,"ExpIdl") \ + EM(rxrpc_timer_exp_keepalive, "ExpKA ") \ EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \ EM(rxrpc_timer_exp_normal, "ExpNml") \ EM(rxrpc_timer_exp_ping,"ExpPng") \ @@ -321,6 +325,7 @@ enum rxrpc_congest_change { EM(rxrpc_timer_set_for_ack, "SetAck") \ EM(rxrpc_timer_set_for_hard,"SetHrd") \ EM(rxrpc_timer_set_for_idle,"SetIdl") \ + EM(rxrpc_timer_set_for_keepalive, "KeepAl") \ EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \ EM(rxrpc_timer_set_for_normal, "SetNml") \ EM(rxrpc_timer_set_for_ping,"SetPng") \ @@ -330,6 +335,7 @@ enum rxrpc_congest_change { #define rxrpc_propose_ack_traces \ EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \ EM(rxrpc_propose_ack_input_data,"DataIn ") \ + EM(rxrpc_propose_ack_ping_for_keepalive, "KeepAlv") \ EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \ EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \ EM(rxrpc_propose_ack_ping_for_params, "Params ") \ diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 7e7b817c69f0..cdcbc798f921 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -519,6 +519,7 @@ struct rxrpc_call { unsigned long ack_lost_at;/* When ACK is figured as lost */ unsigned long resend_at; /* When next resend needs to happen */ unsigned long ping_at;/* When next to send a ping */ + unsigned long keepalive_at; /* When next to send a keepalive ping */ unsigned long expect_rx_by; /* When we expect to get a packet by */ unsigned long expect_req_by; /* When we expect to get a request DATA packet by */ unsigned long expect_term_by; /* When we expect call termination by */ diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index c65666b2f39e..bda952ffe6a6 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -366,6 +366,15 @@ void rxrpc_process_call(struct work_struct *work) set_bit(RXRPC_CALL_EV_ACK_LOST, >events); } + t = READ_ONCE(call->keepalive_at); + if (time_after_eq(now, t)) { + trace_rxrpc_timer(call, rxrpc_timer_exp_keepalive, now); + cmpxchg(>keepalive_at, t,
[PATCH net-next 12/12] rxrpc: Fix conn expiry timers
Fix the rxrpc connection expiry timers so that connections for closed AF_RXRPC sockets get deleted in a more timely fashion, freeing up the transport UDP port much more quickly. (1) Replace the delayed work items with work items plus timers so that timer_reduce() can be used to shorten them and so that the timer doesn't requeue the work item if the net namespace is dead. (2) Don't use queue_delayed_work() as that won't alter the timeout if the timer is already running. (3) Don't rearm the timers if the network namespace is dead. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c|2 ++ net/rxrpc/ar-internal.h |6 -- net/rxrpc/conn_client.c | 30 +++--- net/rxrpc/conn_object.c | 28 +--- net/rxrpc/net_ns.c | 30 ++ 5 files changed, 68 insertions(+), 28 deletions(-) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index abb524c2b8f8..8f7cf4c042be 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -895,6 +895,8 @@ static int rxrpc_release_sock(struct sock *sk) rxrpc_release_calls_on_socket(rx); flush_workqueue(rxrpc_workqueue); rxrpc_purge_queue(>sk_receive_queue); + rxrpc_queue_work(>local->rxnet->service_conn_reaper); + rxrpc_queue_work(>local->rxnet->client_conn_reaper); rxrpc_put_local(rx->local); rx->local = NULL; diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index a0082c407005..416688381eb7 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -79,7 +79,8 @@ struct rxrpc_net { struct list_headconn_proc_list; /* List of conns in this namespace for proc */ struct list_headservice_conns; /* Service conns in this namespace */ rwlock_tconn_lock; /* Lock for ->conn_proc_list, ->service_conns */ - struct delayed_work service_conn_reaper; + struct work_struct service_conn_reaper; + struct timer_list service_conn_reap_timer; unsigned intnr_client_conns; unsigned intnr_active_client_conns; @@ -90,7 +91,8 @@ struct rxrpc_net { struct list_headwaiting_client_conns; struct list_headactive_client_conns; struct list_headidle_client_conns; - struct delayed_work client_conn_reaper; + struct work_struct client_conn_reaper; + struct timer_list client_conn_reap_timer; struct list_headlocal_endpoints; struct mutexlocal_mutex;/* Lock for ->local_endpoints */ diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c index 785dfdb9fef1..7f74ca3059f8 100644 --- a/net/rxrpc/conn_client.c +++ b/net/rxrpc/conn_client.c @@ -691,7 +691,7 @@ int rxrpc_connect_call(struct rxrpc_call *call, _enter("{%d,%lx},", call->debug_id, call->user_call_ID); - rxrpc_discard_expired_client_conns(>client_conn_reaper.work); + rxrpc_discard_expired_client_conns(>client_conn_reaper); rxrpc_cull_active_client_conns(rxnet); ret = rxrpc_get_client_conn(call, cp, srx, gfp); @@ -757,6 +757,18 @@ void rxrpc_expose_client_call(struct rxrpc_call *call) } /* + * Set the reap timer. + */ +static void rxrpc_set_client_reap_timer(struct rxrpc_net *rxnet) +{ + unsigned long now = jiffies; + unsigned long reap_at = now + rxrpc_conn_idle_client_expiry; + + if (rxnet->live) + timer_reduce(>client_conn_reap_timer, reap_at); +} + +/* * Disconnect a client call. */ void rxrpc_disconnect_client_call(struct rxrpc_call *call) @@ -896,9 +908,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call) list_move_tail(>cache_link, >idle_client_conns); if (rxnet->idle_client_conns.next == >cache_link && !rxnet->kill_all_client_conns) - queue_delayed_work(rxrpc_workqueue, - >client_conn_reaper, - rxrpc_conn_idle_client_expiry); + rxrpc_set_client_reap_timer(rxnet); } else { trace_rxrpc_client(conn, channel, rxrpc_client_to_inactive); conn->cache_state = RXRPC_CONN_CLIENT_INACTIVE; @@ -1036,8 +1046,7 @@ void rxrpc_discard_expired_client_conns(struct work_struct *work) { struct rxrpc_connection *conn; struct rxrpc_net *rxnet = - container_of(to_delayed_work(work), -struct rxrpc_net, client_conn_reaper); + container_of(work, struct rxrpc_net, client_conn_reaper); unsigned long expiry, conn_expires_at, now; unsigned int nr_conns; bool did_discard = false; @@ -1116,9 +1125
[PATCH net-next 08/12] rxrpc: Express protocol timeouts in terms of RTT
Express protocol timeouts for data retransmission and deferred ack generation in terms on RTT rather than specified timeouts once we have sufficient RTT samples. For the moment, this requires just one RTT sample to be able to use this for ack deferral and two for data retransmission. The data retransmission timeout is set at RTT*1.5 and the ACK deferral timeout is set at RTT. Note that the calculated timeout is limited to a minimum of 4ns to make sure it doesn't happen too quickly. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c | 22 ++ net/rxrpc/sendmsg.c|7 +++ 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index c14395d5ad8c..da91f16ac77c 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -52,7 +52,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason, enum rxrpc_propose_ack_trace why) { enum rxrpc_propose_ack_outcome outcome = rxrpc_propose_ack_use; - unsigned long now, ack_at, expiry = rxrpc_soft_ack_delay; + unsigned long expiry = rxrpc_soft_ack_delay; s8 prior = rxrpc_ack_priority[ack_reason]; /* Pings are handled specially because we don't want to accidentally @@ -116,7 +116,13 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason, background) rxrpc_queue_call(call); } else { - now = jiffies; + unsigned long now = jiffies, ack_at; + + if (call->peer->rtt_usage > 0) + ack_at = nsecs_to_jiffies(call->peer->rtt); + else + ack_at = expiry; + ack_at = jiffies + expiry; if (time_before(ack_at, call->ack_at)) { WRITE_ONCE(call->ack_at, ack_at); @@ -160,14 +166,22 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) struct sk_buff *skb; unsigned long resend_at; rxrpc_seq_t cursor, seq, top; - ktime_t now, max_age, oldest, ack_ts; + ktime_t now, max_age, oldest, ack_ts, timeout, min_timeo; int ix; u8 annotation, anno_type, retrans = 0, unacked = 0; _enter("{%d,%d}", call->tx_hard_ack, call->tx_top); + if (call->peer->rtt_usage > 1) + timeout = ns_to_ktime(call->peer->rtt * 3 / 2); + else + timeout = ms_to_ktime(rxrpc_resend_timeout); + min_timeo = ns_to_ktime((10 / HZ) * 4); + if (ktime_before(timeout, min_timeo)) + timeout = min_timeo; + now = ktime_get_real(); - max_age = ktime_sub_ms(now, rxrpc_resend_timeout * 1000 / HZ); + max_age = ktime_sub(now, timeout); spin_lock_bh(>lock); diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c index 03e0676db28c..c56ee54fdd1f 100644 --- a/net/rxrpc/sendmsg.c +++ b/net/rxrpc/sendmsg.c @@ -226,6 +226,13 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call, } else { unsigned long now = jiffies, resend_at; + if (call->peer->rtt_usage > 1) + resend_at = nsecs_to_jiffies(call->peer->rtt * 3 / 2); + else + resend_at = rxrpc_resend_timeout; + if (resend_at < 1) + resend_at = 1; + resend_at = now + rxrpc_resend_timeout; WRITE_ONCE(call->resend_at, resend_at); rxrpc_reduce_call_timer(call, resend_at, now,
[PATCH net-next 2/3] rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()
rxrpc_fill_out_ack() needs to be passed the connection pointer from its caller rather than using call->conn as the call may be disconnected in parallel with it, clearing call->conn, leading to: BUG: unable to handle kernel NULL pointer dereference at 0010 IP: rxrpc_send_ack_packet+0x231/0x6a4 Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/output.c |9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 71e6f713fbe7..8ee8b2d4a3eb 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -35,7 +35,8 @@ struct rxrpc_abort_buffer { /* * Fill out an ACK packet. */ -static size_t rxrpc_fill_out_ack(struct rxrpc_call *call, +static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn, +struct rxrpc_call *call, struct rxrpc_ack_buffer *pkt, rxrpc_seq_t *_hard_ack, rxrpc_seq_t *_top, @@ -77,8 +78,8 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call, } while (before_eq(seq, top)); } - mtu = call->conn->params.peer->if_mtu; - mtu -= call->conn->params.peer->hdrsize; + mtu = conn->params.peer->if_mtu; + mtu -= conn->params.peer->hdrsize; jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max; pkt->ackinfo.rxMTU = htonl(rxrpc_rx_mtu); pkt->ackinfo.maxMTU = htonl(mtu); @@ -148,7 +149,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool ping) } call->ackr_reason = 0; } - n = rxrpc_fill_out_ack(call, pkt, _ack, , reason); + n = rxrpc_fill_out_ack(conn, call, pkt, _ack, , reason); spin_unlock_bh(>lock);
[PATCH net-next 0/3] rxrpc: Fixes
Here are some patches that fix some things in AF_RXRPC: (1) Prevent notifications from being passed to a kernel service for a call that it has ended. (2) Fix a null pointer deference that occurs under some circumstances when an ACK is generated. (3) Fix a number of things to do with call expiration. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next Tagged thusly: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git rxrpc-next-2017 David --- David Howells (3): rxrpc: Lock around calling a kernel service Rx notification rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack() rxrpc: Fix call expiry handling net/rxrpc/af_rxrpc.c| 16 net/rxrpc/ar-internal.h |1 + net/rxrpc/call_event.c |2 +- net/rxrpc/call_object.c |1 + net/rxrpc/input.c |2 -- net/rxrpc/output.c | 19 +++ net/rxrpc/recvmsg.c |2 ++ 7 files changed, 36 insertions(+), 7 deletions(-)
[PATCH net-next 1/3] rxrpc: Lock around calling a kernel service Rx notification
Place a spinlock around the invocation of call->notify_rx() for a kernel service call and lock again when ending the call and replace the notification pointer with a pointer to a dummy function. This is required because it's possible for rxrpc_notify_socket() to be called after the call has been ended by the kernel service if called from the asynchronous work function rxrpc_process_call(). However, rxrpc_notify_socket() currently only holds the RCU read lock when invoking ->notify_rx(), which means that the afs_call struct would need to be disposed of by call_rcu() rather than by kfree(). But we shouldn't see any notifications from a call after calling rxrpc_kernel_end_call(), so a lock is required in rxrpc code. Without this, we may see the call wait queue as having a corrupt spinlock: BUG: spinlock bad magic on CPU#0, kworker/0:2/1612 general protection fault: [#1] SMP ... Workqueue: krxrpcd rxrpc_process_call task: 88040b83c400 task.stack: 88040adfc000 RIP: 0010:spin_bug+0x161/0x18f RSP: 0018:88040adffcc0 EFLAGS: 00010002 RAX: 0032 RBX: 6b6b6b6b6b6b6b6b RCX: 81ab16cf RDX: 88041fa14c01 RSI: 88041fa0ccb8 RDI: 88041fa0ccb8 RBP: 88040adffcd8 R08: R09: R10: 88040adffc60 R11: 022c R12: 88040aca2208 R13: 81a58114 R14: R15: Call Trace: do_raw_spin_lock+0x1d/0x89 _raw_spin_lock_irqsave+0x3d/0x49 ? __wake_up_common_lock+0x4c/0xa7 __wake_up_common_lock+0x4c/0xa7 ? __lock_is_held+0x47/0x7a __wake_up+0xe/0x10 afs_wake_up_call_waiter+0x11b/0x122 [kafs] rxrpc_notify_socket+0x12b/0x258 rxrpc_process_call+0x18e/0x7d0 process_one_work+0x298/0x4de ? rescuer_thread+0x280/0x280 worker_thread+0x1d1/0x2ae ? rescuer_thread+0x280/0x280 kthread+0x12c/0x134 ? kthread_create_on_node+0x3a/0x3a ret_from_fork+0x27/0x40 In this case, note the corrupt data in EBX. The address of the offending afs_call is in R12, plus the offset to the spinlock. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/af_rxrpc.c| 16 net/rxrpc/ar-internal.h |1 + net/rxrpc/call_object.c |1 + net/rxrpc/recvmsg.c |2 ++ 4 files changed, 20 insertions(+) diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c index 344b2dcad52d..9b5c46b052fd 100644 --- a/net/rxrpc/af_rxrpc.c +++ b/net/rxrpc/af_rxrpc.c @@ -322,6 +322,14 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock, } EXPORT_SYMBOL(rxrpc_kernel_begin_call); +/* + * Dummy function used to stop the notifier talking to recvmsg(). + */ +static void rxrpc_dummy_notify_rx(struct sock *sk, struct rxrpc_call *rxcall, + unsigned long call_user_ID) +{ +} + /** * rxrpc_kernel_end_call - Allow a kernel service to end a call it was using * @sock: The socket the call is on @@ -336,6 +344,14 @@ void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call) mutex_lock(>user_mutex); rxrpc_release_call(rxrpc_sk(sock->sk), call); + + /* Make sure we're not going to call back into a kernel service */ + if (call->notify_rx) { + spin_lock_bh(>notify_lock); + call->notify_rx = rxrpc_dummy_notify_rx; + spin_unlock_bh(>notify_lock); + } + mutex_unlock(>user_mutex); rxrpc_put_call(call, rxrpc_call_put_kernel); } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index ea5600b747cc..b2151993d384 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -525,6 +525,7 @@ struct rxrpc_call { unsigned long flags; unsigned long events; spinlock_t lock; + spinlock_t notify_lock;/* Kernel notification lock */ rwlock_tstate_lock; /* lock for state transition */ u32 abort_code; /* Local/remote abort code */ int error; /* Local error incurred */ diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c index fcdd6555a820..4c7fbc6dcce7 100644 --- a/net/rxrpc/call_object.c +++ b/net/rxrpc/call_object.c @@ -124,6 +124,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp) INIT_LIST_HEAD(>sock_link); init_waitqueue_head(>waitq); spin_lock_init(>lock); + spin_lock_init(>notify_lock); rwlock_init(>state_lock); atomic_set(>usage, 1); call->debug_id = atomic_inc_return(_debug_id); diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c index e4937b3f3685..8510a98b87e1 100644 --- a/net/rxrpc/recvmsg.c +++ b/net/rxrpc/recvmsg.c @@ -40,7 +40,9 @@ void rxrpc_notify_socket(struct rxrpc_call *call) sk = >sk; if (rx
[PATCH net-next 3/3] rxrpc: Fix call expiry handling
Fix call expiry handling in the following ways (1) If all the request data from a client call is acked, don't send a follow up IDLE ACK with firstPacket == 1 and previousPacket == 0 as this appears to fool some servers into thinking everything has been accepted. (2) Never send an abort back to the server once it has ACK'd all the request packets; rather just try to reuse the channel for the next call. The first request DATA packet of the next call on the same channel will implicitly ACK the entire reply of the dead call - even if we haven't transmitted it yet. (3) Don't send RX_CALL_TIMEOUT in an ABORT packet, librx uses abort codes to pass local errors to the caller in addition to remote errors, and this is meant to be local only. The following also need to be addressed in future patches: (4) Service calls should send PING ACKs as 'keep alives' if the server is still processing the call. (5) VERSION REPLY packets should be sent to the peers of service connections to act as keep-alives. This is used to keep firewall routes in place. The AFS CM should enable this. Signed-off-by: David Howells <dhowe...@redhat.com> --- net/rxrpc/call_event.c |2 +- net/rxrpc/input.c |2 -- net/rxrpc/output.c | 10 ++ 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 7a77844aab16..3574508baf9a 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -386,7 +386,7 @@ void rxrpc_process_call(struct work_struct *work) now = ktime_get_real(); if (ktime_before(call->expire_at, now)) { - rxrpc_abort_call("EXP", call, 0, RX_CALL_TIMEOUT, -ETIME); + rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME); set_bit(RXRPC_CALL_EV_ABORT, >events); goto recheck_state; } diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c index 1e37eb1c0c66..1b592073ec96 100644 --- a/net/rxrpc/input.c +++ b/net/rxrpc/input.c @@ -298,8 +298,6 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun, write_unlock(>state_lock); if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) { - rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, false, true, - rxrpc_propose_ack_client_tx_end); trace_rxrpc_transmit(call, rxrpc_transmit_await_reply); } else { trace_rxrpc_transmit(call, rxrpc_transmit_end); diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c index 8ee8b2d4a3eb..f47659c7b224 100644 --- a/net/rxrpc/output.c +++ b/net/rxrpc/output.c @@ -222,6 +222,16 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call) rxrpc_serial_t serial; int ret; + /* Don't bother sending aborts for a client call once the server has +* hard-ACK'd all of its request data. After that point, we're not +* going to stop the operation proceeding, and whilst we might limit +* the reply, it's not worth it if we can send a new call on the same +* channel instead, thereby closing off this call. +*/ + if (rxrpc_is_client_call(call) && + test_bit(RXRPC_CALL_TX_LAST, >flags)) + return 0; + spin_lock_bh(>lock); if (call->conn) conn = rxrpc_get_connection_maybe(call->conn);
Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down
Alexei Starovoitovwrote: > > TBH, I've no idea how bpf does anything, so I can't say whether this is > > better, overkill or insufficient. > > ok. To make it clear: > Nacked-by: Alexei Starovoitov > For the current patch. > Unnecessary checks for no good reason in performance critical > functions are not acceptable. They aren't unnecessary checks. Can you please suggest if there's some way more suitable than just killing bpf entirely? I don't know the code, and I presume you do. David
Re: Is there a race between __mod_timer() and del_timer()?
David Howells <dhowe...@redhat.com> wrote: > I think it might just be best to put a note in the comments in __mod_timer(). How about the attached? David --- commit d538c734f9bf885292b88a81a06c5efee528d70d Author: David Howells <dhowe...@redhat.com> Date: Wed Nov 8 10:20:27 2017 + Add a comment into __mod_timer() noting a possible race with del_timer() Add a comment into __mod_timer() noting a possible race with del_timer() in which the 'common optimization' case could leave the timer unstarted if del_timer() happens between the timer_pending() check and the timer expiration check. Signed-off-by: David Howells <dhowe...@redhat.com> diff --git a/kernel/time/timer.c b/kernel/time/timer.c index f2674a056c26..e0ac4486529c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -949,6 +949,9 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) * The downside of this optimization is that it can result in * larger granularity than you would get from adding a new * timer with this expiry. +* +* Note that if del_timer() happens whilst we're just here, we +* will return with the timer unstarted. */ if (timer->expires == expires) return 1;
Is there a race between __mod_timer() and del_timer()?
Is there a race between the optimisation for networking code in __mod_timer() and del_timer() - or, at least, a race that matters? Consider: CPU A CPU B === === [timer X is active] ==>__mod_timer(X) if (timer_pending(timer)) [Take the true path] -- IRQ -- ==>del_timer(X) <== if (timer->expires == expires) [Take the true path] <==return 1 [timer X is not active] There's no locking to prevent this, but __mod_timer() returns without restarting the timer. I'm not sure this is a problem exactly, however, since del_timer() *was* issued, and could've deleted the timer after __mod_timer() returned. A couple of possible alleviations: (1) Recheck timer_pending() before returning from __mod_timer(). (2) Set timer->expires to jiffies in del_timer() - but since there's nothing preventing the optimisation in __mod_timer() from occurring concurrently with del_timer(), this probably won't help. I think it might just be best to put a note in the comments in __mod_timer(). Thoughts? David