[no subject]

2018-10-19 Thread David Howells
David Miller  wrote:

> > Is there going to be a merge of net into net-next before the merge
> > window opens?  Or do you have a sample merge that I can rebase my
> > afs-next branch on?
> 
> I'll be doing a net to net-next merge some time today.

Excellent, thanks!

David


[no subject]

2018-10-19 Thread David Howells
Hi Dave,

Is there going to be a merge of net into net-next before the merge window
opens?  Or do you have a sample merge that I can rebase my afs-next branch on?

The problem I have is that there's a really necessary patch in net that's not
in net-next:

d7b4c24f45d2efe51b8f213da4593fefd49240ba
rxrpc: Fix an uninitialised variable

(it fixes a fix that went in net just before you last merged it into
net-next).

So I would like to base my branch on both net and net-next, but the merge is
non-trivial, and I'd rather not hand Linus a merge that conflicts with yours.

The issues are:

 (*) net/sched/cls_api.c

 I think nlmsg_parse() needs to take both rtm_tca_policy and cb->extack as
 its last two arguments.  Each branch fills in one argument and leaves the
 other NULL.

 (*) net/ipv4/ipmr_base.c

 mr_rtm_dumproute() got a piece abstracted out and modified in one branch,
 but the unabstracted branch has a fix in the same area.  I think the
 thing to do is to apply the fix (removing the same two lines) from the
 abstracted-out branch.

Thanks,
David


Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6

2018-10-12 Thread David Howells
David Miller  wrote:

> > Nit : Correct attribution would require a Reported-by: tag
> 
> Right.

And I've posted a new version with that and the reviewed-by from Arnd.

David


Re: [PATCH net] rxrpc: Fix incorrect conditional on IPV6

2018-10-12 Thread David Howells
Eric Dumazet  wrote:

> Nit : Correct attribution would require a Reported-by: tag

Point.  I'll repost it with a To: line also, I'm waiting to see if Arnd will
ack it.

David


Re: [PATCH net-next] rxrpc: Remove set but not used variable 'ioc'

2018-10-11 Thread David Howells
Dan Carpenter  wrote:

> I told him to do that because it wasn't a bugfix...  Probably just Fixes
> is the right thing to use though.

But the correct fix *is* a bug fix.

David


Re: [PATCH net-next] rxrpc: Remove set but not used variable 'ioc'

2018-10-09 Thread David Howells
YueHaibing  wrote:

> net/rxrpc/output.c: In function 'rxrpc_reject_packets':
> net/rxrpc/output.c:527:11: warning:
>  variable 'ioc' set but not used [-Wunused-but-set-variable]
> 
> It never used since introduction in

I wonder why my compiler doesn't show this warning.

Anyway, NAK: just removing the variable is the wrong fix - you need to look at
the code more closely.  The actual fix is to pass it to kernel_sendmsg()
instead of 2.

But thanks anyway!  Do you want to respin your patch?

> commit ece64fec164f ("rxrpc: Emit BUSY packets when supposed to rather than 
> ABORTs")

Btw, this should be a 'Fixes:  ("subject")' line and the patch needs
to go to net, not net-next.

David


[PATCH net-next 2/9] rxrpc: Emit the data Tx trace line before transmitting

2018-10-04 Thread David Howells
Print the data Tx trace line before transmitting so that it appears before
the trace lines indicating success or failure of the transmission.  This
makes the trace log less confusing.

Signed-off-by: David Howells 
---

 net/rxrpc/output.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index e8fb8922bca8..993d4cd247f9 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -378,11 +378,13 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, 
struct sk_buff *skb,
if ((lose++ & 7) == 7) {
ret = 0;
lost = true;
-   goto done;
}
}
 
-   _proto("Tx DATA %%%u { #%u }", serial, sp->hdr.seq);
+   trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags,
+   retrans, lost);
+   if (lost)
+   goto done;
 
/* send the packet with the don't fragment bit set if we currently
 * think it's small enough */
@@ -415,8 +417,6 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct 
sk_buff *skb,
goto send_fragmentable;
 
 done:
-   trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags,
-   retrans, lost);
if (ret >= 0) {
if (whdr.flags & RXRPC_REQUEST_ACK) {
call->peer->rtt_last_req = skb->tstamp;



Re: How to post rxrpc patches for net-next with deps on net?

2018-10-04 Thread David Howells
David Miller  wrote:

> Just FYI, net has been merged into net-next.

I saw, thanks.  I've rebased my rxrpc-next on it, tagged it and pushed.  Will
mail a request soon.

Thanks,
David


How to post rxrpc patches for net-next with deps on net?

2018-10-03 Thread David Howells
Hi Dave,

I have some rxrpc patches to post for your net-next/master branch, but there's
a dependency in them on the rxrpc-fixes-20180928 tag you pulled into your
net/master branch.

What's the best way to handle this?

 (1) Wait for you to merge net into net-next?

 (2) Base it on my own merge of rxrpc-fixes-20180928 into net-next?  I had to
 fix up drivers/net/ethernet/netronome/nfp/nfp_net_common.c in the merge
 I'm currently using.

 (3) Let you fix up the discrepency between the two branches?  Two lines that
 got removed in the rxrpc-fixes-20180928 tag cause a merge failure in
 net/rxrpc/conn_object.c when applying it to raw net-next by their
 unexpected presence.  It's nothing too bad.

Thanks,
David


Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
David Miller  wrote:

> Probably you are seeing some interrupt mitigation.
> 
> It seems there is a difference in how the interrupt mitigation is
> programmed on for 8168 chips vs. others by default.  Most get
> all zeros in the IntrMitigate register, whilst for 8168 chips
> a value of 0x5151 is programmed.

I'm not sure what that means.  I can't seem to find a programmer's manual for
the chip.

> You can play with ethtool to mess with the coalescing settings
> to see if this is part of the problem.

These bits from "ethtool -c enp3s0"?

rx-usecs: 200
rx-frames: 4
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 200
tx-frames: 4
tx-usecs-irq: 0
tx-frames-irq: 0

> I bet this might explain the behavior you see after including
> even Heiner's TXCFG_AUTO_FIFO patch.

Thanks,
David


Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
David Howells  wrote:

> Can someone help me figure out a performance issue that seems to be caused by
> an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at
> least, not starting immediately that it's given something to transmit?

I've been told that:

commit ad5f97faff4231e72b96bd96adbe1b6e977a9b86
Author: Heiner Kallweit 
Date:   Fri Sep 28 23:51:54 2018 +0200
r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO

might fix the problem, however it doesn't seem to help entirely.  Whilst it
does now seem to be transmitting whilst I'm generating up the queue, it still
seems that I'm able to load up the queue faster the packets are being cleared
from the queue.

So in the following excerpt:

id-0[001] d.h241.284702: net_rtl8169_interrupt: enp3s0 st=85
id-0[001] ..s241.284715: net_rtl8169_poll: enp3s0 st=85
dd-3186 [003] ...341.284741: net_rtl8169_tx: enp3s0 p=213-216 skb=2e2b0c3e
dd-3186 [003] ...341.284777: net_rtl8169_tx: enp3s0 p=213-217 skb=3950deca
dd-3186 [003] ...341.284790: net_rtl8169_tx: enp3s0 p=213-218 skb=471b2bc2
dd-3186 [003] ...341.284826: net_rtl8169_tx: enp3s0 p=213-219 skb=7c25ae16
dd-3186 [003] ...341.284839: net_rtl8169_tx: enp3s0 p=213-220 skb=cfbf719f
dd-3186 [003] ...341.284870: net_rtl8169_tx: enp3s0 p=213-221 skb=d34a1f67
dd-3186 [003] ...341.284883: net_rtl8169_tx: enp3s0 p=213-222 skb=466e20e8
dd-3186 [003] ...341.284914: net_rtl8169_tx: enp3s0 p=213-223 skb=3d36cb1c
dd-3186 [003] ...341.284927: net_rtl8169_tx: enp3s0 p=213-224 skb=399c06ea
id-0[001] ..s241.284938: net_rtl8169_tx_done: enp3s0 p=213 skb=c797fea6
id-0[001] ..s241.284939: net_rtl8169_tx_done: enp3s0 p=214 skb=c0e4d6f0
id-0[001] ..s241.284940: net_rtl8169_tx_done: enp3s0 p=215 skb=2e2b0c3e
id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=216 skb=3950deca
id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=217 skb=471b2bc2
id-0[001] ..s241.284942: net_rtl8169_tx_done: enp3s0 p=218 skb=7c25ae16
id-0[001] ..s241.284943: net_rtl8169_tx_done: enp3s0 p=219 skb=cfbf719f
id-0[001] ..s241.284944: net_rtl8169_tx_done: enp3s0 p=220 skb=d34a1f67
id-0[001] ..s241.284945: net_rtl8169_tx_done: enp3s0 p=221 skb=466e20e8
id-0[001] ..s241.284946: net_rtl8169_tx_done: enp3s0 p=222 skb=3d36cb1c
id-0[001] ..s241.284947: net_rtl8169_tx_done: enp3s0 p=223 skb=399c06ea
id-0[001] d.h241.284954: net_rtl8169_interrupt: enp3s0 st=85

packets are being queued something like 11-13uS apart, but there seems like a
big gap of about 200uS in the idle thread between the poll and the first
tx_done that might be masking things.

David


r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
 the update of the Tx ring descriptors and does this right
 before interrupting the CPU.

 (5) Reading/writing the Tx descriptor ring from the device is slow.

But again note that tshark on the server shows the ACKs arriving as a batch -
I've also instrumented the NIC driver for that machine and observed the NIC
apparently receiving ACK packets in batches.

Also, what does the TxDescUnavail status flag signify?  The NIC is setting it,
but nothing in the driver takes any notice of it or clears it.

Thanks,
David
---
commit 4ce09fa309d8fb4b1b910091607c2290c45541f1
Author: David Howells 
Date:   Tue Oct 2 21:14:04 2018 +0100

Add tracepoints for Realtek r8169

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index ab30aaeac6d3..8d3a86a674b7 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include 
+
 #define MODULENAME "r8169"
 
 #define FIRMWARE_8168D_1   "rtl_nic/rtl8168d-1.fw"
@@ -6197,6 +6200,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
RTL_W8(tp, TxPoll, NPQ);
 
mmiowb();
+   trace_net_rtl8169_tx(tp->dev, tp->dirty_tx, tp->cur_tx, skb);
 
if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
@@ -6302,6 +6306,7 @@ static void rtl_tx(struct net_device *dev, struct 
rtl8169_private *tp)
rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb,
 tp->TxDescArray + entry);
if (status & LastFrag) {
+   trace_net_rtl8169_tx_done(dev, dirty_tx, tx_skb->skb);
u64_stats_update_begin(>tx_stats.syncp);
tp->tx_stats.packets++;
tp->tx_stats.bytes += tx_skb->skb->len;
@@ -6453,6 +6458,7 @@ static int rtl_rx(struct net_device *dev, struct 
rtl8169_private *tp, u32 budget
dev->stats.multicast++;
 
napi_gro_receive(>napi, skb);
+   trace_net_rtl8169_napi_rx(dev, skb);
 
u64_stats_update_begin(>rx_stats.syncp);
tp->rx_stats.packets++;
@@ -6466,7 +6472,6 @@ static int rtl_rx(struct net_device *dev, struct 
rtl8169_private *tp, u32 budget
 
count = cur_rx - tp->cur_rx;
tp->cur_rx = cur_rx;
-
return count;
 }
 
@@ -6478,6 +6483,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void 
*dev_instance)
if (status == 0x || !(status & (RTL_EVENT_NAPI | tp->event_slow)))
return IRQ_NONE;
 
+   trace_net_rtl8169_interrupt(tp->napi.dev, status);
+
rtl_irq_disable(tp);
napi_schedule_irqoff(>napi);
 
@@ -6559,6 +6566,7 @@ static int rtl8169_poll(struct napi_struct *napi, int 
budget)
 
status = rtl_get_events(tp);
rtl_ack_events(tp, status & ~tp->event_slow);
+   trace_net_rtl8169_poll(dev, status);
 
if (status & RTL_EVENT_NAPI_RX)
work_done = rtl_rx(dev, tp, (u32) budget);
diff --git a/include/trace/events/net_rtl8169.h 
b/include/trace/events/net_rtl8169.h
new file mode 100644
index ..92049078b3fa
--- /dev/null
+++ b/include/trace/events/net_rtl8169.h
@@ -0,0 +1,125 @@
+/* Realtek RTL8169 tracepoints
+ *
+ * Copyright (C) 2018 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowe...@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net_rtl8169
+
+#if !defined(_TRACE_NET_RTL8169_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_RTL8169_H
+
+#include 
+#include 
+
+
+TRACE_EVENT(net_rtl8169_interrupt,
+   TP_PROTO(struct net_device *netdev, u16 status),
+
+   TP_ARGS(netdev, status),
+
+   TP_STRUCT__entry(
+   __field(u16,status  )
+   __array(char,   name, IFNAMSIZ  )
+),
+
+   TP_fast_assign(
+   __entry->status = status;
+   memcpy(__entry->name, netdev->name, IFNAMSIZ);
+  ),
+
+   TP_printk("%s st=%x", __entry->name, __entry->status)
+   );
+
+TRACE_EVENT(net_rtl8169_poll,
+   TP_PROTO(struct net_device *netdev, u16 status),
+
+   TP_ARGS(netdev, status),
+
+   TP_STRUCT__entry(
+   __field(u16,status  )
+   __array(char,   nam

Massive delays between packet rx and UDP ->data_ready handoff

2018-10-02 Thread David Howells
Hi,

I'm seeing occasional massive delays between packets being received in the
r8169 driver and them turning up in the rxrpc_data_ready() function called by
the UDP or UDP6 ->data_ready() handler.  By massive, I mean I've seen anything
up to about 4s.  I *think* that the effect gets rapidly worse over time, then
plateaus.

I stuck some tracepoints in the r8169 driver, including one in the rtl_rx()
function, right after it calls napi_gro_receive() to queue the packet.  rxrpc
has tracepoints that give useful information here also, the first indicating
that we've been given a new packet by skb_recv_udp(), the second that we've
checksummed it and decoded the header and the third that the packet is about
to be destroyed.  All four lines print the (mangled) socket buffer address.

So, for example:

 -0 [001] .Ns2   169.274008: net_rtl8169_napi_rx: enp3s0 
skb=4f91d06d
 ...
 -0 [001] .Ns4   172.014880: rxrpc_skb: s=4f91d06d Rx RCV u=1 
m=1 p=rxrpc_data_ready+0xed/0x195b
 -0 [001] .Ns4   172.014883: rxrpc_rx_packet: 
d3ea7470:8f7b61a0:0003:09c5 000c 0001 01 05 DATA 2740874925 
skb=4f91d06d
 ...
 kworker/...  [001] ...1   172.015007: rxrpc_skb: s=4f91d06d Rx FRE u=1 
m=0 p=rxrpc_recvmsg_data.isra.0+0x840/0xdc1

This packet was received at 169.274008, came into possession of rxrpc at
172.014880 and had been checksummed and had its header processed by
172.014883; rxrpc has finished with the packet by 172.015007.  On the third
trace line, the "2740874925" is the difference between skb->tstamp and the
current time in nanoseconds - ie. 2.74s.  That's way too long and causes rxrpc
to retransmit.

Note that the two machines involved are on an uncontended gigE network with
about 4m of cable separating them and neither the client (the machine on which
this logging was captured) nor the server are particularly busy (the client
was also running a tshark capture to a file).

The base kernel is between current -rc5 and -rc6 plus a bunch of my own rxrpc,
afs and iter patches and a patch to add a few tracepoints to the r8169 driver.

Does anyone have any suggestions as to how I can work out what's going on or
how to fix it?

I have a more complete trace and matching pcap file if anyone wishes to look
at them.  I've also attached my config below in case that shines any light.

Thanks,
David
---
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 4.19.0-rc5 Kernel Configuration
#

#
# Compiler: x86_64-linux-gnu-gcc (GCC) 8.1.1 20180626 (Red Hat Cross 8.1.1-3)
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=80101
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-fscache"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
CONFIG_KERNEL_BZIP2=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ_FULL=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_VIRT_CPU_ACCOUNTING=y
CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y

Re: Is it possible to get Rx timestamps in skb->tstamp?

2018-09-19 Thread David Howells
Toke Høiland-Jørgensen  wrote:

> > Is it possible to tell a UDP socket that you'd like it to put
> > reception timestamps in skb->tstamp?
> 
> I think you probably want SO_TIMESTAMP*?
> 
> See 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt

That seems to work.  I thought that only affected recvmsg() putting the
timestamp into the ancillary data buffer, but apparently not.

Thanks,
David


Is it possible to get Rx timestamps in skb->tstamp?

2018-09-19 Thread David Howells
Hi,

Is it possible to tell a UDP socket that you'd like it to put reception
timestamps in skb->tstamp?

For some reason, I seem to remember that the kernel used to put something in
there - and AF_RXRPC makes use of it.

David


[PATCH net-next] rxrpc: Remove set but not used variable 'nowj'

2018-08-02 Thread David Howells
From: Wei Yongjun 

Fixes gcc '-Wunused-but-set-variable' warning:

net/rxrpc/proc.c: In function 'rxrpc_call_seq_show':
net/rxrpc/proc.c:66:29: warning:
 variable 'nowj' set but not used [-Wunused-but-set-variable]
  unsigned long timeout = 0, nowj;
 ^

Signed-off-by: Wei Yongjun 
Signed-off-by: David Howells 
---

 net/rxrpc/proc.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 163d05df339d..9805e3b85c36 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -63,7 +63,7 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
struct rxrpc_peer *peer;
struct rxrpc_call *call;
struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
-   unsigned long timeout = 0, nowj;
+   unsigned long timeout = 0;
rxrpc_seq_t tx_hard_ack, rx_hard_ack;
char lbuff[50], rbuff[50];
 
@@ -97,7 +97,6 @@ static int rxrpc_call_seq_show(struct seq_file *seq, void *v)
 
if (call->state != RXRPC_CALL_SERVER_PREALLOC) {
timeout = READ_ONCE(call->expect_rx_by);
-   nowj = jiffies;
timeout -= jiffies;
}
 



Re: [lkp-robot] [bisect done] ace45bec6d [ 52.056290] EIP: lock_release

2018-06-26 Thread David Howells
kernel test robot  wrote:

> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit ace45bec6d77bc061c3c3d8ad99e298ea9800c2b
> Author: David Howells 
> AuthorDate: Fri Mar 30 21:04:43 2018 +0100
> Commit: David Howells 
> CommitDate: Fri Mar 30 21:04:43 2018 +0100
> 
> rxrpc: Fix firewall route keepalive

Are you actually making AF_RXRPC or the AFS filesystem do anything?  Or is it
just happening spontaneously?

David


Re: [PATCH net] KEYS: DNS: fix parsing multiple options

2018-06-14 Thread David Howells
Simon Horman  wrote:

> > -   eq = memchr(opt, '=', opt_len) ?: end;
> > +   eq = memchr(opt, '=', opt_len) ?: next_opt;
> > opt_nlen = eq - opt;
> > eq++;
> 
> It seems risky to advance eq++ in the case there the value is empty.
> Its not not pointing to the value but it may be accessed twice further on
> in this loop.
> 
> > opt_vlen = next_opt - eq; /* will be -1 if no value */

Yes, but note the next line ^^^ and the comment thereon.

This is followed later by a check:

if (opt_vlen <= 0)
goto bad_option_value;

in the dnserror option handler.

Note, also, there is guaranteed to be a NUL char included at the end of the
payload data, and that this is checked:

if (datalen <= 1 || !data || data[datalen - 1] != '\0') {

David


Re: [PATCH net] KEYS: DNS: fix parsing multiple options

2018-06-14 Thread David Howells
The fix seems to work, but the use of kstrtoul():

ret = kstrtoul(eq, 10, );

is incorrect since the buffer can't been modified to block out the next
argument if there is one, so the following fails:

perl -e 'print "#dnserror=1#", "\x00" x 1' |
keyctl padd dns_resolver desc @s

(Note this is preexisting and nothing to do with your patch).

I'm not sure how best to handle this.

Anyway, Dave, can you take Eric's patch into the net tree with:

    Acked-by: David Howells 

David


'Adding' a writable proc file under /proc/net/afs/ [was [PATCH 3/3] afs: Implement namespacing]

2018-05-15 Thread David Howells
Christoph Hellwig  wrote:

> So the real question here is why afs needs a write callback as the first
> user of /proc/net ever in a day and time were we've deprecated adding
> new proc files.  This is a discussion that should be had with linux-fsdevel
> and netdev in Cc.

/proc/fs/afs/cells and /proc/fs/afs/rootcell are not new proc files, nor is
the ability to configure the in-kernel afs filesystem by writing to them new.
The cells file has been there since 2002 and the rootcell file before the
start of the git history in 2005 - it's just that they've been in
/proc/fs/afs/ not /proc/net/afs/.

However, the afs procfiles need to be somewhere under /proc/net/ for the
network namespacing, so I moved the directory over and emplaced a symlink at
/proc/fs/afs.

David


[PATCH net 0/5] rxrpc: Fixes

2018-05-10 Thread David Howells

Here are three fixes for AF_RXRPC and two tracepoints that were useful for
finding them:

 (1) Fix missing start of expect-Rx-by timeout on initial packet
 transmission so that calls will time out if the peer doesn't respond.

 (2) Fix error reception on AF_INET6 sockets by using the correct family of
 sockopts on the UDP transport socket.

 (3) Fix setting the minimum security level on kernel calls so that they
 can be encrypted.

 (4) Add a tracepoint to log ICMP/ICMP6 and other error reports from the
 transport socket.

 (5) Add a tracepoint to log UDP sendmsg failure so that we can find out if
 transmission failure occurred on the UDP socket.

The patches are tagged here:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20180510

and can also be found on the following branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David
---
David Howells (5):
  rxrpc: Fix missing start of call timeout
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Trace UDP transmission failure


 include/trace/events/rxrpc.h |   85 ++
 net/rxrpc/af_rxrpc.c |2 -
 net/rxrpc/ar-internal.h  |1 
 net/rxrpc/conn_event.c   |   11 -
 net/rxrpc/input.c|2 -
 net/rxrpc/local_event.c  |3 +
 net/rxrpc/local_object.c |   57 +---
 net/rxrpc/output.c   |   34 -
 net/rxrpc/peer_event.c   |   46 +++
 net/rxrpc/rxkad.c|6 ++-
 net/rxrpc/sendmsg.c  |   10 +
 11 files changed, 209 insertions(+), 48 deletions(-)



[PATCH net 1/5] rxrpc: Fix missing start of call timeout

2018-05-10 Thread David Howells
The expect_rx_by call timeout is supposed to be set when a call is started
to indicate that we need to receive a packet by that point.  This is
currently put back every time we receive a packet, but it isn't started
when we first send a packet.  Without this, the call may wait forever if
the server doesn't deign to reply.

Fix this by setting the timeout upon a successful UDP sendmsg call for the
first DATA packet.  The timeout is initiated only for initial transmission
and not for subsequent retries as we don't want the retry mechanism to
extend the timeout indefinitely.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Reported-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/input.c   |2 +-
 net/rxrpc/output.c  |   11 +++
 net/rxrpc/sendmsg.c |   10 ++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 90d7079e0aa9..19975d2ca9a2 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -476,6 +476,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_SEND_PING,   /* A ping will need to be sent */
RXRPC_CALL_PINGING, /* Ping in process */
RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout 
occurred */
+   RXRPC_CALL_BEGAN_RX_TIMER,  /* We began the expect_rx_by timer */
 };
 
 /*
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 0410d2277ca2..b5fd6381313d 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -971,7 +971,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
if (timo) {
unsigned long now = jiffies, expect_rx_by;
 
-   expect_rx_by = jiffies + timo;
+   expect_rx_by = now + timo;
WRITE_ONCE(call->expect_rx_by, expect_rx_by);
rxrpc_reduce_call_timer(call, expect_rx_by, now,
rxrpc_timer_set_for_normal);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 7f1fc04775b3..6b9d27f0d7ec 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -414,6 +414,17 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct 
sk_buff *skb,

rxrpc_timer_set_for_lost_ack);
}
}
+
+   if (sp->hdr.seq == 1 &&
+   !test_and_set_bit(RXRPC_CALL_BEGAN_RX_TIMER,
+ >flags)) {
+   unsigned long nowj = jiffies, expect_rx_by;
+
+   expect_rx_by = nowj + call->next_rx_timo;
+   WRITE_ONCE(call->expect_rx_by, expect_rx_by);
+   rxrpc_reduce_call_timer(call, expect_rx_by, nowj,
+   rxrpc_timer_set_for_normal);
+   }
}
 
rxrpc_set_keepalive(call);
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 206e802ccbdc..be01f9c5d963 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -223,6 +223,15 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
 
ret = rxrpc_send_data_packet(call, skb, false);
if (ret < 0) {
+   switch (ret) {
+   case -ENETUNREACH:
+   case -EHOSTUNREACH:
+   case -ECONNREFUSED:
+   rxrpc_set_call_completion(call,
+ RXRPC_CALL_LOCAL_ERROR,
+ 0, ret);
+   goto out;
+   }
_debug("need instant resend %d", ret);
rxrpc_instant_resend(call, ix);
} else {
@@ -241,6 +250,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
rxrpc_timer_set_for_send);
}
 
+out:
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
_leave("");
 }



[PATCH net 2/5] rxrpc: Fix error reception on AF_INET6 sockets

2018-05-10 Thread David Howells
AF_RXRPC tries to turn on IP_RECVERR and IP_MTU_DISCOVER on the UDP socket
it just opened for communications with the outside world, regardless of the
type of socket.  Unfortunately, this doesn't work with an AF_INET6 socket.

Fix this by turning on IPV6_RECVERR and IPV6_MTU_DISCOVER instead if the
socket is of the AF_INET6 family.

Without this, kAFS server and address rotation doesn't work correctly
because the algorithm doesn't detect received network errors.

Fixes: 75b54cb57ca3 ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/local_object.c |   57 ++
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 8b54e9531d52..b493e6b62740 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -134,22 +134,49 @@ static int rxrpc_open_socket(struct rxrpc_local *local, 
struct net *net)
}
}
 
-   /* we want to receive ICMP errors */
-   opt = 1;
-   ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
-   (char *) , sizeof(opt));
-   if (ret < 0) {
-   _debug("setsockopt failed");
-   goto error;
-   }
+   switch (local->srx.transport.family) {
+   case AF_INET:
+   /* we want to receive ICMP errors */
+   opt = 1;
+   ret = kernel_setsockopt(local->socket, SOL_IP, IP_RECVERR,
+   (char *) , sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
 
-   /* we want to set the don't fragment bit */
-   opt = IP_PMTUDISC_DO;
-   ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
-   (char *) , sizeof(opt));
-   if (ret < 0) {
-   _debug("setsockopt failed");
-   goto error;
+   /* we want to set the don't fragment bit */
+   opt = IP_PMTUDISC_DO;
+   ret = kernel_setsockopt(local->socket, SOL_IP, IP_MTU_DISCOVER,
+   (char *) , sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+   break;
+
+   case AF_INET6:
+   /* we want to receive ICMP errors */
+   opt = 1;
+   ret = kernel_setsockopt(local->socket, SOL_IPV6, IPV6_RECVERR,
+   (char *) , sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+
+   /* we want to set the don't fragment bit */
+   opt = IPV6_PMTUDISC_DO;
+   ret = kernel_setsockopt(local->socket, SOL_IPV6, 
IPV6_MTU_DISCOVER,
+   (char *) , sizeof(opt));
+   if (ret < 0) {
+   _debug("setsockopt failed");
+   goto error;
+   }
+   break;
+
+   default:
+   BUG();
}
 
/* set the socket up */



[PATCH net 3/5] rxrpc: Fix the min security level for kernel calls

2018-05-10 Thread David Howells
Fix the kernel call initiation to set the minimum security level for kernel
initiated calls (such as from kAFS) from the sockopt value.

Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol 
info")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9a2c8e7c000e..2b463047dd7b 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -313,7 +313,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
memset(, 0, sizeof(cp));
cp.local= rx->local;
cp.key  = key;
-   cp.security_level   = 0;
+   cp.security_level   = rx->min_sec_level;
cp.exclusive= false;
cp.upgrade  = upgrade;
cp.service_id   = srx->srx_service;



[PATCH net 4/5] rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages

2018-05-10 Thread David Howells
Add a tracepoint to log received ICMP/ICMP6 events and other error
messages.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   30 +++
 net/rxrpc/peer_event.c   |   46 +-
 2 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9e96c2fe2793..497d0b67f421 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -15,6 +15,7 @@
 #define _TRACE_RXRPC_H
 
 #include 
+#include 
 
 /*
  * Define enums for tracing information.
@@ -1374,6 +1375,35 @@ TRACE_EVENT(rxrpc_resend,
  __entry->anno)
);
 
+TRACE_EVENT(rxrpc_rx_icmp,
+   TP_PROTO(struct rxrpc_peer *peer, struct sock_extended_err *ee,
+struct sockaddr_rxrpc *srx),
+
+   TP_ARGS(peer, ee, srx),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   peer)
+   __field_struct(struct sock_extended_err,ee  )
+   __field_struct(struct sockaddr_rxrpc,   srx )
+),
+
+   TP_fast_assign(
+   __entry->peer = peer->debug_id;
+   memcpy(&__entry->ee, ee, sizeof(__entry->ee));
+   memcpy(&__entry->srx, srx, sizeof(__entry->srx));
+  ),
+
+   TP_printk("P=%08x o=%u t=%u c=%u i=%u d=%u e=%d %pISp",
+ __entry->peer,
+ __entry->ee.ee_origin,
+ __entry->ee.ee_type,
+ __entry->ee.ee_code,
+ __entry->ee.ee_info,
+ __entry->ee.ee_data,
+ __entry->ee.ee_errno,
+ &__entry->srx.transport)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 78c2f95d1f22..0ed8b651cec2 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -28,39 +28,39 @@ static void rxrpc_store_error(struct rxrpc_peer *, struct 
sock_exterr_skb *);
  * Find the peer associated with an ICMP packet.
  */
 static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct rxrpc_local *local,
-const struct sk_buff *skb)
+const struct sk_buff *skb,
+struct sockaddr_rxrpc *srx)
 {
struct sock_exterr_skb *serr = SKB_EXT_ERR(skb);
-   struct sockaddr_rxrpc srx;
 
_enter("");
 
-   memset(, 0, sizeof(srx));
-   srx.transport_type = local->srx.transport_type;
-   srx.transport_len = local->srx.transport_len;
-   srx.transport.family = local->srx.transport.family;
+   memset(srx, 0, sizeof(*srx));
+   srx->transport_type = local->srx.transport_type;
+   srx->transport_len = local->srx.transport_len;
+   srx->transport.family = local->srx.transport.family;
 
/* Can we see an ICMP4 packet on an ICMP6 listening socket?  and vice
 * versa?
 */
-   switch (srx.transport.family) {
+   switch (srx->transport.family) {
case AF_INET:
-   srx.transport.sin.sin_port = serr->port;
+   srx->transport.sin.sin_port = serr->port;
switch (serr->ee.ee_origin) {
case SO_EE_ORIGIN_ICMP:
_net("Rx ICMP");
-   memcpy(_addr,
+   memcpy(>transport.sin.sin_addr,
   skb_network_header(skb) + serr->addr_offset,
   sizeof(struct in_addr));
break;
case SO_EE_ORIGIN_ICMP6:
_net("Rx ICMP6 on v4 sock");
-   memcpy(_addr,
+   memcpy(>transport.sin.sin_addr,
   skb_network_header(skb) + serr->addr_offset + 12,
   sizeof(struct in_addr));
break;
default:
-   memcpy(_addr, _hdr(skb)->saddr,
+   memcpy(>transport.sin.sin_addr, 
_hdr(skb)->saddr,
   sizeof(struct in_addr));
break;
}
@@ -68,25 +68,25 @@ static struct rxrpc_peer *rxrpc_lookup_peer_icmp_rcu(struct 
rxrpc_local *local,
 
 #ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
-   srx.transport.sin6.sin6_port = serr->port;
+   srx->transport.sin6.sin6_port = serr->port;
switch (serr->ee.ee_origin) {

[PATCH net 5/5] rxrpc: Trace UDP transmission failure

2018-05-10 Thread David Howells
Add a tracepoint to log transmission failure from the UDP transport socket
being used by AF_RXRPC.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   55 ++
 net/rxrpc/conn_event.c   |   11 ++--
 net/rxrpc/local_event.c  |3 ++
 net/rxrpc/output.c   |   23 --
 net/rxrpc/rxkad.c|6 +++--
 5 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 497d0b67f421..077e664ac9a2 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -211,6 +211,20 @@ enum rxrpc_congest_change {
rxrpc_cong_saw_nack,
 };
 
+enum rxrpc_tx_fail_trace {
+   rxrpc_tx_fail_call_abort,
+   rxrpc_tx_fail_call_ack,
+   rxrpc_tx_fail_call_data_frag,
+   rxrpc_tx_fail_call_data_nofrag,
+   rxrpc_tx_fail_call_final_resend,
+   rxrpc_tx_fail_conn_abort,
+   rxrpc_tx_fail_conn_challenge,
+   rxrpc_tx_fail_conn_response,
+   rxrpc_tx_fail_reject,
+   rxrpc_tx_fail_version_keepalive,
+   rxrpc_tx_fail_version_reply,
+};
+
 #endif /* end __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY */
 
 /*
@@ -438,6 +452,19 @@ enum rxrpc_congest_change {
EM(RXRPC_CALL_LOCAL_ERROR,  "LocalError") \
E_(RXRPC_CALL_NETWORK_ERROR,"NetError")
 
+#define rxrpc_tx_fail_traces \
+   EM(rxrpc_tx_fail_call_abort,"CallAbort") \
+   EM(rxrpc_tx_fail_call_ack,  "CallAck") \
+   EM(rxrpc_tx_fail_call_data_frag,"CallDataFrag") \
+   EM(rxrpc_tx_fail_call_data_nofrag,  "CallDataNofrag") \
+   EM(rxrpc_tx_fail_call_final_resend, "CallFinalResend") \
+   EM(rxrpc_tx_fail_conn_abort,"ConnAbort") \
+   EM(rxrpc_tx_fail_conn_challenge,"ConnChall") \
+   EM(rxrpc_tx_fail_conn_response, "ConnResp") \
+   EM(rxrpc_tx_fail_reject,"Reject") \
+   EM(rxrpc_tx_fail_version_keepalive, "VerKeepalive") \
+   E_(rxrpc_tx_fail_version_reply, "VerReply")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -461,6 +488,7 @@ rxrpc_propose_ack_traces;
 rxrpc_propose_ack_outcomes;
 rxrpc_congest_modes;
 rxrpc_congest_changes;
+rxrpc_tx_fail_traces;
 
 /*
  * Now redefine the EM() and E_() macros to map the enums to the strings that
@@ -1404,6 +1432,33 @@ TRACE_EVENT(rxrpc_rx_icmp,
  &__entry->srx.transport)
);
 
+TRACE_EVENT(rxrpc_tx_fail,
+   TP_PROTO(unsigned int debug_id, rxrpc_serial_t serial, int ret,
+enum rxrpc_tx_fail_trace what),
+
+   TP_ARGS(debug_id, serial, ret, what),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   debug_id)
+   __field(rxrpc_serial_t, serial  )
+   __field(int,ret )
+   __field(enum rxrpc_tx_fail_trace,   what)
+),
+
+   TP_fast_assign(
+   __entry->debug_id = debug_id;
+   __entry->serial = serial;
+   __entry->ret = ret;
+   __entry->what = what;
+  ),
+
+   TP_printk("c=%08x r=%x ret=%d %s",
+ __entry->debug_id,
+ __entry->serial,
+ __entry->ret,
+ __print_symbolic(__entry->what, rxrpc_tx_fail_traces))
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c717152070df..1350f1be8037 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -40,7 +40,7 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
} __attribute__((packed)) pkt;
struct rxrpc_ackinfo ack_info;
size_t len;
-   int ioc;
+   int ret, ioc;
u32 serial, mtu, call_id, padding;
 
_enter("%d", conn->debug_id);
@@ -135,10 +135,13 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
break;
}
 
-   kernel_sendmsg(conn->params.local->socket, , iov, ioc, len);
+   ret = kernel_sendmsg(conn->params.local->socket, , iov, ioc, len);
conn->params.peer->last_tx_at = ktime_get_real();
+   if (ret < 0)
+   trace_rxrpc_tx_fail(conn->debug_id, serial, ret,
+   rxrpc_tx_fail_call_final_resend);
+
_leave("");
-   return;
 }
 
 /*
@@ -236,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection 
*conn,

Re: simplify procfs code for seq_file instances V2

2018-04-30 Thread David Howells
Note that your kernel hits the:

inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(ptrval) (fs_reclaim){?.+.}, at: fs_reclaim_acquire+0x12/0x35
{HARDIRQ-ON-W} state was registered at:
  fs_reclaim_acquire+0x32/0x35
  kmem_cache_alloc_node_trace+0x49/0x2cf
  alloc_worker+0x1d/0x49
  init_rescuer.part.7+0x19/0x8f
  workqueue_init+0xc0/0x1fe
  kernel_init_freeable+0xdc/0x433
  kernel_init+0xa/0xf5
  ret_from_fork+0x24/0x30

bug, as described here:


https://groups.google.com/forum/#!msg/syzkaller-bugs/sJC3Y3hOM08/aO3z9JXoAgAJ

David


Re: [PATCH 20/39] afs: simplify procfs code

2018-04-30 Thread David Howells
Christoph Hellwig  wrote:

> I don't think you should need any of these.  seq_file_net or
> seq_file_single_net will return you the net_ns based on a struct
> seq_file.  And even from your write routines you can reach the
> seq_file in file->private pretty easily.

You've taken away things like single_open/release_net() which means I can't
supply my own fops and use the proc_net stuff.  I wonder if I should add a
write op to struct proc_dir_entry.

David


Re: [PATCH 04/40] proc: introduce proc_create_seq{,_data}

2018-04-30 Thread David Howells
Christoph Hellwig  wrote:

> +
> +struct proc_dir_entry *proc_create_seq_data(const char *name, umode_t mode,
> + struct proc_dir_entry *parent, const struct seq_operations *ops,
> + void *data)
> +{
> ...
> +EXPORT_SYMBOL(proc_create_seq_data);

Please add documentation comments to exported functions when you add them.

David


[PATCH 02/02] afs: Implement namespacing

2018-04-20 Thread David Howells
Implement namespacing within AFS, but don't yet let mounts occur outside
the init namespace.  An additional patch will be required propagate the
network namespace across automounts.

Signed-off-by: David Howells <dhowe...@redhat.com>
---
 cell.c |4 +-
 internal.h |   39 +++---
 main.c |   33 ++
 proc.c |   89 +++--
 super.c|   20 -
 5 files changed, 121 insertions(+), 64 deletions(-)

diff --git a/fs/afs/cell.c b/fs/afs/cell.c
index fdf4c36cff79..a98a8a3d5544 100644
--- a/fs/afs/cell.c
+++ b/fs/afs/cell.c
@@ -528,7 +528,7 @@ static int afs_activate_cell(struct afs_net *net, struct 
afs_cell *cell)
 NULL, 0,
 cell, 0, true);
 #endif
-   ret = afs_proc_cell_setup(net, cell);
+   ret = afs_proc_cell_setup(cell);
if (ret < 0)
return ret;
spin_lock(>proc_cells_lock);
@@ -544,7 +544,7 @@ static void afs_deactivate_cell(struct afs_net *net, struct 
afs_cell *cell)
 {
_enter("%s", cell->name);
 
-   afs_proc_cell_remove(net, cell);
+   afs_proc_cell_remove(cell);
 
spin_lock(>proc_cells_lock);
list_del_init(>proc_link);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f8086ec95e24..9863dac6ff6a 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 
 #include "afs.h"
@@ -40,7 +42,8 @@ struct afs_mount_params {
afs_voltype_t   type;   /* type of volume requested */
int volnamesz;  /* size of volume name */
const char  *volname;   /* name of volume to mount */
-   struct afs_net  *net;   /* Network namespace in effect 
*/
+   struct net  *net_ns;/* Network namespace in effect 
*/
+   struct afs_net  *net;   /* the AFS net namespace stuff 
*/
struct afs_cell *cell;  /* cell in which to find volume 
*/
struct afs_volume   *volume;/* volume record */
struct key  *key;   /* key to use for secure 
mounting */
@@ -189,7 +192,7 @@ struct afs_read {
  * - there's one superblock per volume
  */
 struct afs_super_info {
-   struct afs_net  *net;   /* Network namespace */
+   struct net  *net_ns;/* Network namespace */
struct afs_cell *cell;  /* The cell in which the volume 
resides */
struct afs_volume   *volume;/* volume record */
booldyn_root;   /* True if dynamic root */
@@ -218,6 +221,7 @@ struct afs_sysnames {
  * AFS network namespace record.
  */
 struct afs_net {
+   struct net  *net;   /* Backpointer to the owning 
net namespace */
struct afs_uuid uuid;
boollive;   /* F if this namespace is being 
removed */
 
@@ -280,7 +284,6 @@ struct afs_net {
 };
 
 extern const char afs_init_sysname[];
-extern struct afs_net __afs_net;// Dummy AFS network namespace; TODO: replace 
with real netns
 
 enum afs_cell_state {
AFS_CELL_UNSET,
@@ -787,34 +790,36 @@ extern int afs_drop_inode(struct inode *);
  * main.c
  */
 extern struct workqueue_struct *afs_wq;
+extern int afs_net_id;
 
-static inline struct afs_net *afs_d2net(struct dentry *dentry)
+static inline struct afs_net *afs_net(struct net *net)
 {
-   return &__afs_net;
+   return net_generic(net, afs_net_id);
 }
 
-static inline struct afs_net *afs_i2net(struct inode *inode)
+static inline struct afs_net *afs_sb2net(struct super_block *sb)
 {
-   return &__afs_net;
+   return afs_net(AFS_FS_S(sb)->net_ns);
 }
 
-static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
+static inline struct afs_net *afs_d2net(struct dentry *dentry)
 {
-   return &__afs_net;
+   return afs_sb2net(dentry->d_sb);
 }
 
-static inline struct afs_net *afs_sock2net(struct sock *sk)
+static inline struct afs_net *afs_i2net(struct inode *inode)
 {
-   return &__afs_net;
+   return afs_sb2net(inode->i_sb);
 }
 
-static inline struct afs_net *afs_get_net(struct afs_net *net)
+static inline struct afs_net *afs_v2net(struct afs_vnode *vnode)
 {
-   return net;
+   return afs_i2net(>vfs_inode);
 }
 
-static inline void afs_put_net(struct afs_net *net)
+static inline struct afs_net *afs_sock2net(struct sock *sk)
 {
+   return net_generic(sock_net(sk), afs_net_id);
 }
 
 static inline void __afs_stat(atomic_t *s)
@@ -849,8 +854,8 @@ extern int afs_get_ipv4_interfaces(struct afs_interface *, 
size_t, bool);
  */
 extern int __net_init afs_proc_init(struct afs_net *);
 extern void __net_exit afs_p

[PATCH 01/02] net: Export get_proc_net()

2018-04-20 Thread David Howells
Export get_proc_net() so that write() routines attached to net-namespaced
proc files can find the network namespace that they're in.  Currently, this
is only accessible via seqfile routines.

This will permit AFS to have a separate cell database per-network namespace
and to manage each one independently of the others.

Signed-off-by: David Howells <dhowe...@redhat.com>
---
 fs/proc/proc_net.c  |3 ++-
 include/linux/proc_fs.h |2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 1763f370489d..23d56146e38f 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -33,10 +33,11 @@ static inline struct net *PDE_NET(struct proc_dir_entry 
*pde)
return pde->parent->data;
 }
 
-static struct net *get_proc_net(const struct inode *inode)
+struct net *get_proc_net(const struct inode *inode)
 {
return maybe_get_net(PDE_NET(PDE(inode)));
 }
+EXPORT_SYMBOL_GPL(get_proc_net);
 
 int seq_open_net(struct inode *ino, struct file *f,
 const struct seq_operations *ops, int size)
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e4d912..47a87121d30c 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -79,6 +79,8 @@ static inline struct proc_dir_entry *proc_net_mkdir(
return proc_mkdir_data(name, 0, parent, net);
 }
 
+extern struct net *get_proc_net(const struct inode *inode);
+
 struct ns_common;
 int open_related_ns(struct ns_common *ns,
   struct ns_common *(*get_ns)(struct ns_common *ns));


Re: [PATCH 20/39] afs: simplify procfs code

2018-04-20 Thread David Howells
David Howells <dhowe...@redhat.com> wrote:

> > Use remove_proc_subtree to remove the whole subtree on cleanup, and
> > unwind the registration loop into individual calls.  Switch to use
> > proc_create_seq where applicable.
> 
> Note that this is likely going to clash with my patch to net-namespace all of
> the afs proc files:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context=f60c26c827c073583107ebf19d87bc5c0e71c3d2
> 
> If it helps, I should be able to disentangle this from the mount-api changes
> since the subsequent patch connects the dots to propagate the network
> namespace over automount using the new fs_context to do it.

Okay, I'll follow up this mail with a pair of patches to just use network
namespacing in AFS.  The first exports a function from core code only; the
second is the actual modifications to AFS.

Note that this doesn't actually yet permit AFS to use any other namespace.
The propagation-over-automount issue is handled by a separate patch.

David


Re: [PATCH 20/39] afs: simplify procfs code

2018-04-19 Thread David Howells
Christoph Hellwig  wrote:

> Use remove_proc_subtree to remove the whole subtree on cleanup, and
> unwind the registration loop into individual calls.  Switch to use
> proc_create_seq where applicable.

Note that this is likely going to clash with my patch to net-namespace all of
the afs proc files:


https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=mount-context=f60c26c827c073583107ebf19d87bc5c0e71c3d2

If it helps, I should be able to disentangle this from the mount-api changes
since the subsequent patch connects the dots to propagate the network
namespace over automount using the new fs_context to do it.

David


Re: linux-next: build failure after merge of the tip tree

2018-04-03 Thread David Howells
Stephen Rothwell  wrote:

> + wait_var_event(>nr_calls, !atomic_read(>nr_calls));

I would prefer == 0 to ! as it's not really a true/false value.

But apart from that, it's looks okay and you can add my Reviewed-by.

David


Re: linux-next: build failure after merge of the tip tree

2018-04-03 Thread David Howells
Peter Zijlstra  wrote:

> I figured that since there were only a handful of users it wasn't a
> popular API, also David very much knew of those patches changing it so
> could easily have pulled in the special tip/sched/wait branch :/

I'm not sure I could, since I have to base on net-next.  I'm not sure what
DaveM's policy on that is.

Also, it might've been better not to simply erase the atomic_t wait API
immediately, but substitute wrappers for it to be removed one iteration hence.

David


[PATCH net-next] rxrpc: Fix undefined packet handling

2018-04-02 Thread David Howells
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
should just be discarded rather than being aborted like other undefined
packet types.

Reported-by: Jeffrey Altman <jalt...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/input.c|6 ++
 net/rxrpc/protocol.h |6 ++
 2 files changed, 12 insertions(+)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 21800e6f5019..0410d2277ca2 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1200,6 +1200,12 @@ void rxrpc_data_ready(struct sock *udp_sk)
!rxrpc_validate_jumbo(skb))
goto bad_message;
break;
+
+   /* Packet types 9-11 should just be ignored. */
+   case RXRPC_PACKET_TYPE_PARAMS:
+   case RXRPC_PACKET_TYPE_10:
+   case RXRPC_PACKET_TYPE_11:
+   goto discard;
}
 
rcu_read_lock();
diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h
index 4bddcf3face3..93da73bf7098 100644
--- a/net/rxrpc/protocol.h
+++ b/net/rxrpc/protocol.h
@@ -46,6 +46,9 @@ struct rxrpc_wire_header {
 #define RXRPC_PACKET_TYPE_CHALLENGE6   /* connection security 
challenge (SRVR->CLNT) */
 #define RXRPC_PACKET_TYPE_RESPONSE 7   /* connection secutity response 
(CLNT->SRVR) */
 #define RXRPC_PACKET_TYPE_DEBUG8   /* debug info request */
+#define RXRPC_PACKET_TYPE_PARAMS   9   /* Parameter negotiation 
(unspec'd, ignore) */
+#define RXRPC_PACKET_TYPE_10   10  /* Ignored */
+#define RXRPC_PACKET_TYPE_11   11  /* Ignored */
 #define RXRPC_PACKET_TYPE_VERSION  13  /* version string request */
 #define RXRPC_N_PACKET_TYPES   14  /* number of packet types (incl 
type 0) */
 
@@ -78,6 +81,9 @@ struct rxrpc_wire_header {
(1 << RXRPC_PACKET_TYPE_CHALLENGE) |\
(1 << RXRPC_PACKET_TYPE_RESPONSE) | \
/*(1 << RXRPC_PACKET_TYPE_DEBUG) | */   \
+   (1 << RXRPC_PACKET_TYPE_PARAMS) |   \
+   (1 << RXRPC_PACKET_TYPE_10) |   \
+   (1 << RXRPC_PACKET_TYPE_11) |   \
(1 << RXRPC_PACKET_TYPE_VERSION))
 
 /*/



[PATCH net-next 00/12] rxrpc: Fixes and more traces

2018-03-30 Thread David Howells

Here are some patches that add some more tracepoints to AF_RXRPC and fix
some issues therein:

 (1) Fix the use of VERSION packets to keep firewall routes open.

 (2) Fix the incorrect current time usage in a tracepoint.

 (3) Fix Tx ring annotation corruption.

 (4) Fix accidental conversion of call-level abort into connection-level
 abort.

 (5) Fix calculation of resend time.

 (6) Remove a couple of unused variables.

 (7) Fix a bunch of checker warnings and an error.  Note that not all
 warnings can be quashed as checker doesn't seem to correctly handle
 seqlocks.

 (8) Fix a potential race between call destruction and socket/net
 destruction.

 (9) Add a tracepoint to track rxrpc_local refcounting.

(10) Fix an apparent leak of rxrpc_local objects.

(11) Add a tracepoint to track rxrpc_peer refcounting.

(12) Fix a leak of rxrpc_peer objects.

The patches are tagged here:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-next-20180330

and can also be found on this branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next

David
---
David Howells (10):
  rxrpc: Fix firewall route keepalive
  rxrpc: Fix a bit of time confusion
  rxrpc: Fix Tx ring annotation after initial Tx failure
  rxrpc: Don't treat call aborts as conn aborts
  rxrpc: Fix checker warnings and errors
  rxrpc: Fix potential call vs socket/net destruction race
  rxrpc: Add a tracepoint to track rxrpc_local refcounting
  rxrpc: Fix apparent leak of rxrpc_local objects
  rxrpc: Add a tracepoint to track rxrpc_peer refcounting
  rxrpc: Fix leak of rxrpc_peer objects

Marc Dionne (1):
  rxrpc: Fix resend event time calculation

Sebastian Andrzej Siewior (1):
  rxrpc: remove unused static variables


 include/trace/events/rxrpc.h |   85 
 net/rxrpc/af_rxrpc.c |6 +++
 net/rxrpc/ar-internal.h  |   68 +++--
 net/rxrpc/call_accept.c  |9 +++-
 net/rxrpc/call_event.c   |4 +-
 net/rxrpc/call_object.c  |   17 ++-
 net/rxrpc/conn_client.c  |3 +
 net/rxrpc/conn_event.c   |3 +
 net/rxrpc/conn_object.c  |   10 
 net/rxrpc/conn_service.c |1 
 net/rxrpc/input.c|   17 +--
 net/rxrpc/local_object.c |   65 +++-
 net/rxrpc/net_ns.c   |   24 ++
 net/rxrpc/output.c   |   59 +
 net/rxrpc/peer_event.c   |   98 ++
 net/rxrpc/peer_object.c  |   93 +++-
 net/rxrpc/proc.c |6 +++
 net/rxrpc/rxkad.c|2 +
 net/rxrpc/security.c |3 -
 net/rxrpc/sendmsg.c  |7 +++
 20 files changed, 509 insertions(+), 71 deletions(-)



[PATCH net-next 01/12] rxrpc: Fix firewall route keepalive

2018-03-30 Thread David Howells
Fix the firewall route keepalive part of AF_RXRPC which is currently
function incorrectly by replying to VERSION REPLY packets from the server
with VERSION REQUEST packets.

Instead, send VERSION REPLY packets to the peers of service connections to
act as keep-alives 20s after the latest packet was transmitted to that
peer.

Also, just discard VERSION REPLY packets rather than replying to them.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|4 ++
 net/rxrpc/ar-internal.h |   14 ++-
 net/rxrpc/conn_event.c  |3 +
 net/rxrpc/input.c   |2 +
 net/rxrpc/net_ns.c  |   21 ++
 net/rxrpc/output.c  |   59 -
 net/rxrpc/peer_event.c  |   96 +++
 net/rxrpc/peer_object.c |7 +++
 net/rxrpc/rxkad.c   |2 +
 9 files changed, 204 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index ec5ec68be1aa..0b3026b8fa40 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -762,6 +762,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket 
*sock,
 static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
int kern)
 {
+   struct rxrpc_net *rxnet;
struct rxrpc_sock *rx;
struct sock *sk;
 
@@ -801,6 +802,9 @@ static int rxrpc_create(struct net *net, struct socket 
*sock, int protocol,
rwlock_init(>call_lock);
memset(>srx, 0, sizeof(rx->srx));
 
+   rxnet = rxrpc_net(sock_net(>sk));
+   timer_reduce(>peer_keepalive_timer, jiffies + 1);
+
_leave(" = 0 [%p]", rx);
return 0;
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 21cf164b6d85..8a348e0a9d95 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -97,8 +97,16 @@ struct rxrpc_net {
struct list_headlocal_endpoints;
struct mutexlocal_mutex;/* Lock for ->local_endpoints */
 
-   spinlock_t  peer_hash_lock; /* Lock for ->peer_hash */
DECLARE_HASHTABLE   (peer_hash, 10);
+   spinlock_t  peer_hash_lock; /* Lock for ->peer_hash */
+
+#define RXRPC_KEEPALIVE_TIME 20 /* NAT keepalive time in seconds */
+   u8  peer_keepalive_cursor;
+   ktime_t peer_keepalive_base;
+   struct hlist_head   peer_keepalive[RXRPC_KEEPALIVE_TIME + 1];
+   struct hlist_head   peer_keepalive_new;
+   struct timer_list   peer_keepalive_timer;
+   struct work_struct  peer_keepalive_work;
 };
 
 /*
@@ -285,6 +293,8 @@ struct rxrpc_peer {
struct hlist_head   error_targets;  /* targets for net error 
distribution */
struct work_struct  error_distributor;
struct rb_root  service_conns;  /* Service connections */
+   struct hlist_node   keepalive_link; /* Link in 
net->peer_keepalive[] */
+   time64_tlast_tx_at; /* Last time packet sent here */
seqlock_t   service_conn_lock;
spinlock_t  lock;   /* access lock */
unsigned intif_mtu; /* interface MTU for this peer 
*/
@@ -1026,6 +1036,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *, bool, 
rxrpc_serial_t *);
 int rxrpc_send_abort_packet(struct rxrpc_call *);
 int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *, bool);
 void rxrpc_reject_packets(struct rxrpc_local *);
+void rxrpc_send_keepalive(struct rxrpc_peer *);
 
 /*
  * peer_event.c
@@ -1034,6 +1045,7 @@ void rxrpc_error_report(struct sock *);
 void rxrpc_peer_error_distributor(struct work_struct *);
 void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace,
rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t);
+void rxrpc_peer_keepalive_worker(struct work_struct *);
 
 /*
  * peer_object.c
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index d2ec3fd593e8..c717152070df 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -136,6 +136,7 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
}
 
kernel_sendmsg(conn->params.local->socket, , iov, ioc, len);
+   conn->params.peer->last_tx_at = ktime_get_real();
_leave("");
return;
 }
@@ -239,6 +240,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection 
*conn,
return -EAGAIN;
}
 
+   conn->params.peer->last_tx_at = ktime_get_real();
+
_leave(" = 0");
return 0;
 }
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 2a868fdab0ae..d4f2509e018b 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1183,6 +1183,8 @@ void rxrpc_data_ready(struct sock *udp_sk)
 
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_VERSION:
+  

[PATCH net-next 02/12] rxrpc: Fix a bit of time confusion

2018-03-30 Thread David Howells
The rxrpc_reduce_call_timer() function should be passed the 'current time'
in jiffies, not the current ktime time.  It's confusing in rxrpc_resend
because that has to deal with both.  Pass the correct current time in.

Note that this only affects the trace produced and not the functioning of
the code.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 6a62e42e1d8d..3dee89c7a06e 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -238,7 +238,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned 
long now_j)
 * retransmitting data.
 */
if (!retrans) {
-   rxrpc_reduce_call_timer(call, resend_at, now,
+   rxrpc_reduce_call_timer(call, resend_at, now_j,
rxrpc_timer_set_for_resend);
spin_unlock_bh(>lock);
ack_ts = ktime_sub(now, call->acks_latest_ts);



[PATCH net-next 03/12] rxrpc: Fix Tx ring annotation after initial Tx failure

2018-03-30 Thread David Howells
rxrpc calls have a ring of packets that are awaiting ACK or retransmission
and a parallel ring of annotations that tracks the state of those packets.
If the initial transmission of a packet on the underlying UDP socket fails
then the packet annotation is marked for resend - but the setting of this
mark accidentally erases the last-packet mark also stored in the same
annotation slot.  If this happens, a call won't switch out of the Tx phase
when all the packets have been transmitted.

Fix this by retaining the last-packet mark and only altering the packet
state.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/sendmsg.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 8503f279b467..783c777fc6e7 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -130,7 +130,9 @@ static inline void rxrpc_instant_resend(struct rxrpc_call 
*call, int ix)
spin_lock_bh(>lock);
 
if (call->state < RXRPC_CALL_COMPLETE) {
-   call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS;
+   call->rxtx_annotations[ix] =
+   (call->rxtx_annotations[ix] & RXRPC_TX_ANNO_LAST) |
+   RXRPC_TX_ANNO_RETRANS;
if (!test_and_set_bit(RXRPC_CALL_EV_RESEND, >events))
rxrpc_queue_call(call);
}



[PATCH net-next 04/12] rxrpc: Don't treat call aborts as conn aborts

2018-03-30 Thread David Howells
If a call-level abort is received for the previous call to complete on a
connection channel, then that abort is queued for the connection processor
to handle.  Unfortunately, the connection processor then assumes without
checking that the abort is connection-level (ie. callNumber is 0) and
distributes it over all active calls on that connection, thereby
incorrectly aborting them.

Fix this by discarding aborts aimed at a completed call.

Further, discard all packets aimed at a call that's complete if there's
currently an active call on a channel, since the DATA packets associated
with the new call automatically terminate the old call.

Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission 
from conn processor")
Reported-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/input.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index d4f2509e018b..21800e6f5019 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1242,16 +1242,19 @@ void rxrpc_data_ready(struct sock *udp_sk)
goto discard_unlock;
 
if (sp->hdr.callNumber == chan->last_call) {
-   /* For the previous service call, if completed 
successfully, we
-* discard all further packets.
+   if (chan->call ||
+   sp->hdr.type == RXRPC_PACKET_TYPE_ABORT)
+   goto discard_unlock;
+
+   /* For the previous service call, if completed
+* successfully, we discard all further packets.
 */
if (rxrpc_conn_is_service(conn) &&
-   (chan->last_type == RXRPC_PACKET_TYPE_ACK ||
-sp->hdr.type == RXRPC_PACKET_TYPE_ABORT))
+   chan->last_type == RXRPC_PACKET_TYPE_ACK)
goto discard_unlock;
 
-   /* But otherwise we need to retransmit the final packet 
from
-* data cached in the connection record.
+   /* But otherwise we need to retransmit the final packet
+* from data cached in the connection record.
 */
rxrpc_post_packet_to_conn(conn, skb);
goto out_unlock;



[PATCH net-next 07/12] rxrpc: Fix checker warnings and errors

2018-03-30 Thread David Howells
Fix various issues detected by checker.

Errors:

 (*) rxrpc_discard_prealloc() should be using rcu_assign_pointer to set
 call->socket.

Warnings:

 (*) rxrpc_service_connection_reaper() should be passing NULL rather than 0 to
 trace_rxrpc_conn() as the where argument.

 (*) rxrpc_disconnect_client_call() should get its net pointer via the
 call->conn rather than call->sock to avoid a warning about accessing
 an RCU pointer without protection.

 (*) Proc seq start/stop functions need annotation as they pass locks
 between the functions.

False positives:

 (*) Checker doesn't correctly handle of seq-retry lock context balance in
 rxrpc_find_service_conn_rcu().

 (*) Checker thinks execution may proceed past the BUG() in
 rxrpc_publish_service_conn().

 (*) Variable length array warnings from SKCIPHER_REQUEST_ON_STACK() in
 rxkad.c.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_accept.c |3 ++-
 net/rxrpc/call_object.c |1 +
 net/rxrpc/conn_client.c |2 +-
 net/rxrpc/conn_object.c |2 +-
 net/rxrpc/proc.c|6 ++
 net/rxrpc/sendmsg.c |2 ++
 6 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 92ebd1d7e0bb..4ce24c000653 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -225,7 +225,7 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
tail = b->call_backlog_tail;
while (CIRC_CNT(head, tail, size) > 0) {
struct rxrpc_call *call = b->call_backlog[tail];
-   call->socket = rx;
+   rcu_assign_pointer(call->socket, rx);
if (rx->discard_new_call) {
_debug("discard %lx", call->user_call_ID);
rx->discard_new_call(call, call->user_call_ID);
@@ -456,6 +456,7 @@ struct rxrpc_call *rxrpc_accept_call(struct rxrpc_sock *rx,
 unsigned long user_call_ID,
 rxrpc_notify_rx_t notify_rx)
__releases(>sk.sk_lock.slock)
+   __acquires(call->user_mutex)
 {
struct rxrpc_call *call;
struct rb_node *parent, **pp;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 147657dfe757..85b12c472522 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -219,6 +219,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
 gfp_t gfp,
 unsigned int debug_id)
__releases(>sk.sk_lock.slock)
+   __acquires(>user_mutex)
 {
struct rxrpc_call *call, *xcall;
struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk));
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 064175068059..041da40dbf93 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -776,7 +776,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
unsigned int channel = call->cid & RXRPC_CHANNELMASK;
struct rxrpc_connection *conn = call->conn;
struct rxrpc_channel *chan = >channels[channel];
-   struct rxrpc_net *rxnet = rxrpc_net(sock_net(>socket->sk));
+   struct rxrpc_net *rxnet = conn->params.local->rxnet;
 
trace_rxrpc_client(conn, channel, rxrpc_client_chan_disconnect);
call->conn = NULL;
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index ccbac190add1..bfc46fd69a62 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -418,7 +418,7 @@ void rxrpc_service_connection_reaper(struct work_struct 
*work)
 */
if (atomic_cmpxchg(>usage, 1, 0) != 1)
continue;
-   trace_rxrpc_conn(conn, rxrpc_conn_reap_service, 0, 0);
+   trace_rxrpc_conn(conn, rxrpc_conn_reap_service, 0, NULL);
 
if (rxrpc_conn_is_client(conn))
BUG();
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index f79f260c6ddc..7e45db058823 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -29,6 +29,8 @@ static const char *const 
rxrpc_conn_states[RXRPC_CONN__NR_STATES] = {
  * generate a list of extant and dead calls in /proc/net/rxrpc_calls
  */
 static void *rxrpc_call_seq_start(struct seq_file *seq, loff_t *_pos)
+   __acquires(rcu)
+   __acquires(rxnet->call_lock)
 {
struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
 
@@ -45,6 +47,8 @@ static void *rxrpc_call_seq_next(struct seq_file *seq, void 
*v, loff_t *pos)
 }
 
 static void rxrpc_call_seq_stop(struct seq_file *seq, void *v)
+   __releases(rxnet->call_lock)
+   __releases(rcu)
 {
struct rxrpc_net *rxnet = rxrpc_net(seq_file_net(seq));
 
@@ -135,6 +139,7 @@ const struct file_operations rxrpc_call_seq_fops = {
  * generate a list of extant virtual connect

[PATCH net-next 06/12] rxrpc: remove unused static variables

2018-03-30 Thread David Howells
From: Sebastian Andrzej Siewior <bige...@linutronix.de>

The rxrpc_security_methods and rxrpc_security_sem user has been removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.

Reported-by: kbuild test robot <fengguang...@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/security.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c
index e9f428351293..c4479afe8ae7 100644
--- a/net/rxrpc/security.c
+++ b/net/rxrpc/security.c
@@ -19,9 +19,6 @@
 #include 
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_security_methods);
-static DECLARE_RWSEM(rxrpc_security_sem);
-
 static const struct rxrpc_security *rxrpc_security_types[] = {
[RXRPC_SECURITY_NONE]   = _no_security,
 #ifdef CONFIG_RXKAD



[PATCH net-next 08/12] rxrpc: Fix potential call vs socket/net destruction race

2018-03-30 Thread David Howells
rxrpc_call structs don't pin sockets or network namespaces, but may attempt
to access both after their refcount reaches 0 so that they can detach
themselves from the network namespace.  However, there's no guarantee that
the socket still exists at this point (so sock_net(>socket->sk) may
be invalid) and the namespace may have gone away if the call isn't pinning
a peer.

Fix this by (a) carrying a net pointer in the rxrpc_call struct and (b)
waiting for all calls to be destroyed when the network namespace goes away.

This was detected by checker:

net/rxrpc/call_object.c:634:57: warning: incorrect type in argument 1 
(different address spaces)
net/rxrpc/call_object.c:634:57:expected struct sock const *sk
net/rxrpc/call_object.c:634:57:got struct sock [noderef] *

Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/ar-internal.h |2 ++
 net/rxrpc/call_accept.c |1 +
 net/rxrpc/call_object.c |   16 +---
 net/rxrpc/net_ns.c  |1 +
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 8a348e0a9d95..2a2b0fdfb157 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -75,6 +75,7 @@ struct rxrpc_net {
u32 epoch;  /* Local epoch for detecting 
local-end reset */
struct list_headcalls;  /* List of calls active in this 
namespace */
rwlock_tcall_lock;  /* Lock for ->calls */
+   atomic_tnr_calls;   /* Count of allocated calls */
 
struct list_headconn_proc_list; /* List of conns in this 
namespace for proc */
struct list_headservice_conns;  /* Service conns in this 
namespace */
@@ -528,6 +529,7 @@ struct rxrpc_call {
struct rxrpc_connection *conn;  /* connection carrying call */
struct rxrpc_peer   *peer;  /* Peer record for remote 
address */
struct rxrpc_sock __rcu *socket;/* socket responsible */
+   struct rxrpc_net*rxnet; /* Network namespace to which 
call belongs */
struct mutexuser_mutex; /* User access mutex */
unsigned long   ack_at; /* When deferred ACK needs to 
happen */
unsigned long   ack_lost_at;/* When ACK is figured as lost 
*/
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 4ce24c000653..493545033e42 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -138,6 +138,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
 
write_unlock(>call_lock);
 
+   rxnet = call->rxnet;
write_lock(>call_lock);
list_add_tail(>link, >calls);
write_unlock(>call_lock);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 85b12c472522..f721c2b7e234 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -103,6 +103,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, 
gfp_t gfp,
unsigned int debug_id)
 {
struct rxrpc_call *call;
+   struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk));
 
call = kmem_cache_zalloc(rxrpc_call_jar, gfp);
if (!call)
@@ -153,6 +154,9 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, 
gfp_t gfp,
 
call->cong_cwnd = 2;
call->cong_ssthresh = RXRPC_RXTX_BUFF_SIZE - 1;
+
+   call->rxnet = rxnet;
+   atomic_inc(>nr_calls);
return call;
 
 nomem_2:
@@ -222,7 +226,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
__acquires(>user_mutex)
 {
struct rxrpc_call *call, *xcall;
-   struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk));
+   struct rxrpc_net *rxnet;
struct rb_node *parent, **pp;
const void *here = __builtin_return_address(0);
int ret;
@@ -272,6 +276,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
 
write_unlock(>call_lock);
 
+   rxnet = call->rxnet;
write_lock(>call_lock);
list_add_tail(>link, >calls);
write_unlock(>call_lock);
@@ -617,7 +622,7 @@ void rxrpc_release_calls_on_socket(struct rxrpc_sock *rx)
  */
 void rxrpc_put_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
 {
-   struct rxrpc_net *rxnet;
+   struct rxrpc_net *rxnet = call->rxnet;
const void *here = __builtin_return_address(0);
int n;
 
@@ -631,7 +636,6 @@ void rxrpc_put_call(struct rxrpc_call *call, enum 
rxrpc_call_trace op)
ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE);
 
if (!list_empty(>link)) {
-   rxnet = rxrpc_net(sock_net(>socket->sk));
write_lock(>call_lock);
  

[PATCH net-next 05/12] rxrpc: Fix resend event time calculation

2018-03-30 Thread David Howells
From: Marc Dionne <marc.dio...@auristor.com>

Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation
for the next resend event.  For this calculation, "oldest" will be before
"now", so ktime_sub(oldest, now) will yield a negative value.  When passed
to nsecs_to_jiffies which expects an unsigned value, the end result will be
a very large value, and a resend event scheduled far into the future.  This
could cause calls to stall if some packets were lost.

Fix by ordering the arguments to ktime_sub correctly.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Signed-off-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 3dee89c7a06e..6e0d788b4dc4 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -226,7 +226,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned 
long now_j)
   ktime_to_ns(ktime_sub(skb->tstamp, 
max_age)));
}
 
-   resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(oldest, now)));
+   resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(now, oldest)));
resend_at += jiffies + rxrpc_resend_timeout;
WRITE_ONCE(call->resend_at, resend_at);
 



[PATCH net-next 11/12] rxrpc: Add a tracepoint to track rxrpc_peer refcounting

2018-03-30 Thread David Howells
Add a tracepoint to track reference counting on the rxrpc_peer struct.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   42 +++
 net/rxrpc/ar-internal.h  |   23 +++
 net/rxrpc/peer_event.c   |2 +
 net/rxrpc/peer_object.c  |   65 +-
 4 files changed, 110 insertions(+), 22 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 0410dfeb79c6..9e96c2fe2793 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -50,6 +50,14 @@ enum rxrpc_local_trace {
rxrpc_local_queued,
 };
 
+enum rxrpc_peer_trace {
+   rxrpc_peer_got,
+   rxrpc_peer_new,
+   rxrpc_peer_processing,
+   rxrpc_peer_put,
+   rxrpc_peer_queued_error,
+};
+
 enum rxrpc_conn_trace {
rxrpc_conn_got,
rxrpc_conn_new_client,
@@ -230,6 +238,13 @@ enum rxrpc_congest_change {
EM(rxrpc_local_put, "PUT") \
E_(rxrpc_local_queued,  "QUE")
 
+#define rxrpc_peer_traces \
+   EM(rxrpc_peer_got,  "GOT") \
+   EM(rxrpc_peer_new,  "NEW") \
+   EM(rxrpc_peer_processing,   "PRO") \
+   EM(rxrpc_peer_put,  "PUT") \
+   E_(rxrpc_peer_queued_error, "QER")
+
 #define rxrpc_conn_traces \
EM(rxrpc_conn_got,  "GOT") \
EM(rxrpc_conn_new_client,   "NWc") \
@@ -482,6 +497,33 @@ TRACE_EVENT(rxrpc_local,
  __entry->where)
);
 
+TRACE_EVENT(rxrpc_peer,
+   TP_PROTO(struct rxrpc_peer *peer, enum rxrpc_peer_trace op,
+int usage, const void *where),
+
+   TP_ARGS(peer, op, usage, where),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   peer)
+   __field(int,op  )
+   __field(int,usage   )
+   __field(const void *,   where   )
+),
+
+   TP_fast_assign(
+   __entry->peer = peer->debug_id;
+   __entry->op = op;
+   __entry->usage = usage;
+   __entry->where = where;
+  ),
+
+   TP_printk("P=%08x %s u=%d sp=%pSR",
+ __entry->peer,
+ __print_symbolic(__entry->op, rxrpc_peer_traces),
+ __entry->usage,
+ __entry->where)
+   );
+
 TRACE_EVENT(rxrpc_conn,
TP_PROTO(struct rxrpc_connection *conn, enum rxrpc_conn_trace op,
 int usage, const void *where),
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index d40d54b78567..c46583bc255d 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1041,25 +1041,10 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local 
*,
 struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t);
 struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *,
  struct rxrpc_peer *);
-
-static inline struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *peer)
-{
-   atomic_inc(>usage);
-   return peer;
-}
-
-static inline
-struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *peer)
-{
-   return atomic_inc_not_zero(>usage) ? peer : NULL;
-}
-
-extern void __rxrpc_put_peer(struct rxrpc_peer *peer);
-static inline void rxrpc_put_peer(struct rxrpc_peer *peer)
-{
-   if (peer && atomic_dec_and_test(>usage))
-   __rxrpc_put_peer(peer);
-}
+struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *);
+struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *);
+void rxrpc_put_peer(struct rxrpc_peer *);
+void __rxrpc_queue_peer_error(struct rxrpc_peer *);
 
 /*
  * proc.c
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index d01eb9a06448..78c2f95d1f22 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -192,7 +192,7 @@ void rxrpc_error_report(struct sock *sk)
rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
 
/* The ref we obtained is passed off to the work item */
-   rxrpc_queue_work(>error_distributor);
+   __rxrpc_queue_peer_error(peer);
_leave("");
 }
 
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index 94a6dbfcf129..a4a750aea1e5 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -386,9 +386,54 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local 
*local,
 }
 
 /*
- * Discard a ref on a remote peer record.
+ * Get a ref on a peer record.
  */
-void __rxrpc_put_peer(struct rxrpc_peer *peer)
+struc

[PATCH net-next 10/12] rxrpc: Fix apparent leak of rxrpc_local objects

2018-03-30 Thread David Howells
rxrpc_local objects cannot be disposed of until all the connections that
point to them have been RCU'd as a connection object holds refcount on the
local endpoint it is communicating through.  Currently, this can cause an
assertion failure to occur when a network namespace is destroyed as there's
no check that the RCU destructors for the connections have been run before
we start trying to destroy local endpoints.

The kernel reports:

rxrpc: AF_RXRPC: Leaked local 36a41bc1 {5}
[ cut here ]
kernel BUG at ../net/rxrpc/local_object.c:439!

Fix this by keeping a count of the live connections and waiting for it to
go to zero at the end of rxrpc_destroy_all_connections().

Fixes: dee46364ce6f ("rxrpc: Add RCU destruction for connections and calls")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/ar-internal.h  |1 +
 net/rxrpc/call_accept.c  |2 ++
 net/rxrpc/conn_client.c  |1 +
 net/rxrpc/conn_object.c  |8 
 net/rxrpc/conn_service.c |1 +
 net/rxrpc/net_ns.c   |1 +
 6 files changed, 14 insertions(+)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index cc51d3eb0548..d40d54b78567 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -77,6 +77,7 @@ struct rxrpc_net {
rwlock_tcall_lock;  /* Lock for ->calls */
atomic_tnr_calls;   /* Count of allocated calls */
 
+   atomic_tnr_conns;
struct list_headconn_proc_list; /* List of conns in this 
namespace for proc */
struct list_headservice_conns;  /* Service conns in this 
namespace */
rwlock_tconn_lock;  /* Lock for ->conn_proc_list, 
->service_conns */
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 5a9b1d916124..f67017dcb25e 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -219,6 +219,8 @@ void rxrpc_discard_prealloc(struct rxrpc_sock *rx)
list_del(>proc_link);
write_unlock(>conn_lock);
kfree(conn);
+   if (atomic_dec_and_test(>nr_conns))
+   wake_up_atomic_t(>nr_conns);
tail = (tail + 1) & (size - 1);
}
 
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 041da40dbf93..5736f643c516 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -207,6 +207,7 @@ rxrpc_alloc_client_connection(struct rxrpc_conn_parameters 
*cp, gfp_t gfp)
if (ret < 0)
goto error_2;
 
+   atomic_inc(>nr_conns);
write_lock(>conn_lock);
list_add_tail(>proc_link, >conn_proc_list);
write_unlock(>conn_lock);
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index bfc46fd69a62..0950ee3d26f5 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -365,6 +365,9 @@ static void rxrpc_destroy_connection(struct rcu_head *rcu)
key_put(conn->params.key);
key_put(conn->server_key);
rxrpc_put_peer(conn->params.peer);
+
+   if (atomic_dec_and_test(>params.local->rxnet->nr_conns))
+   wake_up_atomic_t(>params.local->rxnet->nr_conns);
rxrpc_put_local(conn->params.local);
 
kfree(conn);
@@ -458,6 +461,7 @@ void rxrpc_destroy_all_connections(struct rxrpc_net *rxnet)
 
_enter("");
 
+   atomic_dec(>nr_conns);
rxrpc_destroy_all_client_connections(rxnet);
 
del_timer_sync(>service_conn_reap_timer);
@@ -475,5 +479,9 @@ void rxrpc_destroy_all_connections(struct rxrpc_net *rxnet)
 
ASSERT(list_empty(>conn_proc_list));
 
+   /* We need to wait for the connections to be destroyed by RCU as they
+* pin things that we still need to get rid of.
+*/
+   wait_on_atomic_t(>nr_conns, atomic_t_wait, TASK_UNINTERRUPTIBLE);
_leave("");
 }
diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index f6fcdb3130a1..80773a50c755 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -132,6 +132,7 @@ struct rxrpc_connection 
*rxrpc_prealloc_service_connection(struct rxrpc_net *rxn
conn->state = RXRPC_CONN_SERVICE_PREALLOC;
atomic_set(>usage, 2);
 
+   atomic_inc(>nr_conns);
write_lock(>conn_lock);
list_add_tail(>link, >service_conns);
list_add_tail(>proc_link, >conn_proc_list);
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
index 101019b0be34..fa9ce60e7bfa 100644
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -57,6 +57,7 @@ static __net_init int rxrpc_init_net(struct net *net)
rwlock_init(>call_lock);
atomic_set(>nr_calls, 1);
 
+   atomic_set(>nr_conns, 1);
INIT_LIST_HEAD(>conn_proc_list);
INIT_LIST_HEAD(>service_conns);
rwlock_init(>conn_lock);



[PATCH net-next 09/12] rxrpc: Add a tracepoint to track rxrpc_local refcounting

2018-03-30 Thread David Howells
Add a tracepoint to track reference counting on the rxrpc_local struct.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   43 
 net/rxrpc/ar-internal.h  |   27 +++--
 net/rxrpc/call_accept.c  |3 +-
 net/rxrpc/local_object.c |   65 +-
 4 files changed, 111 insertions(+), 27 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 2ea788f6f95d..0410dfeb79c6 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -42,6 +42,14 @@ enum rxrpc_skb_trace {
rxrpc_skb_tx_seen,
 };
 
+enum rxrpc_local_trace {
+   rxrpc_local_got,
+   rxrpc_local_new,
+   rxrpc_local_processing,
+   rxrpc_local_put,
+   rxrpc_local_queued,
+};
+
 enum rxrpc_conn_trace {
rxrpc_conn_got,
rxrpc_conn_new_client,
@@ -215,6 +223,13 @@ enum rxrpc_congest_change {
EM(rxrpc_skb_tx_rotated,"Tx ROT") \
E_(rxrpc_skb_tx_seen,   "Tx SEE")
 
+#define rxrpc_local_traces \
+   EM(rxrpc_local_got, "GOT") \
+   EM(rxrpc_local_new, "NEW") \
+   EM(rxrpc_local_processing,  "PRO") \
+   EM(rxrpc_local_put, "PUT") \
+   E_(rxrpc_local_queued,  "QUE")
+
 #define rxrpc_conn_traces \
EM(rxrpc_conn_got,  "GOT") \
EM(rxrpc_conn_new_client,   "NWc") \
@@ -416,6 +431,7 @@ enum rxrpc_congest_change {
 #define E_(a, b) TRACE_DEFINE_ENUM(a);
 
 rxrpc_skb_traces;
+rxrpc_local_traces;
 rxrpc_conn_traces;
 rxrpc_client_traces;
 rxrpc_call_traces;
@@ -439,6 +455,33 @@ rxrpc_congest_changes;
 #define EM(a, b)   { a, b },
 #define E_(a, b)   { a, b }
 
+TRACE_EVENT(rxrpc_local,
+   TP_PROTO(struct rxrpc_local *local, enum rxrpc_local_trace op,
+int usage, const void *where),
+
+   TP_ARGS(local, op, usage, where),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   local   )
+   __field(int,op  )
+   __field(int,usage   )
+   __field(const void *,   where   )
+),
+
+   TP_fast_assign(
+   __entry->local = local->debug_id;
+   __entry->op = op;
+   __entry->usage = usage;
+   __entry->where = where;
+  ),
+
+   TP_printk("L=%08x %s u=%d sp=%pSR",
+ __entry->local,
+ __print_symbolic(__entry->op, rxrpc_local_traces),
+ __entry->usage,
+ __entry->where)
+   );
+
 TRACE_EVENT(rxrpc_conn,
TP_PROTO(struct rxrpc_connection *conn, enum rxrpc_conn_trace op,
 int usage, const void *where),
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 2a2b0fdfb157..cc51d3eb0548 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -981,31 +981,12 @@ extern void rxrpc_process_local_events(struct rxrpc_local 
*);
  * local_object.c
  */
 struct rxrpc_local *rxrpc_lookup_local(struct net *, const struct 
sockaddr_rxrpc *);
-void __rxrpc_put_local(struct rxrpc_local *);
+struct rxrpc_local *rxrpc_get_local(struct rxrpc_local *);
+struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *);
+void rxrpc_put_local(struct rxrpc_local *);
+void rxrpc_queue_local(struct rxrpc_local *);
 void rxrpc_destroy_all_locals(struct rxrpc_net *);
 
-static inline void rxrpc_get_local(struct rxrpc_local *local)
-{
-   atomic_inc(>usage);
-}
-
-static inline
-struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local)
-{
-   return atomic_inc_not_zero(>usage) ? local : NULL;
-}
-
-static inline void rxrpc_put_local(struct rxrpc_local *local)
-{
-   if (local && atomic_dec_and_test(>usage))
-   __rxrpc_put_local(local);
-}
-
-static inline void rxrpc_queue_local(struct rxrpc_local *local)
-{
-   rxrpc_queue_work(>processor);
-}
-
 /*
  * misc.c
  */
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 493545033e42..5a9b1d916124 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -296,8 +296,7 @@ static struct rxrpc_call *rxrpc_alloc_incoming_call(struct 
rxrpc_sock *rx,
b->conn_backlog[conn_tail] = NULL;
smp_store_release(>conn_backlog_tail,
  (conn_tail + 1) & (RXRPC_BACKLOG_MAX - 1));
-   rxrpc_get_local(local);
-   conn->params.local = local;
+   conn->params.local = 

[PATCH net-next 12/12] rxrpc: Fix leak of rxrpc_peer objects

2018-03-30 Thread David Howells
When a new client call is requested, an rxrpc_conn_parameters struct object
is passed in with a bunch of parameters set, such as the local endpoint to
use.  A pointer to the target peer record is also placed in there by
rxrpc_get_client_conn() - and this is removed if and only if a new
connection object is allocated.  Thus it leaks if a new connection object
isn't allocated.

Fix this by putting any peer object attached to the rxrpc_conn_parameters
object in the function that allocated it.

Fixes: 19ffa01c9c45 ("rxrpc: Use structs to hold connection params and protocol 
info")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|2 ++
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/net_ns.c  |1 +
 net/rxrpc/peer_object.c |   21 +
 net/rxrpc/sendmsg.c |1 +
 5 files changed, 26 insertions(+)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0b3026b8fa40..9a2c8e7c000e 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -324,6 +324,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
mutex_unlock(>user_mutex);
}
 
+   rxrpc_put_peer(cp.peer);
_leave(" = %p", call);
return call;
 }
@@ -447,6 +448,7 @@ int rxrpc_kernel_retry_call(struct socket *sock, struct 
rxrpc_call *call,
ret = rxrpc_retry_client_call(rx, call, , srx, GFP_KERNEL);
 
mutex_unlock(>user_mutex);
+   rxrpc_put_peer(cp.peer);
_leave(" = %d", ret);
return ret;
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index c46583bc255d..90d7079e0aa9 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1041,6 +1041,7 @@ struct rxrpc_peer *rxrpc_lookup_peer(struct rxrpc_local *,
 struct rxrpc_peer *rxrpc_alloc_peer(struct rxrpc_local *, gfp_t);
 struct rxrpc_peer *rxrpc_lookup_incoming_peer(struct rxrpc_local *,
  struct rxrpc_peer *);
+void rxrpc_destroy_all_peers(struct rxrpc_net *);
 struct rxrpc_peer *rxrpc_get_peer(struct rxrpc_peer *);
 struct rxrpc_peer *rxrpc_get_peer_maybe(struct rxrpc_peer *);
 void rxrpc_put_peer(struct rxrpc_peer *);
diff --git a/net/rxrpc/net_ns.c b/net/rxrpc/net_ns.c
index fa9ce60e7bfa..c7a023fb22d0 100644
--- a/net/rxrpc/net_ns.c
+++ b/net/rxrpc/net_ns.c
@@ -118,6 +118,7 @@ static __net_exit void rxrpc_exit_net(struct net *net)
cancel_work_sync(>peer_keepalive_work);
rxrpc_destroy_all_calls(rxnet);
rxrpc_destroy_all_connections(rxnet);
+   rxrpc_destroy_all_peers(rxnet);
rxrpc_destroy_all_locals(rxnet);
proc_remove(rxnet->proc_net);
 }
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index a4a750aea1e5..1b7e8107b3ae 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -463,6 +463,27 @@ void rxrpc_put_peer(struct rxrpc_peer *peer)
}
 }
 
+/*
+ * Make sure all peer records have been discarded.
+ */
+void rxrpc_destroy_all_peers(struct rxrpc_net *rxnet)
+{
+   struct rxrpc_peer *peer;
+   int i;
+
+   for (i = 0; i < HASH_SIZE(rxnet->peer_hash); i++) {
+   if (hlist_empty(>peer_hash[i]))
+   continue;
+
+   hlist_for_each_entry(peer, >peer_hash[i], hash_link) {
+   pr_err("Leaked peer %u {%u} %pISp\n",
+  peer->debug_id,
+  atomic_read(>usage),
+  >srx.transport);
+   }
+   }
+}
+
 /**
  * rxrpc_kernel_get_peer - Get the peer address of a call
  * @sock: The socket on which the call is in progress.
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index a62980a80151..206e802ccbdc 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -586,6 +586,7 @@ rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, 
struct msghdr *msg,
 atomic_inc_return(_debug_id));
/* The socket is now unlocked */
 
+   rxrpc_put_peer(cp.peer);
_leave(" = %p\n", call);
return call;
 }



Re: [PATCH net-next 0/6] rxrpc: Fixes

2018-03-29 Thread David Howells
David Miller  wrote:

> David, this GIT URL has tons of unrelated changes.  It seems to bring in
> the parts of Linus's tree that haven't proagated to 'net' yet.

Sorry about that, I rebased on the wrong branch by accident.

I've got some more fixes.  Should I just give the lot to you to pull into your
net-next tree, given that the merge window may well open Sunday?

David


[PATCH net-next 0/3] rxrpc: Tracing updates

2018-03-27 Thread David Howells

Here are some patches that update tracing in AF_RXRPC and AFS:

 (1) Add a tracepoint for tracking resend events.

 (2) Use debug_ids in traces rather than pointers (as pointers are now hashed)
 and allow use of the same debug_id in AFS calls as in the corresponding
 AF_RXRPC calls.  This makes filtering the trace output much easier.

 (3) Add a tracepoint for tracking call completion.

The patches are tagged here:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-next-20180327

and can also be found on this branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next

David
---
David Howells (3):
  rxrpc: Trace resend
  rxrpc, afs: Use debug_ids rather than pointers in traces
  rxrpc: Trace call completion


 fs/afs/internal.h|1 
 fs/afs/rxrpc.c   |   12 ++
 include/net/af_rxrpc.h   |   11 ++
 include/trace/events/afs.h   |   69 ++
 include/trace/events/rxrpc.h |  206 +++---
 net/rxrpc/af_rxrpc.c |7 +
 net/rxrpc/ar-internal.h  |9 +-
 net/rxrpc/call_accept.c  |   18 ++--
 net/rxrpc/call_event.c   |1 
 net/rxrpc/call_object.c  |   15 ++-
 net/rxrpc/conn_event.c   |3 -
 net/rxrpc/input.c|6 +
 net/rxrpc/sendmsg.c  |3 -
 13 files changed, 219 insertions(+), 142 deletions(-)



[PATCH net-next 1/3] rxrpc: Trace resend

2018-03-27 Thread David Howells
Add a tracepoint to trace packet resend events and to dump the Tx
annotation buffer for added illumination.

Signed-off-by: David Howells <dhowe...@rdhat.com>
---

 include/trace/events/rxrpc.h |   24 
 net/rxrpc/call_event.c   |1 +
 2 files changed, 25 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 36cb50c111a6..41979f907575 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -420,6 +420,7 @@ rxrpc_rtt_rx_traces;
 rxrpc_timer_traces;
 rxrpc_propose_ack_traces;
 rxrpc_propose_ack_outcomes;
+rxrpc_congest_modes;
 rxrpc_congest_changes;
 
 /*
@@ -1229,6 +1230,29 @@ TRACE_EVENT(rxrpc_connect_call,
  __entry->call_id)
);
 
+TRACE_EVENT(rxrpc_resend,
+   TP_PROTO(struct rxrpc_call *call, int ix),
+
+   TP_ARGS(call, ix),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(int,ix  )
+   __array(u8, anno, 64)
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->ix = ix;
+   memcpy(__entry->anno, call->rxtx_annotations, 64);
+  ),
+
+   TP_printk("c=%p ix=%u a=%64phN",
+ __entry->call,
+ __entry->ix,
+ __entry->anno)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index ad2ab1103189..6a62e42e1d8d 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -195,6 +195,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned 
long now_j)
 * the packets in the Tx buffer we're going to resend and what the new
 * resend timeout will be.
 */
+   trace_rxrpc_resend(call, (cursor + 1) & RXRPC_RXTX_BUFF_MASK);
oldest = now;
for (seq = cursor + 1; before_eq(seq, top); seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;



[PATCH net-next 2/3] rxrpc, afs: Use debug_ids rather than pointers in traces

2018-03-27 Thread David Howells
In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful.  Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 fs/afs/internal.h|1 
 fs/afs/rxrpc.c   |   12 ++-
 include/net/af_rxrpc.h   |   11 ++-
 include/trace/events/afs.h   |   69 ---
 include/trace/events/rxrpc.h |  155 +-
 net/rxrpc/af_rxrpc.c |7 +-
 net/rxrpc/ar-internal.h  |8 +-
 net/rxrpc/call_accept.c  |   18 +++--
 net/rxrpc/call_object.c  |   15 ++--
 net/rxrpc/conn_event.c   |3 +
 net/rxrpc/input.c|6 +-
 net/rxrpc/sendmsg.c  |3 +
 12 files changed, 163 insertions(+), 145 deletions(-)

diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index f38d6a561a84..72217170b155 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -118,6 +118,7 @@ struct afs_call {
boolret_reply0; /* T if should return reply[0] 
on success */
boolupgrade;/* T to request service upgrade 
*/
u16 service_id; /* Actual service ID (after 
upgrade) */
+   unsigned intdebug_id;   /* Trace ID */
u32 operation_ID;   /* operation ID for an incoming 
call */
u32 count;  /* count for use in 
unmarshalling */
__be32  tmp;/* place to extract temporary 
data */
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index e1126659f043..b819900916e6 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -131,6 +131,7 @@ static struct afs_call *afs_alloc_call(struct afs_net *net,
 
call->type = type;
call->net = net;
+   call->debug_id = atomic_inc_return(_debug_id);
atomic_set(>usage, 1);
INIT_WORK(>async_work, afs_process_async_call);
init_waitqueue_head(>waitq);
@@ -169,11 +170,12 @@ void afs_put_call(struct afs_call *call)
afs_put_server(call->net, call->cm_server);
afs_put_cb_interest(call->net, call->cbi);
kfree(call->request);
-   kfree(call);
 
-   o = atomic_dec_return(>nr_outstanding_calls);
trace_afs_call(call, afs_call_trace_free, 0, o,
   __builtin_return_address(0));
+   kfree(call);
+
+   o = atomic_dec_return(>nr_outstanding_calls);
if (o == 0)
wake_up_atomic_t(>nr_outstanding_calls);
}
@@ -378,7 +380,8 @@ long afs_make_call(struct afs_addr_cursor *ac, struct 
afs_call *call,
 (async ?
  afs_wake_up_async_call :
  afs_wake_up_call_waiter),
-call->upgrade);
+call->upgrade,
+call->debug_id);
if (IS_ERR(rxcall)) {
ret = PTR_ERR(rxcall);
goto error_kill_call;
@@ -727,7 +730,8 @@ void afs_charge_preallocation(struct work_struct *work)
   afs_wake_up_async_call,
   afs_rx_attach,
   (unsigned long)call,
-  GFP_KERNEL) < 0)
+  GFP_KERNEL,
+  call->debug_id) < 0)
break;
call = NULL;
}
diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h
index 2b3a6eec4570..8ae8ee004258 100644
--- a/include/net/af_rxrpc.h
+++ b/include/net/af_rxrpc.h
@@ -31,6 +31,11 @@ enum rxrpc_call_completion {
NR__RXRPC_CALL_COMPLETIONS
 };
 
+/*
+ * Debug ID counter for tracing.
+ */
+extern atomic_t rxrpc_debug_id;
+
 typedef void (*rxrpc_notify_rx_t)(struct sock *, struct rxrpc_call *,
  unsigned long);
 typedef void (*rxrpc_notify_end_tx_t)(struct sock *, struct rxrpc_call *,
@@ -50,7 +55,8 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *,
   s64,
   gfp_t,
   rxrpc_notify_rx_t,
-  

[PATCH net-next 3/3] rxrpc: Trace call completion

2018-03-27 Thread David Howells
Add a tracepoint to track rxrpc calls moving into the completed state and
to log the completion type and the recorded error value and abort code.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   33 +
 net/rxrpc/ar-internal.h  |1 +
 2 files changed, 34 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 4d2c2d35c5cb..2ea788f6f95d 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -400,6 +400,13 @@ enum rxrpc_congest_change {
EM(RXRPC_ACK_IDLE,  "IDL") \
E_(RXRPC_ACK__INVALID,  "-?-")
 
+#define rxrpc_completions \
+   EM(RXRPC_CALL_SUCCEEDED,"Succeeded") \
+   EM(RXRPC_CALL_REMOTELY_ABORTED, "RemoteAbort") \
+   EM(RXRPC_CALL_LOCALLY_ABORTED,  "LocalAbort") \
+   EM(RXRPC_CALL_LOCAL_ERROR,  "LocalError") \
+   E_(RXRPC_CALL_NETWORK_ERROR,"NetError")
+
 /*
  * Export enum symbols via userspace.
  */
@@ -624,6 +631,32 @@ TRACE_EVENT(rxrpc_abort,
  __entry->abort_code, __entry->error, __entry->why)
);
 
+TRACE_EVENT(rxrpc_call_complete,
+   TP_PROTO(struct rxrpc_call *call),
+
+   TP_ARGS(call),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   call)
+   __field(enum rxrpc_call_completion, compl   )
+   __field(int,error   )
+   __field(u32,abort_code  )
+),
+
+   TP_fast_assign(
+   __entry->call = call->debug_id;
+   __entry->compl = call->completion;
+   __entry->error = call->error;
+   __entry->abort_code = call->abort_code;
+  ),
+
+   TP_printk("c=%08x %s r=%d ac=%d",
+ __entry->call,
+ __print_symbolic(__entry->compl, rxrpc_completions),
+ __entry->error,
+ __entry->abort_code)
+   );
+
 TRACE_EVENT(rxrpc_transmit,
TP_PROTO(struct rxrpc_call *call, enum rxrpc_transmit_trace why),
 
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 9c9817ddafc5..21cf164b6d85 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -778,6 +778,7 @@ static inline bool __rxrpc_set_call_completion(struct 
rxrpc_call *call,
call->error = error;
call->completion = compl,
call->state = RXRPC_CALL_COMPLETE;
+   trace_rxrpc_call_complete(call);
wake_up(>waitq);
return true;
}



Re: [PATCH net-next 0/6] rxrpc: Fixes

2018-03-27 Thread David Howells
Sorry, that should be net, not net-next, in the subject line.

David


[PATCH net-next 0/6] rxrpc: Fixes

2018-03-27 Thread David Howells

Here are six patches for AF_RXRPC:

 (1) Fix the use of VERSION packets to keep firewall routes open.

 (2) Fix the incorrect current time usage in a tracepoint.

 (3) Fix Tx ring annotation corruption.

 (4) Fix accidental conversion of call-level abort into connection-level
 abort.

 (5) Fix calculation of resend time.

 (6) Remove a couple of unused variables.

The patches are tagged here:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20180327

and can also be found on the following branch:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David
---
David Howells (4):
  rxrpc: Fix firewall route keepalive
  rxrpc: Fix a bit of time confusion
  rxrpc: Fix Tx ring annotation after initial Tx failure
  rxrpc: Don't treat call aborts as conn aborts

Marc Dionne (1):
  rxrpc: Fix resend event time calculation

Sebastian Andrzej Siewior (1):
  rxrpc: remove unused static variables


 net/rxrpc/af_rxrpc.c|4 ++
 net/rxrpc/ar-internal.h |   14 ++-
 net/rxrpc/call_event.c  |4 +-
 net/rxrpc/conn_event.c  |3 +
 net/rxrpc/input.c   |   17 +---
 net/rxrpc/net_ns.c  |   21 ++
 net/rxrpc/output.c  |   59 -
 net/rxrpc/peer_event.c  |   96 +++
 net/rxrpc/peer_object.c |7 +++
 net/rxrpc/rxkad.c   |2 +
 net/rxrpc/security.c|3 -
 net/rxrpc/sendmsg.c |4 +-
 12 files changed, 218 insertions(+), 16 deletions(-)



[PATCH net 1/6] rxrpc: Fix firewall route keepalive

2018-03-27 Thread David Howells
Fix the firewall route keepalive part of AF_RXRPC which is currently
function incorrectly by replying to VERSION REPLY packets from the server
with VERSION REQUEST packets.

Instead, send VERSION REPLY packets to the peers of service connections to
act as keep-alives 20s after the latest packet was transmitted to that
peer.

Also, just discard VERSION REPLY packets rather than replying to them.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|4 ++
 net/rxrpc/ar-internal.h |   14 ++-
 net/rxrpc/conn_event.c  |3 +
 net/rxrpc/input.c   |2 +
 net/rxrpc/net_ns.c  |   21 ++
 net/rxrpc/output.c  |   59 -
 net/rxrpc/peer_event.c  |   96 +++
 net/rxrpc/peer_object.c |7 +++
 net/rxrpc/rxkad.c   |2 +
 9 files changed, 204 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0c9c18aa7c77..eca50b495021 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -759,6 +759,7 @@ static __poll_t rxrpc_poll(struct file *file, struct socket 
*sock,
 static int rxrpc_create(struct net *net, struct socket *sock, int protocol,
int kern)
 {
+   struct rxrpc_net *rxnet;
struct rxrpc_sock *rx;
struct sock *sk;
 
@@ -798,6 +799,9 @@ static int rxrpc_create(struct net *net, struct socket 
*sock, int protocol,
rwlock_init(>call_lock);
memset(>srx, 0, sizeof(rx->srx));
 
+   rxnet = rxrpc_net(sock_net(>sk));
+   timer_reduce(>peer_keepalive_timer, jiffies + 1);
+
_leave(" = 0 [%p]", rx);
return 0;
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 416688381eb7..352ef8d546f9 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -97,8 +97,16 @@ struct rxrpc_net {
struct list_headlocal_endpoints;
struct mutexlocal_mutex;/* Lock for ->local_endpoints */
 
-   spinlock_t  peer_hash_lock; /* Lock for ->peer_hash */
DECLARE_HASHTABLE   (peer_hash, 10);
+   spinlock_t  peer_hash_lock; /* Lock for ->peer_hash */
+
+#define RXRPC_KEEPALIVE_TIME 20 /* NAT keepalive time in seconds */
+   u8  peer_keepalive_cursor;
+   ktime_t peer_keepalive_base;
+   struct hlist_head   peer_keepalive[RXRPC_KEEPALIVE_TIME + 1];
+   struct hlist_head   peer_keepalive_new;
+   struct timer_list   peer_keepalive_timer;
+   struct work_struct  peer_keepalive_work;
 };
 
 /*
@@ -285,6 +293,8 @@ struct rxrpc_peer {
struct hlist_head   error_targets;  /* targets for net error 
distribution */
struct work_struct  error_distributor;
struct rb_root  service_conns;  /* Service connections */
+   struct hlist_node   keepalive_link; /* Link in 
net->peer_keepalive[] */
+   time64_tlast_tx_at; /* Last time packet sent here */
seqlock_t   service_conn_lock;
spinlock_t  lock;   /* access lock */
unsigned intif_mtu; /* interface MTU for this peer 
*/
@@ -1025,6 +1035,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *, bool, 
rxrpc_serial_t *);
 int rxrpc_send_abort_packet(struct rxrpc_call *);
 int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *, bool);
 void rxrpc_reject_packets(struct rxrpc_local *);
+void rxrpc_send_keepalive(struct rxrpc_peer *);
 
 /*
  * peer_event.c
@@ -1033,6 +1044,7 @@ void rxrpc_error_report(struct sock *);
 void rxrpc_peer_error_distributor(struct work_struct *);
 void rxrpc_peer_add_rtt(struct rxrpc_call *, enum rxrpc_rtt_rx_trace,
rxrpc_serial_t, rxrpc_serial_t, ktime_t, ktime_t);
+void rxrpc_peer_keepalive_worker(struct work_struct *);
 
 /*
  * peer_object.c
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index b1dfae107431..5f06dc8266bf 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -136,6 +136,7 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
}
 
kernel_sendmsg(conn->params.local->socket, , iov, ioc, len);
+   conn->params.peer->last_tx_at = ktime_get_real();
_leave("");
return;
 }
@@ -238,6 +239,8 @@ static int rxrpc_abort_connection(struct rxrpc_connection 
*conn,
return -EAGAIN;
}
 
+   conn->params.peer->last_tx_at = ktime_get_real();
+
_leave(" = 0");
return 0;
 }
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 6fc61400337f..31bf954480b6 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1183,6 +1183,8 @@ void rxrpc_data_ready(struct sock *udp_sk)
 
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_VERSION:
+  

[PATCH net 4/6] rxrpc: Don't treat call aborts as conn aborts

2018-03-27 Thread David Howells
If a call-level abort is received for the previous call to complete on a
connection channel, then that abort is queued for the connection processor
to handle.  Unfortunately, the connection processor then assumes without
checking that the abort is connection-level (ie. callNumber is 0) and
distributes it over all active calls on that connection, thereby
incorrectly aborting them.

Fix this by discarding aborts aimed at a completed call.

Further, discard all packets aimed at a call that's complete if there's
currently an active call on a channel, since the DATA packets associated
with the new call automatically terminate the old call.

Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission 
from conn processor")
Reported-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/input.c |   15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 31bf954480b6..846ec0938953 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1242,16 +1242,19 @@ void rxrpc_data_ready(struct sock *udp_sk)
goto discard_unlock;
 
if (sp->hdr.callNumber == chan->last_call) {
-   /* For the previous service call, if completed 
successfully, we
-* discard all further packets.
+   if (chan->call ||
+   sp->hdr.type == RXRPC_PACKET_TYPE_ABORT)
+   goto discard_unlock;
+
+   /* For the previous service call, if completed
+* successfully, we discard all further packets.
 */
if (rxrpc_conn_is_service(conn) &&
-   (chan->last_type == RXRPC_PACKET_TYPE_ACK ||
-sp->hdr.type == RXRPC_PACKET_TYPE_ABORT))
+   chan->last_type == RXRPC_PACKET_TYPE_ACK)
goto discard_unlock;
 
-   /* But otherwise we need to retransmit the final packet 
from
-* data cached in the connection record.
+   /* But otherwise we need to retransmit the final packet
+* from data cached in the connection record.
 */
rxrpc_post_packet_to_conn(conn, skb);
goto out_unlock;



[PATCH net 2/6] rxrpc: Fix a bit of time confusion

2018-03-27 Thread David Howells
The rxrpc_reduce_call_timer() function should be passed the 'current time'
in jiffies, not the current ktime time.  It's confusing in rxrpc_resend
because that has to deal with both.  Pass the correct current time in.

Note that this only affects the trace produced and not the functioning of
the code.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index ad2ab1103189..27f77b7d0ead 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -237,7 +237,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned 
long now_j)
 * retransmitting data.
 */
if (!retrans) {
-   rxrpc_reduce_call_timer(call, resend_at, now,
+   rxrpc_reduce_call_timer(call, resend_at, now_j,
rxrpc_timer_set_for_resend);
spin_unlock_bh(>lock);
ack_ts = ktime_sub(now, call->acks_latest_ts);



[PATCH net 3/6] rxrpc: Fix Tx ring annotation after initial Tx failure

2018-03-27 Thread David Howells
rxrpc calls have a ring of packets that are awaiting ACK or retransmission
and a parallel ring of annotations that tracks the state of those packets.
If the initial transmission of a packet on the underlying UDP socket fails
then the packet annotation is marked for resend - but the setting of this
mark accidentally erases the last-packet mark also stored in the same
annotation slot.  If this happens, a call won't switch out of the Tx phase
when all the packets have been transmitted.

Fix this by retaining the last-packet mark and only altering the packet
state.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/sendmsg.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 09f2a3e05221..7a94ce92ffdc 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -130,7 +130,9 @@ static inline void rxrpc_instant_resend(struct rxrpc_call 
*call, int ix)
spin_lock_bh(>lock);
 
if (call->state < RXRPC_CALL_COMPLETE) {
-   call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS;
+   call->rxtx_annotations[ix] =
+   (call->rxtx_annotations[ix] & RXRPC_TX_ANNO_LAST) |
+   RXRPC_TX_ANNO_RETRANS;
if (!test_and_set_bit(RXRPC_CALL_EV_RESEND, >events))
rxrpc_queue_call(call);
}



[PATCH net 5/6] rxrpc: Fix resend event time calculation

2018-03-27 Thread David Howells
From: Marc Dionne <marc.dio...@auristor.com>

Commit a158bdd3 ("rxrpc: Fix call timeouts") reworked the time calculation
for the next resend event.  For this calculation, "oldest" will be before
"now", so ktime_sub(oldest, now) will yield a negative value.  When passed
to nsecs_to_jiffies which expects an unsigned value, the end result will be
a very large value, and a resend event scheduled far into the future.  This
could cause calls to stall if some packets were lost.

Fix by ordering the arguments to ktime_sub correctly.

Fixes: a158bdd3247b ("rxrpc: Fix call timeouts")
Signed-off-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 27f77b7d0ead..87ec697ca117 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -225,7 +225,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned 
long now_j)
   ktime_to_ns(ktime_sub(skb->tstamp, 
max_age)));
}
 
-   resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(oldest, now)));
+   resend_at = nsecs_to_jiffies(ktime_to_ns(ktime_sub(now, oldest)));
resend_at += jiffies + rxrpc_resend_timeout;
WRITE_ONCE(call->resend_at, resend_at);
 



[PATCH net 6/6] rxrpc: remove unused static variables

2018-03-27 Thread David Howells
From: Sebastian Andrzej Siewior <bige...@linutronix.de>

The rxrpc_security_methods and rxrpc_security_sem user has been removed
in 648af7fca159 ("rxrpc: Absorb the rxkad security module"). This was
noticed by kbuild test robot for the -RT tree but is also true for !RT.

Reported-by: kbuild test robot <fengguang...@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bige...@linutronix.de>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/security.c |3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/rxrpc/security.c b/net/rxrpc/security.c
index e9f428351293..c4479afe8ae7 100644
--- a/net/rxrpc/security.c
+++ b/net/rxrpc/security.c
@@ -19,9 +19,6 @@
 #include 
 #include "ar-internal.h"
 
-static LIST_HEAD(rxrpc_security_methods);
-static DECLARE_RWSEM(rxrpc_security_sem);
-
 static const struct rxrpc_security *rxrpc_security_types[] = {
[RXRPC_SECURITY_NONE]   = _no_security,
 #ifdef CONFIG_RXKAD



Re: [PATCH net-next nfs 6/6] net: Convert rxrpc_net_ops

2018-03-20 Thread David Howells
Kirill Tkhai <ktk...@virtuozzo.com> wrote:

> These pernet_operations modifies rxrpc_net_id-pointed
> per-net entities. There is external link to AF_RXRPC
> in fs/afs/Kconfig, but it seems there is no other
> pernet_operations interested in that per-net entities.

The fs/afs/ namespacing stuff isn't active yet as I'm trying to decide on how
to deal with the DNS cache which also needs namespacing.

> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>

DaveM: Can you take this into net-next and add:

Acked-by: David Howells <dhowe...@redhat.com>


Re: [PATCH 00/16] remove eight obsolete architectures

2018-03-15 Thread David Howells
Do we have anything left that still implements NOMMU?

David


Re: [PATCH v2] KEYS: DNS: limit the length of option strings

2018-03-07 Thread David Howells
Eric Biggers  wrote:

> Fix it by limiting option strings (combined name + value) to a much more
> reasonable 128 bytes.  The exact limit is arbitrary, but currently the
> only recognized option is formatted as "dnserror=%lu" which fits well
> within this limit.

There will be more options coming ("ipv4", "ipv6") but they shouldn't overrun
this limit and we can always extend the limit if need be.

David


[PATCH net] rxrpc: Fix send in rxrpc_send_data_packet()

2018-02-22 Thread David Howells
All the kernel_sendmsg() calls in rxrpc_send_data_packet() need to send
both parts of the iov[] buffer, but one of them does not.  Fix it so that
it does.

Without this, short IPv6 rxrpc DATA packets may be seen that have the rxrpc
header included, but no payload.

Fixes: 5a924b8951f8 ("rxrpc: Don't store the rxrpc header in the Tx queue 
sk_buffs")
Reported-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/output.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 42410e910aff..cf73dc006c3b 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -445,7 +445,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct 
sk_buff *skb,
(char *), sizeof(opt));
if (ret == 0) {
ret = kernel_sendmsg(conn->params.local->socket, ,
-iov, 1, iov[0].iov_len);
+iov, 2, len);
 
opt = IPV6_PMTUDISC_DO;
kernel_setsockopt(conn->params.local->socket,



[PATCH net] rxrpc: Work around usercopy check

2018-02-15 Thread David Howells
Due to a check recently added to copy_to_user(), it's now not permitted to
copy from slab-held data to userspace unless the slab is whitelisted.  This
affects rxrpc_recvmsg() when it attempts to place an RXRPC_USER_CALL_ID
control message in the userspace control message buffer.  A warning is
generated by usercopy_warn() because the source is the copy of the
user_call_ID retained in the rxrpc_call struct.

Work around the issue by copying the user_call_ID to a variable on the
stack and passing that to put_cmsg().

The warning generated looks like:

Bad or missing usercopy whitelist? Kernel memory exposure attempt 
detected from SLUB object 'dmaengine-unmap-128' (offset 680, size 8)!
WARNING: CPU: 0 PID: 1401 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
...
RIP: 0010:usercopy_warn+0x7e/0xa0
...
Call Trace:
 __check_object_size+0x9c/0x1a0
 put_cmsg+0x98/0x120
 rxrpc_recvmsg+0x6fc/0x1010 [rxrpc]
 ? finish_wait+0x80/0x80
 ___sys_recvmsg+0xf8/0x240
 ? __clear_rsb+0x25/0x3d
 ? __clear_rsb+0x15/0x3d
 ? __clear_rsb+0x25/0x3d
 ? __clear_rsb+0x15/0x3d
 ? __clear_rsb+0x25/0x3d
 ? __clear_rsb+0x15/0x3d
 ? __clear_rsb+0x25/0x3d
 ? __clear_rsb+0x15/0x3d
 ? finish_task_switch+0xa6/0x2b0
 ? trace_hardirqs_on_caller+0xed/0x180
 ? _raw_spin_unlock_irq+0x29/0x40
 ? __sys_recvmsg+0x4e/0x90
 __sys_recvmsg+0x4e/0x90
 do_syscall_64+0x7a/0x220
 entry_SYSCALL_64_after_hwframe+0x26/0x9b

Reported-by: Jonathan Billings <jsbilli...@jsbillings.org>
Signed-off-by: David Howells <dhowe...@redhat.com>
Acked-by: Kees Cook <keesc...@chromium.org>
Tested-by: Jonathan Billings <jsbilli...@jsbillings.org>
---

 net/rxrpc/recvmsg.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index cc21e8db25b0..9d45d8b56744 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -517,9 +517,10 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, 
size_t len,
ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID,
   sizeof(unsigned int), );
} else {
+   unsigned long idl = call->user_call_ID;
+
ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID,
-  sizeof(unsigned long),
-  >user_call_ID);
+  sizeof(unsigned long), );
}
if (ret < 0)
goto error_unlock_call;



[PATCH net] rxrpc: Don't put crypto buffers on the stack

2018-02-08 Thread David Howells
Don't put buffers of data to be handed to crypto on the stack as this may
cause an assertion failure in the kernel (see below).  Fix this by using an
kmalloc'd buffer instead.

kernel BUG at ./include/linux/scatterlist.h:147!
...
RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
RSP: 0018:be2fc06cfca8 EFLAGS: 00010246
RAX:  RBX: 989277d59900 RCX: 0028
RDX: 259dc06cfd88 RSI: 0025 RDI: be30406cfd88
RBP: be2fc06cfd60 R08: be2fc06cfd08 R09: be2fc06cfd08
R10:  R11:  R12: 17c5f80d9f95
R13: be2fc06cfd88 R14: 98927a3f7aa0 R15: be2fc06cfd08
FS:  () GS:98927fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 55b1ff28f0f8 CR3: 1b412003 CR4: 003606f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
 rxrpc_process_connection+0xd1/0x690 [rxrpc]
 ? process_one_work+0x1c3/0x680
 ? __lock_is_held+0x59/0xa0
 process_one_work+0x249/0x680
 worker_thread+0x3a/0x390
 ? process_one_work+0x680/0x680
 kthread+0x121/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x3a/0x50

Reported-by: Jonathan Billings <jsbilli...@jsbillings.org>
Reported-by: Marc Dionne <marc.dio...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
Tested-by: Jonathan Billings <jsbilli...@jsbillings.org>
---

 net/rxrpc/conn_event.c |1 +
 net/rxrpc/rxkad.c  |   92 +++-
 2 files changed, 52 insertions(+), 41 deletions(-)

diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 4ca11be6be3c..b1dfae107431 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -460,6 +460,7 @@ void rxrpc_process_connection(struct work_struct *work)
case -EKEYEXPIRED:
case -EKEYREJECTED:
goto protocol_error;
+   case -ENOMEM:
case -EAGAIN:
goto requeue_and_leave;
case -ECONNABORTED:
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index c38b3a1de56c..77cb23c7bd0a 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -773,8 +773,7 @@ static int rxkad_respond_to_challenge(struct 
rxrpc_connection *conn,
 {
const struct rxrpc_key_token *token;
struct rxkad_challenge challenge;
-   struct rxkad_response resp
-   __attribute__((aligned(8))); /* must be aligned for crypto */
+   struct rxkad_response *resp;
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
const char *eproto;
u32 version, nonce, min_level, abort_code;
@@ -818,26 +817,29 @@ static int rxkad_respond_to_challenge(struct 
rxrpc_connection *conn,
token = conn->params.key->payload.data[0];
 
/* build the response packet */
-   memset(, 0, sizeof(resp));
-
-   resp.version= htonl(RXKAD_VERSION);
-   resp.encrypted.epoch= htonl(conn->proto.epoch);
-   resp.encrypted.cid  = htonl(conn->proto.cid);
-   resp.encrypted.securityIndex= htonl(conn->security_ix);
-   resp.encrypted.inc_nonce= htonl(nonce + 1);
-   resp.encrypted.level= htonl(conn->params.security_level);
-   resp.kvno   = htonl(token->kad->kvno);
-   resp.ticket_len = htonl(token->kad->ticket_len);
-
-   resp.encrypted.call_id[0] = htonl(conn->channels[0].call_counter);
-   resp.encrypted.call_id[1] = htonl(conn->channels[1].call_counter);
-   resp.encrypted.call_id[2] = htonl(conn->channels[2].call_counter);
-   resp.encrypted.call_id[3] = htonl(conn->channels[3].call_counter);
+   resp = kzalloc(sizeof(struct rxkad_response), GFP_NOFS);
+   if (!resp)
+   return -ENOMEM;
+
+   resp->version   = htonl(RXKAD_VERSION);
+   resp->encrypted.epoch   = htonl(conn->proto.epoch);
+   resp->encrypted.cid = htonl(conn->proto.cid);
+   resp->encrypted.securityIndex   = htonl(conn->security_ix);
+   resp->encrypted.inc_nonce   = htonl(nonce + 1);
+   resp->encrypted.level   = htonl(conn->params.security_level);
+   resp->kvno  = htonl(token->kad->kvno);
+   resp->ticket_len= htonl(token->kad->ticket_len);
+   resp->encrypted.call_id[0]  = htonl(conn->channels[0].call_counter);
+   resp->encrypted.call_id[1]  = htonl(conn->channels[1].call_counter);
+   resp->encrypted.call_id[2]  = htonl(conn->channels[2].call_counter);
+   resp->encrypted.call_id[3]  = htonl(conn->

[PATCH net] rxrpc: Fix received abort handling

2018-02-06 Thread David Howells
AF_RXRPC is incorrectly sending back to the server any abort it receives
for a client connection.  This is due to the final-ACK offload to the
connection event processor patch.  The abort code is copied into the
last-call information on the connection channel and then the event
processor is set.

Instead, the following should be done:

 (1) In the case of a final-ACK for a successful call, the ACK should be
 scheduled as before.

 (2) In the case of a locally generated ABORT, the ABORT details should be
 cached for sending in response to further packets related to that
 call and no further action scheduled at call disconnect time.

 (3) In the case of an ACK received from the peer, the call should be
 considered dead, no ABORT should be transmitted at this time.  In
 response to further non-ABORT packets from the peer relating to this
 call, an RX_USER_ABORT ABORT should be transmitted.

 (4) In the case of a call killed due to network error, an RX_USER_ABORT
 ABORT should be cached for transmission in response to further
 packets, but no ABORT should be sent at this time.

Fixes: 3136ef49a14c ("rxrpc: Delay terminal ACK transmission on a client call")
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/conn_client.c |3 ++-
 net/rxrpc/conn_object.c |   16 
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 7f74ca3059f8..064175068059 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -834,7 +834,8 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
 * can be skipped if we find a follow-on call.  The first DATA packet
 * of the follow on call will implicitly ACK this call.
 */
-   if (test_bit(RXRPC_CALL_EXPOSED, >flags)) {
+   if (call->completion == RXRPC_CALL_SUCCEEDED &&
+   test_bit(RXRPC_CALL_EXPOSED, >flags)) {
unsigned long final_ack_at = jiffies + 2;
 
WRITE_ONCE(chan->final_ack_at, final_ack_at);
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index c628351eb900..ccbac190add1 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -177,13 +177,21 @@ void __rxrpc_disconnect_call(struct rxrpc_connection 
*conn,
 * through the channel, whilst disposing of the actual call 
record.
 */
trace_rxrpc_disconnect_call(call);
-   if (call->abort_code) {
-   chan->last_abort = call->abort_code;
-   chan->last_type = RXRPC_PACKET_TYPE_ABORT;
-   } else {
+   switch (call->completion) {
+   case RXRPC_CALL_SUCCEEDED:
chan->last_seq = call->rx_hard_ack;
chan->last_type = RXRPC_PACKET_TYPE_ACK;
+   break;
+   case RXRPC_CALL_LOCALLY_ABORTED:
+   chan->last_abort = call->abort_code;
+   chan->last_type = RXRPC_PACKET_TYPE_ABORT;
+   break;
+   default:
+   chan->last_abort = RX_USER_ABORT;
+   chan->last_type = RXRPC_PACKET_TYPE_ABORT;
+   break;
}
+
/* Sync with rxrpc_conn_retransmit(). */
smp_wmb();
chan->last_call = chan->call_id;



[PATCH net] rxrpc: Fix the MAINTAINERS record

2017-12-01 Thread David Howells
Fix the MAINTAINERS record so that it's more obvious who the maintainer for
AF_RXRPC is.

Reported-by: Joe Perches <j...@perches.com>
Reported-by: David Miller <da...@davemloft.net>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 MAINTAINERS |   18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77d819b458a9..511b858405bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -554,13 +554,13 @@ S:Orphan
 F: Documentation/filesystems/affs.txt
 F: fs/affs/
 
-AFS FILESYSTEM & AF_RXRPC SOCKET DOMAIN
+AFS FILESYSTEM
 M: David Howells <dhowe...@redhat.com>
 L: linux-...@lists.infradead.org
 S: Supported
 F: fs/afs/
-F: include/net/af_rxrpc.h
-F: net/rxrpc/af_rxrpc.c
+F: include/trace/events/afs.h
+F: Documentation/filesystems/afs.txt
 W: https://www.infradead.org/~dhowells/kafs/
 
 AGPGART DRIVER
@@ -11777,6 +11777,18 @@ T: git 
git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git rtl8xxxu-deve
 S: Maintained
 F: drivers/net/wireless/realtek/rtl8xxxu/
 
+RXRPC SOCKETS (AF_RXRPC)
+M: David Howells <dhowe...@redhat.com>
+L: linux-...@lists.infradead.org
+S: Supported
+F: net/rxrpc/
+F: include/keys/rxrpc-type.h
+F: include/net/af_rxrpc.h
+F: include/trace/events/rxrpc.h
+F: include/uapi/linux/rxrpc.h
+F: Documentation/networking/rxrpc.txt
+W: https://www.infradead.org/~dhowells/kafs/
+
 S3 SAVAGE FRAMEBUFFER DRIVER
 M: Antonino Daplas <adap...@gmail.com>
 L: linux-fb...@vger.kernel.org



[PATCH net] rxrpc: Use correct netns source in rxrpc_release_sock()

2017-12-01 Thread David Howells
In rxrpc_release_sock() there may be no rx->local value to access, so we
can't unconditionally follow it to the rxrpc network namespace information
to poke the connection reapers.

Instead, use the socket's namespace pointer to find the namespace.

This unfixed code causes the following static checker warning:

net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock()
error: we previously assumed 'rx->local' could be null (see line 887)

Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers")
Reported-by: Dan Carpenter <dan.carpen...@oracle.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 8f7cf4c042be..dcd818fa837e 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -860,6 +860,7 @@ static void rxrpc_sock_destructor(struct sock *sk)
 static int rxrpc_release_sock(struct sock *sk)
 {
struct rxrpc_sock *rx = rxrpc_sk(sk);
+   struct rxrpc_net *rxnet = rxrpc_net(sock_net(>sk));
 
_enter("%p{%d,%d}", sk, sk->sk_state, refcount_read(>sk_refcnt));
 
@@ -895,8 +896,8 @@ static int rxrpc_release_sock(struct sock *sk)
rxrpc_release_calls_on_socket(rx);
flush_workqueue(rxrpc_workqueue);
rxrpc_purge_queue(>sk_receive_queue);
-   rxrpc_queue_work(>local->rxnet->service_conn_reaper);
-   rxrpc_queue_work(>local->rxnet->client_conn_reaper);
+   rxrpc_queue_work(>service_conn_reaper);
+   rxrpc_queue_work(>client_conn_reaper);
 
rxrpc_put_local(rx->local);
rx->local = NULL;



Re: [PATCH] rxrpc: Neaten logging macros and add KERN_DEBUG logging level

2017-11-30 Thread David Howells
Joe Perches <j...@perches.com> wrote:

> There is no listed rxrpc maintainer.

There's a script in the kernel called get_maintainer.pl which you might find
of use:

warthog>./scripts/get_maintainer.pl net/rxrpc/
"David S. Miller" <da...@davemloft.net> (maintainer:NETWORKING 
[GENERAL],commit_signer:23/74=31%)
David Howells <dhowe...@redhat.com> 
(commit_signer:64/74=86%,authored:60/74=81%)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-ker...@vger.kernel.org (open list)

It would seem a good idea to cc me, as the one with the largest "authored"
also.

David


Re: [PATCH net-next 0/3] rxrpc: Fixes

2017-11-29 Thread David Howells
David Miller  wrote:

> This email says "net-next", yet your patches say "net".

Sorry about that - it should be 'net'.  I copied an old cover note.  All the
patches have a macro substitution, but the cover note does not.  Do you want
me to repost?

David


[PATCH net-next 0/3] rxrpc: Fixes

2017-11-29 Thread David Howells

Here are three patches for AF_RXRPC.  One removes some whitespace, one
fixes terminal ACK generation and the third makes a couple of places
actually use the timeout value just determined rather than ignoring it.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20171129

David
---
David Howells (2):
  rxrpc: Clean up whitespace
  rxrpc: Fix ACK generation from the connection event processor

Gustavo A. R. Silva (1):
  rxrpc: Fix variable overwrite


 net/rxrpc/call_event.c  |4 ++--
 net/rxrpc/conn_event.c  |   50 +++
 net/rxrpc/conn_object.c |2 +-
 net/rxrpc/input.c   |4 ++--
 net/rxrpc/sendmsg.c |2 +-
 5 files changed, 35 insertions(+), 27 deletions(-)



[PATCH net 2/3] rxrpc: Fix ACK generation from the connection event processor

2017-11-29 Thread David Howells
Repeat terminal ACKs and now terminal ACKs are now generated from the
connection event processor rather from call handling as this allows us to
discard client call structures as soon as possible and free up the channel
for a follow on call.

However, in ACKs so generated, the additional information trailer is
malformed because the padding that's meant to be in the middle isn't
included in what's transmitted.

Fix it so that the 3 bytes of padding are included in the transmission.

Further, the trailer is misaligned because of the padding, so assigment to
the u16 and u32 fields inside it might cause problems on some arches, so
fix this by breaking the padding and the trailer out of the packed struct.

(This also deals with potential compiler weirdies where some of the nested
structs are packed and some aren't).

The symptoms can be seen in wireshark as terminal DUPLICATE or IDLE ACK
packets in which the Max MTU, Interface MTU and rwind fields have weird
values and the Max Packets field is apparently missing.

Reported-by: Jeffrey Altman <jalt...@auristor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/conn_event.c |   50 
 1 file changed, 29 insertions(+), 21 deletions(-)

diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 9e9a8db1bc9c..4ca11be6be3c 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -30,22 +30,18 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
struct rxrpc_skb_priv *sp = skb ? rxrpc_skb(skb) : NULL;
struct rxrpc_channel *chan;
struct msghdr msg;
-   struct kvec iov;
+   struct kvec iov[3];
struct {
struct rxrpc_wire_header whdr;
union {
-   struct {
-   __be32 code;
-   } abort;
-   struct {
-   struct rxrpc_ackpacket ack;
-   u8 padding[3];
-   struct rxrpc_ackinfo info;
-   };
+   __be32 abort_code;
+   struct rxrpc_ackpacket ack;
};
} __attribute__((packed)) pkt;
+   struct rxrpc_ackinfo ack_info;
size_t len;
-   u32 serial, mtu, call_id;
+   int ioc;
+   u32 serial, mtu, call_id, padding;
 
_enter("%d", conn->debug_id);
 
@@ -66,6 +62,13 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
msg.msg_controllen = 0;
msg.msg_flags   = 0;
 
+   iov[0].iov_base = 
+   iov[0].iov_len  = sizeof(pkt.whdr);
+   iov[1].iov_base = 
+   iov[1].iov_len  = 3;
+   iov[2].iov_base = _info;
+   iov[2].iov_len  = sizeof(ack_info);
+
pkt.whdr.epoch  = htonl(conn->proto.epoch);
pkt.whdr.cid= htonl(conn->proto.cid);
pkt.whdr.callNumber = htonl(call_id);
@@ -80,8 +83,10 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
len = sizeof(pkt.whdr);
switch (chan->last_type) {
case RXRPC_PACKET_TYPE_ABORT:
-   pkt.abort.code  = htonl(chan->last_abort);
-   len += sizeof(pkt.abort);
+   pkt.abort_code  = htonl(chan->last_abort);
+   iov[0].iov_len += sizeof(pkt.abort_code);
+   len += sizeof(pkt.abort_code);
+   ioc = 1;
break;
 
case RXRPC_PACKET_TYPE_ACK:
@@ -94,13 +99,19 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
pkt.ack.serial  = htonl(skb ? sp->hdr.serial : 0);
pkt.ack.reason  = skb ? RXRPC_ACK_DUPLICATE : 
RXRPC_ACK_IDLE;
pkt.ack.nAcks   = 0;
-   pkt.info.rxMTU  = htonl(rxrpc_rx_mtu);
-   pkt.info.maxMTU = htonl(mtu);
-   pkt.info.rwind  = htonl(rxrpc_rx_window_size);
-   pkt.info.jumbo_max  = htonl(rxrpc_rx_jumbo_max);
+   ack_info.rxMTU  = htonl(rxrpc_rx_mtu);
+   ack_info.maxMTU = htonl(mtu);
+   ack_info.rwind  = htonl(rxrpc_rx_window_size);
+   ack_info.jumbo_max  = htonl(rxrpc_rx_jumbo_max);
pkt.whdr.flags  |= RXRPC_SLOW_START_OK;
-   len += sizeof(pkt.ack) + sizeof(pkt.info);
+   padding = 0;
+   iov[0].iov_len += sizeof(pkt.ack);
+   len += sizeof(pkt.ack) + 3 + sizeof(ack_info);
+   ioc = 3;
break;
+
+   default:
+   return;
}
 
/* Resync with __rxrpc_disconnect_call() and check that the last call
@@ -110,9 +121,6 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
if (READ_ONCE(chan->

[PATCH net 1/3] rxrpc: Clean up whitespace

2017-11-29 Thread David Howells
Clean up some whitespace from rxrpc.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c  |2 +-
 net/rxrpc/conn_object.c |2 +-
 net/rxrpc/input.c   |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index bda952ffe6a6..555274ddc514 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -426,7 +426,7 @@ void rxrpc_process_call(struct work_struct *work)
next = call->expect_rx_by;
 
 #define set(T) { t = READ_ONCE(T); if (time_before(t, next)) next = t; }
-   
+
set(call->expect_req_by);
set(call->expect_term_by);
set(call->ack_at);
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 1aad04a32d5e..c628351eb900 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -424,7 +424,7 @@ void rxrpc_service_connection_reaper(struct work_struct 
*work)
if (earliest != now + MAX_JIFFY_OFFSET) {
_debug("reschedule reaper %ld", (long)earliest - (long)now);
ASSERT(time_after(earliest, now));
-   rxrpc_set_service_reap_timer(rxnet, earliest);  
+   rxrpc_set_service_reap_timer(rxnet, earliest);
}
 
while (!list_empty()) {
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 23a5e61d8f79..6fc61400337f 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -976,7 +976,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
rxrpc_reduce_call_timer(call, expect_rx_by, now,
rxrpc_timer_set_for_normal);
}
-   
+
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_DATA:
rxrpc_input_data(call, skb, skew);
@@ -1213,7 +1213,7 @@ void rxrpc_data_ready(struct sock *udp_sk)
goto reupgrade;
conn->service_id = sp->hdr.serviceId;
}
-   
+
if (sp->hdr.callNumber == 0) {
/* Connection-level packet */
_debug("CONN %p {%d}", conn, conn->debug_id);



[PATCH net 3/3] rxrpc: Fix variable overwrite

2017-11-29 Thread David Howells
From: Gustavo A. R. Silva <garsi...@embeddedor.com>

Values assigned to both variable resend_at and ack_at are overwritten
before they can be used.

The correct fix here is to add 'now' to the previously computed value in
resend_at and ack_at.

Addresses-Coverity-ID: 1462262
Addresses-Coverity-ID: 1462263
Addresses-Coverity-ID: 1462264
Fixes: beb8e5e4f38c ("rxrpc: Express protocol timeouts in terms of RTT")
Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.uk
Signed-off-by: Gustavo A. R. Silva <garsi...@embeddedor.com>
Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 net/rxrpc/sendmsg.c|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 555274ddc514..ad2ab1103189 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -123,7 +123,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 
ack_reason,
else
ack_at = expiry;
 
-   ack_at = jiffies + expiry;
+   ack_at += now;
if (time_before(ack_at, call->ack_at)) {
WRITE_ONCE(call->ack_at, ack_at);
rxrpc_reduce_call_timer(call, ack_at, now,
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index a1c53ac066a1..09f2a3e05221 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -233,7 +233,7 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
if (resend_at < 1)
resend_at = 1;
 
-   resend_at = now + rxrpc_resend_timeout;
+   resend_at += now;
WRITE_ONCE(call->resend_at, resend_at);
rxrpc_reduce_call_timer(call, resend_at, now,
rxrpc_timer_set_for_send);



Re: [PATCH] rxrpc: call_event: Fix variable overwrite in __rxrpc_propose_ACK

2017-11-28 Thread David Howells
Gustavo A. R. Silva  wrote:

> - ack_at = jiffies + expiry;

Same issue as with the other patch.  Can you just combine the two please?

David


Re: [PATCH] rxrpc: sendmsg: Fix variable overwrite in rxrpc_queue_packet

2017-11-27 Thread David Howells
Gustavo A. R. Silva  wrote:

> -   resend_at = now + rxrpc_resend_timeout;
> +   resend_at += now;

Yep.  :-)

David


Re: [PATCH] rxrpc: sendmsg: Fix variable overwrite in rxrpc_queue_packet

2017-11-27 Thread David Howells
Gustavo A. R. Silva  wrote:

> Value assigned to variable resend_at is overwritten before it can be used.
> 
> Fix this by removing the value overwrite as it seems that this is a
> leftover code.

NAK.  Your fix will actually cause the code to break.

The resend_at value used for the timer must be based on the current time
(ie. jiffies in this case), so we can't simply remove that line as the
previously calculated resend_at value is a duration, not a time.

What you need to do is to instead modify the line you wanted to remove to add
'now' to the previously-computed value.

David


[PATCH net-next 00/12] rxrpc: Fixes and improvements

2017-11-24 Thread David Howells

Hi David,

Is it too late for this to go to Linus in this merge window?

---

Here's a set of patches that fix and improve some stuff in the AF_RXRPC
protocol:

The patches are:

 (1) Unlock mutex returned by rxrpc_accept_call().

 (2) Don't set connection upgrade by default.

 (3) Differentiate the call->user_mutex used by the kernel from that used
 by userspace calling sendmsg() to avoid lockdep warnings.

 (4) Delay terminal ACK transmission to a work queue so that it can be
 replaced by the next call if there is one.

 (5) Split the call parameters from the connection parameters so that more
 call-specific parameters can be passed through.

 (6) Fix the call timeouts to work the same as for other RxRPC/AFS
 implementations.

 (7) Don't transmit DELAY ACKs immediately, but instead delay them slightly
 so that can be discarded or can represent more packets.

 (8) Use RTT to calculate certain protocol timeouts.

 (9) Add a timeout to detect lost ACK/DATA packets.

(10) Add a keepalive function so that we ping the peer if we haven't
 transmitted for a short while, thereby keeping intervening firewall
 routes open.

(11) Make service endpoints expire like they're supposed to so that the UDP
 port can be reused.

(12) Fix connection expiry timers to make cleanup happen in a more timely
 fashion.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20171124

David
---
David Howells (12):
  rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing
  rxrpc: Don't set upgrade by default in sendmsg()
  rxrpc: Provide a different lockdep key for call->user_mutex for kernel 
calls
  rxrpc: Delay terminal ACK transmission on a client call
  rxrpc: Split the call params from the operation params
  rxrpc: Fix call timeouts
  rxrpc: Don't transmit DELAY ACKs immediately on proposal
  rxrpc: Express protocol timeouts in terms of RTT
  rxrpc: Add a timeout for detecting lost ACKs/lost DATA
  rxrpc: Add keepalive for a call
  rxrpc: Fix service endpoint expiry
  rxrpc: Fix conn expiry timers


 include/trace/events/rxrpc.h |   86 
 include/uapi/linux/rxrpc.h   |1 
 net/rxrpc/af_rxrpc.c |   23 
 net/rxrpc/ar-internal.h  |  103 ---
 net/rxrpc/call_accept.c  |2 
 net/rxrpc/call_event.c   |  229 --
 net/rxrpc/call_object.c  |   62 +++
 net/rxrpc/conn_client.c  |   54 --
 net/rxrpc/conn_event.c   |   74 +++---
 net/rxrpc/conn_object.c  |   76 +-
 net/rxrpc/input.c|   74 +-
 net/rxrpc/misc.c |   19 +--
 net/rxrpc/net_ns.c   |   33 +-
 net/rxrpc/output.c   |   43 
 net/rxrpc/recvmsg.c  |   12 +-
 net/rxrpc/sendmsg.c  |  126 ++-
 net/rxrpc/sysctl.c   |   60 +--
 17 files changed, 752 insertions(+), 325 deletions(-)



[PATCH net-next 01/12] rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasing

2017-11-24 Thread David Howells
The caller of rxrpc_accept_call() must release the lock on call->user_mutex
returned by that function.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/sendmsg.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 7d2595582c09..3a99b1a908df 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -619,8 +619,8 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr 
*msg, size_t len)
/* The socket is now unlocked. */
if (IS_ERR(call))
return PTR_ERR(call);
-   rxrpc_put_call(call, rxrpc_call_put);
-   return 0;
+   ret = 0;
+   goto out_put_unlock;
}
 
call = rxrpc_find_call_by_user_ID(rx, p.user_call_ID);
@@ -689,6 +689,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr 
*msg, size_t len)
ret = rxrpc_send_data(rx, call, msg, len, NULL);
}
 
+out_put_unlock:
mutex_unlock(>user_mutex);
 error_put:
rxrpc_put_call(call, rxrpc_call_put);



[PATCH net-next 03/12] rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls

2017-11-24 Thread David Howells
0060f720 R08: 0001 R09: 
R10: 0001 R11: 8800b5459b68 R12: 8800ce150e00
R13: 0060f720 R14: 006127a8 R15: 
 padzero+0x1c/0x2b
 load_elf_binary+0x785/0xdc7
 search_binary_handler+0x81/0x1ff
 do_execveat_common.isra.14+0x600/0x888
 do_execve+0x1f/0x21
 SyS_execve+0x28/0x2f
 do_syscall_64+0x89/0x1be
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7fdb6009ee07
RSP: 002b:7fff566d9728 EFLAGS: 0246 ORIG_RAX: 003b
RAX: ffda RBX: 55ba57280900 RCX: 7fdb6009ee07
RDX: 55ba5727f270 RSI: 55ba5727cac0 RDI: 55ba57280900
RBP: 55ba57280900 R08: 7fff566d9700 R09: 
R10: 55ba5727cac0 R11: 0246 R12: 
R13: 55ba5727cac0 R14: 000055ba5727f270 R15: 

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/ar-internal.h |2 +-
 net/rxrpc/call_accept.c |2 +-
 net/rxrpc/call_object.c |   19 +++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b2151993d384..a972887b3f5d 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -672,7 +672,7 @@ extern unsigned int rxrpc_max_call_lifetime;
 extern struct kmem_cache *rxrpc_call_jar;
 
 struct rxrpc_call *rxrpc_find_call_by_user_ID(struct rxrpc_sock *, unsigned 
long);
-struct rxrpc_call *rxrpc_alloc_call(gfp_t);
+struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *, gfp_t);
 struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *,
 struct rxrpc_conn_parameters *,
 struct sockaddr_rxrpc *,
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index cbd1701e813a..3028298ca561 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -94,7 +94,7 @@ static int rxrpc_service_prealloc_one(struct rxrpc_sock *rx,
/* Now it gets complicated, because calls get registered with the
 * socket here, particularly if a user ID is preassigned by the user.
 */
-   call = rxrpc_alloc_call(gfp);
+   call = rxrpc_alloc_call(rx, gfp);
if (!call)
return -ENOMEM;
call->flags |= (1 << RXRPC_CALL_IS_SERVICE);
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 4c7fbc6dcce7..1f141dc08ad2 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -55,6 +55,8 @@ static void rxrpc_call_timer_expired(unsigned long _call)
rxrpc_set_timer(call, rxrpc_timer_expired, ktime_get_real());
 }
 
+static struct lock_class_key rxrpc_call_user_mutex_lock_class_key;
+
 /*
  * find an extant server call
  * - called in process context with IRQs enabled
@@ -95,7 +97,7 @@ struct rxrpc_call *rxrpc_find_call_by_user_ID(struct 
rxrpc_sock *rx,
 /*
  * allocate a new call
  */
-struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
+struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *rx, gfp_t gfp)
 {
struct rxrpc_call *call;
 
@@ -114,6 +116,14 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
goto nomem_2;
 
mutex_init(>user_mutex);
+
+   /* Prevent lockdep reporting a deadlock false positive between the afs
+* filesystem and sys_sendmsg() via the mmap sem.
+*/
+   if (rx->sk.sk_kern_sock)
+   lockdep_set_class(>user_mutex,
+ _call_user_mutex_lock_class_key);
+
setup_timer(>timer, rxrpc_call_timer_expired,
(unsigned long)call);
INIT_WORK(>processor, _process_call);
@@ -151,7 +161,8 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
 /*
  * Allocate a new client call.
  */
-static struct rxrpc_call *rxrpc_alloc_client_call(struct sockaddr_rxrpc *srx,
+static struct rxrpc_call *rxrpc_alloc_client_call(struct rxrpc_sock *rx,
+ struct sockaddr_rxrpc *srx,
  gfp_t gfp)
 {
struct rxrpc_call *call;
@@ -159,7 +170,7 @@ static struct rxrpc_call *rxrpc_alloc_client_call(struct 
sockaddr_rxrpc *srx,
 
_enter("");
 
-   call = rxrpc_alloc_call(gfp);
+   call = rxrpc_alloc_call(rx, gfp);
if (!call)
return ERR_PTR(-ENOMEM);
call->state = RXRPC_CALL_CLIENT_AWAIT_CONN;
@@ -210,7 +221,7 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
 
_enter("%p,%lx", rx, user_call_ID);
 
-   call = rxrpc_alloc_client_call(srx, gfp);
+   call = rxrpc_alloc_client_call(rx, srx, gfp);
if (IS_ERR(call)) {
release_sock(>sk);
_leave(" = %ld", PTR_ERR(call));



[PATCH net-next 05/12] rxrpc: Split the call params from the operation params

2017-11-24 Thread David Howells
When rxrpc_sendmsg() parses the control message buffer, it places the
parameters extracted into a structure, but lumps together call parameters
(such as user call ID) with operation parameters (such as whether to send
data, send an abort or accept a call).

Split the call parameters out into their own structure, a copy of which is
then embedded in the operation parameters struct.

The call parameters struct is then passed down into the places that need it
instead of passing the individual parameters.  This allows for extra call
parameters to be added.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|8 ++-
 net/rxrpc/ar-internal.h |   31 -
 net/rxrpc/call_object.c |   15 ++
 net/rxrpc/sendmsg.c |   51 ---
 4 files changed, 60 insertions(+), 45 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 9b5c46b052fd..c0cdcf980ffc 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -285,6 +285,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
   bool upgrade)
 {
struct rxrpc_conn_parameters cp;
+   struct rxrpc_call_params p;
struct rxrpc_call *call;
struct rxrpc_sock *rx = rxrpc_sk(sock->sk);
int ret;
@@ -302,6 +303,10 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
if (key && !key->payload.data[0])
key = NULL; /* a no-security key */
 
+   memset(, 0, sizeof(p));
+   p.user_call_ID = user_call_ID;
+   p.tx_total_len = tx_total_len;
+
memset(, 0, sizeof(cp));
cp.local= rx->local;
cp.key  = key;
@@ -309,8 +314,7 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
cp.exclusive= false;
cp.upgrade  = upgrade;
cp.service_id   = srx->srx_service;
-   call = rxrpc_new_client_call(rx, , srx, user_call_ID, tx_total_len,
-gfp);
+   call = rxrpc_new_client_call(rx, , srx, , gfp);
/* The socket has been unlocked. */
if (!IS_ERR(call)) {
call->notify_rx = notify_rx;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index d1213d503f30..ba63f2231107 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -643,6 +643,35 @@ struct rxrpc_ack_summary {
u8  cumulative_acks;
 };
 
+/*
+ * sendmsg() cmsg-specified parameters.
+ */
+enum rxrpc_command {
+   RXRPC_CMD_SEND_DATA,/* send data message */
+   RXRPC_CMD_SEND_ABORT,   /* request abort generation */
+   RXRPC_CMD_ACCEPT,   /* [server] accept incoming call */
+   RXRPC_CMD_REJECT_BUSY,  /* [server] reject a call as busy */
+};
+
+struct rxrpc_call_params {
+   s64 tx_total_len;   /* Total Tx data length (if 
send data) */
+   unsigned long   user_call_ID;   /* User's call ID */
+   struct {
+   u32 hard;   /* Maximum lifetime (sec) */
+   u32 idle;   /* Max time since last data 
packet (msec) */
+   u32 normal; /* Max time since last call 
packet (msec) */
+   } timeouts;
+   u8  nr_timeouts;/* Number of timeouts specified 
*/
+};
+
+struct rxrpc_send_params {
+   struct rxrpc_call_params call;
+   u32 abort_code; /* Abort code to Tx (if abort) 
*/
+   enum rxrpc_command  command : 8;/* The command to implement */
+   boolexclusive;  /* Shared or exclusive call */
+   boolupgrade;/* If the connection is 
upgradeable */
+};
+
 #include 
 
 /*
@@ -687,7 +716,7 @@ struct rxrpc_call *rxrpc_alloc_call(struct rxrpc_sock *, 
gfp_t);
 struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *,
 struct rxrpc_conn_parameters *,
 struct sockaddr_rxrpc *,
-unsigned long, s64, gfp_t);
+struct rxrpc_call_params *, gfp_t);
 int rxrpc_retry_client_call(struct rxrpc_sock *,
struct rxrpc_call *,
struct rxrpc_conn_parameters *,
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 1f141dc08ad2..c3e1fa854471 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -208,8 +208,7 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call)
 struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock *rx,
 struct rxrpc_conn_parameters *cp,
  

[PATCH net-next 04/12] rxrpc: Delay terminal ACK transmission on a client call

2017-11-24 Thread David Howells
Delay terminal ACK transmission on a client call by deferring it to the
connection processor.  This allows it to be skipped if we can send the next
call instead, the first DATA packet of which will implicitly ack this call.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/ar-internal.h |   17 +++
 net/rxrpc/conn_client.c |   18 +++
 net/rxrpc/conn_event.c  |   74 +++
 net/rxrpc/conn_object.c |   10 ++
 net/rxrpc/recvmsg.c |2 +
 5 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a972887b3f5d..d1213d503f30 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -338,8 +338,17 @@ enum rxrpc_conn_flag {
RXRPC_CONN_DONT_REUSE,  /* Don't reuse this connection */
RXRPC_CONN_COUNTED, /* Counted by rxrpc_nr_client_conns */
RXRPC_CONN_PROBING_FOR_UPGRADE, /* Probing for service upgrade */
+   RXRPC_CONN_FINAL_ACK_0, /* Need final ACK for channel 0 */
+   RXRPC_CONN_FINAL_ACK_1, /* Need final ACK for channel 1 */
+   RXRPC_CONN_FINAL_ACK_2, /* Need final ACK for channel 2 */
+   RXRPC_CONN_FINAL_ACK_3, /* Need final ACK for channel 3 */
 };
 
+#define RXRPC_CONN_FINAL_ACK_MASK ((1UL << RXRPC_CONN_FINAL_ACK_0) |   \
+  (1UL << RXRPC_CONN_FINAL_ACK_1) |\
+  (1UL << RXRPC_CONN_FINAL_ACK_2) |\
+  (1UL << RXRPC_CONN_FINAL_ACK_3))
+
 /*
  * Events that can be raised upon a connection.
  */
@@ -393,6 +402,7 @@ struct rxrpc_connection {
 #define RXRPC_ACTIVE_CHANS_MASK((1 << RXRPC_MAXCALLS) - 1)
struct list_headwaiting_calls;  /* Calls waiting for channels */
struct rxrpc_channel {
+   unsigned long   final_ack_at;   /* Time at which to 
issue final ACK */
struct rxrpc_call __rcu *call;  /* Active call */
u32 call_id;/* ID of current call */
u32 call_counter;   /* Call ID counter */
@@ -404,6 +414,7 @@ struct rxrpc_connection {
};
} channels[RXRPC_MAXCALLS];
 
+   struct timer_list   timer;  /* Conn event timer */
struct work_struct  processor;  /* connection event processor */
union {
struct rb_node  client_node;/* Node in local->client_conns 
*/
@@ -861,6 +872,12 @@ static inline void rxrpc_put_connection(struct 
rxrpc_connection *conn)
rxrpc_put_service_conn(conn);
 }
 
+static inline void rxrpc_reduce_conn_timer(struct rxrpc_connection *conn,
+  unsigned long expire_at)
+{
+   timer_reduce(>timer, expire_at);
+}
+
 /*
  * conn_service.c
  */
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 5f9624bd311c..cfb997593da9 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -554,6 +554,11 @@ static void rxrpc_activate_one_channel(struct 
rxrpc_connection *conn,
 
trace_rxrpc_client(conn, channel, rxrpc_client_chan_activate);
 
+   /* Cancel the final ACK on the previous call if it hasn't been sent yet
+* as the DATA packet will implicitly ACK it.
+*/
+   clear_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags);
+
write_lock_bh(>state_lock);
if (!test_bit(RXRPC_CALL_TX_LASTQ, >flags))
call->state = RXRPC_CALL_CLIENT_SEND_REQUEST;
@@ -813,6 +818,19 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
goto out_2;
}
 
+   /* Schedule the final ACK to be transmitted in a short while so that it
+* can be skipped if we find a follow-on call.  The first DATA packet
+* of the follow on call will implicitly ACK this call.
+*/
+   if (test_bit(RXRPC_CALL_EXPOSED, >flags)) {
+   unsigned long final_ack_at = jiffies + 2;
+
+   WRITE_ONCE(chan->final_ack_at, final_ack_at);
+   smp_wmb(); /* vs rxrpc_process_delayed_final_acks() */
+   set_bit(RXRPC_CONN_FINAL_ACK_0 + channel, >flags);
+   rxrpc_reduce_conn_timer(conn, final_ack_at);
+   }
+
/* Things are more complex and we need the cache lock.  We might be
 * able to simply idle the conn or it might now be lurking on the wait
 * list.  It might even get moved back to the active list whilst we're
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 59a51a56e7c8..9e9a8db1bc9c 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -24,9 +24,10 @@
  * Retransmit terminal ACK or ABORT of the previous call.
  */
 static void rxrpc_conn_retransmit_call(struct rxrpc_connection *c

[PATCH net-next 02/12] rxrpc: Don't set upgrade by default in sendmsg()

2017-11-24 Thread David Howells
Don't set upgrade by default when creating a call from sendmsg().  This is
a holdover from when I was testing the code.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/sendmsg.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 3a99b1a908df..94555c94b2d8 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -602,7 +602,7 @@ int rxrpc_do_sendmsg(struct rxrpc_sock *rx, struct msghdr 
*msg, size_t len)
.abort_code = 0,
.command= RXRPC_CMD_SEND_DATA,
.exclusive  = false,
-   .upgrade= true,
+   .upgrade= false,
};
 
_enter("");



[PATCH net-next 07/12] rxrpc: Don't transmit DELAY ACKs immediately on proposal

2017-11-24 Thread David Howells
Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function.  This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.

We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/recvmsg.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 0b6609da80b7..fad5f42a3abd 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -219,9 +219,9 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
after_eq(top, call->ackr_seen + 2) ||
(hard_ack == top && after(hard_ack, call->ackr_consumed)))
rxrpc_propose_ACK(call, RXRPC_ACK_DELAY, 0, serial,
- true, false,
+ true, true,
  rxrpc_propose_ack_rotate_rx);
-   if (call->ackr_reason)
+   if (call->ackr_reason && call->ackr_reason != RXRPC_ACK_DELAY)
rxrpc_send_ack_packet(call, false);
}
 }



[PATCH net-next 06/12] rxrpc: Fix call timeouts

2017-11-24 Thread David Howells
Fix the rxrpc call expiration timeouts and make them settable from
userspace.  By analogy with other rx implementations, there should be three
timeouts:

 (1) "Normal timeout"

 This is set for all calls and is triggered if we haven't received any
 packets from the peer in a while.  It is measured from the last time
 we received any packet on that call.  This is not reset by any
 connection packets (such as CHALLENGE/RESPONSE packets).

 If a service operation takes a long time, the server should generate
 PING ACKs at a duration that's substantially less than the normal
 timeout so is to keep both sides alive.  This is set at 1/6 of normal
 timeout.

 (2) "Idle timeout"

 This is set only for a service call and is triggered if we stop
 receiving the DATA packets that comprise the request data.  It is
 measured from the last time we received a DATA packet.

 (3) "Hard timeout"

 This can be set for a call and specified the maximum lifetime of that
 call.  It should not be specified by default.  Some operations (such
 as volume transfer) take a long time.

Allow userspace to set/change the timeouts on a call with sendmsg, using a
control message:

RXRPC_SET_CALL_TIMEOUTS

The data to the message is a number of 32-bit words, not all of which need
be given:

u32 hard_timeout;   /* sec from first packet */
u32 idle_timeout;   /* msec from packet Rx */
u32 normal_timeout; /* msec from data Rx */

This can be set in combination with any other sendmsg() that affects a
call.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   69 +++-
 include/uapi/linux/rxrpc.h   |1 
 net/rxrpc/ar-internal.h  |   37 ++---
 net/rxrpc/call_event.c   |  179 --
 net/rxrpc/call_object.c  |   27 --
 net/rxrpc/conn_client.c  |4 -
 net/rxrpc/input.c|   34 +++-
 net/rxrpc/misc.c |   19 ++--
 net/rxrpc/recvmsg.c  |2 
 net/rxrpc/sendmsg.c  |   59 +++---
 net/rxrpc/sysctl.c   |   60 +++---
 11 files changed, 290 insertions(+), 201 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index ebe96796027a..01dcbc2164b5 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -138,10 +138,20 @@ enum rxrpc_rtt_rx_trace {
 
 enum rxrpc_timer_trace {
rxrpc_timer_begin,
+   rxrpc_timer_exp_ack,
+   rxrpc_timer_exp_hard,
+   rxrpc_timer_exp_idle,
+   rxrpc_timer_exp_normal,
+   rxrpc_timer_exp_ping,
+   rxrpc_timer_exp_resend,
rxrpc_timer_expired,
rxrpc_timer_init_for_reply,
rxrpc_timer_init_for_send_reply,
+   rxrpc_timer_restart,
rxrpc_timer_set_for_ack,
+   rxrpc_timer_set_for_hard,
+   rxrpc_timer_set_for_idle,
+   rxrpc_timer_set_for_normal,
rxrpc_timer_set_for_ping,
rxrpc_timer_set_for_resend,
rxrpc_timer_set_for_send,
@@ -296,12 +306,22 @@ enum rxrpc_congest_change {
 #define rxrpc_timer_traces \
EM(rxrpc_timer_begin,   "Begin ") \
EM(rxrpc_timer_expired, "*EXPR*") \
+   EM(rxrpc_timer_exp_ack, "ExpAck") \
+   EM(rxrpc_timer_exp_hard,"ExpHrd") \
+   EM(rxrpc_timer_exp_idle,"ExpIdl") \
+   EM(rxrpc_timer_exp_normal,  "ExpNml") \
+   EM(rxrpc_timer_exp_ping,"ExpPng") \
+   EM(rxrpc_timer_exp_resend,  "ExpRsn") \
EM(rxrpc_timer_init_for_reply,  "IniRpl") \
EM(rxrpc_timer_init_for_send_reply, "SndRpl") \
+   EM(rxrpc_timer_restart, "Restrt") \
EM(rxrpc_timer_set_for_ack, "SetAck") \
+   EM(rxrpc_timer_set_for_hard,"SetHrd") \
+   EM(rxrpc_timer_set_for_idle,"SetIdl") \
+   EM(rxrpc_timer_set_for_normal,  "SetNml") \
EM(rxrpc_timer_set_for_ping,"SetPng") \
EM(rxrpc_timer_set_for_resend,  "SetRTx") \
-   E_(rxrpc_timer_set_for_send,"SetTx ")
+   E_(rxrpc_timer_set_for_send,"SetSnd")
 
 #define rxrpc_propose_ack_traces \
EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \
@@ -932,39 +952,44 @@ TRACE_EVENT(rxrpc_rtt_rx,
 
 TRACE_EVENT(rxrpc_timer,
TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why,
-ktime_t now, unsigned long now_j),
+unsigned long now),
 
-   TP_ARGS(call, why, now, now_j),
+   TP_ARGS(call, why, now),
 
TP_STRUCT__entr

[PATCH net-next 09/12] rxrpc: Add a timeout for detecting lost ACKs/lost DATA

2017-11-24 Thread David Howells
Add an extra timeout that is set/updated when we send a DATA packet that
has the request-ack flag set.  This allows us to detect if we don't get an
ACK in response to the latest flagged packet.

The ACK packet is adjudged to have been lost if it doesn't turn up within
2*RTT of the transmission.

If the timeout occurs, we schedule the sending of a PING ACK to find out
the state of the other side.  If a new DATA packet is ready to go sooner,
we cancel the sending of the ping and set the request-ack flag on that
instead.

If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
we had at the time of the ping transmission, we adjudge all the DATA
packets sent between the response tx_top and the ping-time tx_top to have
been lost and retransmit immediately.

Rather than sending a PING ACK, we could just pick a DATA packet and
speculatively retransmit that with request-ack set.  It should result in
either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the
a PING-RESPONSE ACK mentioned above.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |   11 +--
 net/rxrpc/ar-internal.h  |6 +-
 net/rxrpc/call_event.c   |   26 ++
 net/rxrpc/call_object.c  |1 +
 net/rxrpc/input.c|   40 
 net/rxrpc/output.c   |   20 ++--
 net/rxrpc/recvmsg.c  |4 ++--
 net/rxrpc/sendmsg.c  |2 +-
 8 files changed, 98 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 01dcbc2164b5..84ade8b76a19 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -141,6 +141,7 @@ enum rxrpc_timer_trace {
rxrpc_timer_exp_ack,
rxrpc_timer_exp_hard,
rxrpc_timer_exp_idle,
+   rxrpc_timer_exp_lost_ack,
rxrpc_timer_exp_normal,
rxrpc_timer_exp_ping,
rxrpc_timer_exp_resend,
@@ -151,6 +152,7 @@ enum rxrpc_timer_trace {
rxrpc_timer_set_for_ack,
rxrpc_timer_set_for_hard,
rxrpc_timer_set_for_idle,
+   rxrpc_timer_set_for_lost_ack,
rxrpc_timer_set_for_normal,
rxrpc_timer_set_for_ping,
rxrpc_timer_set_for_resend,
@@ -309,6 +311,7 @@ enum rxrpc_congest_change {
EM(rxrpc_timer_exp_ack, "ExpAck") \
EM(rxrpc_timer_exp_hard,"ExpHrd") \
EM(rxrpc_timer_exp_idle,"ExpIdl") \
+   EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \
EM(rxrpc_timer_exp_normal,  "ExpNml") \
EM(rxrpc_timer_exp_ping,"ExpPng") \
EM(rxrpc_timer_exp_resend,  "ExpRsn") \
@@ -318,6 +321,7 @@ enum rxrpc_congest_change {
EM(rxrpc_timer_set_for_ack, "SetAck") \
EM(rxrpc_timer_set_for_hard,"SetHrd") \
EM(rxrpc_timer_set_for_idle,"SetIdl") \
+   EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \
EM(rxrpc_timer_set_for_normal,  "SetNml") \
EM(rxrpc_timer_set_for_ping,"SetPng") \
EM(rxrpc_timer_set_for_resend,  "SetRTx") \
@@ -961,6 +965,7 @@ TRACE_EVENT(rxrpc_timer,
__field(enum rxrpc_timer_trace, why 
)
__field(long,   now 
)
__field(long,   ack_at  
)
+   __field(long,   ack_lost_at 
)
__field(long,   resend_at   
)
__field(long,   ping_at 
)
__field(long,   expect_rx_by
)
@@ -974,6 +979,7 @@ TRACE_EVENT(rxrpc_timer,
__entry->why= why;
__entry->now= now;
__entry->ack_at = call->ack_at;
+   __entry->ack_lost_at= call->ack_lost_at;
__entry->resend_at  = call->resend_at;
__entry->expect_rx_by   = call->expect_rx_by;
__entry->expect_req_by  = call->expect_req_by;
@@ -981,10 +987,11 @@ TRACE_EVENT(rxrpc_timer,
__entry->timer  = call->timer.expires;
   ),
 
-   TP_printk("c=%p %s a=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld",
+   TP_printk("c=%p %s a=%ld la=%ld r=%ld xr=%ld xq=%ld xt=%ld t=%ld",
  __entry->call,
  __print_symbolic(__entry->why, rxrpc_timer_traces),
  

[PATCH net-next 11/12] rxrpc: Fix service endpoint expiry

2017-11-24 Thread David Howells
RxRPC service endpoints expire like they're supposed to by the following
means:

 (1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
 global service conn timeout, otherwise the first rxrpc_net struct to
 die will cause connections on all others to expire immediately from
 then on.

 (2) Mark local service endpoints for which the socket has been closed
 (->service_closed) so that the expiration timeout can be much
 shortened for service and client connections going through that
 endpoint.

 (3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
 count reaches 1, not 0, as idle conns have a 1 count.

 (4) The accumulator for the earliest time we might want to schedule for
 should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
 the comparison functions use signed arithmetic.

 (5) Simplify the expiration handling, adding the expiration value to the
 idle timestamp each time rather than keeping track of the time in the
 past before which the idle timestamp must go to be expired.  This is
 much easier to read.

 (6) Ignore the timeouts if the net namespace is dead.

 (7) Restart the service reaper work item rather the client reaper.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |2 ++
 net/rxrpc/af_rxrpc.c |   13 +
 net/rxrpc/ar-internal.h  |3 +++
 net/rxrpc/conn_client.c  |2 ++
 net/rxrpc/conn_object.c  |   42 --
 net/rxrpc/net_ns.c   |3 +++
 6 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index e98fed6de497..36cb50c111a6 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -49,6 +49,7 @@ enum rxrpc_conn_trace {
rxrpc_conn_put_client,
rxrpc_conn_put_service,
rxrpc_conn_queued,
+   rxrpc_conn_reap_service,
rxrpc_conn_seen,
 };
 
@@ -221,6 +222,7 @@ enum rxrpc_congest_change {
EM(rxrpc_conn_put_client,   "PTc") \
EM(rxrpc_conn_put_service,  "PTs") \
EM(rxrpc_conn_queued,   "QUE") \
+   EM(rxrpc_conn_reap_service, "RPs") \
E_(rxrpc_conn_seen, "SEE")
 
 #define rxrpc_client_traces \
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index c0cdcf980ffc..abb524c2b8f8 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -867,6 +867,19 @@ static int rxrpc_release_sock(struct sock *sk)
sock_orphan(sk);
sk->sk_shutdown = SHUTDOWN_MASK;
 
+   /* We want to kill off all connections from a service socket
+* as fast as possible because we can't share these; client
+* sockets, on the other hand, can share an endpoint.
+*/
+   switch (sk->sk_state) {
+   case RXRPC_SERVER_BOUND:
+   case RXRPC_SERVER_BOUND2:
+   case RXRPC_SERVER_LISTENING:
+   case RXRPC_SERVER_LISTEN_DISABLED:
+   rx->local->service_closed = true;
+   break;
+   }
+
spin_lock_bh(>sk_receive_queue.lock);
sk->sk_state = RXRPC_CLOSE;
spin_unlock_bh(>sk_receive_queue.lock);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index cdcbc798f921..a0082c407005 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -84,6 +84,7 @@ struct rxrpc_net {
unsigned intnr_client_conns;
unsigned intnr_active_client_conns;
boolkill_all_client_conns;
+   boollive;
spinlock_t  client_conn_cache_lock; /* Lock for 
->*_client_conns */
spinlock_t  client_conn_discard_lock; /* Prevent multiple 
discarders */
struct list_headwaiting_client_conns;
@@ -265,6 +266,7 @@ struct rxrpc_local {
rwlock_tservices_lock;  /* lock for services list */
int debug_id;   /* debug ID for printks */
booldead;
+   boolservice_closed; /* Service socket closed */
struct sockaddr_rxrpc   srx;/* local address */
 };
 
@@ -881,6 +883,7 @@ void rxrpc_process_connection(struct work_struct *);
  * conn_object.c
  */
 extern unsigned int rxrpc_connection_expiry;
+extern unsigned int rxrpc_closed_conn_expiry;
 
 struct rxrpc_connection *rxrpc_alloc_connection(gfp_t);
 struct rxrpc_connection *rxrpc_find_connection_rcu(struct rxrpc_local *,
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 97f6a8de4845..785dfdb9fef1 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -1079,6 +1079,8 @@ void rxrpc_discard_expired_client_conns(struct 
work_struct *work)
expiry = rxrpc

[PATCH net-next 10/12] rxrpc: Add keepalive for a call

2017-11-24 Thread David Howells
We need to transmit a packet every so often to act as a keepalive for the
peer (which has a timeout from the last time it received a packet) and also
to prevent any intervening firewalls from closing the route.

Do this by resetting a timer every time we transmit a packet.  If the timer
ever expires, we transmit a PING ACK packet and thereby also elicit a PING
RESPONSE ACK from the other side - which prevents our last-rx timeout from
expiring.

The timer is set to 1/6 of the last-rx timeout so that we can detect the
other side going away if it misses 6 replies in a row.

This is particularly necessary for servers where the processing of the
service function may take a significant amount of time.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 include/trace/events/rxrpc.h |6 ++
 net/rxrpc/ar-internal.h  |1 +
 net/rxrpc/call_event.c   |   10 ++
 net/rxrpc/output.c   |   23 +++
 4 files changed, 40 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 84ade8b76a19..e98fed6de497 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -141,6 +141,7 @@ enum rxrpc_timer_trace {
rxrpc_timer_exp_ack,
rxrpc_timer_exp_hard,
rxrpc_timer_exp_idle,
+   rxrpc_timer_exp_keepalive,
rxrpc_timer_exp_lost_ack,
rxrpc_timer_exp_normal,
rxrpc_timer_exp_ping,
@@ -152,6 +153,7 @@ enum rxrpc_timer_trace {
rxrpc_timer_set_for_ack,
rxrpc_timer_set_for_hard,
rxrpc_timer_set_for_idle,
+   rxrpc_timer_set_for_keepalive,
rxrpc_timer_set_for_lost_ack,
rxrpc_timer_set_for_normal,
rxrpc_timer_set_for_ping,
@@ -162,6 +164,7 @@ enum rxrpc_timer_trace {
 enum rxrpc_propose_ack_trace {
rxrpc_propose_ack_client_tx_end,
rxrpc_propose_ack_input_data,
+   rxrpc_propose_ack_ping_for_keepalive,
rxrpc_propose_ack_ping_for_lost_ack,
rxrpc_propose_ack_ping_for_lost_reply,
rxrpc_propose_ack_ping_for_params,
@@ -311,6 +314,7 @@ enum rxrpc_congest_change {
EM(rxrpc_timer_exp_ack, "ExpAck") \
EM(rxrpc_timer_exp_hard,"ExpHrd") \
EM(rxrpc_timer_exp_idle,"ExpIdl") \
+   EM(rxrpc_timer_exp_keepalive,   "ExpKA ") \
EM(rxrpc_timer_exp_lost_ack,"ExpLoA") \
EM(rxrpc_timer_exp_normal,  "ExpNml") \
EM(rxrpc_timer_exp_ping,"ExpPng") \
@@ -321,6 +325,7 @@ enum rxrpc_congest_change {
EM(rxrpc_timer_set_for_ack, "SetAck") \
EM(rxrpc_timer_set_for_hard,"SetHrd") \
EM(rxrpc_timer_set_for_idle,"SetIdl") \
+   EM(rxrpc_timer_set_for_keepalive,   "KeepAl") \
EM(rxrpc_timer_set_for_lost_ack,"SetLoA") \
EM(rxrpc_timer_set_for_normal,  "SetNml") \
EM(rxrpc_timer_set_for_ping,"SetPng") \
@@ -330,6 +335,7 @@ enum rxrpc_congest_change {
 #define rxrpc_propose_ack_traces \
EM(rxrpc_propose_ack_client_tx_end, "ClTxEnd") \
EM(rxrpc_propose_ack_input_data,"DataIn ") \
+   EM(rxrpc_propose_ack_ping_for_keepalive, "KeepAlv") \
EM(rxrpc_propose_ack_ping_for_lost_ack, "LostAck") \
EM(rxrpc_propose_ack_ping_for_lost_reply, "LostRpl") \
EM(rxrpc_propose_ack_ping_for_params,   "Params ") \
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 7e7b817c69f0..cdcbc798f921 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -519,6 +519,7 @@ struct rxrpc_call {
unsigned long   ack_lost_at;/* When ACK is figured as lost 
*/
unsigned long   resend_at;  /* When next resend needs to 
happen */
unsigned long   ping_at;/* When next to send a ping */
+   unsigned long   keepalive_at;   /* When next to send a 
keepalive ping */
unsigned long   expect_rx_by;   /* When we expect to get a 
packet by */
unsigned long   expect_req_by;  /* When we expect to get a 
request DATA packet by */
unsigned long   expect_term_by; /* When we expect call 
termination by */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index c65666b2f39e..bda952ffe6a6 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -366,6 +366,15 @@ void rxrpc_process_call(struct work_struct *work)
set_bit(RXRPC_CALL_EV_ACK_LOST, >events);
}
 
+   t = READ_ONCE(call->keepalive_at);
+   if (time_after_eq(now, t)) {
+   trace_rxrpc_timer(call, rxrpc_timer_exp_keepalive, now);
+   cmpxchg(>keepalive_at, t,

[PATCH net-next 12/12] rxrpc: Fix conn expiry timers

2017-11-24 Thread David Howells
Fix the rxrpc connection expiry timers so that connections for closed
AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
transport UDP port much more quickly.

 (1) Replace the delayed work items with work items plus timers so that
 timer_reduce() can be used to shorten them and so that the timer
 doesn't requeue the work item if the net namespace is dead.

 (2) Don't use queue_delayed_work() as that won't alter the timeout if the
 timer is already running.

 (3) Don't rearm the timers if the network namespace is dead.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|2 ++
 net/rxrpc/ar-internal.h |6 --
 net/rxrpc/conn_client.c |   30 +++---
 net/rxrpc/conn_object.c |   28 +---
 net/rxrpc/net_ns.c  |   30 ++
 5 files changed, 68 insertions(+), 28 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index abb524c2b8f8..8f7cf4c042be 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -895,6 +895,8 @@ static int rxrpc_release_sock(struct sock *sk)
rxrpc_release_calls_on_socket(rx);
flush_workqueue(rxrpc_workqueue);
rxrpc_purge_queue(>sk_receive_queue);
+   rxrpc_queue_work(>local->rxnet->service_conn_reaper);
+   rxrpc_queue_work(>local->rxnet->client_conn_reaper);
 
rxrpc_put_local(rx->local);
rx->local = NULL;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a0082c407005..416688381eb7 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -79,7 +79,8 @@ struct rxrpc_net {
struct list_headconn_proc_list; /* List of conns in this 
namespace for proc */
struct list_headservice_conns;  /* Service conns in this 
namespace */
rwlock_tconn_lock;  /* Lock for ->conn_proc_list, 
->service_conns */
-   struct delayed_work service_conn_reaper;
+   struct work_struct  service_conn_reaper;
+   struct timer_list   service_conn_reap_timer;
 
unsigned intnr_client_conns;
unsigned intnr_active_client_conns;
@@ -90,7 +91,8 @@ struct rxrpc_net {
struct list_headwaiting_client_conns;
struct list_headactive_client_conns;
struct list_headidle_client_conns;
-   struct delayed_work client_conn_reaper;
+   struct work_struct  client_conn_reaper;
+   struct timer_list   client_conn_reap_timer;
 
struct list_headlocal_endpoints;
struct mutexlocal_mutex;/* Lock for ->local_endpoints */
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 785dfdb9fef1..7f74ca3059f8 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -691,7 +691,7 @@ int rxrpc_connect_call(struct rxrpc_call *call,
 
_enter("{%d,%lx},", call->debug_id, call->user_call_ID);
 
-   rxrpc_discard_expired_client_conns(>client_conn_reaper.work);
+   rxrpc_discard_expired_client_conns(>client_conn_reaper);
rxrpc_cull_active_client_conns(rxnet);
 
ret = rxrpc_get_client_conn(call, cp, srx, gfp);
@@ -757,6 +757,18 @@ void rxrpc_expose_client_call(struct rxrpc_call *call)
 }
 
 /*
+ * Set the reap timer.
+ */
+static void rxrpc_set_client_reap_timer(struct rxrpc_net *rxnet)
+{
+   unsigned long now = jiffies;
+   unsigned long reap_at = now + rxrpc_conn_idle_client_expiry;
+
+   if (rxnet->live)
+   timer_reduce(>client_conn_reap_timer, reap_at);
+}
+
+/*
  * Disconnect a client call.
  */
 void rxrpc_disconnect_client_call(struct rxrpc_call *call)
@@ -896,9 +908,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
list_move_tail(>cache_link, >idle_client_conns);
if (rxnet->idle_client_conns.next == >cache_link &&
!rxnet->kill_all_client_conns)
-   queue_delayed_work(rxrpc_workqueue,
-  >client_conn_reaper,
-  rxrpc_conn_idle_client_expiry);
+   rxrpc_set_client_reap_timer(rxnet);
} else {
trace_rxrpc_client(conn, channel, rxrpc_client_to_inactive);
conn->cache_state = RXRPC_CONN_CLIENT_INACTIVE;
@@ -1036,8 +1046,7 @@ void rxrpc_discard_expired_client_conns(struct 
work_struct *work)
 {
struct rxrpc_connection *conn;
struct rxrpc_net *rxnet =
-   container_of(to_delayed_work(work),
-struct rxrpc_net, client_conn_reaper);
+   container_of(work, struct rxrpc_net, client_conn_reaper);
unsigned long expiry, conn_expires_at, now;
unsigned int nr_conns;
bool did_discard = false;
@@ -1116,9 +1125

[PATCH net-next 08/12] rxrpc: Express protocol timeouts in terms of RTT

2017-11-24 Thread David Howells
Express protocol timeouts for data retransmission and deferred ack
generation in terms on RTT rather than specified timeouts once we have
sufficient RTT samples.

For the moment, this requires just one RTT sample to be able to use this
for ack deferral and two for data retransmission.

The data retransmission timeout is set at RTT*1.5 and the ACK deferral
timeout is set at RTT.

Note that the calculated timeout is limited to a minimum of 4ns to make
sure it doesn't happen too quickly.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |   22 ++
 net/rxrpc/sendmsg.c|7 +++
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index c14395d5ad8c..da91f16ac77c 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -52,7 +52,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 
ack_reason,
enum rxrpc_propose_ack_trace why)
 {
enum rxrpc_propose_ack_outcome outcome = rxrpc_propose_ack_use;
-   unsigned long now, ack_at, expiry = rxrpc_soft_ack_delay;
+   unsigned long expiry = rxrpc_soft_ack_delay;
s8 prior = rxrpc_ack_priority[ack_reason];
 
/* Pings are handled specially because we don't want to accidentally
@@ -116,7 +116,13 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, 
u8 ack_reason,
background)
rxrpc_queue_call(call);
} else {
-   now = jiffies;
+   unsigned long now = jiffies, ack_at;
+
+   if (call->peer->rtt_usage > 0)
+   ack_at = nsecs_to_jiffies(call->peer->rtt);
+   else
+   ack_at = expiry;
+
ack_at = jiffies + expiry;
if (time_before(ack_at, call->ack_at)) {
WRITE_ONCE(call->ack_at, ack_at);
@@ -160,14 +166,22 @@ static void rxrpc_resend(struct rxrpc_call *call, 
unsigned long now_j)
struct sk_buff *skb;
unsigned long resend_at;
rxrpc_seq_t cursor, seq, top;
-   ktime_t now, max_age, oldest, ack_ts;
+   ktime_t now, max_age, oldest, ack_ts, timeout, min_timeo;
int ix;
u8 annotation, anno_type, retrans = 0, unacked = 0;
 
_enter("{%d,%d}", call->tx_hard_ack, call->tx_top);
 
+   if (call->peer->rtt_usage > 1)
+   timeout = ns_to_ktime(call->peer->rtt * 3 / 2);
+   else
+   timeout = ms_to_ktime(rxrpc_resend_timeout);
+   min_timeo = ns_to_ktime((10 / HZ) * 4);
+   if (ktime_before(timeout, min_timeo))
+   timeout = min_timeo;
+
now = ktime_get_real();
-   max_age = ktime_sub_ms(now, rxrpc_resend_timeout * 1000 / HZ);
+   max_age = ktime_sub(now, timeout);
 
spin_lock_bh(>lock);
 
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 03e0676db28c..c56ee54fdd1f 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -226,6 +226,13 @@ static void rxrpc_queue_packet(struct rxrpc_sock *rx, 
struct rxrpc_call *call,
} else {
unsigned long now = jiffies, resend_at;
 
+   if (call->peer->rtt_usage > 1)
+   resend_at = nsecs_to_jiffies(call->peer->rtt * 3 / 2);
+   else
+   resend_at = rxrpc_resend_timeout;
+   if (resend_at < 1)
+   resend_at = 1;
+
resend_at = now + rxrpc_resend_timeout;
WRITE_ONCE(call->resend_at, resend_at);
rxrpc_reduce_call_timer(call, resend_at, now,



[PATCH net-next 2/3] rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()

2017-11-11 Thread David Howells
rxrpc_fill_out_ack() needs to be passed the connection pointer from its
caller rather than using call->conn as the call may be disconnected in
parallel with it, clearing call->conn, leading to:

BUG: unable to handle kernel NULL pointer dereference at 
0010
IP: rxrpc_send_ack_packet+0x231/0x6a4

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/output.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 71e6f713fbe7..8ee8b2d4a3eb 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -35,7 +35,8 @@ struct rxrpc_abort_buffer {
 /*
  * Fill out an ACK packet.
  */
-static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
+static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn,
+struct rxrpc_call *call,
 struct rxrpc_ack_buffer *pkt,
 rxrpc_seq_t *_hard_ack,
 rxrpc_seq_t *_top,
@@ -77,8 +78,8 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
} while (before_eq(seq, top));
}
 
-   mtu = call->conn->params.peer->if_mtu;
-   mtu -= call->conn->params.peer->hdrsize;
+   mtu = conn->params.peer->if_mtu;
+   mtu -= conn->params.peer->hdrsize;
jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max;
pkt->ackinfo.rxMTU  = htonl(rxrpc_rx_mtu);
pkt->ackinfo.maxMTU = htonl(mtu);
@@ -148,7 +149,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool 
ping)
}
call->ackr_reason = 0;
}
-   n = rxrpc_fill_out_ack(call, pkt, _ack, , reason);
+   n = rxrpc_fill_out_ack(conn, call, pkt, _ack, , reason);
 
spin_unlock_bh(>lock);
 



[PATCH net-next 0/3] rxrpc: Fixes

2017-11-11 Thread David Howells

Here are some patches that fix some things in AF_RXRPC:

 (1) Prevent notifications from being passed to a kernel service for a call
 that it has ended.

 (2) Fix a null pointer deference that occurs under some circumstances when an
 ACK is generated.

 (3) Fix a number of things to do with call expiration.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-next-2017

David
---
David Howells (3):
  rxrpc: Lock around calling a kernel service Rx notification
  rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()
  rxrpc: Fix call expiry handling


 net/rxrpc/af_rxrpc.c|   16 
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/call_event.c  |2 +-
 net/rxrpc/call_object.c |1 +
 net/rxrpc/input.c   |2 --
 net/rxrpc/output.c  |   19 +++
 net/rxrpc/recvmsg.c |2 ++
 7 files changed, 36 insertions(+), 7 deletions(-)



[PATCH net-next 1/3] rxrpc: Lock around calling a kernel service Rx notification

2017-11-11 Thread David Howells
Place a spinlock around the invocation of call->notify_rx() for a kernel
service call and lock again when ending the call and replace the
notification pointer with a pointer to a dummy function.

This is required because it's possible for rxrpc_notify_socket() to be
called after the call has been ended by the kernel service if called from
the asynchronous work function rxrpc_process_call().

However, rxrpc_notify_socket() currently only holds the RCU read lock when
invoking ->notify_rx(), which means that the afs_call struct would need to
be disposed of by call_rcu() rather than by kfree().

But we shouldn't see any notifications from a call after calling
rxrpc_kernel_end_call(), so a lock is required in rxrpc code.

Without this, we may see the call wait queue as having a corrupt spinlock:

BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
general protection fault:  [#1] SMP
...
Workqueue: krxrpcd rxrpc_process_call
task: 88040b83c400 task.stack: 88040adfc000
RIP: 0010:spin_bug+0x161/0x18f
RSP: 0018:88040adffcc0 EFLAGS: 00010002
RAX: 0032 RBX: 6b6b6b6b6b6b6b6b RCX: 81ab16cf
RDX: 88041fa14c01 RSI: 88041fa0ccb8 RDI: 88041fa0ccb8
RBP: 88040adffcd8 R08:  R09: 
R10: 88040adffc60 R11: 022c R12: 88040aca2208
R13: 81a58114 R14:  R15: 

Call Trace:
 do_raw_spin_lock+0x1d/0x89
 _raw_spin_lock_irqsave+0x3d/0x49
 ? __wake_up_common_lock+0x4c/0xa7
 __wake_up_common_lock+0x4c/0xa7
 ? __lock_is_held+0x47/0x7a
 __wake_up+0xe/0x10
 afs_wake_up_call_waiter+0x11b/0x122 [kafs]
 rxrpc_notify_socket+0x12b/0x258
 rxrpc_process_call+0x18e/0x7d0
 process_one_work+0x298/0x4de
 ? rescuer_thread+0x280/0x280
 worker_thread+0x1d1/0x2ae
 ? rescuer_thread+0x280/0x280
 kthread+0x12c/0x134
 ? kthread_create_on_node+0x3a/0x3a
 ret_from_fork+0x27/0x40

In this case, note the corrupt data in EBX.  The address of the offending
afs_call is in R12, plus the offset to the spinlock.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/af_rxrpc.c|   16 
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/call_object.c |1 +
 net/rxrpc/recvmsg.c |2 ++
 4 files changed, 20 insertions(+)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 344b2dcad52d..9b5c46b052fd 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -322,6 +322,14 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket 
*sock,
 }
 EXPORT_SYMBOL(rxrpc_kernel_begin_call);
 
+/*
+ * Dummy function used to stop the notifier talking to recvmsg().
+ */
+static void rxrpc_dummy_notify_rx(struct sock *sk, struct rxrpc_call *rxcall,
+ unsigned long call_user_ID)
+{
+}
+
 /**
  * rxrpc_kernel_end_call - Allow a kernel service to end a call it was using
  * @sock: The socket the call is on
@@ -336,6 +344,14 @@ void rxrpc_kernel_end_call(struct socket *sock, struct 
rxrpc_call *call)
 
mutex_lock(>user_mutex);
rxrpc_release_call(rxrpc_sk(sock->sk), call);
+
+   /* Make sure we're not going to call back into a kernel service */
+   if (call->notify_rx) {
+   spin_lock_bh(>notify_lock);
+   call->notify_rx = rxrpc_dummy_notify_rx;
+   spin_unlock_bh(>notify_lock);
+   }
+
mutex_unlock(>user_mutex);
rxrpc_put_call(call, rxrpc_call_put_kernel);
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ea5600b747cc..b2151993d384 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -525,6 +525,7 @@ struct rxrpc_call {
unsigned long   flags;
unsigned long   events;
spinlock_t  lock;
+   spinlock_t  notify_lock;/* Kernel notification lock */
rwlock_tstate_lock; /* lock for state transition */
u32 abort_code; /* Local/remote abort code */
int error;  /* Local error incurred */
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcdd6555a820..4c7fbc6dcce7 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -124,6 +124,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
INIT_LIST_HEAD(>sock_link);
init_waitqueue_head(>waitq);
spin_lock_init(>lock);
+   spin_lock_init(>notify_lock);
rwlock_init(>state_lock);
atomic_set(>usage, 1);
call->debug_id = atomic_inc_return(_debug_id);
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e4937b3f3685..8510a98b87e1 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -40,7 +40,9 @@ void rxrpc_notify_socket(struct rxrpc_call *call)
sk = >sk;
if (rx 

[PATCH net-next 3/3] rxrpc: Fix call expiry handling

2017-11-11 Thread David Howells
Fix call expiry handling in the following ways

 (1) If all the request data from a client call is acked, don't send a
 follow up IDLE ACK with firstPacket == 1 and previousPacket == 0 as
 this appears to fool some servers into thinking everything has been
 accepted.

 (2) Never send an abort back to the server once it has ACK'd all the
 request packets; rather just try to reuse the channel for the next
 call.  The first request DATA packet of the next call on the same
 channel will implicitly ACK the entire reply of the dead call - even
 if we haven't transmitted it yet.

 (3) Don't send RX_CALL_TIMEOUT in an ABORT packet, librx uses abort codes
 to pass local errors to the caller in addition to remote errors, and
 this is meant to be local only.

The following also need to be addressed in future patches:

 (4) Service calls should send PING ACKs as 'keep alives' if the server is
 still processing the call.

 (5) VERSION REPLY packets should be sent to the peers of service
 connections to act as keep-alives.  This is used to keep firewall
 routes in place.  The AFS CM should enable this.

Signed-off-by: David Howells <dhowe...@redhat.com>
---

 net/rxrpc/call_event.c |2 +-
 net/rxrpc/input.c  |2 --
 net/rxrpc/output.c |   10 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 7a77844aab16..3574508baf9a 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -386,7 +386,7 @@ void rxrpc_process_call(struct work_struct *work)
 
now = ktime_get_real();
if (ktime_before(call->expire_at, now)) {
-   rxrpc_abort_call("EXP", call, 0, RX_CALL_TIMEOUT, -ETIME);
+   rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME);
set_bit(RXRPC_CALL_EV_ABORT, >events);
goto recheck_state;
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 1e37eb1c0c66..1b592073ec96 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -298,8 +298,6 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, 
bool reply_begun,
 
write_unlock(>state_lock);
if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) {
-   rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, false, true,
- rxrpc_propose_ack_client_tx_end);
trace_rxrpc_transmit(call, rxrpc_transmit_await_reply);
} else {
trace_rxrpc_transmit(call, rxrpc_transmit_end);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 8ee8b2d4a3eb..f47659c7b224 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -222,6 +222,16 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
rxrpc_serial_t serial;
int ret;
 
+   /* Don't bother sending aborts for a client call once the server has
+* hard-ACK'd all of its request data.  After that point, we're not
+* going to stop the operation proceeding, and whilst we might limit
+* the reply, it's not worth it if we can send a new call on the same
+* channel instead, thereby closing off this call.
+*/
+   if (rxrpc_is_client_call(call) &&
+   test_bit(RXRPC_CALL_TX_LAST, >flags))
+   return 0;
+
spin_lock_bh(>lock);
if (call->conn)
conn = rxrpc_get_connection_maybe(call->conn);



Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-11-09 Thread David Howells
Alexei Starovoitov  wrote:

> > TBH, I've no idea how bpf does anything, so I can't say whether this is
> > better, overkill or insufficient.
> 
> ok. To make it clear:
> Nacked-by: Alexei Starovoitov 
> For the current patch.
> Unnecessary checks for no good reason in performance critical
> functions are not acceptable.

They aren't unnecessary checks.

Can you please suggest if there's some way more suitable than just killing bpf
entirely?  I don't know the code, and I presume you do.

David


Re: Is there a race between __mod_timer() and del_timer()?

2017-11-08 Thread David Howells
David Howells <dhowe...@redhat.com> wrote:

> I think it might just be best to put a note in the comments in __mod_timer().

How about the attached?

David
---
commit d538c734f9bf885292b88a81a06c5efee528d70d
Author: David Howells <dhowe...@redhat.com>
Date:   Wed Nov 8 10:20:27 2017 +

Add a comment into __mod_timer() noting a possible race with del_timer()

Add a comment into __mod_timer() noting a possible race with del_timer() in
which the 'common optimization' case could leave the timer unstarted if
del_timer() happens between the timer_pending() check and the timer
expiration check.

Signed-off-by: David Howells <dhowe...@redhat.com>

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index f2674a056c26..e0ac4486529c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -949,6 +949,9 @@ __mod_timer(struct timer_list *timer, unsigned long 
expires, bool pending_only)
 * The downside of this optimization is that it can result in
 * larger granularity than you would get from adding a new
 * timer with this expiry.
+*
+* Note that if del_timer() happens whilst we're just here, we
+* will return with the timer unstarted.
 */
if (timer->expires == expires)
return 1;


Is there a race between __mod_timer() and del_timer()?

2017-11-08 Thread David Howells
Is there a race between the optimisation for networking code in __mod_timer()
and del_timer() - or, at least, a race that matters?

Consider:

CPU A   CPU B
=== ===
[timer X is active]
==>__mod_timer(X)
if (timer_pending(timer))
[Take the true path]
-- IRQ --   ==>del_timer(X)
<==
if (timer->expires == expires)
[Take the true path]
<==return 1
[timer X is not active]

There's no locking to prevent this, but __mod_timer() returns without
restarting the timer.  I'm not sure this is a problem exactly, however, since
del_timer() *was* issued, and could've deleted the timer after __mod_timer()
returned.

A couple of possible alleviations:

 (1) Recheck timer_pending() before returning from __mod_timer().

 (2) Set timer->expires to jiffies in del_timer() - but since there's nothing
 preventing the optimisation in __mod_timer() from occurring concurrently
 with del_timer(), this probably won't help.

I think it might just be best to put a note in the comments in __mod_timer().

Thoughts?

David


  1   2   3   4   5   6   7   8   9   >