Re: [PATCH] crypto: arm/curve25519 - Move '.fpu' after '.arch'

2021-04-09 Thread Jason A. Donenfeld
Seems reasonable to me.

Acked-by: Jason A. Donenfeld 


Re: [Linuxarm] Re: [RFC v2] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

2021-03-19 Thread Jason A. Donenfeld
On Thu, Mar 18, 2021 at 1:33 AM Yunsheng Lin  wrote:
> > That offer definitely still stands. Generalization sounds like a lot of fun.
> >
> > Keep in mind though that it's an eventually consistent queue, not an
> > immediately consistent one, so that might not match all use cases. It
> > works with wg because we always trigger the reader thread anew when it
> > finishes, but that doesn't apply to everyone's queueing setup.
>
> Thanks for mentioning this.
>
> "multi-producer, single-consumer" seems to match the lockless qdisc's
> paradigm too, for now concurrent enqueuing/dequeuing to the pfifo_fast's
> queues() is not allowed, it is protected by producer_lock or consumer_lock.

The other thing is that if you've got memory for a ring buffer rather
than a list queue, we worked on an MPMC ring structure for WireGuard a
few years ago that we didn't wind up using in the end, but it lives
here:
https://git.zx2c4.com/wireguard-monolithic-historical/tree/src/mpmc_ptr_ring.h?h=tg/mpmc-benchmark


Re: [RFC v2] net: sched: implement TCQ_F_CAN_BYPASS for lockless qdisc

2021-03-17 Thread Jason A. Donenfeld
On 3/17/21, Toke Høiland-Jørgensen  wrote:
> Cong Wang  writes:
>
>> On Mon, Mar 15, 2021 at 2:07 PM Jakub Kicinski  wrote:
>>>
>>> I thought pfifo was supposed to be "lockless" and this change
>>> re-introduces a lock between producer and consumer, no?
>>
>> It has never been truly lockless, it uses two spinlocks in the ring
>> buffer
>> implementation, and it introduced a q->seqlock recently, with this patch
>> now we have priv->lock, 4 locks in total. So our "lockless" qdisc ends
>> up having more locks than others. ;) I don't think we are going to a
>> right direction...
>
> Just a thought, have you guys considered adopting the lockless MSPC ring
> buffer recently introduced into Wireguard in commit:
>
> 8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers")
>
> Jason indicated he was willing to work on generalising it into a
> reusable library if there was a use case for it. I haven't quite though
> through the details of whether this would be such a use case, but
> figured I'd at least mention it :)

That offer definitely still stands. Generalization sounds like a lot of fun.

Keep in mind though that it's an eventually consistent queue, not an
immediately consistent one, so that might not match all use cases. It
works with wg because we always trigger the reader thread anew when it
finishes, but that doesn't apply to everyone's queueing setup.


Re: [PATCH] wireguard: netlink: add multicast notification for peer changes

2021-03-07 Thread Jason A. Donenfeld
Hey Linus,

Thanks for the patch and sorry for the delay in reviewing it. Seeing
the basic scaffolding for getting netlink notifiers working with
WireGuard is enlightening; it looks surprisingly straightforward.

There are three classes of things that are interesting to receive
notifications for:

1) Things that the admin changes locally. This is akin to the `ip
monitor`, in which you can see various things that other programs (or
the kernel) modify. This seems straightforward and uncontroversial.

2) Authenticated events from peers. This basically amounts to new
sessions being created following a successful handshake. This seems
mostly okay because authenticated actions already have various DoS
protections in place.

3) Unauthenticated events. This would be things like (a) your patch --
a peer's endpoint changes -- or, more interestingly, (b) public keys
of unknown peers trying to handshake.

(b) is interesting because it allows doing database lookups in
userspace, adding the looked up entry, adding it to the interface, and
initiating a handshake in the reverse direction using the endpoint
information. It's partially DoS-protected due to the on-demand cookie
mac2 field.

(a) is also interesting for the use cases you outlined, but much more
perilous, as these are highspeed packets where the outer IP header is
totally unauthenticated. What prevents a man in the middle from doing
something nasty there, causing userspace to be inundated with
expensive calls and filling up netlink socket buffers and so forth?

I wonder if this would be something better belonging to (2) -- getting
an event on an authenticated peer handshake -- and combining that with
the ability to disable roaming (something discussed a few times on
wgml). Alternatively, maybe you have a different idea in mind about
this?

I also don't (yet) know much about the efficiency of multicast netlink
events and what the overhead is if nobody is listening, and questions
like that. So I'd be curious to hear your thoughts on the matter.

Regards,
Jason


Re: [PATCH] crypto: mips/poly1305 - enable for all MIPS processors

2021-03-03 Thread Jason A. Donenfeld
On Wed, Mar 3, 2021 at 1:31 PM Andy Polyakov  wrote:
>
> Hi,
>
> > I'm also CC'ing Andy on this, who wrote the original assembly, in case
> > he has some last minute objection.
>
> Just "what took you so long":-) On potentially related note cryptogams
> chacha-mips is as universal as poly1305-mips. "Universal" in sense that
> it can be compiled for all MIPS stripes.

Oh great. I didn't realize you had a ChaCha mips implementation. I'll
look into having that replace my junky mips32r2 one.

Jason


Re: [PATCH] crypto: mips/poly1305 - enable for all MIPS processors

2021-03-03 Thread Jason A. Donenfeld
Hi Maciej,

On Wed, Mar 3, 2021 at 2:16 AM Maciej W. Rozycki  wrote:
>
> The MIPS Poly1305 implementation is generic MIPS code written such as to
> support down to the original MIPS I and MIPS III ISA for the 32-bit and
> 64-bit variant respectively.  Lift the current limitation then to enable
> code for MIPSr1 ISA or newer processors only and have it available for
> all MIPS processors.

That sounds like a good solution to me. Thanks for doing the research
on it. Assuming your findings hold up:

Acked-by: Jason A. Donenfeld 

I'm also CC'ing Andy on this, who wrote the original assembly, in case
he has some last minute objection.

Jason

>
> Signed-off-by: Maciej W. Rozycki 
> Fixes: a11d055e7a64 ("crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS 
> optimized implementation")
> Cc: sta...@vger.kernel.org # v5.5+
> ---
> On Wed, 3 Mar 2021, Jason A. Donenfeld wrote:
>
> > >> Would you mind sending this for 5.12 in an rc at some point, rather
> > >> than waiting for 5.13? I'd like to see this backported to 5.10 and 5.4
> > >> for OpenWRT.
> > >
> > > why is this so important for OpenWRT ? Just to select CRYPTO_POLY1305_MIPS
> > > ?
> >
> > Yes. The performance boost on Octeon is significant for WireGuard users.
>
>  But that's the wrong fix for that purpose.  I've skimmed over that module
> and there's nothing MIPS64-specific there.  In fact it's plain generic
> MIPS assembly, with some R2 optimisations enabled where applicable but not
> necessary (and then R6 tweaks, but that's irrelevant here).
>
>  As a matter of interest I have just built it successfully for a MIPS I
> DECstation configuration:
>
> $ file arch/mips/crypto/poly1305-mips.ko
> arch/mips/crypto/poly1305-mips.ko: ELF 32-bit LSB relocatable, MIPS, MIPS-I 
> version 1 (SYSV), BuildID[sha1]=d36384d94f60ba7deff638ca8a24500120b45b56, not 
> stripped
> $
>
> Patch included, please apply.
>
>  So while your change is surely right, what you want is this really.
>
>   Maciej
> ---
>  arch/mips/crypto/Makefile |4 ++--
>  crypto/Kconfig|2 +-
>  drivers/net/Kconfig   |2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
>
> Index: linux/arch/mips/crypto/Makefile
> ===
> --- linux.orig/arch/mips/crypto/Makefile
> +++ linux/arch/mips/crypto/Makefile
> @@ -12,8 +12,8 @@ AFLAGS_chacha-core.o += -O2 # needed to
>  obj-$(CONFIG_CRYPTO_POLY1305_MIPS) += poly1305-mips.o
>  poly1305-mips-y := poly1305-core.o poly1305-glue.o
>
> -perlasm-flavour-$(CONFIG_CPU_MIPS32) := o32
> -perlasm-flavour-$(CONFIG_CPU_MIPS64) := 64
> +perlasm-flavour-$(CONFIG_32BIT) := o32
> +perlasm-flavour-$(CONFIG_64BIT) := 64
>
>  quiet_cmd_perlasm = PERLASM $@
>cmd_perlasm = $(PERL) $(<) $(perlasm-flavour-y) $(@)
> Index: linux/crypto/Kconfig
> ===
> --- linux.orig/crypto/Kconfig
> +++ linux/crypto/Kconfig
> @@ -772,7 +772,7 @@ config CRYPTO_POLY1305_X86_64
>
>  config CRYPTO_POLY1305_MIPS
> tristate "Poly1305 authenticator algorithm (MIPS optimized)"
> -   depends on CPU_MIPS32 || (CPU_MIPS64 && 64BIT)
> +   depends on MIPS
> select CRYPTO_ARCH_HAVE_LIB_POLY1305
>
>  config CRYPTO_MD4
> Index: linux/drivers/net/Kconfig
> ===
> --- linux.orig/drivers/net/Kconfig
> +++ linux/drivers/net/Kconfig
> @@ -92,7 +92,7 @@ config WIREGUARD
> select CRYPTO_POLY1305_ARM if ARM
> select CRYPTO_CURVE25519_NEON if ARM && KERNEL_MODE_NEON
> select CRYPTO_CHACHA_MIPS if CPU_MIPS32_R2
> -   select CRYPTO_POLY1305_MIPS if CPU_MIPS32 || (CPU_MIPS64 && 64BIT)
> +   select CRYPTO_POLY1305_MIPS if MIPS
> help
>   WireGuard is a secure, fast, and easy to use replacement for IPSec
>   that uses modern cryptography and clever networking tricks. It's


Re: [PATCH v2] MIPS: select CPU_MIPS64 for remaining MIPS64 CPUs

2021-03-03 Thread Jason A. Donenfeld
On 3/3/21, Thomas Bogendoerfer  wrote:
> On Mon, Mar 01, 2021 at 05:27:46PM +0100, Jason A. Donenfeld wrote:
>> Hey Thomas,
>>
>> Would you mind sending this for 5.12 in an rc at some point, rather
>> than waiting for 5.13? I'd like to see this backported to 5.10 and 5.4
>> for OpenWRT.
>
> why is this so important for OpenWRT ? Just to select CRYPTO_POLY1305_MIPS
> ?

Yes. The performance boost on Octeon is significant for WireGuard users.

And it becomes incredibly frustrating to bifurcate packaging into
"mips64 where the config is right" vs "mips64 where the config is
borked." This saves us a lot of trouble.

Plus the patch is trivial.


Re: [PATCH v2] MIPS: select CPU_MIPS64 for remaining MIPS64 CPUs

2021-03-01 Thread Jason A. Donenfeld
Hey Thomas,

Would you mind sending this for 5.12 in an rc at some point, rather
than waiting for 5.13? I'd like to see this backported to 5.10 and 5.4
for OpenWRT.

Thanks,
Jason


[PATCH v2] MIPS: select CPU_MIPS64 for remaining MIPS64 CPUs

2021-02-27 Thread Jason A. Donenfeld
CPU_MIPS64 is supposed to be selected for CPUs that implement a revision
of the MIPS64 ISA. While it contains the generic ones, it forgot about
Octeon and Loongson in its list, which are indeed MIPS64 processors.
This commit adds these missing CPUs to the auto-selection list.

Cc: Maciej W. Rozycki 
Cc: Thomas Bogendoerfer 
Cc: Ralf Baechle 
Cc: George Cherian 
Cc: Huacai Chen 
Cc: Jiaxun Yang 
Signed-off-by: Jason A. Donenfeld 
---
 arch/mips/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d89efba3d8a4..3e0e8f1d2e82 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2118,7 +2118,7 @@ config CPU_MIPS32
 config CPU_MIPS64
bool
default y if CPU_MIPS64_R1 || CPU_MIPS64_R2 || CPU_MIPS64_R5 || \
-CPU_MIPS64_R6
+CPU_MIPS64_R6 || CPU_LOONGSON64 || CPU_CAVIUM_OCTEON
 
 #
 # These indicate the revision of the architecture
-- 
2.30.1



Re: [PATCH] MIPS: select CPU_MIPS64 for remaining MIPS64 CPUs

2021-02-27 Thread Jason A. Donenfeld
On Sat, Feb 27, 2021 at 2:41 PM Maciej W. Rozycki  wrote:
>
> On Sat, 27 Feb 2021, Jason A. Donenfeld wrote:
>
> > The CPU_MIPS64 and CPU_MIPS32 variables are supposed to be able to
> > distinguish broadly between 64-bit and 32-bit MIPS CPUs. However, they
>
>  That is not true.  The purpose of these options is to identify MIPS64 and
> MIPS32 ISA processors respectively (and the generic features these ISAs
> imply).  There are 64-bit and 32-bit MIPS processors which do not qualify,
> specifically all MIPS I, MIPS II, MIPS III, and MIPS IV implementations.
>
> > weren't selected by the specialty CPUs, Octeon and Loongson, which meant
> > it was possible to hit a weird state of:
> >
> > MIPS=y, CONFIG_64BIT=y, CPU_MIPS64=n
>
>  This is a correct combination for MIPS III and MIPS IV processors.
>
> > This commit rectifies the issue by having CPU_MIPS64 be selected when
> > the missing Octeon or Loongson models are selected.
>
>  From the description and/or other options selected by CPU_LOONGSON64 and
> CPU_CAVIUM_OCTEON I infer the change itself is correct, so you only need
> to rewrite the change description.

Indeed you're right. v2 on its way.

Jason


[PATCH] MIPS: select CPU_MIPS64 for remaining MIPS64 CPUs

2021-02-27 Thread Jason A. Donenfeld
The CPU_MIPS64 and CPU_MIPS32 variables are supposed to be able to
distinguish broadly between 64-bit and 32-bit MIPS CPUs. However, they
weren't selected by the specialty CPUs, Octeon and Loongson, which meant
it was possible to hit a weird state of:

MIPS=y, CONFIG_64BIT=y, CPU_MIPS64=n

This commit rectifies the issue by having CPU_MIPS64 be selected when
the missing Octeon or Loongson models are selected.

Cc: Thomas Bogendoerfer 
Cc: Ralf Baechle 
Cc: George Cherian 
Cc: Huacai Chen 
Cc: Jiaxun Yang 
Signed-off-by: Jason A. Donenfeld 
---
 arch/mips/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d89efba3d8a4..3e0e8f1d2e82 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2118,7 +2118,7 @@ config CPU_MIPS32
 config CPU_MIPS64
bool
default y if CPU_MIPS64_R1 || CPU_MIPS64_R2 || CPU_MIPS64_R5 || \
-CPU_MIPS64_R6
+CPU_MIPS64_R6 || CPU_LOONGSON64 || CPU_CAVIUM_OCTEON
 
 #
 # These indicate the revision of the architecture
-- 
2.30.1



Re: possible stack corruption in icmp_send (__stack_chk_fail)

2021-02-17 Thread Jason A. Donenfeld
On 2/18/21, Willem de Bruijn  wrote:
> On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld  wrote:
>>
>> Hi Willem,
>>
>> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
>>  wrote:
>> > A vmlinux image might help. I couldn't find one for this kernel.
>>
>> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
>> has .debs with vmlinuz in there, which you can extract to vmlinux, as
>> well as my own vmlinux elf construction with the symbols added back in
>> by extracting them from kallsyms. That's the best I've been able to
>> do, as all of this is coming from somebody random emailing me.
>>
>> > But could it be
>> > that the forwarded packet is not sensible IPv4? The skb->protocol is
>> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
>>
>> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
>>
>> if (skb->protocol == htons(ETH_P_IP))
>>icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH,
>> 0);
>>else if (skb->protocol == htons(ETH_P_IPV6))
>>icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
>> ICMPV6_ADDR_UNREACH, 0);
>>
>> On the other hand, that code is hit on an error path when
>> wg_check_packet_protocol returns false:
>>
>> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
>> {
>>__be16 real_protocol = ip_tunnel_parse_protocol(skb);
>>return real_protocol && skb->protocol == real_protocol;
>> }
>>
>> So that means, at least in theory, icmp_ndo_send could be called with
>> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
>> that. But... is it actually a problem?
>
> For this forwarded packet that arrived on a wireguard tunnel,
> skb->protocol was originally also set by ip_tunnel_parse_protocol.
> So likely not.
>
> The other issue seems more like a real bug. wg_xmit calling
> icmp_ndo_send without clearing IPCB first.
>

Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb
before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will
send a patch for icmp_ndo_xmit momentarily and will CC you.

Jason


Re: possible stack corruption in icmp_send (__stack_chk_fail)

2021-02-17 Thread Jason A. Donenfeld
Hi Willem,

On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
 wrote:
> A vmlinux image might help. I couldn't find one for this kernel.

https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
has .debs with vmlinuz in there, which you can extract to vmlinux, as
well as my own vmlinux elf construction with the symbols added back in
by extracting them from kallsyms. That's the best I've been able to
do, as all of this is coming from somebody random emailing me.

> But could it be
> that the forwarded packet is not sensible IPv4? The skb->protocol is
> inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.

The wg calls to icmp_ndo_send are gated by checking skb->protocol:

if (skb->protocol == htons(ETH_P_IP))
   icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
   else if (skb->protocol == htons(ETH_P_IPV6))
   icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
ICMPV6_ADDR_UNREACH, 0);

On the other hand, that code is hit on an error path when
wg_check_packet_protocol returns false:

static inline bool wg_check_packet_protocol(struct sk_buff *skb)
{
   __be16 real_protocol = ip_tunnel_parse_protocol(skb);
   return real_protocol && skb->protocol == real_protocol;
}

So that means, at least in theory, icmp_ndo_send could be called with
skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
that. But... is it actually a problem?

Jason


possible stack corruption in icmp_send (__stack_chk_fail)

2021-02-17 Thread Jason A. Donenfeld
Hi Netdev & Willem,

I've received a report of stack corruption -- via the stack protector
check -- in icmp_send. I was sent a vmcore, and was able to extract
the OOPS from there. However, I've been unable to produce the bug and
I don't see where it'd be in the code. That might point to a more
sinister problem, or I'm simply just not seeing it. Apparently the
reporter reproduces it every 40 or so minutes, and has seen it happen
since at least ~5.10. Willem - I'm emailing you because it seems like
you were making a lot of changes to the icmp code around then, and
perhaps you have an intuition. For example, some of the error handling
code takes a pointer to a stack buffer (_objh and such), and maybe
that's problematic? I'm not quite sure. The vmcore, along with the
various kernel binaries I hunted down are here:
https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
. The extracted dmesg follows below, in case you or anyone has a
pointer. I've been staring at this for a while and don't see it.

Jason

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: __icmp_send+0x5bd/0x5c0
CPU: 0 PID: 959 Comm: kworker/0:2 Kdump: loaded Not tainted
5.11.0-051100-lowlatency #202102142330
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
Workqueue: wg-crypt-wg0 wg_packet_decrypt_worker [wireguard]
Call Trace:
 
 show_stack+0x52/0x58
 dump_stack+0x70/0x8b
 panic+0x108/0x2ea
 ? ip_push_pending_frames+0x42/0x90
 ? __icmp_send+0x5bd/0x5c0
 __stack_chk_fail+0x14/0x20
 __icmp_send+0x5bd/0x5c0
 icmp_ndo_send+0x148/0x160
 wg_xmit+0x359/0x450 [wireguard]
 ? harmonize_features+0x19/0x80
 xmit_one.constprop.0+0x9f/0x190
 dev_hard_start_xmit+0x43/0x90
 sch_direct_xmit+0x11d/0x340
 __qdisc_run+0x66/0xc0
 __dev_xmit_skb+0xd5/0x340
 __dev_queue_xmit+0x32b/0x4d0
 ? nf_conntrack_double_lock.constprop.0+0x97/0x140 [nf_conntrack]
 dev_queue_xmit+0x10/0x20
 neigh_connected_output+0xcb/0xf0
 ip_finish_output2+0x17f/0x470
 __ip_finish_output+0x9b/0x140
 ? ipv4_confirm+0x4a/0x80 [nf_conntrack]
 ip_finish_output+0x2d/0xb0
 ip_output+0x78/0x110
 ? __ip_finish_output+0x140/0x140
 ip_forward_finish+0x58/0x90
 ip_forward+0x40a/0x4d0
 ? ip4_key_hashfn+0xb0/0xb0
 ip_sublist_rcv_finish+0x3d/0x50
 ip_list_rcv_finish.constprop.0+0x163/0x190
 ip_sublist_rcv+0x37/0xb0
 ? ip_rcv_finish_core.constprop.0+0x310/0x310
 ip_list_rcv+0xf5/0x120
 __netif_receive_skb_list_core+0x228/0x250
 __netif_receive_skb_list+0x102/0x170
 ? dev_gro_receive+0x1b5/0x370
 netif_receive_skb_list_internal+0xca/0x190
 napi_complete_done+0x7a/0x1a0
 wg_packet_rx_poll+0x384/0x400 [wireguard]
 napi_poll+0x92/0x200
 net_rx_action+0xb8/0x1c0
 __do_softirq+0xce/0x2b3
 asm_call_irq_on_stack+0x12/0x20
 
 do_softirq_own_stack+0x3d/0x50
 do_softirq+0x66/0x80
 __local_bh_enable_ip+0x62/0x70
 _raw_spin_unlock_bh+0x1e/0x20
 wg_packet_decrypt_worker+0xf6/0x190 [wireguard]
 process_one_work+0x217/0x3e0
 worker_thread+0x4d/0x350
 ? rescuer_thread+0x390/0x390
 kthread+0x145/0x170
 ? __kthread_bind_mask+0x70/0x70
 ret_from_fork+0x22/0x30
Kernel Offset: 0x200 from 0x8100 (relocation range:
0x8000-0xbfff)


Re: KASAN: invalid-access Write in enqueue_timer

2021-02-16 Thread Jason A. Donenfeld
On Tue, Feb 16, 2021 at 6:46 PM Jason A. Donenfeld  wrote:
>
> Hi Catalin,
>
> On Tue, Feb 16, 2021 at 6:28 PM Catalin Marinas  
> wrote:
> > Adding Jason and Ard. It may be a use-after-free in the wireguard
> > driver.
>
> Thanks for sending this my way. Note: to my knowledge, Ard doesn't
> work on wireguard.
>
> > >  hlist_add_head include/linux/list.h:883 [inline]
> > >  enqueue_timer+0x18/0xc0 kernel/time/timer.c:581
> > >  mod_timer+0x14/0x20 kernel/time/timer.c:1106
> > >  mod_peer_timer drivers/net/wireguard/timers.c:37 [inline]
> > >  wg_timers_any_authenticated_packet_traversal+0x68/0x90 
> > > drivers/net/wireguard/timers.c:215
>
> The line of hlist_add_head that it's hitting is:
>
> static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
> {
>struct hlist_node *first = h->first;
>WRITE_ONCE(n->next, first);
>if (first)
>
> So that means it's the dereferencing of h that's a problem. That comes from:
>
> static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
>  unsigned int idx, unsigned long bucket_expiry)
> {
>
>hlist_add_head(>entry, base->vectors + idx);
>
> That means it concerns base->vectors + idx, not the timer_list object
> that wireguard manages. That's confusing. Could that imply that the
> bug is in freeing a previous timer without removing it from the timer
> lists, so that it winds up being in base->vectors?
>
> The allocation and deallocation backtrace is confusing
>
> > >  alloc_netdev_mqs+0x5c/0x3bc net/core/dev.c:10546
> > >  rtnl_create_link+0xc8/0x2b0 net/core/rtnetlink.c:3171
> > >  __rtnl_newlink+0x5bc/0x800 net/core/rtnetlink.c:3433
>
> This suggests it's part of the `ip link add wg0 type wireguard` nelink
> call, during it's allocation of the netdevice's private area. For
> this, the wg_device struct is used. It has no timer_list structures in
> it!
>
> Similarly,
>
> > >  netdev_freemem+0x18/0x2c net/core/dev.c:10500
> > >  netdev_release+0x30/0x44 net/core/net-sysfs.c:1828
> > >  device_release+0x34/0x90 drivers/base/core.c:1980
>
> That smells like `ip link del wg0 type wireguard`. But again,
> wg_device doesn't have any timer_lists in it.
>
> So what's happening here exactly? I'm not really sure yet...
>
> It'd be nice to have a reproducer.
>
>
> Jason


Digging around on syzkaller, it looks like there's a similar bug on
jbd2, concerning iptunnels's allocation:

https://syzkaller.appspot.com/text?tag=CrashReport=13afb19cd0

And one from ext4:

https://syzkaller.appspot.com/text?tag=CrashReport=17685330d0

And from from ext4 with fddup:

https://syzkaller.appspot.com/text?tag=CrashReport=17685330d0
https://syzkaller.appspot.com/text?tag=CrashReport=12d326e8d0

It might not actually be a wireguard bug?


Re: KASAN: invalid-access Write in enqueue_timer

2021-02-16 Thread Jason A. Donenfeld
Hi Catalin,

On Tue, Feb 16, 2021 at 6:28 PM Catalin Marinas  wrote:
> Adding Jason and Ard. It may be a use-after-free in the wireguard
> driver.

Thanks for sending this my way. Note: to my knowledge, Ard doesn't
work on wireguard.

> >  hlist_add_head include/linux/list.h:883 [inline]
> >  enqueue_timer+0x18/0xc0 kernel/time/timer.c:581
> >  mod_timer+0x14/0x20 kernel/time/timer.c:1106
> >  mod_peer_timer drivers/net/wireguard/timers.c:37 [inline]
> >  wg_timers_any_authenticated_packet_traversal+0x68/0x90 
> > drivers/net/wireguard/timers.c:215

The line of hlist_add_head that it's hitting is:

static inline void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
   struct hlist_node *first = h->first;
   WRITE_ONCE(n->next, first);
   if (first)

So that means it's the dereferencing of h that's a problem. That comes from:

static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
 unsigned int idx, unsigned long bucket_expiry)
{

   hlist_add_head(>entry, base->vectors + idx);

That means it concerns base->vectors + idx, not the timer_list object
that wireguard manages. That's confusing. Could that imply that the
bug is in freeing a previous timer without removing it from the timer
lists, so that it winds up being in base->vectors?

The allocation and deallocation backtrace is confusing

> >  alloc_netdev_mqs+0x5c/0x3bc net/core/dev.c:10546
> >  rtnl_create_link+0xc8/0x2b0 net/core/rtnetlink.c:3171
> >  __rtnl_newlink+0x5bc/0x800 net/core/rtnetlink.c:3433

This suggests it's part of the `ip link add wg0 type wireguard` nelink
call, during it's allocation of the netdevice's private area. For
this, the wg_device struct is used. It has no timer_list structures in
it!

Similarly,

> >  netdev_freemem+0x18/0x2c net/core/dev.c:10500
> >  netdev_release+0x30/0x44 net/core/net-sysfs.c:1828
> >  device_release+0x34/0x90 drivers/base/core.c:1980

That smells like `ip link del wg0 type wireguard`. But again,
wg_device doesn't have any timer_lists in it.

So what's happening here exactly? I'm not really sure yet...

It'd be nice to have a reproducer.


Jason


Re: [regression -next0117] What is kcompactd and why is he eating 100% of my cpu?

2021-02-16 Thread Jason A. Donenfeld
On Mon, Jan 25, 2021 at 07:54:38PM +0100, Tibor Bana wrote:
> Greetings!
> 
> I don't know if it still actual, but I am strugling with this problem right 
> now and searching the internet for solutions.
> I read the thread and saw that you are strugling to reproduce the problem, 
> and I can reproduce it almost every day.
> 
> - Install vmware player, and a linux guest. 
> - Configure the virtual machine to have a good amount of memory and cpu
> - run resource intensive tasks on the guest
> - when the host used up almost it's all memory and start to reuse caches 
> kcompactd will kick in.
> 
> As I know the problem is related to transparent huge pages, but I tried to 
> disable it. 
> Today I saw the problem again and kcompactd shown an interesting status in 
> top. It hasn't used any memory, all zeroes but it used up one core 
> completely. 
> 
> My machine is a core-i7 with 4 physical cores and hyper threading and 24GB 
> Memory
> 5.9.11-arch2-1 #1 SMP PREEMPT Sat, 28 Nov 2020 02:07:22 + x86_64 GNU/Linux

Another anecdote: 5.11.0, 64 gigs of ram. If I run QEMU/KVM for a VM
with 16 gigs at the same time as a VMware VM with 16 gigs of ram,
kcompact goes wild and both VMs get really slow. The key here is running
KVM at the same time as VMware.



Re: [PATCH] Input: synaptic - reverting dcb00fc799dc03fd320e123e4c81b3278c763ea5 because it breaks the touchpad for one guy on Reddit.

2021-02-07 Thread Jason A. Donenfeld
On Sun, Feb 7, 2021 at 5:00 AM Colton Booth  wrote:
>
> I can't test myself since I don't have the correct hardware, BUT this change 
> seems to work for him. I'm thinking he has an early version of the X1E which 
> may use slightly different trackpad revision.
>
> Signed-off-by: Colton Booth 
> ---
>  drivers/input/mouse/synaptics.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
> index ffad142801b3..2d3f03921dbc 100644
> --- a/drivers/input/mouse/synaptics.c
> +++ b/drivers/input/mouse/synaptics.c
> @@ -179,9 +179,7 @@ static const char * const smbus_pnp_ids[] = {
> "LEN0093", /* T480 */
> "LEN0096", /* X280 */
> "LEN0097", /* X280 -> ALPS trackpoint */
> -   "LEN0099", /* X1 Extreme Gen 1 / P1 Gen 1 */
> "LEN009b", /* T580 */
> -   "LEN0402", /* X1 Extreme Gen 2 / P1 Gen 2 */
> "LEN200f", /* T450s */
> "LEN2044", /* L470  */
> "LEN2054", /* E480 */

This removes two totally separate models for "one guy on Reddit". Does
he have two computers that coincidentally have the same problem? Any
details about what that problem is so that we can assess it?

Jason


Re: forkat(int pidfd), execveat(int pidfd), other awful things?

2021-02-01 Thread Jason A. Donenfeld
> int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
> namespace_fd, const char *pathname, char *const argv[], char *const
> envp[]);

A variant on the same scheme would be:

int execve_remote(int pidfd, int root_dirfd, int cgroup_fd, int
namespace_fd, const char *pathname, char *const argv[], char *const
envp[]);

Unpriv'd process calls fork(), and from that fork sends its pidfd
through a unix socket to systemd-sudod, which then calls execve_remote
on that pidfd.

There are a lot of (potentially very bad) ways to skin this cat.


forkat(int pidfd), execveat(int pidfd), other awful things?

2021-02-01 Thread Jason A. Donenfeld
Hi Andy & others,

I was reversing some NT stuff recently and marveling over how wild and
crazy things are over in Windows-land. A few things related to process
creation caught my interest:

- It's possible to create a new process with an *arbitrary parent
process*, which means it'll then inherit various things like handles
and security attributes and tokens from that new parent process.

- It's possible to create a new process with the memory space handle
of a different process. Consider this on Linux, and you have some
abomination like `forkat(int pidfd)`.

The big question is "why!?" At first I was just amused by its presence
in NT. Everything is an object and you can usually freely mix and
match things, and it's very flexible, which is cool. But this is NT,
not Linux.

Jann and I were discussing, though, that maybe some variant of these
features might be useful to get rid of setuid executables. Imagine
something like `systemd-sudod`, forked off of PID 1 very early.
Subsequently all new processes on the system run with
PR_SET_NO_NEW_PRIVS or similar policies to prevent non-root->root
transition. Then, if you want to transition, you ask systemd-sudod (or
polkitd, or whatever else you have in mind) to make you a new process,
and it then does the various policy checks, and executes a new process
for you as the parent of the requesting process.

So how would that work? Well, executing processes with arbitrary
parents would be part of it, as above. But we'd probably want to more
carefully control that new process. Which chroot is it in? How do
cgroups work? And so on. And ultimately this design leads to something
like ZwCreateProcess, where you have several arguments, each to a
handle to some part of the new process state, or null to be inherited
from its parent.

int execve_parent(int parent_pidfd, int root_dirfd, int cgroup_fd, int
namespace_fd, const char *pathname, char *const argv[], char *const
envp[]);

One could imagine this growing pretty unwieldy. There's also this
other design aspect of Linux that's worth considering. Namespaces and
other process-inherited resources are generally hierarchical, with
children getting the resource from their parent. This makes sense and
is simple to conceptualize. Everytime we add a new thing_fd as a
pointer to one of these resources, and allow it to be used outside of
that hierarchy, it introduces a kind of "escape hatch". That might be
considered "bad design" by some; it might not be by others. Seen this
way, NT is one massive escape hatch, with pretty much everything being
an object with a handle.

But! Maybe this is nonetheless an interesting design avenue to
explore. The introduction of pidfd is sort of just the "beginning" of
that kind of design.

Is any of this interesting to you as a future of privilege escalation
and management on Linux?

Jason


Re: [PATCH v2] wireguard: netlink: add multicast notification for peer changes

2021-01-15 Thread Jason A. Donenfeld
Hey Linus,

My email server has been firewalled from vger.kernel.org until today,
so I didn't see the original until this v2 was sent today. My
apologies. I'll review this first thing on Monday.

Jason


Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

2021-01-15 Thread Jason A. Donenfeld
In case it helps,

Reviewed-by: Jason A. Donenfeld 

[resending, as my mail server got blocked from vger for a week and my
message bounced]


Re: wg-crypt-wg0 process

2020-12-30 Thread Jason A. Donenfeld
Hi Fatih,

Thanks for the report and the detailed test case. From what I can see,
this behavior presents itself both with the explicit ip link del and
without. When running with debugging enabled, I can see this in dmesg:

[558758.361056] wireguard: wg0: Keypair 244 destroyed for peer 21
[558758.546649] wireguard: wg0: Peer 21 (10.150.150.2:51820) destroyed
[558758.563317] wireguard: wg0: Interface destroyed
[558758.567803] wireguard: wg0: Keypair 243 destroyed for peer 22
[558758.733287] wireguard: wg0: Peer 22 (10.150.150.1:51820) destroyed
[558758.749991] wireguard: wg0: Interface destroyed

The fact that I see "Interface destroyed" for both interfaces means
that wg_destruct() is being called, which includes these calls:

destroy_workqueue(wg->handshake_receive_wq);
destroy_workqueue(wg->handshake_send_wq);
destroy_workqueue(wg->packet_crypt_wq);

In doing so, the output of ps changes from:

$ ps aux|grep wg0
root  200479  0.0  0.0  0 0 ?I13:06   0:00
[kworker/0:2-wg-crypt-wg0]
root  201226  0.0  0.0  0 0 ?I13:08   0:00
[kworker/1:4-wg-crypt-wg0]
root  201476  0.0  0.0  0 0 ?I<   13:11   0:00
[wg-crypt-wg0]
root  201484  0.0  0.0  0 0 ?I<   13:11   0:00
[wg-crypt-wg0]

to:

$ ps aux|grep wg0
root  200479  0.0  0.0  0 0 ?I13:06   0:00
[kworker/0:2-wg-crypt-wg0]
root  201226  0.0  0.0  0 0 ?I13:08   0:00
[kworker/1:4-wg-crypt-wg0]

What I suspect is happening is that destroying the workqueue does not
actually destroy the kthreads that they were using, so that they can
be reused (and eventually relabeled) by other drivers. Looking at the
stack of those indicates this is probably the case:

$ cat /proc/200479/stack
[<0>] worker_thread+0xba/0x3c0
[<0>] kthread+0x114/0x130
[<0>] ret_from_fork+0x1f/0x30

So it's just hanging out there idle waiting to be scheduled by
something new. In fact, while I was writing this email, that worker
already seems to have been reclaimed by another driver:

$ cat /proc/200479/comm
kworker/0:2-events

Now it's called "events".

This is happening because the kthread isn't actually destroyed, and
task->comm is being hijacked. In proc_task_name we have:

if (p->flags & PF_WQ_WORKER)
   wq_worker_comm(tcomm, sizeof(tcomm), p);
   else
   __get_task_comm(tcomm, sizeof(tcomm), p);

That top condition holds for workqueue workers, and wq_worker_comm
winds up scnprintf'ing the current worker description in there:

/*
* ->desc tracks information (wq name or
* set_worker_desc()) for the latest execution.  If
* current, prepend '+', otherwise '-'.
*/
   if (worker->desc[0] != '\0') {
   if (worker->current_work)
   scnprintf(buf + off, size - off, "+%s",
 worker->desc);
   else
   scnprintf(buf + off, size - off, "-%s",
 worker->desc);

But worker->desc isn't set until process_one_work is called:

/*
* Record wq name for cmdline and debug reporting, may get
* overridden through set_worker_desc().
*/
   strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);

And it's never unset after the work is done and it's waiting idle in
worker_thread for the scheduler to reschedule it and eventually call
process_one_work on a new work unit.

It would be easy to just null out that string after the work is done
with something like:

worker->current_func(work);
worker->desc[0] = '\0';

But I guess this has never sufficiently bothered anyone before. I
suppose I could submit a patch and see how it's received. But it also
looks like the scnprintf above in wq_worker_comm distinguishes these
cases anyway. If there's a + it means that the work is active and if
there's a - it means that it's an old leftover thread. So maybe this
is fine as-is?

Jason


Re: drivers/char/random.c needs a (new) maintainer

2020-12-23 Thread Jason A. Donenfeld
On Wed, Dec 23, 2020 at 5:03 PM Jason A. Donenfeld  wrote:
>
> Hi Peter,
>
> On Wed, Dec 23, 2020 at 5:01 PM Petr Tesarik  wrote:
> > I never suggested that this should serve as a supportive argument. I was 
> > just trying to be honest about our motivations.
> >
> > I'm a bit sad that this discussion has quickly gone back to the choice of 
> > algorithms and how they can be implemented.
>
> Why are you sad? You are interested in FIPS. FIPS indicates a certain
> set of algorithms. The ones most suitable to the task seem like they'd
> run into real practical problems in the kernel's RNG.
>
> That's not the _only_ reason I'm not keen on FIPS, but it does seem
> like a very basic one.
>
> Jason

And just to add to that: in working through Nicholai's patches (an
ongoing process), I'm reminded of his admonishment in the 00 cover
letter that at some point chacha20 will have to be replaced, due to
FIPS. So it seems like that's very much on the table. I brought it up
here as an example ("For example, " is how I began that sentence), but
it is a concern. If you want to make lots of changes for cryptographic
or technical reasons, that seems like a decent way to engage. But if
the motivation for each of these is the bean counting, then again, I'm
pretty wary of churn for nothing. And if that bean counting will
eventually lead us into bad corners, like the concerns I brought up
about FPU usage in the kernel, then I'm even more hesitant. However, I
think there may be good arguments to be made that some of Nicholai's
patches stand on their own, without the FIPS motivation. And that's
the set of arguments that are compelling.

Jason


Re: drivers/char/random.c needs a (new) maintainer

2020-12-23 Thread Jason A. Donenfeld
Hi Peter,

On Wed, Dec 23, 2020 at 5:01 PM Petr Tesarik  wrote:
> I never suggested that this should serve as a supportive argument. I was just 
> trying to be honest about our motivations.
>
> I'm a bit sad that this discussion has quickly gone back to the choice of 
> algorithms and how they can be implemented.

Why are you sad? You are interested in FIPS. FIPS indicates a certain
set of algorithms. The ones most suitable to the task seem like they'd
run into real practical problems in the kernel's RNG.

That's not the _only_ reason I'm not keen on FIPS, but it does seem
like a very basic one.

Jason


Re: drivers/char/random.c needs a (new) maintainer

2020-12-23 Thread Jason A. Donenfeld
On Wed, Dec 23, 2020 at 4:26 PM Stephan Mueller  wrote:
>
> Am Mittwoch, dem 23.12.2020 um 15:32 +0100 schrieb Jason A. Donenfeld:
> >
> > I would, however, be interested in a keccak-based construction. But
> > just using the keccak permutation does not automatically make it
> > "SHA-3", so we're back at the same issue again. FIPS is simply not
> > interesting for our requirements.
>
> Using non-assessed cryptography? Sounds dangerous to me even though it may be
> based on some well-known construction.

"assessed" is not necessarily the same as FIPS. Don't conflate the
two. I don't appreciate that kind of dishonest argumentation.

And new constructions that I'm interested in would be formally
verified (like the other crypto work I've done) with review and buy-in
from the cryptographic community, both engineering and academic. I
have no interest in submitting "non-assessed" things developed in a
vacuum, and I'm displeased with your attempting to make that
characterization.

Similarly, any other new design proposed I would expect a similar
amount of rigor. The current RNG is admittedly a bit of a mess, but at
least it's a design that's evolved. Something that's "revolutionary",
rather than evolutionary, needs considerably more argumentation.

So, please, don't strawman this into the "non-assessed" rhetoric.


Re: drivers/char/random.c needs a (new) maintainer

2020-12-23 Thread Jason A. Donenfeld
On Wed, Dec 23, 2020 at 3:17 PM Petr Tesarik  wrote:
> Upfront, let me admit that SUSE has a vested interest in a FIPS-certifiable 
> Linux kernel.

Sorry, but just because you have a "vested interest", or a financial
interest, or because you want it does not suddenly make it a good
idea. The idea is to have good crypto, not to merely check some boxes
for the bean counters.

For example, it's very unlikely that future kernel RNGs will move to
using AES, due to the performance overhead involved on non-table-based
implementations, and the lack of availability of FPU/AES-NI in all the
contexts we need. NT's fortuna machine can use AES, because NT allows
the FPU in all contexts. We don't have that luxury (or associated
performance penalty).

I would, however, be interested in a keccak-based construction. But
just using the keccak permutation does not automatically make it
"SHA-3", so we're back at the same issue again. FIPS is simply not
interesting for our requirements.

Jason


Re: [PATCH] wireguard: avoid double unlikely() notation when using IS_ERR()

2020-12-10 Thread Jason A. Donenfeld
On Thu, Dec 10, 2020 at 9:56 AM Antonio Quartulli  wrote:
>
> The definition of IS_ERR() already applies the unlikely() notation
> when checking the error status of the passed pointer. For this
> reason there is no need to have the same notation outside of
> IS_ERR() itself.
>
> Clean up code by removing redundant notation.

Thanks for the patch. I can confirm this doesn't change the codgen at all.

I've queued this up in wireguard's staging tree, and I'll push it back
out as part of the next series.

Jason


Re: [PATCH] random: avoid arch_get_random_seed_long() when collecting IRQ randomness

2020-12-07 Thread Jason A. Donenfeld
Hi Ard,

On Tue, Dec 1, 2020 at 1:24 PM Ard Biesheuvel  wrote:
> > > > is implemented. In most cases, these are special instructions, but in
> > > > some cases, such as on ARM, we may want to back this using firmware
> > > > calls, which are considerably more expensive.

This seems fine. But I suppose I'm curious to learn more about what
you have in mind for ARM. We've been assuming that arch_get_random is
not horribly expensive. Usually external RNGs that are horribly
expensive separate hardware take a different route and aren't hooked
into arch_get_random. When you say "we may want to back this using
firmware", does that mean it hasn't happened yet, and you're in a
position to direct the design otherwise? If so, would it be reasonable
to take a different route with that hardware, and keep arch_get_random
for when it's actually implemented by the hardware? Or are there
actually good reasons for keeping it one and the same?

On the other hand, rdrand on intel is getting slower and slower, to
the point where we've removed it from a few places that used to use
it. And I don't see anything terribly wrong with removing the extra
call in this path here. So need be, I'll offer my Reviewed-by. But I
wanted to get an understanding of the fuller story first.

Jason


Re: drivers/char/random.c needs a (new) maintainer

2020-12-01 Thread Jason A. Donenfeld
On Mon, Nov 30, 2020 at 5:56 PM Theodore Y. Ts'o  wrote:
> patches this cycle.  One thing that would help me is if folks
> (especially Jason, if you would) could start with a detailed review of
> Nicolai's patches.

Sure, I'll take a look.


Re: drivers/char/random.c needs a (new) maintainer

2020-11-30 Thread Jason A. Donenfeld
I am willing to maintain random.c and have intentions to have a
formally verified RNG. I've mentioned this to Ted before.

But I think Ted's reluctance to not accept the recent patches sent to
this list is mostly justified, and I have no desire to see us rush
into replacing random.c with something suboptimal or FIPSy.


Re: [PATCH v2] Compiler Attributes: remove CONFIG_ENABLE_MUST_CHECK

2020-11-28 Thread Jason A. Donenfeld
On Sat, Nov 28, 2020 at 9:48 AM Masahiro Yamada  wrote:
>
> Revert commit cebc04ba9aeb ("add CONFIG_ENABLE_MUST_CHECK").
>
> A lot of warn_unused_result wearings existed in 2006, but until now
> they have been fixed thanks to people doing allmodconfig tests.
>
> Our goal is to always enable __must_check where appropriate, so this
> CONFIG option is no longer needed.
>
> I see a lot of defconfig (arch/*/configs/*_defconfig) files having:
>
> # CONFIG_ENABLE_MUST_CHECK is not set
>
> I did not touch them for now since it would be a big churn. If arch
> maintainers want to clean them up, please go ahead.
>
> While I was here, I also moved __must_check to compiler_attributes.h
> from compiler_types.h

For the wireguard test harness change:

Acked-by: Jason A. Donenfeld 


Re: [PATCH AUTOSEL 5.9 26/55] wireguard: selftests: check that route_me_harder packets use the right sk

2020-11-13 Thread Jason A. Donenfeld
On Tue, Nov 10, 2020 at 6:20 PM Greg KH  wrote:
>
> On Tue, Nov 10, 2020 at 01:29:41PM +0100, Jason A. Donenfeld wrote:
> > Note that this requires
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=46d6c5ae953cc0be38efd0e469284df7c4328cf8
> > And that commit should be backported to every kernel ever, since the
> > bug is so old.
>
> Sasha queued this up to 5.4.y and 5.9.y, but it looks like it doesn't
> easily apply to older kernels.  If you think that it's needed beyond
> that, I'll gladly take backported patches.

Backport to older kernels coming your way shortly.


Re: [PATCH AUTOSEL 5.9 26/55] wireguard: selftests: check that route_me_harder packets use the right sk

2020-11-10 Thread Jason A. Donenfeld
Note that this requires
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=46d6c5ae953cc0be38efd0e469284df7c4328cf8
And that commit should be backported to every kernel ever, since the
bug is so old.


Re: [PATCH] wireguard: convert selftest/{counter,ratelimiter}.c to KUnit

2020-10-19 Thread Jason A. Donenfeld
Hi Daniel,

Thanks for this patch. KUnit looks interesting. But I'm not totally
sure what this would gain here, except more complicated infrastructure
to handle. We're already running these tests in our CI on every single
commit made to a variety of trees on a variety of architectures -- see
https://www.wireguard.com/build-status/ -- runnable via `make -C
tools/testing/selftests/wireguard/qemu -j$(nproc)`. It looks like this
commit breaks that while making everything slightly more complex. Is
there a good reason to switch over to this other than fad? From a
development perspective, I don't see this as really helping with much.

Jason

On Mon, Oct 19, 2020 at 10:24 PM Daniel Latypov  wrote:
>
> These tests already focus on testing individual functions that can run
> in a more minimal environment like KUnit.
>
> The primary motivation for this change it to make it faster and easier
> to run these tests, and thus encourage the addition of more test cases.
>
> E.g.
> Test timing after make mrproper: 47.418s building, 0.000s running
> With an incremental build: 3.891s building, 0.000s running
>
> KUnit also provides a bit more structure, like tracking overall
> pass/fail status and printing failure messages like
> >  # wg_packet_counter_test: EXPECTATION FAILED at 
> > drivers/net/wireguard/counter_test.c:32
> >  Expected counter_validate(counter, (COUNTER_WINDOW_SIZE + 1)) == false, but
>
> Note: so we no longer need to track test_num in counter_test.c.
> But deleting the /*1*/ test_num comments means git (with the default
> threshold) no longer recognizes that the file was moved.
>
> Signed-off-by: Daniel Latypov 
> Cc: Jason A. Donenfeld 
> Cc: David Miller 
> Cc: Brendan Higgins 
> ---
>  drivers/net/Kconfig   | 12 
>  .../{selftest/counter.c => counter_test.c}| 45 ++--
>  drivers/net/wireguard/main.c  |  3 +-
>  drivers/net/wireguard/queueing.h  |  4 --
>  drivers/net/wireguard/ratelimiter.c   |  4 +-
>  .../ratelimiter.c => ratelimiter_test.c}  | 68 +++
>  drivers/net/wireguard/receive.c   |  6 +-
>  7 files changed, 80 insertions(+), 62 deletions(-)
>  rename drivers/net/wireguard/{selftest/counter.c => counter_test.c} (73%)
>  rename drivers/net/wireguard/{selftest/ratelimiter.c => ratelimiter_test.c} 
> (85%)
>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index c3dbe64e628e..208ed162bcc0 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -114,6 +114,18 @@ config WIREGUARD_DEBUG
>
>   Say N here unless you know what you're doing.
>
> +config WIREGUARD_KUNIT_TEST
> +   tristate "KUnit tests for WireGuard"
> +   default KUNIT_ALL_TESTS
> +   depends on KUNIT && WIREGUARD
> +   help
> + This enables KUnit tests for Wireguard.
> +
> + For more information on KUnit and unit tests in general please refer
> + to the KUnit documentation in Documentation/dev-tools/kunit/.
> +
> + Say N here unless you know what you're doing.
> +
>  config EQUALIZER
> tristate "EQL (serial line load balancing) support"
> help
> diff --git a/drivers/net/wireguard/selftest/counter.c 
> b/drivers/net/wireguard/counter_test.c
> similarity index 73%
> rename from drivers/net/wireguard/selftest/counter.c
> rename to drivers/net/wireguard/counter_test.c
> index ec3c156bf91b..167153fc249f 100644
> --- a/drivers/net/wireguard/selftest/counter.c
> +++ b/drivers/net/wireguard/counter_test.c
> @@ -3,32 +3,23 @@
>   * Copyright (C) 2015-2019 Jason A. Donenfeld . All Rights 
> Reserved.
>   */
>
> -#ifdef DEBUG
> -bool __init wg_packet_counter_selftest(void)
> +#include 
> +
> +static void wg_packet_counter_test(struct kunit *test)
>  {
> struct noise_replay_counter *counter;
> -   unsigned int test_num = 0, i;
> -   bool success = true;
> +   unsigned int i;
>
> -   counter = kmalloc(sizeof(*counter), GFP_KERNEL);
> -   if (unlikely(!counter)) {
> -   pr_err("nonce counter self-test malloc: FAIL\n");
> -   return false;
> -   }
> +   counter = kunit_kmalloc(test, sizeof(*counter), GFP_KERNEL);
> +   KUNIT_ASSERT_NOT_ERR_OR_NULL(test, counter);
>
>  #define T_INIT do {\
> memset(counter, 0, sizeof(*counter));  \
> spin_lock_init(>lock);\
> } while (0)
>  #define T_LIM (COUNTER_WINDOW_SIZE + 1)
> -#define T(n, v) do {  

Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver

2020-10-17 Thread Jason A. Donenfeld
After discussing this offline with Jann a bit, I have a few general
comments on the design of this.

First, the UUID communicated by the hypervisor should be consumed by
the kernel -- added as another input to the rng -- and then userspace
should be notified that it should reseed any userspace RNGs that it
may have, without actually communicating that UUID to userspace. IOW,
I agree with Jann there. Then, it's the functioning of this
notification mechanism to userspace that is interesting to me.

There are a few design goals of notifying userspace: it should be
fast, because people who are using userspace RNGs are usually doing so
in the first place to completely avoid syscall overhead for whatever
high performance application they have - e.g. I recall conversations
with Colm about his TLS implementation needing to make random IVs
_really_ fast. It should also happen as early as possible, with no
race or as minimal as possible race window, so that userspace doesn't
begin using old randomness and then switch over after the damage is
already done.

I'm also not wedded to using Microsoft's proprietary hypervisor design
for this. If we come up with a better interface, I don't think it's
asking too much to implement that and reasonably expect for Microsoft
to catch up. Maybe someone here will find that controversial, but
whatever -- discussing ideal designs does not seem out of place or
inappropriate for how we usually approach things in the kernel, and a
closed source hypervisor coming along shouldn't disrupt that.

So, anyway, here are a few options with some pros and cons for the
kernel notifying userspace that its RNG should reseed.

1. SIGRND - a new signal. Lol.

2. Userspace opens a file descriptor that it can epoll on. Pros are
that many notification mechanisms already use this. Cons is that this
requires syscall and might be more racy than we want. Another con is
that this a new thing for userspace programs to do.

3. We stick an atomic counter in the vDSO, Jann's suggestion. Pros are
that this is extremely fast, and also simple to use and implement.
There are enough sequence points in typical crypto programs that
checking to see whether this counter has changed before doing whatever
operation seems easy enough. Cons are that typically we've been
conservative about adding things to the vDSO, and this is also a new
thing for userspace programs to do.

4. We already have a mechanism for this kind of thing, because the
same issue comes up when fork()ing. The solution was MADV_WIPEONFORK,
where userspace marks a page to be zeroed when forking, for the
purposes of the RNG being notified when its world gets split in two.
This is basically the same thing as we're discussing here with guest
snapshots, except it's on the system level rather than the process
level, and a system has many processes. But the problem space is still
almost the same, and we could simply reuse that same mechanism. There
are a few implementation strategies for that:

4a. We mess with the PTEs of all processes' pages that are
MADV_WIPEONFORK, like fork does now, when the hypervisor notifies us
to do so. Then we wind up reusing the already existing logic for
userspace RNGs. Cons might be that this usually requires semaphores,
and we're in irq context, so we'd have to hoist to a workqueue, which
means either more wake up latency, or a larger race window.

4b. We just memzero all processes' pages that are MADV_WIPEONFORK,
when the hypervisor notifies us to do so. Then we wind up reusing the
already existing logic for userspace RNGs.

4c. The guest kernel maintains an array of physical addresses that are
MADV_WIPEONFORK. The hypervisor knows about this array and its
location through whatever protocol, and before resuming a
moved/snapshotted/duplicated VM, it takes the responsibility for
memzeroing this memory. The huge pro here would be that this
eliminates all races, and reduces complexity quite a bit, because the
hypervisor can perfectly synchronize its bringup (and SMP bringup)
with this, and it can even optimize things like on-disk memory
snapshots to simply not write out those pages to disk.

A 4c-like approach seems like it'd be a lot of bang for the buck -- we
reuse the existing mechanism (MADV_WIPEONFORK), so there's no new
userspace API to deal with, and it'd be race free, and eliminate a lot
of kernel complexity.

But 4b and 3 don't seem too bad either.

Any thoughts on 4c? Is that utterly insane, or does that actually get
us somewhere close to what we want?

Jason


[PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue

2020-10-14 Thread Jason A. Donenfeld
A recent change to the checksum code removed usage of some extra
arguments, alongside with storage on the stack for those, and the stack
pointer no longer needed to be adjusted in the function prologue. But, a
left over subtraction wasn't removed in the function epilogue, causing
the function to return with the stack pointer moved 16 bytes away from
where it should have. This corrupted local state and lead to weird
crashes. This commit simply removes the leftover instruction from the
epilogue.

Fixes: 70d65cd555c5 ("ppc: propagate the calling conventions change down to 
csum_partial_copy_generic()")
Cc: Al Viro 
Signed-off-by: Jason A. Donenfeld 
---
 arch/powerpc/lib/checksum_32.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index ec5cd2dede35..27d9070617df 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -236,7 +236,6 @@ _GLOBAL(csum_partial_copy_generic)
slwir0,r0,8
adder12,r12,r0
 66:addze   r3,r12
-   addir1,r1,16
beqlr+  cr7
rlwinm  r3,r3,8,0,31/* odd destination address: rotate one byte */
blr
-- 
2.28.0



Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()

2020-10-14 Thread Jason A. Donenfeld
On Thu, Oct 15, 2020 at 12:53 AM Linus Torvalds
 wrote:
>
> On Wed, Oct 14, 2020 at 3:51 PM Linus Torvalds
>  wrote:
> >
> > I think it's this instruction:
> >
> > addir1,r1,16
> >
> > that should be removed from the function exit, because Al removed the
> >
> > -   stwur1,-16(r1)
> >
> > on function entry.
> >
> > So I think you end up with a corrupt stack pointer and basically
> > random behavior.
> >
> > Mind trying that? (This is obviously all in
> > arch/powerpc/lib/checksum_32.S, the csum_partial_copy_generic()
> > function).
>
> Patch attached to make it easier to test.
>
> NOTE! This is ENTIRELY untested. My ppc asm is so rusty that I might
> be barking entirely up the wrong tree, and I just made things much
> worse.

Indeed - exactly the same thing that's in my tree. Writing a commit
message for it now.

Jason


Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()

2020-10-14 Thread Jason A. Donenfeld
On Thu, Oct 15, 2020 at 12:51 AM Linus Torvalds
 wrote:
>
> On Wed, Oct 14, 2020 at 3:27 PM Jason A. Donenfeld  wrote:
> >
> > This patch is causing crashes in WireGuard's CI over at
> > https://www.wireguard.com/build-status/ . Apparently sending a simple
> > network packet winds up triggering refcount_t's warn-on-saturate code. I
>
> Ouch.
>
> The C parts look fairly straightforward, and I don't see how they
> could cause that odd refcount issue.
>
> So I assume it's the low-level asm code conversion that is buggy. And
> it's apparently the 32-bit conversion, since your ppc64 status looks
> fine.
>
> I think it's this instruction:
>
> addir1,r1,16
>
> that should be removed from the function exit, because Al removed the
>
> -   stwur1,-16(r1)

I just tried that about a minute ago, and indeed that seems to be
what's up. Problem goes away without it. I'll send a patch shortly.

Jason


Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()

2020-10-14 Thread Jason A. Donenfeld
Hi Al,

On Fri, Jul 24, 2020 at 02:25:46AM +0100, Al Viro wrote:
> From: Al Viro 
> 
> ... and get rid of the pointless fallback in the wrappers.  On error it used
> to zero the unwritten area and calculate the csum of the entire thing.  Not
> wanting to do it in assembler part had been very reasonable; doing that in
> the first place, OTOH...  In case of an error the caller discards the data
> we'd copied, along with whatever checksum it might've had.

This patch is causing crashes in WireGuard's CI over at
https://www.wireguard.com/build-status/ . Apparently sending a simple
network packet winds up triggering refcount_t's warn-on-saturate code. I
don't know if the new assembly failed to reset some flag or if something
else is up. I can start digging into it if you want, but I thought I
should let you know first about the issue. The splat follows below.

Thanks,
Jason

$ ping -c 10 -f -W 1 192.168.241.1
PING 192.168.241.1 (192.168.241.1) 56(84) bytes of data.
[1.432922] [ cut here ]
[1.433069] refcount_t: saturated; leaking memory.
[1.433344] WARNING: CPU: 3 PID: 90 at refcount_warn_saturate+0x100/0x1bc
[1.433646] CPU: 3 PID: 90 Comm: ping Not tainted 5.9.0+ #3
[1.433797] NIP:  c01a6fa0 LR: c01a6fa0 CTR: c01ccbec
[1.433964] REGS: cfacfb80 TRAP: 0700   Not tainted  (5.9.0+)
[1.434102] MSR:  00029000   CR: 28022404  XER: 
[1.434345]
[1.434345] GPR00: c01a6fa0 cfacfc38 cf8eeae0 0026 3fffefff cfacfa90 
cfacfaa0 00021000
[1.434345] GPR08: 0f4a1000  c08b4674 c0918704 42022404  
cfa34180 
[1.434345] GPR16:  cf8ef004   0040  
 cfbac230
[1.434345] GPR24: cfacfce8 c02a802c  cfa34180 cfacfc58 c02aa53c 
55c0a4ff 
[1.435471] NIP [c01a6fa0] refcount_warn_saturate+0x100/0x1bc
[1.435615] LR [c01a6fa0] refcount_warn_saturate+0x100/0x1bc
[1.435825] Call Trace:
[1.435922] [cfacfc38] [c01a6fa0] refcount_warn_saturate+0x100/0x1bc 
(unreliable)
[1.436149] [cfacfc48] [c02a7f14] __ip_append_data.isra.0+0x8a8/0xde0
[1.436302] [cfacfce8] [c02a84e0] ip_append_data.part.0+0x94/0xf0
[1.436438] [cfacfd18] [c02dffe0] raw_sendmsg+0x298/0xa84
[1.436544] [cfacfe48] [c020b9ec] __sys_sendto+0xdc/0x13c
[1.436641] [cfacff38] [c000f1dc] ret_from_syscall+0x0/0x38
[1.436824] --- interrupt: c01 at 0xb7e44f00
[1.436824] LR = 0xb7e21ba0
[1.437038] Instruction dump:
[1.437239] 3d20c092 39291bc1 89490001 2c0a 4082ff64 3c60c040 7c0802a6 
3941
[1.437439] 38633b74 90010014 99490001 4be9b6e1 <0fe0> 80010014 7c0803a6 
4b38
[1.437753] ---[ end trace aaa4b4788958d0a6 ]---
[1.440214] [ cut here ]
[1.440301] refcount_t: underflow; use-after-free.
[1.440397] WARNING: CPU: 3 PID: 90 at refcount_warn_saturate+0x1ac/0x1bc
[1.440587] CPU: 3 PID: 90 Comm: ping Tainted: GW 5.9.0+ #3
[1.440741] NIP:  c01a704c LR: c01a704c CTR: c01ccbec
[1.440857] REGS: cfacfaa0 TRAP: 0700   Tainted: GW  (5.9.0+)
[1.441016] MSR:  00029000   CR: 48022404  XER: 
[1.441176]
[1.441176] GPR00: c01a704c cfacfb58 cf8eeae0 0026 3fffefff cfacf9b0 
cfacf9c0 00021000
[1.441176] GPR08: 0f4a1000 0400 c08b4674 c0918704 42022404  
10020464 0003
[1.441176] GPR16: 7ff0 1002 0080 cfb27000 cfb2704c c093 
cfacfc54 c092d260
[1.441176] GPR24: 058c cfa82120 cfa8212c cfa8212c  cfa82000 
cfacfd44 cfacfc58
[1.441995] NIP [c01a704c] refcount_warn_saturate+0x1ac/0x1bc
[1.442125] LR [c01a704c] refcount_warn_saturate+0x1ac/0x1bc
[1.442252] Call Trace:
[1.442320] [cfacfb58] [c01a704c] refcount_warn_saturate+0x1ac/0x1bc 
(unreliable)
[1.442726] [cfacfb68] [c020e7dc] sock_wfree+0x130/0x134
[1.442877] [cfacfb78] [c01f1388] wg_packet_send_staged_packets+0x234/0x6b4
[1.443061] [cfacfbb8] [c01eecf8] wg_xmit+0x2a0/0x46c
[1.443204] [cfacfbf8] [c0232134] dev_hard_start_xmit+0x190/0x1c0
[1.443369] [cfacfc38] [c0232f2c] __dev_queue_xmit+0x4d0/0x844
[1.443527] [cfacfc88] [c02a7134] ip_finish_output2+0x180/0x6b8
[1.443686] [cfacfcb8] [c02aa3e8] ip_output+0xf0/0x1c0
[1.443829] [cfacfd08] [c02ab14c] ip_send_skb+0x24/0xe8
[1.443975] [cfacfd18] [c02e04bc] raw_sendmsg+0x774/0xa84
[1.444124] [cfacfe48] [c020b9ec] __sys_sendto+0xdc/0x13c
[1.444274] [cfacff38] [c000f1dc] ret_from_syscall+0x0/0x38
[1.37] --- interrupt: c01 at 0xb7e44f00
[1.37] LR = 0xb7e21ba0
[1.444644] Instruction dump:
[1.444736] 4be9b661 0fe0 80010014 7c0803a6 4bfffeb8 3c60c040 7c0802a6 
3941
[1.444989] 38633bd8 90010014 99490003 4be9b635 <0fe0> 80010014 7c0803a6 
4bfffe8c
[1.445252] ---[ end trace aaa4b4788958d0a7 ]---
[1.445583] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[1.445767] Faulting instruction address: 0x
[

Re: 5.9-rc7 null ptr deref in __i915_gem_userptr_get_pages_worker

2020-09-28 Thread Jason A. Donenfeld
Oh, this is just a copy and paste error, when the code was originally
pasted from internal_get_user_pages_fast, which assumes a current.
I'll fix this up and send a patch shortly.


Re: 5.9-rc7 null ptr deref in __i915_gem_userptr_get_pages_worker

2020-09-28 Thread Jason A. Donenfeld
Alright, the failing code seems to be in mm:

if (flags & FOLL_PIN)
atomic_set(>mm->has_pinned, 1);

Apparently you can't rely on current->mm being valid in this context;
it's null here, hence the +0x64 for has_pinned's offset.

This was added by 008cfe4418b3 ("mm: Introduce mm_struct.has_pinned"),
which is new for rc7 indeed.

The crash goes away when changing that to:

if ((flags & FOLL_PIN) && current->mm)
atomic_set(>mm->has_pinned, 1);

But I haven't really evaluated whether or not that's racy or if I need
to take locks to do such a thing.


Re: 5.9-rc7 null ptr deref in __i915_gem_userptr_get_pages_worker

2020-09-28 Thread Jason A. Donenfeld
Increasing the CC list a bit, as i915 didn't really get much churn
rc6->rc7, but mm/gup.c did, and mm has had a lot of recent changes.

On Mon, Sep 28, 2020 at 11:39 AM Jason A. Donenfeld  wrote:
>
> Seeing a new crash in 5.9-rc7 I didn't have in 5.9-rc6:
>
> [ 1311.596896] BUG: kernel NULL pointer dereference, address: 0064
> [ 1311.596898] #PF: supervisor write access in kernel mode
> [ 1311.596899] #PF: error_code(0x0002) - not-present page
> [ 1311.596899] PGD 0 P4D 0
> [ 1311.596901] Oops: 0002 [#1] SMP
> [ 1311.596902] CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S   U
>O  5.9.0-rc7+ #140
> [ 1311.596903] Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS
> N2OET47W (1.34 ) 08/06/2020
> [ 1311.596955] Workqueue: i915-userptr-acquire
> __i915_gem_userptr_get_pages_worker [i915]
> [ 1311.596959] RIP: 0010:__get_user_pages_remote+0xd7/0x310
> [ 1311.596960] Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1
> 00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40
> 03 00 00  40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24
> 18 00
> [ 1311.596961] RSP: 0018:888fdfe47de0 EFLAGS: 00010206
> [ 1311.596962] RAX:  RBX: 7fe188531000 RCX: 
> 00040001
> [ 1311.596962] RDX: 0001 RSI: 7fe188531000 RDI: 
> 888ff0748f00
> [ 1311.596963] RBP: 888fdfe47e54 R08: 888fedc7d7c8 R09: 
> 
> [ 1311.596963] R10: 0018 R11: fefefefefefefeff R12: 
> 888ff0748f00
> [ 1311.596963] R13: 888fedc7d7c8 R14: 888f81fe3a40 R15: 
> 00042003
> [ 1311.596964] FS:  () GS:888ffc48()
> knlGS:
> [ 1311.596965] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1311.596965] CR2: 0064 CR3: 02009003 CR4: 
> 003706e0
> [ 1311.596966] DR0:  DR1:  DR2: 
> 
> [ 1311.596966] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [ 1311.596967] Call Trace:
> [ 1311.596993]  __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915]
> [ 1311.596996]  process_one_work+0x1ca/0x390
> [ 1311.596997]  worker_thread+0x48/0x3c0
> [ 1311.596998]  ? rescuer_thread+0x3d0/0x3d0
> [ 1311.597000]  kthread+0x114/0x130
> [ 1311.597001]  ? kthread_create_worker_on_cpu+0x40/0x40
> [ 1311.597003]  ret_from_fork+0x1f/0x30
> [ 1311.597031] CR2: 0064
> [ 1311.597033] ---[ end trace e2b8ddde994a6f6d ]---


5.9-rc7 null ptr deref in __i915_gem_userptr_get_pages_worker

2020-09-28 Thread Jason A. Donenfeld
Seeing a new crash in 5.9-rc7 I didn't have in 5.9-rc6:

[ 1311.596896] BUG: kernel NULL pointer dereference, address: 0064
[ 1311.596898] #PF: supervisor write access in kernel mode
[ 1311.596899] #PF: error_code(0x0002) - not-present page
[ 1311.596899] PGD 0 P4D 0
[ 1311.596901] Oops: 0002 [#1] SMP
[ 1311.596902] CPU: 10 PID: 1431 Comm: kworker/u33:1 Tainted: P S   U
   O  5.9.0-rc7+ #140
[ 1311.596903] Hardware name: LENOVO 20QTCTO1WW/20QTCTO1WW, BIOS
N2OET47W (1.34 ) 08/06/2020
[ 1311.596955] Workqueue: i915-userptr-acquire
__i915_gem_userptr_get_pages_worker [i915]
[ 1311.596959] RIP: 0010:__get_user_pages_remote+0xd7/0x310
[ 1311.596960] Code: f5 01 00 00 83 7d 00 01 0f 85 ed 01 00 00 f7 c1
00 00 04 00 0f 84 58 01 00 00 65 48 8b 04 25 00 6d 01 00 48 8b 80 40
03 00 00  40 64 01 00 00 00 65 48 8b 04 25 00 6d 01 00 48 c7 44 24
18 00
[ 1311.596961] RSP: 0018:888fdfe47de0 EFLAGS: 00010206
[ 1311.596962] RAX:  RBX: 7fe188531000 RCX: 00040001
[ 1311.596962] RDX: 0001 RSI: 7fe188531000 RDI: 888ff0748f00
[ 1311.596963] RBP: 888fdfe47e54 R08: 888fedc7d7c8 R09: 
[ 1311.596963] R10: 0018 R11: fefefefefefefeff R12: 888ff0748f00
[ 1311.596963] R13: 888fedc7d7c8 R14: 888f81fe3a40 R15: 00042003
[ 1311.596964] FS:  () GS:888ffc48()
knlGS:
[ 1311.596965] CS:  0010 DS:  ES:  CR0: 80050033
[ 1311.596965] CR2: 0064 CR3: 02009003 CR4: 003706e0
[ 1311.596966] DR0:  DR1:  DR2: 
[ 1311.596966] DR3:  DR6: fffe0ff0 DR7: 0400
[ 1311.596967] Call Trace:
[ 1311.596993]  __i915_gem_userptr_get_pages_worker+0xc8/0x260 [i915]
[ 1311.596996]  process_one_work+0x1ca/0x390
[ 1311.596997]  worker_thread+0x48/0x3c0
[ 1311.596998]  ? rescuer_thread+0x3d0/0x3d0
[ 1311.597000]  kthread+0x114/0x130
[ 1311.597001]  ? kthread_create_worker_on_cpu+0x40/0x40
[ 1311.597003]  ret_from_fork+0x1f/0x30
[ 1311.597031] CR2: 0064
[ 1311.597033] ---[ end trace e2b8ddde994a6f6d ]---


Re: [DISCUSSION PATCH 00/41] random: possible ways towards NIST SP800-90B compliance

2020-09-21 Thread Jason A. Donenfeld
I haven't looked into the details of this patchset yet, but your
description here indicates to me that this is motivated by FIPS
certification desires, which...worries me. I would like to rewrite the
RNG at some point, and I've started to work on a bunch of designs for
this (and proving them correct, too), but going about this via FIPS
certification or trying to implement some NIST specs is most certainly
the wrong way to go about this, will lock us into subpar crypto for
years, and is basically a waste of time.


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-08 Thread Jason A. Donenfeld
On Tue, Sep 8, 2020 at 8:01 PM Borislav Petkov  wrote:
>
> On Tue, Sep 08, 2020 at 07:42:12PM +0200, Jason A. Donenfeld wrote:
> > Are you prepared to track down all the MSRs that might maybe do
> > something naughty?
>
> I'm not prepared - that's why this MSR filtering. To block *all* direct
> MSR accesses from userspace in the future.
>
> > Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?
>
> Yah, because that's the same as bricking your hardware. Geez.

Nobody is talking about bricking any hardware. Sorry if I seemed to
imply that before. In my experience, undervolting improperly results
in the CPU calculating things wrong and eventually crashing, and
overclocking usually will trip some thermal limits where the CPU
powers down. I've never experienced bricked hardware as a result of
this.

Jason


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-08 Thread Jason A. Donenfeld
On Tue, Sep 8, 2020 at 7:36 PM Borislav Petkov  wrote:
>
> On Tue, Sep 08, 2020 at 07:29:11PM +0200, Jason A. Donenfeld wrote:
> > Well that's not cool.
>
> So you're saying that if someone wants to kill its box by poking at that
> MSR, we should just let her/him?
>
> If anything, I think that a BIG FAT WARNING at least would make sense.

Are you prepared to track down all the MSRs that might maybe do
something naughty?

After determining optimal voltages, people have systemd running
intel-undervolt for them. It becomes part of the normal system
configuration, is applied all the time, and after figuring it out
once, users forget they ever had enabled it, except when observing
that their laptop works better than it originally did.

Does `dd` warn when you run `dd if=/dev/zero of=/dev/sda`?

> Now, if there were a proper interface which would allow only valid
> commands, now that would be optimal...

Probably not possible. Optimal values are related to the "silicon
lottery" that occurs when you buy a new CPU. Different optimal values
for different individual chips.


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-08 Thread Jason A. Donenfeld
On Tue, Sep 8, 2020 at 7:26 PM Borislav Petkov  wrote:
>
> On Tue, Sep 08, 2020 at 07:12:44PM +0200, Jason A. Donenfeld wrote:
> > > Overclocking is not architectural I/F and is supported by some special
> > > CPU skews. I can't find any public document to specify the commands
> > > which can be used via this OC mailbox. I have to check internally to
> > > see if there is any. To add a proper sysfs interface we have to make
> > > sure that we are not allowing some random commands to hardware and
> > > crash the system.
> >
> > Well you can definitely crash the system this way -- undervolting can
> > result in all sorts of nice glitching. You might be able to even
> > programmatically undervolt to compromise the kernel in clever ways (a
> > lockdown bypass, I guess, but who cares).
> >
> > That's why I initially suggested this was pretty squarely in the realm
> > of hobbyists and should just be added to that whitelist.
>
> If that MSR can cause all kinds of crazy, I'd prefer writes to it from
> userspace to be completely forbidden, actually. And if force-enabled,
> with a BIG FAT WARNING each time userspace writes to it.

Well that's not cool. And it's sure to really upset the fairly sizable
crowd of people who rely on undervolting and related things to make
their laptops remotely usable, especially in light of the crazy
thermal designs for late-era 14nm intel cpus. Tools like
intel-undervolt have been a godsend in that regard. I came here
posting a patch to remove the annoying message you added for that use
case. Now you want to just totally remove that feature all together
from the kernel? Sounds like a regression in functionality I simply
can't get behind. I know that my laptop, at least, would suffer.

Jason


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-08 Thread Jason A. Donenfeld
On Tue, Sep 8, 2020 at 7:10 PM Srinivas Pandruvada
 wrote:
>
> On Mon, 2020-09-07 at 12:06 +0200, Borislav Petkov wrote:
> > + Srinivas.
> > + kitsunyan.
> >
> > On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> > > Popular tools, like intel-undervolt, use MSR 0x150 to control the
> > > CPU
> > > voltage offset. In fact, evidently the intel_turbo_max_3 driver in-
> > > tree
> > > also uses this MSR. So, teach the kernel's MSR list about this, so
> > > that
> > > intel-undervolt and other such tools don't spew warnings to dmesg,
> > > while
> > > unifying the constant used throughout the kernel.
> > >
>
> [...]
>
> > > -   if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > > +   switch (reg) {
> > > +   case MSR_IA32_ENERGY_PERF_BIAS:
> There is already sysfs interface for it.
>
> > > +   case MSR_IA32_OC_MAILBOX:
> > > return 0;
> > > +   }
> > >
>
> [...]
>
> > Actually, we added the filtering to catch exactly such misuses and,
> > lemme check what is the proper word now... /me checks, aha, adding
> > new
> > MSRs to the "passlist" is the wrong thing to do.
> >
> > Srinivas, can you pls convert this in-tree driver to use a proper
> > sysfs
> > interface for that mailbox MSR and also work with the intel-undervolt
> > author - I hope I have the right person CCed from the git repo on
> > github
> > - to come up with a proper interface so that we can drop this MSR use
> > too.
>
> Overclocking is not architectural I/F and is supported by some special
> CPU skews. I can't find any public document to specify the commands
> which can be used via this OC mailbox. I have to check internally to
> see if there is any. To add a proper sysfs interface we have to make
> sure that we are not allowing some random commands to hardware and
> crash the system.

Well you can definitely crash the system this way -- undervolting can
result in all sorts of nice glitching. You might be able to even
programmatically undervolt to compromise the kernel in clever ways (a
lockdown bypass, I guess, but who cares).

That's why I initially suggested this was pretty squarely in the realm
of hobbyists and should just be added to that whitelist.


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-07 Thread Jason A. Donenfeld
On Mon, Sep 7, 2020 at 1:11 PM Borislav Petkov  wrote:
>
> Hi,
>
> On Mon, Sep 07, 2020 at 12:46:35PM +0200, Jason A. Donenfeld wrote:
> > Are you sure that intel-undervolt using OC_MAILBOX from userspace is
> > actually a "misuse"? Should the kernel or kernel drivers actually be
> > involved with the task of underclocking? This seems pretty squarely in
> > the realm of "hobbyists poking and prodding at their CPUs" rather than
> > something made for a kernel driver, right?
>
> The only thing I'm sure is that *if* it makes sense for any driver to
> control something in the hardware over MSRs, it should *not* poke at
> naked MSRs but use a proper interface.
>
> I'd leave it to the people who actually need this interface, to explain
> why they do.
>
> > Also, what was the justification for whitelisting
> > MSR_IA32_ENERGY_PERF_BIAS?
>
> That's:
>
> tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
>
> Once that thing gets converted to a proper interface too, that MSR goes
> off the allowlist too.

Gotcha. So your perspective is that the goal is actually to have no
list at all in the end, because all MSR writes should go through sysfs
interfaces and such, always? I certainly like that goal -- it'd make a
whole lot of CPU functionality a lot more discoverable and easier to
tinker with. In practice, it seems like that's a hard goal to
accomplish, with different MSRs having different semantics and
meanings of different bit offsets, and a great deal of them aren't
actually publicly documented by Intel. Were you hoping to just handle
these piece by piece, and eventually Linux will have a decent
compendium of MSRs? That sure would be nice. Is Intel on board with
that plan?

Jason


Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-07 Thread Jason A. Donenfeld
Hi Borislav,


On Mon, Sep 7, 2020 at 12:06 PM Borislav Petkov  wrote:
>
> + Srinivas.
> + kitsunyan.
>
> On Mon, Sep 07, 2020 at 11:48:43AM +0200, Jason A. Donenfeld wrote:
> > Popular tools, like intel-undervolt, use MSR 0x150 to control the CPU
> > voltage offset. In fact, evidently the intel_turbo_max_3 driver in-tree
> > also uses this MSR. So, teach the kernel's MSR list about this, so that
> > intel-undervolt and other such tools don't spew warnings to dmesg, while
> > unifying the constant used throughout the kernel.
> >
> > Fixes: a7e1f67ed29f ("x86/msr: Filter MSR writes")
> > Cc: Borislav Petkov 
> > Signed-off-by: Jason A. Donenfeld 
> > ---
> >  arch/x86/include/asm/msr-index.h | 2 ++
> >  arch/x86/kernel/msr.c| 5 -
> >  drivers/platform/x86/intel_turbo_max_3.c | 6 +++---
> >  3 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h 
> > b/arch/x86/include/asm/msr-index.h
> > - if (reg == MSR_IA32_ENERGY_PERF_BIAS)
> > + switch (reg) {
> > + case MSR_IA32_ENERGY_PERF_BIAS:
> > + case MSR_IA32_OC_MAILBOX:
> >   return 0;
> > + }
> Actually, we added the filtering to catch exactly such misuses and,

Are you sure that intel-undervolt using OC_MAILBOX from userspace is
actually a "misuse"? Should the kernel or kernel drivers actually be
involved with the task of underclocking? This seems pretty squarely in
the realm of "hobbyists poking and prodding at their CPUs" rather than
something made for a kernel driver, right? Also, what was the
justification for whitelisting MSR_IA32_ENERGY_PERF_BIAS?

Jason


[PATCH] x86/msr: do not warn on writes to OC_MAILBOX

2020-09-07 Thread Jason A. Donenfeld
Popular tools, like intel-undervolt, use MSR 0x150 to control the CPU
voltage offset. In fact, evidently the intel_turbo_max_3 driver in-tree
also uses this MSR. So, teach the kernel's MSR list about this, so that
intel-undervolt and other such tools don't spew warnings to dmesg, while
unifying the constant used throughout the kernel.

Fixes: a7e1f67ed29f ("x86/msr: Filter MSR writes")
Cc: Borislav Petkov 
Signed-off-by: Jason A. Donenfeld 
---
 arch/x86/include/asm/msr-index.h | 2 ++
 arch/x86/kernel/msr.c| 5 -
 drivers/platform/x86/intel_turbo_max_3.c | 6 +++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4f39a8..0bcb3604d0e2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -132,6 +132,8 @@
 #define MSR_IA32_MCU_OPT_CTRL  0x0123
 #define RNGDS_MITG_DIS BIT(0)
 
+#define MSR_IA32_OC_MAILBOX0x0150
+
 #define MSR_IA32_SYSENTER_CS   0x0174
 #define MSR_IA32_SYSENTER_ESP  0x0175
 #define MSR_IA32_SYSENTER_EIP  0x0176
diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
index 49dcfb85e773..64f6200681e3 100644
--- a/arch/x86/kernel/msr.c
+++ b/arch/x86/kernel/msr.c
@@ -86,8 +86,11 @@ static int filter_write(u32 reg)
default: break;
}
 
-   if (reg == MSR_IA32_ENERGY_PERF_BIAS)
+   switch (reg) {
+   case MSR_IA32_ENERGY_PERF_BIAS:
+   case MSR_IA32_OC_MAILBOX:
return 0;
+   }
 
pr_err_ratelimited("Write to unrecognized MSR 0x%x by %s\n"
   "Please report to x...@kernel.org\n",
diff --git a/drivers/platform/x86/intel_turbo_max_3.c 
b/drivers/platform/x86/intel_turbo_max_3.c
index 892140b62898..991cdbc3295b 100644
--- a/drivers/platform/x86/intel_turbo_max_3.c
+++ b/drivers/platform/x86/intel_turbo_max_3.c
@@ -17,8 +17,8 @@
 
 #include 
 #include 
+#include 
 
-#define MSR_OC_MAILBOX 0x150
 #define MSR_OC_MAILBOX_CMD_OFFSET  32
 #define MSR_OC_MAILBOX_RSP_OFFSET  32
 #define MSR_OC_MAILBOX_BUSY_BIT63
@@ -41,14 +41,14 @@ static int get_oc_core_priority(unsigned int cpu)
value = cmd << MSR_OC_MAILBOX_CMD_OFFSET;
/* Set the busy bit to indicate OS is trying to issue command */
value |=  BIT_ULL(MSR_OC_MAILBOX_BUSY_BIT);
-   ret = wrmsrl_safe(MSR_OC_MAILBOX, value);
+   ret = wrmsrl_safe(MSR_IA32_OC_MAILBOX, value);
if (ret) {
pr_debug("cpu %d OC mailbox write failed\n", cpu);
return ret;
}
 
for (i = 0; i < OC_MAILBOX_RETRY_COUNT; ++i) {
-   ret = rdmsrl_safe(MSR_OC_MAILBOX, );
+   ret = rdmsrl_safe(MSR_IA32_OC_MAILBOX, );
if (ret) {
pr_debug("cpu %d OC mailbox read failed\n", cpu);
break;
-- 
2.28.0



Re: WARNING in __cfg80211_connect_result

2020-08-20 Thread Jason A. Donenfeld
On Wed, Aug 19, 2020 at 8:42 PM syzbot
 wrote:
>
> syzbot has bisected this issue to:
>
> commit e7096c131e5161fa3b8e52a650d7719d2857adfd
> Author: Jason A. Donenfeld 
> Date:   Sun Dec 8 23:27:34 2019 +
>
> net: WireGuard secure network tunnel
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=175ad8b190
> start commit:   e3ec1e8c net: eliminate meaningless memcpy to data in pskb..
> git tree:   net-next
> final oops: https://syzkaller.appspot.com/x/report.txt?x=14dad8b190
> console output: https://syzkaller.appspot.com/x/log.txt?x=10dad8b190
> kernel config:  https://syzkaller.appspot.com/x/.config?x=3d400a47d1416652
> dashboard link: https://syzkaller.appspot.com/bug?extid=cc4c0f394e2611edba66
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15d9de9190
>
> Reported-by: syzbot+cc4c0f394e2611edb...@syzkaller.appspotmail.com
> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")

Having trouble linking this back to wireguard... Those oopses don't
have anything to do with it either. Bisection error?


Re: [GIT PULL] x86/mm changes for v5.9

2020-08-07 Thread Jason A. Donenfeld
Hey Joerg,

On Thu, Aug 6, 2020 at 9:23 PM Joerg Roedel  wrote:
> Jason, can you share more details about the test setup which triggers
> this? Like the .config and the machine setup, ideally a qemu
> command-line, and how to reproduce it on that setup.

make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

I can also confirm that the patch you sent to the list fixes the issue.

Jason


Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

2020-08-07 Thread Jason A. Donenfeld
On Fri, Aug 7, 2020 at 10:40 AM Joerg Roedel  wrote:
>
> From: Joerg Roedel 
>
> The code for preallocate_vmalloc_pages() was written under the
> assumption that the p4d_offset() and pud_offset() functions will perform
> present checks before dereferencing the parent entries.
>
> This assumption is wrong an leads to a bug in the code which causes the
> physical address found in the PGD be used as a page-table page, even if
> the PGD is not present.
>
> So the code flow currently is:
>
> pgd = pgd_offset_k(addr);
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d))
> p4d = p4d_alloc(_mm, pgd, addr);
>
> This lacks a check for pgd_none() at least, the correct flow would be:
>
> pgd = pgd_offset_k(addr);
> if (pgd_none(*pgd))
> p4d = p4d_alloc(_mm, pgd, addr);
> else
> p4d = p4d_offset(pgd, addr);
>
> But this is the same flow that the p4d_alloc() and the pud_alloc()
> functions use internally, so there is no need to duplicate them.
>
> Remove the p?d_none() checks from the function and just call into
> p4d_alloc() and pud_alloc() to correctly pre-allocate the PGD entries.
>
> Reported-by: Jason A. Donenfeld 
> Fixes: 6eb82f994026 ("x86/mm: Pre-allocate P4D/PUD pages for vmalloc area")
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/mm/init_64.c | 31 +--
>  1 file changed, 13 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 3f4e29a78f2b..449e071240e1 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1253,28 +1253,23 @@ static void __init preallocate_vmalloc_pages(void)
> p4d_t *p4d;
> pud_t *pud;
>
> -   p4d = p4d_offset(pgd, addr);
> -   if (p4d_none(*p4d)) {
> -   /* Can only happen with 5-level paging */
> -   p4d = p4d_alloc(_mm, pgd, addr);
> -   if (!p4d) {
> -   lvl = "p4d";
> -   goto failed;
> -   }
> -   }
> +   lvl = "p4d";
> +   p4d = p4d_alloc(_mm, pgd, addr);
> +   if (!p4d)
> +   goto failed;
>
> +   /*
> +* With 5-level paging the P4D level is not folded. So the 
> PGDs
> +* are now populated and there is no need to walk down to the
> +* PUD level.
> +*/
> if (pgtable_l5_enabled())
> continue;
>
> -   pud = pud_offset(p4d, addr);
> -   if (pud_none(*pud)) {
> -   /* Ends up here only with 4-level paging */
> -   pud = pud_alloc(_mm, p4d, addr);
> -   if (!pud) {
> -   lvl = "pud";
> -   goto failed;
> -       }
> -   }
> +   lvl = "pud";
> +   pud = pud_alloc(_mm, p4d, addr);
> +   if (!pud)
> +   goto failed;
> }
>
> return;
> --
> 2.26.2


This appears to fix the issue, so:

Tested-by: Jason A. Donenfeld 


Re: [GIT PULL] x86/mm changes for v5.9

2020-08-05 Thread Jason A. Donenfeld
Hi Ingo, Joerg,

On Mon, Aug 03, 2020 at 09:03:54PM +0200, Ingo Molnar wrote:
> Linus,
> 
> Please pull the latest x86/mm git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-08-03
> 
># HEAD: 2b32ab031e82a109e2c5b0d30ce563db0fe286b4 x86/mm/64: Make 
> sync_global_pgds() static
> 
> The biggest change is to not sync the vmalloc and ioremap ranges for x86-64 
> anymore.
> 
>  Thanks,
> 
>   Ingo
> 
> -->
> Joerg Roedel (3):
>   x86/mm: Pre-allocate P4D/PUD pages for vmalloc area
>   x86/mm/64: Do not sync vmalloc/ioremap mappings
>   x86/mm/64: Make sync_global_pgds() static

The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
mappings") causes the OOPS below, in Linus' tree and in linux-next,
unearthed by my CI on .
Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
away.

Jason


BUG: unable to handle page fault for address: e8d00608
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops:  [#1] PREEMPT SMP
CPU: 2 PID: 22 Comm: kworker/2:0 Not tainted 5.8.0+ #154
RIP: 0010:process_one_work+0x2c/0x2d0
Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 
40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 
01 05
RSP: 0018:88800016be98 EFLAGS: 00010002
RAX:  RBX: 88810c00 RCX: 
RDX: fffedb85 RSI: 888000391900 RDI: 88810c00
RBP: 888000391900 R08: 888000391900 R09: 888000350090
R10: ff770df6 R11: 88800f91f840 R12: 88800f91f140
R13: 88800f91f160 R14: e8d00600 R15: 88800f91f140
FS:  () GS:88800f90() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: e8d00608 CR3: 001be005 CR4: 001706a0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 worker_thread+0x4b/0x3b0
 ? rescuer_thread+0x360/0x360
 kthread+0x116/0x140
 ? __kthread_create_worker+0x110/0x110
 ret_from_fork+0x1f/0x30
CR2: e8d00608
---[ end trace ca0cfc5a79414fb2 ]---
RIP: 0010:process_one_work+0x2c/0x2d0
Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 
40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 
01 05
RSP: 0018:88800016be98 EFLAGS: 00010002
RAX:  RBX: 88810c00 RCX: 
RDX: fffedb85 RSI: 888000391900 RDI: 88810c00
RBP: 888000391900 R08: 888000391900 R09: 888000350090
R10: ff770df6 R11: 88800f91f840 R12: 88800f91f140
R13: 88800f91f160 R14: e8d00600 R15: 88800f91f140
FS:  () GS:88800f90() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: e8d00608 CR3: 001be005 CR4: 001706a0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Kernel panic - not syncing: Fatal exception



Re: [PATCH 12/26] netfilter: switch nf_setsockopt to sockptr_t

2020-07-28 Thread Jason A. Donenfeld
On Tue, Jul 28, 2020 at 10:07 AM David Laight  wrote:
>
> From: Christoph Hellwig
> > Sent: 27 July 2020 17:24
> >
> > On Mon, Jul 27, 2020 at 06:16:32PM +0200, Jason A. Donenfeld wrote:
> > > Maybe sockptr_advance should have some safety checks and sometimes
> > > return -EFAULT? Or you should always use the implementation where
> > > being a kernel address is an explicit bit of sockptr_t, rather than
> > > being implicit?
> >
> > I already have a patch to use access_ok to check the whole range in
> > init_user_sockptr.
>
> That doesn't make (much) difference to the code paths that ignore
> the user-supplied length.
> OTOH doing the user/kernel check on the base address (not an
> incremented one) means that the correct copy function is always
> selected.

Right, I had the same reaction in reading this, but actually, his code
gets rid of the sockptr_advance stuff entirely and never mutates, so
even though my point about attacking those pointers was missed, the
code does the better thing now -- checking the base address and never
mutating the pointer. So I think we're good.

>
> Perhaps the functions should all be passed a 'const sockptr_t'.
> The typedef could be made 'const' - requiring non-const items
> explicitly use the union/struct itself.

I was thinking the same, but just by making the pointers inside the
struct const. However, making the whole struct const via the typedef
is a much better idea. That'd probably require changing the signature
of init_user_sockptr a bit, which would be fine, but indeed I think
this would be a very positive change.

Jason


Re: [PATCH 12/26] netfilter: switch nf_setsockopt to sockptr_t

2020-07-27 Thread Jason A. Donenfeld
> From cce2d2e1b43ecee5f4af7cf116808b74b330080f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 27 Jul 2020 17:42:27 +0200
> Subject: net: remove sockptr_advance
>
> sockptr_advance never properly worked.  Replace it with _offset variants
> of copy_from_sockptr and copy_to_sockptr.
>
> Fixes: ba423fdaa589 ("net: add a new sockptr_t type")
> Reported-by: Jason A. Donenfeld 
> Reported-by: Ido Schimmel 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/crypto/chelsio/chtls/chtls_main.c | 12 +-
>  include/linux/sockptr.h   | 27 +++
>  net/dccp/proto.c  |  5 ++---
>  net/ipv4/netfilter/arp_tables.c   |  8 +++
>  net/ipv4/netfilter/ip_tables.c|  8 +++
>  net/ipv4/tcp.c|  5 +++--
>  net/ipv6/ip6_flowlabel.c  | 11 -
>  net/ipv6/netfilter/ip6_tables.c   |  8 +++
>  net/netfilter/x_tables.c  |  7 +++---
>  net/tls/tls_main.c|  6 ++---
>  10 files changed, 49 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
> b/drivers/crypto/chelsio/chtls/chtls_main.c
> index c3058dcdb33c5c..66d247efd5615b 100644
> --- a/drivers/crypto/chelsio/chtls/chtls_main.c
> +++ b/drivers/crypto/chelsio/chtls/chtls_main.c
> @@ -525,9 +525,9 @@ static int do_chtls_setsockopt(struct sock *sk, int 
> optname,
> /* Obtain version and type from previous copy */
> crypto_info[0] = tmp_crypto_info;
> /* Now copy the following data */
> -   sockptr_advance(optval, sizeof(*crypto_info));
> -   rc = copy_from_sockptr((char *)crypto_info + 
> sizeof(*crypto_info),
> -   optval,
> +   rc = copy_from_sockptr_offset((char *)crypto_info +
> +   sizeof(*crypto_info),
> +   optval, sizeof(*crypto_info),
> sizeof(struct tls12_crypto_info_aes_gcm_128)
> - sizeof(*crypto_info));
>
> @@ -542,9 +542,9 @@ static int do_chtls_setsockopt(struct sock *sk, int 
> optname,
> }
> case TLS_CIPHER_AES_GCM_256: {
> crypto_info[0] = tmp_crypto_info;
> -   sockptr_advance(optval, sizeof(*crypto_info));
> -   rc = copy_from_sockptr((char *)crypto_info + 
> sizeof(*crypto_info),
> -   optval,
> +   rc = copy_from_sockptr_offset((char *)crypto_info +
> +   sizeof(*crypto_info),
> +   optval, sizeof(*crypto_info),
> sizeof(struct tls12_crypto_info_aes_gcm_256)
> - sizeof(*crypto_info));
>
> diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h
> index b13ea1422f93a5..9e6c81d474cba8 100644
> --- a/include/linux/sockptr.h
> +++ b/include/linux/sockptr.h
> @@ -69,19 +69,26 @@ static inline bool sockptr_is_null(sockptr_t sockptr)
> return !sockptr.user;
>  }
>
> -static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
> +static inline int copy_from_sockptr_offset(void *dst, sockptr_t src,
> +   size_t offset, size_t size)
>  {
> if (!sockptr_is_kernel(src))
> -   return copy_from_user(dst, src.user, size);
> -   memcpy(dst, src.kernel, size);
> +   return copy_from_user(dst, src.user + offset, size);
> +   memcpy(dst, src.kernel + offset, size);
> return 0;
>  }
>
> -static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t 
> size)
> +static inline int copy_from_sockptr(void *dst, sockptr_t src, size_t size)
> +{
> +   return copy_from_sockptr_offset(dst, src, 0, size);
> +}
> +
> +static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset,
> +   const void *src, size_t size)
>  {
> if (!sockptr_is_kernel(dst))
> -   return copy_to_user(dst.user, src, size);
> -   memcpy(dst.kernel, src, size);
> +   return copy_to_user(dst.user + offset, src, size);
> +   memcpy(dst.kernel + offset, src, size);
> return 0;
>  }
>
> @@ -112,14 +119,6 @@ static inline void *memdup_sockptr_nul(sockptr_t src, 
> size_t len)
> return p;
>  }
>
> -static inline void sockptr_advance(sockptr_t sockptr, size_t len)
> -{
> -   if (sockptr_is_kernel(sockptr))
> -   sockptr.kernel += len;
> -   else
> -   sockptr.user += len;
> -}
> -
>  static i

Re: [PATCH 12/26] netfilter: switch nf_setsockopt to sockptr_t

2020-07-27 Thread Jason A. Donenfeld
On Mon, Jul 27, 2020 at 5:06 PM Christoph Hellwig  wrote:
>
> On Mon, Jul 27, 2020 at 05:03:10PM +0200, Jason A. Donenfeld wrote:
> > Hi Christoph,
> >
> > On Thu, Jul 23, 2020 at 08:08:54AM +0200, Christoph Hellwig wrote:
> > > diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> > > index da933f99b5d517..42befbf12846c0 100644
> > > --- a/net/ipv4/ip_sockglue.c
> > > +++ b/net/ipv4/ip_sockglue.c
> > > @@ -1422,7 +1422,8 @@ int ip_setsockopt(struct sock *sk, int level,
> > > optname != IP_IPSEC_POLICY &&
> > > optname != IP_XFRM_POLICY &&
> > > !ip_mroute_opt(optname))
> > > -   err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> > > +   err = nf_setsockopt(sk, PF_INET, optname, 
> > > USER_SOCKPTR(optval),
> > > +   optlen);
> > >  #endif
> > > return err;
> > >  }
> > > diff --git a/net/ipv4/netfilter/ip_tables.c 
> > > b/net/ipv4/netfilter/ip_tables.c
> > > index 4697d09c98dc3e..f2a9680303d8c0 100644
> > > --- a/net/ipv4/netfilter/ip_tables.c
> > > +++ b/net/ipv4/netfilter/ip_tables.c
> > > @@ -1102,7 +1102,7 @@ __do_replace(struct net *net, const char *name, 
> > > unsigned int valid_hooks,
> > >  }
> > >
> > >  static int
> > > -do_replace(struct net *net, const void __user *user, unsigned int len)
> > > +do_replace(struct net *net, sockptr_t arg, unsigned int len)
> > >  {
> > > int ret;
> > > struct ipt_replace tmp;
> > > @@ -1110,7 +1110,7 @@ do_replace(struct net *net, const void __user 
> > > *user, unsigned int len)
> > > void *loc_cpu_entry;
> > > struct ipt_entry *iter;
> > >
> > > -   if (copy_from_user(, user, sizeof(tmp)) != 0)
> > > +   if (copy_from_sockptr(, arg, sizeof(tmp)) != 0)
> > > return -EFAULT;
> > >
> > > /* overflow check */
> > > @@ -1126,8 +1126,8 @@ do_replace(struct net *net, const void __user 
> > > *user, unsigned int len)
> > > return -ENOMEM;
> > >
> > > loc_cpu_entry = newinfo->entries;
> > > -   if (copy_from_user(loc_cpu_entry, user + sizeof(tmp),
> > > -  tmp.size) != 0) {
> > > +   sockptr_advance(arg, sizeof(tmp));
> > > +   if (copy_from_sockptr(loc_cpu_entry, arg, tmp.size) != 0) {
> > > ret = -EFAULT;
> > > goto free_newinfo;
> > > }
> >
> > Something along this path seems to have broken with this patch. An
> > invocation of `iptables -A INPUT -m length --length 1360 -j DROP` now
> > fails, with
> >
> > nf_setsockopt->do_replace->translate_table->check_entry_size_and_hooks:
> >   (unsigned char *)e + e->next_offset > limit  ==>  TRUE
> >
> > resulting in the whole call chain returning -EINVAL. It bisects back to
> > this commit. This is on net-next.
>
> This is another use o sockptr_advance that Ido already found a problem
> in.  I'm looking into this at the moment..

I haven't seen Ido's patch, but it seems clear the issue is that you
want to call `sockptr_advance(, sizeof(tmp))`, and adjust
sockptr_advance to take a pointer.

Slight concern about the whole concept:

Things are defined as

typedef union {
void*kernel;
void __user *user;
} sockptr_t;
static inline bool sockptr_is_kernel(sockptr_t sockptr)
{
return (unsigned long)sockptr.kernel >= TASK_SIZE;
}

So what happens if we have some code like:

sockptr_t sp;
init_user_sockptr(, user_controlled_struct.extra_user_ptr);
sockptr_advance(, user_controlled_struct.some_big_offset);
copy_to_sockptr(, user_controlled_struct.a_few_bytes,
sizeof(user_controlled_struct.a_few_bytes));

With the user controlling some_big_offset, he can convert the user
sockptr into a kernel sockptr, causing the subsequent copy_to_sockptr
to be a vanilla memcpy, after which a security disaster ensues.

Maybe sockptr_advance should have some safety checks and sometimes
return -EFAULT? Or you should always use the implementation where
being a kernel address is an explicit bit of sockptr_t, rather than
being implicit?


Re: [PATCH 12/26] netfilter: switch nf_setsockopt to sockptr_t

2020-07-27 Thread Jason A. Donenfeld
Hi Christoph,

On Thu, Jul 23, 2020 at 08:08:54AM +0200, Christoph Hellwig wrote:
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index da933f99b5d517..42befbf12846c0 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1422,7 +1422,8 @@ int ip_setsockopt(struct sock *sk, int level,
>   optname != IP_IPSEC_POLICY &&
>   optname != IP_XFRM_POLICY &&
>   !ip_mroute_opt(optname))
> - err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
> + err = nf_setsockopt(sk, PF_INET, optname, USER_SOCKPTR(optval),
> + optlen);
>  #endif
>   return err;
>  }
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index 4697d09c98dc3e..f2a9680303d8c0 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -1102,7 +1102,7 @@ __do_replace(struct net *net, const char *name, 
> unsigned int valid_hooks,
>  }
>  
>  static int
> -do_replace(struct net *net, const void __user *user, unsigned int len)
> +do_replace(struct net *net, sockptr_t arg, unsigned int len)
>  {
>   int ret;
>   struct ipt_replace tmp;
> @@ -1110,7 +1110,7 @@ do_replace(struct net *net, const void __user *user, 
> unsigned int len)
>   void *loc_cpu_entry;
>   struct ipt_entry *iter;
>  
> - if (copy_from_user(, user, sizeof(tmp)) != 0)
> + if (copy_from_sockptr(, arg, sizeof(tmp)) != 0)
>   return -EFAULT;
>  
>   /* overflow check */
> @@ -1126,8 +1126,8 @@ do_replace(struct net *net, const void __user *user, 
> unsigned int len)
>   return -ENOMEM;
>  
>   loc_cpu_entry = newinfo->entries;
> - if (copy_from_user(loc_cpu_entry, user + sizeof(tmp),
> -tmp.size) != 0) {
> + sockptr_advance(arg, sizeof(tmp));
> + if (copy_from_sockptr(loc_cpu_entry, arg, tmp.size) != 0) {
>   ret = -EFAULT;
>   goto free_newinfo;
>   }

Something along this path seems to have broken with this patch. An
invocation of `iptables -A INPUT -m length --length 1360 -j DROP` now
fails, with

nf_setsockopt->do_replace->translate_table->check_entry_size_and_hooks:
  (unsigned char *)e + e->next_offset > limit  ==>  TRUE

resulting in the whole call chain returning -EINVAL. It bisects back to
this commit. This is on net-next.

Jason


Re: [PATCH] Revert "kernel/printk: add kmsg SEEK_CUR handling"

2020-07-11 Thread Jason A. Donenfeld
On Sun, Jun 21, 2020 at 9:02 PM Jason A. Donenfeld  wrote:
> This commit broke userspace. Bash uses ESPIPE to determine whether or
> not the file should be read using "unbuffered I/O", which means reading
> 1 byte at a time instead of 128 bytes at a time. I used to use bash to
> read through kmsg in a really quite nasty way:
>
> while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
>echo "SARU $line"
> done < /dev/kmsg
>
> This will show all lines that can fit into the 128 byte buffer, and skip
> lines that don't. That's pretty awful, but at least it worked.

FYI, bash finally bumped its read buffer up to 4k, which actually
makes reading /dev/kmsg less awful than previously thought:
http://git.savannah.gnu.org/cgit/bash.git/commit/?id=6edcd70089d71ee8c17bf3298527054b3223be9f
This is probably too mundane to warrant an email, but in case somebody
finds this thread in the future, voila.


[PATCH] fscache: replace open-coded pr_warn_once() with actual call

2020-07-05 Thread Jason A. Donenfeld
There's no reason to open code this here, so instead replace it with
pr_warn_once, which amounts to exactly the same thing.

Signed-off-by: Jason A. Donenfeld 
---
 fs/fscache/page.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 26af6fdf1538..dc8dce79fa8d 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -1173,14 +1173,9 @@ void fscache_mark_page_cached(struct fscache_retrieval 
*op, struct page *page)
trace_fscache_page(cookie, page, fscache_page_cached);
 
_debug("- mark %p{%lx}", page, page->index);
-   if (TestSetPageFsCache(page)) {
-   static bool once_only;
-   if (!once_only) {
-   once_only = true;
-   pr_warn("Cookie type %s marked page %lx multiple 
times\n",
-   cookie->def->name, page->index);
-   }
-   }
+   if (TestSetPageFsCache(page))
+   pr_warn_once("Cookie type %s marked page %lx multiple times\n",
+cookie->def->name, page->index);
 
if (cookie->def->mark_page_cached)
cookie->def->mark_page_cached(cookie->netfs_data,
-- 
2.27.0



Re: KASAN: use-after-free Read in netdev_name_node_lookup_rcu

2020-06-29 Thread Jason A. Donenfeld
Hey Cong,

I'm wondering if the below error is related to what you've been
looking at yesterday. AFAICT, there's a simple UaF on the attrbuf
passed to the start method. I recall recently you were working on the
locking in genetlink's family buffers and wound up mallocing some
things, so it seems like this might be related. See below.

Regards,
Jason

On Mon, Jun 29, 2020 at 6:40 PM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:1590a2e1 Merge tag 'acpi-5.8-rc3' of git://git.kernel.org/..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1664afad10
> kernel config:  https://syzkaller.appspot.com/x/.config?x=bf3aec367b9ab569
> dashboard link: https://syzkaller.appspot.com/bug?extid=a82be85e09cd5df398fe
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=14a1bf1d10
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1514a06b10
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a82be85e09cd5df39...@syzkaller.appspotmail.com
>
> ==
> BUG: KASAN: use-after-free in strnlen+0x64/0x70 lib/string.c:561
> Read of size 1 at addr 8880933b8c18 by task syz-executor821/6893
>
> CPU: 0 PID: 6893 Comm: syz-executor821 Not tainted 5.8.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0xae/0x436 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>  strnlen+0x64/0x70 lib/string.c:561
>  strnlen include/linux/string.h:339 [inline]
>  dev_name_hash net/core/dev.c:208 [inline]
>  netdev_name_node_lookup_rcu+0x22/0x150 net/core/dev.c:290
>  dev_get_by_name_rcu net/core/dev.c:883 [inline]
>  dev_get_by_name+0x7b/0x1e0 net/core/dev.c:905
>  lookup_interface drivers/net/wireguard/netlink.c:63 [inline]
>  wg_get_device_start+0x2e4/0x3f0 drivers/net/wireguard/netlink.c:203
>  genl_start+0x342/0x6e0 net/netlink/genetlink.c:556
>  __netlink_dump_start+0x585/0x900 net/netlink/af_netlink.c:2343
>  genl_family_rcv_msg_dumpit+0x2ac/0x310 net/netlink/genetlink.c:638
>  genl_family_rcv_msg net/netlink/genetlink.c:733 [inline]
>  genl_rcv_msg+0x797/0x9e0 net/netlink/genetlink.c:753
>  netlink_rcv_skb+0x15a/0x430 net/netlink/af_netlink.c:2469
>  genl_rcv+0x24/0x40 net/netlink/genetlink.c:764
>  netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1329
>  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1918
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2352
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x445299
> Code: Bad RIP value.
> RSP: 002b:7ffd1e794308 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX:  RCX: 00445299
> RDX:  RSI: 2200 RDI: 0003
> RBP: 00082a5d R08:  R09: 004002e0
> R10:  R11: 0246 R12: 00402430
> R13: 004024c0 R14:  R15: 
>
> Allocated by task 6894:
>  save_stack+0x1b/0x40 mm/kasan/common.c:48
>  set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:494
>  __kmalloc_reserve net/core/skbuff.c:142 [inline]
>  __alloc_skb+0xae/0x550 net/core/skbuff.c:210
>  alloc_skb include/linux/skbuff.h:1083 [inline]
>  netlink_alloc_large_skb net/netlink/af_netlink.c:1175 [inline]
>  netlink_sendmsg+0x94f/0xd90 net/netlink/af_netlink.c:1893
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2352
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2406
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Freed by task 6894:
>  save_stack+0x1b/0x40 mm/kasan/common.c:48
>  set_track mm/kasan/common.c:56 [inline]
>  kasan_set_free_info mm/kasan/common.c:316 [inline]
>  __kasan_slab_free+0xf5/0x140 mm/kasan/common.c:455
>  __cache_free mm/slab.c:3426 [inline]
>  kfree+0x103/0x2c0 mm/slab.c:3757
>  skb_free_head net/core/skbuff.c:590 [inline]
>  skb_release_data+0x6d9/0x910 net/core/skbuff.c:610
>  skb_release_all net/core/skbuff.c:664 [inline]
>  __kfree_skb net/core/skbuff.c:678 [inline]
>  consume_skb net/core/skbuff.c:837 [inline]
>  consume_skb+0xc2/0x160 

Re: [PATCH v2 net-next] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()

2020-06-24 Thread Jason A. Donenfeld
On Wed, Jun 24, 2020 at 03:06:10PM -0600, Jason A. Donenfeld wrote:
> Hi Alexander,
> 
> This patch introduced a behavior change around GRO_DROP:
> 
> napi_skb_finish used to sometimes return GRO_DROP:
> 
> > -static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
> > +static gro_result_t napi_skb_finish(struct napi_struct *napi,
> > +   struct sk_buff *skb,
> > +   gro_result_t ret)
> >  {
> > switch (ret) {
> > case GRO_NORMAL:
> > -   if (netif_receive_skb_internal(skb))
> > -   ret = GRO_DROP;
> > +   gro_normal_one(napi, skb);
> >
> 
> But under your change, gro_normal_one and the various calls that makes
> never propagates its return value, and so GRO_DROP is never returned to
> the caller, even if something drops it.
> 
> Was this intentional? Or should I start looking into how to restore it?
> 
> Thanks,
> Jason

For some context, I'm consequently mulling over this change in my code,
since checking for GRO_DROP now constitutes dead code:

diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index 91438144e4f7..9b2ab6fc91cd 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -414,14 +414,8 @@ static void wg_packet_consume_data_done(struct wg_peer 
*peer,
if (unlikely(routed_peer != peer))
goto dishonest_packet_peer;

-   if (unlikely(napi_gro_receive(>napi, skb) == GRO_DROP)) {
-   ++dev->stats.rx_dropped;
-   net_dbg_ratelimited("%s: Failed to give packet to userspace 
from peer %llu (%pISpfsc)\n",
-   dev->name, peer->internal_id,
-   >endpoint.addr);
-   } else {
-   update_rx_stats(peer, message_data_len(len_before_trim));
-   }
+   napi_gro_receive(>napi, skb);
+   update_rx_stats(peer, message_data_len(len_before_trim));
return;

 dishonest_packet_peer:



Re: [PATCH v2 net-next] net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()

2020-06-24 Thread Jason A. Donenfeld
Hi Alexander,

This patch introduced a behavior change around GRO_DROP:

napi_skb_finish used to sometimes return GRO_DROP:

> -static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
> +static gro_result_t napi_skb_finish(struct napi_struct *napi,
> + struct sk_buff *skb,
> + gro_result_t ret)
>  {
>   switch (ret) {
>   case GRO_NORMAL:
> - if (netif_receive_skb_internal(skb))
> - ret = GRO_DROP;
> + gro_normal_one(napi, skb);
>

But under your change, gro_normal_one and the various calls that makes
never propagates its return value, and so GRO_DROP is never returned to
the caller, even if something drops it.

Was this intentional? Or should I start looking into how to restore it?

Thanks,
Jason


Re: [PATCH 5.2 003/313] ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule

2020-06-23 Thread Jason A. Donenfeld
I realize 5.2 has long since sunsetted, and I really missed the boat
on speaking up about this many many months ago, but in case any
distros or organizations have a 5.2.x stable series, this should
probably be reverted from 5.2. The reason is that 7d9e5f422 was added
to 5.3, and introduced a UaF, which this commit then fixed. But
without 7d9e5f422, this commit adds a memory leak.

Anyway, not worth spending too much brain cycles on this, considering
5.2 isn't even mentioned on kernel.org any more, but in case it helps
for somebody doing something strange in years to come, voila.


[PATCH] Revert "kernel/printk: add kmsg SEEK_CUR handling"

2020-06-21 Thread Jason A. Donenfeld
This reverts commit 8ece3b3eb576a78d2e67ad4c3a80a39fa6708809.

This commit broke userspace. Bash uses ESPIPE to determine whether or
not the file should be read using "unbuffered I/O", which means reading
1 byte at a time instead of 128 bytes at a time. I used to use bash to
read through kmsg in a really quite nasty way:

while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
   echo "SARU $line"
done < /dev/kmsg

This will show all lines that can fit into the 128 byte buffer, and skip
lines that don't. That's pretty awful, but at least it worked.

With this change, bash now tries to do 1-byte reads, which means it
skips all the lines, which is worse than before.

Now, I don't really care very much about this, and I'm already look for
a workaround. But I did just spend an hour trying to figure out why my
scripts were broken. Either way, it makes no difference to me personally
whether this is reverted, but it might be something to consider. If you
declare that "trying to read /dev/kmsg with bash is terminally stupid
anyway," I might be inclined to agree with you. But do note that bash
uses lseek(fd, 0, SEEK_CUR)==>ESPIPE to determine whether or not it's
reading from a pipe.

Cc: Bruno Meneguele 
Cc: pmla...@suse.com
Cc: sergey.senozhat...@gmail.com
Cc: rost...@goodmis.org
Cc: david.lai...@aculab.com
Cc: Linus Torvalds 
Cc: Sergey Senozhatsky 
Cc: Petr Mladek 
Signed-off-by: Jason A. Donenfeld 
---
 Documentation/ABI/testing/dev-kmsg |  5 -
 kernel/printk/printk.c | 10 --
 2 files changed, 15 deletions(-)

diff --git a/Documentation/ABI/testing/dev-kmsg 
b/Documentation/ABI/testing/dev-kmsg
index 1e6c28b1942b..f307506eb54c 100644
--- a/Documentation/ABI/testing/dev-kmsg
+++ b/Documentation/ABI/testing/dev-kmsg
@@ -56,11 +56,6 @@ Description: The /dev/kmsg character device node provides 
userspace access
  seek after the last record available at the time
  the last SYSLOG_ACTION_CLEAR was issued.
 
-   Due to the record nature of this interface with a "read all"
-   behavior and the specific positions each seek operation sets,
-   SEEK_CUR is not supported, returning -ESPIPE (invalid seek) to
-   errno whenever requested.
-
The output format consists of a prefix carrying the syslog
prefix including priority and facility, the 64 bit message
sequence number and the monotonic timestamp in microseconds,
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8c14835be46c..b71eaf5f5a86 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -974,16 +974,6 @@ static loff_t devkmsg_llseek(struct file *file, loff_t 
offset, int whence)
user->idx = log_next_idx;
user->seq = log_next_seq;
break;
-   case SEEK_CUR:
-   /*
-* It isn't supported due to the record nature of this
-* interface: _SET _DATA and _END point to very specific
-* record positions, while _CUR would be more useful in case
-* of a byte-based log. Because of that, return the default
-* errno value for invalid seek operation.
-*/
-   ret = -ESPIPE;
-   break;
default:
ret = -EINVAL;
}
-- 
2.27.0



Re: [PATCH] acpi: disallow loading configfs acpi tables when locked down

2020-06-17 Thread Jason A. Donenfeld
On Wed, Jun 17, 2020 at 2:38 AM Ard Biesheuvel  wrote:
>
> On Wed, 17 Jun 2020 at 00:21, Jason A. Donenfeld  wrote:
> >
> > Hi Rafael, Len,
> >
> > Looks like I should have CC'd you on this patch. This is probably
> > something we should get into 5.8-rc2, so that it can then get put into
> > stable kernels, as some people think this is security sensitive.
> > Bigger picture is this:
> >
> > https://data.zx2c4.com/american-unsigned-language-2.gif
> > https://data.zx2c4.com/american-unsigned-language-2-fedora-5.8.png
> >
> > Also, somebody mentioned to me that Microsoft's ACPI implementation
> > disallows writes to system memory as a security mitigation. I haven't
> > looked at what that actually entails, but I wonder if entirely
> > disabling support for ACPI_ADR_SPACE_SYSTEM_MEMORY would be sensible.
> > I haven't looked at too many DSDTs. Would that break real hardware, or
> > does nobody do that? Alternatively, the range of acceptable addresses
> > for SystemMemory could exclude kernel memory. Would that break
> > anything? Have you heard about Microsoft's mitigation to know more
> > details on what they figured out they could safely restrict without
> > breaking hardware? Either way, food for thought I suppose.
> >
>
> ACPI_ADR_SPACE_SYSTEM_MEMORY may be used for everything that is memory
> mapped, i.e., PCIe ECAM space, GPIO control registers etc.
>
> I agree that using ACPI_ADR_SPACE_SYSTEM_MEMORY for any memory that is
> under the kernel's control is a bad idea, and this should be easy to
> filter out: the SystemMemory address space handler needs the ACPI
> support routines to map the physical addresses used by AML into
> virtual kernel addresses, so all these accesses go through
> acpi_os_ioremap(). So as a first step, it should be reasonable to put
> a lockdown check there, and fail any access to OS owned memory if
> lockdown is enabled, and print a warning if it is not.

Makes sense. Though I was thinking of doing this unconditionally, not
just for lockdown, as a "realer" security mitigation (hence CCing
kernel-hardening), against bugs that manage to mess with acpi things.
A second mitigation might be marking inmemory acpi bytecode pages as
read only.


Re: [PATCH] acpi: disallow loading configfs acpi tables when locked down

2020-06-16 Thread Jason A. Donenfeld
Hi Rafael, Len,

Looks like I should have CC'd you on this patch. This is probably
something we should get into 5.8-rc2, so that it can then get put into
stable kernels, as some people think this is security sensitive.
Bigger picture is this:

https://data.zx2c4.com/american-unsigned-language-2.gif
https://data.zx2c4.com/american-unsigned-language-2-fedora-5.8.png

Also, somebody mentioned to me that Microsoft's ACPI implementation
disallows writes to system memory as a security mitigation. I haven't
looked at what that actually entails, but I wonder if entirely
disabling support for ACPI_ADR_SPACE_SYSTEM_MEMORY would be sensible.
I haven't looked at too many DSDTs. Would that break real hardware, or
does nobody do that? Alternatively, the range of acceptable addresses
for SystemMemory could exclude kernel memory. Would that break
anything? Have you heard about Microsoft's mitigation to know more
details on what they figured out they could safely restrict without
breaking hardware? Either way, food for thought I suppose.

Jason


Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()

2020-06-16 Thread Jason A. Donenfeld
On Tue, Jun 16, 2020 at 12:54 PM Joe Perches  wrote:
>
> On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
> >  v4:
> >   - Break out the memzero_explicit() change as suggested by Dan Carpenter
> > so that it can be backported to stable.
> >   - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
> > now as there can be a bit more discussion on what is best. It will be
> > introduced as a separate patch later on after this one is merged.
>
> To this larger audience and last week without reply:
> https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.ca...@perches.com/
>
> Are there _any_ fastpath uses of kfree or vfree?

The networking stack has various places where there will be a quick
kmalloc followed by a kfree for an incoming or outgoing packet. One
place that comes to mind would be esp_alloc_tmp, which does a quick
allocation of some temporary kmalloc memory, processes some packet
things inside of that, and then frees it, sometimes in the same
function, and sometimes later in an async callback. I don't know how
"fastpath" you consider this, but usually packet processing is
something people want to do with minimal overhead, considering how
fast NICs are these days.


[PATCH] acpi: disallow loading configfs acpi tables when locked down

2020-06-15 Thread Jason A. Donenfeld
Like other vectors already patched, this one here allows the root user
to load ACPI tables, which enables arbitrary physical address writes,
which in turn makes it possible to disable lockdown. This patch prevents
this by checking the lockdown status before allowing a new ACPI table to be
installed. The link in the trailer shows a PoC of how this might be
used.

Link: 
https://git.zx2c4.com/american-unsigned-language/tree/american-unsigned-language-2.sh
Cc: sta...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 drivers/acpi/acpi_configfs.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_configfs.c b/drivers/acpi/acpi_configfs.c
index ece8c1a921cc..88c8af455ea3 100644
--- a/drivers/acpi/acpi_configfs.c
+++ b/drivers/acpi/acpi_configfs.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "acpica/accommon.h"
 #include "acpica/actables.h"
@@ -28,7 +29,10 @@ static ssize_t acpi_table_aml_write(struct config_item *cfg,
 {
const struct acpi_table_header *header = data;
struct acpi_table *table;
-   int ret;
+   int ret = security_locked_down(LOCKDOWN_ACPI_TABLES);
+
+   if (ret)
+   return ret;
 
table = container_of(cfg, struct acpi_table, cfg);
 
-- 
2.27.0



Re: [PATCH] Do not assign in if condition wg_noise_handshake_consume_initiation()

2020-06-09 Thread Jason A. Donenfeld
On Tue, Jun 9, 2020 at 9:21 AM Frank Werner-Krippendorf  wrote:
>
> Fixes an error condition reported by checkpatch.pl which caused by
> assigning a variable in an if condition in
> wg_noise_handshake_consume_initiation().
>
> Signed-off-by: Frank Werner-Krippendorf 

Thanks. Queued up in the wireguard-linux.git tree.

Jason


Re: linux-next: build failure after merge of the keys tree

2020-05-14 Thread Jason A. Donenfeld
On Thu, May 14, 2020 at 6:35 AM Masahiro Yamada  wrote:
>
> On Thu, May 14, 2020 at 9:11 PM David Howells  wrote:
> >
> > Jason A. Donenfeld  wrote:
> >
> > > Your touch might be helpful here. CRYPTO_LIB_CHACHA20POLY1305 is a
> > > tristate and depends on as well as selects other things that are
> > > tristates.
> > >
> > > Meanwhile BIG_KEYS is a bool, which needs to select
> > > CRYPTO_LIB_CHACHA20POLY1305. However, it gets antsy if the the symbol
> > > its selecting has =m items in its hierarchy.
> > >
> > > Any suggestions? The ideal thing to happen would be that the select of
> > > CRYPTO_LIB_CHACHA20POLY1305 in BIG_KEYS causes all of the descendants
> > > to become =y too.
> >
> > I think that select is broken in its behaviour - it doesn't propagate the
> > selection enforcement up the tree.  You could try changing it to a depends 
> > on
> > or you could put in a select for every dependency.
>
> I agree.
> 'depends on' will be cleaner.

That's fine, but also makes it more annoying for people to select
big_keys, and I don't know how David feels in that regard.

Seems like it'd be useful to have something that means "select X and
all the things X needs to not be broken", though satisfiability
problems like that can get really complicated quite fast.


Re: linux-next: build failure after merge of the keys tree

2020-05-13 Thread Jason A. Donenfeld
Hey Masahiro,

Your touch might be helpful here. CRYPTO_LIB_CHACHA20POLY1305 is a
tristate and depends on as well as selects other things that are
tristates.

Meanwhile BIG_KEYS is a bool, which needs to select
CRYPTO_LIB_CHACHA20POLY1305. However, it gets antsy if the the symbol
its selecting has =m items in its hierarchy.

Any suggestions? The ideal thing to happen would be that the select of
CRYPTO_LIB_CHACHA20POLY1305 in BIG_KEYS causes all of the descendants
to become =y too.

Jason


On Wed, May 13, 2020 at 10:31 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the keys tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> WARNING: unmet direct dependencies detected for CRYPTO_LIB_CHACHA20POLY1305
>   Depends on [m]: CRYPTO [=y] && (CRYPTO_ARCH_HAVE_LIB_CHACHA [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_CHACHA [=m]) && (CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m])
>   Selected by [y]:
>   - BIG_KEYS [=y] && KEYS [=y] && TMPFS [=y]
>   Selected by [m]:
>   - WIREGUARD [=m] && NETDEVICES [=y] && NET_CORE [=y] && NET [=y] && INET 
> [=y] && (IPV6 [=y] || !IPV6 [=y])
>
> WARNING: unmet direct dependencies detected for CRYPTO_LIB_CHACHA20POLY1305
>   Depends on [m]: CRYPTO [=y] && (CRYPTO_ARCH_HAVE_LIB_CHACHA [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_CHACHA [=m]) && (CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m])
>   Selected by [y]:
>   - BIG_KEYS [=y] && KEYS [=y] && TMPFS [=y]
>   Selected by [m]:
>   - WIREGUARD [=m] && NETDEVICES [=y] && NET_CORE [=y] && NET [=y] && INET 
> [=y] && (IPV6 [=y] || !IPV6 [=y])
>
> WARNING: unmet direct dependencies detected for CRYPTO_LIB_CHACHA20POLY1305
>   Depends on [m]: CRYPTO [=y] && (CRYPTO_ARCH_HAVE_LIB_CHACHA [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_CHACHA [=m]) && (CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m] || 
> !CRYPTO_ARCH_HAVE_LIB_POLY1305 [=m])
>   Selected by [y]:
>   - BIG_KEYS [=y] && KEYS [=y] && TMPFS [=y]
>   Selected by [m]:
>   - WIREGUARD [=m] && NETDEVICES [=y] && NET_CORE [=y] && NET [=y] && INET 
> [=y] && (IPV6 [=y] || !IPV6 [=y])
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `xchacha_init':
> chacha20poly1305.c:(.text+0x12d): undefined reference to `chacha_init_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x13d): undefined reference to 
> `hchacha_block_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x14b): undefined reference to 
> `chacha_init_arch'
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `__chacha20poly1305_encrypt':
> chacha20poly1305.c:(.text+0x2ab): undefined reference to `chacha_crypt_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x2bd): undefined reference to 
> `poly1305_init_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x2d6): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x317): undefined reference to 
> `chacha_crypt_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x32d): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x379): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x385): undefined reference to 
> `poly1305_final_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x413): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x434): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `chacha20poly1305_encrypt':
> (.text+0x59d): undefined reference to `chacha_init_arch'
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `__chacha20poly1305_decrypt':
> chacha20poly1305.c:(.text+0x847): undefined reference to `chacha_crypt_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x859): undefined reference to 
> `poly1305_init_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x86d): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x8a7): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x8f1): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x8fc): undefined reference to 
> `poly1305_final_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x94f): undefined reference to 
> `chacha_crypt_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x9d9): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: chacha20poly1305.c:(.text+0x9f9): undefined reference to 
> `poly1305_update_arch'
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `chacha20poly1305_decrypt':
> (.text+0xb78): undefined reference to `chacha_init_arch'
> x86_64-linux-gnu-ld: lib/crypto/chacha20poly1305.o: in function 
> `chacha20poly1305_crypt_sg_inplace':
> chacha20poly1305.c:(.text+0xf16): undefined reference to `chacha_init_arch'
> 

Re: [PATCH v3] security/keys: rewrite big_key crypto to use library interface

2020-05-12 Thread Jason A. Donenfeld
On Tue, May 12, 2020 at 4:03 PM David Howells  wrote:
>
> Jason A. Donenfeld  wrote:
>
> > So long as that ->update function:
> > 1. Deletes the old on-disk data.
> > 2. Deletes the old key from the inode.
> > 3. Generates a new key using get_random_bytes.
> > 4. Stores that new key in the inode.
> > 5. Encrypts the updated data afresh with the new key.
> > 6. Puts the updated data onto disk,
> >
> > then this is fine with me, and feel free to have my Acked-by if you
> > want. But if it doesn't do that -- i.e. if it tries to reuse the old
> > key or similar -- then this isn't fine. But it sounds like from what
> > you've described that things are actually fine, in which case, I guess
> > it makes sense to apply your patch ontop of mine and commit these.
>
> Yep.  It calls big_key_destroy(), which clears away the old stuff just as when
> a key is being destroyed, then generic_key_instantiate() just as when a key is
> being set up.
>
> The key ID and the key metadata (ownership, perms, expiry) are maintained, but
> the payload is just completely replaced.

Okay, in that case, take my:

Acked-by: Jason A. Donenfeld 

And then perhaps you can take both my patch and your addendum into keys-next.

Jason


Re: [PATCH v3] security/keys: rewrite big_key crypto to use library interface

2020-05-12 Thread Jason A. Donenfeld
Hi David,

So long as that ->update function:
1. Deletes the old on-disk data.
2. Deletes the old key from the inode.
3. Generates a new key using get_random_bytes.
4. Stores that new key in the inode.
5. Encrypts the updated data afresh with the new key.
6. Puts the updated data onto disk,

then this is fine with me, and feel free to have my Acked-by if you
want. But if it doesn't do that -- i.e. if it tries to reuse the old
key or similar -- then this isn't fine. But it sounds like from what
you've described that things are actually fine, in which case, I guess
it makes sense to apply your patch ontop of mine and commit these.

Jason


Re: [PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-11 Thread Jason A. Donenfeld
On Mon, May 11, 2020 at 6:05 PM Linus Torvalds
 wrote:
> There's a reason -O3 isn't even offered as an option.
>
> Maybe things have changed, and maybe they've improved. But I'd like to
> see actual numbers for something like this.
>
> Not inlining as aggressively is not necessarily a bad thing. It can
> be, of course. But I've actually also done gcc bugreports about gcc
> inlining too much, and generating _worse_ code as a result (ie
> inlinging things that were behind an "if (unlikely())" test, and
> causing the likely path to grow a stack fram and stack spills as a
> result).
>
> So just "O3 inlines more" is not a valid argument.

Alright. It might be possible to produce some benchmarks, and then
isolate the precise inlining parameter that makes the difference, and
include that for gcc-10. But you made a compelling argument in that
old gcc bug report about not going down the finicky rabbit hole of gcc
inlining switches that seem to change meaning between releases, which
is persuasive.

The other possibility would be if -O3 actually isn't bad like it used
to be and the codegen is markedly better, alongside some numbers to
back it up. I'm not presently making that argument and don't have
those numbers, but perhaps others who were interested in this patch
for other reasons do have strong arguments there and want to chime in.
Otherwise, no problem dropping this.


[PATCH v2] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-11 Thread Jason A. Donenfeld
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Cc: linux-kbu...@vger.kernel.org
Cc: x...@kernel.org
Cc: sta...@vger.kernel.org
Cc: hjl.to...@gmail.com
Cc: Peter Zijlstra 
Cc: Jakub Jelinek 
Cc: Oleksandr Natalenko 
Cc: Arnd Bergmann 
Cc: Andrew Morton 
Cc: David Laight 
Cc: Linus Torvalds 
Cc: Masahiro Yamada 
Signed-off-by: Jason A. Donenfeld 
---
Changes v1->v2:
 - [Oleksandr] Remove O3 dependency on ARC.

 init/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..f76ec3ccc883 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
 
 choice
prompt "Compiler optimization level"
-   default CC_OPTIMIZE_FOR_PERFORMANCE
+   default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
+   default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || 
CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
bool "Optimize for performance (-O2)"
@@ -1256,7 +1257,6 @@ config CC_OPTIMIZE_FOR_PERFORMANCE
 
 config CC_OPTIMIZE_FOR_PERFORMANCE_O3
bool "Optimize more for performance (-O3)"
-   depends on ARC
imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
help
  Choosing this option will pass "-O3" to your compiler to optimize
-- 
2.26.2



[PATCH v3] security/keys: rewrite big_key crypto to use library interface

2020-05-11 Thread Jason A. Donenfeld
A while back, I noticed that the crypto and crypto API usage in big_keys
were entirely broken in multiple ways, so I rewrote it. Now, I'm
rewriting it again, but this time using the simpler ChaCha20Poly1305
library function. This makes the file considerably more simple; the
diffstat alone should justify this commit. It also should be faster,
since it no longer requires a mutex around the "aead api object" (nor
allocations), allowing us to encrypt multiple items in parallel. We also
benefit from being able to pass any type of pointer, so we can get rid
of the ridiculously complex custom page allocator that big_key really
doesn't need.

Cc: David Howells 
Cc: Andy Lutomirski 
Cc: Greg KH 
Cc: Linus Torvalds 
Cc: kernel-harden...@lists.openwall.com
Reviewed-by: Eric Biggers 
Signed-off-by: Jason A. Donenfeld 
---
Changes v2->v3:
 - [Eric] Unify kernel_read/write handling in big_key_preparse and
   big_key_read.
 - [Eric] Update commit message.

Changes v1->v2:
 - [Eric] Return -EBADMSG instead of -EINVAL if the authtag fails.
 - [Eric] Select CONFIG_CRYPTO, since it's required by the LIB selection.
 - [Eric] Zero out buffers that formerly contained either plaintext or
   ciphertext keys.
 - [Jason] If kernel_read() fails, return that error value, instead of
   relying on the subsequent decryption to fail.

Note v1:
 I finally got around to updating this patch from the mailing list posts
 back in 2017-2018, using the library interface that we eventually
 merged in 2019. I haven't retested this for functionality, but nothing
 much has changed, so I suspect things should still be good to go.

 security/keys/Kconfig   |   3 +-
 security/keys/big_key.c | 240 ++--
 2 files changed, 35 insertions(+), 208 deletions(-)

diff --git a/security/keys/Kconfig b/security/keys/Kconfig
index 47c041563d41..7da6c1b496f9 100644
--- a/security/keys/Kconfig
+++ b/security/keys/Kconfig
@@ -61,8 +61,7 @@ config BIG_KEYS
depends on KEYS
depends on TMPFS
select CRYPTO
-   select CRYPTO_AES
-   select CRYPTO_GCM
+   select CRYPTO_LIB_CHACHA20POLY1305
help
  This option provides support for holding large keys within the kernel
  (for example Kerberos ticket caches).  The data may be stored out to
diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index 82008f900930..d43f3daab2b8 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 /* Large capacity key type
  *
- * Copyright (C) 2017 Jason A. Donenfeld . All Rights 
Reserved.
+ * Copyright (C) 2017-2020 Jason A. Donenfeld . All Rights 
Reserved.
  * Copyright (C) 2013 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowe...@redhat.com)
  */
@@ -12,20 +12,10 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 #include 
 #include 
-#include 
-#include 
-
-struct big_key_buf {
-   unsigned intnr_pages;
-   void*virt;
-   struct scatterlist  *sg;
-   struct page *pages[];
-};
+#include 
 
 /*
  * Layout of key payload words.
@@ -37,14 +27,6 @@ enum {
big_key_len,
 };
 
-/*
- * Crypto operation with big_key data
- */
-enum big_key_op {
-   BIG_KEY_ENC,
-   BIG_KEY_DEC,
-};
-
 /*
  * If the data is under this limit, there's no point creating a shm file to
  * hold it as the permanently resident metadata for the shmem fs will be at
@@ -52,16 +34,6 @@ enum big_key_op {
  */
 #define BIG_KEY_FILE_THRESHOLD (sizeof(struct inode) + sizeof(struct dentry))
 
-/*
- * Key size for big_key data encryption
- */
-#define ENC_KEY_SIZE 32
-
-/*
- * Authentication tag length
- */
-#define ENC_AUTHTAG_SIZE 16
-
 /*
  * big_key defined keys take an arbitrary string as the description and an
  * arbitrary blob of data as the payload
@@ -75,136 +47,20 @@ struct key_type key_type_big_key = {
.destroy= big_key_destroy,
.describe   = big_key_describe,
.read   = big_key_read,
-   /* no ->update(); don't add it without changing big_key_crypt() nonce */
+   /* no ->update(); don't add it without changing chacha20poly1305's 
nonce */
 };
 
-/*
- * Crypto names for big_key data authenticated encryption
- */
-static const char big_key_alg_name[] = "gcm(aes)";
-#define BIG_KEY_IV_SIZEGCM_AES_IV_SIZE
-
-/*
- * Crypto algorithms for big_key data authenticated encryption
- */
-static struct crypto_aead *big_key_aead;
-
-/*
- * Since changing the key affects the entire object, we need a mutex.
- */
-static DEFINE_MUTEX(big_key_aead_lock);
-
-/*
- * Encrypt/decrypt big_key data
- */
-static int big_key_crypt(enum big_key_op op, struct big_key_buf *buf, size_t 
datalen, u8 *key)
-{
-   int ret;
-   struct aead_request *aead_req;
-   /* We always use a zero nonce. The reason we can get away with 

Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Jason A. Donenfeld
On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann  wrote:
>
> On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko  
> wrote:
> >
> > On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > > Should we untangle -O3 from depending on ARC first maybe?
> > >
> > > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > > day for feedback first.
> >
> > Just keep in mind that my previous attempt [1] failed because of too
> > many false positive warnings despite -O3 really uncovered a couple of
> > bugs in the codebase.
>
> I think my warning fixes were mostly picked up in the meantime, but
> if there are any remaining, they would be mixed in with random other
> fixes in my testing tree, so it's hard to know for sure.
>
> I also want to hear the feedback from the gcc developers about what
> the general recommendations are between O2 and O3, and how
> they may have changed over time. According to the gcc-10 documentation,
> the difference between -O2 and -O3 is exactly this set of options:
>
> -fgcse-after-reload
> -fipa-cp-clone
> -floop-interchange
> -floop-unroll-and-jam
> -fpeel-loops
> -fpredictive-commoning
> -fsplit-loops
> -fsplit-paths
> -ftree-loop-distribution
> -ftree-loop-vectorize
> -ftree-partial-pre
> -ftree-slp-vectorize
> -funswitch-loops
> -fvect-cost-model
> -fvect-cost-model=dynamic
> -fversion-loops-for-strides

The other significant thing -- and what prompted this patchset -- is
it looks like gcc 10 has lowered the inlining degree for -O2, and put
gcc 9's inlining parameters from -O2 into gcc-10's -O3.


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Jason A. Donenfeld
On Fri, May 8, 2020 at 3:08 AM Oleksandr Natalenko  wrote:
>
> Should we untangle -O3 from depending on ARC first maybe?

Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
day for feedback first.

Jason


[PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-07 Thread Jason A. Donenfeld
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Signed-off-by: Jason A. Donenfeld 
---
 init/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..fab3f810a68d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
 
 choice
prompt "Compiler optimization level"
-   default CC_OPTIMIZE_FOR_PERFORMANCE
+   default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
+   default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || 
CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
bool "Optimize for performance (-O2)"
-- 
2.26.2



Re: [PATCH v2] security: disable FORTIFY_SOURCE on clang

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 8:54 PM Kees Cook  wrote:
>
> On Tue, May 05, 2020 at 06:14:53PM -0600, Jason A. Donenfeld wrote:
> > clang-10 has a broken optimization stage that doesn't allow the
> > compiler to prove at compile time that certain memcpys are within
> > bounds, and thus the outline memcpy is always called, resulting in
> > horrific performance, and in some cases, excessive stack frame growth.
> > Here's a simple reproducer:
> >
> > typedef unsigned long size_t;
> > void *c(void *dest, const void *src, size_t n) __asm__("memcpy");
> > extern inline __attribute__((gnu_inline)) void *memcpy(void *dest, 
> > const void *src, size_t n) { return c(dest, src, n); }
> > void blah(char *a)
> > {
> >   unsigned long long b[10], c[10];
> >   int i;
> >
> >   memcpy(b, a, sizeof(b));
> >   for (i = 0; i < 10; ++i)
> > c[i] = b[i] ^ b[9 - i];
> >   for (i = 0; i < 10; ++i)
> > b[i] = c[i] ^ a[i];
> >   memcpy(a, b, sizeof(b));
> > }
> >
> > Compile this with clang-9 and clang-10 and observe:
> >
> > zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 
> > -Wframe-larger-than=0 -O3 -c b.c -o c10.o
> > b.c:5:6: warning: stack frame size of 104 bytes in function 'blah' 
> > [-Wframe-larger-than=]
> > void blah(char *a)
> >  ^
> > 1 warning generated.
> > zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-9 
> > -Wframe-larger-than=0 -O3 -c b.c -o c9.o
> >
> > Looking at the disassembly of c10.o and c9.o, one can see that c9.o is
> > properly optimized in the obvious way one would expect, while c10.o has
> > blown up and includes extern calls to memcpy.
> >
> > But actually, for versions of clang earlier than 10, fortify source
> > mostly does nothing. So, between being broken and doing nothing, it
> > probably doesn't make sense to pretend to offer this option. So, this
> > commit just disables it entirely when compiling with clang.
> >
> > Cc: Arnd Bergmann 
> > Cc: LKML 
> > Cc: clang-built-linux 
> > Cc: Kees Cook 
> > Cc: George Burgess 
> > Cc: Nick Desaulniers 
> > Link: https://bugs.llvm.org/show_bug.cgi?id=45802
> > Signed-off-by: Jason A. Donenfeld 
>
> Grudgingly,
>
> Reviewed-by: Kees Cook 

Do you want to take this into your tree to send to Linus? Seems like
security kconfig switches is in line with your usual submissions.


[PATCH v2] security: disable FORTIFY_SOURCE on clang

2020-05-05 Thread Jason A. Donenfeld
clang-10 has a broken optimization stage that doesn't allow the
compiler to prove at compile time that certain memcpys are within
bounds, and thus the outline memcpy is always called, resulting in
horrific performance, and in some cases, excessive stack frame growth.
Here's a simple reproducer:

typedef unsigned long size_t;
void *c(void *dest, const void *src, size_t n) __asm__("memcpy");
extern inline __attribute__((gnu_inline)) void *memcpy(void *dest, const 
void *src, size_t n) { return c(dest, src, n); }
void blah(char *a)
{
  unsigned long long b[10], c[10];
  int i;

  memcpy(b, a, sizeof(b));
  for (i = 0; i < 10; ++i)
c[i] = b[i] ^ b[9 - i];
  for (i = 0; i < 10; ++i)
b[i] = c[i] ^ a[i];
  memcpy(a, b, sizeof(b));
}

Compile this with clang-9 and clang-10 and observe:

zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 
-Wframe-larger-than=0 -O3 -c b.c -o c10.o
b.c:5:6: warning: stack frame size of 104 bytes in function 'blah' 
[-Wframe-larger-than=]
void blah(char *a)
 ^
1 warning generated.
zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-9 
-Wframe-larger-than=0 -O3 -c b.c -o c9.o

Looking at the disassembly of c10.o and c9.o, one can see that c9.o is
properly optimized in the obvious way one would expect, while c10.o has
blown up and includes extern calls to memcpy.

But actually, for versions of clang earlier than 10, fortify source
mostly does nothing. So, between being broken and doing nothing, it
probably doesn't make sense to pretend to offer this option. So, this
commit just disables it entirely when compiling with clang.

Cc: Arnd Bergmann 
Cc: LKML 
Cc: clang-built-linux 
Cc: Kees Cook 
Cc: George Burgess 
Cc: Nick Desaulniers 
Link: https://bugs.llvm.org/show_bug.cgi?id=45802
Signed-off-by: Jason A. Donenfeld 
---
 security/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/security/Kconfig b/security/Kconfig
index cd3cc7da3a55..76bcfb3eb16f 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -191,6 +191,7 @@ config HARDENED_USERCOPY_PAGESPAN
 config FORTIFY_SOURCE
bool "Harden common str/mem functions against buffer overflows"
depends on ARCH_HAS_FORTIFY_SOURCE
+   depends on !CC_IS_CLANG
help
  Detect overflows of buffers in common string and memory functions
  where the compiler can determine and validate the buffer sizes.
-- 
2.26.2



Re: [PATCH] Kbuild: disable FORTIFY_SOURCE on clang-10

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 5:22 PM Jason A. Donenfeld  wrote:
>
> On Tue, May 5, 2020 at 5:19 PM Kees Cook  wrote:
> >
> > On Tue, May 05, 2020 at 04:37:38PM -0600, Jason A. Donenfeld wrote:
> > > On Tue, May 5, 2020 at 4:25 PM Nathan Chancellor
> > >  wrote:
> > > > I believe these issues are one in the same. I did a reverse bisect with
> > > > Arnd's test case and converged on George's first patch:
> > > >
> > > > https://github.com/llvm/llvm-project/commit/2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a
> > > >
> > > > I think that in lieu of this patch, we should have that patch and its
> > > > follow-up fix merged into 10.0.1.
> > >
> > > If this is fixed in 10.0.1, do we even need to patch the kernel at
> > > all? Or can we just leave it be, considering most organizations using
> > > clang know what they're getting into? I'd personally prefer the
> > > latter, so that we don't clutter things.
> >
> > I agree: I'd rather this was fixed in 10.0.1 (but if we do want a
> > kernel-side work-around for 10.0.0, I would suggest doing the version
> > check in the Kconfig for FORTIFY_SOURCE instead of in the Makefile,
> > as that's where these things are supposed to live these days).
>
> Indeed this belongs in the Makefile. I can send a patch adjusting

er, Kconfig, is what I meant to type.


Re: [PATCH] Kbuild: disable FORTIFY_SOURCE on clang-10

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 5:19 PM Kees Cook  wrote:
>
> On Tue, May 05, 2020 at 04:37:38PM -0600, Jason A. Donenfeld wrote:
> > On Tue, May 5, 2020 at 4:25 PM Nathan Chancellor
> >  wrote:
> > > I believe these issues are one in the same. I did a reverse bisect with
> > > Arnd's test case and converged on George's first patch:
> > >
> > > https://github.com/llvm/llvm-project/commit/2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a
> > >
> > > I think that in lieu of this patch, we should have that patch and its
> > > follow-up fix merged into 10.0.1.
> >
> > If this is fixed in 10.0.1, do we even need to patch the kernel at
> > all? Or can we just leave it be, considering most organizations using
> > clang know what they're getting into? I'd personally prefer the
> > latter, so that we don't clutter things.
>
> I agree: I'd rather this was fixed in 10.0.1 (but if we do want a
> kernel-side work-around for 10.0.0, I would suggest doing the version
> check in the Kconfig for FORTIFY_SOURCE instead of in the Makefile,
> as that's where these things are supposed to live these days).

Indeed this belongs in the Makefile. I can send a patch adjusting
that, if you want, but I think I'd rather do nothing and have a fix be
rolled out in 10.0.1. Clang users should know what to expect in that
regard.

> (Though as was mentioned, it's likely that FORTIFY_SOURCE isn't working
> _at all_ under Clang, so I may still send a patch to depend on !clang
> just to avoid surprises until it's fixed, but I haven't had time to
> chase down a solution yet.)

That might be the most coherent thing to do, at least so that people
don't get some false sense of mitigation.

Jason


Re: [PATCH] Kbuild: disable FORTIFY_SOURCE on clang-10

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 4:25 PM Nathan Chancellor
 wrote:
> I believe these issues are one in the same. I did a reverse bisect with
> Arnd's test case and converged on George's first patch:
>
> https://github.com/llvm/llvm-project/commit/2dd17ff08165e6118e70f00e22b2c36d2d4e0a9a
>
> I think that in lieu of this patch, we should have that patch and its
> follow-up fix merged into 10.0.1.

If this is fixed in 10.0.1, do we even need to patch the kernel at
all? Or can we just leave it be, considering most organizations using
clang know what they're getting into? I'd personally prefer the
latter, so that we don't clutter things.


[PATCH] Kbuild: disable FORTIFY_SOURCE on clang-10

2020-05-05 Thread Jason A. Donenfeld
clang-10 has a broken optimization stage that doesn't enable the
compiler to prove at compile time that certain memcpys are within
bounds, and thus the outline memcpy is always called, resulting in
horrific performance, and in some cases, excessive stack frame growth.
Here's a simple reproducer:

typedef unsigned long size_t;
void *c(void *dest, const void *src, size_t n) __asm__("memcpy");
extern inline __attribute__((gnu_inline)) void *memcpy(void *dest, const 
void *src, size_t n) { return c(dest, src, n); }
void blah(char *a)
{
  unsigned long long b[10], c[10];
  int i;

  memcpy(b, a, sizeof(b));
  for (i = 0; i < 10; ++i)
c[i] = b[i] ^ b[9 - i];
  for (i = 0; i < 10; ++i)
b[i] = c[i] ^ a[i];
  memcpy(a, b, sizeof(b));
}

Compile this with clang-9 and clang-10 and observe:

zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 
-Wframe-larger-than=0 -O3 -c b.c -o c10.o
b.c:5:6: warning: stack frame size of 104 bytes in function 'blah' 
[-Wframe-larger-than=]
void blah(char *a)
 ^
1 warning generated.
zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-9 
-Wframe-larger-than=0 -O3 -c b.c -o c9.o

Looking at the disassembly of c10.o and c9.o, one can see that c9.o is
properly optimized in the obvious way one would expect, while c10.o has
blown up and includes extern calls to memcpy.

This is present on the most trivial bits of code. Thus, for clang-10, we
just set __NO_FORTIFY globally, so that this issue won't be incurred.

Cc: Arnd Bergmann 
Cc: LKML 
Cc: clang-built-linux 
Cc: Kees Cook 
Cc: George Burgess 
Cc: Nick Desaulniers 
Link: https://bugs.llvm.org/show_bug.cgi?id=45802
Signed-off-by: Jason A. Donenfeld 
---
 Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Makefile b/Makefile
index 49b2709ff44e..f022f077591d 100644
--- a/Makefile
+++ b/Makefile
@@ -768,6 +768,13 @@ KBUILD_CFLAGS += -Wno-gnu
 # source of a reference will be _MergedGlobals and not on of the whitelisted 
names.
 # See modpost pattern 2
 KBUILD_CFLAGS += -mno-global-merge
+
+# clang-10 has a broken optimization stage that causes memcpy to always be
+# outline, resulting in excessive stack frame growth and poor performance.
+ifeq ($(shell test $(CONFIG_CLANG_VERSION) -ge 10 && test 
$(CONFIG_CLANG_VERSION) -lt 11; echo $$?),0)
+KBUILD_CFLAGS += -D__NO_FORTIFY
+endif
+
 else
 
 # These warnings generated too much noise in a regular build.
-- 
2.26.2



Re: [PATCH] crypto: curve25519-hacl64 - Disable fortify-source for clang-10

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 3:48 PM Nick Desaulniers  wrote:
>
> + Kees, George, who have started looking into this, too.

I have a smaller reproducer and analysis I'll send out very shortly.


Re: [PATCH] crypto: curve25519-hacl64 - Disable fortify-source for clang-10

2020-05-05 Thread Jason A. Donenfeld
As discussed on IRC, this issue here isn't specific to this file, but
rather fortify source has some serious issues on clang-10, everywhere
in the kernel, and we should probably disable it globally for this
clang version. I'll follow up with some more info in a different
patch.


Re: [PATCH] net: wireguard: avoid unused variable warning

2020-05-05 Thread Jason A. Donenfeld
On Tue, May 5, 2020 at 8:13 AM Arnd Bergmann  wrote:
>
> clang points out a harmless use of uninitialized variables that
> get passed into a local function but are ignored there:
>
> In file included from drivers/net/wireguard/ratelimiter.c:223:
> drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' 
> is uninitialized when used here [-Werror,-Wuninitialized]
> ret = timings_test(skb4, hdr4, skb6, hdr6, _count);
>^~~~
> drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the 
> variable 'skb6' to silence this warning
> struct sk_buff *skb4, *skb6;
>^
> = NULL
> drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' 
> is uninitialized when used here [-Werror,-Wuninitialized]
> ret = timings_test(skb4, hdr4, skb6, hdr6, _count);
>  ^~~~
> drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the 
> variable 'hdr6' to silence this warning
> struct ipv6hdr *hdr6;
> ^

Seems like the code is a bit easier to read and is more uniform
looking by just initializing those two variables to NULL, like the
warning suggests. If you don't mind, I'll queue something up in my
tree to this effect.

By the way, I'm having a bit of a hard time reproducing the warning
with either clang-10 or clang-9. Just for my own curiosity, would you
mind sending the .config that results in this?

Jason


Re: INFO: rcu detected stall in wg_packet_tx_worker

2020-05-04 Thread Jason A. Donenfeld
So in spite of this Syzkaller bug being unrelated in the end, I've
continued to think about the stacktrace a bit, and combined with some
other [potentially false alarm] bug reports I'm trying to wrap my head
around, I'm a bit a curious about ideal usage for the udp_tunnel API.

All the uses I've seen in the kernel (including wireguard) follow this pattern:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
...
udp_tunnel_xmit_skb(..., sock, ...);
rcu_read_unlock_bh();

udp_tunnel_xmit_skb calls iptunnel_xmit, which winds up in the usual
ip_local_out path, which eventually winds up calling some other
devices' ndo_xmit, or gets queued up in a qdisc. Calls to
udp_tunnel_xmit_skb aren't exactly cheap. So I wonder: is holding the
rcu lock for all that time really a good thing?

A different pattern that avoids holding the rcu lock would be:

rcu_read_lock_bh();
sock = rcu_dereference(obj->sock);
sock_hold(sock);
rcu_read_unlock_bh();
...
udp_tunnel_xmit_skb(..., sock, ...);
sock_put(sock);

This seems better, but I wonder if it has some drawbacks too. For
example, sock_put has some comment that warns against incrementing it
in response to forwarded packets. And if this isn't necessary to do,
it's marginally more costly than the first pattern.

Any opinions about this?

Jason


Re: [PATCH] drm/i915: Avoid using simd from interrupt context

2020-05-03 Thread Jason A. Donenfeld
On Sun, May 3, 2020 at 2:31 PM Chris Wilson  wrote:
>
> Query whether or not we are in a legal context for using SIMD, before
> using SSE4.2 registers.
>
> Suggested-by: Jason A. Donenfeld 
> Signed-off-by: Chris Wilson 
> ---
>  drivers/gpu/drm/i915/i915_memcpy.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_memcpy.c 
> b/drivers/gpu/drm/i915/i915_memcpy.c
> index 7b3b83bd5ab8..fc18d6c28d5f 100644
> --- a/drivers/gpu/drm/i915/i915_memcpy.c
> +++ b/drivers/gpu/drm/i915/i915_memcpy.c
> @@ -24,6 +24,7 @@
>
>  #include 
>  #include 
> +#include 
>
>  #include "i915_memcpy.h"
>
> @@ -115,6 +116,9 @@ bool i915_memcpy_from_wc(void *dst, const void *src, 
> unsigned long len)
> if (unlikely(((unsigned long)dst | (unsigned long)src | len) & 15))
> return false;
>
> +   if (unlikely(!may_use_simd()))
> +   return false;
> +
> if (static_branch_likely(_movntdqa)) {
> if (likely(len))
> __memcpy_ntdqa(dst, src, len >> 4);
> --
> 2.20.1

Looks like you beat me to the punch. Thanks for doing this.

Reviewed-by: Jason A. Donenfeld 


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-03 Thread Jason A. Donenfeld
On Sun, May 3, 2020 at 2:30 PM Chris Wilson  wrote:
>
> Quoting Jason A. Donenfeld (2020-04-30 23:10:16)
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
> >
> > Cc: sta...@vger.kernel.org
>
> The same argument as on the previous submission is that we return to the
> caller if we can't use movntqda as their fallback path should be faster
> than uncached memcpy.

Oh, THAT's what you meant before. Okay, will follow up.


Re: [PATCH 0/7] sha1 library cleanup

2020-05-02 Thread Jason A. Donenfeld
Thanks for this series. I like the general idea. I think it might make
sense, though, to separate things out into sha1.h and sha256.h. That
will be nice preparation work for when we eventually move obsolete
primitives into some  subdirectory.


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread Jason A. Donenfeld
On Fri, May 1, 2020 at 12:07 PM Christoph Hellwig  wrote:
>
> On Thu, Apr 30, 2020 at 04:10:16PM -0600, Jason A. Donenfeld wrote:
> > Sometimes it's not okay to use SIMD registers, the conditions for which
> > have changed subtly from kernel release to kernel release. Usually the
> > pattern is to check for may_use_simd() and then fallback to using
> > something slower in the unlikely case SIMD registers aren't available.
> > So, this patch fixes up i915's accelerated memcpy routines to fallback
> > to boring memcpy if may_use_simd() is false.
>
> Err, why does i915 implements its own uncached memcpy instead of relying
> on core functionality to start with?

I was wondering the same. It sure does seem like this ought to be more
generalized functionality, with a name that represents the type of
transfer it's optimized for (wc or similar).


Re: [PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-05-01 Thread Jason A. Donenfeld
On Fri, May 1, 2020 at 4:42 AM Sebastian Andrzej Siewior
 wrote:
>Reviewed-by: Sebastian Andrzej Siewior 

Thanks.

>
> May I ask how large the memcpy can be? I'm asking in case it is large
> and an explicit rescheduling point might be needed.

Yea I was worried about that too. I'm not an i915 developer, but so
far as I can tell:

- The path from intel_engine_cmd_parser is  <= 256 KiB for "known
users", so that's rather large.
- The path from perf_memcpy is either 4k, 64k, or 4M, depending on the
type of object, so that seems gigantic, but I think that might be
selftest code.
- The path from compress_page appears to be PAGE_SIZE, so 4k, which
meshes with the limits we set agreed on few weeks ago for the crypto
stuff.
- The path from guc_read_update_log_buffer appears to be 8k, 32k, 2M,
or 8M, depending on the type of object, so that seems absurdly huge
and doesn't appear to be selftest code either like the other case.

I have no doubt the i915 developers will jump in here waving their
arms, but either way, it sure seems to me like you might have a point.


[PATCH] drm/i915: check to see if SIMD registers are available before using SIMD

2020-04-30 Thread Jason A. Donenfeld
Sometimes it's not okay to use SIMD registers, the conditions for which
have changed subtly from kernel release to kernel release. Usually the
pattern is to check for may_use_simd() and then fallback to using
something slower in the unlikely case SIMD registers aren't available.
So, this patch fixes up i915's accelerated memcpy routines to fallback
to boring memcpy if may_use_simd() is false.

Cc: sta...@vger.kernel.org
Signed-off-by: Jason A. Donenfeld 
---
 drivers/gpu/drm/i915/i915_memcpy.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_memcpy.c 
b/drivers/gpu/drm/i915/i915_memcpy.c
index fdd550405fd3..7c0e022586bc 100644
--- a/drivers/gpu/drm/i915/i915_memcpy.c
+++ b/drivers/gpu/drm/i915/i915_memcpy.c
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 
 #include "i915_memcpy.h"
 
@@ -38,6 +39,12 @@ static DEFINE_STATIC_KEY_FALSE(has_movntdqa);
 #ifdef CONFIG_AS_MOVNTDQA
 static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
 {
+   if (unlikely(!may_use_simd())) {
+   memcpy(dst, src, len);
+   return;
+   }
+
+
kernel_fpu_begin();
 
while (len >= 4) {
@@ -67,6 +74,11 @@ static void __memcpy_ntdqa(void *dst, const void *src, 
unsigned long len)
 
 static void __memcpy_ntdqu(void *dst, const void *src, unsigned long len)
 {
+   if (unlikely(!may_use_simd())) {
+   memcpy(dst, src, len);
+   return;
+   }
+
kernel_fpu_begin();
 
while (len >= 4) {
-- 
2.26.2



Re: [PATCH crypto-stable v3 1/2] crypto: arch/lib - limit simd usage to 4k chunks

2020-04-28 Thread Jason A. Donenfeld
Hi Herbert,

This v3 patchset has a Reviewed-by from Ard for 1/2 and from Eric for
2/2, from last week. Could you submit this to Linus for rc4?

Thanks,
Jason


Re: WireGuard to port to existing Crypto API

2019-09-25 Thread Jason A. Donenfeld
Hi Dave,

On Wed, Sep 25, 2019 at 12:03 PM David Miller  wrote:
> I didn't say "must" anything, I suggested this as a more smoothe
> and efficient way forward.

s/must/should/g? However it's characterized, I think your jugements
and opinions are generally sound, and I intend to put them into
action.

> I'm also a bit disappointed that you felt the need to so quickly
> make such an explosive posting to the mailing list when we've

Explosive? That's certainly not the intent here. The project is
changing direction in a big way. Collaborating with others on the
crypto API will be an important part of that. Announcing the change in
direction, those intentions, a rationale on why it will be okay, and
inviting collaboration is a responsible thing to do at the earliest
opportunity. Better to announce intent early rather than surprise
people or deter potential collaborators by keeping plans secret.

Jason


  1   2   3   4   5   6   7   8   9   10   >