Re: Please create the email alias do-not-apply-to-sta...@kernel.org -> /dev/null

2024-04-17 Thread Willy Tarreau
On Wed, Apr 17, 2024 at 10:16:26AM +0200, Greg KH wrote:
> > at the scripts used by stable developers - and maybe at the ML server - to
> > catch different variations won't hurt, as it sounds likely that people will
> > end messing up with a big name like "do-not-apply-to-stable", typing
> > instead things like:
> > 
> > do_not_apply_to_stable
> > dont-apply-to-stable
> > 
> > and other variants.
> 
> I want this very explicit that someone does not want this applied, and
> that it has a reason to do so.  And if getting the email right to do so
> is the issue with that, that's fine.  This is a very rare case that
> almost no one should normally hit.

For using a comparable approach in haproxy on a daily basis, I do see
the value in this. We just mark a lot of fixes "no backport needed" or
"no backport needed unless blablabla" for everything that is only
relevant to the dev tree, and that's a huge time saver for those working
on the backports later.

Maybe "not-for-stable" would be both shorter and easier to remember BTW ?

Regards,
Willy



Re: [PATCH] selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings

2023-09-11 Thread Willy Tarreau
Hi Thomas,

On Sun, Sep 10, 2023 at 09:29:01PM +0200, Thomas Weißschuh wrote:
> Newer versions of glibc annotate the poll() function with
> __attribute__(access) which triggers a compiler warning inside the
> testcase poll_fault.
> Avoid this by using a plain NULL which is enough for the testcase.
> To avoid potential future warnings also adapt the other EFAULT
> testcases, except select_fault as NULL is a valid value for its
> argument.
(...)

Looks good to me. I wouldn't be surprised if we're soon forced to do
the same with select() on some archs where it might be emulated.

Feel free to push it to the shared repo.

Thanks!
Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-20 Thread Willy Tarreau
On Tue, Apr 20, 2021 at 07:56:18AM +0200, Greg Kroah-Hartman wrote:
> I would LOVE it if some "executives" would see the above presentations,
> because then they would maybe actually fund developers to fix bugs and
> maintain the kernel code, instead of only allowing them to add new
> features.
> 
> Seriously, that's the real problem, that Dmitry's work has exposed, the
> lack of people allowed to do this type of bugfixing and maintenance on
> company time, for something that the company relies on, is a huge issue.
> "executives" feel that they are willing to fund the initial work and
> then "throw it over the wall to the community" once it is merged, and
> then they can forget about it as "the community" will maintain it for
> them for free.  And that's a lie, as Dmitry's work shows.

That's sadly the eternal situation, and I'm suspecting that software
development and maintenance is not identified as a requirement for a
large number of hardware vendors, especially on the consumer side where
margins are lower. A contractor is paid to develop a driver, *sometimes*
to try to mainline it (and the later they engage with the community, the
longer it takes in round trips), and once the code finally gets merged,
all the initial budget is depleted and no more software work will be
done.

Worse, we could imagine kicking unmaintained drivers faster off the
tree, but that would actually help these unscrupulous vendors by
forcing their customers to switch to the new model :-/  And most of
them wouldn't care either if their contributions were refused based
on their track record of not maintaining their code, since they often
see this as a convenience to please their customers and not something
they need (after all, relying on a bogus and vulnerable BSP has never
prevented from selling a device, quite the opposite).

In short, there is a parallel universe where running highly bogus and
vulnerable out-of-tree code seems like the norm and where there is no
sort of care for what is mainlined as it's possibly just made to look
"cool".

We also need to recognize that it's expectable that some vendors are
not willing to engage on supporting a driver for a decade if they
expect their device to last 5 years only, and maybe we should make
some rules clear about mainlining drivers and what to expect for
users (in which case the end of support would be clear and nobody
would be surprised if the driver is removed at the end of its
maintenance, barring a switch to a community maintainer).

Just my two cents,
Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-19 Thread Willy Tarreau
Hi Nick,

On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote:
> I don't think the introduction of Rust made Firefox _more_ insecure.
> https://wiki.mozilla.org/Oxidation#Within_Firefox

Browsers are human interfaces and do not fundamentally require low
level access to memory/hardware/whatever. They can be written in
about any language, only the resource usage and performance will
make a difference. As such, some were even written in Java or JS
for example.

Operating systems, and particularly drivers *do* require low-level
accesses, and stuff that can hardly be abstracted or understood by
a compiler. You may have to perform two 16-bit reads/writes on a
32-bit MMIO address to perform an operation and the compiler does
not have to know it, just to obey.

> Really, a key point is that a lot of common mistakes in C are compile
> time errors in Rust. I know no "true" kernel dev would make such
> mistakes in C,

Everyone makes mistakes, the level of attention varies over time and
the focus often changes when dealing with build errors. How many time
some of us facing a bug remembered having changed the code very late
after a build error, and being less careful from this point when the
goal changed from "let's do it right" to "let's get this to build" ?

> but is there nothing we can do to help our peers
> writing drivers?  The point is to transfer cost from runtime to
> compile time to avoid costs at runtime; like all of the memory safety
> bugs which are costing our industry.

And do we have stats on the number of logical bugs, some of which are
caused by developers trying to work around compilers' stubbornness ?
For me, personally speaking, they have *increased* over time, usually
trying to avoid some annoying modern gcc warnings, resulting in integer
casts being placed close to string formats, or returns being placed in
switch/case to avoid the fall-through warning, etc. Thus I'm worried that
a non-negligible part of the 70% of bugs caused by memory safety issues
could be replaced with logic bugs to get to the point where the rust
compiler finally accepts to compile the code. It makes me think about
researchers trying to reduce the causes of certain deaths and claiming
to "save lives" while in the end the people they "save" will simply die
from something else.

And I'm not particularly trying to blindly defend C here. I'm complaining
every single day about some of its shortcomings like the vast amount of
UB, stupid type promotion, counter-intuitive operators precedence when
combining bit-ops with arithmetic, limited size of enums, lack of rotate
operator, strict aliasing, or the recourse to asm() statements every 10
lines to do stuff that can hardly be expressed in a way understandable
by a compiler. I'm just seeing that a lot of the griefs I'm having
against C come from the compiler trying to be too smart or too stubborn,
so giving even more of the handle to a compiler doesn't appeal me at all.

In addition, we all know how painful it is to work around compiler bugs
by writing complex code that carefully avoids certain constructs. I'm
wondering if we'll still have that luxury with a stricter compiler, or
if the only response will have to be between "let's disable this driver
that does not compile" or "please force distros to upgrade their
compilers".

But we'll see :-/

Regards,
Willy


Re: Compiling kernel-3.4.xxx with gcc-9.x. Need some help.

2021-04-18 Thread Willy Tarreau
On Sun, Apr 18, 2021 at 07:25:08PM +0200, Fawad Lateef wrote:
> I tried booting the userspace compiled with gcc-9.1 and kernel
> compiled with gcc-5.5. But seems like the kernel 3.4.111 is not
> compatible with user-space compiled with gcc-9.1.
> During boot getting error: "FATAL: kernel too old." (from init I
> believe) and then kernel Panics. Log (part) below:

That's not a compiler issue, it's a libc issue. When you built your
toolchain you have likely (or accidently) selected a minimum kernel
version that is more recent than this one. The init code in the linker
checks the kernel version and refuses to start in such a case. If your
init depends on this libc, you simply cannot boot.

I don't know how far recent libcs can go on kernel support, but
you may possibly need to rebuild an older one, and sometimes older
libc will not build with modern gcc. So you should use the most
recent libc that still claims to support that kernel, and use the
most recent compiler your libc can be built with (maybe yours is
OK but I don't know).

Hoping this helps,
Willy


Re: PROBLEM: DoS Attack on Fragment Cache

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 10:26:30PM -0400, Matt Corallo wrote:
> Sure, there are better ways to handle the reassembly cache overflowing, but
> that is pretty unrelated to the fact that waiting 30 full seconds for a
> fragment to come in doesn't really make sense in today's networks (the 30
> second delay that is used today appears to even be higher than RFC 791
> suggested in 1981!).

Not exactly actually, because you forget the TTL here. With most hosts
sending an initial TTL around 64, after crossing 10-15 hops it's still
around 50 so that would result in ~50 seconds by default, even according
to the 40 years old RFC791. The 15s there was the absolute minimum. While
I do agree that we shouldn't keep them that long nowadays, we can't go
too low without risking to break some slow transmission stacks (SLIP/PPP
over modems for example). In addition even cutting that in 3 will remain
trivially DoSable.

> You get a lot more bang for your buck if you don't wait
> around so long (or we could restructure things to kick out the oldest
> fragments, but that is a lot more work, and probably extra indexes that just
> aren't worth it).

Kicking out oldest ones is a bad approach in a system made only of
independent elements, because it tends to result in a lot of damage once
all of them behave similarly. I.e. if you need to kick out an old entry
in valid traffic, it's because you do need to wait that long, and if all
datagrams need to wait that long, then new datagrams systematically
prevent the oldest one from being reassembled, and none gest reassembled.
With a random approach at least your success ratio converges towards 1/e
(i.e. 36%) which is better.

Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 04:24:43PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 17, 2021 at 01:46:23PM +0200, Willy Tarreau wrote:
> > For me the old trick of casting one side as long long still works:
> > 
> >   unsigned long long mul3264(unsigned int a, unsigned int b)
> >   {
> > return (unsigned long long)a * b;
> >   }
> > 
> > i386:
> >    :
> >  0: 8b 44 24 08   mov0x8(%esp),%eax
> >  4: f7 64 24 04   mull   0x4(%esp)
> >  8: c3ret
> > 
> > x86_64:
> >    :
> >  0: 89 f8 mov%edi,%eax
> >  2: 89 f7 mov%esi,%edi
> >  4: 48 0f af c7   imul   %rdi,%rax
> >  8: c3retq   
> > 
> > Or maybe you had something else in mind ?
> 
> Last time I tried it, the thing refused :/ which is how we ended up with
> mul_u32_u32() in asm.

Oh I trust you, I do remember having noticed it on one gcc version as
well (maybe 4.5). But I've been successfully using this since 2.95, and
could quickly recheck that 4.7, 4.8, 5.4, 6.5, 7.4, 9.3 and 11-trunk do
produce the code above, which is reassuring, as we all prefer to limit
the amount of asm statements.

Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 02:53:10PM +0100, Wedson Almeida Filho wrote:
> > > Note that this is
> > > another area where Rust offers advantages: read-only guards (in C, if you 
> > > take a
> > > read lock, nothing prevents you from making changes to fields you should 
> > > only be
> > > allowed to read);
> > 
> > But I'm happily doing that when I know what I'm doing. What you call a
> > read lock usually is in fact a shared lock as opposed to an exclusive
> > lock (generally used for writes). For me it's perfectly valid to perform
> > atomic writes under a read lock instead of forcing everyone to wait by
> > taking a write lock. You may for example take a read lock on a structure
> > to make sure that a field you're accessing in it points to stable memory
> > that is only modified under the write lock, but the pointer itself is
> > atomically accessed and swapped under the read lock.
> 
> Yes, this is a great example. Also easily expressible in Rust: they have this
> concept of interior mutability where certain types allow their contents to be
> modified even when shared immutably. Atomics offer such interior mutability, 
> so
> the scenario you describe is fine.
> 
> Rust in fact has an extra enforcement here that C doesn't: it requires 
> interior
> mutability for this scenario to be allowed, so you can't do it with a plain
> naked type (say u64) -- you'd need to use something like an atomic64_t, where
> you're required to specify memory ordering when accessing them.
> 
> In C we of course have atomics but the compiler never alerts us for when we 
> need
> them.

OK thanks for explaining.

> > > In fact, this is also an advantage of Rust. It would *force* developers to
> > > lock/unlock the RCU lock before they can access the protected data.
> > 
> > I'm really afraid by languages which force developers to do this or that.
> 
> When I say that Rust forces developers to do certain things, it's to provide 
> the
> compile-time safety guarantees. Some of these requirements are imposed by our
> own abstractions -- we can always revisit and try to improve them. In cases 
> when
> the abstractions cannot be further refined, developers always have the escape
> hatch of unsafety, where they're allowed to do pretty much everything as in C,
> but then they also give up the compile-time guarantees for those parts.

Well, I can't express how much I hate abstractions because I constantly
need to know what it's doing under the hood, and I spend my time reading
the output asm code because I always want to confirm my assumptions about
the compiler not cheating on me (and not hitting one of its bugs),
especially after C compilers have become so smart that they completely
replace your code with what they think is better for you, (including
nothing), so I guess all of this is really not for someone like me.

However while I'm pretty sure that based on our respective experiences
we'd probably disagree forever on a wide number of approaches when it
comes to deciding whether the developer or the compiler should have the
last say, I sincerely appreciate that you take the time to calmly explain
your differing views and the rationale behind, so many thanks for this!

Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 01:17:21PM +0200, Peter Zijlstra wrote:
> Well, I think the rules actually make sense, at the point in the syntax
> tree where + happens, we have 'unsigned char' and 'int', so at that
> point we promote to 'int'. Subsequently 'int' gets shifted and bad
> things happen.

That's always the problem caused by signedness being applied to the
type while modern machines do not care about that and use it during
(or even after) the operation instead :-/

We'd need to define some macros to zero-extend and sign-extend some
values to avoid such issues. I'm sure this would be more intuitive
than trying to guess how many casts (and in what order) to place to
make sure an operation works as desired.

> The 'unsigned long' doesn't happen until quite a bit later.
> 
> Anyway, the rules are imo fairly clear and logical, but yes they can be
> annoying. The really silly thing here is that << and >> have UB at all,
> and I would love a -fwrapv style flag that simply defines it. Yes it
> will generate worse code in some cases, but having the UB there is just
> stupid.

I'd also love to have a UB-less mode with well defined semantics for
plenty of operations that are known to work well on modern machines,
like integer wrapping, bit shifts ignoring higher bits etc. Lots of
stuff we often have to write useless code for, just to please the
compiler.

> That of course doesn't help your case here, it would simply misbehave
> and not be UB.
> 
> Another thing the C rules cannot really express is a 32x32->64
> multiplication, some (older) versions of GCC can be tricked into it, but
> mostly it just doesn't want to do that sanely and the C rules are
> absolutely no help there.

For me the old trick of casting one side as long long still works:

  unsigned long long mul3264(unsigned int a, unsigned int b)
  {
return (unsigned long long)a * b;
  }

i386:
   :
 0: 8b 44 24 08   mov0x8(%esp),%eax
 4: f7 64 24 04   mull   0x4(%esp)
 8: c3ret

x86_64:
   :
 0: 89 f8 mov%edi,%eax
 2: 89 f7 mov%esi,%edi
 4: 48 0f af c7   imul   %rdi,%rax
 8: c3retq   

Or maybe you had something else in mind ?

Willy


Re: PROBLEM: DoS Attack on Fragment Cache

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 12:42:39AM -0700, Keyu Man wrote:
> How about at least allow the existing queue to finish? Currently a tiny new
> fragment would potentially invalid all previous fragments by letting them
> timeout without allowing the fragments to come in to finish the assembly.

Because this is exactly the principle of how attacks are built: reserve
resources claiming that you'll send everything so that others can't make
use of the resources that are reserved to you. The best solution precisely
is *not* to wait for anyone to finish, hence *not* to reserve valuable
resources that are unusuable by others.

Willy


Re: PROBLEM: DoS Attack on Fragment Cache

2021-04-17 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 06:44:40AM +0200, Eric Dumazet wrote:
> On Sat, Apr 17, 2021 at 2:31 AM David Ahern  wrote:
> >
> > [ cc author of 648700f76b03b7e8149d13cc2bdb3355035258a9 ]
> 
> I think this has been discussed already. There is no strategy that
> makes IP reassembly units immune to DDOS attacks.

For having tried to deal with this in the past as well, I agree with
this conclusion, which is also another good example of why fragments
should really be avoided as much as possible over hostile networks.

However I also found that random drops of previous entries is the
approach which seems to offer the most statistical opportunities to
legitimate traffic to still work under attack (albeit really poorly
considering that any lost fragment requires retransmission of the
whole series). In this case the chance for a packet to be successfully
reassembled would vary proportionally to the inverse of its number of
fragments, which reasonably limits the impact of attacks (without being
an ultimate solution of course).

> We added rb-tree and sysctls to let admins choose to use GB of RAM if
> they really care.

I agree that for those who care, the real solution is to make sure they
can store all the traffic they receive during a reassembly period.
Legitimate traffic mostly reassembles quickly so keeping 1 second of
traffic at 10 Gbps is only 1.25 GB of RAM after all...

Willy


Re: [PATCH 04/13] Kbuild: Rust support

2021-04-16 Thread Willy Tarreau
On Sat, Apr 17, 2021 at 01:46:35AM +0200, Miguel Ojeda wrote:
> On Sat, Apr 17, 2021 at 12:04 AM Willy Tarreau  wrote:
> >
> > But my point remains that the point of extreme care is at the interface
> > with the rest of the kernel because there is a change of semantics
> > there.
> >
> > Sure but as I said most often (due to API or ABI inheritance), both
> > are already exclusive and stored as ranges. Returning 1..4095 for
> > errno or a pointer including NULL for a success doesn't shock me at
> > all.
> 
> At the point of the interface we definitely need to take care of
> converting properly, but for Rust-to-Rust code (i.e. the ones using
> `Result` etc.), that would not be a concern.

Sure.

> Just to ensure I understood your concern, for instance, in this case
> you mentioned:
> 
>result.status = foo_alloc();
>if (!result.status) {
>result.error = -ENOMEM;
>return result;
>}

Yes I mentioned this when it was my understanding that the composite
result returned was made both of a pointer and an error code, but Connor
explained that it was in fact more of a selector and a union.

> Is your concern is that the caller would mix up the `status` with the
> `error`, basically bubbling up the `status` as an `int` and forgetting
> about the `error`, and then someone else later understanding that
> `int` as a non-error because it is non-negative?

My concern was to know what field to look at to reliably detect an error
from the C side after a sequence doing C -> Rust -> C when the inner C
code uses NULL to mark an error and the upper C code uses NULL as a valid
value and needs to look at an error code instead to rebuild a result. But
if it's more:
 
 if (result.ok)
return result.pointer;
 else
return (void *)-result.error;

then it shouldn't be an issue.

Willy


Re: [PATCH 04/13] Kbuild: Rust support

2021-04-16 Thread Willy Tarreau
On Fri, Apr 16, 2021 at 11:39:00PM +0200, Miguel Ojeda wrote:
> On Fri, Apr 16, 2021 at 10:58 PM Willy Tarreau  wrote:
> >
> > No, two:
> >   - ok in %rax (seems like it's "!ok" technically speaking since it
> > returns 1 on !ok and 0 on ok)
> >   - foo_or_err in %rdx
> 
> Yes, but that is the implementation -- conceptually you only have one
> or the other, and Rust won't allow you to use the wrong one.

OK so for unions you always pass two values along the whole chain, a
selector and the value itself.

But my point remains that the point of extreme care is at the interface
with the rest of the kernel because there is a change of semantics
there.

> > However then I'm bothered because Miguel's example showed that regardless
> > of OK, EINVAL was always returned in foo_or_err, so maybe it's just
> > because his example was not well chosen but it wasn't very visible from
> > the source:
> 
> That is the optimizer being fancy since the error can be put
> unconditionally in `rdx`.

Yes that's what I understood as well. I just didn't know that it had
to be seen as a union.

On Fri, Apr 16, 2021 at 11:19:18PM +0200, Miguel Ojeda wrote:
> On Fri, Apr 16, 2021 at 10:22 PM Willy Tarreau  wrote:
> >
> > So it simply does the equivalent of:
> >
> >   struct result {
> >  int status;
> >  int error;
> >   };
> 
> Not exactly, it is more like a tagged union, as Connor mentioned.
> 
> However, and this is the critical bit: it is a compile-time error to
> access the inactive variants (in safe code). In C, it is on you to
> keep track which one is the current one.

Sure but as I said most often (due to API or ABI inheritance), both
are already exclusive and stored as ranges. Returning 1..4095 for
errno or a pointer including NULL for a success doesn't shock me at
all.

Along thes lines I hardly see how you'd tag pointers by manipulating
their lower unused bits. That's something important both for memory
usage and performance (supports atomic opts).

> >  kill_foo();   // only for rust, C doesn't need it
> 
> Please note that `kill_foo()` is not needed in Rust -- it was an
> example of possible cleanup (since Al mentioned resources/cleanup)
> using RAII.

Yep but I kept it just to have comparable output code since in C
you'd simply use "goto leave" and not have this function call to
do the cleanup.

Willy


Re: [PATCH 04/13] Kbuild: Rust support

2021-04-16 Thread Willy Tarreau
On Fri, Apr 16, 2021 at 03:34:50PM -0500, Connor Kuehl wrote:
> On 4/16/21 3:22 PM, Willy Tarreau wrote:
> > So it simply does the equivalent of:
> > 
> >   #define EINVAL -1234
> > 
> >   struct result {
> >  int status;
> >  int error;
> >   };
> 
> Result and Option types are more like a union with a tag that
> describes which variant it is.
> 
> struct foo_result {
> /* if ok, then access foo_or_err.successful_foo
>  *else, access foo_or_err.error
>  */
> bool ok;
> union {
> struct foo successful_foo;
> int error;
> } foo_or_err;
> };

OK.

> > [..]
> > 
> > So it simply returns a pair of values instead of a single one, which
> 
> It will only return 1 value.

No, two:
  - ok in %rax (seems like it's "!ok" technically speaking since it
returns 1 on !ok and 0 on ok)
  - foo_or_err in %rdx

However then I'm bothered because Miguel's example showed that regardless
of OK, EINVAL was always returned in foo_or_err, so maybe it's just
because his example was not well chosen but it wasn't very visible from
the source:

 bar:
 pushrbx
 mov ebx, 1
 callqword ptr [rip + black_box@GOTPCREL]
 testal, al
 jne .LBB2_2
 callqword ptr [rip + kill_foo@GOTPCREL]
 xor ebx, ebx
 .LBB2_2:
 mov eax, ebx
 mov edx, -1234
 pop rbx
 ret

Willy


Re: [PATCH 04/13] Kbuild: Rust support

2021-04-16 Thread Willy Tarreau
On Fri, Apr 16, 2021 at 08:57:07PM +0200, Miguel Ojeda wrote:
> On Fri, Apr 16, 2021 at 8:10 PM Al Viro  wrote:
> >
> > How well would ? operator fit that pattern?  _If_ it's just a syntax sugar
> > along the lines of "if argument matches Err(_), return Err(_)", the types
> > shouldn't be an issue, but that might need some fun with releasing 
> > resources,
> > etc.  If it's something more elaborate... details, please.
> 
> Yes, it is just syntax sugar -- it doesn't introduce any power to the 
> language.
> 
> It was introduced because it is a very common pattern when using the
> `Result` and `Option` enums. In fact, before it existed, it was just a
> simple macro that you could also implement yourself.
> 
> For instance, given `Foo` and `Bar` types that need RAII cleanup of
> some kind (let's say `kill_foo()` and `kill_bar()`):
> 
> fn foo() -> KernelResult {
> if black_box() {
> return Err(EINVAL);
> }
> 
> // something that gets you a `Foo`
> let foo = ...;
> 
> Ok(foo)
> }
> 
> fn bar() -> KernelResult {
> let p = foo()?;
> 
> // something that gets you a `Bar`, possibly using the `p`
> let bar = ...;
> 
> Ok(bar)
> }
> 
> This reduces to (full example at https://godbolt.org/z/hjTxd3oP1):
> 
> bar:
> pushrbx
> mov ebx, 1
> callqword ptr [rip + black_box@GOTPCREL]
> testal, al
> jne .LBB2_2
> callqword ptr [rip + kill_foo@GOTPCREL]
> xor ebx, ebx
> .LBB2_2:
> mov eax, ebx
> mov edx, -1234
> pop rbx
> ret
> 
> You can see `bar()` calls `black_box()`. If it failed, it returns the
> EINVAL. Otherwise, it cleans up the `foo` automatically and returns
> the successful `bar`.

So it simply does the equivalent of:

  #define EINVAL -1234

  struct result {
 int status;
 int error;
  };

  extern bool black_box();
  extern void kill_foo();

  struct result bar()
  {
 return (struct error){ !!black_box(), EINVAL };
  }

  struct result foo()
  {
 struct result res = bar();

 if (res.status)
goto leave;
 /* ... */
 kill_foo();   // only for rust, C doesn't need it
  leave:
 return res;
  }

So it simply returns a pair of values instead of a single one, which
is nothing special but not much conventional in the kernel either given
that ultimately syscalls will usually return a single value anyway. At
some point some code will have to remerge them to provide a composite
result and even take care of ambigous special cases like { true, 0 }
which may or may not indicate an error, and avoiding to consider the
unused error code on success.

For example some code called from mmap() might have to deal with this
this way:

   if (result.status == (void *)-1)
   return -result.error;
   else
   return result.status;

But possibly a lot of code feeding this result struct would be tempted
to do something like this to deal with NULL returns:

   result.status = foo_alloc();
   if (!result.status) {
   result.error = -ENOMEM;
   return result;
   }

And suddenly the error is lost, as a NULL is returned to the upper layers
which does not correspond to an failure status. Conversely, with a unique
result you'd do something like this:

   result = foo_alloc();
   if (!result)
   return -ENOMEM;

So it might possibly be safer to stick to the usually expected return
values instead of introducing composite ones.

I tend to agree that composite results can be better from new projects
started from scratch when all the API follows this principle, but here
there's quite a massive code base that was not designed along these
lines and I easily see how this can become a source of trouble over
time.

Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-16 Thread Willy Tarreau
Hi Miguel,

On Fri, Apr 16, 2021 at 07:10:17PM +0200, Miguel Ojeda wrote:
> And by having the compiler enforce this safe-unsafe split, you can
> review safe code without having to constantly worry about UB; and be
> extra alert when dealing with `unsafe` blocks.

I do appreciate this safe/unsafe split and a few other things I've seen
in the language. The equivalent I'm using in C is stronger typing and
"const" modifiers wherever possible. Of course it's much more limited,
it's just to explain that I do value this. I just feel like "unsafe"
is the universal response to any question "how would I do this" while
at the same time "safe" is the best selling argument for the language.
As such, I strongly doubt about the real benefits once facing reality
with everything marked unsafe. Except that it will be easier to blame
the person having written the unsafe one-liner instead of writing 60
cryptic lines doing the functional equivalent using some lesser known
extensions :-/

> Of course, UB is only a subset of errors, but it is a major one, and
> particularly critical for privileged code.

Not in my experience. I do create bugs that very seldomly stem from UB,
like any of us probably. But the vast majority of my bugs are caused by
stupid logic errors. When you invert an error check somewhere because
the function name looks like a boolean but its result works the other
way around, you can pass 10 times over it without noticing, and the
compiler will not help. And these ones are due to the human brain not
being that powerful in front of a computer, and whatever language will
not change this. Or worse, if it's harder to express what I want, I
will write more bugs. It happened to me quite a few times already
trying to work around absurd gcc warnings.

Based on the comments in this thread and the responses often being
around "we'll try to get this done" or "we'll bring the issue to the
compiler team", combined with the difficulty to keep control over
resources usage, I'm really not convinced at all it's suited for
low-level development. I understand the interest of the experiment
to help the language evolve into that direction, but I fear that
the kernel will soon be as bloated and insecure as a browser, and
that's really not to please me.

Cheers,
Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-16 Thread Willy Tarreau
On Fri, Apr 16, 2021 at 04:33:51PM +0100, Wedson Almeida Filho wrote:
> On Fri, Apr 16, 2021 at 04:19:07PM +0200, Peter Zijlstra wrote:
> > Does this also not prohibit constructs where modification must be done
> > while holding two locks, but reading can be done while holding either
> > lock?
> 
> I don't believe it does. Remember that we have full control of the 
> abstractions,
> so we can (and will when the need arises) build an abstraction that provides 
> the
> functionality you describe. For the read path, we can have functions that 
> return
> a read-only guard (which is the gateway to the data in Rust) when locking 
> either
> of the locks, or when showing evidence that either lock is already locked 
> (i.e.,
> by temporarily transferring ownership of another guard).

But will this remain syntactically readable/writable by mere humans ?
I mean, I keep extremely bad memories of having tried to write a loop
oconcatenating at most N times a string to another one, where N was a
number provided on the command line, with the compiler shouting at me
all the time until I blindly copy-pasted random pieces of unreadable
code from the net with a horribly complicated syntax that still
resulted in the impossibility for me to check for memory allocation
before failing. So I'm wondering how complicated that can become after
adding all sort of artificial protections on top of this :-/

> Note that this is
> another area where Rust offers advantages: read-only guards (in C, if you 
> take a
> read lock, nothing prevents you from making changes to fields you should only 
> be
> allowed to read);

But I'm happily doing that when I know what I'm doing. What you call a
read lock usually is in fact a shared lock as opposed to an exclusive
lock (generally used for writes). For me it's perfectly valid to perform
atomic writes under a read lock instead of forcing everyone to wait by
taking a write lock. You may for example take a read lock on a structure
to make sure that a field you're accessing in it points to stable memory
that is only modified under the write lock, but the pointer itself is
atomically accessed and swapped under the read lock.

> In fact, this is also an advantage of Rust. It would *force* developers to
> lock/unlock the RCU lock before they can access the protected data.

I'm really afraid by languages which force developers to do this or that.
Many bugs in C come from casts because developers know their use case
better than the compiler's developers, and result in lack of warnings
when the code evolves, leaving pending bugs behind. What is important
in my opinion is to let developers express what they want and report
suspicious constructs, not to force them to dirtily work around rules
that conflict with their use case :-/

Willy


Re: [PATCH 00/13] [RFC] Rust support

2021-04-16 Thread Willy Tarreau
On Fri, Apr 16, 2021 at 10:16:05AM +0200, Michal Kubecek wrote:
> And I don't see how the two languages might coexist peacefully without
> rust toolchain being necessary for building any kernel useful in
> practice and anyone seriously involved in kernel development having to
> be proficient in both languages.

Two languages ? No, one is specified and multiple-implemented, the other
one is the defined as what its only compiler understands at the moment,
so it's not a language, it's a compiler's reference manual at best. I'm
waiting for the day you're force to write things which look wrong with a
big comment around saying "it's not a bug it's a workaround for a bug in
the unique compiler, waiting to be retrofitted into the spec to solve the
problem for every user". Already seen for at least another "language"
implemented by a single vendor 22 years ago.

> Neither of these looks appealing to me.
> 
> The dependency on rust toolchain was exactly what made me give up on
> building Firefox from mercurial snapshots few years ago. To be able to
> build them, one needed bleeding edge snapshots of rust toolchain which
> my distribution couldn't possibly provide and building them myself
> required way too much effort. This very discussion already revealed that
> rust kernel code would provide similar experience. I also have my doubts
> about the "optional" part; once there are some interesting drivers
> written in rust, even if only in the form of out of tree modules, there
> will be an enormous pressure on distributions, both community and
> enterprise, to enable rust support.

Yes this scarily looks like the usual "embrace and extend... and abandon
the corpse once it doesn't move anymore".

I've already faced situations where I couldn't compile a recent 5.x kernel
using my previous gcc-4.7 compiler and this really really really pissed me
off because I'd had it in a build farm for many architectures and I had to
give up. But I also know that updating to a newer version will take time,
will be durable and will be worth it for the long term (except for the fact
that gcc doubles the build time every two versions). But here having to use
*the* compiler of the day and being prepared to keep a collection of them
to work with different stable kernels, no!

Also, I'm a bit worried about long-term survival of the new
language-of-the-day-that-makes-you-look-cool-at-beer-events. I was once
told perl would replace C everywhere. Does someone use it outside of
checkpatch.pl anymore ? Then I was told that C was dead because PHP was
appearing everywhere. I've even seen (slow) log processors written with
it. Now PHP seems to only be a WAF-selling argument. Then Ruby was "safe"
and would rule them all. Safe as its tab[-1] which crashed the interpreter.
Anyone heard of it recently ? Then Python, whose 2.7 is still present on
a lot of systems because the forced transition to 3 broke tons of code.
Will there ever be a 4 after this sore experience ? Then JS, Rust, Go,
Zig and I don't know what. What I'm noting is that such languages appear,
serve a purpose well, have their moment of fame, last a decade and
disappear except at a few enthousiasts. C has been there for 50 years
and served as the basis of many newer languages so it's still well
understood. I'm sure about one thing, the C bugs we have today will be
fixable in 20 years. I'm not even sure the Rust code we'll merge today
will still be compilable in 10 years nor will support the relevant
architectures available by then, and probably this code will have to
be rewritten in C to become maintained again.

> The other problem is even worse. Once we have non-trivial amount of rust
> code around the tree, even if it's "just some drivers", you cannot
> completely ignore it. One example would be internal API changes. Today,
> if I want to touch e.g. ethtool_ops, I need to adjust all in tree NIC
> drivers providing the affected callback and adjust them. Usually most of
> the patch is generated by spatch but manual tweaks are often needed here
> and there. In the world of bilingual kernel with nontrivial number of
> NIC drivers written in rust, I don't see how I could do that without
> also being proficient in rust.

You'll simply change the code you're able to change and those in charge
of their driver will use your commit message as instruction to fix the
build on theirs. How do you want it to be otherwise ?

> Rust enthusiasts tell us they want to
> open kernel development to more people but the result could as well be
> exactly the opposite: it could restrict kernel development to people
> proficient in _both_ languages.

This has been my understanding from the very beginning. Language prophets
always want to conquier very visible targets as a gauge of their baby's
popularity.

> As Peter said, it's not an imminent problem but as it's obvious this is
> just the first step, we should have a clear idea what the plan is and
> what we can and should expect.

I think the experience could 

Re: [PATCH] floppy: remove redundant assignment to variable st

2021-04-15 Thread Willy Tarreau
On Thu, Apr 15, 2021 at 02:00:20PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The variable st is being assigned a value that is never read and
> it is being updated later with a new value. The initialization is
> redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 

Indeed you're right.

Acked-by: Willy Tarreau 

Willy


Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-14 Thread Willy Tarreau
On Thu, Apr 15, 2021 at 06:43:43AM +0200, Borislav Petkov wrote:
> On Wed, Apr 14, 2021 at 05:57:22PM -0400, Len Brown wrote:
> > I'm pretty sure that the "it isn't my use case of interest, so it
> > doesn't matter" line of reasoning has long been established as -EINVAL
> > ;-)
> 
> I have only a very faint idea what you're trying to say here. Please
> explain properly and more verbosely what exactly has been established
> where?

What Len is saying is that not being interested in a feature is not an
argument for rejecting its adoption, which I'm perfectly fine with. But
conversely not being interested in a feature is also an argument for
insisting that its adoption doesn't harm other use cases (generally
speaking, not this specific case here).

Willy


Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-14 Thread Willy Tarreau
On Wed, Apr 14, 2021 at 11:58:04AM +0200, Borislav Petkov wrote:
> On Tue, Apr 13, 2021 at 03:51:50PM -0400, Len Brown wrote:
> > AMX does the type of matrix multiplication that AI algorithms use. In
> > the unlikely event that you or one of the libraries you call are doing
> > the same, then you will be very happy with AMX. Otherwise, you'll
> > probably not use it.
> 
> Which sounds to me like AMX is something which should not be enabled
> automatically but explicitly requested. I don't see the majority of the
> processes on the majority of the Linux machines out there doing AI with
> AMX - at least not anytime soon. If it becomes ubiquitous later, we can
> make it automatic then.

And change jobs :-)

Willy


Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-12 Thread Willy Tarreau
On Mon, Apr 12, 2021 at 07:46:06PM -0400, Len Brown wrote:
> On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski  wrote:
> 
> > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > microbenchmark.  Do it once and you permanently allocate 8kB (is that
> > even a constant?  can it grow in newer parts?), potentially hurts all
> > future context switches, and does who-knows-what to Turbo licenses and
> > such.
> 
> Intel expects that AMX will be extremely valuable to key workloads.
> It is true that you may never run that kind of workload on the machine
> in front of you,
> and so you have every right to be doubtful about the value of AMX.
> 
> The AMX architectural state size is not expected to change.
> Rather, if a "new AMX" has a different state size, it is expected to
> use a new feature bit, different from AMX.
> 
> The AMX context switch buffer is allocated only if and when a task
> touches AMX registers.
> 
> Yes, there will be data transfer to and from that buffer when three
> things all happen.
> 1. the data is valid
> 2. hardware interrupts the application
> 3. the kernel decides to context switch.

As a userspace developer of a proxy, my code is extremely sensitive to
syscall cost and works in environments where 1 million interrupts/s is
not uncommon. Additionally the data I process are small HTTP headers
and I already had to reimplement my own byte-level memcmp because the
overhead of some libc to decide what variant to use to compare 5 bytes
was higher than the time to iterate over them.

So I'm among those userspace developers who grumble each time new
technology is automatically adopted by the compiler and libs, because
that tends to make me figure what the impact is and how to work around
it. I have no idea what AMX could bring me but reading this above makes
me think that it has a great potential of significantly hurting the
performance if one lib decides to occasionally make use of it. It would
possibly be similar if a lib decided to use AVX-512 to copy data and if
it resulted in the CPU quickly reaching its TDP and starting to throttle
like crazy :-/

Thus I think that the first thing to think about before introducing
possibly cost-sensitive optimizations is : how do I allow easily
user-space to easily disable them for a task, and how do I allow an
admin to easily disable them system-wide. "echo !foobar > cpuinfo"
could be a nice way to mask a flag system-wide for example. prctl()
would be nice for a task (as long as it's not too late already).

Maybe the API should be surrounded by __amx_begin() / __amx_end() and
the calls having undefined behavior outside of these. These flags would
put a flag somewhere asking to extend the stacks, or __amx_begin() could
even point itself to the specific stack to be used. This way it could
possibly allow some userspace libraries to use it for small stuff
without definitely impacting the rest of the process.

> At the risk of stating the obvious...
> Intel's view is that libraries that deliver the most value from the
> hardware are a "good thing",
> and that anything preventing libraries from getting the most value
> from the hardware is a "bad thing":-)

As a developer I have a different view. Anything that requires to build
using different libraries depending on the systems is a real hassle,
and I want to focus on the same code to run everywhere. I'm fine with
some #ifdef in the code if I know that a specific part must run as
fast as possible, and even some runtime detection at various points
but do not want to have to deal with extra dependencies that further
increase the test matrix and combinations in bug reports.

Just my two cents,
Willy


Re: [PATCH v8 5/5] FIX driver

2021-03-29 Thread Willy Tarreau
On Mon, Mar 29, 2021 at 11:51:38AM +0200, Gustavo Pimentel wrote:
> Signed-off-by: Gustavo Pimentel 

Please make an effort, this is in no way an acceptable commit description
for a patch. The subject is already extremely vague "FIX driver" with no
context at all, and there is no description of the intent here. What is
someone supposed to think about the risk of keeping or reverting it if a
bisect section would end on this one ?

Thanks,
Willy


Re: Testers wanted: Atom netbooks with x86_64 disabled by BIOS

2021-03-28 Thread Willy Tarreau
On Sun, Mar 28, 2021 at 11:14:05PM +0300, Andy Shevchenko wrote:
> On Sunday, March 28, 2021, Willy Tarreau  wrote:
> 
> > Hi Thomas,
> >
> > On Sun, Mar 28, 2021 at 03:07:24AM +0200, Thomas Gleixner wrote:
> > > On Sun, Mar 28 2021 at 00:25, Willy Tarreau wrote:
> > > > On Sat, Mar 27, 2021 at 10:13:22PM +0100, Mateusz Jonczyk wrote:
> > > > FWIW I tested on my ASUS 1025C which runs on an Atom N2600 forced to
> > > > 32-bit. I had already tried in the past but wanted to give it a try
> > > > again in case I'd have missed anything. Sadly it didn't work, I'm
> > > > still getting the "requires an x86-64 CPU" message.
> > > >
> > > > Given these machines were really cheap, I've always suspected that they
> > > > employ cheaper, low-grade CPUs, possibly having been subject to reduced
> > > > tests where x86_64-specific parts were not even verified and might be
> > > > defective. This may explain why they forcefully disable long mode
> > there,
> > > > but that's just speculation.
> > >
> > > There are some of these '32bit only' CPUs out there in the wild which
> > > actually support long mode. Some of them even do not have the long mode
> > > CPUID bit fused out.
> >
> > Yes, I'm aware of this as well. We might even have talked to the same
> > "victim" :-)
> >
> > > But whether it works is a different story:
> > >
> > >   - If the CPUID bit is on, then the chance is high, but it runs out of
> > > spec (guarantee wise)
> > >
> > >   - If it's off is still might work by some definition of work as they
> > > might have fused off more or there are actual defects in some 64bit
> > > only area which are irrelevant when in 32bit mode.
> > >
> > > Even if it could work perfectly fine, the BIOS/SMM/ucode can prevent
> > > switching to long mode.
> > >
> > > It's a lost cause.
> >
> > I agree. While I bought this netbook to have a 64-bit CPU and was extremely
> > disappointed,
> 
> 
> Where did you get an idea that it had 64-bit SoC from?

It's an N2600, and I bought this laptop because N2600 supports 64-bit
(and do have another mini-itx motherboard at work with the same CPU
on it working pretty well in 64-bit):

   
https://ark.intel.com/content/www/us/en/ark/products/58916/intel-atom-processor-n2600-1m-cache-1-6-ghz.html

> Atom Based 64-bit ones are Bay Trail, Cherry Trail, Tangier (Merrifield),
> Anniedale (Moorefield) and all based on Skylake family (Apollo Lake,
> Broxton, Gemini Lake, ...).

Well, to be honest, I've never been able to remind (nor sort) all these
totally crazy names. The day someone gives me a mnemotechnic hint to
remind them and their ordering, that will make me reconsider them. For
now they're all "something lake", and I find it particularly difficult
to map them to SKUs.

Cheers,
Willy


Re: Testers wanted: Atom netbooks with x86_64 disabled by BIOS

2021-03-28 Thread Willy Tarreau
On Sun, Mar 28, 2021 at 03:30:29PM +0200, Willy Tarreau wrote:
> On Sun, Mar 28, 2021 at 02:37:55PM +0200, Mateusz Jonczyk wrote:
> > W dniu 28.03.2021 o 00:25, Willy Tarreau pisze:
> > > FWIW I tested on my ASUS 1025C which runs on an Atom N2600 forced to
> > > 32-bit. I had already tried in the past but wanted to give it a try
> > > again in case I'd have missed anything. Sadly it didn't work, I'm
> > > still getting the "requires an x86-64 CPU" message.
> > 
> > Thank you. It looks like your bootloader uses the 16-bit kernel boot
> > protocol. The 16-bit kernel boot code checks for x86_64 presence with a
> > similar message ( inside arch/x86/boot/cpu.c ), which I did not patch out.
> > If you would like to test again, use the same patched kernel, but change in
> > GRUB: "linux16" to "linux" and "initrd16" to "initrd" to use the 32-bit boot
> > protocol. Which distribution and bootloader do you use?
> 
> I'm using Lilo on an old Slackware. I can patch the 16-bit code myself,
> it's no big deal.
> 
> > Of course, force enabling x86_64 would require passing a kernel command line
> > parameter with a prominent warning in documentation, just like with
> > "forcepae".
> 
> Sure, but I mean, I suspect that the risk could be higher with very low
> priced laptops were crappy chips are to be expected by definition based
> on contracts neither you nor me have seen. 
> 
> I'll try again after patching the 16-bit code, thanks for the suggestion.

So I added this at the end of get_cpuflags():

 set_bit(X86_FEATURE_LM, cpu.flags);

But now it goes further, the screen turns black and after 2 seconds or so
it reboots, it looks like a triple fault late in the init process. No need
to go further on this machine!

Willy


Re: Testers wanted: Atom netbooks with x86_64 disabled by BIOS

2021-03-28 Thread Willy Tarreau
On Sun, Mar 28, 2021 at 02:37:55PM +0200, Mateusz Jonczyk wrote:
> W dniu 28.03.2021 o 00:25, Willy Tarreau pisze:
> > FWIW I tested on my ASUS 1025C which runs on an Atom N2600 forced to
> > 32-bit. I had already tried in the past but wanted to give it a try
> > again in case I'd have missed anything. Sadly it didn't work, I'm
> > still getting the "requires an x86-64 CPU" message.
> 
> Thank you. It looks like your bootloader uses the 16-bit kernel boot
> protocol. The 16-bit kernel boot code checks for x86_64 presence with a
> similar message ( inside arch/x86/boot/cpu.c ), which I did not patch out.
> If you would like to test again, use the same patched kernel, but change in
> GRUB: "linux16" to "linux" and "initrd16" to "initrd" to use the 32-bit boot
> protocol. Which distribution and bootloader do you use?

I'm using Lilo on an old Slackware. I can patch the 16-bit code myself,
it's no big deal.

> Of course, force enabling x86_64 would require passing a kernel command line
> parameter with a prominent warning in documentation, just like with
> "forcepae".

Sure, but I mean, I suspect that the risk could be higher with very low
priced laptops were crappy chips are to be expected by definition based
on contracts neither you nor me have seen. 

I'll try again after patching the 16-bit code, thanks for the suggestion.
Willy


Re: Testers wanted: Atom netbooks with x86_64 disabled by BIOS

2021-03-28 Thread Willy Tarreau
Hi Thomas,

On Sun, Mar 28, 2021 at 03:07:24AM +0200, Thomas Gleixner wrote:
> On Sun, Mar 28 2021 at 00:25, Willy Tarreau wrote:
> > On Sat, Mar 27, 2021 at 10:13:22PM +0100, Mateusz Jonczyk wrote:
> > FWIW I tested on my ASUS 1025C which runs on an Atom N2600 forced to
> > 32-bit. I had already tried in the past but wanted to give it a try
> > again in case I'd have missed anything. Sadly it didn't work, I'm
> > still getting the "requires an x86-64 CPU" message.
> >
> > Given these machines were really cheap, I've always suspected that they
> > employ cheaper, low-grade CPUs, possibly having been subject to reduced
> > tests where x86_64-specific parts were not even verified and might be
> > defective. This may explain why they forcefully disable long mode there,
> > but that's just speculation.
> 
> There are some of these '32bit only' CPUs out there in the wild which
> actually support long mode. Some of them even do not have the long mode
> CPUID bit fused out.

Yes, I'm aware of this as well. We might even have talked to the same
"victim" :-)

> But whether it works is a different story:
> 
>   - If the CPUID bit is on, then the chance is high, but it runs out of
> spec (guarantee wise)
> 
>   - If it's off is still might work by some definition of work as they
> might have fused off more or there are actual defects in some 64bit
> only area which are irrelevant when in 32bit mode.
> 
> Even if it could work perfectly fine, the BIOS/SMM/ucode can prevent
> switching to long mode.
> 
> It's a lost cause.

I agree. While I bought this netbook to have a 64-bit CPU and was extremely
disappointed, after seeing that it was not just a matter of "oops we forgot
to enable LM", I concluded that it was pointless to try to go further, as I
would never trust it anyway.

Cheers,
Willy


Re: Testers wanted: Atom netbooks with x86_64 disabled by BIOS

2021-03-27 Thread Willy Tarreau
Hi,

On Sat, Mar 27, 2021 at 10:13:22PM +0100, Mateusz Jonczyk wrote:
> W dniu 27.03.2021 o 21:32, Mateusz Jonczyk pisze:
> > Hello,
> >
> > There are some netbooks with Intel Atom processors that have 64-bit
> > support disabled by BIOS. Theoretically, the processor supports 64-bit
> > operation, but BIOS allows only 32-bit code to run.
> >
> > I wonder whether the 64-bit mode is really disabled in the CPU or only
> > hidden in the CPUID flags. If the latter, the computer could be made to
> > run a 64-bit kernel.
> >
> > Similarly, there are some Pentium M processors that support PAE
> > (Physical Address Extensions), but do not show this in CPUID. They could
> > be made to run distributions that require PAE with the "forcepae" kernel
> > command line parameter.
> >
> > I would like to ask people with such netbooks to try to run a 64-bit kernel
> > with this patch applied.
> >
> > When a patched 64-bit kernel is run in `qemu-system-i386`, the virtual
> > machine restarts instantly. Without this patch in such a case a 64-bit
> > kernel hangs indefinitely (inside .Lno_longmode in head_64.S).
> 
> I have made two mistakes:
> - I left commented out code,
> - I have commented out lines with '#'. The code compiled though.
> 
> Attaching corrected patch, please excuse me.

FWIW I tested on my ASUS 1025C which runs on an Atom N2600 forced to
32-bit. I had already tried in the past but wanted to give it a try
again in case I'd have missed anything. Sadly it didn't work, I'm
still getting the "requires an x86-64 CPU" message.

Given these machines were really cheap, I've always suspected that they
employ cheaper, low-grade CPUs, possibly having been subject to reduced
tests where x86_64-specific parts were not even verified and might be
defective. This may explain why they forcefully disable long mode there,
but that's just speculation.

Cheers,
Willy


Re: [PATCH] target: pscsi: avoid Wempty-body warning

2021-03-22 Thread Willy Tarreau
On Mon, Mar 22, 2021 at 05:34:48PM +0100, Arnd Bergmann wrote:
> On Mon, Mar 22, 2021 at 5:26 PM Willy Tarreau  wrote:
> > On Mon, Mar 22, 2021 at 05:18:23PM +0100, Arnd Bergmann wrote:
> > > and a lot mostly harmless code like
> > >
> > > #ifdef DEBUG_THIS_DRIVER /* always disabled */
> > > #define dprintk(args...) printk(args)
> > > #else
> > > #define dprintk(args...)
> > > #endif
> > > /* note the mismatched format string */
> > > dprintk(KERN_WARNING "error %d\n", );
> > >
> > > Turning the empty dprintk() into no_printk() means we can catch
> > > the wrong format string during compile testing.
> >
> > Hmmm OK for this one. With this said, given how plenty of warnings seem
> > to consider indent and whatever, I wouldn't be surprised if a difference
> > is made between a totally empty body and one that results from an empty
> > macro. But indeed this one can represent a real bug.
> 
> The -Wempty-body warning is actually really old and predates the compiler's
> understanding of indentation, we just always disabled it by default so far.
> 
> As a lot of subsystems are W=1 clean these days, I just went for the
> final 26 patches to shut up all empty-body warnings in randconfig builds.
> Most of these were in the dprink() category, though none of this last set
> actually had incorrect format strings.

I agree that if it's only 26 patches on the whole kernel to re-enable one
warning it can be worth it for newcomers.

Willy


Re: [PATCH] target: pscsi: avoid Wempty-body warning

2021-03-22 Thread Willy Tarreau
On Mon, Mar 22, 2021 at 05:18:23PM +0100, Arnd Bergmann wrote:
> I agree that this instance of the warning is particularly stupid, but the
> I'd like to leave the warning option there and eventually enable it by
> default because it tends to find other more interesting cases, and this
> one is trivial to work around.
> 
> I remember previously fixing a few drivers that did obviously
> incorrect things like
> 
> if (error); /* note the extra ';' */
>  return error;

I totally agree with this one but usually it's already reported by
another one (probably the one complaining about misindenting). The
case I've seen quite a few times was:

 while (condition);

At least I want the ';' on its own line to avoid it being
confused with one that ends a do {} while() block.

> and a lot mostly harmless code like
> 
> #ifdef DEBUG_THIS_DRIVER /* always disabled */
> #define dprintk(args...) printk(args)
> #else
> #define dprintk(args...)
> #endif
> /* note the mismatched format string */
> dprintk(KERN_WARNING "error %d\n", );
> 
> Turning the empty dprintk() into no_printk() means we can catch
> the wrong format string during compile testing.

Hmmm OK for this one. With this said, given how plenty of warnings seem
to consider indent and whatever, I wouldn't be surprised if a difference
is made between a totally empty body and one that results from an empty
macro. But indeed this one can represent a real bug.

Willy


Re: [PATCH] target: pscsi: avoid Wempty-body warning

2021-03-22 Thread Willy Tarreau
On Mon, Mar 22, 2021 at 03:47:35PM +, Christoph Hellwig wrote:
> On Mon, Mar 22, 2021 at 12:44:34PM +0100, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> > 
> > Building with 'make W=1' shows a harmless warning for pscsi:
> > 
> > drivers/target/target_core_pscsi.c: In function 'pscsi_complete_cmd':
> > drivers/target/target_core_pscsi.c:624:33: error: suggest braces around 
> > empty body in an 'if' statement [-Werror=empty-body]
> >   624 | ; /* XXX: 
> > TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE */
> >   | ^
> > 
> > Rework the coding style as suggested by gcc to avoid the warning.
> 
> I would much, much prefer to drop the bogus warning;
> 
>   if (foo)
>   ; /* comment */
> 
> is a fairly usual and absolutely sensible style.  The warning on hte
> other hand is completely stupid.

Agreed!

These days it seems there is a competition for the stupidest warning
between compilers, and we've reached the point where working around
them manages to introduce real bugs :-(

I predict we'll soon see warning such as "this comment looks like valid
C code, if you really intended to comment it out, use #if 0 instead". Oh
well, let's hope I have not given a new idea here...

Willy


Re: [PATCH] tools: include: nolibc: Fix a typo occured to occurred in the file nolibc.h

2021-03-22 Thread Willy Tarreau
On Sat, Feb 27, 2021 at 02:58:18PM -0800, Randy Dunlap wrote:
> On 2/27/21 2:44 PM, Bhaskar Chowdhury wrote:
> > 
> > s/occured/occurred/
> > 
> > Signed-off-by: Bhaskar Chowdhury 
> 
> Acked-by: Randy Dunlap 

Oops, seems like I missed this one, now queued, thanks to you both!
Willy


Re: [PATCH] atl1c: optimize rx loop

2021-03-18 Thread Willy Tarreau
On Fri, Mar 19, 2021 at 12:04:47PM +0800, Sieng Piaw Liew wrote:
> Remove this trivial bit of inefficiency from the rx receive loop,
> results in increase of a few Mbps in iperf3. Tested on Intel Core2
> platform.
> 
> Signed-off-by: Sieng Piaw Liew 
> ---
>  drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 3f65f2b370c5..b995f9a0479c 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1796,9 +1796,7 @@ static void atl1c_clean_rx_irq(struct atl1c_adapter 
> *adapter,
>   struct atl1c_recv_ret_status *rrs;
>   struct atl1c_buffer *buffer_info;
>  
> - while (1) {
> - if (*work_done >= work_to_do)
> - break;
> + while (*work_done < work_to_do) {

It should not change anything, or only based on the compiler's optimization
and should not result in a measurable difference because what it does is
exactly the same. Have you really compared the compiled output code to
explain the difference ? I strongly suspect you'll find no difference at
all.

Thus for me it's certainly not an optimization, it could be qualified as
a cleanup to improve code readability however.

Willy


Re: Why is the bit size different between a syscall and its wrapper?

2021-03-11 Thread Willy Tarreau
On Fri, Mar 12, 2021 at 11:48:11AM +0900, Masahiro Yamada wrote:
> Hi.
> 
> I think I am missing something, but
> is there any particular reason to
> use a different bit size between
> a syscall and its userspace wrapper?
> 
> 
> 
> For example, for the unshare syscall,
> 
> unshare(2) says the parameter is int.
> 
> 
> SYNOPSIS
>#define _GNU_SOURCE
>#include 
> 
>int unshare(int flags);
> 
> 
> 
> 
> In the kernel, it is unsigned long.
> 
> 
> SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
> {
> return ksys_unshare(unshare_flags);
> }

The syscalls must have a well defined interface for a given architecture.
Thus in practice the ABI will define that arg1 goes into this register,
arg2 into this one etc, regardless of their type (plenty of them are
pointers for example). The long is the size of a register so it can carry
any of the types we care about. So by defining each syscall as a function
taking 1 to 6 fixed-size arguments you can implement about all syscalls.

Regarding the libc, it has to offer an interface which is compatible with
the standard definition of the syscalls as defined by POSIX or as commonly
found on other OSes, and this regardless of the platform.

For example look at recv(), it takes an int, a pointer, a size_t and an
int. It requires to be defined like this for portability, but at the OS
level all these will typically be passed as a register each.

Hoping this helps,
Willy


Re: macb broken on HiFive Unleashed

2021-03-10 Thread Willy Tarreau
Hi,

On Tue, Mar 09, 2021 at 08:55:10AM +, claudiu.bez...@microchip.com wrote:
> Hi Andreas,
> 
> On 08.03.2021 21:30, Andreas Schwab wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
> > content is safe
> > 
> > One of the changes to the macb driver between 5.10 and 5.11 has broken
> > the SiFive HiFive Unleashed.  These are the last messages before the
> > system hangs:
> > 
> > [   12.468674] libphy: Fixed MDIO Bus: probed
> > [   12.746518] macb 1009.ethernet: Registered clk switch 
> > 'sifive-gemgxl-mgmt'
> > [   12.753119] macb 1009.ethernet: GEM doesn't support hardware ptp.
> > [   12.760178] libphy: MACB_mii_bus: probed
> > [   12.881792] MACsec IEEE 802.1AE
> > [   12.944426] macb 1009.ethernet eth0: Cadence GEM rev 0x10070109 at 
> > 0x1009 irq 16 (70:b3:d5:92:f1:07)
> > 
> 
> I don't have a SiFive HiFive Unleashed to investigate this. Can you check
> if reverting commits on macb driver b/w 5.10 and 5.11 solves your issues:
> 
> git log --oneline v5.10..v5.11 -- drivers/net/ethernet/cadence/
> 1d0d561ad1d7 net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
> 1d608d2e0d51 Revert "macb: support the two tx descriptors on at91rm9200"
> 700d566e8171 net: macb: add support for sama7g5 emac interface
> ec771de654e4 net: macb: add support for sama7g5 gem interface
> f4de93f03ed8 net: macb: unprepare clocks in case of failure
> 38493da4e6a8 net: macb: add function to disable all macb clocks
> daafa1d33cc9 net: macb: add capability to not set the clock rate
> edac63861db7 net: macb: add userio bits as platform configuration
> 9e6cad531c9d net: macb: Fix passing zero to 'PTR_ERR'
> 0012eeb370f8 net: macb: fix NULL dereference due to no pcs_config method
> e4e143e26ce8 net: macb: add support for high speed interface

In addition, it's worth mentioning that the driver has multiple rx/tx/irq
functions depending on the platforms or chip variants, and that based on
this it should be easy to further reduce this list.

Just my two cents,
Willy


Re: NAND Flash Issue Need Help!

2021-03-05 Thread Willy Tarreau
Hello,

On Fri, Mar 05, 2021 at 06:38:08PM +, Justin Mitchell wrote:
> Issue:
> Intermittent occurrence of failure to program new boards from CM.

No idea what "CM" is here, but that's not relevant here anyway.

> Primary partition mounted OK
> Loading file 'dtb/at91sam9g25ek.dtb' to addr 0x2100 with size 25836 
> (0x64ec)...
> Done
> DTB file loaded OK
> Loading file 'kernel/uImage' to addr 0x2200 with size 2968536 
> (0x002d4bd8)...
> UBIFS error (pid 0): ubifs_read_node: bad node type (255 but expected 1) 
> UBIFS error (pid 0): ubifs_read_node: bad node at LEB 215:0 UBIFS error (pid 
> 0): do_readpage: cannot read page 178 of inode 1244, error -22 Error reading 
> file 'kernel/uImage'
> kernel/uImage not found!

But this is not a Linux issue if its U-boot which is failing to load the
kernel. Are you sure you didn't upgrade u-boot at the same time as your
kernel ?

> If I will do ubifscat kernel/uImage it starts to read the file and then I see 
> this error:
> 
> P?@ ???P?@ ???P?@ ??Q?? ?
(...)
> -- System haltedAttempting division by 0!stack-protector: Kernel stack is 
> corrupted Uncompressing Linux...decompressor returned an error done, booting 
> the kernel.
(...)

That's not an error :-)  It's the contents of the bootstrap code in the
kernel that deals with very early issues such as decompression failures.
You're just dumping your kernel image to the screen here.

(...)
> ubifsmount 
> - mount 'volume-name' volume
> 
> And if I will do the same second time then it starts but getting no uImage 
> error:
> 
> DTB file loaded OK
> Loading file 'kernel/uImage' to addr 0x2200 with size 2968536 
> (0x002d4bd8)...
> UBIFS error (pid 0): ubifs_read_node: bad node type (255 but expected 1) 
> UBIFS error (pid 0): ubifs_read_node: bad node at LEB 215:0 UBIFS error (pid 
> 0): do_readpage: cannot read page 178 of inode 1244, error -22 Error reading 
> file 'kernel/uImage'
> kernel/uImage not found!
> 
> Strange that I need to run same command twice and that uImage also can not be 
> loaded from SystemB

So indeed that has nothing to do with Linux. Maybe your image is much larger
than the previous one and a bug in the u-boot code used to load the image
makes it randomly fail.
 
Hoping this helps,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-20 Thread Willy Tarreau
On Sat, Feb 20, 2021 at 05:05:14PM +0100, Greg Kroah-Hartman wrote:
> On Sat, Feb 20, 2021 at 01:29:21PM +, Jari Ruusu wrote:
> > On Friday, February 19, 2021 5:23 PM, Jari Ruusu  
> > wrote:
> > > My contribution here is trying to point you guys to right direction.
> > 
> > I have been able to narrow the beginning of the problem to these kernels:
> > 4.14.188 ... 4.14.202
> > 
> > Same "fix" that went info 4.14.y is also bugging 4.19.y kernels.
> 
> Great, any chance you can narrow this down to the commit itself?

What is strnage is that there was no iwlwifi driver change in this
interval. Only iwlegacy (don't know if this one was used instead).

Note, my laptop uses iwlwifi as well. I've met random issues with it
a year ago when I started to use wifi, such as wifi randomly freezing
during audio meetings, and automatically fixing itself after 1 minute.
I also found that a down-up cycle on the interface would fix it. It
happened less than once a day so it was not easy to diagnose, and given
how crappy all wifi chips are, I naturally attributed this to the
hardware. But since I upgraded to 5.4 months ago, I don't remember having
met that issue anymore, so it was likely related to the driver. All I can
say is that 4.19.68 was exhibiting this issue. I can't say for older ones
because I didn't use wifi before.

Just my two cents,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Willy Tarreau
On Thu, Feb 18, 2021 at 12:16:50PM -0800, Scott Branden wrote:
> On 2021-02-18 10:36 a.m., Greg Kroah-Hartman wrote:
> > On Thu, Feb 18, 2021 at 07:20:50PM +0100, Willy Tarreau wrote:
> >> On Thu, Feb 18, 2021 at 06:53:56PM +0100, Greg Kroah-Hartman wrote:
> >>> On Thu, Feb 18, 2021 at 09:21:13AM -0800, Florian Fainelli wrote:
> >>>> As a company, we are most likely shooting ourselves in the foot by not
> >>>> having a point of coordination with the Linux Foundation and key people
> >>>> like you, Greg and other participants in the stable kernel.
> >>>
> >>> What does the LF have to do with this?
> >>>
> >>> We are here, on the mailing lists, working with everyone.  Just test the
> >>> -rc releases we make and let us know if they work or not for you, it's
> >>> not a lot of "coordination" needed at all.
> >>>
> >>> Otherwise, if no one is saying that they are going to need these for 6
> >>> years and are willing to use it in their project (i.e. and test it),
> >>> there's no need for us to maintain it for that long, right?
> >>
> >> Greg, please remember I expressed I really need them for slightly more than
> >> 3 years (say 3.5-4) :-) I'm fine with helping a bit more as time permits if
> >> this saves me from having to take over these kernels after you, like in the
> >> past, but I cannot engage on the regularity of my availability.
> > 
> > Ok, great!
> > 
> > That's one person/company saying they can help out (along with what CIP
> > has been stating.)
> > 
> > What about others?  Broadcom started this conversation, odd that they
> > don't seem to want to help out :)
> Greg, I'm sorry but I'm not in a position to provide such a commitment.

Are you at least in a position to defend that ? There are necessarily
some people in your company who understand the benefits of using open
source provdided for free by others and who understand that devoting
a few people's time to this task is extremely cheap compared to the
amount of work required by having to do it entirely yourself for a
lower quality.

> My original question arose because the 5.10 kernel is declared as 2 years LTS
> while older LTS kernels are now 6 years.
(...)
> If all LTS kernels were declared as 3.5-4 years as Willy commented this would
> solve a few issues. 6 year LTS kernels would only have a maximum 1 year
> lifespan over the latest declared LTS kernel. Also, many products take a year
> or more to develop, there isn't any life left in an LTS kernel if it is only
> 2 years.

We all have the same problem regarding this but how do you want Greg to
engage into such a task by himself if he's not certain he can count on
others to help ? The few of us having worked on extended kernels know
that there's a limit around 2.5 years beyond which backports become much
harder to perform and to test. Doing it every year would result in 6 LTS
kernels to maintain in addition to the last 1-2 stable ones. That becomes
a huge amount of work! I even think that having one LTS kernel every 2
years but maintained one extra year (e.g. 5 vs 4 in my case) would reduce
the effort.

> After 1-3 years of kernel age the relevant parties that want to invest and
> care about supporting specific kernel versions longer should become apparent
> and could commit to longer support.

But that's exactly what's currently being done. Greg initially commits
to 2 years hoping to get some help to pursue this longer, and this causes
trouble to some of us not being certain upfront whether or not we're choosing
the right kernel. So only the solution I'm seeing is for Greg to know
early who jumps in so that those of us without the power or skill to
entirely maintain a kernel by themselves know early which version to
choose. Quite frankly if we ship an LTS kernel in a product, the least
we can do is to give back a little bit to make sure the situation remains
durable.

As such even if you are not in a position to provide such a commitment,
I'd appreciate it if you would bring these arguments to those who are in
such a position, so that I don't end up as one of the too few ones having
to share a significant part of that task to make sure this valuable kernel
continues to exist.

Thanks,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Willy Tarreau
On Thu, Feb 18, 2021 at 06:53:56PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Feb 18, 2021 at 09:21:13AM -0800, Florian Fainelli wrote:
> > As a company, we are most likely shooting ourselves in the foot by not
> > having a point of coordination with the Linux Foundation and key people
> > like you, Greg and other participants in the stable kernel.
> 
> What does the LF have to do with this?
> 
> We are here, on the mailing lists, working with everyone.  Just test the
> -rc releases we make and let us know if they work or not for you, it's
> not a lot of "coordination" needed at all.
> 
> Otherwise, if no one is saying that they are going to need these for 6
> years and are willing to use it in their project (i.e. and test it),
> there's no need for us to maintain it for that long, right?

Greg, please remember I expressed I really need them for slightly more than
3 years (say 3.5-4) :-) I'm fine with helping a bit more as time permits if
this saves me from having to take over these kernels after you, like in the
past, but I cannot engage on the regularity of my availability.

Overall I think that a lot of people completely underestimate the amount
of work it requires to maintain stable kernels, and how much it could be
distributed. By having a bunch of users participate a little bit more
(e.g. by sometimes backporting the patches that are essential to them,
by testing what's relevant to their use case), it already offloads a lot
of work. I don't think the extra work requires to be much organized if
there are enough participants to share the efforts.

Regards,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Willy Tarreau
Hi Florian!

On Thu, Feb 18, 2021 at 09:42:00AM -0800, Florian Fainelli wrote:
> > Other difficulty with the LTS version is the frequency it is updated.  We 
> > would not
> > pickup the changes that frequently to test.  A quarterly, bi-annually, or 
> > when a critical fix
> > is identified would be when we update and perform any meaningful testing 
> > when in maintainence.
> 
> Well you really have the best of both worlds here, you are not forced to
> update your downstream fork of the stable kernel every week, though
> bonus points if you do.
> 
> For instance I try to test all the stable release candidates for 4.9,
> 5.4 and 5.10, and merge the stable tags every week when they show up. We
> do release to our customers every 6 weeks though, so usually they will
> jump several stable releases, and they are fine with that.
> 
> Yes it takes time, but I would rather do that than have to continuously
> respond to questions about is this CVE fixed in kernel X.Y.Z like it
> used to be before we did that. It also helps catch problem faster before
> customers do, the usual benefits of continuous integration/delivery.

Totally agreed! In our use case at haproxy, it's slightly different but
not that much. We're much less sensitive to kernel bugs except for the
network parts and any long-term stability issue. However we're extremely
sensitive to openssl bugs and haproxy bugs. Thus we use them as triggers
to emit a new release and we systematically update the kernel to the
latest one. Our local patches usually apply pretty well on top of that.

We face maybe 1-3 rejects a year which take half an hour of extra manual
work, and roughly one regression every 3 years, essentially caused by
one of our local patches applying to the wrong place due to changes and
not caught by QA tests before being put online. I think that in ~15 years
(we started with 2.4), only a single customer was ever affected by a
regression caused by the kernel, it is so low we could almost laugh about
it. Quite frankly this is unrivaled, and it illustrates the huge benefit
in almost blindly following LTS this way! More quality, less work, and
faster response to critical bugs! For sure there's no "kernel hero" in
our dev team, but who really wants that anyway ?

Cheers,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Willy Tarreau
On Thu, Feb 18, 2021 at 04:15:11PM +0200, Jari Ruusu wrote:
> Willy Tarreau wrote:
> > The only set of fixes that can be trusted are the "official" stable
> > kernels, because they are the only ones that are approved by the patches
> > authors themselves. Adding more stuff on top of stable kernels is fine
> > (and done at your own risk), but randomly dropping stuff from stable
> > kernels just because you don't think you need that is totally non-sense
> > and must not be done anymore!
> 
> This may be little bit off-topic... but stable kernel.org kernels
> can also bit-rot badly because of "selective" backporting... as in
> anything that does not apply cleanly gets dropped regardless of
> how critical they are.

Sure it will. And the huge difference is that it usually gets quickly
spotted and fixed. For sensitive servers I tend to apply the principle
of not necessarily updating to the latest stable kernel but one or two
versions before it which nobody complained about. And I'm pretty fine
with skipping a significant number of updates (we all do that anyway).

> I will give you one example: Intel WiFi (iwlwifi) on 4.19.y
> kernel.org stable kernels is currently missing many critical
> locking fixes. As a result, that in-tree iwlwifi driver causes
> erratic behavior to random unrelated processes, and has been doing
> so for many months now. My not-so-politically correct opinion is
> that in-tree iwlwifi is completely FUBAR unless someone steps up
> to do professional quality backport of those locking fixes from
> upstream out-of-tree Intel version [1] [2] of the driver.

I see, and it happens with plenty of other drivers or subsystems. Is
it in any way the stable branch's or stable maintainer's fault if
someone doesn't correctly do the backporting job on their driver ? No.
Is it expected that a driver works perfectly from its inclusion ? No.
Is it expected that a driver can always be fixed without a significant
rework that risks more breakage than fixes ? No. Some design limitations
or errors can require so many changes that they're unfixable in place.
I even had to *document* security issues in 2.4 because fixing them was
riskier than keeping them. This happens in any piece of software.

It's always been the case that some older kernels work less well than
some newer ones due to limited features, partially wrong drivers etc,
and getting better drivers is a valid reason for upgrading to a more
recent one. However the older driver ought to continue to be maintained
in a working state for those for whom it works fine.

> For me
> only way to get properly working WiFi on my laptop computer is to
> compile that Intel out-of-tree version. Sad, but true.

That's perfectly fine from my point of view. I've been doing the same
for certain driver (e.g. e100 vs eepro100 15 years ago) and have been
pleased to be able to stop using those out-of-tree versions. This is
also in order to make this possible for those who need to do it that
LTS kernels provide a lot of value: such out-of-tree drivers tend to
take some time to resynchronize with latest updates, and once they're
updated, you can use your machine for quite some time with them.

Obviously if somemone is able to figure the required fixes for the locking
bugs you mentioned above and to submit patches for stable branches, I'm
sure Greg will appreciate! But maybe that's not fixable there and you need
to upgrade. Usually you pick an LTS kernel for a specific hardware. If it
works that's great. But you cannot expect hardware to suddenly start to
work in the middle of a stable kernel. Sometimes it happens (PCI IDs) but
that's basically all and that's not their purpose.

Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Willy Tarreau
On Thu, Feb 18, 2021 at 08:43:48AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 17, 2021 at 11:48:21AM -0800, Scott Branden wrote:
> > Other difficulty with the LTS version is the frequency it is updated.

What a stange statement! So basically if fixes come in quickly so that
customers are not exposed too long to well-known issues, it's a difficulty ?
I guess by now every serious OS vendor provides at least weekly fixes, and
at an era where devices are all interconnected, it's really necessary
(unless of course you don't care about your customer's security).

> > We would not
> > pickup the changes that frequently to test.  A quarterly, bi-annually, or 
> > when a critical fix
> > is identified would be when we update and perform any meaningful testing 
> > when in maintainence.
> 
> How are you "identifying" these "critical fixes"?  We fix at least one
> known security issue a week, and probably multitudes of
> unknown-at-this-moment ones.  How are you determining when you need to
> send a new base kernel update off to your customers?  At such long
> intervals it feels like anyone using your kernel releases is woefully
> insecure.

+1! It seems like this dangerous practice will never end :-(

Let me explain a personal experience. When I took over 2.6.32 many years
ago, Greg asked me to adapt to the new maintenance process involving the
patch reviews. At first I feared that it would increase my amount of work.
And it did. But I also discovered how important these reviews were, because
I started to get lots of "don't take this one in this version" and more
importantly "if you merge this you'll need these ones as well". And very
quickly I discovered how bogus the branches I used to maintain before
had been, given the high feedback ratio!

So based on this experience, I can assure anyone doing cherry-picks in
their garage from LTS kernels that they're doing crap and that they must
not distribute these kernels to anyone because THESE KERNELS ARE DANGEROUS.
It's even very easy to introduce vulnerabilities by doing this!

The only set of fixes that can be trusted are the "official" stable
kernels, because they are the only ones that are approved by the patches
authors themselves. Adding more stuff on top of stable kernels is fine
(and done at your own risk), but randomly dropping stuff from stable
kernels just because you don't think you need that is totally non-sense
and must not be done anymore!

Willy


Re: [GIT PULL tip/core/rcu] RCU, LKMM, and KCSAN commits for v5.12

2021-02-13 Thread Willy Tarreau
Hi Paul,

On Fri, Feb 12, 2021 at 07:07:05AM -0800, Paul E. McKenney wrote:
> Thank you, Ingo!  In the future, I will group nolibc with RCU.  But there
> has to be something other than RCU that needs it.  I will take a look. ;-)

All my kernels boot using a "preinit" that is built with nolibc and
integrated into the initramfs. Historically it used to just create /dev
entries and mount the rootfs, nowadays it's used to untar modules and
finish the boot so that I can have a clean separation between a kernel
image and a rootfs. It even allows to perform some minimal debugging as
it includes a minimalistic shell. Do you think something like this could
be of any use in your development sessions ? If so I can discuss this
with you in a separate thread so as not to annoy everyone. Just let me
know :-)

Cheers,
Willy


[tip: core/rcu] tools/nolibc: Make dup2() rely on dup3() when available

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 79f220e56dc85739aa5462fa8a1abd4a44f002e0
Gitweb:
https://git.kernel.org/tip/79f220e56dc85739aa5462fa8a1abd4a44f002e0
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:24 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:44 -08:00

tools/nolibc: Make dup2() rely on dup3() when available

A recent boot failure on 5.4-rc3 on arm64 revealed that sys_dup2()
is not available and that only sys_dup3() is implemented.  This commit
detects this and falls back to sys_dup3() when available.  This is a
port of nolibc's upstream commit fd5272ec2c66 to the Linux kernel.

Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 3115c64..5fda4d8 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1502,10 +1502,22 @@ int sys_dup(int fd)
return my_syscall1(__NR_dup, fd);
 }
 
+#ifdef __NR_dup3
+static __attribute__((unused))
+int sys_dup3(int old, int new, int flags)
+{
+   return my_syscall3(__NR_dup3, old, new, flags);
+}
+#endif
+
 static __attribute__((unused))
 int sys_dup2(int old, int new)
 {
+#ifdef __NR_dup3
+   return my_syscall3(__NR_dup3, old, new, 0);
+#else
return my_syscall2(__NR_dup2, old, new);
+#endif
 }
 
 static __attribute__((unused))
@@ -1876,6 +1888,20 @@ int dup2(int old, int new)
return ret;
 }
 
+#ifdef __NR_dup3
+static __attribute__((unused))
+int dup3(int old, int new, int flags)
+{
+   int ret = sys_dup3(old, new, flags);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+#endif
+
 static __attribute__((unused))
 int execve(const char *filename, char *const argv[], char *const envp[])
 {


[tip: core/rcu] tools/nolibc: Make getpgrp() fall back to getpgid(0)

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: c0c7c103756fee25aadfd5c36f7b86e318f9abb4
Gitweb:
https://git.kernel.org/tip/c0c7c103756fee25aadfd5c36f7b86e318f9abb4
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:25 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:44 -08:00

tools/nolibc: Make getpgrp() fall back to getpgid(0)

The getpgrp() syscall is not implemented on arm64, so this commit instead
uses getpgid(0) when getpgrp() is not available.  This is a port of
nolibc's upstream commit 2379f25073f9 to the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 5fda4d8..9209da8 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1545,9 +1545,15 @@ int sys_getdents64(int fd, struct linux_dirent64 *dirp, 
int count)
 }
 
 static __attribute__((unused))
+pid_t sys_getpgid(pid_t pid)
+{
+   return my_syscall1(__NR_getpgid, pid);
+}
+
+static __attribute__((unused))
 pid_t sys_getpgrp(void)
 {
-   return my_syscall0(__NR_getpgrp);
+   return sys_getpgid(0);
 }
 
 static __attribute__((unused))
@@ -1951,6 +1957,18 @@ int getdents64(int fd, struct linux_dirent64 *dirp, int 
count)
 }
 
 static __attribute__((unused))
+pid_t getpgid(pid_t pid)
+{
+   pid_t ret = sys_getpgid(pid);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+
+static __attribute__((unused))
 pid_t getpgrp(void)
 {
pid_t ret = sys_getpgrp();


[tip: core/rcu] tools/nolibc: Get timeval, timespec and timezone from linux/time.h

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 70ca7aea50a27f03aa7e4cc6ee68940d13cbcd17
Gitweb:
https://git.kernel.org/tip/70ca7aea50a27f03aa7e4cc6ee68940d13cbcd17
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:28 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:45 -08:00

tools/nolibc: Get timeval, timespec and timezone from linux/time.h

The definitions of timeval(), timespec() and timezone() conflict with
linux/time.h when building, so this commit takes them directly from
linux/time.h. This is a port of nolibc's upstream commit dc45f5426b0c
to the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 833693f..611d9d1 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define NOLIBC
 
@@ -152,24 +153,6 @@ struct pollfd {
short int revents;
 };
 
-/* for select() */
-struct timeval {
-   longtv_sec;
-   longtv_usec;
-};
-
-/* for pselect() */
-struct timespec {
-   longtv_sec;
-   longtv_nsec;
-};
-
-/* for gettimeofday() */
-struct timezone {
-   int tz_minuteswest;
-   int tz_dsttime;
-};
-
 /* for getdents64() */
 struct linux_dirent64 {
uint64_t   d_ino;


[tip: core/rcu] tools/nolibc: Emit detailed error for missing alternate syscall number definitions

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 35635d7fa689492ca9edb1d949f1805f074ecf1a
Gitweb:
https://git.kernel.org/tip/35635d7fa689492ca9edb1d949f1805f074ecf1a
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:30 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:45 -08:00

tools/nolibc: Emit detailed error for missing alternate syscall number 
definitions

Some syscalls can be implemented from different __NR_* variants. For
example, sys_dup2() can be implemented based on __NR_dup3 or __NR_dup2.
In this case it is useful to mention both alternatives in error messages
when neither are detected. This information will help the user search for
the right one (e.g __NR_dup3) instead of just the fallback (__NR_dup2)
which might not exist on the platform.

This is a port of nolibc's upstream commit a21080d2ba41 to the Linux
kernel.

Suggested-by: Mark Rutland 
Link: https://lore.kernel.org/lkml/20210120145447.GC77728@C02TD0UTHF1T.local/
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 52 +-
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 475d956..618acad 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1446,8 +1446,10 @@ int sys_chmod(const char *path, mode_t mode)
 {
 #ifdef __NR_fchmodat
return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
-#else
+#elif defined(__NR_chmod)
return my_syscall2(__NR_chmod, path, mode);
+#else
+#error Neither __NR_fchmodat nor __NR_chmod defined, cannot implement 
sys_chmod()
 #endif
 }
 
@@ -1456,8 +1458,10 @@ int sys_chown(const char *path, uid_t owner, gid_t group)
 {
 #ifdef __NR_fchownat
return my_syscall5(__NR_fchownat, AT_FDCWD, path, owner, group, 0);
-#else
+#elif defined(__NR_chown)
return my_syscall3(__NR_chown, path, owner, group);
+#else
+#error Neither __NR_fchownat nor __NR_chown defined, cannot implement 
sys_chown()
 #endif
 }
 
@@ -1492,8 +1496,10 @@ int sys_dup2(int old, int new)
 {
 #ifdef __NR_dup3
return my_syscall3(__NR_dup3, old, new, 0);
-#else
+#elif defined(__NR_dup2)
return my_syscall2(__NR_dup2, old, new);
+#else
+#error Neither __NR_dup3 nor __NR_dup2 defined, cannot implement sys_dup2()
 #endif
 }
 
@@ -1512,8 +1518,10 @@ pid_t sys_fork(void)
 * will not use the rest with no other flag.
 */
return my_syscall5(__NR_clone, SIGCHLD, 0, 0, 0, 0);
-#else
+#elif defined(__NR_fork)
return my_syscall0(__NR_fork);
+#else
+#error Neither __NR_clone nor __NR_fork defined, cannot implement sys_fork()
 #endif
 }
 
@@ -1570,8 +1578,10 @@ int sys_link(const char *old, const char *new)
 {
 #ifdef __NR_linkat
return my_syscall5(__NR_linkat, AT_FDCWD, old, AT_FDCWD, new, 0);
-#else
+#elif defined(__NR_link)
return my_syscall2(__NR_link, old, new);
+#else
+#error Neither __NR_linkat nor __NR_link defined, cannot implement sys_link()
 #endif
 }
 
@@ -1586,8 +1596,10 @@ int sys_mkdir(const char *path, mode_t mode)
 {
 #ifdef __NR_mkdirat
return my_syscall3(__NR_mkdirat, AT_FDCWD, path, mode);
-#else
+#elif defined(__NR_mkdir)
return my_syscall2(__NR_mkdir, path, mode);
+#else
+#error Neither __NR_mkdirat nor __NR_mkdir defined, cannot implement 
sys_mkdir()
 #endif
 }
 
@@ -1596,8 +1608,10 @@ long sys_mknod(const char *path, mode_t mode, dev_t dev)
 {
 #ifdef __NR_mknodat
return my_syscall4(__NR_mknodat, AT_FDCWD, path, mode, dev);
-#else
+#elif defined(__NR_mknod)
return my_syscall3(__NR_mknod, path, mode, dev);
+#else
+#error Neither __NR_mknodat nor __NR_mknod defined, cannot implement 
sys_mknod()
 #endif
 }
 
@@ -1613,8 +1627,10 @@ int sys_open(const char *path, int flags, mode_t mode)
 {
 #ifdef __NR_openat
return my_syscall4(__NR_openat, AT_FDCWD, path, flags, mode);
-#else
+#elif defined(__NR_open)
return my_syscall3(__NR_open, path, flags, mode);
+#else
+#error Neither __NR_openat nor __NR_open defined, cannot implement sys_open()
 #endif
 }
 
@@ -1635,8 +1651,10 @@ int sys_poll(struct pollfd *fds, int nfds, int timeout)
t.tv_nsec = (timeout % 1000) * 100;
}
return my_syscall4(__NR_ppoll, fds, nfds, (timeout >= 0) ?  : NULL, 
NULL);
-#else
+#elif defined(__NR_poll)
return my_syscall3(__NR_poll, fds, nfds, timeout);
+#else
+#error Neither __NR_ppoll nor __NR_poll defined, cannot implement sys_poll()
 #endif
 }
 
@@ -1676,11 +1694,13 @@ int sys_select(int nfds, fd_set *rfds, fd_set *wfds, 
fd_set *efds, struct timeva
t.tv_nsec = timeout->tv_usec * 1000;
}
return my_syscall6(__NR_pselect6, nfds, rfds, wfds, efds, timeout ?  
: NULL, NULL);
-#else
+#elif defined(__NR__newselect) || d

[tip: core/rcu] tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: f65d7117785cb8ab04f1af55909807c7eb9ed30b
Gitweb:
https://git.kernel.org/tip/f65d7117785cb8ab04f1af55909807c7eb9ed30b
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:29 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:45 -08:00

tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*

The __ARCH_WANT_* definitions were added in order to support aarch64
when it was missing some syscall definitions (including __NR_dup2,
__NR_fork, and __NR_getpgrp), but these __ARCH_WANT_* definitions were
actually wrong because these syscalls do not exist on this platform.
Defining these resulted in exposing invalid definitions, resulting in
failures on aarch64.

The missing syscalls were since implemented based on the newer ones
(__NR_dup3,  __NR_clone, __NR_getpgid) so these incorrect __ARCH_WANT_*
definitions are no longer needed.

Thanks to Mark Rutland for spotting this incorrect analysis and
explaining why it was wrong.

This is a port of nolibc's upstream commit 00b1b0d9b2a4 to the Linux
kernel.

Reported-by: Mark Rutland 
Link: https://lore.kernel.org/lkml/20210119153147.GA5083@paulmck-ThinkPad-P72
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 611d9d1..475d956 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -81,14 +81,6 @@
  *
  */
 
-/* Some archs (at least aarch64) don't expose the regular syscalls anymore by
- * default, either because they have an "_at" replacement, or because there are
- * more modern alternatives. For now we'd rather still use them.
- */
-#define __ARCH_WANT_SYSCALL_NO_AT
-#define __ARCH_WANT_SYSCALL_NO_FLAGS
-#define __ARCH_WANT_SYSCALL_DEPRECATED
-
 #include 
 #include 
 #include 


[tip: core/rcu] tools/nolibc: Add the definition for dup()

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: c261145abd2461f921ac44ad70c28778dda710f4
Gitweb:
https://git.kernel.org/tip/c261145abd2461f921ac44ad70c28778dda710f4
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:23 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:44 -08:00

tools/nolibc: Add the definition for dup()

This commit adds the dup() function, which was omitted when sys_dup()
was defined.  This is a port of nolibc's upstream commit 47cc42a79c92
to the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index e61d36c..3115c64 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1853,6 +1853,18 @@ int close(int fd)
 }
 
 static __attribute__((unused))
+int dup(int fd)
+{
+   int ret = sys_dup(fd);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+
+static __attribute__((unused))
 int dup2(int old, int new)
 {
int ret = sys_dup2(old, new);


[tip: core/rcu] tools/nolibc: Implement fork() based on clone()

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: be60ca41fbaa93bc8f92b24e34d8cc62af41300d
Gitweb:
https://git.kernel.org/tip/be60ca41fbaa93bc8f92b24e34d8cc62af41300d
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:26 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:44 -08:00

tools/nolibc: Implement fork() based on clone()

Some archs such as arm64 do not have fork() and have to use clone()
instead.  This commit therefore makes fork() use clone() when
available. This requires including signal.h to get the definition of
SIGCHLD.  This is a port of nolibc's upstream commit d2dc42fd6149 to
the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 9209da8..fdd5524 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -271,6 +271,8 @@ struct stat {
 #define WEXITSTATUS(status)   (((status) & 0xff00) >> 8)
 #define WIFEXITED(status) (((status) & 0x7f) == 0)
 
+/* for SIGCHLD */
+#include 
 
 /* Below comes the architecture-specific code. For each architecture, we have
  * the syscall declarations and the _start code definition. This is the only
@@ -1529,7 +1531,15 @@ int sys_execve(const char *filename, char *const argv[], 
char *const envp[])
 static __attribute__((unused))
 pid_t sys_fork(void)
 {
+#ifdef __NR_clone
+   /* note: some archs only have clone() and not fork(). Different archs
+* have a different API, but most archs have the flags on first arg and
+* will not use the rest with no other flag.
+*/
+   return my_syscall5(__NR_clone, SIGCHLD, 0, 0, 0, 0);
+#else
return my_syscall0(__NR_fork);
+#endif
 }
 
 static __attribute__((unused))


[tip: core/rcu] tools/nolibc: Implement poll() based on ppoll()

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 5b1c827ca3b349801e2faff4185118cfa74f94c6
Gitweb:
https://git.kernel.org/tip/5b1c827ca3b349801e2faff4185118cfa74f94c6
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:27 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:44 -08:00

tools/nolibc: Implement poll() based on ppoll()

Some architectures like arm64 do not implement poll() and have to use
ppoll() instead. This commit therefore makes poll() use ppoll() when
available. This is a port of nolibc's upstream commit 800f75c13ede to
the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index fdd5524..833693f 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1652,7 +1652,17 @@ int sys_pivot_root(const char *new, const char *old)
 static __attribute__((unused))
 int sys_poll(struct pollfd *fds, int nfds, int timeout)
 {
+#if defined(__NR_ppoll)
+   struct timespec t;
+
+   if (timeout >= 0) {
+   t.tv_sec  = timeout / 1000;
+   t.tv_nsec = (timeout % 1000) * 100;
+   }
+   return my_syscall4(__NR_ppoll, fds, nfds, (timeout >= 0) ?  : NULL, 
NULL);
+#else
return my_syscall3(__NR_poll, fds, nfds, timeout);
+#endif
 }
 
 static __attribute__((unused))


[tip: core/rcu] tools/nolibc: Fix position of -lgcc in the documented example

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 3c6ce7a5363723a05bfe3ee03a8d4a9b66841ae4
Gitweb:
https://git.kernel.org/tip/3c6ce7a5363723a05bfe3ee03a8d4a9b66841ae4
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:20:31 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:45 -08:00

tools/nolibc: Fix position of -lgcc in the documented example

The documentation header in the nolibc.h file provides an example command
line, but it places the -lgcc argument before the source files, which
can fail with libgcc.a (e.g. on ARM when uidiv is needed). This commit
therefore moves the -lgcc to the end of the command line, hopefully
before this example leaks into makefiles.  This is a port of nolibc's
upstream commit b5e282089223 to the Linux kernel.

Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/include/nolibc/nolibc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 618acad..8b7a983 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -71,7 +71,7 @@
  *
  * A simple static executable may be built this way :
  *  $ gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
- *-static -include nolibc.h -lgcc -o hello hello.c
+ *-static -include nolibc.h -o hello hello.c -lgcc
  *
  * A very useful calling convention table may be found here :
  *  http://man7.org/linux/man-pages/man2/syscall.2.html


[tip: core/rcu] tools/rcutorture: Fix position of -lgcc in mkinitrd.sh

2021-02-12 Thread tip-bot2 for Willy Tarreau
The following commit has been merged into the core/rcu branch of tip:

Commit-ID: 26cec81415b1b2a2e8e36ef0b24cf5f26467aa61
Gitweb:
https://git.kernel.org/tip/26cec81415b1b2a2e8e36ef0b24cf5f26467aa61
Author:Willy Tarreau 
AuthorDate:Thu, 21 Jan 2021 08:48:08 +01:00
Committer: Paul E. McKenney 
CommitterDate: Thu, 21 Jan 2021 10:06:45 -08:00

tools/rcutorture: Fix position of -lgcc in mkinitrd.sh

The -lgcc command-line argument is placed poorly in the build options,
which can result in build failures, for exapmle, on ARM when uidiv()
is required.  This commit therefore places the -lgcc argument after the
source files.

Fixes: b94ec36896da ("rcutorture: Make use of nolibc when available")
Tested-by: Valentin Schneider 
Tested-by: Mark Rutland  [arm64]
Signed-off-by: Willy Tarreau 
Signed-off-by: Paul E. McKenney 
---
 tools/testing/selftests/rcutorture/bin/mkinitrd.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh 
b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index 38e424d..70d62fd 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -70,7 +70,7 @@ if echo -e "#if 
__x86_64__||__i386__||__i486__||__i586__||__i686__" \
# architecture supported by nolibc
 ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
-nostdlib -include ../../../../include/nolibc/nolibc.h \
-   -lgcc -s -static -Os -o init init.c
+   -s -static -Os -o init init.c -lgcc
 else
${CROSS_COMPILE}gcc -s -static -Os -o init init.c
 fi


Re: [PATCH] I was wondering why I can't set the resolution to 2560x1080, while in windows 7 I can without a problem. I looked at the radeon driver code and found it doesn't support this resolution. So

2021-02-07 Thread Willy Tarreau
Hello,

On Sun, Feb 07, 2021 at 10:46:04AM +0100, Marcin Raszka wrote:
> ---
>  drivers/gpu/drm/radeon/radeon_benchmark.c  |  5 ++--
>  drivers/gpu/drm/radeon/radeon_connectors.c | 30 ++
>  drivers/gpu/drm/radeon/radeon_drv.c|  5 
>  drivers/gpu/drm/radeon/radeon_encoders.c   |  6 +++--
>  4 files changed, 32 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_benchmark.c 
> b/drivers/gpu/drm/radeon/radeon_benchmark.c
(...)

Please have a look at Documentation/process/submitting-patches.rst to
see how to reformat your patch so that it contains a descriptive commit
message.

Thanks,
Willy


Re: Linux 4.4.256

2021-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2021 at 02:19:57PM +0100, Greg Kroah-Hartman wrote:
> On Sat, Feb 06, 2021 at 02:11:13PM +0100, Willy Tarreau wrote:
> > On Sat, Feb 06, 2021 at 02:00:27PM +0100, Greg Kroah-Hartman wrote:
> > > I think Sasha's patch here:
> > >   https://lore.kernel.org/r/20210205174702.1904681-1-sas...@kernel.org
> > > is looking like the solution.
> > 
> > It might cause trouble to those forcing SUBLEVEL to a given version such
> > as .0 to avoid exposing the exact stable version. I guess we should
> > instead try to integrate a test on the value itself and cap it at 255.
> 
> That's the main goal of the upstream submission that checks the value
> before capping it:
>   https://lore.kernel.org/r/20210206035033.2036180-2-sas...@kernel.org
>
> > Something like this looks more robust to me, it will use SUBLEVEL for
> > values 0 to 255 and 255 for any larger value:
> > 
> > -   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
> > +   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 255 \* (0$(SUBLEVEL) 
> > > 255) + 0$(SUBLEVEL) * (0$(SUBLEVEL \<= 255)); \
> 
> I think you just rewrote the above linked patch :)

Ah I missed it, indeed! Sorry for the noise :-)

Willy


Re: Linux 4.4.256

2021-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2021 at 02:11:13PM +0100, Willy Tarreau wrote:
> Something like this looks more robust to me, it will use SUBLEVEL for
> values 0 to 255 and 255 for any larger value:
> 
> - expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
> + expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 255 \* (0$(SUBLEVEL) 
> > 255) + 0$(SUBLEVEL) * (0$(SUBLEVEL \<= 255)); \

Bah, I obviously missed a backslash above and forgot spaces around parens.
Here's a tested version:

diff --git a/Makefile b/Makefile
index 7d86ad6ad36c..9b91b8815b40 100644
--- a/Makefile
+++ b/Makefile
@@ -1252,7 +1252,7 @@ endef
 
 define filechk_version.h
echo \#define LINUX_VERSION_CODE $(shell \
-   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
+   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 255 \* \( 
0$(SUBLEVEL) \> 255 \) + 0$(SUBLEVEL) \* \( 0$(SUBLEVEL) \<= 255 \) ); \
echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))'
 endef
 
Willy


Re: Linux 4.4.256

2021-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2021 at 02:00:27PM +0100, Greg Kroah-Hartman wrote:
> I think Sasha's patch here:
>   https://lore.kernel.org/r/20210205174702.1904681-1-sas...@kernel.org
> is looking like the solution.

It might cause trouble to those forcing SUBLEVEL to a given version such
as .0 to avoid exposing the exact stable version. I guess we should
instead try to integrate a test on the value itself and cap it at 255.

Something like this looks more robust to me, it will use SUBLEVEL for
values 0 to 255 and 255 for any larger value:

-   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
+   expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 255 \* (0$(SUBLEVEL) 
> 255) + 0$(SUBLEVEL) * (0$(SUBLEVEL \<= 255)); \

Willy


Re: Alternative compilers to GCC/Clang

2021-02-02 Thread Willy Tarreau
On Tue, Feb 02, 2021 at 10:20:48PM +0100, Borislav Petkov wrote:
> It would be good to start forward-porting and integrating some of the
> fixes and even extend tcc to handle some of the gnuisms we're using in
> the kernel so that we can build the kernel with it too.

I agree. And the team is responsive and shows great consideration for
patches.

> I can imagine having CONFIG_TCC - as long as that doesn't get too
> intrusive and get in the way of things - and those who wanna build the
> kernel with it, can enable it. For example...

I like this idea. It's way better than having to implement everything
at once or degrade some code just to make it build. It could be solved
at config time by automatically excluding some features.

It should also be less of a hassle than dealing with many gcc versions
because if we see it as a development speed up tool we can easily accept
that we occasionaly break compatibility with older of its versions and
that those who want to use it just rebuild the latest one (it's trivial
and fast, basically "make" and you're done, not the typical toolchain
experience). You don't care if it doesn't work for one week, you're not
supposed to ship any form of official code built with it anyway. It's
just an aid, and a nice one.

Willy


Re: Alternative compilers to GCC/Clang

2021-02-02 Thread Willy Tarreau
On Tue, Feb 02, 2021 at 09:19:20PM +0100, Borislav Petkov wrote:
> On Tue, Feb 02, 2021 at 11:11:32AM -0800, Amy Parker wrote:
> > On Tue, Feb 2, 2021 at 8:26 AM Amy Parker  wrote:
> > > > It compiles extremely fast, implements some subsets of gcc (a few
> > > > attributes for example), but is far from being able to compile a kernel
> > >
> > > Well, we'll see what I can do with that. :)
> > 
> > Well, just installed it and tried building the kernel. Fails every file. :)
> > 
> > It's definitely something to work towards - but I don't know if kernel
> > advancements requiring newer GCC versions will go slow enough to allow
> > TCC improvements to arise. This isn't just something like with Clang
> > where a few tweaks to files and to Clang itself did the trick.
> 
> Maybe this'll help you find something to do:
> 
> https://www.youtube.com/watch?v=iU0Z0vBKrtQ
> 
> Yes, it would be lovely to be able to compile the kernel with tcc but it
> is not going to be trivial.

In any case there will always be numerous limitations, but at least being
able to perform the basic build check with limited options could save quite
some time to many developers. Using gcc once the tcc-based "typo check"
passes would already be a nice start. Getting the kernel to boot would
indeed be a huge step forward! In haproxy we can only build with threading
disabled and it works slowly but sufficiently for basic tests and printf-
based debugging. It's convenient for bisecting certain bugs.

Thanks for the video Boris, I wasn't aware of it, definitely interesting!

Willy


Re: Alternative compilers to GCC/Clang

2021-02-01 Thread Willy Tarreau
Hi Amy,

On Mon, Feb 01, 2021 at 03:31:49PM -0800, Amy Parker wrote:
> Hello! My name's Amy. I'm really impressed by the work done to make
> Clang (and the LLVM toolchain overall) able to compile the kernel.
> Figured I might as well donate my monkey hours to helping make it run
> on other compilers as well. I haven't been able to find any that use
> the same arguments structure as GCC and Clang (read: you can pass it
> in as CC=compilername in your $MAKEOPTS). Any compilers along that
> route anyone here has worked with that I could work with?

If you're interested, you should have a look at TCC (tiny CC) :

 https://repo.or.cz/tinycc.git

It compiles extremely fast, implements some subsets of gcc (a few
attributes for example), but is far from being able to compile a kernel
(at least last time I checked). Its speed makes it very convenient for
development. I made some efforts to make haproxy support it (and provided
some fixes to tcc) as it compiles the whole project in 0.5 second instead
of ~10 seconds with a modern gcc. It could probably compile a kernel in
15-20 seconds if properly supported, and this could be particularly handy
for development and testing.

Regards,
Willy


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-01-26 Thread Willy Tarreau
Hi Greg,

On Tue, Jan 26, 2021 at 07:51:18PM +0100, Greg Kroah-Hartman wrote:
> > > Ok, seriously, this happens every year, and every year we go through the
> > > same thing, it's not like this is somehow new, right?
> > No, but why do we need to keep playing the same game every year now.
> 
> Because, 5.4 almost did not become "6 years" of support from me.  That
> was because in the beginning, no one said they were going to use it in
> their devices and offer me help in testing and backporting.  Only when I
> knew for sure that we had people helping this out did I change the date
> on kernel.org.
> 
> So far the jury is still out for 5.10, are you willing to help with
> this?  If not, why are you willing to hope that others are going to do
> your work for you?  I am talking to some companies, but am not willing
> to commit to anything in public just yet, because no one has committed
> to me yet.

This is interesting, because I used to extend LTS kernels after you dropped
them exactly for the reason I needed them. With 6 months of porting+testing,
~3 years of major version life in field, and roughly 6 extra months covering
late starts or minor extensions to help match a deadline, I figured that 4
years was the minimum we needed for our products. As such, 6 years add some
comfort, but as you probably remember I once got scared not knowing if 4.19
was going to be extended or not as we were having plenty in field and I did
not plan to get back to that maintenance myself. These days, the 6 years
duration allows us to skip one upgrade if we're too late, but we still
prefer to upgrade every year. It's also true that before engaging in that
direction before the official statement was on the site, each time I
preferred to ask you how you felt about it, and I'm sure most major players
decide after checking with you as well. For sure knowing better early would
be much better but I understand your concerns about doing that job for free
and for no reason.

I initially got the impression that the extra resources you got were enough
to help you and to keep me away from annoying you again with pushing my late
branches, but it's certain that free time cannot extend forever. And You,
Ben, Sasha and me definitely know how painful it is to backport past 2.5-3
years, especially with security stuff that everyone expects but often is
the riskiest. While most of my time is spent in userland these days, I'm
still willing to help as I can if that saves from from having to takeover
an old kernel again on my week-ends.

One of my impression, which might or might not be shared, is that the
duration is more important than the frequency, which means that having
one extended kernel every two years would bring much more value than
maintaining all of them only 3 years, and would equally result in cutting
the effort in half: with 6 years, you still have 5 years if you upgrade
every two years.

> What would you do if you were in my situation?

I'd continue to ask for a shared effort for extended support, which
everyone benefits from.

Maybe one possibility would be to start gauging upfront around september
whether or not the end-of-year's LTS should be an extended LTS or not,
and if so, what's needed and who's willing to particpate. I suspect that
numerous companies have available resources to help but are not even
aware that they could help, and they're seeing something which works
extremely well so there's nothing to try to improve. And if nobody
provides resources, it means there's not enough interest to create yet
another extra LTS kernel so better save the effort.

Just my two cents,
Willy


Re: [PATCH 0/9] tools/nolibc: fix build issues on aarch64 after unistd cleanup

2021-01-21 Thread Willy Tarreau
On Thu, Jan 21, 2021 at 11:54:32AM -0800, Paul E. McKenney wrote:
> > > It would be great if this could be applied soon so that it's possible to
> > > use the rcutorture scripts without applying local hacks.
> > 
> > Makes sense. I was wondering, should we mark them for stable ? I don't
> > know if anyone relies on these tests to validate stable kernels in
> > fact.
> 
> I added Fixes tags that should make this happen, and they are now visible
> at -rcu branch "dev".  Could you please check them for me?

I've just rerun all previous tests from my history and all of them are
OK. Please note however that I only did the manual build test, not through
the whole kvm.sh script, but a diff shows that the involved files are byte
for byte identical to those that Valentin and Mark tested, so for me that's
OK as well.

By the way, thank you for having completed the commit messages, I hope you
didn't spend too much time on this.

Cheers,
Willy


Re: [PATCH 0/9] tools/nolibc: fix build issues on aarch64 after unistd cleanup

2021-01-21 Thread Willy Tarreau
On Thu, Jan 21, 2021 at 11:11:17AM +, Mark Rutland wrote:
> So FWIW:
> 
> Tested-by: Mark Rutland  [arm64]

Perfect, thanks! Paul, may I let you copy-paste the tested-by yourself ?
If you prefer I'm fine with resending a series to you, I just don't want
to needlessly spam you :-)

> It would be great if this could be applied soon so that it's possible to
> use the rcutorture scripts without applying local hacks.

Makes sense. I was wondering, should we mark them for stable ? I don't
know if anyone relies on these tests to validate stable kernels in
fact.

> Willy, thanks for sorting this out, especially so quickly!

You're welcome, and thanks to you for the detailed report and explanations.

Willy


Re: [PATCH 0/9] tools/nolibc: fix build issues on aarch64 after unistd cleanup

2021-01-21 Thread Willy Tarreau
On Thu, Jan 21, 2021 at 11:05:48AM +, Valentin Schneider wrote:
> This lets me run the following invocation without a hitch:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10 
> --configs "4*SRCU-P" --trust-make
> 
> where before I would get some errors building the initrd due to missing
> __NR_foo.
> 
> Tested-by: Valentin Schneider 

Thank you Valentin! Much appreciated!

Willy


[PATCH 10/9] tools/rcutorture: fix position of -lgcc in mkinitrd.sh

2021-01-20 Thread Willy Tarreau
I placed -lgcc poorly in the build options, resulting in possible build
failures that are typically encountered on ARM when uidiv is required;
-lgcc must be placed after the source files.

Signed-off-by: Willy Tarreau 
---

Sorry for this last one, I figured after my last documentation fix that
the incorrect command line did indeed slip into one command line that is
in mkinitrd.sh. It would break ARM builds if a divide is required by the
init code.

 tools/testing/selftests/rcutorture/bin/mkinitrd.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh 
b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index 38e424d..70d62fd 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -70,7 +70,7 @@ if echo -e "#if 
__x86_64__||__i386__||__i486__||__i586__||__i686__" \
# architecture supported by nolibc
 ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
-nostdlib -include ../../../../include/nolibc/nolibc.h \
-   -lgcc -s -static -Os -o init init.c
+   -s -static -Os -o init init.c -lgcc
 else
${CROSS_COMPILE}gcc -s -static -Os -o init init.c
 fi
-- 
2.9.0



[PATCH 7/9] tools/nolibc: remove incorrect definitions of __ARCH_WANT_*

2021-01-20 Thread Willy Tarreau
These definitions were added with support for aarch64 when some
syscall definitions were missing (__NR_dup2, __NR_fork,
__NR_getpgrp, etc), but these were actually wrong because these
syscalls do not exist on this platform, and definiting these
resulted in exposing invalid definitions.

The missing syscalls were since implemented based on the newer ones
(__NR_dup3,  __NR_clone, __NR_getpgid) so these incorrect defintions
are definitely not needed anymore.

Thanks to Mark Rutland for spotting this incorrect analysis and
explaining why it was wrong.

[This is nolibc's upstream commit 00b1b0d9b2a4]

Reported-by: Mark Rutland 
Link: https://lore.kernel.org/lkml/20210119153147.GA5083@paulmck-ThinkPad-P72
Cc: "Paul E. McKenney" 
Cc: valentin.schnei...@arm.com
Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 611d9d15899d..475d956ed1d6 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -81,14 +81,6 @@
  *
  */
 
-/* Some archs (at least aarch64) don't expose the regular syscalls anymore by
- * default, either because they have an "_at" replacement, or because there are
- * more modern alternatives. For now we'd rather still use them.
- */
-#define __ARCH_WANT_SYSCALL_NO_AT
-#define __ARCH_WANT_SYSCALL_NO_FLAGS
-#define __ARCH_WANT_SYSCALL_DEPRECATED
-
 #include 
 #include 
 #include 
-- 
2.28.0



[PATCH 3/9] tools/nolibc: make getpgrp() fall back to getpgid(0)

2021-01-20 Thread Willy Tarreau
getpgrp() is not implemented on arm64, getpgid(0) must be used
instead.
[This is nolibc's upstream commit 2379f25073f9]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 5fda4d844054..9209da89044a 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1544,10 +1544,16 @@ int sys_getdents64(int fd, struct linux_dirent64 *dirp, 
int count)
return my_syscall3(__NR_getdents64, fd, dirp, count);
 }
 
+static __attribute__((unused))
+pid_t sys_getpgid(pid_t pid)
+{
+   return my_syscall1(__NR_getpgid, pid);
+}
+
 static __attribute__((unused))
 pid_t sys_getpgrp(void)
 {
-   return my_syscall0(__NR_getpgrp);
+   return sys_getpgid(0);
 }
 
 static __attribute__((unused))
@@ -1950,6 +1956,18 @@ int getdents64(int fd, struct linux_dirent64 *dirp, int 
count)
return ret;
 }
 
+static __attribute__((unused))
+pid_t getpgid(pid_t pid)
+{
+   pid_t ret = sys_getpgid(pid);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+
 static __attribute__((unused))
 pid_t getpgrp(void)
 {
-- 
2.28.0



[PATCH 8/9] tools/nolibc: emit a detailed error when missing alternate syscall number definitions

2021-01-20 Thread Willy Tarreau
Some syscalls can be implemented from different __NR_* variants. For
example, sys_dup2() can be implemented based on __NR_dup3 or __NR_dup2.
In this case it's useful to produce an error message when neither are
detected, mentioning the supported alternatives. In case of trouble with
includes, it will help the user search for the right one (e.g __NR_dup3)
instead of just the fallback (__NR_dup2) which might simply not exist on
the platform.
[This is nolibc's upstream commit a21080d2ba41]

Suggested-by: Mark Rutland 
Link: https://lore.kernel.org/lkml/20210120145447.GC77728@C02TD0UTHF1T.local/
Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 52 ++-
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 475d956ed1d6..618acad6c932 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1446,8 +1446,10 @@ int sys_chmod(const char *path, mode_t mode)
 {
 #ifdef __NR_fchmodat
return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0);
-#else
+#elif defined(__NR_chmod)
return my_syscall2(__NR_chmod, path, mode);
+#else
+#error Neither __NR_fchmodat nor __NR_chmod defined, cannot implement 
sys_chmod()
 #endif
 }
 
@@ -1456,8 +1458,10 @@ int sys_chown(const char *path, uid_t owner, gid_t group)
 {
 #ifdef __NR_fchownat
return my_syscall5(__NR_fchownat, AT_FDCWD, path, owner, group, 0);
-#else
+#elif defined(__NR_chown)
return my_syscall3(__NR_chown, path, owner, group);
+#else
+#error Neither __NR_fchownat nor __NR_chown defined, cannot implement 
sys_chown()
 #endif
 }
 
@@ -1492,8 +1496,10 @@ int sys_dup2(int old, int new)
 {
 #ifdef __NR_dup3
return my_syscall3(__NR_dup3, old, new, 0);
-#else
+#elif defined(__NR_dup2)
return my_syscall2(__NR_dup2, old, new);
+#else
+#error Neither __NR_dup3 nor __NR_dup2 defined, cannot implement sys_dup2()
 #endif
 }
 
@@ -1512,8 +1518,10 @@ pid_t sys_fork(void)
 * will not use the rest with no other flag.
 */
return my_syscall5(__NR_clone, SIGCHLD, 0, 0, 0, 0);
-#else
+#elif defined(__NR_fork)
return my_syscall0(__NR_fork);
+#else
+#error Neither __NR_clone nor __NR_fork defined, cannot implement sys_fork()
 #endif
 }
 
@@ -1570,8 +1578,10 @@ int sys_link(const char *old, const char *new)
 {
 #ifdef __NR_linkat
return my_syscall5(__NR_linkat, AT_FDCWD, old, AT_FDCWD, new, 0);
-#else
+#elif defined(__NR_link)
return my_syscall2(__NR_link, old, new);
+#else
+#error Neither __NR_linkat nor __NR_link defined, cannot implement sys_link()
 #endif
 }
 
@@ -1586,8 +1596,10 @@ int sys_mkdir(const char *path, mode_t mode)
 {
 #ifdef __NR_mkdirat
return my_syscall3(__NR_mkdirat, AT_FDCWD, path, mode);
-#else
+#elif defined(__NR_mkdir)
return my_syscall2(__NR_mkdir, path, mode);
+#else
+#error Neither __NR_mkdirat nor __NR_mkdir defined, cannot implement 
sys_mkdir()
 #endif
 }
 
@@ -1596,8 +1608,10 @@ long sys_mknod(const char *path, mode_t mode, dev_t dev)
 {
 #ifdef __NR_mknodat
return my_syscall4(__NR_mknodat, AT_FDCWD, path, mode, dev);
-#else
+#elif defined(__NR_mknod)
return my_syscall3(__NR_mknod, path, mode, dev);
+#else
+#error Neither __NR_mknodat nor __NR_mknod defined, cannot implement 
sys_mknod()
 #endif
 }
 
@@ -1613,8 +1627,10 @@ int sys_open(const char *path, int flags, mode_t mode)
 {
 #ifdef __NR_openat
return my_syscall4(__NR_openat, AT_FDCWD, path, flags, mode);
-#else
+#elif defined(__NR_open)
return my_syscall3(__NR_open, path, flags, mode);
+#else
+#error Neither __NR_openat nor __NR_open defined, cannot implement sys_open()
 #endif
 }
 
@@ -1635,8 +1651,10 @@ int sys_poll(struct pollfd *fds, int nfds, int timeout)
t.tv_nsec = (timeout % 1000) * 100;
}
return my_syscall4(__NR_ppoll, fds, nfds, (timeout >= 0) ?  : NULL, 
NULL);
-#else
+#elif defined(__NR_poll)
return my_syscall3(__NR_poll, fds, nfds, timeout);
+#else
+#error Neither __NR_ppoll nor __NR_poll defined, cannot implement sys_poll()
 #endif
 }
 
@@ -1676,11 +1694,13 @@ int sys_select(int nfds, fd_set *rfds, fd_set *wfds, 
fd_set *efds, struct timeva
t.tv_nsec = timeout->tv_usec * 1000;
}
return my_syscall6(__NR_pselect6, nfds, rfds, wfds, efds, timeout ?  
: NULL, NULL);
-#else
+#elif defined(__NR__newselect) || defined(__NR_select)
 #ifndef __NR__newselect
 #define __NR__newselect __NR_select
 #endif
return my_syscall5(__NR__newselect, nfds, rfds, wfds, efds, timeout);
+#else
+#error None of __NR_select, __NR_pselect6, nor __NR__newselect defined, cannot 
implement sys_select()
 #endif
 }
 
@@ -1705,8 +1725,10 @@ int sys_stat(const char *path, struct stat *buf)
 #ifdef __NR_newfstatat
/* only solution for arm64 */
ret = my_syscall4(__NR_newfstatat, AT_FDCWD, path, , 0);
-#else
+#elif defined(__N

[PATCH 9/9] tools/nolibc: fix position of -lgcc in the documented example

2021-01-20 Thread Willy Tarreau
The documentation header in the file provides a command line example
to explain how to build, but it places -lgcc before the source,
which usually fails with libgcc.a (e.g. on ARM when uidiv is
needed). Let's fix this before it leaks into makefiles.
[This is nolibc's upstream commit b5e282089223]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 618acad6c932..8b7a9830dd22 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -71,7 +71,7 @@
  *
  * A simple static executable may be built this way :
  *  $ gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
- *-static -include nolibc.h -lgcc -o hello hello.c
+ *-static -include nolibc.h -o hello hello.c -lgcc
  *
  * A very useful calling convention table may be found here :
  *  http://man7.org/linux/man-pages/man2/syscall.2.html
-- 
2.28.0



[PATCH 5/9] tools/nolibc: implement poll() based on ppoll()

2021-01-20 Thread Willy Tarreau
Some architectures like arm64 do not implement poll() and have to
use ppoll() instead.
[This is nolibc's upstream commit 800f75c13ede]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index fdd5524e0e54..833693faf53c 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1652,7 +1652,17 @@ int sys_pivot_root(const char *new, const char *old)
 static __attribute__((unused))
 int sys_poll(struct pollfd *fds, int nfds, int timeout)
 {
+#if defined(__NR_ppoll)
+   struct timespec t;
+
+   if (timeout >= 0) {
+   t.tv_sec  = timeout / 1000;
+   t.tv_nsec = (timeout % 1000) * 100;
+   }
+   return my_syscall4(__NR_ppoll, fds, nfds, (timeout >= 0) ?  : NULL, 
NULL);
+#else
return my_syscall3(__NR_poll, fds, nfds, timeout);
+#endif
 }
 
 static __attribute__((unused))
-- 
2.28.0



[PATCH 4/9] tools/nolibc: implement fork() based on clone()

2021-01-20 Thread Willy Tarreau
Some archs such as arm64 do not have fork() and have to use clone()
instead so let's make fork() always use clone() when available. This
requires to include signal.h to get the definition of SIGCHLD.
[This is nolibc's upstream commit d2dc42fd6149]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 9209da89044a..fdd5524e0e54 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -271,6 +271,8 @@ struct stat {
 #define WEXITSTATUS(status)   (((status) & 0xff00) >> 8)
 #define WIFEXITED(status) (((status) & 0x7f) == 0)
 
+/* for SIGCHLD */
+#include 
 
 /* Below comes the architecture-specific code. For each architecture, we have
  * the syscall declarations and the _start code definition. This is the only
@@ -1529,7 +1531,15 @@ int sys_execve(const char *filename, char *const argv[], 
char *const envp[])
 static __attribute__((unused))
 pid_t sys_fork(void)
 {
+#ifdef __NR_clone
+   /* note: some archs only have clone() and not fork(). Different archs
+* have a different API, but most archs have the flags on first arg and
+* will not use the rest with no other flag.
+*/
+   return my_syscall5(__NR_clone, SIGCHLD, 0, 0, 0, 0);
+#else
return my_syscall0(__NR_fork);
+#endif
 }
 
 static __attribute__((unused))
-- 
2.28.0



[PATCH 2/9] tools/nolibc: make dup2() rely on dup3() when available

2021-01-20 Thread Willy Tarreau
A recent boot failure on 5.4-rc3 on arm64 revealed that sys_dup2()
is not available and that only sys_dup3() is implemented, thus let's
detect this and fall back to sys_dup3() when available.
[This is nolibc's upstream commit fd5272ec2c66]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 3115c6467d10..5fda4d844054 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1502,10 +1502,22 @@ int sys_dup(int fd)
return my_syscall1(__NR_dup, fd);
 }
 
+#ifdef __NR_dup3
+static __attribute__((unused))
+int sys_dup3(int old, int new, int flags)
+{
+   return my_syscall3(__NR_dup3, old, new, flags);
+}
+#endif
+
 static __attribute__((unused))
 int sys_dup2(int old, int new)
 {
+#ifdef __NR_dup3
+   return my_syscall3(__NR_dup3, old, new, 0);
+#else
return my_syscall2(__NR_dup2, old, new);
+#endif
 }
 
 static __attribute__((unused))
@@ -1876,6 +1888,20 @@ int dup2(int old, int new)
return ret;
 }
 
+#ifdef __NR_dup3
+static __attribute__((unused))
+int dup3(int old, int new, int flags)
+{
+   int ret = sys_dup3(old, new, flags);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+#endif
+
 static __attribute__((unused))
 int execve(const char *filename, char *const argv[], char *const envp[])
 {
-- 
2.28.0



[PATCH 1/9] tools/nolibc: the definition dup() was missing

2021-01-20 Thread Willy Tarreau
dup() was mistakenly forgotten while sys_dup() was defined.
[This is nolibc's upstream commit 47cc42a79c92]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index e61d36cd4e50..3115c6467d10 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -1852,6 +1852,18 @@ int close(int fd)
return ret;
 }
 
+static __attribute__((unused))
+int dup(int fd)
+{
+   int ret = sys_dup(fd);
+
+   if (ret < 0) {
+   SET_ERRNO(-ret);
+   ret = -1;
+   }
+   return ret;
+}
+
 static __attribute__((unused))
 int dup2(int old, int new)
 {
-- 
2.28.0



[PATCH 0/9] tools/nolibc: fix build issues on aarch64 after unistd cleanup

2021-01-20 Thread Willy Tarreau
Hi Paul,

as per the recent discussion with Mark, I've updated the nolibc header to
reflect latest upstream which is needed to build on arm64, and I performed
the few cleanups that Mark rightfully suggested.

The following patches were taken from the upstream code and this time I
carefully copied the original commit IDs in hope not to miss such fixes
anymore in the future.

I've build-tested these on x86_64, i586, arm(v5 & v7), arm64, mips and
mipsel, using compilers ranging from gcc 3.4 to gcc 9.3 so I think we're
good for these archs now.

Just let me know if you prefer a pull request, as I can do that as well.

Thanks!
Willy


Willy Tarreau (9):
  tools/nolibc: the definition dup() was missing
  tools/nolibc: make dup2() rely on dup3() when available
  tools/nolibc: make getpgrp() fall back to getpgid(0)
  tools/nolibc: implement fork() based on clone()
  tools/nolibc: implement poll() based on ppoll()
  tools/nolibc: get timeval, timespec and timezone from linux/time.h
  tools/nolibc: remove incorrect definitions of __ARCH_WANT_*
  tools/nolibc: emit a detailed error when missing alternate syscall
number definitions
  tools/nolibc: fix position of -lgcc in the documented example

 tools/include/nolibc/nolibc.h | 153 +-
 1 file changed, 115 insertions(+), 38 deletions(-)

-- 
2.28.0



[PATCH 6/9] tools/nolibc: get timeval, timespec and timezone from linux/time.h

2021-01-20 Thread Willy Tarreau
These ones conflict with linux/time.h when building, so better take
them from there directly.
[This is nolibc's upstream commit dc45f5426b0c]

Signed-off-by: Willy Tarreau 
---
 tools/include/nolibc/nolibc.h | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 833693faf53c..611d9d15899d 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define NOLIBC
 
@@ -152,24 +153,6 @@ struct pollfd {
short int revents;
 };
 
-/* for select() */
-struct timeval {
-   longtv_sec;
-   longtv_usec;
-};
-
-/* for pselect() */
-struct timespec {
-   longtv_sec;
-   longtv_nsec;
-};
-
-/* for gettimeofday() */
-struct timezone {
-   int tz_minuteswest;
-   int tz_dsttime;
-};
-
 /* for getdents64() */
 struct linux_dirent64 {
uint64_t   d_ino;
-- 
2.28.0



Re: Splicing to/from a tty

2021-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2021 at 07:38:38PM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 5:45 PM Al Viro  wrote:
> >
> > splice() triggers an error for seekable destination with O_APPEND and
> > with NULL off_out.
> 
> Ok, that's just broken.
> 
> > Same for splice() to socket with
> > fcntl(sock_fd, F_SETFL, O_APPEND);
> > done first.
> 
> Same.
> 
> As long as you don't pass a position pointer, I think both should just work.
> 
> Not that I imagine it matters for a lot of people..

I think that most users of splice() on sockets got used to falling back
to recv/send on splice failure due to various cases not being supported
historically (UNIX family sockets immediately come to my mind but I seem
to remember other combinations). Thus I guess that most users of splice()
detect that it doesn't work either due to lower than expected performance
or while running strace.

Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2021 at 04:02:23PM +0100, Willy Tarreau wrote:
> > ... and getting rid of the ARCH_WANT_* definitions (which are never
> > legitimate for userspace to set). That way, there's no risk that we
> > accidentally use a bogus syscall number in future. Where the kernel does
> > implement a syscall, it will have done whatever is necessary to expose
> > the corresponding __NR_ to userspace without userspace needing
> > to define anything.
> 
> I'm really willing to get rid of that crap, I hated it, I vaguely
> remember having gone through layers of glibc indirections when using
> the pre-processed asm/* and found this to be the only way to expose
> some of them. The fact that it's not needed for you is pretty much
> encouraging. I'll just retry on an older libc I've used a lot (2.18)
> to make sure I didn't overlook anything.

I've now retested and can confirm that the only reason I included them
by then was to have access to these (wrong) __NR_* definitions. Now with
the fixed syscalls these ones are not needed anymore and I indeed only
see the correct definitions, so I've removed these bogus definitions.

I've completed my cleanup and tests, I'll send an updated patch set
later today after I've carefully rediffed everything to synchronise
the in-kernel version with the out-of-tree one.

Cheers,
Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2021 at 02:54:47PM +, Mark Rutland wrote:
> On Wed, Jan 20, 2021 at 03:25:00PM +0100, Willy Tarreau wrote:
> > On Wed, Jan 20, 2021 at 01:45:11PM +, Mark Rutland wrote:
> > > > Ah that's very interesting indeed because actually these ones should
> > > > only be used when __NR_dup3 or __NR_clone are not defined. Thus I wanted
> > > > to check the definitions that were reported in your error output but
> > > > actually what was needed was to figure whether the correct ones were
> > > > present, and they are, here on my machine (and yes I agree that in this
> > > > case the dup2/fork above are bofus):
> > > 
> > > The issue is that even if a function is unused, the compiler still has
> > > to parse and compile the code, so where __NR_dup2 is not defined, we'll
> > > get a build error for:
> > > 
> > > static __attribute__((unused))
> > > int sys_dup2(int old, int new)
> > > {
> > >return my_syscall2(__NR_dup2, old, new);
> > > }
> > 
> > For sure but this is supposed to be used only when __NR_dup3 is not
> > defined. Ah now I understand where my mistake is: after it built
> > successfully for me I inspected the most recent tree which has these
> > in place. Sorry for being stupid!
> > 
> > In my local tree it's defined like this:
> > 
> >  static __attribute__((unused))
> >  int sys_dup2(int old, int new)
> >  {
> > #ifdef __NR_dup3
> >return my_syscall3(__NR_dup3, old, new, 0);
> > #else
> >return my_syscall2(__NR_dup2, old, new);
> > #endif
> >  }
> 
> Ah, much better!
> 
> For robustness, I think it would be worth doing:
> 
> static __attribute__((unused))
> int sys_dup2(int old, int new)
> {
> #if defined(__NR_dup3)
>   return my_syscall3(__NR_dup3, old, new, 0);
> #elif defined(__NR_dup2)
>   return my_syscall2(__NR_dup2, old, new);
> #else
> #error Cannot implement dup2
> #endif
> }

Yep I'm fine with this. We'll probably adjust between return -ENOSYS
or #error depending on the "criticity" of the syscalls. E.g. we can
miss poll() for plenty of init stuff band be done with ENOSYS but
missing dup is quickly going to cause trouble.

> ... and getting rid of the ARCH_WANT_* definitions (which are never
> legitimate for userspace to set). That way, there's no risk that we
> accidentally use a bogus syscall number in future. Where the kernel does
> implement a syscall, it will have done whatever is necessary to expose
> the corresponding __NR_ to userspace without userspace needing
> to define anything.

I'm really willing to get rid of that crap, I hated it, I vaguely
remember having gone through layers of glibc indirections when using
the pre-processed asm/* and found this to be the only way to expose
some of them. The fact that it's not needed for you is pretty much
encouraging. I'll just retry on an older libc I've used a lot (2.18)
to make sure I didn't overlook anything.

> > However I'm interested in knowing if the latest version fixes everything
> > for you or not :
> > 
> >   https://raw.githubusercontent.com/wtarreau/nolibc/master/nolibc.h
> 
> I replaced my in-tree copy with that, and it built without issues, and
> the tests ran correctly.

Perfect, thanks for the quick feedback!

> > OK thanks! I will retry here without setting those. I'm pretty sure I
> > needed these ones to find the __NR_* values but it's possible that it
> > was before I had the alternate ones and that these are finally not
> > nedeed at all (which I would prefer as these are ugly).
> 
> Great! I reckon they're not needed at all so long as usage of each
> __NR_* is suitably guarded (such as above).
> 
> If you do spot issues when removing ARCH_WANT_*, I'm happy to take a
> look, and/or to test patches handling any fallout.

Many thanks. I'm pretty confident that it's OK without them based on
your feedback but if need I'll ask you for some advices.

cheers,
Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-20 Thread Willy Tarreau
Hi Mark,

On Wed, Jan 20, 2021 at 12:07:25PM +, Mark Rutland wrote:
> On Tue, Jan 19, 2021 at 06:43:58PM +0100, Willy Tarreau wrote:
> > On Tue, Jan 19, 2021 at 06:16:37PM +0100, Willy Tarreau wrote:
> > Given that you used a native compiler we can't suspect an issue with a
> > bare-metal compiler possibly affecting how kernel headers are passed
> > there. But nevertheless, I'd still not disregard the possibility that
> > the headers found under "linux/" are picked from the libc which for
> > whatever reason would be missing a lot of them.
> 
> I think the actual issue here is a misapprehension in nolibc.h, which
> started blowing up due to a refactoring in asm/unistd.h.
> 
> In nolibc.h, we do:
> 
> | /* Some archs (at least aarch64) don't expose the regular syscalls anymore 
> by
> |  * default, either because they have an "_at" replacement, or because there 
> are
> |  * more modern alternatives. For now we'd rather still use them.
> |  */
> | #define __ARCH_WANT_SYSCALL_NO_AT
> | #define __ARCH_WANT_SYSCALL_NO_FLAGS
> | #define __ARCH_WANT_SYSCALL_DEPRECATED
> 
> ... but this isn't quite right -- it's not that the syscalls aren't
> exposed by default, but rather that these syscall numbers are not valid
> for architectures which do not define the corresponding __ARCH_WANT_*
> flags. Architectures without those have never implemented the syscalls,
> and would have returned -ENOSYS for the unrecognized syscall numbers,
> but the numbers could be allocated to (distinct) syscalls in future.
> 
> Since commit:
> 
>   a0673fdbcd421052 ("asm-generic: clean up asm/unistd.h")
> 
> ... those definitions got pulled out of , and
> hence it's no longer possible to accidentally get those where a
> userspace header defines __ARCH_WANT_* in an architecture where they
> don't exist (e.g. arm64).

I didn't remember the details behind these definitions, I've found
this in the related commit message:

Some syscall numbers are not exported
by default unless a few macros are defined before including unistd.h :

  #define __ARCH_WANT_SYSCALL_NO_AT
  #define __ARCH_WANT_SYSCALL_NO_FLAGS
  #define __ARCH_WANT_SYSCALL_DEPRECATED

So it seems I faced difficulties getting the __NR_* entries and only
could get them using such definitions :-/

Did you figure the correct way to get __NR_* defined on your machine or
should I search ?

> It seems that the headers on my Debian 10.7 system were generated after
> that commit, whereas yours were generated before that.

It's very possible indeed.

> > We've seen that __NR_fork or __NR_dup2 for example were missing in your
> > output, on my native machine I can see them, so that could give us a clue
> > about the root cause of the issue:
> > 
> >   $ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include 
> > nolibc.h -lgcc -s -static -E -dM init-fail.c | egrep '__NR_(fork|dup2)'
> >   #define __NR_dup2 1041
> >   #define __NR_syscalls (__NR_fork+1)
> >   #define __NR_fork 1079
> 
> As above, these are bogus for arm64. There is no syscall number for dup2
> or fork, and __NR_syscalls is currently only 442.

Ah that's very interesting indeed because actually these ones should
only be used when __NR_dup3 or __NR_clone are not defined. Thus I wanted
to check the definitions that were reported in your error output but
actually what was needed was to figure whether the correct ones were
present, and they are, here on my machine (and yes I agree that in this
case the dup2/fork above are bofus):

  ubuntu@ubuntu:~$ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib 
-include nolibc.h -lgcc -s -static -E -dM init-fail.c | egrep 
'__NR_(clone|dup3)'
  #define __NR_clone 220
  #define __NR_dup3 24

Do you have these ones with your more recent includes ? Or are these wrong
again ? Again I'd be surprised given that I didn't invent them and that my
systems boot fine using these.

> I think the right thing to do is to have nolibc.h detect which syscalls
> are implemented, and to not define __ARCH_WANT_*.

Actually that's what was attempted by already focusing on ifdefs but
without *any* __NR_* we can't even hope to call a syscall and check for
ENOSYS for example. So the code really tries to figure which is the right
__NR_ value for a given syscall, and a quick check at my userland code
shows that I'm already using dup2() and fork() as defined from dup3()
and clone().

Thanks!
Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2021 at 01:45:11PM +, Mark Rutland wrote:
> > Ah that's very interesting indeed because actually these ones should
> > only be used when __NR_dup3 or __NR_clone are not defined. Thus I wanted
> > to check the definitions that were reported in your error output but
> > actually what was needed was to figure whether the correct ones were
> > present, and they are, here on my machine (and yes I agree that in this
> > case the dup2/fork above are bofus):
> 
> The issue is that even if a function is unused, the compiler still has
> to parse and compile the code, so where __NR_dup2 is not defined, we'll
> get a build error for:
> 
> static __attribute__((unused))
> int sys_dup2(int old, int new)
> {
>return my_syscall2(__NR_dup2, old, new);
> }

For sure but this is supposed to be used only when __NR_dup3 is not
defined. Ah now I understand where my mistake is: after it built
successfully for me I inspected the most recent tree which has these
in place. Sorry for being stupid!

In my local tree it's defined like this:

 static __attribute__((unused))
 int sys_dup2(int old, int new)
 {
#ifdef __NR_dup3
   return my_syscall3(__NR_dup3, old, new, 0);
#else
   return my_syscall2(__NR_dup2, old, new);
#endif
 }

I guess I fixed it in a hurry and forgot to upstream it. I hate doing
that :-(

I'm going to push an update then. On a quick glance I'm seeing that I
addressed dup2() using dup3(), fork() using clone(), getpgrp() using
getpgid(), and poll() using ppoll().

> ... we can deal with that by always returning -ENOSYS for unimplemented
> syscalls, e.g.
> 
> static __attribute__((unused))
> int sys_dup2(int old, int new)
> {
> #ifdef __NR_dup2
>return my_syscall2(__NR_dup2, old, new);
> #else
>return -ENOSYS;
> #endif
> }

I didn't want to do that because that would break userland which needs
dup2(), hence the mapping to dup3 instead:

  static __attribute__((unused))
  int sys_dup2(int old, int new)
  {
  #ifdef __NR_dup3
  return my_syscall3(__NR_dup3, old, new, 0);
  #else
  return my_syscall2(__NR_dup2, old, new);
  #endif
  }

> I can spin a patch fixing up all the relevant syscalls, if you'd like?

I shouldn't need since these are already fixed in my tree. At first glance
the equivalent of the following commits is missing from the kernel version:

   
https://github.com/wtarreau/nolibc/commit/2379f25073f906d0bad22c48566fcffee8fc9cde
   
https://github.com/wtarreau/nolibc/commit/fd5272ec2c66e6f5b195d41c9c8978f58eb74668
   
https://github.com/wtarreau/nolibc/commit/47cc42a79c92305f4f8bc02fb28628a4fdd63eaa
   
https://github.com/wtarreau/nolibc/commit/d2dc42fd614991c741dfdc8b984864fa3cf64c8e
   
https://github.com/wtarreau/nolibc/commit/800f75c13ede49097325f82a4cca3515c44a7939

However I'm interested in knowing if the latest version fixes everything
for you or not :

  https://raw.githubusercontent.com/wtarreau/nolibc/master/nolibc.h

> [...]
> 
> >   ubuntu@ubuntu:~$ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib 
> > -include nolibc.h -lgcc -s -static -E -dM init-fail.c | egrep 
> > '__NR_(clone|dup3)'
> >   #define __NR_clone 220
> >   #define __NR_dup3 24
> > 
> > Do you have these ones with your more recent includes ? Or are these wrong
> > again ?
> 
> Those are correct (and all the syscall numbers in unistd.h should be
> correct so long as you don't erroneously set the __ARCH_WANT_* flags):
> 
> [mark@gravadlaks:~/src/linux]% gcc -fno-asynchronous-unwind-tables -fno-ident 
> -nostdlib -include tools/include/nolibc/nolibc.h -lgcc -s -static -E -dM 
> init-fail.c | egrep '__NR_(clone|dup3)'
> #define __NR_clone 220
> #define __NR_dup3 24

OK thanks! I will retry here without setting those. I'm pretty sure I
needed these ones to find the __NR_* values but it's possible that it
was before I had the alternate ones and that these are finally not
nedeed at all (which I would prefer as these are ugly).

> There's still some latent issue when using nolibc (compared to using
> glibc) where the init process never seems to exit, but that looks to be
> orthogonal to the syscall numbering issue -- I'm currently digging into
> that.

OK! Usually for me it does as in my preinit (which uses nolibc), if I
exit I instantly get a kernel panic. In addition if I launch it after
boot, it immediately exits and shows no issue. But maybe you're observing
an artefact related to something else (process session, opened FD or
anything else maybe).

I'll send an update ASAP, likely this evening.

Many thanks for digging into this, and sorry for this mess, as I was
absolutely certain it was up to date :-(

Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-19 Thread Willy Tarreau
On Tue, Jan 19, 2021 at 06:16:37PM +0100, Willy Tarreau wrote:
> > | ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
> > | -nostdlib -include ../../../../include/nolibc/nolibc.h \
> > | -lgcc -s -static -Os -o init init.c
> 
> OK I'll retry this, thank you!

For me on my x86 PC it worked using a cross-compiler:

  $ 
/f/tc/gcc-linaro-6.4.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
 -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include 
tools/include/nolibc/nolibc.h -lgcc -s -static -Os -o init-fail init-fail.c 
  $ file init-fail
  init-fail: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), 
statically linked, BuildID[sha1]=1088f8fad85b7b8a7f00eef4918d24c4c4fdaadf, 
stripped

Given that you used a native compiler we can't suspect an issue with a
bare-metal compiler possibly affecting how kernel headers are passed
there. But nevertheless, I'd still not disregard the possibility that
the headers found under "linux/" are picked from the libc which for
whatever reason would be missing a lot of them.

I've now tested natively on a local rpi4b onto which I copied the files
(not enough I/O BW on the SD card to clone a kernel there). Tried:

  $ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include nolibc.h 
-lgcc -s -static -Os -o init-fail init-fail.c
  specs.
  COLLECT_GCC=gcc
  COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/7/lto-wrapper
  Target: aarch64-linux-gnu
  Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-libquadmath --disable-libquadmath-support --enable-plugin 
--enable-default-pie --with-system-zlib --enable-multiarch 
--enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release 
--build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
  Thread model: posix
  gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 

Could you please check the output of this from your intermediary "init.c"
file:

   $ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include 
tools/include/nolibc/nolibc.h -lgcc -s -static -E  init.c | grep '^#'

It might give us a clue about where it's finding its includes. And possibly
the list of __NR_ entries reported here as well:

   $ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include 
tools/include/nolibc/nolibc.h -lgcc -s -static -E -dM init.c | grep __NR_

We've seen that __NR_fork or __NR_dup2 for example were missing in your
output, on my native machine I can see them, so that could give us a clue
about the root cause of the issue:

  $ gcc -fno-asynchronous-unwind-tables -fno-ident -nostdlib -include nolibc.h 
-lgcc -s -static -E -dM init-fail.c | egrep '__NR_(fork|dup2)'
  #define __NR_dup2 1041
  #define __NR_syscalls (__NR_fork+1)
  #define __NR_fork 1079

I could easily imagine a missing "linux" link or entry somewhere in the
default includes path, but here since there's no such error it rather looks
like an incorrect file which is a bit more concerning. I purposely avoided
to maintain my own definitions for __NR_* let's hope we won't need to do
that :-/

Thanks,
Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-19 Thread Willy Tarreau
Hi Mark,

On Tue, Jan 19, 2021 at 05:02:38PM +, Mark Rutland wrote:
> > I can't spot from the report above the original C file that was attempted
> > to be built, it makes me think we tried to compile directly the .h file.
> 
> That was the inline snippet in
> tools/testing/selftests/rcutorture/bin/mkinitrd.sh:
> 
> | #ifndef NOLIBC
> | #include 
> | #include 
> | #endif
> | 
> | volatile unsigned long delaycount;
> | 
> | int main(int argc, int argv[])
> | {
> | int i;
> | struct timeval tv;
> | struct timeval tvb;
> | 
> | for (;;) {
> | sleep(1);
> | /* Need some userspace time. */
> | if (gettimeofday(, NULL))
> | continue;
> | do {
> | for (i = 0; i < 1000 * 100; i++)
> | delaycount = i * i;
> | if (gettimeofday(, NULL))
> | break;
> | tv.tv_sec -= tvb.tv_sec;
> | if (tv.tv_sec > 1)
> | break;
> | tv.tv_usec += tv.tv_sec * 1000 * 1000;
> | tv.tv_usec -= tvb.tv_usec;
> | } while (tv.tv_usec < 1000);
> | }
> | return 0;
> | }
> 
> ... which gets written to a file called init.c, and then built with:
> 
> | ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
> | -nostdlib -include ../../../../include/nolibc/nolibc.h \
> | -lgcc -s -static -Os -o init init.c

OK I'll retry this, thank you!

> I was building natively on an arm64 box:
> 
> | ./tools/testing/selftests/rcutorture/bin/kvm.sh \
> | --cpus 250 --trust-make --configs "TREE03" \
> | --kmake-arg "CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64"
> 
> > Having it run through sh -x would help me try to locate the root cause or
> > possibly even attempt to reproduce it.
> 
> I ran with sh -x, but it didn't log the compiler invocation; hopefully
> the above is sufficient?

I guess so, yes. I'm pretty sure I'll come back with new questions
soon :-)

Thanks,
Willy


Re: rcutorture initrd/nolibc build on ARMv8?

2021-01-19 Thread Willy Tarreau
Hi Paul,

On Tue, Jan 19, 2021 at 07:31:47AM -0800, Paul E. McKenney wrote:
> Hello, Willy,
> 
> Some people are having trouble running rcutorture on ARMv8.  They
> get things like this from the nolibc build of initrd:
> 
> https://paste.debian.net/1181762/
> 
> The nolibc.h file says this:
> 
> /* Some archs (at least aarch64) don't expose the regular syscalls anymore by
>  * default, either because they have an "_at" replacement, or because there 
> are
>  * more modern alternatives. For now we'd rather still use them.
>  */
> 
> Are these build failures expected behavior on ARMv8?

No, I don't think so. I'm regularly building my own init using this,
and it works on various platforms including aarch64, while it makes
use of a number of such syscalls.

>From what I'm seeing, this seems to happen each time in such a construct:

#if defined(__NR_new_name)
 use __NR_new_name
#else
 use __NR_old_name
#endif

with the error appearing on old_name. So I guess that we're rather facing a
case where a number of such __NR_* entries are not defined because presumably
one include file might be missing (probably from a recent change of headers
dependency).

I can't spot from the report above the original C file that was attempted
to be built, it makes me think we tried to compile directly the .h file.

Having it run through sh -x would help me try to locate the root cause or
possibly even attempt to reproduce it.

Thanks,
Willy


Re: Old platforms: bring out your dead

2021-01-09 Thread Willy Tarreau
On Sat, Jan 09, 2021 at 10:52:53PM +0100, Arnd Bergmann wrote:
(... i486 ...)
> As with the other older platforms, the main question to ask is:
> Are there users that are better off running a future LTS kernel on this
> hardware than the v5.10.y version or something older?

I think this is the most important part of the question. Because the
possible use case I've described actually doesn't correspond to a
"prod" machine but to a machine that's powered on every 5 years for
some old data recovery. In such a case users just start with an old
system (possibly the one that's still on them if present), and this
doesn't warrant an up-to-date OS.

Moreover, just as I experienced when maintaining 2.4, there's a point
where support for old stuff starts to break again by lack of testing.
And just because of this, users shouldn't always expect to see their
old machines boot fine on a recent kernel. Sometimes there may even be
difficulties setting up a compatible toolchain.

So actually the question shouldn't be "does anyone want such old
machines to still be supported" but "does anyone *need* them to be
supported". And I suspect that for most of them the response is "no",
it's just a convenience.

Just my two cents,
Willy


Re: Old platforms: bring out your dead

2021-01-08 Thread Willy Tarreau
On Fri, Jan 08, 2021 at 11:55:06PM +0100, Arnd Bergmann wrote:
> * 80486SX/DX: 80386 CPUs were dropped in 2012, and there are
>   indications that 486 have no users either on recent kernels.
>   There is still the Vortex86 family of SoCs, and the oldest of those were
>   486SX-class, but all the modern ones are 586-class.

These also are the last generation of fanless x86 boards with 100% compatible
controllers, that some people have probably kept around because these don't
age much and have plenty of connectivity. I've used an old one a few times
to plug in an old floppy drive, ISA SCSI controllers to access an old tape
drive and a few such things. That doesn't mean that it's a good justification
not to remove them, what I rather mean is that *if* there is no benefit
in dropping them maybe we can keep them. On the other hand, good luck for
running a modern OS on these, when 16MB-32MB RAM was about the maximum that
was commonly found by then (though if people kept them around that's probably
because they were well equipped, like that 64MB 386DX I'm having :-)).

Willy


Re: drivers/mtd/tests/subpagetest.c:426:1: error: could not split insn

2020-12-15 Thread Willy Tarreau
Hi,

On Tue, Dec 15, 2020 at 11:05:28PM +0800, kernel test robot wrote:
> Hi Willy,
> 
> FYI, the error/warning still remains.
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   148842c98a24e508aecb929718818fbf4c2a6ff3
> commit: 3744741adab6d9195551ce30e65e726c7a408421 random32: add noise from 
> network and scheduling activity
(...)

not sure why I'm assigned this, but the root cause is a compiler bug:
>drivers/mtd/tests/subpagetest.c: In function 'mtd_subpagetest_init':
> >> drivers/mtd/tests/subpagetest.c:426:1: error: could not split insn
>  426 | }
>  | ^
>(insn:TI 453 2652 455 (set (reg/v:SI 3 a3 [orig:304 a ] [304])
>(xor:SI (reg:SI 1 a1 [orig:717 net_rand_noise ] [717])
>(const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])
>(const_int 12 [0xc]) 
> "include/linux/prandom.h":66:4 152 {cskyv2_xorsi3}
> (expr_list:REG_DEAD (reg:SI 1 a1 [orig:717 net_rand_noise ] [717])
>(nil)))
>during RTL pass: final
>drivers/mtd/tests/subpagetest.c:426:1: internal compiler error: in 
> final_scan_insn_1, at final.c:3074
^^^

>0x510da0 _fatal_insn(char const*, rtx_def const*, char const*, int, char 
> const*)
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/rtl-error.c:108
>0x503d22 final_scan_insn_1
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/final.c:3074
>0x73f8bf final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/final.c:3153
>0x73fbac final_1
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/final.c:2021
>0x740618 rest_of_handle_final
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/final.c:4659
>0x740618 execute
>   /tmp/build-crosstools-xh-9.3.0-2.34/gcc/gcc-9.3.0/gcc/final.c:4737
>Please submit a full bug report,
>with preprocessed source if appropriate.
>Please include the complete backtrace with any bug report.
>See  for instructions.
  ^

That's totally out of my scope. I suspect it might have broken a bisect
operation.

Regards,
Willy


Re:

2020-12-03 Thread Willy Tarreau
On Thu, Dec 03, 2020 at 10:46:25AM -0800, Yury Norov wrote:
> Yun, could you please stop top-posting and excessive trimming in the thread?

And re-configure the mail agent to make the "Subject" field appear and
fill it.

Willy


Re: Linux 5.9.8

2020-11-11 Thread Willy Tarreau
On Wed, Nov 11, 2020 at 03:36:04PM +0100, Christian Hesse wrote:
> Greg Kroah-Hartman  on Tue, 2020/11/10 21:56:
> > I'm announcing the release of the 5.9.8 kernel.
> 
> This is not yet linked on kernel.org - same goes for lts version 5.4.77.
> I guess this is not by intention, no?

Strange, both are indeed not listed on kernel.org but are available in
the download directory and can be retrieved from there:

https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/
https://cdn.kernel.org/pub/linux/kernel/v5.x/

Maybe it's just the bot that updates the front page that was stopped.

Willy


[PATCH 4.14] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 4.14 -- various context adjustments; timer API change]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 462 +---
 4 files changed, 317 insertions(+), 189 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b202f66..868d262 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1246,7 +1246,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d4bc272..99f885d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1596,13 +1596,6 @@ void update_process_times(int user_tick)
scheduler_tick();
if (IS_ENABLED(CONFIG_POSIX_TIMERS))
ru

[PATCH 5.4] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 5.4 -- no tracepoint there]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 462 +---
 4 files changed, 317 insertions(+), 189 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 75a8f7f..2c29f83 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1330,7 +1330,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index a3ae244..87fa73c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1743,13 +1743,6 @@ void update_process_times(int user_tick)
scheduler_tick();
if (IS_ENABLED(CONFIG_POSIX_TIMERS))
run_posix_cpu_timers();
-
-

[PATCH 4.4] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 4.4 -- no latent_entropy, drop prandom_reseed_late]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   2 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 463 +---
 4 files changed, 317 insertions(+), 191 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 7bb1e42..08d96d5 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -678,7 +678,6 @@ static void credit_entropy_bits(struct entropy_store *r, 
int nbits)
r->initialized = 1;
r->entropy_total = 0;
if (r == _pool) {
-   prandom_reseed_late();
process_random_ready_list();
wake_up_all(_init_wait);
pr_notice("random: %s pool is initialized\n", r->name);
@@ -923,7 +922,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if ((fast_pool->count < 64) &&
!time_after(now, fast_pool->last + HZ))
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *sta

[PATCH 4.9] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 4.9 -- various context adjustments; timer API change]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 462 +---
 4 files changed, 317 insertions(+), 189 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c417aa1..4cbc731 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1211,7 +1211,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d2e4698..c9325a1 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1636,13 +1636,6 @@ void update_process_times(int user_tick)
 #endif
scheduler_tick();
run_posix_cpu_timers(p);
-
-   /* The current 

[PATCH 4.19] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 4.19 -- various context adjustments]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 462 +---
 4 files changed, 317 insertions(+), 189 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 80dedec..98925d4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1257,7 +1257,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 61e41ea..a6e88d9 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1655,13 +1655,6 @@ void update_process_times(int user_tick)
scheduler_tick();
if (IS_ENABLED(CONFIG_POSIX_TIMERS))
run_posix_cpu_timers(p);

[PATCH 5.8] random32: make prandom_u32() output unpredictable

2020-11-10 Thread Willy Tarreau
From: George Spelvin 

commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream.

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
[wt: backported to 5.8 -- no tracepoint there]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 462 +---
 4 files changed, 317 insertions(+), 189 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d20ba1b..2a41b21 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1277,7 +1277,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e64..cc1e713 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 026ac01..c9d839c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1743,13 +1743,6 @@ void update_process_times(int user_tick)
scheduler_tick();
if (IS_ENABLED(CONFIG_POSIX_TIMERS))
run_posix_cpu_timers();
-
-

Re: [PATCH 5.9 639/757] random32: make prandom_u32() output unpredictable

2020-11-08 Thread Willy Tarreau
On Mon, Nov 09, 2020 at 08:54:13AM +0200, Amit Klein wrote:
> Unfortunately, I'm just a security researcher, not a kernel developer...
> 
> Does that mean you don't plan to back-port the patch?

I could possibly have a look, but quite frankly I'm not convinced that we
need to backport this at all. I think that what we've done is mostly to be
future-proof and that the likelihood of practical attacks against live
systems with the previous fix are close to zero.

Cheers,
Willy


Re: [PATCH 3/5] gpio: msc313: MStar MSC313 GPIO driver

2020-11-05 Thread Willy Tarreau
On Thu, Nov 05, 2020 at 10:21:27AM +0100, Linus Walleij wrote:
> If your SoC is only used by OpenWrt (like ixp4xx) then it is fine
> to just use bool because that distribution is always built with an
> image for a specific hardware, whereas distributions are generic.

Speaking for myself (since I have a few now), I'm not running OpenWRT
on mine but my own distro, and I guess most users will run either
Buildroot or their own distro. It's unlikely that we'll see very
generic distros there given the limited storage you'd typically have
in an SPI NOR (16-32 MB) and the small RAM (64MB) which tends to
discourage anyone from booting a regular distro over other storage
anyway.

Thus my guess is that most users will keep building their own kernels.

But this just emphasizes your points :-)

Just my two cents,
Willy


Re: [PATCH 5.9 639/757] random32: make prandom_u32() output unpredictable

2020-10-27 Thread Willy Tarreau
Hi Greg,

On Tue, Oct 27, 2020 at 02:54:49PM +0100, Greg Kroah-Hartman wrote:
> From: George Spelvin 
> 
> [ Upstream commit c51f8f88d705e06bd696d7510aff22b33eb8e638 ]
> 
> Non-cryptographic PRNGs may have great statistical properties, but
> are usually trivially predictable to someone who knows the algorithm,
> given a small sample of their output.  An LFSR like prandom_u32() is
> particularly simple, even if the sample is widely scattered bits.
(...)

I'd have let it cook a bit longer into mainline before backporting it,
first it's not small (a bit border line by stable rules), and second,
considering how long we've been with the previous solution, there's no
emergency anymore. The risks are essentially at the build level though
(e.g.  include hell on exotic architectures, or obscure driver trying
to make use of one of the removed functions maybe).

On the other hand, given the amount of tests that run on the stable
queue, we'll quickly know! So we can probably keep it for now, but do
not hesitate to drop and postpone it if it causes any trouble so that
we have time to investigate. I'd rather not go through the previous
one's repeated breakage again!

Thanks,
Willy


[PATCH v4 3/3] random32: add a selftest for the prandom32 code

2020-10-24 Thread Willy Tarreau
Given that this code is new, let's add a selftest for it as well.
It doesn't rely on fixed sets, instead it picks 1024 numbers and
verifies that they're not more correlated than desired.

Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
Cc: George Spelvin 
Cc: Amit Klein 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Signed-off-by: Willy Tarreau 
---
 lib/random32.c | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/lib/random32.c b/lib/random32.c
index 7f047844e494..4d0e05e471d7 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -556,6 +557,61 @@ static void prandom_timer_start(struct 
random_ready_callback *unused)
mod_timer(_timer, jiffies);
 }
 
+#ifdef CONFIG_RANDOM32_SELFTEST
+/* Principle: True 32-bit random numbers will all have 16 differing bits on
+ * average. For each 32-bit number, there are 601M numbers differing by 16
+ * bits, and 89% of the numbers differ by at least 12 bits. Note that more
+ * than 16 differing bits also implies a correlation with inverted bits. Thus
+ * we take 1024 random numbers and compare each of them to the other ones,
+ * counting the deviation of correlated bits to 16. Constants report 32,
+ * counters 32-log2(TEST_SIZE), and pure randoms, around 6 or lower. With the
+ * u32 total, TEST_SIZE may be as large as 4096 samples.
+ */
+#define TEST_SIZE 1024
+static int __init prandom32_state_selftest(void)
+{
+   unsigned int x, y, bits, samples;
+   u32 xor, flip;
+   u32 total;
+   u32 *data;
+
+   data = kmalloc(sizeof(*data) * TEST_SIZE, GFP_KERNEL);
+   if (!data)
+   return 0;
+
+   for (samples = 0; samples < TEST_SIZE; samples++)
+   data[samples] = prandom_u32();
+
+   flip = total = 0;
+   for (x = 0; x < samples; x++) {
+   for (y = 0; y < samples; y++) {
+   if (x == y)
+   continue;
+   xor = data[x] ^ data[y];
+   flip |= xor;
+   bits = hweight32(xor);
+   total += (bits - 16) * (bits - 16);
+   }
+   }
+
+   /* We'll return the average deviation as 2*sqrt(corr/samples), which
+* is also sqrt(4*corr/samples) which provides a better resolution.
+*/
+   bits = int_sqrt(total / (samples * (samples - 1)) * 4);
+   if (bits > 6)
+   pr_warn("prandom32: self test failed (at least %u bits"
+   " correlated, fixed_mask=%#x fixed_value=%#x\n",
+   bits, ~flip, data[0] & ~flip);
+   else
+   pr_info("prandom32: self test passed (less than %u bits"
+   " correlated)\n",
+   bits+1);
+   kfree(data);
+   return 0;
+}
+core_initcall(prandom32_state_selftest);
+#endif /*  CONFIG_RANDOM32_SELFTEST */
+
 /*
  * Start periodic full reseeding as soon as strong
  * random numbers are available.
-- 
2.28.0



[PATCH v4 1/3] random32: make prandom_u32() output unpredictable

2020-10-24 Thread Willy Tarreau
From: George Spelvin 

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.

Reported-by: Amit Klein 
Cc: Willy Tarreau 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and 
activity")
Signed-off-by: George Spelvin 
Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau 
---
 drivers/char/random.c   |   1 -
 include/linux/prandom.h |  36 +++-
 kernel/time/timer.c |   7 -
 lib/random32.c  | 464 
 4 files changed, 318 insertions(+), 190 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d20ba1b104ca..2a41b21623ae 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1277,7 +1277,6 @@ void add_interrupt_randomness(int irq, int irq_flags)
 
fast_mix(fast_pool);
add_interrupt_bench(cycles);
-   this_cpu_add(net_rand_state.s1, fast_pool->pool[cycles & 3]);
 
if (unlikely(crng_init == 0)) {
if ((fast_pool->count >= 64) &&
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index aa16e6468f91..cc1e71334e53 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,12 +16,44 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+#if BITS_PER_LONG == 64
+/*
+ * The core SipHash round function.  Each line can be executed in
+ * parallel given enough CPU resources.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol64(v1, 13),  v2 += v3, v3 = rol64(v3, 16), \
+   v1 ^= v0, v0 = rol64(v0, 32),  v3 ^= v2, \
+   v0 += v3, v3 = rol64(v3, 21),  v2 += v1, v1 = rol64(v1, 17), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol64(v2, 32)  \
+)
+
+#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261)
+#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573)
+
+#elif BITS_PER_LONG == 32
+/*
+ * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash.
+ * This is weaker, but 32-bit machines are not used for high-traffic
+ * applications, so there is less output for an attacker to analyze.
+ */
+#define PRND_SIPROUND(v0, v1, v2, v3) ( \
+   v0 += v1, v1 = rol32(v1,  5),  v2 += v3, v3 = rol32(v3,  8), \
+   v1 ^= v0, v0 = rol32(v0, 16),  v3 ^= v2, \
+   v0 += v3, v3 = rol32(v3,  7),  v2 += v1, v1 = rol32(v1, 13), \
+   v3 ^= v0,  v1 ^= v2, v2 = rol32(v2, 16)  \
+)
+#define PRND_K0 0x6c796765
+#define PRND_K1 0x74656462
+
+#else
+#error Unsupported BITS_PER_LONG
+#endif
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
 
-DECLARE_PER_CPU(struct rnd_state, net_rand_state);
-
 u32 prandom_u32_state(struct rnd_state *state);
 void prandom_bytes_state(struct rnd_state *state, void *buf, size_t nbytes);
 void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index dda05f4b7a1f..3e341af741b9 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1717,13 +1717,6 @@ void update_process_times(int user_tick)
scheduler_tick();
if (IS_ENABLED(CONFIG_POSIX_TIMERS))
run_posix_cpu_timers();
-
-   /* The current CPU might make use of net randoms without receiving IRQs
-* to renew them often enough. Le

[PATCH v4 2/3] random32: add noise from network and scheduling activity

2020-10-24 Thread Willy Tarreau
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec3b5c ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.ga27...@sdf.org/
Cc: George Spelvin 
Cc: Amit Klein 
Cc: Eric Dumazet 
Cc: "Jason A. Donenfeld" 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: ty...@mit.edu
Cc: Florian Westphal 
Cc: Marc Plumb 
Tested-by: Sedat Dilek 
Signed-off-by: Willy Tarreau 
---
 include/linux/prandom.h | 19 +++
 kernel/time/timer.c |  2 ++
 lib/random32.c  |  5 +
 net/core/dev.c  |  4 
 4 files changed, 30 insertions(+)

diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index cc1e71334e53..bbf4b4ad61df 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -16,6 +16,12 @@ void prandom_bytes(void *buf, size_t nbytes);
 void prandom_seed(u32 seed);
 void prandom_reseed_late(void);
 
+DECLARE_PER_CPU(unsigned long, net_rand_noise);
+
+#define PRANDOM_ADD_NOISE(a, b, c, d) \
+   prandom_u32_add_noise((unsigned long)(a), (unsigned long)(b), \
+ (unsigned long)(c), (unsigned long)(d))
+
 #if BITS_PER_LONG == 64
 /*
  * The core SipHash round function.  Each line can be executed in
@@ -50,6 +56,18 @@ void prandom_reseed_late(void);
 #error Unsupported BITS_PER_LONG
 #endif
 
+static inline void prandom_u32_add_noise(unsigned long a, unsigned long b,
+unsigned long c, unsigned long d)
+{
+   /*
+* This is not used cryptographically; it's just
+* a convenient 4-word hash function. (3 xor, 2 add, 2 rol)
+*/
+   a ^= raw_cpu_read(net_rand_noise);
+   PRND_SIPROUND(a, b, c, d);
+   raw_cpu_write(net_rand_noise, d);
+}
+
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
@@ -99,6 +117,7 @@ static inline void prandom_seed_state(struct rnd_state 
*state, u64 seed)
state->s2 = __seed(i,   8U);
state->s3 = __seed(i,  16U);
state->s4 = __seed(i, 128U);
+   PRANDOM_ADD_NOISE(state, i, 0, 0);
 }
 
 /* Pseudo random number generator from numerical recipes. */
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 3e341af741b9..de37e33a868d 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1706,6 +1706,8 @@ void update_process_times(int user_tick)
 {
struct task_struct *p = current;
 
+   PRANDOM_ADD_NOISE(jiffies, user_tick, p, 0);
+
/* Note: this timer irq context must be accounted for as well. */
account_process_tick(p, user_tick);
run_local_timers();
diff --git a/lib/random32.c b/lib/random32.c
index be9f242a4207..7f047844e494 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -337,6 +337,8 @@ struct siprand_state {
 };
 
 static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entropy;
+DEFINE_PER_CPU(unsigned long, net_rand_noise);
+EXPORT_PER_CPU_SYMBOL(net_rand_noise);
 
 /*
  * This is the core CPRNG function.  As "pseudorandom", this is not used
@@ -360,9 +362,12 @@ static DEFINE_PER_CPU(struct siprand_state, 
net_rand_state) __latent_entropy;
 static inline u32 siprand_u32(struct siprand_state *s)
 {
unsigned long v0 = s->v0, v1 = s->v1, v2 = s->v2, v3 = s->v3;
+   unsigned long n = raw_cpu_read(net_rand_noise);
 
+   v3 ^= n;
PRND_SIPROUND(v0, v1, v2, v3);
PRND_SIPROUND(v0, v1, v2, v3);
+   v0 ^= n;
s->v0 = v0;  s->v1 = v1;  s->v2 = v2;  s->v3 = v3;
  

  1   2   3   4   5   6   7   8   9   10   >