Re: [net-next PATCH] e1000: add initial XDP support

2016-08-27 Thread Or Gerlitz
On Sat, Aug 27, 2016 at 10:11 AM, John Fastabend
 wrote:
> From: Alexei Starovoitov 

> This patch adds initial support for XDP on e1000 driver. Note e1000
> driver does not support page recycling in general which could be
> added as a further improvement. However for XDP_DROP and XDP_XMIT
> the xdp code paths will recycle pages.

> @@ -4188,15 +4305,57 @@ static bool e1000_clean_jumbo_rx_irq(struct e1000_adapter *adapter,
> prefetch(next_rxd);
>
> next_buffer = &rx_ring->buffer_info[i];
> -

nit, better to avoid random cleanups in a patch adding new (&& cool)
functionality

> cleaned = true;
> cleaned_count++;
> +   length = le16_to_cpu(rx_desc->length);
> +
> +   if (prog) {
> +   struct page *p = buffer_info->rxbuf.page;
> +   dma_addr_t dma = buffer_info->dma;
> +   int act;
> +
> +   if (unlikely(!(status & E1000_RXD_STAT_EOP))) {
> +   /* attached bpf disallows larger than page
> +* packets, so this is hw error or corruption
> +*/
> +   pr_info_once("%s buggy !eop\n", netdev->name);
> +   break;
> +   }
> +   if (unlikely(rx_ring->rx_skb_top)) {
> +   pr_info_once("%s ring resizing bug\n",
> +netdev->name);
> +   break;
> +   }
> +   dma_sync_single_for_cpu(&pdev->dev, dma,
> +   length, DMA_FROM_DEVICE);
> +   act = e1000_call_bpf(prog, page_address(p), length);
> +   switch (act) {
> +   case XDP_PASS:
> +   break;
> +   case XDP_TX:
> +   dma_sync_single_for_device(&pdev->dev,
> +  dma,
> +  length,
> +  DMA_TO_DEVICE);
> +   e1000_xmit_raw_frame(buffer_info, length,
> +netdev, adapter);
> +   /* Fallthrough to re-use mappedg page after xmit */

Did you want to say "mapped"? wasn't sure what's the role of "g" @ the end

> +   case XDP_DROP:
> +   default:
> +   /* re-use mapped page. keep buffer_info->dma
> +* as-is, so that e1000_alloc_jumbo_rx_buffers
> +* only needs to put it back into rx ring
> +*/

if we're on the XDP_TX path, don't we need to actually see that the
frame has been xmitted before re-using the page?

> +   total_rx_bytes += length;
> +   total_rx_packets++;
> +   goto next_desc;
> +   }
> +   }
> +
> dma_unmap_page(&pdev->dev, buffer_info->dma,
>adapter->rx_buffer_len, DMA_FROM_DEVICE);
> buffer_info->dma = 0;


Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-27 Thread David Miller
From: Timur Tabi 
Date: Sat, 27 Aug 2016 07:26:39 -0500

> David Miller wrote:
>> From: Timur Tabi 
>> Date: Thu, 25 Aug 2016 16:39:03 -0500
>>
>>> +static const u8 duuid[] = {
>>> +   0x77, 0x79, 0x60, 0xbf,
>>> +   0x2d, 0xab,
>>> +   0x9d, 0x4b,
>>> +   0x94, 0xf0,
>>> +   0xe1, 0x11, 0x98, 0x01, 0xa2, 0xba
>>> +};
>>
>> This seems to be completely unused, please remove it.
> 
> Sorry, I missed that during my cleanup.  I'll remove it in v10.
> 
> I wonder why my compiler didn't complain about the unused static.

Unused static "const" is not warned about.
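David's point can be reproduced with a toy translation unit. Exact behavior varies by compiler version (gcc only grew -Wunused-const-variable later, and it is not always enabled by -Wall), so treat this as illustrative:

```c
#include <stdio.h>

typedef unsigned char u8;

/* Unused static *non-const*: gcc -Wall reports "defined but not used". */
static u8 used_check[4] = { 1, 2, 3, 4 };

/* Unused static *const*: many gcc versions emit no warning at all,
 * which is how an unused table like duuid[] can slip through review. */
static const u8 duuid_like[4] = { 0x77, 0x79, 0x60, 0xbf };

int main(void)
{
    /* Reference one of the arrays so the demo itself has something to do. */
    printf("first byte: %d\n", used_check[0]);
    return 0;
}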


[PATCH v2] ipv6: Use inbound ifaddr as source addresses for ICMPv6 errors

2016-08-27 Thread Eli Cooper
According to RFC 1885 2.2(c), the source address of ICMPv6
errors in response to forwarded packets should be set to the
unicast address of the forwarding interface in order to be helpful
in diagnosis. Currently the selection of source address is based
on the default route, without respect to the inbound interface.

This patch sets the source address of ICMPv6 error messages to
the address of inbound interface, with the exception of
'time exceeded' and 'packet too big' messages sent in ip6_forward(),
where the address of OUTPUT device is forced as source address
(however, it is NOT enforced as claimed without this patch).

Signed-off-by: Eli Cooper 
---
v2: changed to a switch statement

 net/ipv6/icmp.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index bd59c34..edb50b6 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -397,6 +397,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
struct sock *sk;
struct ipv6_pinfo *np;
const struct in6_addr *saddr = NULL;
+   struct in6_addr tmp_saddr;
struct dst_entry *dst;
struct icmp6hdr tmp_hdr;
struct flowi6 fl6;
@@ -421,6 +422,17 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 */
addr_type = ipv6_addr_type(&hdr->daddr);
 
+   switch (type) {
+   case ICMPV6_DEST_UNREACH:
+   case ICMPV6_PKT_TOOBIG:
+   case ICMPV6_TIME_EXCEED:
+   case ICMPV6_PARAMPROB:
+   if (!ipv6_dev_get_saddr(net, skb->dev, &hdr->saddr, 0,
+   &tmp_saddr))
+   saddr = &tmp_saddr;
+   break;
+   }
+
if (ipv6_chk_addr(net, &hdr->daddr, skb->dev, 0) ||
    ipv6_chk_acast_addr_src(net, skb->dev, &hdr->daddr))
saddr = &hdr->daddr;
-- 
2.9.3



[PATCH] Documentation: networking: dsa: Remove platform device TODO

2016-08-27 Thread Florian Fainelli
Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"),
the shortcomings of the dsa platform device have been addressed, remove
that TODO item.

Signed-off-by: Florian Fainelli 
---
 Documentation/networking/dsa/dsa.txt | 20 
 1 file changed, 20 deletions(-)

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 44ed453ccf66..a4e55c76d371 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -587,26 +587,6 @@ of DSA, would be the its port-based VLAN, used by the associated bridge device.
 TODO
 
 
-The platform device problem

-DSA is currently implemented as a platform device driver which is far from 
ideal
-as was discussed in this thread:
-
-http://permalink.gmane.org/gmane.linux.network/329848
-
-This basically prevents the device driver model to be properly used and 
applied,
-and support non-MDIO, non-MMIO Ethernet connected switches.
-
-Another problem with the platform device driver approach is that it prevents 
the
-use of a modular switch drivers build due to a circular dependency, illustrated
-here:
-
-http://comments.gmane.org/gmane.linux.network/345803
-
-Attempts of reworking this has been done here:
-
-https://lwn.net/Articles/643149/
-
 Making SWITCHDEV and DSA converge towards an unified codebase
 -
 
-- 
2.7.4



Re: [PATCH v2] net: smc91x: fix SMC accesses

2016-08-27 Thread Robert Jarzmik
Russell King  writes:

> Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
> machines") broke some ARM platforms through several mistakes.  Firstly,
> the access size must correspond to the following rule:
>
> (a) at least one of 16-bit or 8-bit access size must be supported
> (b) 32-bit accesses are optional, and may be enabled in addition to
> the above.
>
> Secondly, it provides no emulation of 16-bit accesses, instead blindly
> making 16-bit accesses even when the platform specifies that only 8-bit
> is supported.
>
> Reorganise smc91x.h so we can make use of the existing 16-bit access
> emulation already provided - if 16-bit accesses are supported, use
> 16-bit accesses directly, otherwise if 8-bit accesses are supported,
> use the provided 16-bit access emulation.  If neither, BUG().  This
> exactly reflects the driver behaviour prior to the commit being fixed.
>
> Since the conversion incorrectly cut down the available access sizes on
> several platforms, we also need to go through every platform and fix up
> the overly-restrictive access size: Arnd assumed that if a platform can
> perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
> size needed to be specified - not so, all available access sizes must
> be specified.
>
> This likely fixes some performance regressions in doing this: if a
> platform does not support 8-bit accesses, 8-bit accesses have been
> emulated by performing a 16-bit read-modify-write access.
>
> Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
> accesses, which was broken by the original commit.
>
> Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines")
> Signed-off-by: Russell King 
> ---
> Robert, this is the full patch - can I add your tested-by to this for
> the subset patch please?
Yes, I just retested it, so:
Tested-by: Robert Jarzmik 

Cheers.

--
Robert


iptables, conntrack, and the raw table vs. L3DSR

2016-08-27 Thread Quentin Barnes
Several years ago, I wrote an iptables module that rewrites packets'
destination addresses based on the value in the DSCP field to
implement Layer 3 Direct Server Return (L3DSR).  The main code of
the iptables target module you can find here:
https://github.com/yahoo/l3dsr/blob/master/linux/kmod-xt/xt_DADDR.c

The iptables-daddr module has been in production since I wrote it,
with some limitations.  One of those limitations is that it doesn't
work well with conntrack modules.  I believe that's because the daddr
rewriting confuses conntrack: changing a packet's daddr changes its
4-tuple, which prevents conntrack from tracking the connection.
Someone recently suggested I change the module from the "mangle"
table to "raw", so it can be put in the prerouting chain ahead of
conntrack.  That would let conntrack see the packet after its daddr
update.  This approach seems to work fine in a test case, letting
L3DSR and conntrack apparently work in concert, but it has me concerned
that there might be unforeseen negative side-effects from using the
raw table for doing mangling work.

Can anyone think of any issues with having a mangle target module be
invoked from the raw table?

Or as an alternative if necessary, is there a possible/rational way
to leave the module in the mangle table and then inform conntrack
about the packet's daddr alteration?

Quentin


Re: [PATCH] ravb: avoid unused function warnings

2016-08-27 Thread Sergei Shtylyov

Hello.

On 08/27/2016 01:48 AM, Sergei Shtylyov wrote:


When CONFIG_PM_SLEEP is disabled, we get a couple of harmless warnings:

drivers/net/ethernet/renesas/ravb_main.c:2117:12: error: 'ravb_resume'
defined but not used [-Werror=unused-function]
drivers/net/ethernet/renesas/ravb_main.c:2104:12: error: 'ravb_suspend'
defined but not used [-Werror=unused-function]

The simplest solution here is to replace the #ifdef with __maybe_unused
annotations, which lets the compiler do the right thing by itself.

Signed-off-by: Arnd Bergmann 
Fixes: 0184165b2f42 ("ravb: add sleep PM suspend/resume support")


   That was the patch I didn't review, sorry about that...


---
 drivers/net/ethernet/renesas/ravb_main.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index cad23ba06904..630536bc72f9 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c

[...]

@@ -2166,17 +2165,12 @@ static const struct dev_pm_ops ravb_dev_pm_ops = {
 SET_RUNTIME_PM_OPS(ravb_runtime_nop, ravb_runtime_nop, NULL)
 };

-#define RAVB_PM_OPS (&ravb_dev_pm_ops)
-#else
-#define RAVB_PM_OPS NULL
-#endif
-
 static struct platform_driver ravb_driver = {
 .probe= ravb_probe,
 .remove= ravb_remove,
 .driver = {
 .name= "ravb",
-.pm= RAVB_PM_OPS,
+.pm= &ravb_dev_pm_ops,


   I guess it's safe to have this pointing to the 'struct dev_pm_ops' filled
with NULLs? Looking at the PM code left me somewhat unsure...


   Well, this means CONFIG_PM[_SLEEP] is not defined anyway, so the PM code
that would call the methods is absent; it should be safe indeed...


Acked-by: Sergei Shtylyov 

MBR, Sergei



Re: [RFC v2 09/10] landlock: Handle cgroups (program types)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 22:56, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 20:19, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:

 On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>
>>> As far as safety and type checking that bpf programs has to do,
>>> I like the approach of patch 06/10:
>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
>>> +   PTR_TO_STRUCT_FILE, struct file *, file,
>>> +   PTR_TO_STRUCT_CRED, const struct cred *, cred
>>> +)
>>> teaching verifier to recognize struct file, cred, sockaddr
>>> will let bpf program access them naturally without any overhead.
>>> Though:
>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>>> BPF_PROG_TYPE_SCHED_CLS,
>>> BPF_PROG_TYPE_SCHED_ACT,
>>> BPF_PROG_TYPE_TRACEPOINT,
>>> +   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
>>> +   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
>>> +   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>>>  };
>>> is a bit of overkill.
>>> I think it would be cleaner to have single
>>> BPF_PROG_TYPE_LSM and at program load time pass
>>> lsm_hook_id as well, so that verifier can do safety checks
>>> based on type info provided in LANDLOCK_HOOKs
>>
>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>> verifier check programs according to their types. If we need to check
>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
>> dedicated program types. I don't see any other way to do it with the
>> current verifier code. Moreover it's the purpose of program types, right?
>
> Adding new bpf program type for every lsm hook is not acceptable.
> Either do one new program type + pass lsm_hook_id as suggested
> or please come up with an alternative approach.

 OK, so we have to modify the verifier to not only rely on the program
 type but on another value to check the context accesses. Do you have a
 hint from where this value could come from? Do we need to add a new bpf
 command to associate a program to a subtype?
>>>
>>> It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
>>> Both prog_type and prog_hook_id are used during verification.
>>> prog_type distinguishes the main aspects whereas prog_hook_id selects
>>> which lsm_hook's argument definition to apply.
>>> At the time of attaching to a hook, the prog_hook_id passed at the
>>> load time should match lsm's hook_id.
>>
>> OK, so this new prog_subtype field should be use/set by a new bpf_cmd,
>> right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?
> 
> No new command. It will be an optional field to existing BPF_PROG_LOAD.
> In other words instead of replicating everything that different
> bpf_prog_type-s need, we can pass this hook_id field to fine tune
> the purpose (and applicability) of the program.
> And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks.
> For example existing BPF_PROG_TYPE_TRACEPOINT checks the safety
> for all tracepoints while they all have different arguments.
> For tracepoints it's easier, since the only difference between them
> is the range of ctx access. Here we need strong type safety
> of arguments therefore need extra hook_id at load time.

OK, I will do it.





Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
 As said above, Landlock will not run an eBPF programs when not strictly
 needed. Attaching to a cgroup will have the same performance impact as
 attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
> 
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is created then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

I see your point :)

> 
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
> 
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be functional?
> imo that's non starter due to overhead.

Yes, for now, it is triggered by a new seccomp filter return value
RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
be needed but could be useful to bind a seccomp filter security policy
with a Landlock one. Waiting for Kees's point of view…





Re: [RFC v2 09/10] landlock: Handle cgroups (program types)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 09:55:01PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:19, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>>
> > As far as safety and type checking that bpf programs has to do,
> > I like the approach of patch 06/10:
> > +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> > +   PTR_TO_STRUCT_FILE, struct file *, file,
> > +   PTR_TO_STRUCT_CRED, const struct cred *, cred
> > +)
> > teaching verifier to recognize struct file, cred, sockaddr
> > will let bpf program access them naturally without any overhead.
> > Though:
> > @@ -102,6 +102,9 @@ enum bpf_prog_type {
> > BPF_PROG_TYPE_SCHED_CLS,
> > BPF_PROG_TYPE_SCHED_ACT,
> > BPF_PROG_TYPE_TRACEPOINT,
> > +   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> > +   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> > +   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
> >  };
> > is a bit of overkill.
> > I think it would be cleaner to have single
> > BPF_PROG_TYPE_LSM and at program load time pass
> > lsm_hook_id as well, so that verifier can do safety checks
> > based on type info provided in LANDLOCK_HOOKs
> 
>  I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>  verifier check programs according to their types. If we need to check
>  specific context value types (e.g. PTR_TO_STRUCT_FILE), we need a
>  dedicated program types. I don't see any other way to do it with the
>  current verifier code. Moreover it's the purpose of program types, right?
> >>>
> >>> Adding new bpf program type for every lsm hook is not acceptable.
> >>> Either do one new program type + pass lsm_hook_id as suggested
> >>> or please come up with an alternative approach.
> >>
> >> OK, so we have to modify the verifier to not only rely on the program
> >> type but on another value to check the context accesses. Do you have a
> >> hint from where this value could come from? Do we need to add a new bpf
> >> command to associate a program to a subtype?
> > 
> > It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
> > Both prog_type and prog_hook_id are used during verification.
> > prog_type distinguishes the main aspects whereas prog_hook_id selects
> > which lsm_hook's argument definition to apply.
> > At the time of attaching to a hook, the prog_hook_id passed at the
> > load time should match lsm's hook_id.
> 
> OK, so this new prog_subtype field should be use/set by a new bpf_cmd,
> right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?

No new command. It will be an optional field to existing BPF_PROG_LOAD.
In other words instead of replicating everything that different
bpf_prog_type-s need, we can pass this hook_id field to fine tune
the purpose (and applicability) of the program.
And one BPF_PROG_TYPE_LANDLOCK type will cover all lsm hooks.
For example existing BPF_PROG_TYPE_TRACEPOINT checks the safety
for all tracepoints while they all have different arguments.
For tracepoints it's easier, since the only difference between them
is the range of ctx access. Here we need strong type safety
of arguments therefore need extra hook_id at load time.




Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> 
> >
> > - I don't think such 'for' loop can scale. The solution needs to work
> > with thousands of containers and thousands of cgroups.
> > In the patch 06/10 the proposal is to use 'current' as holder of
> > the programs:
> > +   for (prog = current->seccomp.landlock_prog;
> > +   prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> > +   break;
> > +   }
> > +   }
> > imo that's the root of scalability issue.
> > I think to be able to scale the bpf programs have to be attached to
> > cgroups instead of tasks.
> > That would be very different api. seccomp doesn't need to be touched.
> > But that is the only way I see to be able to scale.
> 
>  Landlock is inspired from seccomp which also use a BPF program per
>  thread. For seccomp, each BPF programs are executed for each syscall.
>  For Landlock, some BPF programs are executed for some LSM hooks. I don't
>  see why it is a scale issue for Landlock comparing to seccomp. I also
>  don't see why storing the BPF program list pointer in the cgroup struct
>  instead of the task struct change a lot here. The BPF programs execution
>  will be the same anyway (for each LSM hook). Kees should probably have a
>  better opinion on this.
> >>>
> >>> seccomp has its own issues and copying them doesn't make this lsm any 
> >>> better.
> >>> Like seccomp bpf programs are all gigantic switch statement that looks
> >>> for interesting syscall numbers. All syscalls of a task are paying
> >>> non-trivial seccomp penalty due to such design. If bpf was attached per
> >>> syscall it would have been much cheaper. Of course doing it this way
> >>> for seccomp is not easy, but for lsm such facility is already there.
> >>> Blank call of a single bpf prog for all lsm hooks is unnecessary
> >>> overhead that can and should be avoided.
> >>
> >> It's probably a misunderstanding. Contrary to seccomp which run all the
> >> thread's BPF programs for any syscall, Landlock only run eBPF programs
> >> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> >> multiple eBPF program types and contrary to seccomp, Landlock only run
> >> an eBPF program when needed. Landlock will have almost no performance
> >> overhead if the syscalls do not trigger the watched LSM hooks for the
> >> current process.
> > 
> > that's not what I see in the patch 06/10:
> > all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> > (which eventually means all lsm hooks) will call
> > static inline int landlock_hook_##NAME
> > which will call landlock_run_prog()
> > which does:
> > + for (landlock_ret = current->seccomp.landlock_ret;
> > +  landlock_ret; landlock_ret = landlock_ret->prev) {
> > +if (landlock_ret->triggered) {
> > +   ctx.cookie = landlock_ret->cookie;
> > +   for (prog = current->seccomp.landlock_prog;
> > +prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> > +   break;
> > +   }
> > +   }
> > 
> > that is unacceptable overhead and not a scalable design.
> > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> > as soon as more lsm hooks are added.
> 
> Good catch! I forgot to check the program (sub)type in the loop to only
> run the needed programs for the current hook. I will fix this.
> 
> 
> > 
> >> As said above, Landlock will not run an eBPF programs when not strictly
> >> needed. Attaching to a cgroup will have the same performance impact as
> >> attaching to a process hierarchy.
> > 
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
> 
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

yes, clearly array dereference is faster than link list walk.
Now the question is where to keep this prog_array[num_lsm_hooks] ?
Since we cannot keep it inside task_struct, we have to allocate it.
Every time the task is created then. What to do on the fork? That
will require changes all over. Then the obvious optimization would be
to share this allocated array of prog 

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-27 Thread Timur Tabi

Rami Rosen wrote:

Hi,

Shouldn't it be obj-$(CONFIG_QCOM_EMAC) instead of obj-y?


Yes, I noticed this soon after I posted this patch.  I will fix it in v10.


The return type of the method emac_tx_tpd_create()  should be void, as
it always returns true.


Will fix, thanks.


Seems that there are several unused members in the emac_stats struct:


+struct emac_stats {

...
...
Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere.

+   u64 rx_bcast_byte_cnt;  /* broadcast packets byte count (without FCS) */
+   u64 rx_mcast_byte_cnt;  /* multicast packets byte count (without FCS) */

...
rx_err_addr is not used

+   u64 rx_err_addr;/* packets dropped due to address filtering */


I'll go through the structure and remove the unused fields.



Typo: it is jabbers and not jubbers; grepping for "jubber" under
drivers/net/ethernet gives
0 results, whereas:
/net-next/drivers/net/ethernet$ grep -R jabber | wc -l
gives 241 results.


+   u64 rx_jubbers; /* jubbers */


Thanks.  I'll fix that in v10.




...
..
tx_rd_eop and tx_len_err are not used

+   u64 tx_rd_eop;  /* count of reads beyond EOP */
+   u64 tx_len_err; /* packets with length mismatch */


Both tx_bcast_byte and tx_mcast_byte are not used.

+   u64 tx_bcast_byte;  /* broadcast packets byte count (without FCS) */
+   u64 tx_mcast_byte;  /* multicast packets byte count (without FCS) */


Thanks for your good work, it seems that the driver is almost done!


I'm grateful for the opportunity to improve this driver over the crappy 
one that's been bouncing around internally for years, and for all the 
people who helped me along the way.


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-27 Thread Rami Rosen
Hi,

Shouldn't it be obj-$(CONFIG_QCOM_EMAC) instead of obj-y? QCOM_EMAC is defined as
tristate in the drivers/net/ethernet/qualcomm/Kconfig by this patch,
so it should be possible to build it as a *.ko module also.

> diff --git a/drivers/net/ethernet/qualcomm/emac/Makefile b/drivers/net/ethernet/qualcomm/emac/Makefile
> new file mode 100644
> index 000..700ee6e
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/emac/Makefile
> @@ -0,0 +1,7 @@
> +#
> +# Makefile for the Qualcomm Technologies, Inc. EMAC Gigabit Ethernet driver
> +#
> +
> +obj-y += qcom-emac.o
> +


The return type of the method emac_tx_tpd_create()  should be void, as
it always returns true.

> +/* Produce new transmit descriptor */
> +static bool emac_tx_tpd_create(struct emac_adapter *adpt,
> +  struct emac_tx_queue *tx_q, struct emac_tpd *tpd)
> +{
> +   u32 *hw_tpd;
> +
> +   tx_q->tpd.last_produce_idx = tx_q->tpd.produce_idx;
> +   hw_tpd = EMAC_TPD(tx_q, adpt->tpd_size, tx_q->tpd.produce_idx);
> +
> +   if (++tx_q->tpd.produce_idx == tx_q->tpd.count)
> +   tx_q->tpd.produce_idx = 0;
> +
> +   *(hw_tpd++) = tpd->word[0];
> +   *(hw_tpd++) = tpd->word[1];
> +   *(hw_tpd++) = tpd->word[2];
> +   *hw_tpd = tpd->word[3];
> +
> +   return true;
> +}

...
...
...

Seems that there are several unused members in the emac_stats struct:

> +struct emac_stats {
...
...
Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere.
> +   u64 rx_bcast_byte_cnt;  /* broadcast packets byte count (without FCS) */
> +   u64 rx_mcast_byte_cnt;  /* multicast packets byte count (without FCS) */
...
rx_err_addr is not used
> +   u64 rx_err_addr;/* packets dropped due to address filtering */

Typo: it is jabbers and not jubbers; grepping for "jubber" under
drivers/net/ethernet gives
0 results, whereas:
/net-next/drivers/net/ethernet$ grep -R jabber | wc -l
gives 241 results.

> +   u64 rx_jubbers; /* jubbers */


...
..
tx_rd_eop and tx_len_err are not used
> +   u64 tx_rd_eop;  /* count of reads beyond EOP */
> +   u64 tx_len_err; /* packets with length mismatch */

Both tx_bcast_byte and tx_mcast_byte are not used.
> +   u64 tx_bcast_byte;  /* broadcast packets byte count (without FCS) */
> +   u64 tx_mcast_byte;  /* multicast packets byte count (without FCS) */

Thanks for your good work, it seems that the driver is almost done!


Regards,
Rami Rosen


[PATCH net 2/2] ppp: declare PPP devices as LLTX

2016-08-27 Thread Guillaume Nault
ppp_xmit_process() already locks the xmit path. If HARD_TX_LOCK() tries
to hold the _xmit_lock we can get lock inversion.

[  973.726130] ==
[  973.727311] [ INFO: possible circular locking dependency detected ]
[  973.728546] 4.8.0-rc2 #1 Tainted: G   O
[  973.728986] ---
[  973.728986] accel-pppd/1806 is trying to acquire lock:
[  973.728986]  (_xmit_lock_key){+.-...}, at: [] 
sch_direct_xmit+0x8d/0x221
[  973.728986]
[  973.728986] but task is already holding lock:
[  973.728986]  (l2tp_sock){+.-...}, at: [] 
l2tp_xmit_skb+0x1e8/0x5d7 [l2tp_core]
[  973.728986]
[  973.728986] which lock already depends on the new lock.
[  973.728986]
[  973.728986]
[  973.728986] the existing dependency chain (in reverse order) is:
[  973.728986]
-> #3 (l2tp_sock){+.-...}:
[  973.728986][] lock_acquire+0x150/0x217
[  973.728986][] _raw_spin_lock+0x2d/0x3c
[  973.728986][] l2tp_xmit_skb+0x1e8/0x5d7 [l2tp_core]
[  973.728986][] pppol2tp_xmit+0x1f2/0x25e [l2tp_ppp]
[  973.728986][] ppp_channel_push+0xb5/0x14a 
[ppp_generic]
[  973.728986][] ppp_write+0x104/0x11c [ppp_generic]
[  973.728986][] __vfs_write+0x56/0x120
[  973.728986][] vfs_write+0xbd/0x11b
[  973.728986][] SyS_write+0x5e/0x96
[  973.728986][] entry_SYSCALL_64_fastpath+0x18/0xa8
[  973.728986]
-> #2 (&(&pch->downl)->rlock){+.-...}:
[  973.728986][] lock_acquire+0x150/0x217
[  973.728986][] _raw_spin_lock_bh+0x31/0x40
[  973.728986][] ppp_push+0xa7/0x82d [ppp_generic]
[  973.728986][] __ppp_xmit_process+0x48/0x877 
[ppp_generic]
[  973.728986][] ppp_xmit_process+0x4b/0xaf 
[ppp_generic]
[  973.728986][] ppp_write+0x10e/0x11c [ppp_generic]
[  973.728986][] __vfs_write+0x56/0x120
[  973.728986][] vfs_write+0xbd/0x11b
[  973.728986][] SyS_write+0x5e/0x96
[  973.728986][] entry_SYSCALL_64_fastpath+0x18/0xa8
[  973.728986]
-> #1 (&(&ppp->wlock)->rlock){+.-...}:
[  973.728986][] lock_acquire+0x150/0x217
[  973.728986][] _raw_spin_lock_bh+0x31/0x40
[  973.728986][] __ppp_xmit_process+0x27/0x877 
[ppp_generic]
[  973.728986][] ppp_xmit_process+0x4b/0xaf 
[ppp_generic]
[  973.728986][] ppp_start_xmit+0x21b/0x22a 
[ppp_generic]
[  973.728986][] dev_hard_start_xmit+0x1a9/0x43d
[  973.728986][] sch_direct_xmit+0xd6/0x221
[  973.728986][] __dev_queue_xmit+0x62a/0x912
[  973.728986][] dev_queue_xmit+0xb/0xd
[  973.728986][] neigh_direct_output+0xc/0xe
[  973.728986][] ip6_finish_output2+0x5a9/0x623
[  973.728986][] ip6_output+0x15e/0x16a
[  973.728986][] dst_output+0x76/0x7f
[  973.728986][] mld_sendpack+0x335/0x404
[  973.728986][] mld_send_initial_cr.part.21+0x99/0xa2
[  973.728986][] ipv6_mc_dad_complete+0x42/0x71
[  973.728986][] addrconf_dad_completed+0x1cf/0x2ea
[  973.728986][] addrconf_dad_work+0x453/0x520
[  973.728986][] process_one_work+0x365/0x6f0
[  973.728986][] worker_thread+0x2de/0x421
[  973.728986][] kthread+0x121/0x130
[  973.728986][] ret_from_fork+0x1f/0x40
[  973.728986]
-> #0 (_xmit_lock_key){+.-...}:
[  973.728986][] __lock_acquire+0x1118/0x1483
[  973.728986][] lock_acquire+0x150/0x217
[  973.728986][] _raw_spin_lock+0x2d/0x3c
[  973.728986][] sch_direct_xmit+0x8d/0x221
[  973.728986][] __dev_queue_xmit+0x62a/0x912
[  973.728986][] dev_queue_xmit+0xb/0xd
[  973.728986][] neigh_direct_output+0xc/0xe
[  973.728986][] ip_finish_output2+0x5db/0x609
[  973.728986][] ip_finish_output+0x152/0x15e
[  973.728986][] ip_output+0x8c/0x96
[  973.728986][] ip_local_out+0x41/0x4a
[  973.728986][] ip_queue_xmit+0x5a5/0x609
[  973.728986][] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core]
[  973.728986][] pppol2tp_xmit+0x1f2/0x25e [l2tp_ppp]
[  973.728986][] ppp_channel_push+0xb5/0x14a 
[ppp_generic]
[  973.728986][] ppp_write+0x104/0x11c [ppp_generic]
[  973.728986][] __vfs_write+0x56/0x120
[  973.728986][] vfs_write+0xbd/0x11b
[  973.728986][] SyS_write+0x5e/0x96
[  973.728986][] entry_SYSCALL_64_fastpath+0x18/0xa8
[  973.728986]
[  973.728986] other info that might help us debug this:
[  973.728986]
[  973.728986] Chain exists of:
  _xmit_lock_key --> &(>downl)->rlock --> l2tp_sock

[  973.728986]  Possible unsafe locking scenario:
[  973.728986]
[  973.728986]CPU0CPU1
[  973.728986]
[  973.728986]   lock(l2tp_sock);
[  973.728986]lock(&(>downl)->rlock);
[  973.728986]lock(l2tp_sock);
[  973.728986]   lock(_xmit_lock_key);
[  973.728986]
[  973.728986]  *** DEADLOCK ***

[PATCH net 1/2] ppp: avoid deadlock on recursive xmit

2016-08-27 Thread Guillaume Nault
In case of misconfiguration, a virtual PPP channel might send packets
back to their parent PPP interface. This typically happens in
misconfigured L2TP setups, where PPP's peer IP address is set with the
IP of the L2TP peer.
When that happens the system hangs due to PPP trying to recursively
lock its xmit path.

[  243.332155] BUG: spinlock recursion on CPU#1, accel-pppd/926
[  243.333272]  lock: 0x880033d90f18, .magic: dead4ead, .owner: 
accel-pppd/926, .owner_cpu: 1
[  243.334859] CPU: 1 PID: 926 Comm: accel-pppd Not tainted 4.8.0-rc2 #1
[  243.336010] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Debian-1.8.2-1 04/01/2014
[  243.336018]  7fff 8800319a77a0 8128de85 
880033d90f18
[  243.336018]  880033ad8000 8800319a77d8 810ad7c0 
039e
[  243.336018]  880033d90f18 880033d90f60 880033d90f18 
880033d90f28
[  243.336018] Call Trace:
[  243.336018]  [] dump_stack+0x4f/0x65
[  243.336018]  [] spin_dump+0xe1/0xeb
[  243.336018]  [] spin_bug+0x26/0x28
[  243.336018]  [] do_raw_spin_lock+0x5c/0x160
[  243.336018]  [] _raw_spin_lock_bh+0x35/0x3c
[  243.336018]  [] ? ppp_push+0xa7/0x82d [ppp_generic]
[  243.336018]  [] ppp_push+0xa7/0x82d [ppp_generic]
[  243.336018]  [] ? do_raw_spin_unlock+0xc2/0xcc
[  243.336018]  [] ? preempt_count_sub+0x13/0xc7
[  243.336018]  [] ? _raw_spin_unlock_irqrestore+0x34/0x49
[  243.336018]  [] ppp_xmit_process+0x48/0x877 [ppp_generic]
[  243.336018]  [] ? preempt_count_sub+0x13/0xc7
[  243.336018]  [] ? skb_queue_tail+0x71/0x7c
[  243.336018]  [] ppp_start_xmit+0x21b/0x22a [ppp_generic]
[  243.336018]  [] dev_hard_start_xmit+0x15e/0x32c
[  243.336018]  [] sch_direct_xmit+0xd6/0x221
[  243.336018]  [] __dev_queue_xmit+0x52a/0x820
[  243.336018]  [] dev_queue_xmit+0xb/0xd
[  243.336018]  [] neigh_direct_output+0xc/0xe
[  243.336018]  [] ip_finish_output2+0x4d2/0x548
[  243.336018]  [] ? dst_mtu+0x29/0x2e
[  243.336018]  [] ip_finish_output+0x152/0x15e
[  243.336018]  [] ? ip_output+0x74/0x96
[  243.336018]  [] ip_output+0x8c/0x96
[  243.336018]  [] ip_local_out+0x41/0x4a
[  243.336018]  [] ip_queue_xmit+0x531/0x5c5
[  243.336018]  [] ? udp_set_csum+0x207/0x21e
[  243.336018]  [] l2tp_xmit_skb+0x582/0x5d7 [l2tp_core]
[  243.336018]  [] pppol2tp_xmit+0x1eb/0x257 [l2tp_ppp]
[  243.336018]  [] ppp_channel_push+0x91/0x102 [ppp_generic]
[  243.336018]  [] ppp_write+0x104/0x11c [ppp_generic]
[  243.336018]  [] __vfs_write+0x56/0x120
[  243.336018]  [] ? fsnotify_perm+0x27/0x95
[  243.336018]  [] ? security_file_permission+0x4d/0x54
[  243.336018]  [] vfs_write+0xbd/0x11b
[  243.336018]  [] SyS_write+0x5e/0x96
[  243.336018]  [] entry_SYSCALL_64_fastpath+0x13/0x94

The main entry points for sending packets over a PPP unit are the
.write() and .ndo_start_xmit() callbacks (simplified view):

.write(unit fd) or .ndo_start_xmit()
   \
CALL ppp_xmit_process()
   \
LOCK unit's xmit path (ppp->wlock)
|
CALL ppp_push()
   \
LOCK channel's xmit path (chan->downl)
|
CALL lower layer's .start_xmit() callback
   \
... might recursively call .ndo_start_xmit() ...
   /
RETURN from .start_xmit()
|
UNLOCK channel's xmit path
   /
RETURN from ppp_push()
|
UNLOCK unit's xmit path
   /
RETURN from ppp_xmit_process()

Packets can also be directly sent on channels (e.g. LCP packets):

.write(channel fd) or ppp_output_wakeup()
   \
CALL ppp_channel_push()
   \
LOCK channel's xmit path (chan->downl)
|
CALL lower layer's .start_xmit() callback
   \
... might call .ndo_start_xmit() ...
   /
RETURN from .start_xmit()
|
UNLOCK channel's xmit path
   /
RETURN from ppp_channel_push()

Key points about the lower layer's .start_xmit() callback:

  * It can be called directly by a channel fd .write() or by
ppp_output_wakeup() or indirectly by a unit fd .write() or by
.ndo_start_xmit().

  * In any case, it's always called with chan->downl held.

  * It might route the packet back to its parent unit using
.ndo_start_xmit() as entry point.

This patch detects and breaks recursion in ppp_xmit_process(). This
function is a good candidate for the task because it's called early
enough after .ndo_start_xmit(), it's always part of the recursion
loop and it's on the path of whatever entry point is used to send
a packet on a PPP unit.

Recursion detection is done using the per-cpu ppp_xmit_recursion
variable.

Since 

[PATCH net 0/2] ppp: fix deadlock upon recursive xmit

2016-08-27 Thread Guillaume Nault
This series fixes the issue reported by Feng where packets looping
through a ppp device makes the module deadlock:
https://marc.info/?l=linux-netdev&m=147134567319038&w=2

The problem can occur on virtual interfaces (e.g. PPP over L2TP, or
PPPoE on vxlan devices), when a PPP packet is routed back to the PPP
interface.

PPP's xmit path isn't reentrant, so patch #1 uses a per-cpu variable
to detect and break recursion. Patch #2 sets the NETIF_F_LLTX flag to
avoid lock inversion issues between ppp and txqueue locks.

There are multiple entry points to the PPP xmit path. This series has
been tested with lockdep and should address recursion issues no matter
how the packet entered the path.


A similar issue in L2TP is not covered by this series:
l2tp_xmit_skb() also isn't reentrant, and it can be called as part of
PPP's xmit path (pppol2tp_xmit()), or directly from the L2TP socket
(l2tp_ppp_sendmsg()). If a packet is sent by l2tp_ppp_sendmsg() and
routed to the parent PPP interface, then it's going to hit
l2tp_xmit_skb() again.

Breaking recursion as done in ppp_generic is not enough, because we'd
still have a lock inversion issue (locking in l2tp_xmit_skb() can
happen before or after locking in ppp_generic). The best approach would
be to use the ip_tunnel functions and remove the socket locking in
l2tp_xmit_skb(). But that'd be something for net-next.


BTW, I hope the commit messages aren't too long. Just let me know if I
should trim something.


Guillaume Nault (2):
  ppp: avoid deadlock on recursive xmit
  ppp: declare PPP devices as LLTX

 drivers/net/ppp/ppp_generic.c | 54 +--
 1 file changed, 42 insertions(+), 12 deletions(-)

-- 
2.9.3



Re: [RFC v2 09/10] landlock: Handle cgroups (program types)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 20:19, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>>
> As far as safety and type checking that bpf programs has to do,
> I like the approach of patch 06/10:
> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> +   PTR_TO_STRUCT_FILE, struct file *, file,
> +   PTR_TO_STRUCT_CRED, const struct cred *, cred
> +)
> teaching verifier to recognize struct file, cred, sockaddr
> will let bpf program access them naturally without any overhead.
> Though:
> @@ -102,6 +102,9 @@ enum bpf_prog_type {
> BPF_PROG_TYPE_SCHED_CLS,
> BPF_PROG_TYPE_SCHED_ACT,
> BPF_PROG_TYPE_TRACEPOINT,
> +   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> +   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> +   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>  };
> is a bit of overkill.
> I think it would be cleaner to have single
> BPF_PROG_TYPE_LSM and at program load time pass
> lsm_hook_id as well, so that verifier can do safety checks
> based on type info provided in LANDLOCK_HOOKs

 I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
 verifier checks programs according to their types. If we need to check
 specific context value types (e.g. PTR_TO_STRUCT_FILE), we need
 dedicated program types. I don't see any other way to do it with the
 current verifier code. Moreover it's the purpose of program types, right?
>>>
>>> Adding new bpf program type for every lsm hook is not acceptable.
>>> Either do one new program type + pass lsm_hook_id as suggested
>>> or please come up with an alternative approach.
>>
>> OK, so we have to modify the verifier to not only rely on the program
>> type but on another value to check the context accesses. Do you have a
>> hint from where this value could come from? Do we need to add a new bpf
>> command to associate a program to a subtype?
> 
> It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
> Both prog_type and prog_hook_id are used during verification.
> prog_type distinguishes the main aspects whereas prog_hook_id selects
> which lsm_hook's argument definition to apply.
> At the time of attaching to a hook, the prog_hook_id passed at the
> load time should match lsm's hook_id.

OK, so this new prog_subtype field should be used/set by a new bpf_cmd,
right? Something like BPF_PROG_SUBTYPE or BPF_PROG_METADATA?





Re: [PATCH v2 net-next] documentation: Document issues with VMs and XPS and drivers enabling it on their own

2016-08-27 Thread Tom Herbert
On Fri, Aug 26, 2016 at 9:35 PM, David Miller  wrote:
> From: Tom Herbert 
> Date: Thu, 25 Aug 2016 16:43:35 -0700
>
>> This seems like it will only confuse users even more. You've clearly
>> identified an issue, let's figure out how to fix it.
>
> I kinda feel the same way about this situation.

I'm working on XFS (as the transmit analogue to RFS). We'll track
flows enough so that we should know when it's safe to move them.

Tom


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:

>
> - I don't think such 'for' loop can scale. The solution needs to work
> with thousands of containers and thousands of cgroups.
> In the patch 06/10 the proposal is to use 'current' as holder of
> the programs:
> +   for (prog = current->seccomp.landlock_prog;
> +   prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> imo that's the root of scalability issue.
> I think to be able to scale the bpf programs have to be attached to
> cgroups instead of tasks.
> That would be very different api. seccomp doesn't need to be touched.
> But that is the only way I see to be able to scale.

 Landlock is inspired by seccomp, which also uses a BPF program per
 thread. For seccomp, each BPF program is executed for each syscall.
 For Landlock, some BPF programs are executed for some LSM hooks. I don't
 see why it is a scaling issue for Landlock compared to seccomp. I also
 don't see why storing the BPF program list pointer in the cgroup struct
 instead of the task struct changes a lot here. The BPF programs'
 execution will be the same anyway (for each LSM hook). Kees should
 probably have a better opinion on this.
>>>
>>> seccomp has its own issues and copying them doesn't make this lsm any 
>>> better.
>>> Like seccomp bpf programs are all gigantic switch statement that looks
>>> for interesting syscall numbers. All syscalls of a task are paying
>>> non-trivial seccomp penalty due to such design. If bpf was attached per
>>> syscall it would have been much cheaper. Of course doing it this way
>>> for seccomp is not easy, but for lsm such facility is already there.
>>> Blank call of a single bpf prog for all lsm hooks is unnecessary
>>> overhead that can and should be avoided.
>>
>> It's probably a misunderstanding. Contrary to seccomp, which runs all the
>> thread's BPF programs for any syscall, Landlock only runs eBPF programs
>> for the triggered LSM hooks, if their types match. Indeed, thanks to the
>> multiple eBPF program types and contrary to seccomp, Landlock only runs
>> an eBPF program when needed. Landlock will have almost no performance
>> overhead if the syscalls do not trigger the watched LSM hooks for the
>> current process.
> 
> that's not what I see in the patch 06/10:
> all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> (which eventually means all lsm hooks) will call
> static inline int landlock_hook_##NAME
> which will call landlock_run_prog()
> which does:
> + for (landlock_ret = current->seccomp.landlock_ret;
> +  landlock_ret; landlock_ret = landlock_ret->prev) {
> +if (landlock_ret->triggered) {
> +   ctx.cookie = landlock_ret->cookie;
> +   for (prog = current->seccomp.landlock_prog;
> +prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> 
> that is unacceptable overhead and not a scalable design.
> It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> as soon as more lsm hooks are added.

Good catch! I forgot to check the program (sub)type in the loop to only
run the needed programs for the current hook. I will fix this.


> 
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
> 
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

Hum, I don't see the difference from a performance point of view between
a cgroup-based or a process hierarchy-based system.

Maybe a better option would be to use an array of pointers with N
entries, one for each supported hook, instead of a single pointer list?

Anyway, being able to attach an LSM hook program to a cgroup thanks to
the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
to use a process hierarchy). The downside will be to handle an LSM hook
program which is not triggered by a seccomp-filter, but this should be
needed anyway to handle interruptions.





Re: [RFC v2 09/10] landlock: Handle cgroups (netfilter match)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 04:19:05PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >> To sum up, there is four related patchsets:
> >> * "Landlock LSM: Unprivileged sandboxing" (this series)
> >> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
> >> * "Networking cgroup controller" (Anoop Naravaram)
> >> * "Add eBPF hooks for cgroups" (Daniel Mack)
> 
> >>> Anoop Naravaram's use case is to control the ports the applications
> >>> under cgroup can bind and listen on.
> >>> Such use case can be solved by such 'lsm cgroup controller' by
> >>> attaching bpf program to security_socket_bind lsm hook and
> >>> filtering sockaddr.
> >>> Furthermore Sargun's use case is to allow further sockaddr rewrites
> >>> from the bpf program which can be done as natural extension
> >>> of such mechanism.
> >>>
> >>> If I understood Daniel's Anoop's Sargun's and yours use cases
> >>> correctly the common piece of kernel infrastructure that can solve
> >>> them all can start from Daniel's current set of patches that
> >>> establish a mechanism of attaching bpf program to a cgroup.
> >>> Then adding lsm hooks to it and later allowing argument rewrite
> >>> (since they're already in the kernel and no ToCToU problems exist)
> 
> >> For the network-related series, I think it make more sense to simply
> >> create a netfilter rule matching a cgroup and then add more features to
> >> netfilter (restrict port ranges and so on) thanks to eBPF programs.
> >> Containers are (usually) in a dedicated network namespace, which open
> >> the possibility to not only rely on cgroups (e.g. match UID,
> >> netmask...). It would also be more flexible to be able to load a BPF
> >> program in netfilter and update its maps on the fly to make dynamic
> >> rules, like ipset does, but in a more generic way.
> 
> What do the netdev folks think about this design?

such a design doesn't scale when used for container management, and
that's what we need to solve.
netns has its overhead and management issues. There are proposals to
solve that but that is orthogonal to containers in general.
A lot of them don't use netns.



Re: [PATCH net-next] tcp: add tcp_add_backlog()

2016-08-27 Thread Neal Cardwell
On Sat, Aug 27, 2016 at 10:37 AM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> When TCP operates in lossy environments (between 1 and 10 % packet
> losses), many SACK blocks can be exchanged, and I noticed we could
> drop them on busy senders, if these SACK blocks have to be queued
> into the socket backlog.
>
> While the main cause is the poor performance of RACK/SACK processing,
> we can try to avoid these drops of valuable information that can lead to
> spurious timeouts and retransmits.
>
> Cause of the drops is the skb->truesize overestimation caused by :
>
> - drivers allocating ~2048 (or more) bytes as a fragment to hold an
>   Ethernet frame.
>
> - various pskb_may_pull() calls bringing the headers into skb->head
>   might have pulled all the frame content, but skb->truesize could
>   not be lowered, as the stack has no idea of each fragment truesize.
>
> The backlog drops are also more visible on bidirectional flows, since
> their sk_rmem_alloc can be quite big.
>
> Let's add some room for the backlog, as only the socket owner
> can selectively take action to lower memory needs, like collapsing
> receive queues or partial ofo pruning.
>
> Signed-off-by: Eric Dumazet 
> Cc: Yuchung Cheng 
> Cc: Neal Cardwell 
> ---
>  include/net/tcp.h   |1 +
>  net/ipv4/tcp_ipv4.c |   33 +
>  net/ipv6/tcp_ipv6.c |5 +
>  3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 
> 25d64f6de69e1f639ed1531bf2d2df3f00fd76a2..5f5f09f6e019682ef29c864d2f43a8f247fcdd9a
>  100644

Thanks for doing this, and thanks for the detailed answers to Yuchung's e-mail!

Acked-by: Neal Cardwell 

neal


Re: [RFC v2 09/10] landlock: Handle cgroups (program types)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 04:34:55PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >
> >>> As far as safety and type checking that bpf programs has to do,
> >>> I like the approach of patch 06/10:
> >>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
> >>> +   PTR_TO_STRUCT_FILE, struct file *, file,
> >>> +   PTR_TO_STRUCT_CRED, const struct cred *, cred
> >>> +)
> >>> teaching verifier to recognize struct file, cred, sockaddr
> >>> will let bpf program access them naturally without any overhead.
> >>> Though:
> >>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
> >>> BPF_PROG_TYPE_SCHED_CLS,
> >>> BPF_PROG_TYPE_SCHED_ACT,
> >>> BPF_PROG_TYPE_TRACEPOINT,
> >>> +   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
> >>> +   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
> >>> +   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
> >>>  };
> >>> is a bit of overkill.
> >>> I think it would be cleaner to have single
> >>> BPF_PROG_TYPE_LSM and at program load time pass
> >>> lsm_hook_id as well, so that verifier can do safety checks
> >>> based on type info provided in LANDLOCK_HOOKs
> >>
> >> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
> >> verifier checks programs according to their types. If we need to check
> >> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need
> >> dedicated program types. I don't see any other way to do it with the
> >> current verifier code. Moreover it's the purpose of program types, right?
> > 
> > Adding new bpf program type for every lsm hook is not acceptable.
> > Either do one new program type + pass lsm_hook_id as suggested
> > or please come up with an alternative approach.
> 
> OK, so we have to modify the verifier to not only rely on the program
> type but on another value to check the context accesses. Do you have a
> hint from where this value could come from? Do we need to add a new bpf
> command to associate a program to a subtype?

It's another field prog_subtype (or prog_hook_id) in union bpf_attr.
Both prog_type and prog_hook_id are used during verification.
prog_type distinguishes the main aspects whereas prog_hook_id selects
which lsm_hook's argument definition to apply.
At the time of attaching to a hook, the prog_hook_id passed at the
load time should match lsm's hook_id.



Re: [RFC v2 09/10] landlock: Handle cgroups

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote:
> > cgroup is the common way to group multiple tasks.
> > Without cgroup only parent<->child relationship will be possible,
> > which will limit usability of such lsm to a master task that controls
> > its children. Such api restriction would have been ok, if we could
> > extend it in the future, but unfortunately task-centric won't allow it
> > without creating a parallel lsm that is cgroup based.
> > Therefore I think we have to go with cgroup-centric api and your
> > application has to use cgroups from the start though only parent-child
> > would have been enough.
> > Also I don't think the kernel can afford two bpf based lsm. One task
> > based and another cgroup based, so we have to find common ground
> > that suits both use cases.
> > Having unprivileged access is a subset. There is no strong reason why
> > cgroup+lsm+bpf should be limited to root only always.
> > When we can guarantee no pointer leaks, we can allow unpriv.
> 
> I don't really understand what you mean.  In the context of landlock,
> which is a *sandbox*, can one of you explain a use case that
> materially benefits from this type of cgroup usage?  I haven't thought
> of one.

In case of seccomp-like sandbox where parent controls child processes
cgroup is not needed. It's needed when container management software
needs to control a set of applications. If we can have one bpf-based lsm
that works via cgroup and without, I'd be fine with it. Right now
I haven't seen a plausible proposal to do that. Therefore cgroup based
api is a common api that works for sandbox as well, though requiring
parent to create a cgroup just to control a single child is cumbersome.



Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>
> >>>
> >>> - I don't think such 'for' loop can scale. The solution needs to work
> >>> with thousands of containers and thousands of cgroups.
> >>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>> the programs:
> >>> +   for (prog = current->seccomp.landlock_prog;
> >>> +   prog; prog = prog->prev) {
> >>> +   if (prog->filter == landlock_ret->filter) {
> >>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> >>> +   break;
> >>> +   }
> >>> +   }
> >>> imo that's the root of scalability issue.
> >>> I think to be able to scale the bpf programs have to be attached to
> >>> cgroups instead of tasks.
> >>> That would be very different api. seccomp doesn't need to be touched.
> >>> But that is the only way I see to be able to scale.
> >>
> >> Landlock is inspired by seccomp, which also uses a BPF program per
> >> thread. For seccomp, each BPF program is executed for each syscall.
> >> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >> see why it is a scaling issue for Landlock compared to seccomp. I also
> >> don't see why storing the BPF program list pointer in the cgroup struct
> >> instead of the task struct changes a lot here. The BPF programs'
> >> execution will be the same anyway (for each LSM hook). Kees should
> >> probably have a better opinion on this.
> > 
> > seccomp has its own issues and copying them doesn't make this lsm any 
> > better.
> > Like seccomp bpf programs are all gigantic switch statement that looks
> > for interesting syscall numbers. All syscalls of a task are paying
> > non-trivial seccomp penalty due to such design. If bpf was attached per
> > syscall it would have been much cheaper. Of course doing it this way
> > for seccomp is not easy, but for lsm such facility is already there.
> > Blank call of a single bpf prog for all lsm hooks is unnecessary
> > overhead that can and should be avoided.
> 
> It's probably a misunderstanding. Contrary to seccomp, which runs all the
> thread's BPF programs for any syscall, Landlock only runs eBPF programs
> for the triggered LSM hooks, if their types match. Indeed, thanks to the
> multiple eBPF program types and contrary to seccomp, Landlock only runs
> an eBPF program when needed. Landlock will have almost no performance
> overhead if the syscalls do not trigger the watched LSM hooks for the
> current process.

that's not what I see in the patch 06/10:
all lsm_hooks in 'static struct security_hook_list landlock_hooks'
(which eventually means all lsm hooks) will call
static inline int landlock_hook_##NAME
which will call landlock_run_prog()
which does:
+ for (landlock_ret = current->seccomp.landlock_ret;
+  landlock_ret; landlock_ret = landlock_ret->prev) {
+if (landlock_ret->triggered) {
+   ctx.cookie = landlock_ret->cookie;
+   for (prog = current->seccomp.landlock_prog;
+prog; prog = prog->prev) {
+   if (prog->filter == landlock_ret->filter) {
+   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
+   break;
+   }
+   }

that is unacceptable overhead and not a scalable design.
It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
as soon as more lsm hooks are added.

> As said above, Landlock will not run an eBPF programs when not strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

Having a prog per cgroup per lsm_hook is the only scalable way I
could come up with. If you see another way, please propose.
current->seccomp.landlock_prog is not the answer.



Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-27 Thread Florian Fainelli
On August 27, 2016 5:26:39 AM PDT, Timur Tabi  wrote:
>David Miller wrote:
>> From: Timur Tabi 
>> Date: Thu, 25 Aug 2016 16:39:03 -0500
>>
>>> +static const u8 duuid[] = {
>>> +   0x77, 0x79, 0x60, 0xbf,
>>> +   0x2d, 0xab,
>>> +   0x9d, 0x4b,
>>> +   0x94, 0xf0,
>>> +   0xe1, 0x11, 0x98, 0x01, 0xa2, 0xba
>>> +};
>>
>> This seems to be completely unused, please remove it.
>
>Sorry, I missed that during my cleanup.  I'll remove it in v10.
>
>I wonder why my compiler didn't complain about the unused static.

AFAICT this really depends on how modern your compiler is and which warnings it 
may now enable by default.

Since you are very close to an accepted driver, you may want to squash
potential warnings (and more) by doing a sparse build (make C=), which
covers a bit more than what a traditional C compiler does.

Almost there!

-- 
Florian


Re: [RFCv2 07/16] bpf: enable non-core use of the verifier

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 12:40:04PM +0100, Jakub Kicinski wrote:
> On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
> > > Advanced JIT compilers and translators may want to use
> > > eBPF verifier as a base for parsers or to perform custom
> > > checks and validations.
> > > 
> > > Add ability for external users to invoke the verifier
> > > and provide callbacks to be invoked for every intruction
> > > checked.  For now only add most basic callback for
> > > per-instruction pre-interpretation checks is added.  More
> > > advanced users may also like to have per-instruction post
> > > callback and state comparison callback.
> > > 
> > > Signed-off-by: Jakub Kicinski   
> > 
> > I like the approach. Making the verifier into a 'bytecode parser'
> > that JITs can reuse is a good design choice.
> > The only thing I would suggest is to tweak the verifier to
> > avoid in-place state recording. Then I think patch 8 for
> > clone/unclone of the program won't be needed, since verifier
> > will be read-only from bytecode point of view and patch 9
> > also will be slightly cleaner.
> > I think there are very few places in verifier that do this
> > state keeping inside insn. It was bugging me for some time.
> > Good time to clean that up.
> > Unless I misunderstand the patches 7,8,9...
> 
> Agreed, I think the verifier only modifies the program to
> store pointer types in the imm field.  I will try to come up
> with a way around this, any suggestions?  Perhaps state_equal()

probably array_of_insn_aux_data[num_insns] should do it.
Unlike reg_state that is forked on branches, this array
is only one.

> logic could be modified to downgrade pointers to UNKONWNs
> when it detects other state had incompatible pointer type.
> 
> > There is also small concern for patches 5 and 6 that add
> > more register state information. Potentially that extra
> > state can prevent states_equal() to recognize equivalent
> > states. Only patch 9 uses that info, right?
> 
> 5 and 6 recognize more constant loads, those can only
> upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
> verifier hits the CONST first and then tries with UNKNOWN
> it will have to reverify the path.  
> 
> > Another question is do you need all state walking that
> > verifier does or single linear pass through insns
> > would have worked?
> > Looks like you're only using CONST_IMM and PTR_TO_CTX
> > state, right?
> 
> I think I need all the parsing.  Right now I mostly need
> the verification to check that exit codes are specific
> CONST_IMMs.  Clang quite happily does this:
> 
> r0 <- 0
> if (...)
>   r0 <- 1
> exit

I see. Indeed, then you'd need the verifier to walk all paths
to make sure the return values are constant.
If you only need yes/no check then such info can probably be
collected unconditionally during initial program load.
Like prog->cb_access flag.



Re: [PATCH] bridge: Fix format string for %ul

2016-08-27 Thread Oleg Drokin

On Aug 27, 2016, at 12:18 PM, Sergei Shtylyov wrote:

> Hello.
> 
> On 8/27/2016 6:58 PM, Oleg Drokin wrote:
> 
>>>> %ul would print an unsigned value and a letter l,
>>>> likely it was %lu that was meant to print the long int,
>>>> but in reality the values printed there are just regular signed
>>> 
>>>  Signed? Then you probably need "%d" or "%i"…
>> 
>> They are signed in the struct definition, but in reality they
>> designate time, so could not be negative, I imagine?
> 
>   That doesn't matter. If the type is signed, it should be printed as signed.
> Doesn't gcc complain about the format specifiers not matching the values 
> passed?

only if the size differs.

I don't care either way.
I can change the struct to unsigned too, but if we want super correctness,
the problem lies deeper.

E.g. if we look at how those times are derived, the age comes from:
static inline int br_get_ticks(const unsigned char *src)
{
	unsigned long ticks = get_unaligned_be16(src);

	return DIV_ROUND_UP(ticks * HZ, STP_HZ);
}

So we divide an unsigned value by a positive value and get a signed result ;)

there's a struct net_bridge_port with similar members in this area that are
unsigned long, so I guess we should convert them to unsigned long in
struct br_config_bpdu. Do you want a patch for that separately or as part of
this one?

> 
>>>> ints, so just dropping the l altogether.
>>>>
>>>> Signed-off-by: Oleg Drokin 
>>> [...]
> 
> MBR, Sergei



[PATCH v2] net: smc91x: fix SMC accesses

2016-08-27 Thread Russell King
Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
machines") broke some ARM platforms through several mistakes.  Firstly,
the access size must correspond to the following rule:

(a) at least one of 16-bit or 8-bit access size must be supported
(b) 32-bit accesses are optional, and may be enabled in addition to
the above.

Secondly, it provides no emulation of 16-bit accesses, instead blindly
making 16-bit accesses even when the platform specifies that only 8-bit
is supported.

Reorganise smc91x.h so we can make use of the existing 16-bit access
emulation already provided - if 16-bit accesses are supported, use
16-bit accesses directly, otherwise if 8-bit accesses are supported,
use the provided 16-bit access emulation.  If neither, BUG().  This
exactly reflects the driver behaviour prior to the commit being fixed.

Since the conversion incorrectly cut down the available access sizes on
several platforms, we also need to go through every platform and fix up
the overly-restrictive access size: Arnd assumed that if a platform can
perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
size needed to be specified - not so, all available access sizes must
be specified.

In doing this, we likely also fix some performance regressions: if a
platform was marked as not supporting 8-bit accesses, 8-bit accesses
were being emulated by performing a 16-bit read-modify-write access.

Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
accesses, which was broken by the original commit.

Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM 
machines")
Signed-off-by: Russell King 
---
Robert, this is the full patch - can I add your tested-by to this for
the subset patch please?

Nico, I'd appreciate at least an acked-by from you as an approval for
this approach and to confirm that what I've said above is correct.
I'm not asking for you to test it.

Thanks.

Sorry, v2 against mainline rather than with some of my other SMC91x
changes.  Please ignore the previous version.

 arch/arm/mach-pxa/idp.c|  3 +-
 arch/arm/mach-pxa/xcep.c   |  3 +-
 arch/arm/mach-realview/core.c  |  3 +-
 arch/arm/mach-sa1100/pleb.c|  2 +-
 arch/blackfin/mach-bf561/boards/cm_bf561.c |  3 +-
 arch/blackfin/mach-bf561/boards/ezkit.c|  3 +-
 drivers/net/ethernet/smsc/smc91x.c |  7 
 drivers/net/ethernet/smsc/smc91x.h | 65 +-
 include/linux/smc91x.h | 10 +
 9 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/arch/arm/mach-pxa/idp.c b/arch/arm/mach-pxa/idp.c
index c410d84b243d..66070acaa888 100644
--- a/arch/arm/mach-pxa/idp.c
+++ b/arch/arm/mach-pxa/idp.c
@@ -83,7 +83,8 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_32BIT | SMC91X_USE_DMA | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+SMC91X_USE_DMA | SMC91X_NOWAIT,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-pxa/xcep.c b/arch/arm/mach-pxa/xcep.c
index 3f06cd90567a..056369ef250e 100644
--- a/arch/arm/mach-pxa/xcep.c
+++ b/arch/arm/mach-pxa/xcep.c
@@ -120,7 +120,8 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata xcep_smc91x_info = {
-   .flags  = SMC91X_USE_32BIT | SMC91X_NOWAIT | SMC91X_USE_DMA,
+   .flags  = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+ SMC91X_NOWAIT | SMC91X_USE_DMA,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-realview/core.c b/arch/arm/mach-realview/core.c
index baf174542e36..a0ead0ae23d6 100644
--- a/arch/arm/mach-realview/core.c
+++ b/arch/arm/mach-realview/core.c
@@ -93,7 +93,8 @@ static struct smsc911x_platform_config smsc911x_config = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_32BIT | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+SMC91X_NOWAIT,
 };
 
 static struct platform_device realview_eth_device = {
diff --git a/arch/arm/mach-sa1100/pleb.c b/arch/arm/mach-sa1100/pleb.c
index 1525d7b5f1b7..88149f85bc49 100644
--- a/arch/arm/mach-sa1100/pleb.c
+++ b/arch/arm/mach-sa1100/pleb.c
@@ -45,7 +45,7 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_16BIT | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_16BIT | SMC91X_USE_8BIT | SMC91X_NOWAIT,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/blackfin/mach-bf561/boards/cm_bf561.c 
b/arch/blackfin/mach-bf561/boards/cm_bf561.c
index c6db52ba3a06..10c57771822d 100644
--- a/arch/blackfin/mach-bf561/boards/cm_bf561.c
+++ b/arch/blackfin/mach-bf561/boards/cm_bf561.c
@@ -146,7 +146,8 @@ static struct 

[PATCH] net: smc91x: fix SMC accesses

2016-08-27 Thread Russell King
Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
machines") broke some ARM platforms through several mistakes.  Firstly,
the access size must correspond to the following rule:

(a) at least one of 16-bit or 8-bit access size must be supported
(b) 32-bit accesses are optional, and may be enabled in addition to
the above.

Secondly, it provides no emulation of 16-bit accesses, instead blindly
making 16-bit accesses even when the platform specifies that only 8-bit
is supported.

Reorganise smc91x.h so we can make use of the existing 16-bit access
emulation already provided - if 16-bit accesses are supported, use
16-bit accesses directly, otherwise if 8-bit accesses are supported,
use the provided 16-bit access emulation.  If neither, BUG().  This
exactly reflects the driver behaviour prior to the commit being fixed.

Since the conversion incorrectly cut down the available access sizes on
several platforms, we also need to go through every platform and fix up
the overly-restrictive access size: Arnd assumed that if a platform can
perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
size needed to be specified - not so, all available access sizes must
be specified.

In doing this, we likely also fix some performance regressions: if a
platform was marked as not supporting 8-bit accesses, 8-bit accesses
were being emulated by performing a 16-bit read-modify-write access.

Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
accesses, which was broken by the original commit.

Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM 
machines")
Signed-off-by: Russell King 
---
Robert, this is the full patch - can I add your tested-by to this for
the subset patch please?

Nico, I'd appreciate at least an acked-by from you as an approval for
this approach and to confirm that what I've said above is correct.
I'm not asking for you to test it.

Thanks.

 arch/arm/mach-pxa/idp.c|  3 +-
 arch/arm/mach-pxa/xcep.c   |  3 +-
 arch/arm/mach-realview/core.c  |  3 +-
 arch/arm/mach-sa1100/pleb.c|  2 +-
 arch/blackfin/mach-bf561/boards/cm_bf561.c |  3 +-
 arch/blackfin/mach-bf561/boards/ezkit.c|  3 +-
 drivers/net/ethernet/smsc/smc91x.c |  7 
 drivers/net/ethernet/smsc/smc91x.h | 65 +-
 include/linux/smc91x.h | 10 +
 9 files changed, 73 insertions(+), 26 deletions(-)

diff --git a/arch/arm/mach-pxa/idp.c b/arch/arm/mach-pxa/idp.c
index c410d84b243d..66070acaa888 100644
--- a/arch/arm/mach-pxa/idp.c
+++ b/arch/arm/mach-pxa/idp.c
@@ -83,7 +83,8 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_32BIT | SMC91X_USE_DMA | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+SMC91X_USE_DMA | SMC91X_NOWAIT,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-pxa/xcep.c b/arch/arm/mach-pxa/xcep.c
index 3f06cd90567a..056369ef250e 100644
--- a/arch/arm/mach-pxa/xcep.c
+++ b/arch/arm/mach-pxa/xcep.c
@@ -120,7 +120,8 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata xcep_smc91x_info = {
-   .flags  = SMC91X_USE_32BIT | SMC91X_NOWAIT | SMC91X_USE_DMA,
+   .flags  = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+ SMC91X_NOWAIT | SMC91X_USE_DMA,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/arm/mach-realview/core.c b/arch/arm/mach-realview/core.c
index baf174542e36..a0ead0ae23d6 100644
--- a/arch/arm/mach-realview/core.c
+++ b/arch/arm/mach-realview/core.c
@@ -93,7 +93,8 @@ static struct smsc911x_platform_config smsc911x_config = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_32BIT | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_8BIT | SMC91X_USE_16BIT | SMC91X_USE_32BIT |
+SMC91X_NOWAIT,
 };
 
 static struct platform_device realview_eth_device = {
diff --git a/arch/arm/mach-sa1100/pleb.c b/arch/arm/mach-sa1100/pleb.c
index 1525d7b5f1b7..88149f85bc49 100644
--- a/arch/arm/mach-sa1100/pleb.c
+++ b/arch/arm/mach-sa1100/pleb.c
@@ -45,7 +45,7 @@ static struct resource smc91x_resources[] = {
 };
 
 static struct smc91x_platdata smc91x_platdata = {
-   .flags = SMC91X_USE_16BIT | SMC91X_NOWAIT,
+   .flags = SMC91X_USE_16BIT | SMC91X_USE_8BIT | SMC91X_NOWAIT,
 };
 
 static struct platform_device smc91x_device = {
diff --git a/arch/blackfin/mach-bf561/boards/cm_bf561.c 
b/arch/blackfin/mach-bf561/boards/cm_bf561.c
index c6db52ba3a06..10c57771822d 100644
--- a/arch/blackfin/mach-bf561/boards/cm_bf561.c
+++ b/arch/blackfin/mach-bf561/boards/cm_bf561.c
@@ -146,7 +146,8 @@ static struct platform_device hitachi_fb_device = {
 #include 
 
 static struct smc91x_platdata smc91x_info = {
-   .flags = 

[PATCH] fix:gpio: mark symbols static where possible

2016-08-27 Thread Baoyou Xie
We get one warning about a global function without a declaration
in the ath9k gpio driver when building with W=1:
drivers/net/wireless/ath/ath9k/gpio.c:25:6: warning: no previous prototype for 
'ath_fill_led_pin' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is declared
and doesn't need a declaration; it can be made static.
So this patch marks it 'static'.

Signed-off-by: Baoyou Xie 
---
 drivers/net/wireless/ath/ath9k/gpio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/gpio.c 
b/drivers/net/wireless/ath/ath9k/gpio.c
index 490f74d..ddb2886 100644
--- a/drivers/net/wireless/ath/ath9k/gpio.c
+++ b/drivers/net/wireless/ath/ath9k/gpio.c
@@ -22,7 +22,7 @@
 
 #ifdef CONFIG_MAC80211_LEDS
 
-void ath_fill_led_pin(struct ath_softc *sc)
+static void ath_fill_led_pin(struct ath_softc *sc)
 {
struct ath_hw *ah = sc->sc_ah;
 
-- 
2.7.4



Re: [PATCH net-next] tcp: add tcp_add_backlog()

2016-08-27 Thread Eric Dumazet
On Sat, 2016-08-27 at 09:13 -0700, Yuchung Cheng wrote:
> On Sat, Aug 27, 2016 at 7:37 AM, Eric Dumazet  wrote:
> >

> > +   /* Only socket owner can try to collapse/prune rx queues
> > +* to reduce memory overhead, so add a little headroom here.
> > +* Only a few socket backlogs are likely non-empty concurrently.
> > +*/
> > +   limit += 64*1024;
> Just a thought: only add the headroom if ofo queue exists (e.g., signs
> of losses or recovery).

Testing the ofo queue would add a cache line miss, and likely slow down
the other cpu processing the other packets for this flow.

Also, even if the ofo queue is empty, the sk_rcvbuf budget can be
consumed by the regular receive queue.

We still need to be able to process incoming ACKs, even if both send and
receive queues are 'full'.

> 
> btw is the added headroom subject to the memory pressure check?

Remember that the backlog check here is mostly to avoid the kind of DOS
attacks that we had in the past.

While we should definitely prevent DOS attacks, we should also not drop
legitimate traffic.

Here, the number of backlogged sockets is limited by the number of cpus
in the host (if CONFIG_PREEMPT is disabled), or by the number of threads
blocked during a sendmsg()/recvmsg() (if CONFIG_PREEMPT is enabled).

So we do not need to be ultra precise, just have a safe guard.

The pressure check will be done at the time skbs are added into a
receive/ofo queue, in the very near future.

Thanks !





Re: [PATCH] bridge: Fix format string for %ul

2016-08-27 Thread Sergei Shtylyov

Hello.

On 8/27/2016 6:58 PM, Oleg Drokin wrote:

>>> %ul would print an unsigned value and a letter l,
>>> likely it was %lu that was meant to print the long int,
>>> but in reality the values printed there are just regular signed
>>
>>    Signed? Then you probably need "%d" or "%i"…
>
> They are signed in the struct definition, but in reality they
> designate time, so could not be negative, I imagine?

   That doesn't matter. If the type is signed, it should be printed as signed.
Doesn't gcc complain about the format specifiers not matching the values passed?

>>> ints, so just dropping the l altogether.
>>>
>>> Signed-off-by: Oleg Drokin 
>> [...]


MBR, Sergei



[PATCH] fix:rtl8xxxu_core: mark symbols static where possible

2016-08-27 Thread Baoyou Xie
We get one warning about a global function without a declaration
in rtl8xxxu_core.c when building with W=1:
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c:898:1: warning: no 
previous prototype for 'rtl8xxxu_gen1_h2c_cmd' [-Wmissing-prototypes]

In fact, this function is only used in the file in which it is declared
and doesn't need a declaration; it can be made static.
So this patch marks it 'static'.

Signed-off-by: Baoyou Xie 
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index 77048db..95b54b8 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -894,7 +894,7 @@ int rtl8xxxu_write_rfreg(struct rtl8xxxu_priv *priv,
return retval;
 }
 
-int
+static int
 rtl8xxxu_gen1_h2c_cmd(struct rtl8xxxu_priv *priv, struct h2c_cmd *h2c, int len)
 {
struct device *dev = >udev->dev;
-- 
2.7.4



Re: [PATCH net-next] tcp: add tcp_add_backlog()

2016-08-27 Thread Yuchung Cheng
On Sat, Aug 27, 2016 at 7:37 AM, Eric Dumazet  wrote:
>
> From: Eric Dumazet 
>
> When TCP operates in lossy environments (between 1 and 10 % packet
> losses), many SACK blocks can be exchanged, and I noticed we could
> drop them on busy senders, if these SACK blocks have to be queued
> into the socket backlog.
>
> While the main cause is the poor performance of RACK/SACK processing,
I have a patch in preparation for this ;)

> we can try to avoid these drops of valuable information that can lead to
> spurious timeouts and retransmits.
>
> Cause of the drops is the skb->truesize overestimation caused by :
>
> - drivers allocating ~2048 (or more) bytes as a fragment to hold an
>   Ethernet frame.
>
> - various pskb_may_pull() calls bringing the headers into skb->head
>   might have pulled all the frame content, but skb->truesize could
>   not be lowered, as the stack has no idea of each fragment truesize.
>
> The backlog drops are also more visible on bidirectional flows, since
> their sk_rmem_alloc can be quite big.
>
> Let's add some room for the backlog, as only the socket owner
> can selectively take action to lower memory needs, like collapsing
> receive queues or partial ofo pruning.
>
> Signed-off-by: Eric Dumazet 
> Cc: Yuchung Cheng 
> Cc: Neal Cardwell 
> ---
>  include/net/tcp.h   |1 +
>  net/ipv4/tcp_ipv4.c |   33 +
>  net/ipv6/tcp_ipv6.c |5 +
>  3 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 
> 25d64f6de69e1f639ed1531bf2d2df3f00fd76a2..5f5f09f6e019682ef29c864d2f43a8f247fcdd9a
>  100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1163,6 +1163,7 @@ static inline void tcp_prequeue_init(struct tcp_sock 
> *tp)
>  }
>
>  bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
> +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb);
>
>  #undef STATE_TRACE
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 
> ad41e8ecf796bba1bd6d9ed155ca4a57ced96844..53e80cd004b6ce401c3acbb4b243b243c5c3c4a3
>  100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1532,6 +1532,34 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(tcp_prequeue);
>
> +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> +{
> +   u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
> +
> +   /* Only socket owner can try to collapse/prune rx queues
> +* to reduce memory overhead, so add a little headroom here.
> +* Only a few socket backlogs are likely non-empty concurrently.
> +*/
> +   limit += 64*1024;
Just a thought: only add the headroom if ofo queue exists (e.g., signs
of losses or recovery).

btw is the added headroom subject to the memory pressure check?

> +
> +   /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
> +* we can fix skb->truesize to its real value to avoid future drops.
> +* This is valid because skb is not yet charged to the socket.
> +* It has been noticed pure SACK packets were sometimes dropped
> +* (if cooked by drivers without copybreak feature).
> +*/
> +   if (!skb->data_len)
> +   skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
nice!

> +
> +   if (unlikely(sk_add_backlog(sk, skb, limit))) {
> +   bh_unlock_sock(sk);
> +   __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
> +   return true;
> +   }
> +   return false;
> +}
> +EXPORT_SYMBOL(tcp_add_backlog);
> +
>  /*
>   * From tcp_input.c
>   */
> @@ -1662,10 +1690,7 @@ process:
> if (!sock_owned_by_user(sk)) {
> if (!tcp_prequeue(sk, skb))
> ret = tcp_v4_do_rcv(sk, skb);
> -   } else if (unlikely(sk_add_backlog(sk, skb,
> -  sk->sk_rcvbuf + sk->sk_sndbuf))) {
> -   bh_unlock_sock(sk);
> -   __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> +   } else if (tcp_add_backlog(sk, skb)) {
> goto discard_and_relse;
> }
> bh_unlock_sock(sk);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 
> e4f55683af314438da8a09473927213a140e6e5c..5bf460bd299ff61ae759e8e545ba6298a1d1373c
>  100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -1467,10 +1467,7 @@ process:
> if (!sock_owned_by_user(sk)) {
> if (!tcp_prequeue(sk, skb))
> ret = tcp_v6_do_rcv(sk, skb);
> -   } else if (unlikely(sk_add_backlog(sk, skb,
> -  sk->sk_rcvbuf + sk->sk_sndbuf))) {
> -   bh_unlock_sock(sk);
> -   __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> +   } else if (tcp_add_backlog(sk, skb)) {
> goto discard_and_relse;
> 

Re: [PATCH] bridge: Fix format string for %ul

2016-08-27 Thread Oleg Drokin

On Aug 27, 2016, at 11:13 AM, Sergei Shtylyov wrote:

> Hello.
> 
> On 8/27/2016 6:10 AM, Oleg Drokin wrote:
> 
>> %ul would print an unsigned value and a letter l,
>> likely it was %lu that was meant to print the long int,
>> but in reality the values printed there are just regular signed
> 
>   Signed? Then you probably need "%d" or "%i"…

They are signed in the struct definition, but in reality they
designate time, so could not be negative, I imagine?

> 
>> ints, so just dropping the l altogether.
>> 
>> Signed-off-by: Oleg Drokin 
> [...]
> 
> MBR, Sergei



Re: [PATCH] smc91x: remove ARM hack for unaligned 16-bit writes

2016-08-27 Thread Russell King - ARM Linux
On Sat, Aug 27, 2016 at 05:30:42PM +0200, Robert Jarzmik wrote:
> Hi Russell,
> 
> With your patch :
>  - lubbock, mainstone and zylonite boards are working correctly
>    => ie. all my boards are working correctly
> 
>  - the message "ifconfig: SIOCSIFHWADDR: Device or resource busy" is still
> there
>    => this message is here even without your patch, so it's rather not
> relevant
>    => this message is triggered by an "/sbin/ifconfig eth0 hw ether
>    08:00:3e:26:0a:5b" command
> 
> So from a PXA testing point of view it's all good, ie.
> Tested-by: Robert Jarzmik 

Thanks for testing, that's good news.

You'll get "SIOCSIFHWADDR: Device or resource busy" if the device is
already up and has no support for changing the address while the NIC
is "live" - we only program the MAC address when bringing the smc91x
device up (or resetting it after a timeout).

See net/ethernet/eth.c::eth_prepare_mac_addr_change() and smc_enable().

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH] smc91x: remove ARM hack for unaligned 16-bit writes

2016-08-27 Thread Robert Jarzmik
Russell King - ARM Linux  writes:

> On Thu, Aug 25, 2016 at 08:02:35PM +0200, Robert Jarzmik wrote:
>> Arnd Bergmann  writes:
>> 
>> > On Thursday, August 25, 2016 4:43:04 PM CEST Arnd Bergmann wrote:
>> >>  drivers/net/ethernet/smsc/smc91x.h | 50 
>> >> +++---
>> >>  1 file changed, 30 insertions(+), 20 deletions(-)
>> >> 
>> >> While this patch fixes one bug on Neponset, it probably doesn't address
>> >> the one that Russell ran into first, so this is for review only for now,
>> >> until the remaining problem(s) have been worked out.
>> >> 
>> >
>> > The comment should have been on another patch, my mistake. please
>> > see v2.
>> >
>> >Arnd
>> 
>> Hi Arnd,
>> 
>> I didn't review the patch thoroughly, but I launched your 2 patches in my pxa
>> little farm.
>> 
>> The result is that lubbock and mainstone are all right, but zylonite is 
>> broken
>> (ie. networkless). I removed then these 2 patches and zylonite worked again.
>> 
>> I have also an error message on the console on a "broken" zylonite :
>>   Changing smcs91x MAC address to 08:00:3e:26:0a:5b: ifconfig: SIOCSIFHWADDR:
>>   Device or resource busy
>> 
>> I reran the test twice (2 times with your patches, 2 times without), the 
>> result
>> looks consistent, ie. zylonite doesn't really like them.
>
> Please try the patch below.  I sent this to Will a few days ago, as
> he said (on irc) that he was also seeing problems on a platform he
> had, but I've yet to hear back.  I've not posted it yet because I
> haven't got around to writing a commit description for it.
>
> It does require that at least one of 8-bit or 16-bit accesses are
> supported, but I think that's already true.

Hi Russell,

With your patch :
 - lubbock, mainstone and zylonite boards are working correctly
   => ie. all my boards are working correctly

 - the message "ifconfig: SIOCSIFHWADDR: Device or resource busy" is still there
   => this message is here even without your patch, so it's rather not relevant
   => this message is triggered by an "/sbin/ifconfig eth0 hw ether
   08:00:3e:26:0a:5b" command

So from a PXA testing point of view it's all good, ie.
Tested-by: Robert Jarzmik 

Cheers.

-- 
Robert


Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing (cgroup delegation)

2016-08-27 Thread Mickaël Salaün
Cc Tejun and the cgroups ML.

On 27/08/2016 17:10, Mickaël Salaün wrote:
> On 27/08/2016 09:40, Andy Lutomirski wrote:
>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
>>>
>>> # Sandbox example with conditional access control depending on cgroup
>>>
>>>   $ mkdir /sys/fs/cgroup/sandboxed
>>>   $ ls /home
>>>   user1
>>>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>>>   LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>>>   ./sandbox /bin/sh -i
>>>   $ ls /home
>>>   user1
>>>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>>>   $ ls /home
>>>   ls: cannot open directory '/home': Permission denied
>>>
>>
>> Something occurs to me that isn't strictly relevant to landlock but
>> may be relevant to unprivileged cgroups: can you cause trouble by
>> setting up a nastily-configured cgroup and running a setuid program in
>> it?
>>
> 
> I hope not… But the use of cgroups should not be mandatory for Landlock.
> 

In a previous email:

On 26/08/2016 17:50, Tejun Heo wrote:
> I haven't looked in detail but in general I'm not too excited about
> layering security mechanism on top of cgroup.  Maybe it makes some
> sense when security domain coincides with resource domains but at any
> rate please keep me in the loop.






Re: [PATCH] ipv6: Use inbound ifaddr as source addresses for ICMPv6 errors

2016-08-27 Thread Sergei Shtylyov

Hello.

On 8/27/2016 4:05 PM, Eli Cooper wrote:

> According to RFC 1885 2.2(c), the source address of ICMPv6
> errors in response to forwarded packets should be set to the
> unicast address of the forwarding interface in order to be helpful
> in diagnosis. Currently the selection of the source address is based
> on the default route, without respect to the inbound interface.
>
> This patch sets the source address of ICMPv6 error messages to
> the address of the inbound interface, with the exception of
> 'time exceeded' and 'packet too big' messages sent in ip6_forward(),
> where the address of the output device is forced as the source address
> (however, without this patch it is not actually enforced as claimed).
>
> Signed-off-by: Eli Cooper 
> ---
>  net/ipv6/icmp.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index bd59c34..0e52f3b 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
[...]
> @@ -421,6 +422,12 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
>  */
> addr_type = ipv6_addr_type(>daddr);
>
> +   if (type == ICMPV6_DEST_UNREACH || type == ICMPV6_PKT_TOOBIG ||
> +   type == ICMPV6_TIME_EXCEED || type == ICMPV6_PARAMPROB)

   This is asking to be a *switch* statement instead.

> +   if (!ipv6_dev_get_saddr(net, skb->dev, >saddr, 0,
> +   _saddr))
> +   saddr = _saddr;
> +
> if (ipv6_chk_addr(net, >daddr, skb->dev, 0) ||
> ipv6_chk_acast_addr_src(net, skb->dev, >daddr))
> saddr = >daddr;


MBR, Sergei



Re: [PATCH] bridge: Fix format string for %ul

2016-08-27 Thread Sergei Shtylyov

Hello.

On 8/27/2016 6:10 AM, Oleg Drokin wrote:

> %ul would print an unsigned value and a letter l,
> likely it was %lu that was meant to print the long int,
> but in reality the values printed there are just regular signed

   Signed? Then you probably need "%d" or "%i"...

> ints, so just dropping the l altogether.
>
> Signed-off-by: Oleg Drokin 

[...]

MBR, Sergei



Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 09:40, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
>> Hi,
>>
>> This series is a proof of concept to fill some missing parts of seccomp, such
>> as the ability to check syscall argument pointers or to create more dynamic
>> security policies. The goal of this new stackable Linux Security Module (LSM)
>> called Landlock is to allow any process, including unprivileged ones, to
>> create powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or
>> the OpenBSD Pledge. This kind of sandbox helps to mitigate the security
>> impact of bugs or unexpected/malicious behaviors in userland applications.
>>
>> The first RFC [1] was focused on extending seccomp while staying at the 
>> syscall
>> level. This brought a working PoC but with some (mitigated) ToCToU race
>> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
>> syscall argument evaluation (hence the LSM hooks).
>>
>>
>> # Landlock LSM
>>
>> This second RFC is a fresh revamp of the code while keeping some working
>> ideas. This series is mainly focused on LSM hooks, while keeping the
>> possibility to tie them to syscalls. This new code removes all race
>> conditions by design. It now uses eBPF instead of a subset of cBPF (as used
>> by seccomp-bpf). This allows removing the previous stacked cBPF hack to do
>> complex access checks, thanks to dedicated eBPF functions. An eBPF program
>> is still very limited (i.e. it can only call a whitelist of functions) and
>> can not cause a denial of service (i.e. no loops). The other major
>> improvement is the replacement of the previous custom checker groups of
>> syscall arguments with a new dedicated eBPF map to collect and compare
>> Landlock handles with system resources (e.g. files or network connections).
>>
>> The approach taken is to add the minimum amount of code while still allowing
>> the userland to create quite complex access rules. A dedicated security
>> policy language such as those used by SELinux, AppArmor and other major LSMs
>> is a lot of code and is dedicated to a trusted process (i.e.
>> root/administrator).
>>
> 
> I think there might be a problem with the current design.  If I add a
> seccomp filter that uses RET_LANDLOCK and some landlock filters, what
> happens if a second seccomp filter *also* uses RET_LANDLOCK?  I think
> they'll interfere with each other.  It might end up being necessary to
> require only one landlock seccomp layer at a time or to find a way to
> stick all the filters in a layer together with the LSM callbacks or
> maybe to just drop RET_LANDLOCK and let the callbacks look at the
> syscall args.

This is handled correctly. For each RET_LANDLOCK, if there are one or
more associated Landlock programs (i.e. created by the same thread after
that seccomp filter), one Landlock program instance is run for each
seccomp filter that triggers them. This way, each cookie linked to a
RET_LANDLOCK is evaluated once by each relevant Landlock program.

Example where a thread has loaded multiple seccomp filters (SF) and
multiple Landlock programs (LP), each associated with one LSM hook: SF0, SF1,
LP0(file_open), SF2, LP1(file_open), LP2(file_permission)

* If SF0 returns RET_LANDLOCK(cookie0), then LP0 and LP1 are run with
cookie0 if the current syscall triggers the file_open hook, and LP2 is
run with cookie0 if the syscall triggers the file_permission hook.

* In addition to the previous case, if SF1 returns
RET_LANDLOCK(cookie1), then LP0 and LP1 are run with cookie1 if the
current syscall triggers the file_open hook, and LP2 is run with cookie1
if the syscall triggers the file_permission hook.

* In addition to the previous cases, if SF2 returns
RET_LANDLOCK(cookie2), then (only) LP1 is run with cookie2 if the
current syscall triggers the file_open hook, and LP2 is run with cookie2
if the syscall triggers the file_permission hook.


> 
> BTW, what happens if an LSM hook is called outside a syscall context,
> e.g. from a page fault?

Good catch! For now, only a syscall can trigger an LSM hook because of
the RET_LANDLOCK constraint. It may be wise to also trigger them without
a cookie and add a dedicated variable to the eBPF context.

> 
>>
>>
>> # Sandbox example with conditional access control depending on cgroup
>>
>>   $ mkdir /sys/fs/cgroup/sandboxed
>>   $ ls /home
>>   user1
>>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>>   LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>>   ./sandbox /bin/sh -i
>>   $ ls /home
>>   user1
>>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>>   $ ls /home
>>   ls: cannot open directory '/home': Permission denied
>>
> 
> Something occurs to me that isn't strictly relevant to landlock but
> may be relevant to unprivileged cgroups: can you cause trouble by
> setting up a nastily-configured cgroup and running a setuid program in
> it?
> 

I hope not… But the use of cgroups should not be mandatory for Landlock.



[PATCH net-next] tcp: add tcp_add_backlog()

2016-08-27 Thread Eric Dumazet
From: Eric Dumazet 

When TCP operates in lossy environments (between 1 and 10 % packet
losses), many SACK blocks can be exchanged, and I noticed we could
drop them on busy senders, if these SACK blocks have to be queued
into the socket backlog.

While the main cause is the poor performance of RACK/SACK processing,
we can try to avoid these drops of valuable information that can lead to
spurious timeouts and retransmits.

Cause of the drops is the skb->truesize overestimation caused by:

- drivers allocating ~2048 (or more) bytes as a fragment to hold an
  Ethernet frame.

- various pskb_may_pull() calls bringing the headers into skb->head
  might have pulled all the frame content, but skb->truesize could
  not be lowered, as the stack has no idea of each fragment truesize.

The backlog drops are also more visible on bidirectional flows, since
their sk_rmem_alloc can be quite big.

Let's add some room for the backlog, as only the socket owner
can selectively take action to lower memory needs, like collapsing
receive queues or partial ofo pruning.

Signed-off-by: Eric Dumazet 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
---
 include/net/tcp.h   |1 +
 net/ipv4/tcp_ipv4.c |   33 +
 net/ipv6/tcp_ipv6.c |5 +
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 
25d64f6de69e1f639ed1531bf2d2df3f00fd76a2..5f5f09f6e019682ef29c864d2f43a8f247fcdd9a
 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1163,6 +1163,7 @@ static inline void tcp_prequeue_init(struct tcp_sock *tp)
 }
 
 bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
+bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb);
 
 #undef STATE_TRACE
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 
ad41e8ecf796bba1bd6d9ed155ca4a57ced96844..53e80cd004b6ce401c3acbb4b243b243c5c3c4a3
 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1532,6 +1532,34 @@ bool tcp_prequeue(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(tcp_prequeue);
 
+bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
+{
+   u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
+
+   /* Only socket owner can try to collapse/prune rx queues
+* to reduce memory overhead, so add a little headroom here.
+* Few sockets backlog are possibly concurrently non empty.
+*/
+   limit += 64*1024;
+
+   /* In case all data was pulled from skb frags (in __pskb_pull_tail()),
+* we can fix skb->truesize to its real value to avoid future drops.
+* This is valid because skb is not yet charged to the socket.
+* It has been noticed pure SACK packets were sometimes dropped
+* (if cooked by drivers without copybreak feature).
+*/
+   if (!skb->data_len)
+   skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
+
+   if (unlikely(sk_add_backlog(sk, skb, limit))) {
+   bh_unlock_sock(sk);
+   __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
+   return true;
+   }
+   return false;
+}
+EXPORT_SYMBOL(tcp_add_backlog);
+
 /*
  * From tcp_input.c
  */
@@ -1662,10 +1690,7 @@ process:
if (!sock_owned_by_user(sk)) {
if (!tcp_prequeue(sk, skb))
ret = tcp_v4_do_rcv(sk, skb);
-   } else if (unlikely(sk_add_backlog(sk, skb,
-  sk->sk_rcvbuf + sk->sk_sndbuf))) {
-   bh_unlock_sock(sk);
-   __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
+   } else if (tcp_add_backlog(sk, skb)) {
goto discard_and_relse;
}
bh_unlock_sock(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 
e4f55683af314438da8a09473927213a140e6e5c..5bf460bd299ff61ae759e8e545ba6298a1d1373c
 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1467,10 +1467,7 @@ process:
if (!sock_owned_by_user(sk)) {
if (!tcp_prequeue(sk, skb))
ret = tcp_v6_do_rcv(sk, skb);
-   } else if (unlikely(sk_add_backlog(sk, skb,
-  sk->sk_rcvbuf + sk->sk_sndbuf))) {
-   bh_unlock_sock(sk);
-   __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
+   } else if (tcp_add_backlog(sk, skb)) {
goto discard_and_relse;
}
bh_unlock_sock(sk);




Re: [RFC v2 09/10] landlock: Handle cgroups (program types)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>
>>> As far as safety and type checking that bpf programs has to do,
>>> I like the approach of patch 06/10:
>>> +LANDLOCK_HOOK2(file_open, FILE_OPEN,
>>> +   PTR_TO_STRUCT_FILE, struct file *, file,
>>> +   PTR_TO_STRUCT_CRED, const struct cred *, cred
>>> +)
>>> teaching verifier to recognize struct file, cred, sockaddr
>>> will let bpf program access them naturally without any overhead.
>>> Though:
>>> @@ -102,6 +102,9 @@ enum bpf_prog_type {
>>> BPF_PROG_TYPE_SCHED_CLS,
>>> BPF_PROG_TYPE_SCHED_ACT,
>>> BPF_PROG_TYPE_TRACEPOINT,
>>> +   BPF_PROG_TYPE_LANDLOCK_FILE_OPEN,
>>> +   BPF_PROG_TYPE_LANDLOCK_FILE_PERMISSION,
>>> +   BPF_PROG_TYPE_LANDLOCK_MMAP_FILE,
>>>  };
>>> is a bit of overkill.
>>> I think it would be cleaner to have single
>>> BPF_PROG_TYPE_LSM and at program load time pass
>>> lsm_hook_id as well, so that verifier can do safety checks
>>> based on type info provided in LANDLOCK_HOOKs
>>
>> I first started with a unique BPF_PROG_TYPE but, the thing is, the BPF
>> verifier checks programs according to their types. If we need to check
>> specific context value types (e.g. PTR_TO_STRUCT_FILE), we need
>> dedicated program types. I don't see any other way to do it with the
>> current verifier code. Moreover, that's the purpose of program types, right?
> 
> Adding new bpf program type for every lsm hook is not acceptable.
> Either do one new program type + pass lsm_hook_id as suggested
> or please come up with an alternative approach.

OK, so we have to modify the verifier to rely not only on the program
type but also on another value to check the context accesses. Do you
have a hint as to where this value could come from? Do we need to add a
new bpf command to associate a program with a subtype?





Re: [RFC v2 09/10] landlock: Handle cgroups (netfilter match)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>> To sum up, there are four related patchsets:
>> * "Landlock LSM: Unprivileged sandboxing" (this series)
>> * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
>> * "Networking cgroup controller" (Anoop Naravaram)
>> * "Add eBPF hooks for cgroups" (Daniel Mack)

>>> Anoop Naravaram's use case is to control the ports the applications
>>> under cgroup can bind and listen on.
>>> Such use case can be solved by such 'lsm cgroup controller' by
>>> attaching bpf program to security_socket_bind lsm hook and
>>> filtering sockaddr.
>>> Furthermore Sargun's use case is to allow further sockaddr rewrites
>>> from the bpf program which can be done as natural extension
>>> of such mechanism.
>>>
>>> If I understood Daniel's Anoop's Sargun's and yours use cases
>>> correctly the common piece of kernel infrastructure that can solve
>>> them all can start from Daniel's current set of patches that
>>> establish a mechanism of attaching bpf program to a cgroup.
>>> Then adding lsm hooks to it and later allowing argument rewrite
>>> (since they're already in the kernel and no ToCToU problems exist)

>> For the network-related series, I think it makes more sense to simply
>> create a netfilter rule matching a cgroup and then add more features to
>> netfilter (restrict port ranges and so on) thanks to eBPF programs.
>> Containers are (usually) in a dedicated network namespace, which opens
>> the possibility to not only rely on cgroups (e.g. match UID,
>> netmask...). It would also be more flexible to be able to load a BPF
>> program into netfilter and update its maps on the fly to make dynamic
>> rules, like ipset does, but in a more generic way.

What do the netdev folks think about this design?





Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>
>>>
>>> - I don't think such 'for' loop can scale. The solution needs to work
>>> with thousands of containers and thousands of cgroups.
>>> In the patch 06/10 the proposal is to use 'current' as holder of
>>> the programs:
>>> +   for (prog = current->seccomp.landlock_prog;
>>> +   prog; prog = prog->prev) {
>>> +   if (prog->filter == landlock_ret->filter) {
>>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
>>> +   break;
>>> +   }
>>> +   }
>>> imo that's the root of scalability issue.
>>> I think to be able to scale the bpf programs have to be attached to
>>> cgroups instead of tasks.
>>> That would be very different api. seccomp doesn't need to be touched.
>>> But that is the only way I see to be able to scale.
>>
>> Landlock is inspired by seccomp, which also uses a BPF program per
>> thread. For seccomp, each BPF program is executed for each syscall.
>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
>> see why it is a scale issue for Landlock compared to seccomp. I also
>> don't see why storing the BPF program list pointer in the cgroup struct
>> instead of the task struct changes a lot here. The BPF programs'
>> execution will be the same anyway (for each LSM hook). Kees probably
>> has a better opinion on this.
> 
> seccomp has its own issues and copying them doesn't make this lsm any better.
> Like seccomp bpf programs are all gigantic switch statement that looks
> for interesting syscall numbers. All syscalls of a task are paying
> non-trivial seccomp penalty due to such design. If bpf was attached per
> syscall it would have been much cheaper. Of course doing it this way
> for seccomp is not easy, but for lsm such facility is already there.
> Blank call of a single bpf prog for all lsm hooks is unnecessary
> overhead that can and should be avoided.

It's probably a misunderstanding. Contrary to seccomp, which runs all of
a thread's BPF programs for any syscall, Landlock only runs eBPF programs
for the triggered LSM hooks, if their type matches. Indeed, thanks to the
multiple eBPF program types and contrary to seccomp, Landlock only runs
an eBPF program when needed. Landlock will have almost no performance
overhead if the syscalls do not trigger the watched LSM hooks for the
current process.


> 
>>> May be another way of thinking about it is 'lsm cgroup controller'
>>> that Sargun is proposing.
>>> The lsm hooks will provide stable execution points and the programs
>>> will be called like:
>>> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
>>> BPF_PROG_RUN(prog, ctx);
>>> The delegation functionality and 'prog_effective' logic that
>>> Daniel Mack is proposing will be fully reused here.
>>> External container management software will be able to apply bpf
>>> programs to control tasks under cgroup and such
>>> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
>>> The user will be able to register different programs for different lsm 
>>> hooks.
>>> If I understand the patch 6/10 correctly, there is one (or a list) prog for
>>> all lsm hooks per task which is not flexible enough.
>>
>> For each LSM hook triggered by a thread, all of its Landlock eBPF
>> programs (dedicated to this kind of hook) will be evaluated (if
>> needed). This is the same behavior as seccomp (a list of BPF programs
>> attached to a process hierarchy) except the BPF programs are not
>> evaluated per syscall but per LSM hook. There is no way to make it more
>> fine-grained :)
> 
> There is a way to attach different bpf program per cgroup
> and per lsm hook. Such approach drastically reduces overhead
> of sandboxed application.

As said above, Landlock will not run an eBPF program when not strictly
needed. Attaching to a cgroup will have the same performance impact as
attaching to a process hierarchy.





Re: [RFC v2 08/10] landlock: Handle file system comparisons

2016-08-27 Thread Mickaël Salaün

On 26/08/2016 16:57, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 7:10 AM, Mickaël Salaün  wrote:
>>
>> On 25/08/2016 13:12, Andy Lutomirski wrote:
>>> On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
 Add eBPF functions to compare file system access with a Landlock file
 system handle:
 * bpf_landlock_cmp_fs_prop_with_struct_file(prop, map, map_op, file)
   This function allows to compare the dentry, inode, device or mount
   point of the currently accessed file, with a reference handle.
 * bpf_landlock_cmp_fs_beneath_with_struct_file(opt, map, map_op, file)
   This function allows an eBPF program to check if the current accessed
   file is the same or in the hierarchy of a reference handle.

 The goal of a file system handle is to abstract kernel objects such as a
 struct file or a struct inode. Userland can create this kind of handle
 thanks to the BPF_MAP_UPDATE_ELEM command. The element is a struct
 landlock_handle containing the handle type (e.g.
 BPF_MAP_HANDLE_TYPE_LANDLOCK_FS_FD) and a file descriptor. This could
 also be any description able to match a struct file or a struct inode
 (e.g. path or glob string).
>>>
>>> This needs Eric's opinion.
>>>
>>> Also, where do all the struct file *'s get stashed?  Are they
>>> preserved in the arraymap?  What prevents reference cycles or absurdly
>>> large numbers of struct files getting pinned?
>>
>> Yes, the struct file are kept in the arraymap and dropped when there is
>> no more reference on them. Currently, the limitations are the maximum
>> number of open file descriptors referring to an arraymap and the maximum
>> number of eBPF Landlock programs loaded in a process
>> (LANDLOCK_PROG_LIST_MAX_PAGES in kernel/seccomp.c).
>>
>> What kind of reference cycles have you in mind?
> 
> Shoving evil things into the arraymaps, e.g. unix sockets with
> SCM_RIGHTS messages pending, eBPF program references, the arraymap fd
> itself, another arraymap fd, etc.

The arraymap of Landlock handles is strongly typed and can check the
kind of FD it gets when creating/updating an entry, which is done for the
cgroup type. It may be wise to add another check for FS types as well,
which should be a one-liner. I'll do it for the next round.


>>
>> It probably needs another limit for kernel object references as well.
>> What is the best option here? Add another static limitation or use an
>> existing one?
> 
> Dunno.  If RLIMIT_FILE could be made to work, that would be nice.

The RLIMIT_NOFILE is used for the eBPF map creation, but only the memory
limit is used to store the map entries (struct file pointers). I'll add
a new static limit for the number of FD-based arraymap entries because
it does not reflect the same semantics. The struct files are not usable
as FDs; their only purpose is to allow comparison with another file.

 Mickaël





Re: [PATCH] ath10k: fix spelling mistake "montior" -> "monitor"

2016-08-27 Thread Julian Calaby
Hi All,

On Sat, Aug 27, 2016 at 4:08 AM, Colin King  wrote:
> From: Colin Ian King 
>
> Trivial fix to spelling mistake in ath10k_warn message.
>
> Signed-off-by: Colin Ian King 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH v3 0/6] Add eBPF hooks for cgroups

2016-08-27 Thread Rami Rosen
Hi Daniel,
I don't see the cgroups mailing list address in the cc list. Since
this patch also relates to the cgroups subsystem, I would suggest
that going forward you also cc cgro...@vger.kernel.org on future
patches related to cgroups. (I hope this won't cause the max cc list
length for patches to be exceeded.)

Regards,
Rami Rosen

On 26 August 2016 at 22:58, Daniel Mack  wrote:
> This is v3 of the patch set to allow eBPF programs for network
> filtering and accounting to be attached to cgroups, so that they apply
> to all sockets of all tasks placed in that cgroup. The logic can also
> be extended for other cgroup-based eBPF logic.
>
> I am posting this now with only very few changes from v2 because
> I'll be travelling for a couple of days and won't have access to my
> mails.
>
>
> Changes from v2:
>
> * Fixed the RCU locking details Tejun pointed out.
>
> * Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler.
>
>
> Changes from v1:
>
> * Moved all bpf specific cgroup code into its own file, and stub
>   out related functions for !CONFIG_CGROUP_BPF as static inline nops.
>   This way, the call sites are not cluttered with #ifdef guards while
>   the feature remains compile-time configurable.
>
> * Implemented the new scheme proposed by Tejun. Per cgroup, store one
>   set of pointers that are pinned to the cgroup, and one for the
>   programs that are effective. When a program is attached or detached,
>   the change is propagated to all the cgroup's descendants. If a
>   subcgroup has its own pinned program, skip the whole subbranch in
>   order to allow delegation models.
>
> * The hookup for egress packets is now done from __dev_queue_xmit().
>
> * A static key is now used in both the ingress and egress fast paths
>   to keep performance penalties close to zero if the feature is
>   not in use.
>
> * Overall cleanup to make the accessors use the program arrays.
>   This should make it much easier to add new program types, which
>   will then automatically follow the pinned vs. effective logic.
>
> * Fixed locking issues, as pointed out by Eric Dumazet and Alexei
>   Starovoitov. Changes to the program array are now done with
>   xchg() and are protected by cgroup_mutex.
>
> * eBPF programs are now expected to return 1 to let the packet pass,
>   not >= 0. Pointed out by Alexei.
>
> * Operation is now limited to INET sockets, so local AF_UNIX sockets
>   are not affected. The enum members are renamed accordingly. In case
>   other socket families should be supported, this can be extended in
>   the future.
>
> * The sample program learned to support both ingress and egress, and
>   can now optionally make the eBPF program drop packets by making it
>   return 0.
>
>
> As always, feedback is much appreciated.
>
> Thanks,
> Daniel
>
> Daniel Mack (6):
>   bpf: add new prog type for cgroup socket filtering
>   cgroup: add support for eBPF programs
>   bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands
>   net: filter: run cgroup eBPF ingress programs
>   net: core: run cgroup eBPF egress programs
>   samples: bpf: add userspace example for attaching eBPF programs to
> cgroups
>
>  include/linux/bpf-cgroup.h  |  70 +
>  include/linux/cgroup-defs.h |   4 +
>  include/uapi/linux/bpf.h|  16 
>  init/Kconfig|  12 +++
>  kernel/bpf/Makefile |   1 +
>  kernel/bpf/cgroup.c | 165 
> 
>  kernel/bpf/syscall.c|  83 
>  kernel/bpf/verifier.c   |   1 +
>  kernel/cgroup.c |  18 +
>  net/core/dev.c  |   6 ++
>  net/core/filter.c   |  11 +++
>  samples/bpf/Makefile|   2 +
>  samples/bpf/libbpf.c|  23 ++
>  samples/bpf/libbpf.h|   3 +
>  samples/bpf/test_cgrp2_attach.c | 147 +++
>  15 files changed, 562 insertions(+)
>  create mode 100644 include/linux/bpf-cgroup.h
>  create mode 100644 kernel/bpf/cgroup.c
>  create mode 100644 samples/bpf/test_cgrp2_attach.c
>
> --
> 2.5.5
>


[PATCH] ipv6: Use inbound ifaddr as source addresses for ICMPv6 errors

2016-08-27 Thread Eli Cooper
According to RFC 1885 2.2(c), the source address of ICMPv6
errors in response to forwarded packets should be set to the
unicast address of the forwarding interface in order to be helpful
in diagnosis. Currently the selection of source address is based
on the default route, without respect to the inbound interface.

This patch sets the source address of ICMPv6 error messages to
the address of the inbound interface, with the exception of
'time exceeded' and 'packet too big' messages sent in ip6_forward(),
where the address of the OUTPUT device is forced as the source address
(although, without this patch, it is NOT actually enforced as claimed).

Signed-off-by: Eli Cooper 
---
 net/ipv6/icmp.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index bd59c34..0e52f3b 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -397,6 +397,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
code, __u32 info,
struct sock *sk;
struct ipv6_pinfo *np;
const struct in6_addr *saddr = NULL;
+   struct in6_addr tmp_saddr;
struct dst_entry *dst;
struct icmp6hdr tmp_hdr;
struct flowi6 fl6;
@@ -421,6 +422,12 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
code, __u32 info,
 */
addr_type = ipv6_addr_type(>daddr);
 
+   if (type == ICMPV6_DEST_UNREACH || type == ICMPV6_PKT_TOOBIG ||
+   type == ICMPV6_TIME_EXCEED || type == ICMPV6_PARAMPROB)
+   if (!ipv6_dev_get_saddr(net, skb->dev, >saddr, 0,
+   _saddr))
+   saddr = _saddr;
+
if (ipv6_chk_addr(net, >daddr, skb->dev, 0) ||
ipv6_chk_acast_addr_src(net, skb->dev, >daddr))
saddr = >daddr;
-- 
2.9.3



Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-27 Thread Timur Tabi

David Miller wrote:

From: Timur Tabi 
Date: Thu, 25 Aug 2016 16:39:03 -0500


+static const u8 duuid[] = {
+   0x77, 0x79, 0x60, 0xbf,
+   0x2d, 0xab,
+   0x9d, 0x4b,
+   0x94, 0xf0,
+   0xe1, 0x11, 0x98, 0x01, 0xa2, 0xba
+};


This seems to be completely unused, please remove it.


Sorry, I missed that during my cleanup.  I'll remove it in v10.

I wonder why my compiler didn't complain about the unused static.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the
Code Aurora Forum, hosted by The Linux Foundation.


Re: [RFCv2 07/16] bpf: enable non-core use of the verifier

2016-08-27 Thread Jakub Kicinski
On Fri, 26 Aug 2016 16:29:05 -0700, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 07:06:06PM +0100, Jakub Kicinski wrote:
> > Advanced JIT compilers and translators may want to use
> > eBPF verifier as a base for parsers or to perform custom
> > checks and validations.
> > 
> > Add the ability for external users to invoke the verifier
> > and provide callbacks to be invoked for every instruction
> > checked.  For now, only the most basic callback, for
> > per-instruction pre-interpretation checks, is added.  More
> > advanced users may also like to have a per-instruction post
> > callback and a state comparison callback.
> > 
> > Signed-off-by: Jakub Kicinski   
> 
> I like the approach. Making the verifier into a 'bytecode parser'
> that JITs can reuse is a good design choice.
> The only thing I would suggest is to tweak the verifier to
> avoid in-place state recording. Then I think patch 8 for
> clone/unclone of the program won't be needed, since verifier
> will be read-only from bytecode point of view and patch 9
> also will be slightly cleaner.
> I think there are very few places in verifier that do this
> state keeping inside insn. It was bugging me for some time.
> Good time to clean that up.
> Unless I misunderstand the patches 7,8,9...

Agreed, I think the verifier only modifies the program to
store pointer types in the imm field.  I will try to come up
with a way around this, any suggestions?  Perhaps states_equal()
logic could be modified to downgrade pointers to UNKNOWNs
when it detects the other state had an incompatible pointer type.

> There is also small concern for patches 5 and 6 that add
> more register state information. Potentially that extra
> state can prevent states_equal() to recognize equivalent
> states. Only patch 9 uses that info, right?

5 and 6 recognize more constant loads, those can only
upgrade some UNKNOWN_VALUEs to CONST_IMMs.  So yes, if the
verifier hits the CONST first and then tries with UNKNOWN
it will have to reverify the path.  

> Another question is do you need all state walking that
> verifier does or single linear pass through insns
> would have worked?
> Looks like you're only using CONST_IMM and PTR_TO_CTX
> state, right?

I think I need all the parsing.  Right now I mostly need
the verification to check that exit codes are specific
CONST_IMMs.  Clang quite happily does this:

r0 <- 0
if (...)
r0 <- 1
exit

> The rest looks very good. Thanks a lot!

Thanks for the review!  FWIW my use of parsing is isolated
to the nfp_bpf_verifier.c file, at the very end of patch 9.


Re: [PATCH 1/2] smc91x: always use 8-bit access if necessary

2016-08-27 Thread kbuild test robot
Hi Arnd,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.8-rc3 next-20160825]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Arnd-Bergmann/smc91x-always-use-8-bit-access-if-necessary/20160825-225929
config: m32r-allyesconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 4.9.0
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m32r 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:1015:28: note: in expansion of macro 
'SMC_out16'
#define SMC_SET_TCR(lp, x) SMC_out16(x, ioaddr, TCR_REG(lp))
   ^
   drivers/net/ethernet/smsc/smc91x.c:1028:3: note: in expansion of macro 
'SMC_SET_TCR'
  SMC_SET_TCR(lp, lp->tcr_cur_mode);
  ^
   drivers/net/ethernet/smsc/smc91x.c: In function 'smc_phy_configure':
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:944:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, BANK_SELECT);  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1077:2: note: in expansion of macro 
'SMC_SELECT_BANK'
 SMC_SELECT_BANK(lp, 0);
 ^
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:1010:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, RPC_REG(lp));  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1078:2: note: in expansion of macro 
'SMC_SET_RPC'
 SMC_SET_RPC(lp, lp->rpc_cur_mode);
 ^
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:944:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, BANK_SELECT);  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1135:2: note: in expansion of macro 
'SMC_SELECT_BANK'
 SMC_SELECT_BANK(lp, 2);
 ^
   drivers/net/ethernet/smsc/smc91x.c: In function 'smc_10bt_check_media':
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:944:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, BANK_SELECT);  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1176:2: note: in expansion of macro 
'SMC_SELECT_BANK'
 SMC_SELECT_BANK(lp, 0);
 ^
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:944:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, BANK_SELECT);  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1178:2: note: in expansion of macro 
'SMC_SELECT_BANK'
 SMC_SELECT_BANK(lp, 2);
 ^
   drivers/net/ethernet/smsc/smc91x.c: In function 'smc_eph_interrupt':
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:944:4: note: in expansion of macro 
'SMC_out16'
   SMC_out16(x, ioaddr, BANK_SELECT);  \
   ^
   drivers/net/ethernet/smsc/smc91x.c:1200:2: note: in expansion of macro 
'SMC_SELECT_BANK'
 SMC_SELECT_BANK(lp, 1);
 ^
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:959:28: note: in expansion of macro 
'SMC_out16'
#define SMC_SET_CTL(lp, x) SMC_out16(x, ioaddr, CTL_REG(lp))
   ^
   drivers/net/ethernet/smsc/smc91x.c:1202:2: note: in expansion of macro 
'SMC_SET_CTL'
 SMC_SET_CTL(lp, ctl & ~CTL_LE_ENABLE);
 ^
   drivers/net/ethernet/smsc/smc91x.h:424:16: warning: unused variable 
'__val16' [-Wunused-variable]
  unsigned int __val16 = (x); \
   ^
   drivers/net/ethernet/smsc/smc91x.h:959:28: note: in expansion 

Re: kernel BUG at net/unix/garbage.c:149!"

2016-08-27 Thread Miklos Szeredi
On Wed, Aug 24, 2016 at 11:40 PM, Hannes Frederic Sowa
 wrote:
> On 24.08.2016 16:24, Nikolay Borisov wrote:
>> Hello,
>>
>> I hit the following BUG:
>>
>> [1851513.239831] [ cut here ]
>> [1851513.240079] kernel BUG at net/unix/garbage.c:149!
>> [1851513.240313] invalid opcode:  [#1] SMP
>> [1851513.248320] CPU: 37 PID: 11683 Comm: nginx Tainted: G   O
>> 4.4.14-clouder3 #26
>> [1851513.248719] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
>> [1851513.248966] task: 883b0f6f ti: 880189cf task.ti: 
>> 880189cf
>> [1851513.249361] RIP: 0010:[]  [] 
>> unix_notinflight+0x8d/0x90
>> [1851513.249846] RSP: 0018:880189cf3cf8  EFLAGS: 00010246
>> [1851513.250082] RAX: 883b05491968 RBX: 883b05491680 RCX: 
>> 8807f9967330
>> [1851513.250476] RDX: 0001 RSI: 882e6d8bae00 RDI: 
>> 82073f10
>> [1851513.250886] RBP: 880189cf3d08 R08: 880cbc70e200 R09: 
>> 00018021
>> [1851513.251280] R10: 883fff3b9dc0 R11: ea0032f1c380 R12: 
>> 883fbaf5
>> [1851513.251674] R13: 815f6354 R14: 881a7c77b140 R15: 
>> 881a7c7792c0
>> [1851513.252083] FS:  7f4f19573720() GS:883fff3a() 
>> knlGS:
>> [1851513.252481] CS:  0010 DS:  ES:  CR0: 80050033
>> [1851513.252717] CR2: 013062d8 CR3: 001712f32000 CR4: 
>> 001406e0
>> [1851513.253116] Stack:
>> [1851513.253345]   880189cf3d40 880189cf3d28 
>> 815f4383
>> [1851513.254022]  8839ee11a800 8839ee11a800 880189cf3d60 
>> 815f53b8
>> [1851513.254685]   883406788de0  
>> 
>> [1851513.255360] Call Trace:
>> [1851513.255594]  [] unix_detach_fds.isra.19+0x43/0x50
>> [1851513.255851]  [] unix_destruct_scm+0x48/0x80
>> [1851513.256090]  [] skb_release_head_state+0x4f/0xb0
>> [1851513.256328]  [] skb_release_all+0x12/0x30
>> [1851513.256564]  [] kfree_skb+0x32/0xa0
>> [1851513.256810]  [] unix_release_sock+0x1e4/0x2c0
>> [1851513.257046]  [] unix_release+0x20/0x30
>> [1851513.257284]  [] sock_release+0x1f/0x80
>> [1851513.257521]  [] sock_close+0x12/0x20
>> [1851513.257769]  [] __fput+0xea/0x1f0
>> [1851513.258005]  [] fput+0xe/0x10
>> [1851513.258244]  [] task_work_run+0x7f/0xb0
>> [1851513.258488]  [] exit_to_usermode_loop+0xc0/0xd0
>> [1851513.258728]  [] syscall_return_slowpath+0x80/0xf0
>> [1851513.258983]  [] int_ret_from_sys_call+0x25/0x9f
>> [1851513.259222] Code: 7e 5b 41 5c 5d c3 48 8b 8b e8 02 00 00 48 8b 93 f0 02 
>> 00 00 48 89 51 08 48 89 0a 48 89 83 e8 02 00 00 48 89 83 f0 02 00 00 eb b8 
>> <0f> 0b 90 0f 1f 44 00 00 55 48 c7 c7 10 3f 07 82 48 89 e5 41 54
>> [1851513.268473] RIP  [] unix_notinflight+0x8d/0x90
>> [1851513.268793]  RSP 
>>
>> That's essentially BUG_ON(list_empty(&u->link));
>>
>> I see that all the code involving the ->link member hasn't really been
>> touched since it was introduced in 2007. So this must be a latent bug.
>> This is the first time I've observed it. The state
>> of the struct unix_sock can be found here: http://sprunge.us/WCMW. Evidently,
>> there are no inflight sockets.

Weird.  If it was the BUG_ON(!list_empty(&u->link)) I'd understand,
because the code in scan_children() looks fishy: what prevents "embryos"
from fledging to full socket status and going in-flight?

But this one, I cannot imagine any scenario.
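For readers unfamiliar with the bookkeeping in question, the invariant can be modeled in a few lines of plain C. This is a simplified sketch of af_unix's inflight accounting, not the kernel code; the names and fields are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of struct unix_sock's gc bookkeeping: a socket sits on
 * the global gc_inflight_list iff its inflight count is non-zero.
 * unix_notinflight() trips BUG_ON(list_empty(&u->link)) when asked to
 * decrement a socket whose link is on no list, i.e. when the count and
 * the list membership disagree. */
struct usock {
	int inflight;   /* in-flight references passed via SCM_RIGHTS */
	int on_list;    /* models !list_empty(&u->link) */
};

static void model_inflight(struct usock *u)
{
	if (u->inflight++ == 0) {
		assert(!u->on_list);
		u->on_list = 1;   /* list_add_tail(&u->link, &gc_inflight_list) */
	}
}

static void model_notinflight(struct usock *u)
{
	assert(u->on_list);       /* the BUG_ON() from the trace above */
	if (--u->inflight == 0)
		u->on_list = 0;   /* list_del_init(&u->link) */
}
```

The crash therefore implies unix_notinflight() ran against a socket whose link was already off the list, consistent with an unbalanced inflight/notinflight pair somewhere, which matches the report that no inflight sockets were visible in the dump.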

Can we have access to the crashdump?

Thanks,
Miklos


Re: [PATCH net-next v1] gso: Support partial splitting at the frag_list pointer

2016-08-27 Thread Shmulik Ladkani
Hi,

On Fri, 26 Aug 2016 13:45:56 -0700 Alexander Duyck  
wrote:
> > However, if TSO is off, but GSO is on, who takes care of further
> > splitting these skbs according to their gso_size?  
> 
> I believe the patch resolves it via the net_gso_ok check.  This is
> used to verify if the lower device could segment it if we split out
> the buffers from skb->frag_list.

Thanks, got it.

> > And another question:
> > Can this be utilized in any way to solve the problem described in [1] ?
> >
> > [1] https://patchwork.ozlabs.org/patch/661419/  
> 
> I don't think so.  This solution is to only do part of the software
> offload and still make use of an existing hardware offload.

Sorry, I wasn't too clear.

When attempting to reduce gso_size in order to avoid
segmentation+fragmentation, the problem I'm hitting is:
  http://lists.openwall.net/netdev/2016/08/25/35

Actually, your idea hinted at a new direction; I'll pursue
that.

Thanks,
Shmulik


Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-08-27 Thread Andy Lutomirski
On Thu, Aug 25, 2016 at 3:32 AM, Mickaël Salaün  wrote:
> Hi,
>
> This series is a proof of concept to fill some missing parts of seccomp, such
> as the ability to check syscall argument pointers or to create more dynamic
> security policies. The goal of this new stackable Linux Security Module (LSM)
> called Landlock is to allow any process, including unprivileged ones, to
> create powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or
> the OpenBSD Pledge. This kind of sandbox helps to mitigate the security impact
> of bugs or unexpected/malicious behaviors in userland applications.
>
> The first RFC [1] was focused on extending seccomp while staying at the 
> syscall
> level. This brought a working PoC but with some (mitigated) ToCToU race
> conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).
>
>
> # Landlock LSM
>
> This second RFC is a fresh revamp of the code while keeping some working
> ideas.
> This series is mainly focused on LSM hooks, while keeping the possibility to
> tie them to syscalls. This new code removes all race conditions by design. It
> now uses eBPF instead of a subset of cBPF (as used by seccomp-bpf). This
> allows removing the previous stacked cBPF hack for complex access checks,
> thanks to dedicated eBPF functions. An eBPF program is still very limited
> (i.e. can only call a whitelist of functions) and cannot cause a denial of
> service (i.e. no loop). The other major improvement is the replacement of the
> previous custom checker groups of syscall arguments with a new dedicated eBPF
> map to collect and compare Landlock handles with system resources (e.g. files
> or network connections).
>
> The approach taken is to add the minimum amount of code while still allowing
> the userland to create quite complex access rules. A dedicated security policy
> language such as used by SELinux, AppArmor and other major LSMs is a lot of
> code and dedicated to a trusted process (i.e. root/administrator).
>

I think there might be a problem with the current design.  If I add a
seccomp filter that uses RET_LANDLOCK and some landlock filters, what
happens if a second seccomp filter *also* uses RET_LANDLOCK?  I think
they'll interfere with each other.  It might end up being necessary to
require only one landlock seccomp layer at a time or to find a way to
stick all the filters in a layer together with the LSM callbacks or
maybe to just drop RET_LANDLOCK and let the callbacks look at the
syscall args.

BTW, what happens if an LSM hook is called outside a syscall context,
e.g. from a page fault?

>
>
> # Sandbox example with conditional access control depending on cgroup
>
>   $ mkdir /sys/fs/cgroup/sandboxed
>   $ ls /home
>   user1
>   $ LANDLOCK_CGROUPS='/sys/fs/cgroup/sandboxed' \
>   LANDLOCK_ALLOWED='/bin:/lib:/usr:/tmp:/proc/self/fd/0' \
>   ./sandbox /bin/sh -i
>   $ ls /home
>   user1
>   $ echo $$ > /sys/fs/cgroup/sandboxed/cgroup.procs
>   $ ls /home
>   ls: cannot open directory '/home': Permission denied
>

Something occurs to me that isn't strictly relevant to landlock but
may be relevant to unprivileged cgroups: can you cause trouble by
setting up a nastily-configured cgroup and running a setuid program in
it?


Re: [RFC v2 09/10] landlock: Handle cgroups

2016-08-27 Thread Andy Lutomirski
On Aug 27, 2016 1:05 AM, "Alexei Starovoitov"
 wrote:
>
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >
>
> trimming cc list again. When it's too big vger will consider it as spam.
>
> > On 26/08/2016 04:14, Alexei Starovoitov wrote:
> > > On Thu, Aug 25, 2016 at 12:32:44PM +0200, Mickaël Salaün wrote:
> > >> Add an eBPF function bpf_landlock_cmp_cgroup_beneath(opt, map, map_op)
> > >> to compare the current process cgroup with a cgroup handle. The handle
> > >> can match the current cgroup if it is the same or a child. This allows
> > >> making conditional rules according to the current cgroup.
> > >>
> > >> A cgroup handle is a map entry created from a file descriptor referring
> > >> to a cgroup directory (e.g. by opening /sys/fs/cgroup/X). In this case, the
> > >> map entry is of type BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD and the
> > >> inferred array map is of type BPF_MAP_ARRAY_TYPE_LANDLOCK_CGROUP.
> > >>
> > >> An unprivileged process can create and manipulate cgroups thanks to
> > >> cgroup delegation.
> > >>
> > >> Signed-off-by: Mickaël Salaün 
> > > ...
> > >> +static inline u64 bpf_landlock_cmp_cgroup_beneath(u64 r1_option, u64 
> > >> r2_map,
> > >> +  u64 r3_map_op, u64 r4, u64 r5)
> > >> +{
> > >> +  u8 option = (u8) r1_option;
> > >> +  struct bpf_map *map = (struct bpf_map *) (unsigned long) r2_map;
> > >> +  enum bpf_map_array_op map_op = r3_map_op;
> > >> +  struct bpf_array *array = container_of(map, struct bpf_array, map);
> > >> +  struct cgroup *cg1, *cg2;
> > >> +  struct map_landlock_handle *handle;
> > >> +  int i;
> > >> +
> > >> +  /* ARG_CONST_PTR_TO_LANDLOCK_HANDLE_CGROUP is an arraymap */
> > >> +  if (unlikely(!map)) {
> > >> +  WARN_ON(1);
> > >> +  return -EFAULT;
> > >> +  }
> > >> +  if (unlikely((option | _LANDLOCK_FLAG_OPT_MASK) != 
> > >> _LANDLOCK_FLAG_OPT_MASK))
> > >> +  return -EINVAL;
> > >> +
> > >> +  /* for now, only handle OP_OR */
> > >> +  switch (map_op) {
> > >> +  case BPF_MAP_ARRAY_OP_OR:
> > >> +  break;
> > >> +  case BPF_MAP_ARRAY_OP_UNSPEC:
> > >> +  case BPF_MAP_ARRAY_OP_AND:
> > >> +  case BPF_MAP_ARRAY_OP_XOR:
> > >> +  default:
> > >> +  return -EINVAL;
> > >> +  }
> > >> +
> > >> +  synchronize_rcu();
> > >> +
> > >> +  for (i = 0; i < array->n_entries; i++) {
> > >> +  handle = (struct map_landlock_handle *)
> > >> +  (array->value + array->elem_size * i);
> > >> +
> > >> +  /* protected by the proto types, should not happen */
> > >> +  if (unlikely(handle->type != 
> > >> BPF_MAP_HANDLE_TYPE_LANDLOCK_CGROUP_FD)) {
> > >> +  WARN_ON(1);
> > >> +  return -EFAULT;
> > >> +  }
> > >> +  if (unlikely(!handle->css)) {
> > >> +  WARN_ON(1);
> > >> +  return -EFAULT;
> > >> +  }
> > >> +
> > >> +  if (option & LANDLOCK_FLAG_OPT_REVERSE) {
> > >> +  cg1 = handle->css->cgroup;
> > >> +  cg2 = task_css_set(current)->dfl_cgrp;
> > >> +  } else {
> > >> +  cg1 = task_css_set(current)->dfl_cgrp;
> > >> +  cg2 = handle->css->cgroup;
> > >> +  }
> > >> +
> > >> +  if (cgroup_is_descendant(cg1, cg2))
> > >> +  return 0;
> > >> +  }
> > >> +  return 1;
> > >> +}
> > >
> > > - please take a loook at exisiting bpf_current_task_under_cgroup and
> > > reuse BPF_MAP_TYPE_CGROUP_ARRAY as a minimum. Doing new cgroup array
> > > is nothing but duplication of the code.
> >
> > Oh, I didn't know about this patchset and the new helper. Indeed, it
> > looks a lot like mine except there is no static verification of the map
> > type as I did with the arraymap of handles, and no batch mode either. I
> > think the return value of bpf_current_task_under_cgroup is error-prone
> > if an eBPF program does an "if(ret)" test on the value (because of the
> > negative ERRNO return value). Inverting the 0 and 1 return values should
> > fix this (0 == succeeded, 1 == failed, <0 == error).
>
> nothing to fix. It's good as-is. Use if (ret > 0) instead.
>
> >
> > To sum up, there is four related patchsets:
> > * "Landlock LSM: Unprivileged sandboxing" (this series)
> > * "Add Checmate, BPF-driven minor LSM" (Sargun Dhillon)
> > * "Networking cgroup controller" (Anoop Naravaram)
> > * "Add eBPF hooks for cgroups" (Daniel Mack)
> >
> > The three other series (Sargun's, Anoop's and Daniel's) are mainly
> > focused on network access-control via cgroup for *containers*. As far as
> > I can tell, only a *root* user (CAP_SYS_ADMIN) can use them. Landlock's
> > goal is to empower all processes (privileged or not) to create their own
> > sandbox. This also means, as explained in "[RFC v2 00/10] Landlock
> > LSM: Unprivileged sandboxing", there are more constraints. For example,
> > it is not acceptable to let a process probe the kernel 

[net-next PATCH] e1000: add initial XDP support

2016-08-27 Thread John Fastabend
From: Alexei Starovoitov 

This patch adds initial support for XDP on e1000 driver. Note e1000
driver does not support page recycling in general which could be
added as a further improvement. However for XDP_DROP and XDP_XMIT
the xdp code paths will recycle pages.

This patch includes the rcu_read_lock/rcu_read_unlock pair noted by
Brenden Blanco in another pending patch.

  net/mlx4_en: protect ring->xdp_prog with rcu_read_lock

CC: William Tu 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/e1000/e1000.h  |1 
 drivers/net/ethernet/intel/e1000/e1000_main.c |  168 -
 2 files changed, 165 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h 
b/drivers/net/ethernet/intel/e1000/e1000.h
index d7bdea7..62e33fa 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -279,6 +279,7 @@ struct e1000_adapter {
 struct e1000_rx_ring *rx_ring,
 int cleaned_count);
struct e1000_rx_ring *rx_ring;  /* One per active queue */
+   struct bpf_prog *prog;
struct napi_struct napi;
 
int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d..396b0e0 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 char e1000_driver_name[] = "e1000";
 static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
@@ -842,6 +843,44 @@ static int e1000_set_features(struct net_device *netdev,
return 0;
 }
 
+static int e1000_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
+{
+   struct e1000_adapter *adapter = netdev_priv(netdev);
+   struct bpf_prog *old_prog;
+
+   old_prog = xchg(&adapter->prog, prog);
+   if (old_prog) {
+   synchronize_net();
+   bpf_prog_put(old_prog);
+   }
+
+   if (netif_running(netdev))
+   e1000_reinit_locked(adapter);
+   else
+   e1000_reset(adapter);
+   return 0;
+}
+
+static bool e1000_xdp_attached(struct net_device *dev)
+{
+   struct e1000_adapter *priv = netdev_priv(dev);
+
+   return !!priv->prog;
+}
+
+static int e1000_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   switch (xdp->command) {
+   case XDP_SETUP_PROG:
+   return e1000_xdp_set(dev, xdp->prog);
+   case XDP_QUERY_PROG:
+   xdp->prog_attached = e1000_xdp_attached(dev);
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
 static const struct net_device_ops e1000_netdev_ops = {
.ndo_open   = e1000_open,
.ndo_stop   = e1000_close,
@@ -860,6 +899,7 @@ static const struct net_device_ops e1000_netdev_ops = {
 #endif
.ndo_fix_features   = e1000_fix_features,
.ndo_set_features   = e1000_set_features,
+   .ndo_xdp= e1000_xdp,
 };
 
 /**
@@ -1276,6 +1316,9 @@ static void e1000_remove(struct pci_dev *pdev)
e1000_down_and_stop(adapter);
e1000_release_manageability(adapter);
 
+   if (adapter->prog)
+   bpf_prog_put(adapter->prog);
+
unregister_netdev(netdev);
 
e1000_phy_hw_reset(hw);
@@ -1859,7 +1902,7 @@ static void e1000_configure_rx(struct e1000_adapter 
*adapter)
	struct e1000_hw *hw = &adapter->hw;
u32 rdlen, rctl, rxcsum;
 
-   if (adapter->netdev->mtu > ETH_DATA_LEN) {
+   if (adapter->netdev->mtu > ETH_DATA_LEN || adapter->prog) {
rdlen = adapter->rx_ring[0].count *
sizeof(struct e1000_rx_desc);
adapter->clean_rx = e1000_clean_jumbo_rx_irq;
@@ -3298,6 +3341,61 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
return NETDEV_TX_OK;
 }
 
+static void e1000_tx_map_rxpage(struct e1000_tx_ring *tx_ring,
+   struct e1000_rx_buffer *rx_buffer_info,
+   unsigned int len)
+{
+   struct e1000_tx_buffer *buffer_info;
+   unsigned int i = tx_ring->next_to_use;
+
+   buffer_info = &tx_ring->buffer_info[i];
+
+   buffer_info->length = len;
+   buffer_info->time_stamp = jiffies;
+   buffer_info->mapped_as_page = false;
+   buffer_info->dma = rx_buffer_info->dma;
+   buffer_info->next_to_watch = i;
+
+   tx_ring->buffer_info[i].skb = NULL;
+   tx_ring->buffer_info[i].segs = 1;
+   tx_ring->buffer_info[i].bytecount = len;
+   tx_ring->buffer_info[i].next_to_watch = i;
+}
+
+static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
+unsigned int len,
+struct 
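(The patch body is truncated at this point in the archive.) The update path in e1000_xdp_set() above follows a common publish-then-reclaim pattern: atomically swap in the new program pointer, wait for in-flight readers to drain (synchronize_net() in the kernel), then drop the reference on the old program. A minimal user-space model of that ordering, using C11 atomics as a stand-in; this is illustrative only, not the driver code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct prog { int refcnt; };

/* Models adapter->prog: the pointer readers dereference on the hot path. */
static _Atomic(struct prog *) active_prog;

/* Models bpf_prog_put(): free the program when the last reference drops. */
static void prog_put(struct prog *p)
{
	if (--p->refcnt == 0)
		free(p);
}

/* Models e1000_xdp_set(): publish the new program first, then reclaim the
 * old one only after all readers are guaranteed to have moved on. In the
 * kernel, synchronize_net() provides that grace period before the put. */
static void xdp_set(struct prog *new)
{
	struct prog *old = atomic_exchange(&active_prog, new);

	if (old) {
		/* synchronize_net() would go here */
		prog_put(old);
	}
}
```

The xchg makes the swap a single atomic step, so a reader sees either the old program or the new one, never a half-updated pointer; the grace period is what makes freeing the old program safe.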

Re: [PATCH 3/5] rxrpc: fix last_call processing

2016-08-27 Thread David Howells
Arnd Bergmann  wrote:

> A change to the retransmission handling in rxrpc caused a use-before-init
> bug in rxrpc_data_ready(), as indicated by "gcc -Wmaybe-uninitialized":
> 
> net/rxrpc/input.c: In function 'rxrpc_data_ready':
> net/rxrpc/input.c:735:34: error: 'call' may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
> 
> This moves the initialization of the local variable before its first
> use, which presumably is what was intended here.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission 
> from conn processor")
> ---
> Cc: David Howells 
> Cc: "David S. Miller" 
> Cc: netdev@vger.kernel.org
> 
>  net/rxrpc/input.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
> index 66cdeb56f44f..3c22e43a58fd 100644
> --- a/net/rxrpc/input.c
> +++ b/net/rxrpc/input.c
> @@ -728,6 +728,10 @@ void rxrpc_data_ready(struct sock *sk)
>   if (sp->hdr.callNumber < chan->last_call)
>   goto discard_unlock;
>  
> + call = rcu_dereference(chan->call);
> + if (!call || atomic_read(&call->usage) == 0)
> + goto cant_route_call;
> +
>   if (sp->hdr.callNumber == chan->last_call) {
>   /* For the previous service call, if completed
>* successfully, we discard all further packets.
> @@ -744,10 +748,6 @@ void rxrpc_data_ready(struct sock *sk)
>   goto out_unlock;
>   }
>  
> - call = rcu_dereference(chan->call);
> - if (!call || atomic_read(&call->usage) == 0)
> - goto cant_route_call;
> -
>   rxrpc_post_packet_to_call(call, skb);
>   goto out_unlock;
>   }

You can't rearrange these like this.  I have a different fix.

David
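For context, the class of bug gcc's -Wmaybe-uninitialized flags here is a variable whose first read can be reached on a path with no prior assignment. A reduced, self-contained sketch of the corrected shape, with the dereference hoisted before both branches that consume it; the names are hypothetical, this is not the rxrpc code, and it is not a claim about which fix is right for rxrpc (David has a different one):

```c
#include <assert.h>
#include <stddef.h>

struct chan { int last_call; void *call; };

/* The warning arises when 'call' is read in one branch but only assigned
 * further down, so a path exists that uses it uninitialized. The fixed
 * shape assigns it once, before any branch that reads it. */
static void *route_fixed(struct chan *ch, int call_no)
{
	void *call;

	if (call_no < ch->last_call)
		return NULL;            /* discard: stale call number */

	call = ch->call;                /* hoisted before the first use */
	if (call == NULL)
		return NULL;            /* cant_route_call */

	if (call_no == ch->last_call)
		return call;            /* previous (terminal) service call */

	return call;                    /* post the packet to the call */
}
```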


[PATCH v2] fix:brcmfmac: add missing header dependencies

2016-08-27 Thread Baoyou Xie
We get 1 warning when building the kernel with W=1:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c:23:6: warning: no 
previous prototype for '__brcmf_err' [-Wmissing-
prototypes]

In fact, this function is declared in brcmfmac/debug.h, so this patch
adds the missing header dependency.

Signed-off-by: Baoyou Xie 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c
index a10f35c..fe67559 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/tracepoint.c
@@ -19,6 +19,7 @@
 #ifndef __CHECKER__
 #define CREATE_TRACE_POINTS
 #include "tracepoint.h"
+#include "debug.h"
 
 void __brcmf_err(const char *func, const char *fmt, ...)
 {
-- 
2.7.4