Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev
From: David Ahern Date: Sat, 15 Oct 2016 17:07:53 -0600 > I believe at netconf someone mentioned it would be a great day when > something is done for IPv6 first and IPv4 was a follow on. Here you > go. :-) :-) > I can rename the existing one to skb_l3mdev_slave_6 and make the new > one skb_l3mdev_slave_4. That works. So do names with "ipv4_" and "ipv6_" prefixes, which at least to me seem more canonical. But maybe I'm just weird like that.
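For illustration, the protocol-prefixed naming suggested above might look like the following sketch. The helper names ipv4_l3mdev_skb() and ipv6_l3mdev_skb() and the DEMO_* flag values are assumptions made for this example, not names or values taken from the patch; the real flag bits live in the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for illustration only (assumed values). */
#define DEMO_IPSKB_L3SLAVE   0x0080u
#define DEMO_IP6SKB_L3SLAVE  0x0100u

/* Hypothetical name: tests the IPv4 control-block flags. */
static inline bool ipv4_l3mdev_skb(uint16_t flags)
{
	return !!(flags & DEMO_IPSKB_L3SLAVE);
}

/* Hypothetical name: tests the IPv6 control-block flags. */
static inline bool ipv6_l3mdev_skb(uint16_t flags)
{
	return !!(flags & DEMO_IP6SKB_L3SLAVE);
}
```

With the address family in the name, a caller can no longer pass IPv4 flags to a helper that silently checks the IPv6 bit.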
Re: [PATCH 0/2] net: Fix compiler warnings
On 10/15/2016 02:48 PM, David Miller wrote: From: Tushar Dave Date: Fri, 14 Oct 2016 17:06:04 -0700 Recently, ATU (iommu) changes are submitted to linux-sparc that enables 64bit DMA on SPARC. However, this change also makes 'incompatible pointer type' compiler warnings inevitable on sunqe and sunbmac driver. The two patches in series fix compiler warnings. Only the sparc tree has this build problem, so these patches really ought to be submitted for and applied there. Okay. I will send these to sparclinux then. Thanks. -Tushar Thanks.
Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev
On 10/15/16 3:46 PM, David Miller wrote: > From: David Ahern > Date: Fri, 14 Oct 2016 12:29:19 -0700 > >> +/* can not be used in TCP layer after tcp_v6_fill_cb */ >> +static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff >> *skb) >> +{ >> +#if defined(CONFIG_NET_L3_MASTER_DEV) >> +if (!net->ipv4.sysctl_tcp_l3mdev_accept && >> +skb_l3mdev_slave(IP6CB(skb)->flags)) >> +return true; >> +#endif >> +return false; >> +} > ... >> +static inline bool skb_l3mdev_slave4(u16 flags) >> +{ >> +return !!(flags & IPSKB_L3SLAVE); >> +} > > I think this makes the code confusing. > > Actually it has been from the beginning, because we have a generically > named "skb_l3mdev_slave()" helper which strictly operates on ipv6 > state. > > Please do something with the naming of these two helpers, > skb_l3mdev_slave() and skb_l3mdev_slave4(), so that it is clear that > they are ipv6 and ipv4 specific helpers, respectively. > I believe at netconf someone mentioned it would be a great day when something is done for IPv6 first and IPv4 was a follow on. Here you go. :-) I can rename the existing one to skb_l3mdev_slave_6 and make the new one skb_l3mdev_slave_4.
Re: [PATCH] ipvlan: constify l3mdev_ops structure
From: Julia Lawall Date: Sat, 15 Oct 2016 17:40:30 +0200 > This l3mdev_ops structure is only stored in the l3mdev_ops field of a > net_device structure. This field is declared const, so the l3mdev_ops > structure can be declared as const also. Additionally drop the > __read_mostly annotation. > > The semantic patch that adds const is as follows: > (http://coccinelle.lip6.fr/) ... > Signed-off-by: Julia Lawall Applied, thanks.
Re: [PATCH 0/2] net: Fix compiler warnings
From: Tushar Dave Date: Fri, 14 Oct 2016 17:06:04 -0700 > Recently, ATU (iommu) changes are submitted to linux-sparc that > enables 64bit DMA on SPARC. However, this change also makes > 'incompatible pointer type' compiler warnings inevitable on sunqe > and sunbmac driver. > > The two patches in series fix compiler warnings. Only the sparc tree has this build problem, so these patches really ought to be submitted for and applied there. Thanks.
Re: [PATCH v2] vmxnet3: avoid assumption about invalid dma_pa in vmxnet3_set_mc()
From: Alexey Khoroshilov Date: Sat, 15 Oct 2016 00:01:20 +0300 > vmxnet3_set_mc() checks new_table_pa returned by dma_map_single() > with dma_mapping_error(), but even there it assumes zero is invalid pa > (it assumes dma_mapping_error(...,0) returns true if new_table is NULL). > > The patch adds an explicit variable to track status of new_table_pa. > > Found by Linux Driver Verification project (linuxtesting.org). > > v2: use "bool" and "true"/"false" for boolean variables. > Signed-off-by: Alexey Khoroshilov Applied.
Re: [PATCH v3] net: Require exact match for TCP socket lookups if dif is l3mdev
From: David Ahern Date: Fri, 14 Oct 2016 12:29:19 -0700 > +/* can not be used in TCP layer after tcp_v6_fill_cb */ > +static inline bool inet6_exact_dif_match(struct net *net, struct sk_buff > *skb) > +{ > +#if defined(CONFIG_NET_L3_MASTER_DEV) > + if (!net->ipv4.sysctl_tcp_l3mdev_accept && > + skb_l3mdev_slave(IP6CB(skb)->flags)) > + return true; > +#endif > + return false; > +} ... > +static inline bool skb_l3mdev_slave4(u16 flags) > +{ > + return !!(flags & IPSKB_L3SLAVE); > +} I think this makes the code confusing. Actually it has been from the beginning, because we have a generically named "skb_l3mdev_slave()" helper which strictly operates on ipv6 state. Please do something with the naming of these two helpers, skb_l3mdev_slave() and skb_l3mdev_slave4(), so that it is clear that they are ipv6 and ipv4 specific helpers, respectively.
Re: [patch] stmmac: fix an error code in stmmac_ptp_register()
From: Dan Carpenter Date: Fri, 14 Oct 2016 22:26:11 +0300 > PTR_ERR(NULL) is success. We have to preserve the error code earlier. > > Fixes: 7086605a6ab5 ("stmmac: fix error check when init ptp") > Signed-off-by: Dan Carpenter Good catch, applied.
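A minimal userspace sketch of why PTR_ERR(NULL) reads as success. The ERR_PTR/PTR_ERR/IS_ERR helpers below are simplified re-creations of the kernel macros, and demo_priv and both demo_register_*() functions are hypothetical; the sketch assumes the failing path cleared the pointer before reading the error, which the patch description implies but does not show.

```c
#include <stddef.h>

#define MAX_ERRNO 4095

/* Simplified userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical driver state holding a pointer that may be an ERR_PTR. */
struct demo_priv {
	void *clock;
};

/* Buggy ordering: the pointer is cleared first, so PTR_ERR() sees NULL. */
static long demo_register_buggy(struct demo_priv *p, void *result)
{
	p->clock = result;
	if (IS_ERR(p->clock)) {
		p->clock = NULL;
		return PTR_ERR(p->clock);	/* PTR_ERR(NULL) == 0, i.e. success */
	}
	return 0;
}

/* Fixed ordering: save the error code before clearing the pointer. */
static long demo_register_fixed(struct demo_priv *p, void *result)
{
	p->clock = result;
	if (IS_ERR(p->clock)) {
		long err = PTR_ERR(p->clock);

		p->clock = NULL;
		return err;
	}
	return 0;
}
```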
Re: [PATCH] net: qcom/emac: disable interrupts before calling phy_disconnect
From: Timur Tabi Date: Fri, 14 Oct 2016 14:14:35 -0500 > There is a race condition that can occur if EMAC interrupts are > enabled when phy_disconnect() is called. phy_disconnect() sets > adjust_link to NULL. When an interrupt occurs, the ISR might > call phy_mac_interrupt(), which wakes up the workqueue function > phy_state_machine(). This function might reference adjust_link, > thereby causing a null pointer exception. > > Signed-off-by: Timur Tabi Applied.
Re: [PATCH v2 net-next 0/2] ila: Cache a route in ILA lwt structure
From: Tom Herbert Date: Fri, 14 Oct 2016 11:25:35 -0700 > Add a dst_cache to ila_lwt structure. This holds a cached route for the > translated address. In ila_output we now perform a route lookup after > translation and if possible (destination in original route is full 128 > bits) we set the dst_cache. Subsequent calls to ila_output can then use > the cache to avoid the route lookup. ... Series applied, thanks Tom.
Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask
From: Ard Biesheuvel Date: Fri, 14 Oct 2016 14:48:51 +0100 > >> On 14 Oct 2016, at 14:42, David Laight wrote: >> >> From: Of Ard Biesheuvel >>> Sent: 14 October 2016 14:41 >>> PCI devices that are 64-bit DMA capable should set the coherent >>> DMA mask as well as the streaming DMA mask. On some architectures, >>> these are managed separately, and so the coherent DMA mask will be >>> left at its default value of 32 if it is not set explicitly. This >>> results in errors such as >>> >>> r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded >>> hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000 >>> swiotlb: coherent allocation failed for device :02:00.0 size=4096 >>> CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35 >>> Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016 >>> >>> on systems without memory that is 32-bit addressable by PCI devices. >>> >>> Signed-off-by: Ard Biesheuvel >>> --- >>> v2: dropped the hunk that sets the coherent DMA mask to DMA_BIT_MASK(32), >>>which is unnecessary given that it is the default >>> >>> drivers/net/ethernet/realtek/r8169.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/realtek/r8169.c >>> b/drivers/net/ethernet/realtek/r8169.c >>> index e55638c7505a..bf000d819a21 100644 >>> --- a/drivers/net/ethernet/realtek/r8169.c >>> +++ b/drivers/net/ethernet/realtek/r8169.c >>> @@ -8273,7 +8273,8 @@ static int rtl_init_one(struct pci_dev *pdev, const >>> struct pci_device_id *ent) >>>if ((sizeof(dma_addr_t) > 4) && >>>(use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) && >>> tp->mac_version >= RTL_GIGA_MAC_VER_18)) && >>> -!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { >>> +!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && >>> +!pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64))) { >> >> Isn't there a dma_set_mask_and_coherent() function ? >> > > Not of the pci_xxx variety afaik You can often use the "dev_*" variants interchangeably with the pci_*() ones. 
In fact you'll find that for several architectures pci_*() is implemented via calls to dev_*().
[PATCH RFC] ixgbe: ixgbe_atr() must check if network header is available in headlen
For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be passed down an sk_buff that has the network and transport header in the paged data, so it needs to make sure these headers are available in the headlen bytes to calculate the l4_proto. This patch bails out if the headlen is "too short", and does not attempt to call skb_header_pointer() to get the needed bytes: the assumption is that the caller should set things up properly if the l4_proto based tx steering is desired. Signed-off-by: Sowmini Varadhan --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index a244d9a..0868de1 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -7632,6 +7632,7 @@ static void ixgbe_atr(struct ixgbe_ring *ring, struct sk_buff *skb; __be16 vlan_id; int l4_proto; + int min_hdr_size = 0; /* if ring doesn't have a interrupt vector, cannot perform ATR */ if (!q_vector) @@ -7650,6 +7651,14 @@ static void ixgbe_atr(struct ixgbe_ring *ring, /* snag network header to get L4 type and address */ skb = first->skb; + if (first->protocol == htons(ETH_P_IP)) + min_hdr_size = sizeof(struct iphdr) + + sizeof(struct tcphdr); + else if (first->protocol == htons(ETH_P_IPV6)) + min_hdr_size = sizeof(struct ipv6hdr) + + sizeof(struct tcphdr); + if (min_hdr_size && skb_headlen(skb) < ETH_HLEN + min_hdr_size) + return; hdr.network = skb_network_header(skb); if (skb->encapsulation && first->protocol == htons(ETH_P_IP) && -- 1.7.1
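The bail-out condition in the patch above can be sketched in userspace as follows. The DEMO_* constants stand in for ETH_HLEN, the EtherType values, and the sizeof() of the kernel header structs (option-less IPv4 and TCP headers are assumed), and demo_headers_in_headlen() is a hypothetical name; the protocol is taken in host byte order here for simplicity, whereas the driver compares against htons() values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed header sizes, stated as constants instead of kernel struct sizes. */
enum {
	DEMO_ETH_HLEN  = 14,
	DEMO_IPV4_HLEN = 20,	/* IPv4 header without options */
	DEMO_IPV6_HLEN = 40,	/* fixed IPv6 header */
	DEMO_TCP_HLEN  = 20	/* TCP header without options */
};

#define DEMO_ETH_P_IP   0x0800
#define DEMO_ETH_P_IPV6 0x86DD

/*
 * Returns true when the linear part of the buffer (headlen bytes) is long
 * enough to hold the Ethernet, network and TCP headers for the protocol.
 */
static bool demo_headers_in_headlen(uint16_t proto, size_t headlen)
{
	size_t min_hdr_size = 0;

	if (proto == DEMO_ETH_P_IP)
		min_hdr_size = DEMO_IPV4_HLEN + DEMO_TCP_HLEN;
	else if (proto == DEMO_ETH_P_IPV6)
		min_hdr_size = DEMO_IPV6_HLEN + DEMO_TCP_HLEN;

	/* Other protocols are not checked, as with min_hdr_size == 0 in the patch. */
	return !(min_hdr_size && headlen < DEMO_ETH_HLEN + min_hdr_size);
}
```

Under these assumptions an IPv4 frame needs at least 54 linear bytes and an IPv6 frame at least 74 before the headers are read.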
Re: [PATCH v2] r8169: set coherent DMA mask as well as streaming DMA mask
From: Ard Biesheuvel Date: Fri, 14 Oct 2016 14:40:33 +0100 > PCI devices that are 64-bit DMA capable should set the coherent > DMA mask as well as the streaming DMA mask. On some architectures, > these are managed separately, and so the coherent DMA mask will be > left at its default value of 32 if it is not set explicitly. This > results in errors such as > > r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > hwdev DMA mask = 0x, dev_addr = 0x0080fbfff000 > swiotlb: coherent allocation failed for device :02:00.0 size=4096 > CPU: 0 PID: 1062 Comm: systemd-udevd Not tainted 4.8.0+ #35 > Hardware name: AMD Seattle/Seattle, BIOS 10:53:24 Oct 13 2016 > > on systems without memory that is 32-bit addressable by PCI devices. > > Signed-off-by: Ard Biesheuvel Applied.
linux-next: manual merge of the net tree with Linus' tree
Hi all, Today's linux-next merge of the net tree got a conflict in: drivers/net/ethernet/qlogic/Kconfig between commit: 2e0cbc4dd077 ("qedr: Add RoCE driver framework") from Linus' tree and commit: 0189efb8f4f8 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR") from the net tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. I also added this merge fix patch: From: Stephen Rothwell Date: Sun, 16 Oct 2016 08:09:42 +1100 Subject: [PATCH] qed*: merge fix for CONFIG_INFINIBAND_QEDR Kconfig move Signed-off-by: Stephen Rothwell --- drivers/infiniband/hw/qedr/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/qedr/Kconfig b/drivers/infiniband/hw/qedr/Kconfig index 7c06d85568d4..6c9f3923e838 100644 --- a/drivers/infiniband/hw/qedr/Kconfig +++ b/drivers/infiniband/hw/qedr/Kconfig @@ -2,6 +2,7 @@ config INFINIBAND_QEDR tristate "QLogic RoCE driver" depends on 64BIT && QEDE select QED_LL2 + select QED_RDMA ---help--- This driver provides low-level InfiniBand over Ethernet support for QLogic QED host channel adapters (HCAs). -- 2.8.1 -- Cheers, Stephen Rothwell diff --cc drivers/net/ethernet/qlogic/Kconfig index 1e8339a67f6e,77567727528a.. --- a/drivers/net/ethernet/qlogic/Kconfig +++ b/drivers/net/ethernet/qlogic/Kconfig @@@ -107,4 -107,19 +107,7 @@@ config QED ---help--- This enables the support for ... + config QED_RDMA + bool + -config INFINIBAND_QEDR - tristate "QLogic qede RoCE sources [debug]" - depends on QEDE && 64BIT - select QED_LL2 - select QED_RDMA - default n - ---help--- -This provides a temporary node that allows the compilation -and logical testing of the InfiniBand over Ethernet support -for QLogic QED. 
This would be replaced by the 'real' option -once the QEDR driver is added [+relocated]. - endif # NET_VENDOR_QLOGIC
Re: [PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround
Sorry David, I just noticed you weren't in the "To:" of this series, but I won't forget you for the v3 I need to release anyway (https://lkml.org/lkml/2016/10/15/104). Robert Jarzmik writes: > + lp->half_word_align4 = > + machine_is_mainstone() || machine_is_stargate2() || > + machine_is_pxa_idp(); Bah this one is not good enough. First, machine_is_*() is not defined if CONFIG_ARM=n, and this part is not under a #ifdef CONFIG_ARM. Moreover, I think it is a good occasion to go further, and : - enhance smc91x_platdata and add a pxa_u16_align4 boolean - transform this statement into : lp->half_word_align4 = lp->cfg.pxa_u16_align4 This will remove the machine_*() calls from the smc91x driver, which looks like a good move, doesn't it ? Cheers. -- Robert
Re: [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use
On 10/15/2016 11:53 AM, Joe Perches wrote: rds_conn_path_error already prefixes "RDS:" to the output. Signed-off-by: Joe Perches --- Acked-by: Santosh Shilimkar
Re: [PATCH 1/2] rds: Remove unused rds_conn_error
On 10/15/2016 11:53 AM, Joe Perches wrote: This macro's last use was removed in commit d769ef81d5b59 ("RDS: Update rds_conn_shutdown to work with rds_conn_path") so make the macro and the __rds_conn_error function definition and declaration disappear. Signed-off-by: Joe Perches --- Had same patch along with few more in the queue but didn't find time of late to get it on the list. Thanks for both patches. Acked-by: Santosh Shilimkar
[PATCH v2 3/3] net: smsc91x: add u16 workaround for pxa platforms
Add a workaround for mainstone, idp and stargate2 boards, for u16 writes which must be aligned on 32 bits addresses. Signed-off-by: Robert Jarzmik Cc: Jeremy Linton --- Since v1: rename dt property to pxa-u16-align4 change the binding documentation file --- Documentation/devicetree/bindings/net/smsc-lan91c111.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt index e77e167593db..309e37eb7c7c 100644 --- a/Documentation/devicetree/bindings/net/smsc-lan91c111.txt +++ b/Documentation/devicetree/bindings/net/smsc-lan91c111.txt @@ -13,3 +13,5 @@ Optional properties: 16-bit access only. - power-gpios: GPIO to control the PWRDWN pin - reset-gpios: GPIO to control the RESET pin +- pxa-u16-align4 : Boolean, put in place the workaround the force all + u16 writes to be 32 bits aligned -- 2.1.4
[PATCH v2 1/3] net: smc91x: isolate u16 writes alignment workaround
Writes to u16 has a special handling on 3 PXA platforms, where the hardware wiring forces these writes to be u32 aligned. This patch isolates this handling for PXA platforms as before, but enables this "workaround" to be set up dynamically, which will be the case in device-tree build types. This patch was tested on 2 PXA platforms : mainstone, which relies on the workaround, and lubbock, which doesn't. Signed-off-by: Robert Jarzmik --- drivers/net/ethernet/smsc/smc91x.c | 6 ++- drivers/net/ethernet/smsc/smc91x.h | 78 +- 2 files changed, 48 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c index 9b4780f87863..5658c2b28ec8 100644 --- a/drivers/net/ethernet/smsc/smc91x.c +++ b/drivers/net/ethernet/smsc/smc91x.c @@ -602,7 +602,8 @@ static void smc_hardware_send_pkt(unsigned long data) SMC_PUSH_DATA(lp, buf, len & ~1); /* Send final ctl word with the last byte if there is one */ - SMC_outw(((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, DATA_REG(lp)); + SMC_outw(lp, ((len & 1) ? (0x2000 | buf[len-1]) : 0), ioaddr, +DATA_REG(lp)); /* * If THROTTLE_TX_PKTS is set, we stop the queue here. 
This will @@ -2282,6 +2283,9 @@ static int smc_drv_probe(struct platform_device *pdev) goto out_free_netdev; } } + lp->half_word_align4 = + machine_is_mainstone() || machine_is_stargate2() || + machine_is_pxa_idp(); #if IS_BUILTIN(CONFIG_OF) match = of_match_device(of_match_ptr(smc91x_match), &pdev->dev); diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h index ea8465467469..dff165ed106d 100644 --- a/drivers/net/ethernet/smsc/smc91x.h +++ b/drivers/net/ethernet/smsc/smc91x.h @@ -86,11 +86,11 @@ #define SMC_inl(a, r) readl((a) + (r)) #define SMC_outb(v, a, r) writeb(v, (a) + (r)) -#define SMC_outw(v, a, r) \ +#define SMC_outw(lp, v, a, r) \ do {\ unsigned int __v = v, __smc_r = r; \ if (SMC_16BIT(lp)) \ - __SMC_outw(__v, a, __smc_r);\ + __SMC_outw(lp, __v, a, __smc_r);\ else if (SMC_8BIT(lp)) \ SMC_outw_b(__v, a, __smc_r);\ else\ @@ -107,10 +107,10 @@ #define SMC_IRQ_FLAGS (-1)/* from resource */ /* We actually can't write halfwords properly if not word aligned */ -static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg) +static inline void _SMC_outw_align4(u16 val, void __iomem *ioaddr, int reg, + bool use_align4_workaround) { - if ((machine_is_mainstone() || machine_is_stargate2() || -machine_is_pxa_idp()) && reg & 2) { + if (use_align4_workaround) { unsigned int v = val << 16; v |= readl(ioaddr + (reg & ~2)) & 0x; writel(v, ioaddr + (reg & ~2)); @@ -119,6 +119,12 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg) } } +#define __SMC_outw(lp, v, a, r) \ + _SMC_outw_align4((v), (a), (r), \ +IS_BUILTIN(CONFIG_ARCH_PXA) && ((r) & 2) &&\ +lp->half_word_align4) + + #elif defined(CONFIG_SH_SH4202_MICRODEV) #define SMC_CAN_USE_8BIT 0 @@ -129,7 +135,7 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg) #define SMC_inw(a, r) inw((a) + (r) - 0xa000) #define SMC_inl(a, r) inl((a) + (r) - 0xa000) #define SMC_outb(v, a, r) outb(v, (a) + (r) - 0xa000) -#define SMC_outw(v, a, r) outw(v, (a) 
+ (r) - 0xa000) +#define SMC_outw(lp, v, a, r) outw(v, (a) + (r) - 0xa000) #define SMC_outl(v, a, r) outl(v, (a) + (r) - 0xa000) #define SMC_insl(a, r, p, l) insl((a) + (r) - 0xa000, p, l) #define SMC_outsl(a, r, p, l) outsl((a) + (r) - 0xa000, p, l) @@ -147,7 +153,7 @@ static inline void __SMC_outw(u16 val, void __iomem *ioaddr, int reg) #define SMC_inb(a, r) inb(((u32)a) + (r)) #define SMC_inw(a, r) inw(((u32)a) + (r)) #define SMC_outb(v, a, r) outb(v, ((u32)a) + (r)) -#define SMC_outw(v, a, r) outw(v, ((u32)a) + (r)) +#define SMC_outw(lp, v, a, r) outw(v, ((u32)a) + (r)) #define SMC_insw(a, r, p, l) insw(((u32)a) + (r), p, l) #define SMC_outsw(a, r, p, l) outsw(((u32)a) + (r), p, l) @@ -175,7 +181,7 @@ static
[PATCH v2 2/3] net: smc91x: take into account half-word workaround
For device-tree builds, platforms such as mainstone, idp and stargate2 must have their u16 writes all aligned on 32 bit boundaries. This is already enabled in platform data builds, and this patch adds it to device-tree builds. Signed-off-by: Robert Jarzmik --- Since v1: rename dt property to pxa-u16-align4 --- drivers/net/ethernet/smsc/smc91x.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c index 5658c2b28ec8..c14676805d06 100644 --- a/drivers/net/ethernet/smsc/smc91x.c +++ b/drivers/net/ethernet/smsc/smc91x.c @@ -2329,6 +2329,8 @@ static int smc_drv_probe(struct platform_device *pdev) if (!device_property_read_u32(&pdev->dev, "reg-shift", &val)) lp->io_shift = val; + lp->half_word_align4 = + device_property_read_bool(&pdev->dev, "pxa-u16-align4"); } #endif -- 2.1.4
[PATCH v2 0/3] support smc91x on mainstone and devicetree
This series aims to bring support for the mainstone board to device-tree based builds, matching what is already in place for legacy mainstone. The bulk of the mainstone "specific" behavior is that a u16 write doesn't work on an address of the form 4*n + 2, while it works on 4*n. The legacy workaround was in SMC_outw(), with calls to machine_is_mainstone(). These calls don't work with a pxa27x-dt machine type, which is used when a generic device-tree pxa27x machine is used to boot the mainstone board. Therefore, this series enables the smc91c111 adapter of the mainstone board to work on a device-tree build, exactly as it's been working for years with the legacy arch/arm/mach-pxa/mainstone.c definition. Cheers. -- Robert Robert Jarzmik (3): net: smc91x: isolate u16 writes alignment workaround net: smc91x: take into account half-word workaround net: smsc91x: add u16 workaround for pxa platforms .../devicetree/bindings/net/smsc-lan91c111.txt | 2 + drivers/net/ethernet/smsc/smc91x.c | 8 ++- drivers/net/ethernet/smsc/smc91x.h | 78 -- 3 files changed, 52 insertions(+), 36 deletions(-) -- 2.1.4
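The 4*n + 2 behavior described in the cover letter can be sketched with a simulated register window. Everything named demo_* is hypothetical; the sketch assumes a little-endian layout where the half-word at offset 4*n + 2 is the upper 16 bits of the containing 32-bit word, and it emulates the plain 16-bit write at 4*n offsets with a masked 32-bit update because the simulation only has 32-bit storage.

```c
#include <stdint.h>

/* Simulated register window; real hardware would use readl()/writel(). */
static uint32_t demo_regs[4];

static uint32_t demo_readl(unsigned int reg)
{
	return demo_regs[reg / 4];
}

static void demo_writel(uint32_t v, unsigned int reg)
{
	demo_regs[reg / 4] = v;
}

/* Write a 16-bit value at byte offset reg without a misaligned u16 store. */
static void demo_outw_align4(uint16_t val, unsigned int reg)
{
	if (reg & 2) {
		/* Offset 4*n + 2: read-modify-write the whole aligned word. */
		uint32_t v = (uint32_t)val << 16;

		v |= demo_readl(reg & ~2u) & 0xffffu;	/* keep the low half */
		demo_writel(v, reg & ~2u);
	} else {
		/* Offset 4*n: a direct 16-bit write works; emulated here. */
		uint32_t v = demo_readl(reg) & 0xffff0000u;

		demo_writel(v | val, reg);
	}
}
```

The read-modify-write costs an extra bus read per half-word write, which is presumably why the series applies it only on the boards that need it.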
Re: [rds-devel] [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use
On (10/15/16 11:53), Joe Perches wrote: > > rds_conn_path_error already prefixes "RDS:" to the output. > > Signed-off-by: Joe Perches Acked-by: Sowmini Varadhan
Re: [rds-devel] [PATCH 1/2] rds: Remove unused rds_conn_error
On (10/15/16 11:53), Joe Perches wrote: > This macro's last use was removed in commit d769ef81d5b59 > ("RDS: Update rds_conn_shutdown to work with rds_conn_path") > so make the macro and the __rds_conn_error function definition > and declaration disappear. > > Signed-off-by: Joe Perches Acked-by: Sowmini Varadhan
[PATCH 1/2] rds: Remove unused rds_conn_error
This macro's last use was removed in commit d769ef81d5b59 ("RDS: Update rds_conn_shutdown to work with rds_conn_path") so make the macro and the __rds_conn_error function definition and declaration disappear. Signed-off-by: Joe Perches --- net/rds/connection.c | 15 --- net/rds/rds.h| 4 2 files changed, 19 deletions(-) diff --git a/net/rds/connection.c b/net/rds/connection.c index f5058559bb08..13f459dad4ef 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -689,21 +689,6 @@ void rds_conn_connect_if_down(struct rds_connection *conn) } EXPORT_SYMBOL_GPL(rds_conn_connect_if_down); -/* - * An error occurred on the connection - */ -void -__rds_conn_error(struct rds_connection *conn, const char *fmt, ...) -{ - va_list ap; - - va_start(ap, fmt); - vprintk(fmt, ap); - va_end(ap); - - rds_conn_drop(conn); -} - void __rds_conn_path_error(struct rds_conn_path *cp, const char *fmt, ...) { diff --git a/net/rds/rds.h b/net/rds/rds.h index fd0bccb2f9f9..25532a46602f 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -683,10 +683,6 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len, struct rds_info_lengths *lens, int (*visitor)(struct rds_connection *, void *), size_t item_len); -__printf(2, 3) -void __rds_conn_error(struct rds_connection *conn, const char *, ...); -#define rds_conn_error(conn, fmt...) \ - __rds_conn_error(conn, KERN_WARNING "RDS: " fmt) __printf(2, 3) void __rds_conn_path_error(struct rds_conn_path *cp, const char *, ...); -- 2.10.0.rc2.1.g053435c
[PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use
rds_conn_path_error already prefixes "RDS:" to the output. Signed-off-by: Joe Perches --- net/rds/threads.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/rds/threads.c b/net/rds/threads.c index e42df11bf30a..e36e333a0aa0 100644 --- a/net/rds/threads.c +++ b/net/rds/threads.c @@ -171,8 +171,7 @@ void rds_connect_worker(struct work_struct *work) RDS_CONN_DOWN)) rds_queue_reconnect(cp); else - rds_conn_path_error(cp, - "RDS: connect failed\n"); + rds_conn_path_error(cp, "connect failed\n"); } } } -- 2.10.0.rc2.1.g053435c
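A small sketch of how a logging macro that pastes a prefix onto the format string produces the doubled "RDS:" this patch removes. DEMO_PREFIX and demo_path_error() are hypothetical stand-ins that format into a buffer instead of calling the kernel's printk path; the technique shown is plain string-literal concatenation.

```c
#include <stdio.h>

#define DEMO_PREFIX "RDS: "

/*
 * The first variadic argument must be a string literal so that the
 * compiler concatenates it with DEMO_PREFIX.
 */
#define demo_path_error(buf, len, ...) \
	snprintf((buf), (len), DEMO_PREFIX __VA_ARGS__)
```

Passing "RDS: connect failed\n" to such a macro yields "RDS: RDS: connect failed\n", so the caller should pass only "connect failed\n".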
[PATCH 0/2] rds: logging neatening
Joe Perches (2): rds: Remove unused rds_conn_error rds: Remove duplicate prefix from rds_conn_path_error use net/rds/connection.c | 15 --- net/rds/rds.h| 4 net/rds/threads.c| 3 +-- 3 files changed, 1 insertion(+), 21 deletions(-) -- 2.10.0.rc2.1.g053435c
Re: Need help with mdiobus_register and phy
Andrew Lunn wrote: 1) Take the SerDes power down out of the suspend code for the at803x. 2) Assume MII_PHYID1/2 registers are not guaranteed to be available when the PHY is powered down. So get_phy_id should first read MII_BMCR. If it gets 0x, assume there is no PHY there. If the PDOWN bit is set, power up the PHY. Then read the ID registers. Before we take approach #1, I'd like to hear from the developer of that patch, Zefir. According to him, that patch is necessary to fix a bug. I don't know if that bug exists only on his system, though. -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
Re: Need help with mdiobus_register and phy
On Sat, Oct 15, 2016 at 09:39:12AM -0500, Timur Tabi wrote: > Florian Fainelli wrote: > >After reading the spec again, it does not appear to me that a PHY > >with PDOWN set is guaranteed or even required to respond to other > >register reads such as MII_PHYID1/2, in which case we may have to > >implement a MDIO bus reset routine which clears PDOWN for all PHYs > >that we detect(ed), or as Andrew suggested, utilize the matching by > >compatible string with the PHY OUI in it. > > The 8031 does respond normally when PDOWN is set. However, the ID > registers are not available when the SerDes bus is also powered > down. I'll call this PDOWN+. This is a special power-down sequence > that the at803x driver does on suspend. See my other email for > details. So we appear to have two ways to go: 1) Take the SerDes power down out of the suspend code for the at803x. 2) Assume MII_PHYID1/2 registers are not guaranteed to be available when the PHY is powered down. So get_phy_id should first read MII_BMCR. If it gets 0x, assume there is no PHY there. If the PDOWN bit is set, power up the PHY. Then read the ID registers. Andrew
[net:master 10/15] net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned expression compared with zero: tmp_prefered_lft < 0
I haven't checked the entire context, but it could be useful to look at line 1251. julia -- Forwarded message -- Date: Sun, 16 Oct 2016 01:34:18 +0800 From: kbuild test robot To: kbu...@01.org Cc: Julia Lawall Subject: [net:master 10/15] net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned expression compared with zero: tmp_prefered_lft < 0 CC: kbuild-...@01.org CC: netdev@vger.kernel.org TO: Jiri Bohac tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master head: 9e55d0f95460a067def5400fa5eee5dabb0fc5a5 commit: 76506a986dc31394fd1f2741db037d29c7e57843 [10/15] IPv6: fix DESYNC_FACTOR :: branch date: 21 hours ago :: commit date: 27 hours ago >> net/ipv6/addrconf.c:1251:14-30: WARNING: Unsigned expression compared with >> zero: tmp_prefered_lft < 0 git remote add net https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git git remote update net git checkout 76506a986dc31394fd1f2741db037d29c7e57843 vim +1251 net/ipv6/addrconf.c 76506a986 Jiri Bohac 2016-10-13 1235 if (unlikely(idev->desync_factor > max_desync_factor)) { 76506a986 Jiri Bohac 2016-10-13 1236 if (max_desync_factor > 0) { 76506a986 Jiri Bohac 2016-10-13 1237 get_random_bytes(&idev->desync_factor, 76506a986 Jiri Bohac 2016-10-13 1238 sizeof(idev->desync_factor)); 76506a986 Jiri Bohac 2016-10-13 1239 idev->desync_factor %= max_desync_factor; 76506a986 Jiri Bohac 2016-10-13 1240 } else { 76506a986 Jiri Bohac 2016-10-13 1241 idev->desync_factor = 0; 76506a986 Jiri Bohac 2016-10-13 1242 } 76506a986 Jiri Bohac 2016-10-13 1243 } 76506a986 Jiri Bohac 2016-10-13 1244 ^1da177e4 Linus Torvalds 2005-04-16 1245 tmp_valid_lft = min_t(__u32, ^1da177e4 Linus Torvalds 2005-04-16 1246 ifp->valid_lft, 7a876b0ef Glenn Wurster 2010-09-27 1247 idev->cnf.temp_valid_lft + age); 76506a986 Jiri Bohac 2016-10-13 1248 tmp_prefered_lft = idev->cnf.temp_prefered_lft + age - 76506a986 Jiri Bohac 2016-10-13 1249 idev->desync_factor; 76506a986 Jiri Bohac 2016-10-13 1250 /* guard against underflow in case of 
concurrent updates to cnf */ 76506a986 Jiri Bohac 2016-10-13 @1251 if (unlikely(tmp_prefered_lft < 0)) 76506a986 Jiri Bohac 2016-10-13 1252 tmp_prefered_lft = 0; 76506a986 Jiri Bohac 2016-10-13 1253 tmp_prefered_lft = min_t(__u32, ifp->prefered_lft, tmp_prefered_lft); ^1da177e4 Linus Torvalds 2005-04-16 1254 tmp_plen = ifp->prefix_len; ^1da177e4 Linus Torvalds 2005-04-16 1255 tmp_tstamp = ifp->tstamp; ^1da177e4 Linus Torvalds 2005-04-16 1256 spin_unlock_bh(&ifp->lock); ^1da177e4 Linus Torvalds 2005-04-16 1257 53bd67491 Jiri Pirko 2013-12-06 1258 write_unlock_bh(&idev->lock); 95c385b4d Neil Horman2007-04-25 1259 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
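To illustrate the warning, here is a sketch of why an unsigned lifetime can never compare below zero, together with one possible guard. Both demo_* functions are hypothetical and this is not the fix applied to addrconf.c; it assumes 32-bit unsigned inputs like the __u32 lifetimes in the listing and does the arithmetic in a wider signed type.

```c
#include <stdint.h>

/* Unsigned arithmetic wraps modulo 2^32, so the result is never negative. */
static uint32_t demo_lft_wrapping(uint32_t cnf_lft, uint32_t age,
				  uint32_t desync)
{
	return cnf_lft + age - desync;
}

/* One possible guard: compute in a signed 64-bit type, then clamp. */
static uint32_t demo_lft_clamped(uint32_t cnf_lft, uint32_t age,
				 uint32_t desync)
{
	int64_t lft = (int64_t)cnf_lft + age - desync;

	if (lft < 0)
		return 0;
	if (lft > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)lft;
}
```

With cnf_lft = 10, age = 0 and desync = 20, the wrapping version returns a huge positive lifetime where the clamped version returns 0.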
Re: [patch net-next RFC 4/6] Introduce sample tc action
On 10/15/16, 9:34 AM, Roopa Prabhu wrote: > On 10/12/16, 5:41 AM, Jiri Pirko wrote: >> From: Yotam Gigi >> >> This action allow the user to sample traffic matched by tc classifier. >> The sampling consists of choosing packets randomly, truncating them, >> adding some informative metadata regarding the interface and the original >> packet size and mark them with specific mark, to allow further tc rules to >> match and process. The marked sample packets are then injected into the >> device ingress qdisc using netif_receive_skb. >> >> The packets metadata is packed using the ife encapsulation protocol, and >> the outer packet's ethernet dest, source and eth_type, along with the >> rate, mark and the optional truncation size can be configured from >> userspace. >> >> Example: >> To sample ingress traffic from interface eth1, and redirect the sampled >> the sampled packets to interface dummy0, one may use the commands: >> >> tc qdisc add dev eth1 handle : ingress >> >> tc filter add dev eth1 parent : \ >> matchall action sample rate 12 mark 17 >> >> tc filter add parent : dev eth1 protocol all \ >> u32 match mark 172 0xff >> action mirred egress redirect dev dummy0 >> >> Where the first command adds an ingress qdisc and the second starts >> sampling every 12'th packet on dev eth0 and marks the sampled packets with >> 17. The command third catches the sampled packets, which are marked with >> 17, and redirects them to dev dummy0. >> >> Signed-off-by: Yotam Gigi >> Signed-off-by: Jiri Pirko > channeling some feedback from Peter Phaal @sflow inline below: > > If it helps, one more thing that came up was using bpf. 
They also use bpf filters for pkt sampling in the non-offloaded case: http://blog.sflow.com/2016/05/berkeley-packet-filter-bpf.html so, existing apps (like sflow) that care about packet sampling do prefer to use a socket api for sample delivery: netlink nflog or bpf like socket filters also, to keep the software and hardware models the same, wondering if ebpf attach can be a viable option (have not thought about the offloaded case completely yet). This would give apps more control on attaching sample headers (like sflow) if needed. thanks, Roopa
[PATCH] crypto: ccm - avoid scatterlist for MAC encryption
The CCM code goes out of its way to perform the CTR encryption of the MAC using the subordinate CTR driver. To this end, it tweaks the input and output scatterlists so the aead_req 'odata' and/or 'auth_tag' fields [which may live on the stack] are prepended to the CTR payload. This involves calling sg_set_buf() on addresses which are not direct mapped, which is not supported. Since the calculation of the MAC keystream involves a single call into the cipher, to which we have a handle already given that the CBC-MAC calculation uses it as well, just calculate the MAC keystream directly, and record it in the aead_req private context so we can apply it to the MAC in crypto_ccm_auth_mac(). This greatly simplifies the scatterlist manipulation, and no longer requires scatterlists to refer to buffers that may live on the stack. Signed-off-by: Ard Biesheuvel --- This is an alternative for the patch 'mac80211: aes_ccm: move struct aead_req off the stack' that I sent out yesterday. IMO, this is a more correct approach, since it addresses the problem directly in crypto/ccm.c, which is the only CCM-AES driver that suffers from this issue. 
crypto/ccm.c | 55 +++- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/crypto/ccm.c b/crypto/ccm.c index 006d8575ef5c..faa5efcf59e2 100644 --- a/crypto/ccm.c +++ b/crypto/ccm.c @@ -46,10 +46,13 @@ struct crypto_ccm_req_priv_ctx { u8 odata[16]; u8 idata[16]; u8 auth_tag[16]; + u8 cmac[16]; u32 ilen; u32 flags; - struct scatterlist src[3]; - struct scatterlist dst[3]; + struct scatterlist *src; + struct scatterlist *dst; + struct scatterlist srcbuf[2]; + struct scatterlist dstbuf[2]; struct skcipher_request skreq; }; @@ -280,6 +283,8 @@ static int crypto_ccm_auth(struct aead_request *req, struct scatterlist *plain, if (cryptlen) get_data_to_compute(cipher, pctx, plain, cryptlen); + crypto_xor(odata, pctx->cmac, 16); + out: return err; } @@ -307,10 +312,12 @@ static inline int crypto_ccm_check_iv(const u8 *iv) return 0; } -static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag) +static int crypto_ccm_init_crypt(struct aead_request *req) { + struct crypto_aead *aead = crypto_aead_reqtfm(req); + struct crypto_ccm_ctx *ctx = crypto_aead_ctx(aead); struct crypto_ccm_req_priv_ctx *pctx = crypto_ccm_reqctx(req); - struct scatterlist *sg; + struct crypto_cipher *cipher = ctx->cipher; u8 *iv = req->iv; int err; @@ -325,19 +332,16 @@ static int crypto_ccm_init_crypt(struct aead_request *req, u8 *tag) */ memset(iv + 15 - iv[0], 0, iv[0] + 1); - sg_init_table(pctx->src, 3); - sg_set_buf(pctx->src, tag, 16); - sg = scatterwalk_ffwd(pctx->src + 1, req->src, req->assoclen); - if (sg != pctx->src + 1) - sg_chain(pctx->src, 2, sg); + /* prepare the key stream for the auth tag */ + crypto_cipher_encrypt_one(cipher, pctx->cmac, iv); - if (req->src != req->dst) { - sg_init_table(pctx->dst, 3); - sg_set_buf(pctx->dst, tag, 16); - sg = scatterwalk_ffwd(pctx->dst + 1, req->dst, req->assoclen); - if (sg != pctx->dst + 1) - sg_chain(pctx->dst, 2, sg); - } + /* increment BE counter in IV[] for the actual payload */ + iv[15] = 1; + + pctx->src = 
scatterwalk_ffwd(pctx->srcbuf, req->src, req->assoclen); + if (req->src != req->dst) + pctx->dst = scatterwalk_ffwd(pctx->dstbuf, req->dst, +req->assoclen); return 0; } @@ -354,11 +358,11 @@ static int crypto_ccm_encrypt(struct aead_request *req) u8 *iv = req->iv; int err; - err = crypto_ccm_init_crypt(req, odata); + err = crypto_ccm_init_crypt(req); if (err) return err; - err = crypto_ccm_auth(req, sg_next(pctx->src), cryptlen); + err = crypto_ccm_auth(req, pctx->src, cryptlen); if (err) return err; @@ -369,13 +373,13 @@ static int crypto_ccm_encrypt(struct aead_request *req) skcipher_request_set_tfm(skreq, ctx->ctr); skcipher_request_set_callback(skreq, pctx->flags, crypto_ccm_encrypt_done, req); - skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen + 16, iv); + skcipher_request_set_crypt(skreq, pctx->src, dst, cryptlen, iv); err = crypto_skcipher_encrypt(skreq); if (err) return err; /* copy authtag to end of dst */ - scatterwalk_map_and_copy(odata, sg_next(dst), cryptlen, + scatterwalk_map_and_copy(odata, dst, cryptlen, crypto_aead_authsize(aead), 1); return err; } @@ -392,7 +396,7 @@ static void crypto_ccm_decrypt_do
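The core idea of the patch (encrypt the counter-0 block once, keep the result, then XOR it into the MAC) can be tried outside the kernel with a rough sketch like the one below. Everything here is an assumption made for illustration: the helper names are hypothetical, the toy "cipher" is not AES and not secure, and the 1..7 bound on iv[0] is my reading of the usual CCM constraint rather than something shown in the diff.

```c
#include <stdint.h>
#include <string.h>

#define CCM_BLOCK 16

/* Hypothetical single-block encrypt callback standing in for the cipher. */
typedef void (*encrypt_one_fn)(const uint8_t *key, uint8_t *out,
			       const uint8_t *in);

/* Toy stand-in so the sketch runs; NOT a real cipher and NOT secure. */
static void toy_encrypt_one(const uint8_t *key, uint8_t *out,
			    const uint8_t *in)
{
	for (int i = 0; i < CCM_BLOCK; i++)
		out[i] = (uint8_t)((in[i] ^ key[i]) + 1);
}

/*
 * Derive the keystream block for the auth tag (counter value 0), then
 * leave the IV at counter value 1 for the payload.
 * Returns 0 on success, -1 if iv[0] is outside the assumed 1..7 range.
 */
static int ccm_tag_keystream(encrypt_one_fn enc, const uint8_t *key,
			     uint8_t *iv, uint8_t *s0)
{
	if (iv[0] < 1 || iv[0] > 7)
		return -1;
	/* iv[0] holds L - 1; the counter is the trailing iv[0] + 1 bytes. */
	memset(iv + 15 - iv[0], 0, iv[0] + 1);
	enc(key, s0, iv);
	iv[15] = 1;
	return 0;
}

/* XOR the keystream into the tag; applying it twice restores the tag. */
static void ccm_mask_tag(uint8_t *tag, const uint8_t *s0)
{
	for (int i = 0; i < CCM_BLOCK; i++)
		tag[i] ^= s0[i];
}
```

Because the mask is a plain XOR, the same ccm_mask_tag() call serves both directions: it hides the MAC on encryption and recovers it on decryption, which is why no scatterlist entry for the tag is needed in this approach.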
Re: [PATCH] net: limit a number of namespaces which can be cleaned up concurrently
Andrei Vagin writes: > On Thu, Oct 13, 2016 at 10:06:28PM -0500, Eric W. Biederman wrote: >> Andrei Vagin writes: >> >> > On Thu, Oct 13, 2016 at 10:49:38AM -0500, Eric W. Biederman wrote: >> >> Andrei Vagin writes: >> >> >> >> > From: Andrey Vagin >> >> > >> >> > The operation of destroying netns is heavy and it is executed under >> >> > net_mutex. If many namespaces are destroyed concurrently, net_mutex can >> >> > be locked for a long time. It is impossible to create a new netns during >> >> > this period of time. >> >> >> >> This may be the right approach or at least the right approach to bound >> >> net_mutex hold times but I have to take exception to calling network >> >> namespace cleanup heavy. >> >> >> >> The only particularly time consuming operation I have ever found are >> >> calls to >> >> synchronize_rcu/sycrhonize_sched/synchronize_net. >> > >> > I booted the kernel with maxcpus=1, in this case these functions work >> > very fast and the problem is there any way. >> > >> > Accoding to perf, we spend a lot of time in kobject_uevent: >> > >> > - 99.96% 0.00% kworker/u4:1 [kernel.kallsyms] [k] >> > unregister_netdevice_many >> >- unregister_netdevice_many >> > - 99.95% rollback_registered_many >> > - 99.64% netdev_unregister_kobject >> > - 33.43% netdev_queue_update_kobjects >> >- 33.40% kobject_put >> > - kobject_release >> > + 33.37% kobject_uevent >> > + 0.03% kobject_del >> >+ 0.03% sysfs_remove_group >> > - 33.13% net_rx_queue_update_kobjects >> >- kobject_put >> >- kobject_release >> > + 33.11% kobject_uevent >> > + 0.01% kobject_del >> > 0.00% rx_queue_release >> > - 33.08% device_del >> >+ 32.75% kobject_uevent >> >+ 0.17% device_remove_attrs >> >+ 0.07% dpm_sysfs_remove >> >+ 0.04% device_remove_class_symlinks >> >+ 0.01% kobject_del >> >+ 0.01% device_pm_remove >> >+ 0.01% sysfs_remove_file_ns >> >+ 0.00% klist_del >> >+ 0.00% driver_deferred_probe_del >> > 0.00% cleanup_glue_dir.isra.14.part.15 >> > 0.00% to_acpi_device_node >> > 0.00% 
sysfs_remove_group >> > 0.00% klist_del >> > 0.00% device_remove_attrs >> > + 0.26% call_netdevice_notifiers_info >> > + 0.04% rtmsg_ifinfo_build_skb >> > + 0.01% rtmsg_ifinfo_send >> > 0.00% dev_uc_flush >> > 0.00% netif_reset_xps_queues_gt >> > >> > Someone can listen these uevents, so we can't stop sending them without >> > breaking backward compatibility. We can try to optimize >> > kobject_uevent... >> >> Oh that is a surprise. We can definitely skip genenerating uevents for >> network namespaces that are exiting because by definition no one can see >> those network namespaces. If a socket existed that could see those >> uevents it would hold a reference to the network namespace and as such >> the network namespace could not exit. >> >> That sounds like it is worth investigating a little more deeply. >> >> I am surprised that allocation and freeing is so heavy we are spending >> lots of time doing that. On the other hand kobj_bcast_filter is very >> dumb and very late so I expect something can be moved earlier and make >> that code cheaper with the tiniest bit of work. >> > > I'm sorry, I've collected this data for a kernel with debug options > (DEBUG_SPINLOCK, PROVE_LOCKING, DEBUG_LIST, etc). If a kernel is > compiled without debug options, kobject_uevent becomes less expensive, > but still expensive. 
> > - 98.64% 0.00% kworker/u4:2 [kernel.kallsyms][k] cleanup_net >- cleanup_net > - 98.54% ops_exit_list.isra.4 > - 60.48% default_device_exit_batch > - 60.40% unregister_netdevice_many >- rollback_registered_many > - 59.82% netdev_unregister_kobject > - 20.10% device_del > + 19.44% kobject_uevent > + 0.40% device_remove_attrs > + 0.17% dpm_sysfs_remove > + 0.04% device_remove_class_symlinks > + 0.04% kobject_del > + 0.01% device_pm_remove > + 0.01% sysfs_remove_file_ns > - 19.89% netdev_queue_update_kobjects > + 19.81% kobject_put > + 0.07% sysfs_remove_group > - 19.79% net_rx_queue_update_kobjects > kobject_put > - kobject_release >+ 19.77% kobject_uevent >+ 0.02% kobject_d
Re: [patch net-next RFC 4/6] Introduce sample tc action
On 10/12/16, 5:41 AM, Jiri Pirko wrote: > From: Yotam Gigi > > This action allow the user to sample traffic matched by tc classifier. > The sampling consists of choosing packets randomly, truncating them, > adding some informative metadata regarding the interface and the original > packet size and mark them with specific mark, to allow further tc rules to > match and process. The marked sample packets are then injected into the > device ingress qdisc using netif_receive_skb. > > The packets metadata is packed using the ife encapsulation protocol, and > the outer packet's ethernet dest, source and eth_type, along with the > rate, mark and the optional truncation size can be configured from > userspace. > > Example: > To sample ingress traffic from interface eth1, and redirect the sampled > the sampled packets to interface dummy0, one may use the commands: > > tc qdisc add dev eth1 handle : ingress > > tc filter add dev eth1 parent : \ > matchall action sample rate 12 mark 17 > > tc filter add parent : dev eth1 protocol all \ > u32 match mark 172 0xff > action mirred egress redirect dev dummy0 > > Where the first command adds an ingress qdisc and the second starts > sampling every 12'th packet on dev eth0 and marks the sampled packets with > 17. The command third catches the sampled packets, which are marked with > 17, and redirects them to dev dummy0. 
> > Signed-off-by: Yotam Gigi > Signed-off-by: Jiri Pirko channeling some feedback from Peter Phaal @sflow inline below: > --- > > diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h > new file mode 100644 > index 000..a2b445a > --- /dev/null > +++ b/include/net/tc_act/tc_sample.h > @@ -0,0 +1,88 @@ > +#ifndef __NET_TC_SAMPLE_H > +#define __NET_TC_SAMPLE_H > + > +#include > +#include > + > +struct tcf_sample { > + struct tc_actioncommon; > + u32 rate; > + u32 mark; > + booltruncate; > + u32 trunc_size; > + u32 packet_counter; > + u8 eth_dst[ETH_ALEN]; > + u8 eth_src[ETH_ALEN]; > + u16 eth_type; > + booleth_type_set; > + struct list_headtcfm_list; > +}; > +#define to_sample(a) ((struct tcf_sample *)a) > + > +struct sample_packet_metadata { > + int sample_size; > + int orig_size; > + int ifindex; > +}; > + This metadata does not look extensible.. can it be made to ? With sflow in context, you need a pair of ifindex numbers to encode ingress and egress ports. Ideally you would also include a sequence number and a count of the total number of packets that were candidates for sampling. The OVS implementation is a good example, the metadata includes all the actions applied to the packet in the kernel data path. 
[snip] > diff --git a/include/uapi/linux/tc_act/tc_sample.h > b/include/uapi/linux/tc_act/tc_sample.h > new file mode 100644 > index 000..654945b > --- /dev/null > +++ b/include/uapi/linux/tc_act/tc_sample.h > @@ -0,0 +1,31 @@ > +#ifndef __LINUX_TC_SAMPLE_H > +#define __LINUX_TC_SAMPLE_H > + > +#include > +#include > +#include > + > +#define TCA_ACT_SAMPLE 26 > + > +struct tc_sample { > + tc_gen; > + __u32 rate; /* sample rate */ > + __u32 mark; /* mark to put on the sampled packets */ > + booltruncate; /* whether to truncate the packets */ > + __u32 trunc_size; /* truncation size */ > + __u8eth_dst[ETH_ALEN]; /* encapsulated mac destination */ > + __u8eth_src[ETH_ALEN]; /* encapsulated mac source */ > + booleth_type_set; /* whether to overrid ethtype */ > + __u16 eth_type; /* encapsulated mac ethtype */ > +}; > + this does not look extensible and is part of UAPI .. Doing the minimum in the kernel and leaving the rest to the user space agent is much more flexible. The user space agent can attach additional metadata and offer more flexibility in forwarding (sFlow uses XDR encoding over UDP and is routable over IPv4/IPv6). > +enum { > + TCA_SAMPLE_UNSPEC, > + TCA_SAMPLE_TM, > + TCA_SAMPLE_PARMS, > + TCA_SAMPLE_PAD, > + __TCA_SAMPLE_MAX > +}; > +#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1) > + > +#endif > diff --git a/net/sched/Kconfig b/net/sched/Kconfig > index 24f7cac..c54ea6b 100644 > --- a/net/sched/Kconfig > +++ b/net/sched/Kconfig > @@ -650,6 +650,19 @@ config NET_ACT_MIRRED > To compile this code as a module, choose M here: the > module will be called act_mirred. > > +config NET_ACT_SAMPLE > +tristate "Traffic Sampling" > +depends on NET_CLS_ACT > +select NET_IFE > +---help--- > + Say Y here to allow packet sampling tc action. The packet sample > + action consi
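To make the extensibility concern in the review comments concrete, here is a rough sketch of a type-length-value layout for sample metadata. It is not what the patch or sFlow actually uses: the attribute numbers, names, and the fixed 4-byte values are all hypothetical. The point it tries to show is that a reader can skip record types it does not know, so new fields (a second ifindex, a sequence number) can be added later without breaking it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; none of these come from the patch. */
enum {
	META_ORIG_SIZE = 1,
	META_IN_IFINDEX = 2,
	META_OUT_IFINDEX = 3,
	META_SEQ = 4,
};

/*
 * Append one record: 2-byte type, 2-byte length, 4-byte big-endian value.
 * Returns the new write offset, or 0 if the buffer is too small.
 */
static size_t meta_put_u32(uint8_t *buf, size_t cap, size_t off,
			   uint16_t type, uint32_t val)
{
	if (off > cap || cap - off < 8)
		return 0;
	buf[off + 0] = (uint8_t)(type >> 8);
	buf[off + 1] = (uint8_t)type;
	buf[off + 2] = 0;
	buf[off + 3] = 4;
	buf[off + 4] = (uint8_t)(val >> 24);
	buf[off + 5] = (uint8_t)(val >> 16);
	buf[off + 6] = (uint8_t)(val >> 8);
	buf[off + 7] = (uint8_t)val;
	return off + 8;
}

/*
 * Look up a 4-byte attribute. Records of other types are skipped by
 * their length, so a reader built today tolerates fields added later.
 * Returns 0 on success, -1 if absent or the buffer is malformed.
 */
static int meta_get_u32(const uint8_t *buf, size_t len,
			uint16_t type, uint32_t *val)
{
	size_t off = 0;

	while (len - off >= 4) {
		uint16_t t = (uint16_t)((buf[off] << 8) | buf[off + 1]);
		size_t l = ((size_t)buf[off + 2] << 8) | buf[off + 3];

		if (len - off - 4 < l)
			return -1;	/* record runs past the buffer */
		if (t == type && l == 4) {
			*val = ((uint32_t)buf[off + 4] << 24) |
			       ((uint32_t)buf[off + 5] << 16) |
			       ((uint32_t)buf[off + 6] << 8) |
			       buf[off + 7];
			return 0;
		}
		off += 4 + l;
	}
	return -1;
}
```

A fixed struct, by contrast, forces every consumer to agree on the exact layout up front, which is the concern raised for both the in-kernel metadata and the UAPI struct.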
[PATCH] ipvlan: constify l3mdev_ops structure
This l3mdev_ops structure is only stored in the l3mdev_ops field of a net_device structure. This field is declared const, so the l3mdev_ops structure can be declared as const also. Additionally drop the __read_mostly annotation. The semantic patch that adds const is as follows: (http://coccinelle.lip6.fr/) // @r disable optional_qualifier@ identifier i; position p; @@ static struct l3mdev_ops i@p = { ... }; @ok@ identifier r.i; struct net_device *e; position p; @@ e->l3mdev_ops = &i@p; @bad@ position p != {r.p,ok.p}; identifier r.i; struct l3mdev_ops e; @@ e@i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct l3mdev_ops i = { ... }; // The effect on the layout of the .o file is shown by the following output of the size command, first before then after the transformation: textdata bss dec hex filename 7364 466 5278821eca drivers/net/ipvlan/ipvlan_main.o 7412 434 5278981eda drivers/net/ipvlan/ipvlan_main.o Signed-off-by: Julia Lawall --- drivers/net/ipvlan/ipvlan_main.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff -u -p a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -26,7 +26,7 @@ static struct nf_hook_ops ipvl_nfops[] _ }, }; -static struct l3mdev_ops ipvl_l3mdev_ops __read_mostly = { +static const struct l3mdev_ops ipvl_l3mdev_ops = { .l3mdev_l3_rcv = ipvlan_l3_rcv, };
[PATCH net] net: pktgen: remove rcu locking in pktgen_change_name()
From: Eric Dumazet After Jesper commit back in linux-3.18, we trigger a lockdep splat in proc_create_data() while allocating memory from pktgen_change_name(). This patch converts t->if_lock to a mutex, since it is now only used from control path, and adds proper locking to pktgen_change_name() 1) pktgen_thread_lock to protect the outer loop (iterating threads) 2) t->if_lock to protect the inner loop (iterating devices) Note that before Jesper patch, pktgen_change_name() was lacking proper protection, but lockdep was not able to detect the problem. Fixes: 8788370a1d4b ("pktgen: RCU-ify "if_list" to remove lock in next_to_run()") Reported-by: John Sperbeck Signed-off-by: Eric Dumazet Cc: Jesper Dangaard Brouer --- net/core/pktgen.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 5219a9e2127a..306b8f0e03c1 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -216,8 +216,8 @@ #define M_QUEUE_XMIT 2 /* Inject packet into qdisc */ /* If lock -- protects updating of if_list */ -#define if_lock(t) spin_lock(&(t->if_lock)); -#define if_unlock(t) spin_unlock(&(t->if_lock)); +#define if_lock(t) mutex_lock(&(t->if_lock)); +#define if_unlock(t) mutex_unlock(&(t->if_lock)); /* Used to help with determining the pkts on receive */ #define PKTGEN_MAGIC 0xbe9be955 @@ -423,7 +423,7 @@ struct pktgen_net { }; struct pktgen_thread { - spinlock_t if_lock; /* for list of devices */ + struct mutex if_lock; /* for list of devices */ struct list_head if_list; /* All device here */ struct list_head th_list; struct task_struct *tsk; @@ -2010,11 +2010,13 @@ static void pktgen_change_name(const struct pktgen_net *pn, struct net_device *d { struct pktgen_thread *t; + mutex_lock(&pktgen_thread_lock); + list_for_each_entry(t, &pn->pktgen_threads, th_list) { struct pktgen_dev *pkt_dev; - rcu_read_lock(); - list_for_each_entry_rcu(pkt_dev, &t->if_list, list) { + if_lock(t); + list_for_each_entry(pkt_dev, &t->if_list, list) { if 
(pkt_dev->odev != dev) continue; @@ -2029,8 +2031,9 @@ static void pktgen_change_name(const struct pktgen_net *pn, struct net_device *d dev->name); break; } - rcu_read_unlock(); + if_unlock(t); } + mutex_unlock(&pktgen_thread_lock); } static int pktgen_device_event(struct notifier_block *unused, @@ -3762,7 +3765,7 @@ static int __net_init pktgen_create_thread(int cpu, struct pktgen_net *pn) return -ENOMEM; } - spin_lock_init(&t->if_lock); + mutex_init(&t->if_lock); t->cpu = cpu; INIT_LIST_HEAD(&t->if_list);
Re: [PATCH] ethtool: Zero memory allocated for statistics
From: Vlad Tsyrklevich
Date: Sat, 15 Oct 2016 15:11:08 +

> I agree that we should propagate those errors and I'll prepare a new change
> to do so for phy_driver.get_stats(), ethtool_ops.self_test(), and
> ethtool_ops.get_ethtool_stats(). However, I still think this change should
> be adopted. 3/5 of the cases here are reachable without any special
> capabilities and programming defensively at the ethtool interface can
> eliminate an entire class of potential driver bugs instead of fixing them
> one by one. For example, get_eeprom() propagates errors but with a brief
> grep I found that qlcnic_get_eeprom() will return 0 incorrectly even though
> it read nothing for some NICs. Deeper bugs are undoubtedly laying around.

I'm all for defensive programming when practical. But statistics gathering is highly performance sensitive for many important use cases, so I'm not ready to add a whole bzero() here unless absolutely, positively, necessary.

Thanks.
Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3
On 15 October 2016 at 16:10, Mikko Rapeli wrote:
> On Sat, Oct 15, 2016 at 03:33:22PM +0100, Pascal Terjan wrote:
>> On 15 October 2016 at 15:09, Mikko Rapeli wrote:
>> > On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
>> >> It is no longer possible to include + userspace
>> >> headers using time, for example , this broke for example
>> >> the build of linux-atm.
>> >>
>> >> Reproducer:
>> >>
>> >> $ cat test.c
>> >> #include
>> >> #include
>> >
>> > If possible, please reverse the order of includes to first include glibc
>> > headers and then Linux kernel uapi ones.
>>
>> That was what I tried first but this didn't help:
>>
>> In file included from /usr/include/linux/atm_zatm.h:17:0,
>> from test.c:2:
>> /usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec'
>> struct timespec {
>> ^
>> In file included from /usr/include/sys/select.h:43:0,
>> from /usr/include/sys/types.h:219,
>> from /usr/include/stdlib.h:314,
>> from test.c:1:
>> /usr/include/time.h:120:8: note: originally defined here
>> struct timespec
>> ^
>> In file included from /usr/include/linux/atm_zatm.h:17:0,
>> from test.c:2:
>> /usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval'
>> struct timeval {
>> ^
>> In file included from /usr/include/sys/select.h:45:0,
>> from /usr/include/sys/types.h:219,
>> from /usr/include/stdlib.h:314,
>> from test.c:1:
>> /usr/include/bits/time.h:30:8: note: originally defined here
>> struct timeval
>> ^
>> > Kernel uapi headers did not declare their header file dependencies correctly
>> > and I've been fixing them. I have also tried to fix compatibility issues
>> > with glibc headers, but unfortunately they only work when glibc headers
>> > are included before kernel headers. Userspace which has been relying on
>> > the magic include order for various uapi headers is now unfortunately
>> > affected. Sorry about that.
>>
>> In this case no order works, it seems the kernel doesn't handle it in
>> time.h unlike many other headers
>
> Ok, then https://patchwork.kernel.org/patch/9294305/ hasn't been applied yet.
> You can apply that or revert cf00713a655d3019be7faa184402f16c43a0fed3
> for the time being.

Ah thanks, I'll take that patch :)

> It's a bit tricky to push through changes touching uapi headers for various
> kernel sub systems since they may get applied at different order and time.

Yeah I can imagine, thanks for doing it
Re: [PATCH] ethtool: Zero memory allocated for statistics
I agree that we should propagate those errors and I'll prepare a new change to do so for phy_driver.get_stats(), ethtool_ops.self_test(), and ethtool_ops.get_ethtool_stats(). However, I still think this change should be adopted. 3/5 of the cases here are reachable without any special capabilities and programming defensively at the ethtool interface can eliminate an entire class of potential driver bugs instead of fixing them one by one. For example, get_eeprom() propagates errors but with a brief grep I found that qlcnic_get_eeprom() will return 0 incorrectly even though it read nothing for some NICs. Deeper bugs are undoubtedly laying around.

On Sat, Oct 15, 2016 at 5:11 PM, Vlad Tsyrklevich wrote:
> I agree that we should propagate those errors and I'll prepare a new change
> to do so for phy_driver.get_stats(), ethtool_ops.self_test(), and
> ethtool_ops.get_ethtool_stats(). However, I still think this change should
> be adopted. 3/5 of the cases here are reachable without any special
> capabilities and programming defensively at the ethtool interface can
> eliminate an entire class of potential driver bugs instead of fixing them
> one by one. For example, get_eeprom() propagates errors but with a brief
> grep I found that qlcnic_get_eeprom() will return 0 incorrectly even though
> it read nothing for some NICs. Deeper bugs are undoubtedly laying around.
>
> On Sat, Oct 15, 2016 at 3:21 AM David Miller wrote:
>>
>> From: Vlad Tsyrklevich
>> Date: Fri, 14 Oct 2016 11:59:18 +0200
>>
>> > enic_get_ethtool_stats()
>>
>> Looking merely at this shows the real problem.
>>
>> We don't propagate and handle errors for this method.
>>
>> And that's what we should fix, making the get_ethtool_stats() method
>> return an integer error.
>>
>> Then ethtool_get_stats() should return any non-zero value provided by
>> ops->get_ethtool_stats() and not attempt to copy any bytes of 'data'
>> to userspace in that case.
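The error-propagation approach quoted above can be modelled in plain userspace C. This is a sketch under assumptions, not kernel code: get_stats_fn, stats_fetch() and the two demo callbacks are hypothetical names, and memcpy() merely stands in for the copy to userspace. It shows that when the callback's return value is honoured, an unfilled scratch buffer never reaches the caller even though it was not zeroed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver callback: fills n counters, returns 0 or -errno. */
typedef int (*get_stats_fn)(uint64_t *data, size_t n);

/* Fetch stats into a scratch buffer and copy out only on success. */
static int stats_fetch(get_stats_fn get_stats, uint64_t *user_buf, size_t n)
{
	uint64_t *data;
	int ret;

	if (n == 0 || n > SIZE_MAX / sizeof(*data))
		return -EINVAL;
	data = malloc(n * sizeof(*data));	/* deliberately not zeroed */
	if (!data)
		return -ENOMEM;
	ret = get_stats(data, n);
	if (ret == 0)				/* copy out only on success */
		memcpy(user_buf, data, n * sizeof(*data));
	free(data);
	return ret;
}

/* Demo callbacks, for illustration only. */
static int demo_stats_ok(uint64_t *data, size_t n)
{
	for (size_t i = 0; i < n; i++)
		data[i] = i + 1;
	return 0;
}

static int demo_stats_fail(uint64_t *data, size_t n)
{
	(void)data;
	(void)n;
	return -EIO;	/* pretend the hardware query failed */
}
```

The sketch does not cover the other concern raised in the thread, namely a driver that returns success while leaving part of the buffer unwritten; only zeroing (or auditing each driver) addresses that case.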
Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3
On Sat, Oct 15, 2016 at 03:33:22PM +0100, Pascal Terjan wrote:
> On 15 October 2016 at 15:09, Mikko Rapeli wrote:
> > On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote:
> >> It is no longer possible to include + userspace
> >> headers using time, for example , this broke for example
> >> the build of linux-atm.
> >>
> >> Reproducer:
> >>
> >> $ cat test.c
> >> #include
> >> #include
> >
> > If possible, please reverse the order of includes to first include glibc
> > headers and then Linux kernel uapi ones.
>
> That was what I tried first but this didn't help:
>
> In file included from /usr/include/linux/atm_zatm.h:17:0,
> from test.c:2:
> /usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec'
> struct timespec {
> ^
> In file included from /usr/include/sys/select.h:43:0,
> from /usr/include/sys/types.h:219,
> from /usr/include/stdlib.h:314,
> from test.c:1:
> /usr/include/time.h:120:8: note: originally defined here
> struct timespec
> ^
> In file included from /usr/include/linux/atm_zatm.h:17:0,
> from test.c:2:
> /usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval'
> struct timeval {
> ^
> In file included from /usr/include/sys/select.h:45:0,
> from /usr/include/sys/types.h:219,
> from /usr/include/stdlib.h:314,
> from test.c:1:
> /usr/include/bits/time.h:30:8: note: originally defined here
> struct timeval
> ^
> > Kernel uapi headers did not declare their header file dependencies correctly
> > and I've been fixing them. I have also tried to fix compatibility issues
> > with glibc headers, but unfortunately they only work when glibc headers
> > are included before kernel headers. Userspace which has been relying on
> > the magic include order for various uapi headers is now unfortunately
> > affected. Sorry about that.
>
> In this case no order works, it seems the kernel doesn't handle it in
> time.h unlike many other headers

Ok, then https://patchwork.kernel.org/patch/9294305/ hasn't been applied yet. You can apply that or revert cf00713a655d3019be7faa184402f16c43a0fed3 for the time being.

It's a bit tricky to push through changes touching uapi headers for various kernel sub systems since they may get applied at different order and time.

-Mikko
Re: Need help with mdiobus_register and phy
Florian Fainelli wrote:
> After reading the spec again, it does not appear to me that a PHY with
> PDOWN set is guaranteed or even required to respond to other register
> reads such as MII_PHYID1/2, in which case we may have to implement a MDIO
> bus reset routine which clears PDOWN for all PHYs that we detect(ed), or
> as Andrew suggested, utilize the matching by compatible string with the
> PHY OUI in it.

The 8031 does respond normally when PDOWN is set. However, the ID registers are not available when the SerDes bus is also powered down. I'll call this PDOWN+. This is a special power-down sequence that the at803x driver does on suspend. See my other email for details.

--
Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation.
Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3
On 15 October 2016 at 15:09, Mikko Rapeli wrote: > On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote: >> It is no longer possible to include + userspace >> headers using time, for example , this broke for example >> the build of linux-atm. >> >> Reproducer: >> >> $ cat test.c >> #include >> #include > > If possible, please reverse the order of includes to first include glibc > headers and then Linux kernel uapi ones. That was what I tried first but this didn't help: In file included from /usr/include/linux/atm_zatm.h:17:0, from test.c:2: /usr/include/linux/time.h:9:8: error: redefinition of 'struct timespec' struct timespec { ^ In file included from /usr/include/sys/select.h:43:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from test.c:1: /usr/include/time.h:120:8: note: originally defined here struct timespec ^ In file included from /usr/include/linux/atm_zatm.h:17:0, from test.c:2: /usr/include/linux/time.h:15:8: error: redefinition of 'struct timeval' struct timeval { ^ In file included from /usr/include/sys/select.h:45:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from test.c:1: /usr/include/bits/time.h:30:8: note: originally defined here struct timeval ^ > Kernel uapi headers did not declare their header file dependencies correctly > and I've been fixing them. I have also tried to fix compatibility issues > with glibc headers, but unfortunately they only work when glibc headers > are included before kernel headers. Userspace which has been relying on > the magic include order for various uapi headers is now unfortunately > affected. Sorry about that. In this case no order works, it seems the kernel doesn't handle it in time.h unlike many other headers
Re: linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3
On Sat, Oct 15, 2016 at 01:05:10PM +0100, Pascal Terjan wrote: > It is no longer possible to include + userspace > headers using time, for example , this broke for example > the build of linux-atm. > > Reproducer: > > $ cat test.c > #include > #include If possible, please reverse the order of includes to first include glibc headers and then Linux kernel uapi ones. Kernel uapi headers did not declare their header file dependencies correctly and I've been fixing them. I have also tried to fix compatibility issues with glibc headers, but unfortunately they only work when glibc headers are included before kernel headers. Userspace which has been relying on the magic include order for various uapi headers is now unfortunately affected. Sorry about that. -Mikko > $ gcc -c test.c > In file included from /usr/include/sys/select.h:43:0, > from /usr/include/sys/types.h:219, > from /usr/include/stdlib.h:314, > from test.c:2: > /usr/include/time.h:120:8: error: redefinition of 'struct timespec' > struct timespec > ^ > In file included from /usr/include/linux/atm_zatm.h:17:0, > from test.c:1: > /usr/include/linux/time.h:9:8: note: originally defined here > struct timespec { > ^ > In file included from /usr/include/sys/select.h:45:0, > from /usr/include/sys/types.h:219, > from /usr/include/stdlib.h:314, > from test.c:2: > /usr/include/bits/time.h:30:8: error: redefinition of 'struct timeval' > struct timeval > ^ > In file included from /usr/include/linux/atm_zatm.h:17:0, > from test.c:1: > /usr/include/linux/time.h:15:8: note: originally defined here > struct timeval { > ^
Re: userspace build broken by include changes
Hi,

On Sat, Oct 15, 2016 at 12:40:43PM +0100, Pascal Terjan wrote:
> rp-pppoe plugin of ppp no longer builds:
>
> In file included from pppoe.h:87:0,
> from plugin.c:29:
> /usr/include/linux/in.h:28:3: error: redeclaration of enumerator 'IPPROTO_IP'
> IPPROTO_IP = 0, /* Dummy protocol for TCP */
> ^
> /usr/include/netinet/in.h:42:5: note: previous definition of
> 'IPPROTO_IP' was here
> IPPROTO_IP = 0,/* Dummy protocol for TCP. */
>
> Short reproducer:
>
> #include
> #include
> #include
> #include
> #include
> #include
>
> Full log:
> http://pkgsubmit.mageia.org/autobuild/cauldron/x86_64/core/2016-10-12/ppp-2.4.7-8.mga6.src.rpm/build.0.20161012185227.log
>
> Moving the include of linux/if.h after netinet/in.h fixes it.
>
> I guess the breakage is caused by
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux?id=eafe92114308acf14e45c6c3d154a5dad5523d1a
> but the commit doesn't look wrong to me.

These kernel uapi headers did not declare their dependencies correctly and this patch fixed it. Many uapi headers from the kernel unfortunately conflict with glibc and other userspace headers, and userspace code often relies on a particular include order that happens to work.

My patch series has compatibility changes so that kernel headers can be included after glibc ones. Unfortunately I haven't had time to provide similar patches to glibc, so things might break if kernel headers are included before glibc headers.

So the best I can do for now is to ask you to change the userspace include order to first include glibc headers and then kernel uapi ones. This is an unfortunate kernel header API break, sorry. ABIs are not affected, though.

-Mikko
[RFC v2 2/2] proc connector: add a "get feature" op
From: Alban Crequy

As more kinds of events are being added in the proc connector, userspace needs a way to detect whether the kernel supports those new events. When a kind of event is not supported, userspace should report an error properly, or fall back to other methods (regular polling of procfs).

The events fork, exec, uid, gid, sid, ptrace, comm, exit were added together. Then commit 2b5faa4c ("connector: Added coredumping event to the process connector") added coredump events but without a way for userspace to detect if the kernel will emit those. So I am grouping them all together in PROC_CN_FEATURE_BASIC.

- PROC_CN_FEATURE_BASIC: supports fork, exec, uid, gid, sid, ptrace, comm, exit, coredump.
- PROC_CN_FEATURE_NS: supports ns.

Signed-off-by: Alban Crequy
---
drivers/connector/cn_proc.c | 25 +++-- include/uapi/linux/cn_proc.h | 4 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index c38733d..5f9ace6 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -442,15 +442,12 @@ void proc_ns_connector_send(struct ns_event_prepare *prepare, struct task_struct * values because it's not being returned via syscall return * mechanisms. 
*/ -static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack) +static void cn_proc_ack(int err, u16 flags, int rcvd_seq, int rcvd_ack) { struct cn_msg *msg; struct proc_event *ev; __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); - if (atomic_read(&proc_event_num_listeners) < 1) - return; - msg = buffer_to_cn_msg(buffer); ev = (struct proc_event *)msg->data; memset(&ev->event_data, 0, sizeof(ev->event_data)); @@ -462,7 +459,7 @@ static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack) memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = rcvd_ack + 1; msg->len = sizeof(*ev); - msg->flags = 0; /* not used */ + msg->flags = flags; send_msg(msg); } @@ -475,9 +472,12 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, { enum proc_cn_mcast_op *mc_op = NULL; int err = 0; + u16 flags = 0; - if (msg->len != sizeof(*mc_op)) - return; + if (msg->len != sizeof(*mc_op)) { + err = EINVAL; + goto out; + } /* * Events are reported with respect to the initial pid @@ -485,8 +485,10 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, * other namespaces. */ if ((current_user_ns() != &init_user_ns) || - (task_active_pid_ns(current) != &init_pid_ns)) - return; + (task_active_pid_ns(current) != &init_pid_ns)) { + err = EPERM; + goto out; + } /* Can only change if privileged. 
*/ if (!__netlink_ns_capable(nsp, &init_user_ns, CAP_NET_ADMIN)) { @@ -496,6 +498,9 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, mc_op = (enum proc_cn_mcast_op *)msg->data; switch (*mc_op) { + case PROC_CN_GET_FEATURES: + flags = PROC_CN_FEATURE_BASIC | PROC_CN_FEATURE_NS; + break; case PROC_CN_MCAST_LISTEN: atomic_inc(&proc_event_num_listeners); break; @@ -508,7 +513,7 @@ static void cn_proc_mcast_ctl(struct cn_msg *msg, } out: - cn_proc_ack(err, msg->seq, msg->ack); + cn_proc_ack(err, flags, msg->seq, msg->ack); } /* diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h index 3270e8c..2ea0e5d 100644 --- a/include/uapi/linux/cn_proc.h +++ b/include/uapi/linux/cn_proc.h @@ -25,10 +25,14 @@ * for events on the connector. */ enum proc_cn_mcast_op { + PROC_CN_GET_FEATURES = 0, PROC_CN_MCAST_LISTEN = 1, PROC_CN_MCAST_IGNORE = 2 }; +#define PROC_CN_FEATURE_BASIC 0x0001 +#define PROC_CN_FEATURE_NS 0x0002 + /* * From the user's point of view, the process * ID is the thread group ID and thread ID is the internal -- 2.7.4
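A minimal sketch of how userspace might interpret the feature bits, assuming they arrive in the flags field of the acknowledgement as this patch proposes. The helper names proc_cn_has_feature() and proc_cn_ns_strategy() are hypothetical, and the two constants are copied from the patch rather than taken from a released header.

```c
#include <stdint.h>

/*
 * Values copied from the RFC patch above. They are a proposal and are
 * assumed here, not read from an installed <linux/cn_proc.h>.
 */
#define PROC_CN_FEATURE_BASIC 0x0001
#define PROC_CN_FEATURE_NS    0x0002

/* Hypothetical helper: true if every bit in 'feature' is set in the ack flags. */
static int proc_cn_has_feature(uint16_t ack_flags, uint16_t feature)
{
	return (ack_flags & feature) == feature;
}

/*
 * Hypothetical helper: choose how to track namespace changes.
 * "events" means rely on the proc connector, "poll" means fall back
 * to regular polling of procfs.
 */
static const char *proc_cn_ns_strategy(uint16_t ack_flags)
{
	if (proc_cn_has_feature(ack_flags, PROC_CN_FEATURE_NS))
		return "events";
	return "poll";
}
```

The removed line in the diff shows that the ack previously set flags to 0, so under this sketch a reply with no bits set is treated as "no ns events" and the caller falls back to polling.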
[RFC v2 1/2] proc connector: add namespace events
From: Alban Crequy The act of a process creating or joining a namespace via clone(), unshare() or setns() is a useful signal for monitoring applications. I am working on a monitoring application that keeps track of all the containers and all processes inside each container. The current way of doing it is by polling regularly in /proc for the list of processes and in /proc/*/ns/* to know which namespaces they belong to. This is inefficient on systems with a large number of containers and a large number of processes. Instead, I would inspect /proc only one time and get the updates with the proc connector. Unfortunately, the proc connector gives me the list of processes but does not notify me when a process changes namespaces. So I would still need to inspect /proc/*/ns/*. This patch adds namespace events for processes. It generates a namespace event each time a process changes namespace via clone(), unshare() or setns(). For example, the following command: | # unshare -n -i -f ls -l /proc/self/ns/ | total 0 | lrwxrwxrwx 1 root root 0 Sep 25 22:31 cgroup -> 'cgroup:[4026531835]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 ipc -> 'ipc:[4026532208]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 mnt -> 'mnt:[4026531840]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 net -> 'net:[4026532210]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 pid -> 'pid:[4026531836]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 user -> 'user:[4026531837]' | lrwxrwxrwx 1 root root 0 Sep 25 22:31 uts -> 'uts:[4026531838]' causes the proc connector to generate the following events: | fork: ppid=691 pid=808 | exec: pid=808 | ns: pid=808 reason=unshare count=2 | type=ipc 4026531839 -> 4026532208 | type=net 4026531957 -> 4026532210 | fork: ppid=808 pid=809 | exec: pid=809 | exit: pid=809 | exit: pid=808 Signed-off-by: Alban Crequy --- drivers/connector/cn_proc.c | 138 +++ include/linux/cn_proc.h | 25 include/uapi/linux/cn_proc.h | 23 +++- kernel/fork.c| 10 kernel/nsproxy.c | 6 ++ 5 files changed, 201 insertions(+), 1 
deletion(-) diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c index a782ce8..c38733d 100644 --- a/drivers/connector/cn_proc.c +++ b/drivers/connector/cn_proc.c @@ -30,8 +30,13 @@ #include #include #include +#include +#include +#include +#include #include +#include /* * Size of a cn_msg followed by a proc_event structure. Since the @@ -296,6 +301,139 @@ void proc_exit_connector(struct task_struct *task) send_msg(msg); } +void proc_ns_connector_prepare(struct ns_event_prepare *prepare, u16 reason) +{ + struct nsproxy *ns = current->nsproxy; + struct ns_common *mntns; + + prepare->num_listeners = atomic_read(&proc_event_num_listeners); + + if (prepare->num_listeners < 1) + return; + + prepare->reason = reason; + + prepare->user_inum = current->cred->user_ns->ns.inum; + prepare->uts_inum = ns->uts_ns->ns.inum; + prepare->ipc_inum = ns->ipc_ns->ns.inum; + + mntns = mntns_operations.get(current); + if (mntns) { + prepare->mnt_inum = mntns->inum; + mntns_operations.put(mntns); + } else + prepare->mnt_inum = 0; + + prepare->pid_inum = ns->pid_ns_for_children->ns.inum; + prepare->net_inum = ns->net_ns->ns.inum; + prepare->cgroup_inum = ns->cgroup_ns->ns.inum; +} + +void proc_ns_connector_send(struct ns_event_prepare *prepare, struct task_struct *task) +{ + struct nsproxy *ns = task->nsproxy; + struct ns_common *mntns; + struct cn_msg *msg; + struct proc_event *ev; + __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8); + int count; + + if (prepare->num_listeners < 1) + return; + + if (atomic_read(&proc_event_num_listeners) < 1) + return; + + msg = buffer_to_cn_msg(buffer); + ev = (struct proc_event *)msg->data; + memset(&ev->event_data, 0, sizeof(ev->event_data)); + ev->timestamp_ns = ktime_get_ns(); + ev->what = PROC_EVENT_NS; + + ev->event_data.ns.process_pid = task->pid; + ev->event_data.ns.process_tgid = task->tgid; + ev->event_data.ns.reason = prepare->reason; + count = 0; + + /* user */ + if (prepare->user_inum != task->cred->user_ns->ns.inum) { + 
ev->event_data.ns.items[count].type = CLONE_NEWUSER; + ev->event_data.ns.items[count].flags = 0; + ev->event_data.ns.items[count].old_inum = prepare->user_inum; + ev->event_data.ns.items[count].inum = task->cred->user_ns->ns.inum; + count++; + } + + /* uts */ + if (prepare->uts_inum != ns->uts_ns->ns.inum) { + ev->event_data.ns.items[count].type = CLONE_NEWUTS; + ev->event_data.ns.items[count].flags = 0; + ev->event_data.ns.items[count].old_in
[RFC v2 0/2] proc connector: get namespace events
This is v2 of the patch set to add namespace events in the proc connector. The act of a process creating or joining a namespace via clone(), unshare() or setns() is a useful signal for monitoring applications. I am working on a monitoring application that keeps track of all the containers and all processes inside each container. The current way of doing it is by polling regularly in /proc for the list of processes and in /proc/*/ns/* to know which namespaces they belong to. This is inefficient on systems with a large number of containers and a large number of processes. Instead, I would inspect /proc only one time and get the updates with the proc connector. Unfortunately, the proc connector gives me the list of processes but does not notify me when a process changes namespaces. So I would still need to inspect /proc/*/ns/*. (1) Add namespace events for processes. It generates a namespace event each time a process changes namespace via clone(), unshare() or setns(). (2) Add a way for userspace to detect if the proc connector is able to send namespace events. Changes since RFC-v1: https://lkml.org/lkml/2016/9/8/588 * Supports userns. * The reason field says exactly whether it is clone/setns/unshare. * Sends aggregated messages containing details of several namespace changes. Suggested by Evgeniy Polyakov. * Add patch 2 to detect if the proc connector is able to send namespace events. This patch set is available in the git repository at: https://github.com/kinvolk/linux.git alban/proc_ns_connector-v2-5 Alban Crequy (2): proc connector: add namespace events proc connector: add a "get feature" op drivers/connector/cn_proc.c | 163 --- include/linux/cn_proc.h | 25 +++ include/uapi/linux/cn_proc.h | 27 ++- kernel/fork.c | 10 +++ kernel/nsproxy.c | 6 ++ 5 files changed, 220 insertions(+), 11 deletions(-) -- 2.7.4
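For reference, the polling approach described in this cover letter can be sketched as follows. This is an illustration only, not code from the patch set: parse_ns_link() and read_ns_inum() are hypothetical names, and the sketch assumes the /proc/<pid>/ns/<type> symlink target has the "type:[inum]" form shown in the ls output of patch 1.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: split a link target such as "net:[4026532210]"
 * into its type ("net") and inode number. Returns 0 on success and
 * -1 if the string does not have the expected form.
 */
static int parse_ns_link(const char *target, char *type, size_t type_len,
			 unsigned long *inum)
{
	const char *colon = strchr(target, ':');
	char close;
	size_t n;

	if (!colon || colon == target)
		return -1;
	n = (size_t)(colon - target);
	if (n >= type_len)
		return -1;
	/* Expect ":[<digits>]" right after the type name. */
	if (sscanf(colon, ":[%lu%c", inum, &close) != 2 || close != ']')
		return -1;
	memcpy(type, target, n);
	type[n] = '\0';
	return 0;
}

/*
 * Hypothetical helper: read one namespace symlink, for example
 * "/proc/self/ns/net", and parse its target.
 */
static int read_ns_inum(const char *path, char *type, size_t type_len,
			unsigned long *inum)
{
	char buf[64];
	ssize_t len = readlink(path, buf, sizeof(buf) - 1);

	if (len < 0)
		return -1;
	buf[len] = '\0';
	return parse_ns_link(buf, type, type_len, inum);
}
```

A monitor without namespace events would have to repeat read_ns_inum() for every process and every namespace type on each polling pass, which is the cost the proposed events are meant to avoid.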
linux/atm_zatm.h not really usable in userspace since cf00713a655d3019be7faa184402f16c43a0fed3
It is no longer possible to include <linux/atm_zatm.h> together with userspace headers that define the time structures, such as <stdlib.h> in the reproducer below; this broke, for example, the build of linux-atm. Reproducer: $ cat test.c #include <linux/atm_zatm.h> #include <stdlib.h> $ gcc -c test.c In file included from /usr/include/sys/select.h:43:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from test.c:2: /usr/include/time.h:120:8: error: redefinition of 'struct timespec' struct timespec ^ In file included from /usr/include/linux/atm_zatm.h:17:0, from test.c:1: /usr/include/linux/time.h:9:8: note: originally defined here struct timespec { ^ In file included from /usr/include/sys/select.h:45:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from test.c:2: /usr/include/bits/time.h:30:8: error: redefinition of 'struct timeval' struct timeval ^ In file included from /usr/include/linux/atm_zatm.h:17:0, from test.c:1: /usr/include/linux/time.h:15:8: note: originally defined here struct timeval { ^
userspace build broken by include changes
rp-pppoe plugin of ppp no longer builds: In file included from pppoe.h:87:0, from plugin.c:29: /usr/include/linux/in.h:28:3: error: redeclaration of enumerator 'IPPROTO_IP' IPPROTO_IP = 0, /* Dummy protocol for TCP */ ^ /usr/include/netinet/in.h:42:5: note: previous definition of 'IPPROTO_IP' was here IPPROTO_IP = 0,/* Dummy protocol for TCP. */ Short reproducer: #include #include #include #include #include #include Full log: http://pkgsubmit.mageia.org/autobuild/cauldron/x86_64/core/2016-10-12/ppp-2.4.7-8.mga6.src.rpm/build.0.20161012185227.log Moving the include of linux/if.h after netinet/in.h fixes it. I guess the breakage is caused by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/include/uapi/linux?id=eafe92114308acf14e45c6c3d154a5dad5523d1a but the commit doesn't look wrong to me. This is indeed enough to cause the error: #include #include #include
Re: Need help with mdiobus_register and phy
On October 14, 2016 7:25:14 PM CEST, Andrew Lunn wrote: >> So after calling BMCR_PDOWN, the PHYSID1 and PHYSID2 registers are >> no longer readable. Is that expected? > >You are making two changes here. Is it the SGMII power down which is >causing the id registers to return 0x, or the BMCR_PDOWN. I would be curious to know about that as well. > >The generic suspend code sets the PDOWN bit, so it is assuming the PHY >will respond afterwards. After reading the spec again, it does not appear to me that a PHY with PDOWN set is guaranteed or even required to respond to other register reads such as MII_PHYID1/2, in which case we may have to implement a MDIO bus reset routine which clears PDOWN for all PHYs that we detect(ed), or as Andrew suggested, utilize the matching by compatible string with the PHY OUI in it. -- Florian
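A rough sketch of the "clear PDOWN for all detected PHYs" idea mentioned above, under the assumption that the control register itself stays accessible while a PHY is powered down. The struct mdio_ops accessors and mdio_clear_pdown() are hypothetical and do not correspond to any existing kernel API; the register and bit values match linux/mii.h. The fake_* functions are an in-memory stand-in for a bus, included only so the sketch is self-contained.

```c
#include <stdint.h>

#define MII_BMCR     0x00    /* basic mode control register (linux/mii.h) */
#define BMCR_PDOWN   0x0800  /* power-down bit (linux/mii.h) */
#define PHY_MAX_ADDR 32

/* Hypothetical bus accessors; a real driver would use its own MDIO ops. */
struct mdio_ops {
	int (*read)(void *priv, int addr, int reg);  /* value, or negative on error */
	int (*write)(void *priv, int addr, int reg, uint16_t val);
	void *priv;
};

/*
 * Clear BMCR_PDOWN on every address set in phy_mask.
 * Returns the number of PHYs taken out of power-down, or a negative
 * value passed through from the bus accessors.
 */
static int mdio_clear_pdown(const struct mdio_ops *ops, uint32_t phy_mask)
{
	int woken = 0;

	for (int addr = 0; addr < PHY_MAX_ADDR; addr++) {
		int bmcr, ret;

		if (!(phy_mask & (UINT32_C(1) << addr)))
			continue;
		bmcr = ops->read(ops->priv, addr, MII_BMCR);
		if (bmcr < 0)
			return bmcr;
		if (!(bmcr & BMCR_PDOWN))
			continue;	/* already powered up */
		ret = ops->write(ops->priv, addr, MII_BMCR,
				 (uint16_t)(bmcr & ~BMCR_PDOWN));
		if (ret < 0)
			return ret;
		woken++;
	}
	return woken;
}

/* In-memory stand-in for a bus, for illustration only. */
static uint16_t fake_bmcr[PHY_MAX_ADDR];

static int fake_read(void *priv, int addr, int reg)
{
	(void)priv;
	return reg == MII_BMCR ? fake_bmcr[addr] : 0;
}

static int fake_write(void *priv, int addr, int reg, uint16_t val)
{
	(void)priv;
	if (reg == MII_BMCR)
		fake_bmcr[addr] = val;
	return 0;
}
```

Running such a routine before probing would let the ID registers be read afterwards; matching by compatible string, as Andrew suggested, avoids the need for it altogether.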
Re: net/sctp: BUG: KASAN: stack-out-of-bounds in memcmp
Hello Xin Long, On 2016/10/14 19:13, Xin Long wrote: > On Sat, Aug 20, 2016 at 3:51 PM, Baozeng Ding wrote: >> Hello all, >> The following program triggers stack-out-of-bounds in memcmp. The kernel >> version is 4.8.0-rc1+ (on Aug 13 commit >> 118253a593bd1c57de2d1193df1ccffe1abe745b). Thanks. > ... >> >> #define _GNU_SOURCE >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> #include >> >> int main() >> { >> int fd; >> mmap((void *)0x2000ul, 0xff2000ul, 0x3ul, 0x32ul, -1, 0x0ul); >> fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_SCTP); >> memcpy((void*)0x20f82f80, >> "\x0a\x00\xab\x12\x72\xd4\x19\x9a\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x85\xda\x00\xa0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", >> 128); >> bind(fd, (struct sockaddr*)0x20f82f80ul, 0x80ul); >> *(uint64_t*)0x202e1fc8 = (uint64_t)0x20f77f80; >> *(uint32_t*)0x202e1fd0 = (uint32_t)0x80; >> *(uint64_t*)0x202e1fd8 = (uint64_t)0x20f7dfe0; >> *(uint64_t*)0x202e1fe0 = (uint64_t)0x2; >> *(uint64_t*)0x202e1fe8 = (uint64_t)0x20f77000; >> *(uint64_t*)0x202e1ff0 = (uint64_t)0x3; >> *(uint32_t*)0x202e1ff8 = (uint32_t)0x80; >> memcpy((void*)0x20f77f80, >> 
"\x0a\x00\xab\x12\xb0\xb3\x20\x7b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xc2\xc2\x0b\xb2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", >> 128); >> *(uint64_t*)0x20f7dfe0 = (uint64_t)0x20f77fc5; >> *(uint64_t*)0x20f7dfe8 = (uint64_t)0x3b; >> *(uint64_t*)0x20f7dff0 = (uint64_t)0x20f77fac; >> *(uint64_t*)0x20f7dff8 = (uint64_t)0x54; >> memcpy((void*)0x20f77fc5, >> "\xa5\x7d\xf3\xc4\xfe\xd3\xfd\x44\x63\x00\x8c\x1e\x4c\x2e\x8d\x8d\x9a\x9c\x9c\x9d\x5b\x7c\xe1\x06\xf7\x15\x16\xed\x68\xd1\xfc\xf4\xa4\x3a\xe4\x69\x51\x16\x74\xf4\x1a\xcf\x0e\x99\xc3\xa3\x87\xe7\x81\x6c\x10\x78\x75\x17\x69\x9d\x11\x0c\xc7", >> 59); >> memcpy((void*)0x20f77fac, >> "\x86\x08\x89\x3c\xf3\x58\xea\xe7\x64\x6a\xfb\xb5\xe8\xdd\x5f\x69\xa5\xd4\xdc\xd9\xe7\x71\x95\x07\x78\x7b\x21\xda\x43\x9c\x62\x4d\xca\x64\xb5\x6e\x96\x55\xe9\x58\x76\x66\x1d\xb9\x7b\xe6\x20\xc1\xa9\xed\x70\xc1\x2b\x7c\x86\x8c\xba\x28\xb3\x2c\xb9\x64\xb7\x84\x65\x0d\x7f\xa6\x98\x6f\x49\xcb\x35\xad\x5a\xdf\x13\x75\x99\x57\x7e\xbb\x38\x89", >> 84); >> *(uint64_t*)0x20f77000 = (uint64_t)0x15; >> *(uint32_t*)0x20f77008 = (uint32_t)0x1; >> *(uint32_t*)0x20f7700c = (uint32_t)0xfffe; >> *(uint8_t*)0x20f77010 = (uint8_t)0xbb; >> *(uint8_t*)0x20f77011 = (uint8_t)0x2; >> *(uint8_t*)0x20f77012 = (uint8_t)0x5; >> *(uint8_t*)0x20f77013 = (uint8_t)0x2; >> *(uint8_t*)0x20f77014 = (uint8_t)0x8000; >> *(uint64_t*)0x20f77015 = (uint64_t)0x10; >> *(uint32_t*)0x20f7701d = (uint32_t)0x; >> *(uint32_t*)0x20f77021 = (uint32_t)0x1; >> *(uint64_t*)0x20f77025 = (uint64_t)0x13; >> *(uint32_t*)0x20f7702d = (uint32_t)0x6; >> *(uint32_t*)0x20f77031 = (uint32_t)0xfe00; >> 
*(uint8_t*)0x20f77035 = (uint8_t)0x8000; >> *(uint8_t*)0x20f77036 = (uint8_t)0xfff8; >> sendmmsg(fd, (struct mmsghdr *)0x202e1fc8ul, 0x1ul, 0x1ul); >> return 0; >> } >> > Hi, Baozeng, I couldn't reproduce this issue with this script, > even in 118253a593bd1c57de2d1193df1ccffe1abe745b > do I need to do some extra config for this ? > You need to enable KASAN in the config: CONFIG_HAVE_ARCH_KASAN=y CONFIG_KASAN=y CONFIG_KASAN_INLINE=y CONFIG_KASAN_SHADOW_OFFSET=0xdc00 I just tested with b67be92feb486f800d80d72c67fd87b47b79b18e (October 12), and the issue still exists. If you still cannot reproduce it, I will send the .config to you privately. Thanks.