Re: linux-next: build failure after merge of the asm-generic tree
> I'm not sending a pull request for this if it breaks any architectures, > so I think we need to fix them all, and I suppose we also have to > change all architectures in the same patch that changes the architecture > independent declaration, so it doesn't break intermittently. > > At this point, I'd probably drop your patches from my asm-generic > tree entirely to avoid the regression, and wait for you to resend > them all after 5.1-rc1, for inclusion in 5.2. > > Can you elaborate on the original problem that you saw? Maybe > we can have a different workaround for it in the meantime. Agree, these two patches should be dropped: 1. iomap: add missing const to ioread*/iowrite addr arg since there are incompatible definitions or declarations in arch/alpha/include/asm/core_apecs.h arch/alpha/include/asm/core_cia.h arch/alpha/include/asm/core_lca.h arch/alpha/include/asm/core_marvel.h arch/alpha/include/asm/core_mcpcia.h arch/alpha/include/asm/core_t2.h arch/alpha/include/asm/io.h arch/alpha/include/asm/io_trivial.h arch/alpha/include/asm/jensen.h arch/alpha/include/asm/machvec.h arch/alpha/kernel/core_marvel.c arch/alpha/kernel/io.c arch/parisc/lib/iomap.c arch/powerpc/kernel/iomap.c arch/sh/kernel/iomap.c 2. lib/iomap: add missing const to mmio_ins* addr arg since there are incompatible definitions or declarations in arch/sh/kernel/iomap.c I will resubmit them with all changes required for arch/. The ones for alpha, powerpc and sh are already ready and built with cross compiler. I still have to setup a cross compiler to build my parisc changes. Thanks, regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
[PATCH] powerpc: iomap: add missing const to ioread addr argument
ioread*() signatures from include/asm-generic/iomap.h and lib/iomap.c have been recently modified to add the missing const qualifier to the addr parameter. This broke the powerpc build which still had ioread*() definitions without the const (conflicting types). Add const qualifier to ioread*() definitions from arch/powerpc/kernel/iomap.c. Signed-off-by: Hugo Lefeuvre --- arch/powerpc/kernel/iomap.c | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c index 5ac84efc6ede..9fe4fb3b08aa 100644 --- a/arch/powerpc/kernel/iomap.c +++ b/arch/powerpc/kernel/iomap.c @@ -15,23 +15,23 @@ * Here comes the ppc64 implementation of the IOMAP * interfaces. */ -unsigned int ioread8(void __iomem *addr) +unsigned int ioread8(const void __iomem *addr) { return readb(addr); } -unsigned int ioread16(void __iomem *addr) +unsigned int ioread16(const void __iomem *addr) { return readw(addr); } -unsigned int ioread16be(void __iomem *addr) +unsigned int ioread16be(const void __iomem *addr) { return readw_be(addr); } -unsigned int ioread32(void __iomem *addr) +unsigned int ioread32(const void __iomem *addr) { return readl(addr); } -unsigned int ioread32be(void __iomem *addr) +unsigned int ioread32be(const void __iomem *addr) { return readl_be(addr); } @@ -41,27 +41,27 @@ EXPORT_SYMBOL(ioread16be); EXPORT_SYMBOL(ioread32); EXPORT_SYMBOL(ioread32be); #ifdef __powerpc64__ -u64 ioread64(void __iomem *addr) +u64 ioread64(const void __iomem *addr) { return readq(addr); } -u64 ioread64_lo_hi(void __iomem *addr) +u64 ioread64_lo_hi(const void __iomem *addr) { return readq(addr); } -u64 ioread64_hi_lo(void __iomem *addr) +u64 ioread64_hi_lo(const void __iomem *addr) { return readq(addr); } -u64 ioread64be(void __iomem *addr) +u64 ioread64be(const void __iomem *addr) { return readq_be(addr); } -u64 ioread64be_lo_hi(void __iomem *addr) +u64 ioread64be_lo_hi(const void __iomem *addr) { return readq_be(addr); } -u64 ioread64be_hi_lo(void __iomem *addr) +u64 ioread64be_hi_lo(const void __iomem *addr) { return readq_be(addr); } @@ -139,15 +139,15 @@ EXPORT_SYMBOL(iowrite64be_hi_lo); * FIXME! We could make these do EEH handling if we really * wanted. Not clear if we do. */ -void ioread8_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count) { readsb(addr, dst, count); } -void ioread16_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count) { readsw(addr, dst, count); } -void ioread32_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count) { readsl(addr, dst, count); } -- 2.20.1
Re: linux-next: build failure after merge of the asm-generic tree
Hi Stephen, Arnd, > After merging the asm-generic tree, today's linux-next build (powerpc > allnoconfig) failed like this: > ... > Caused by commit > > 8e074c243ed3 ("iomap: add missing const to ioread*/iowrite addr arg") I have prepared a patch addressing this issue, and am currently building it with a powerpc cross compiler. I'll submit it once it's done. Please tell me if I should update the initial patch instead. > The const qualifiers are also missing in: > > arch/parisc/lib/iomap.c > arch/sh/kernel/iomap.c I also have patches for these architectures, but it is more complicated for me to test them with a cross compiler. Should I still submit them? > I have reverted that commit for today. > > BTW, that commit only added the const to the ioread* functions ... Yes, and it does not even make sense to add a const qualifier to the addr argument in the iowrite case. That was a bad commit message. Thanks, and sorry for the trouble. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
Re: [PATCH 0/4] iomap: fix multiple consistency issues, interface cleanup
Hi, > Nice cleanup! Applied to my asm-generic tree now. Thanks! > I notice that you did not add a 'volatile' qualifier in addition to 'const'. > Is that something you considered doing? I think the traditional > readl/writel style accessors have them partly for historic reasons, > and partly to simply the trivial implementation that is based on > a volatile pointer dereference. Neither is needed here, and I don't > think we have any drivers that pass volatile pointers into ioread()/iowrite() > (which would cause a warning), so we can probably leave it at that, > but it's still a bit inconsistent. Right, I intentionally omitted the volatile qualifier here. Adding it to the definitions from asm-generic/iomap.h would require to also update lib/iomap.c and arch/ implementations. This should, imo, only be done if really necessary. >From what I've seen, almost all ioread() implementations simply redirect to readl style accessors. I could not find one implementation that does something with the addr argument which requires the volatile qualifier. Also, I'm not even sure to understand why we need the volatile qualifier in the ioread() definitions from asm-generic/io.h... Drivers wishing to pass volatile pointers to ioread() should probably use a more low level api. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
[PATCH 1/4] iomap: add missing function args identifier names
Add missing function arguments identifier names to asm-generic/iomap.h definitions. This addresses multiple checkpatch.pl code style warnings. Signed-off-by: Hugo Lefeuvre --- include/asm-generic/iomap.h | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index 5b63b94ef6b5..7e26f21b466f 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -26,24 +26,24 @@ * in the low address range. Architectures for which this is not * true can't use this generic implementation. */ -extern unsigned int ioread8(void __iomem *); -extern unsigned int ioread16(void __iomem *); -extern unsigned int ioread16be(void __iomem *); -extern unsigned int ioread32(void __iomem *); -extern unsigned int ioread32be(void __iomem *); +extern unsigned int ioread8(void __iomem *addr); +extern unsigned int ioread16(void __iomem *addr); +extern unsigned int ioread16be(void __iomem *addr); +extern unsigned int ioread32(void __iomem *addr); +extern unsigned int ioread32be(void __iomem *addr); #ifdef CONFIG_64BIT -extern u64 ioread64(void __iomem *); -extern u64 ioread64be(void __iomem *); +extern u64 ioread64(void __iomem *addr); +extern u64 ioread64be(void __iomem *addr); #endif -extern void iowrite8(u8, void __iomem *); -extern void iowrite16(u16, void __iomem *); -extern void iowrite16be(u16, void __iomem *); -extern void iowrite32(u32, void __iomem *); -extern void iowrite32be(u32, void __iomem *); +extern void iowrite8(u8 value, void __iomem *addr); +extern void iowrite16(u16 value, void __iomem *addr); +extern void iowrite16be(u16 value, void __iomem *addr); +extern void iowrite32(u32 value, void __iomem *addr); +extern void iowrite32be(u32 value, void __iomem *addr); #ifdef CONFIG_64BIT -extern void iowrite64(u64, void __iomem *); -extern void iowrite64be(u64, void __iomem *); +extern void iowrite64(u64 value, void __iomem *addr); +extern void iowrite64be(u64 value, void __iomem *addr); #endif /* @@ -68,7 +68,7 @@ extern void iowrite32_rep(void __iomem *port, const void *buf, unsigned long cou #ifdef CONFIG_HAS_IOPORT_MAP /* Create a virtual mapping cookie for an IO port range */ extern void __iomem *ioport_map(unsigned long port, unsigned int nr); -extern void ioport_unmap(void __iomem *); +extern void ioport_unmap(void __iomem *addr); #endif #ifndef ARCH_HAS_IOREMAP_WC @@ -82,7 +82,7 @@ extern void ioport_unmap(void __iomem *); #ifdef CONFIG_PCI /* Destroy a virtual mapping cookie for a PCI BAR (memory or IO) */ struct pci_dev; -extern void pci_iounmap(struct pci_dev *dev, void __iomem *); +extern void pci_iounmap(struct pci_dev *dev, void __iomem *addr); #elif defined(CONFIG_GENERIC_IOMAP) struct pci_dev; static inline void pci_iounmap(struct pci_dev *dev, void __iomem *addr) -- 2.20.1
[PATCH 4/4] lib/iomap: add missing const to mmio_ins* addr arg
mmio_ins* definitions from lib/iomap.c are missing const qualifiers for the addr argument. This results in compilation warnings when compiling drivers. Add missing const qualifiers. Signed-off-by: Hugo Lefeuvre --- lib/iomap.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/iomap.c b/lib/iomap.c index 8cc3270697e9..2bcc5d9d30b1 100644 --- a/lib/iomap.c +++ b/lib/iomap.c @@ -143,7 +143,7 @@ EXPORT_SYMBOL(iowrite32be); * order" (we also don't have IO barriers). */ #ifndef mmio_insb -static inline void mmio_insb(void __iomem *addr, u8 *dst, int count) +static inline void mmio_insb(const void __iomem *addr, u8 *dst, int count) { while (--count >= 0) { u8 data = __raw_readb(addr); @@ -151,7 +151,7 @@ static inline void mmio_insb(void __iomem *addr, u8 *dst, int count) dst++; } } -static inline void mmio_insw(void __iomem *addr, u16 *dst, int count) +static inline void mmio_insw(const void __iomem *addr, u16 *dst, int count) { while (--count >= 0) { u16 data = __raw_readw(addr); @@ -159,7 +159,7 @@ static inline void mmio_insw(void __iomem *addr, u16 *dst, int count) dst++; } } -static inline void mmio_insl(void __iomem *addr, u32 *dst, int count) +static inline void mmio_insl(const void __iomem *addr, u32 *dst, int count) { while (--count >= 0) { u32 data = __raw_readl(addr); -- 2.20.1
[PATCH 2/4] iomap: add missing const to ioread*/iowrite addr arg
ioread* and iowrite* definitions from asm-generic/iomap.h and lib/iomap.c are missing const qualifiers. This is inconsistent with the definitions from asm-generic/io.h and results in compilation warnings when compiling drivers. Add missing const qualifiers. Signed-off-by: Hugo Lefeuvre --- include/asm-generic/iomap.h | 20 ++-- lib/iomap.c | 16 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h index 7e26f21b466f..cd92d32a4d87 100644 --- a/include/asm-generic/iomap.h +++ b/include/asm-generic/iomap.h @@ -26,14 +26,14 @@ * in the low address range. Architectures for which this is not * true can't use this generic implementation. */ -extern unsigned int ioread8(void __iomem *addr); -extern unsigned int ioread16(void __iomem *addr); -extern unsigned int ioread16be(void __iomem *addr); -extern unsigned int ioread32(void __iomem *addr); -extern unsigned int ioread32be(void __iomem *addr); +extern unsigned int ioread8(const void __iomem *addr); +extern unsigned int ioread16(const void __iomem *addr); +extern unsigned int ioread16be(const void __iomem *addr); +extern unsigned int ioread32(const void __iomem *addr); +extern unsigned int ioread32be(const void __iomem *addr); #ifdef CONFIG_64BIT -extern u64 ioread64(void __iomem *addr); -extern u64 ioread64be(void __iomem *addr); +extern u64 ioread64(const void __iomem *addr); +extern u64 ioread64be(const void __iomem *addr); #endif extern void iowrite8(u8 value, void __iomem *addr); @@ -57,9 +57,9 @@ extern void iowrite64be(u64 value, void __iomem *addr); * memory across multiple ports, use "memcpy_toio()" * and friends. */ -extern void ioread8_rep(void __iomem *port, void *buf, unsigned long count); -extern void ioread16_rep(void __iomem *port, void *buf, unsigned long count); -extern void ioread32_rep(void __iomem *port, void *buf, unsigned long count); +extern void ioread8_rep(const void __iomem *port, void *buf, unsigned long count); +extern void ioread16_rep(const void __iomem *port, void *buf, unsigned long count); +extern void ioread32_rep(const void __iomem *port, void *buf, unsigned long count); extern void iowrite8_rep(void __iomem *port, const void *buf, unsigned long count); extern void iowrite16_rep(void __iomem *port, const void *buf, unsigned long count); diff --git a/lib/iomap.c b/lib/iomap.c index 541d926da95e..8cc3270697e9 100644 --- a/lib/iomap.c +++ b/lib/iomap.c @@ -69,27 +69,27 @@ static void bad_io_access(unsigned long port, const char *access) #define mmio_read32be(addr) be32_to_cpu(__raw_readl(addr)) #endif -unsigned int ioread8(void __iomem *addr) +unsigned int ioread8(const void __iomem *addr) { IO_COND(addr, return inb(port), return readb(addr)); return 0xff; } -unsigned int ioread16(void __iomem *addr) +unsigned int ioread16(const void __iomem *addr) { IO_COND(addr, return inw(port), return readw(addr)); return 0x; } -unsigned int ioread16be(void __iomem *addr) +unsigned int ioread16be(const void __iomem *addr) { IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr)); return 0x; } -unsigned int ioread32(void __iomem *addr) +unsigned int ioread32(const void __iomem *addr) { IO_COND(addr, return inl(port), return readl(addr)); return 0x; } -unsigned int ioread32be(void __iomem *addr) +unsigned int ioread32be(const void __iomem *addr) { IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr)); return 0x; @@ -193,15 +193,15 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count) } #endif -void ioread8_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count) { IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count)); } -void ioread16_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count) { IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count)); } -void ioread32_rep(void __iomem *addr, void *dst, unsigned long count) +void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count) { IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count)); } -- 2.20.1
[PATCH 3/4] io: change io*_rep definitions to take ulong count
ioread*_rep and iowrite*_rep from asm-generic/io.h expect unsigned int count parameter. This is inconsistent with all other definitions in the kernel which take unsigned long count. Change io*_rep definitions to take unsigned long count instead of unsigned int. Signed-off-by: Hugo Lefeuvre --- include/asm-generic/io.h | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h index d356f802945a..345ae6a09bca 100644 --- a/include/asm-generic/io.h +++ b/include/asm-generic/io.h @@ -811,7 +811,7 @@ static inline void iowrite64be(u64 value, volatile void __iomem *addr) #ifndef ioread8_rep #define ioread8_rep ioread8_rep static inline void ioread8_rep(const volatile void __iomem *addr, void *buffer, - unsigned int count) + unsigned long count) { readsb(addr, buffer, count); } @@ -820,7 +820,7 @@ static inline void ioread8_rep(const volatile void __iomem *addr, void *buffer, #ifndef ioread16_rep #define ioread16_rep ioread16_rep static inline void ioread16_rep(const volatile void __iomem *addr, - void *buffer, unsigned int count) + void *buffer, unsigned long count) { readsw(addr, buffer, count); } @@ -829,7 +829,7 @@ static inline void ioread16_rep(const volatile void __iomem *addr, #ifndef ioread32_rep #define ioread32_rep ioread32_rep static inline void ioread32_rep(const volatile void __iomem *addr, - void *buffer, unsigned int count) + void *buffer, unsigned long count) { readsl(addr, buffer, count); } @@ -839,7 +839,7 @@ static inline void ioread32_rep(const volatile void __iomem *addr, #ifndef ioread64_rep #define ioread64_rep ioread64_rep static inline void ioread64_rep(const volatile void __iomem *addr, - void *buffer, unsigned int count) + void *buffer, unsigned long count) { readsq(addr, buffer, count); } @@ -850,7 +850,7 @@ static inline void ioread64_rep(const volatile void __iomem *addr, #define iowrite8_rep iowrite8_rep static inline void iowrite8_rep(volatile void __iomem *addr, const void *buffer, - unsigned int count) + unsigned long count) { writesb(addr, buffer, count); } @@ -860,7 +860,7 @@ static inline void iowrite8_rep(volatile void __iomem *addr, #define iowrite16_rep iowrite16_rep static inline void iowrite16_rep(volatile void __iomem *addr, const void *buffer, -unsigned int count) +unsigned long count) { writesw(addr, buffer, count); } @@ -870,7 +870,7 @@ static inline void iowrite16_rep(volatile void __iomem *addr, #define iowrite32_rep iowrite32_rep static inline void iowrite32_rep(volatile void __iomem *addr, const void *buffer, -unsigned int count) +unsigned long count) { writesl(addr, buffer, count); } @@ -881,7 +881,7 @@ static inline void iowrite32_rep(volatile void __iomem *addr, #define iowrite64_rep iowrite64_rep static inline void iowrite64_rep(volatile void __iomem *addr, const void *buffer, -unsigned int count) +unsigned long count) { writesq(addr, buffer, count); } -- 2.20.1
[PATCH 0/4] iomap: fix multiple consistency issues, interface cleanup
Hi, This patch cleans up the iomap interface. The first patch makes the include/asm-generic/iomap.h header compliant with the kernel style guidelines by adding missing function args identifier names. The second and fourth patches address multiple compilation warnings due to missing const qualifiers in mmio_ins* and ioread*/iowrite definitions in include/asm-generic/iomap.h and lib/iomap.c. The third patch modifies io*_rep definitions from asm-generic/io.h to take unsigned long count parameter instead of unsigned int which is inconsistent with other definitions in the kernel. Hugo Lefeuvre (4): iomap: add missing function args identifier names iomap: add missing const to ioread*/iowrite addr arg io: change io*_rep definitions to take ulong count lib/iomap: add missing const to mmio_ins* addr arg include/asm-generic/io.h| 16 include/asm-generic/iomap.h | 38 ++--- lib/iomap.c | 22 ++--- 3 files changed, 38 insertions(+), 38 deletions(-) -- 2.20.1
Re: void __iomem *addr should be const
Hi, > The const makes perfectly sense and we should have consistent state all > over the place. Thanks, I will submit a patch addressing this issue soon. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
Re: [PATCH] staging/android: use multiple futex wait queues
Hi, > Have you tested this? I have finally set up a cuttlefish test env and tested both my first patch set[0] and this patch (v2). My first patch set works fine. I have nothing to say about it. > Noticed any performance speedups or slow downs? This patch doesn't work. The boot process goes well. Overall, calls to vsoc_cond_wake are executed slightly faster. However the system freezes at some point. The graphical interface generates many calls to vsoc_cond_wake and wait, so I suspect an issue with the locks. I will investigate this and come back with an updated patch. regards, Hugo [0] https://lore.kernel.org/patchwork/patch/1039712/ -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
Re: [PATCH] staging/android: use multiple futex wait queues
> > + list_for_each_entry(it, >futex_wait_queue_list, list) { > > + if (wait_queue->offset == arg->offset) { > ^^ > You meant "it->offset". Right, this is not good. Fixed in v2. Thanks for the feedback! regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
[PATCH v2] staging/android: use multiple futex wait queues
Use multiple per-offset wait queues instead of one big wait queue per region. Signed-off-by: Hugo Lefeuvre --- Changes in v2: - dereference the it pointer instead of wait_queue (which is not set yet) in handle_vsoc_cond_wait() --- drivers/staging/android/TODO | 4 --- drivers/staging/android/vsoc.c | 56 ++ 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO index fbf015cc6d62..cd2af06341dc 100644 --- a/drivers/staging/android/TODO +++ b/drivers/staging/android/TODO @@ -12,10 +12,6 @@ ion/ - Better test framework (integration with VGEM was suggested) vsoc.c, uapi/vsoc_shm.h - - The current driver uses the same wait queue for all of the futexes in a - region. This will cause false wakeups in regions with a large number of - waiting threads. We should eventually use multiple queues and select the - queue based on the region. - Add debugfs support for examining the permissions of regions. - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been superseded by the futex and is there for legacy reasons. diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index f2bb18158e5b..97303d9173aa 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "uapi/vsoc_shm.h" #define VSOC_DEV_NAME "vsoc" @@ -62,11 +63,18 @@ static const int MAX_REGISTER_BAR_LEN = 0x100; */ static const int SHARED_MEMORY_BAR = 2; +struct vsoc_futex_wait_queue_t { + struct list_head list; + u32 offset; + wait_queue_head_t queue; +}; + struct vsoc_region_data { char name[VSOC_DEVICE_NAME_SZ + 1]; wait_queue_head_t interrupt_wait_queue; - /* TODO(b/73664181): Use multiple futex wait queues */ - wait_queue_head_t futex_wait_queue; + /* Per-offset futex wait queue. */ + struct list_head futex_wait_queue_list; + spinlock_t futex_wait_queue_lock; /* Flag indicating that an interrupt has been signalled by the host. */ atomic_t *incoming_signalled; /* Flag indicating the guest has signalled the host. */ @@ -402,6 +410,7 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; int ret = 0; struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); + struct vsoc_futex_wait_queue_t *it, *wait_queue = NULL; atomic_t *address = NULL; ktime_t wake_time; @@ -415,10 +424,27 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) address = shm_off_to_virtual_addr(region_p->region_begin_offset + arg->offset); + /* Find wait queue corresponding to offset or create it */ + spin_lock(>futex_wait_queue_lock); + list_for_each_entry(it, >futex_wait_queue_list, list) { + if (it->offset == arg->offset) { + wait_queue = it; + break; + } + } + + if (!wait_queue) { + wait_queue = kmalloc(sizeof(*wait_queue), GFP_KERNEL); + wait_queue->offset = arg->offset; + init_waitqueue_head(_queue->queue); + list_add(_queue->list, >futex_wait_queue_list); + } + spin_unlock(>futex_wait_queue_lock); + /* Ensure that the type of wait is valid */ switch (arg->wait_type) { case VSOC_WAIT_IF_EQUAL: - ret = wait_event_freezable(data->futex_wait_queue, + ret = wait_event_freezable(wait_queue->queue, arg->wakes++ && atomic_read(address) != arg->value); break; @@ -426,7 +452,7 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) if (arg->wake_time_nsec >= NSEC_PER_SEC) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - ret = wait_event_freezable_hrtimeout(data->futex_wait_queue, + ret = wait_event_freezable_hrtimeout(wait_queue->queue, arg->wakes++ && atomic_read(address) != arg->value, wake_time); @@ -463,6 +489,7 @@ static int do_vsoc_cond_wake(struct file *filp, uint32_t offset) struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); u32 region_number = iminor(file_inode(filp)); struct vsoc_region_data *data = vsoc_dev.regi
Re: [PATCH] tty/nozomi: use pci_iomap instead of ioremap_nocache
> I will provide a patch if somebody is available to test it. FTR, I have found test devices and should be able to run some tests by the end of next week. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
Re: [PATCH] staging/android: use multiple futex wait queues
> > Use multiple per-offset wait queues instead of one big wait queue per > > region. > > > > Signed-off-by: Hugo Lefeuvre > > Have you tested this? > > Noticed any performance speedups or slow downs? Not yet. I have started to set up a cuttlefish test environment but it might take some time until I am able to fully test these changes myself (having some trouble to find documentation)... I will post the results here when ready. thanks! regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
[PATCH] staging/android: use multiple futex wait queues
Use multiple per-offset wait queues instead of one big wait queue per region. Signed-off-by: Hugo Lefeuvre --- This patch is based on the simplify handle_vsoc_cond_wait patchset, currently under review: https://lkml.org/lkml/2019/2/7/870 --- drivers/staging/android/TODO | 4 --- drivers/staging/android/vsoc.c | 56 ++ 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO index fbf015cc6d62..cd2af06341dc 100644 --- a/drivers/staging/android/TODO +++ b/drivers/staging/android/TODO @@ -12,10 +12,6 @@ ion/ - Better test framework (integration with VGEM was suggested) vsoc.c, uapi/vsoc_shm.h - - The current driver uses the same wait queue for all of the futexes in a - region. This will cause false wakeups in regions with a large number of - waiting threads. We should eventually use multiple queues and select the - queue based on the region. - Add debugfs support for examining the permissions of regions. - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been superseded by the futex and is there for legacy reasons. diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index f2bb18158e5b..55b0ee03e7b8 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -31,6 +31,7 @@ #include #include #include +#include #include "uapi/vsoc_shm.h" #define VSOC_DEV_NAME "vsoc" @@ -62,11 +63,18 @@ static const int MAX_REGISTER_BAR_LEN = 0x100; */ static const int SHARED_MEMORY_BAR = 2; +struct vsoc_futex_wait_queue_t { + struct list_head list; + u32 offset; + wait_queue_head_t queue; +}; + struct vsoc_region_data { char name[VSOC_DEVICE_NAME_SZ + 1]; wait_queue_head_t interrupt_wait_queue; - /* TODO(b/73664181): Use multiple futex wait queues */ - wait_queue_head_t futex_wait_queue; + /* Per-offset futex wait queue. */ + struct list_head futex_wait_queue_list; + spinlock_t futex_wait_queue_lock; /* Flag indicating that an interrupt has been signalled by the host. */ atomic_t *incoming_signalled; /* Flag indicating the guest has signalled the host. */ @@ -402,6 +410,7 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; int ret = 0; struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); + struct vsoc_futex_wait_queue_t *it, *wait_queue = NULL; atomic_t *address = NULL; ktime_t wake_time; @@ -415,10 +424,27 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) address = shm_off_to_virtual_addr(region_p->region_begin_offset + arg->offset); + /* Find wait queue corresponding to offset or create it */ + spin_lock(>futex_wait_queue_lock); + list_for_each_entry(it, >futex_wait_queue_list, list) { + if (wait_queue->offset == arg->offset) { + wait_queue = it; + break; + } + } + + if (!wait_queue) { + wait_queue = kmalloc(sizeof(*wait_queue), GFP_KERNEL); + wait_queue->offset = arg->offset; + init_waitqueue_head(_queue->queue); + list_add(_queue->list, >futex_wait_queue_list); + } + spin_unlock(>futex_wait_queue_lock); + /* Ensure that the type of wait is valid */ switch (arg->wait_type) { case VSOC_WAIT_IF_EQUAL: - ret = wait_event_freezable(data->futex_wait_queue, + ret = wait_event_freezable(wait_queue->queue, arg->wakes++ && atomic_read(address) != arg->value); break; @@ -426,7 +452,7 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) if (arg->wake_time_nsec >= NSEC_PER_SEC) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - ret = wait_event_freezable_hrtimeout(data->futex_wait_queue, + ret = wait_event_freezable_hrtimeout(wait_queue->queue, arg->wakes++ && atomic_read(address) != arg->value, wake_time); @@ -463,6 +489,7 @@ static int do_vsoc_cond_wake(struct file *filp, uint32_t offset) struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); u32 region_number = iminor(file_inode(filp)); struct vsoc_region_data *data
Re: [PATCH] tty/nozomi: use pci_iomap instead of ioremap_nocache
(cc-ing Paul Hardwick, mentioned as maintainer of the driver) Hi, > > there's still something unclear to me about dc->card_type being used as > > size argument to ioremap_nocache(). > > > > dc->base_addr is the sum of all six io region lengths, not the size of > > region 0 which we are trying to map here. Why not using the size of region > > 0 instead ? > > No idea, that might just be how the card is layed out. > > > If the goal is to map all six regions "at once", I'm not sure how this is > > supposed to work. Is there any kind of guarantee that all six regions will > > be adjacent? > > For some reason, it must happen that way, otherwise the driver would not > work very well :) > > > If this is a bug then this patch "somehow" already adresses it since > > pci_iomap calls pci_resource_len itself. Otherwise this patch is broken. > > Let's apply it and see if anyone screams... Hum, it looks very much like the intention here was to map all bars at once. The offsets corresponding to the downlink, uplink, etc. regions are retrieved as part of the config table by nozomi_read_config_table(). Unfortunately I don't own test devices, so I will not be able to verify it myself. This is easy to test, though: if I am right, this patch breaks the driver. I guess the right way to map all bars in a single buffer would look like what the hifn_795x driver does[0]. I will provide a patch if somebody is available to test it. regards, Hugo [0] https://elixir.bootlin.com/linux/latest/source/drivers/crypto/hifn_795x.c#L2504 -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
[tip:sched/core] sched/wait: Use freezable_schedule() when possible
Commit-ID: 2b9c2a4859ad5ac7b5a28e9db28c3e618760fe8c Gitweb: https://git.kernel.org/tip/2b9c2a4859ad5ac7b5a28e9db28c3e618760fe8c Author: Hugo Lefeuvre AuthorDate: Thu, 7 Feb 2019 21:03:52 +0100 Committer: Ingo Molnar CommitDate: Mon, 11 Feb 2019 08:34:04 +0100 sched/wait: Use freezable_schedule() when possible Replace 'schedule(); try_to_freeze();' with a call to freezable_schedule(). Tasks calling freezable_schedule() set the PF_FREEZER_SKIP flag before calling schedule(). Unlike tasks calling schedule(); try_to_freeze() tasks calling freezable_schedule() are not awaken by try_to_freeze_tasks(). Instead they call try_to_freeze() when they wake up if the freeze is still underway. It is not a problem since sleeping tasks can't do anything which isn't allowed for a frozen task while sleeping. The result is a potential performance gain during freeze, since less tasks have to be awaken. For instance on a bare Debian vm running a 4.19 stable kernel, the number of tasks skipped in freeze_task() went up from 12 without the patch to 32 with the patch (out of 448), an increase of > x2.5. Signed-off-by: Hugo Lefeuvre Reviewed-by: Joel Fernandes (Google) Cc: Joel Fernandes Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20190207200352.ga27...@behemoth.owl.eu.com.local Signed-off-by: Ingo Molnar --- include/linux/wait.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index ed7c122cb31f..5f3efabc36f4 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -308,7 +308,7 @@ do { \ #define __wait_event_freezable(wq_head, condition) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) /** * wait_event_freezable - sleep (or freeze) until a condition gets true @@ -367,7 +367,7 @@ do { \ #define __wait_event_freezable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_INTERRUPTIBLE, 0, timeout, \ - __ret = schedule_timeout(__ret); try_to_freeze()) + __ret = freezable_schedule_timeout(__ret)) /* * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid @@ -588,7 +588,7 @@ do { \ #define __wait_event_freezable_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) #define wait_event_freezable_exclusive(wq, condition) \ ({ \
Re: [PATCH] tty/nozomi: use pci_iomap instead of ioremap_nocache
> Use pci_iomap instead of ioremap_nocache in nozomi_card_init(). This > is a cleaner way to do PCI MMIO (performs additional checks) and > allows to drop the manual call to pci_resource_start. > > pci_iomap relies on ioremap for MMIO and thus has uncached behavior. there's still something unclear to me about dc->card_type being used as size argument to ioremap_nocache(). dc->base_addr is the sum of all six io region lengths, not the size of region 0 which we are trying to map here. Why not using the size of region 0 instead ? If the goal is to map all six regions "at once", I'm not sure how this is supposed to work. Is there any kind of guarantee that all six regions will be adjacent? If this is a bug then this patch "somehow" already adresses it since pci_iomap calls pci_resource_len itself. Otherwise this patch is broken. -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
[PATCH] tty/nozomi: use pci_iomap instead of ioremap_nocache
Use pci_iomap instead of ioremap_nocache in nozomi_card_init(). This is a cleaner way to do PCI MMIO (performs additional checks) and allows to drop the manual call to pci_resource_start. pci_iomap relies on ioremap for MMIO and thus has uncached behavior. Signed-off-by: Hugo Lefeuvre --- drivers/tty/nozomi.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c index fed820e9ab9d..3214e22e79f3 100644 --- a/drivers/tty/nozomi.c +++ b/drivers/tty/nozomi.c @@ -1317,7 +1317,6 @@ static void remove_sysfs_files(struct nozomi *dc) static int nozomi_card_init(struct pci_dev *pdev, const struct pci_device_id *ent) { - resource_size_t start; int ret; struct nozomi *dc = NULL; int ndev_idx; @@ -1357,17 +1356,10 @@ static int nozomi_card_init(struct pci_dev *pdev, goto err_disable_device; } - start = pci_resource_start(dc->pdev, 0); - if (start == 0) { - dev_err(>dev, "No I/O address for card detected\n"); - ret = -ENODEV; - goto err_rel_regs; - } - /* Find out what card type it is */ nozomi_get_card_type(dc); - dc->base_addr = ioremap_nocache(start, dc->card_type); + dc->base_addr = pci_iomap(dc->pdev, 0, dc->card_type); if (!dc->base_addr) { dev_err(>dev, "Unable to map card MMIO\n"); ret = -ENODEV; -- 2.20.1
void __iomem *addr should be const
Hi, __iomem *addr seems to lack a const qualifier in the ioread* definitions from include/asm-generic/iomap.h (other definitions such as asm-generic/io.h do have the const). This issue was briefly discussed a while ago[0] but the outcome is not quite clear to me. Should I submit a patch addressing this issue or is it a definitive wontfix? This issue triggers warnings in my current work but I would like to avoid dropping the const if possible. regards, Hugo [0] https://patchwork.kernel.org/patch/5742881/ -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
Re: [PATCH 1/3] sched/wait: use freezable_schedule when possible
> Sure, add these test results to the patch as well showing reduced wakeups. > > I would say submit the freezable_schedule as a single separate patch > independent of the vsoc series since it can go in separately, and also > benefits other things than vsoc. > > Also CC Rafael (power maintainer) on it. Thanks, I have splitted the patch set[0][1] and submitted the freezable_schedule patch separately (only cc-ing people responsible for the wait api + Rafael). regards, Hugo [0] https://lkml.org/lkml/2019/2/7/802 [1] https://lkml.org/lkml/2019/2/7/870 -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
[PATCH v2 2/2] staging/android: simplify handle_vsoc_cond_wait
simplify handle_vsoc_cond_wait (drivers/staging/android/vsoc.c) using newly added wait_event_freezable_hrtimeout helper and remove duplicate include. Signed-off-by: Hugo Lefeuvre --- Changes in v2: - Fix removal of necessary linux/freezer.h include. - Make commit message more precise about the include removal. drivers/staging/android/vsoc.c | 68 +- 1 file changed, 10 insertions(+), 58 deletions(-) diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index 22571abcaa4e..f2bb18158e5b 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -29,7 +29,6 @@ #include #include #include -#include #include #include #include "uapi/vsoc_shm.h" @@ -401,7 +400,6 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) DEFINE_WAIT(wait); u32 region_number = iminor(file_inode(filp)); struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; - struct hrtimer_sleeper timeout, *to = NULL; int ret = 0; struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); atomic_t *address = NULL; @@ -420,69 +418,23 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) /* Ensure that the type of wait is valid */ switch (arg->wait_type) { case VSOC_WAIT_IF_EQUAL: + ret = wait_event_freezable(data->futex_wait_queue, + arg->wakes++ && + atomic_read(address) != arg->value); break; case VSOC_WAIT_IF_EQUAL_TIMEOUT: - to = - break; - default: - return -EINVAL; - } - - if (to) { - /* Copy the user-supplied timesec into the kernel structure. -* We do things this way to flatten differences between 32 bit -* and 64 bit timespecs. -*/ if (arg->wake_time_nsec >= NSEC_PER_SEC) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - - hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); - hrtimer_set_expires_range_ns(>timer, wake_time, -current->timer_slack_ns); - - hrtimer_init_sleeper(to, current); + ret = wait_event_freezable_hrtimeout(data->futex_wait_queue, +arg->wakes++ && +atomic_read(address) != arg->value, +wake_time); + break; + default: + return -EINVAL; } - while (1) { - prepare_to_wait(>futex_wait_queue, , - TASK_INTERRUPTIBLE); - /* -* Check the sentinel value after prepare_to_wait. If the value -* changes after this check the writer will call signal, -* changing the task state from INTERRUPTIBLE to RUNNING. That -* will ensure that schedule() will eventually schedule this -* task. -*/ - if (atomic_read(address) != arg->value) { - ret = 0; - break; - } - if (to) { - hrtimer_start_expires(>timer, HRTIMER_MODE_ABS); - if (likely(to->task)) - freezable_schedule(); - hrtimer_cancel(>timer); - if (!to->task) { - ret = -ETIMEDOUT; - break; - } - } else { - freezable_schedule(); - } - /* Count the number of times that we woke up. This is useful -* for unit testing. -*/ - ++arg->wakes; - if (signal_pending(current)) { - ret = -EINTR; - break; - } - } - finish_wait(>futex_wait_queue, ); - if (to) - destroy_hrtimer_on_stack(>timer); return ret; } -- 2.20.1
[PATCH v2 1/2] sched/wait: introduce wait_event_freezable_hrtimeout
introduce wait_event_freezable_hrtimeout, an interruptible and freezable version of wait_event_hrtimeout. This helper will allow for simplifications in staging/android/vsoc.c, among others. Signed-off-by: Hugo Lefeuvre --- Changes in v2: - No change. include/linux/wait.h | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 5f3efabc36f4..c4cf5113f58a 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -483,7 +483,7 @@ do { \ __ret; \ }) -#define __wait_event_hrtimeout(wq_head, condition, timeout, state) \ +#define __wait_event_hrtimeout(wq_head, condition, timeout, state, cmd) \ ({ \ int __ret = 0; \ struct hrtimer_sleeper __t; \ @@ -500,7 +500,7 @@ do { \ __ret = -ETIME; \ break; \ } \ - schedule()); \ + cmd); \ \ hrtimer_cancel(&__t.timer); \ destroy_hrtimer_on_stack(&__t.timer); \ @@ -529,7 +529,23 @@ do { \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq_head, condition, timeout, \ - TASK_UNINTERRUPTIBLE); \ + TASK_UNINTERRUPTIBLE, \ + schedule()); \ + __ret; \ +}) + +/* + * like wait_event_hrtimeout() -- except it uses TASK_INTERRUPTIBLE to avoid + * increasing load and is freezable. + */ +#define wait_event_freezable_hrtimeout(wq_head, condition, timeout) \ +({ \ + int __ret = 0; \ + might_sleep(); \ + if (!(condition)) \ + __ret = __wait_event_hrtimeout(wq_head, condition, timeout, \ + TASK_INTERRUPTIBLE, \ + freezable_schedule()); \ __ret; \ }) @@ -555,7 +571,8 @@ do { \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq, condition, timeout, \ - TASK_INTERRUPTIBLE); \ + TASK_INTERRUPTIBLE, \ + schedule()); \ __ret; \ }) -- 2.20.1
[PATCH v2 0/2] sched/wait, staging/android: simplification of freeze related code
This patchset introduces a new wait_event_freezable_hrtimeout method to the wait api. wait_event_freezable_hrtimeout is then used to greatly simplify handle_vsoc_cond_wait in the android vsoc driver, reducing the size of the vsoc driver. Changes since v1 [1]: - Delete "[1/3] sched/wait: use freezable_schedule when possible", it was submitted separately. - Patch 3/3 (now 2/2): Fix removal of a necessary linux/freezer.h include and improve commit message. [1] v1: https://lkml.org/lkml/2019/2/1/19 Hugo Lefeuvre (2): sched/wait: introduce wait_event_freezable_hrtimeout staging/android: simplify handle_vsoc_cond_wait drivers/staging/android/vsoc.c | 68 +- include/linux/wait.h | 25 +++-- 2 files changed, 31 insertions(+), 62 deletions(-) -- 2.20.1
[PATCH v2] sched/wait: use freezable_schedule when possible
Replace schedule(); try_to_freeze() by freezable_schedule(). Tasks calling freezable_schedule() set the PF_FREEZER_SKIP flag before calling schedule(). Unlike tasks calling schedule(); try_to_freeze() tasks calling freezable_schedule() are not awaken by try_to_freeze_tasks(). Instead they call try_to_freeze() when they wake up if the freeze is still underway. It is not a problem since sleeping tasks can't do anything which isn't allowed for a frozen task while sleeping. The result is a potential performance gain during freeze, since less tasks have to be awaken. For instance on a bare Debian vm running a 4.19 stable kernel, the number of tasks skipped in freeze_task() went up from 12 without the patch to 32 with the patch (out of 448), an increase of > x2.5. Signed-off-by: Hugo Lefeuvre --- Changes in v2: - Add test results to commit message. - Split from initial patch set[0] since this patch can go in separately [0] https://lkml.org/lkml/2019/2/1/19 include/linux/wait.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index ed7c122cb31f..5f3efabc36f4 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -308,7 +308,7 @@ do { \ #define __wait_event_freezable(wq_head, condition) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) /** * wait_event_freezable - sleep (or freeze) until a condition gets true @@ -367,7 +367,7 @@ do { \ #define __wait_event_freezable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_INTERRUPTIBLE, 0, timeout, \ - __ret = schedule_timeout(__ret); try_to_freeze()) + __ret = freezable_schedule_timeout(__ret)) /* * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid @@ -588,7 +588,7 @@ do { \ #define __wait_event_freezable_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) #define wait_event_freezable_exclusive(wq, condition) \ ({ \ -- 2.20.1
Re: [PATCH 1/3] sched/wait: use freezable_schedule when possible
Hi, > > The result is a potential performance gain during freeze, since less > > tasks have to be awaken. > > I'm curious did you try the freezing process and see if pointless wakeups are > reduced? That would be an added bonus if you did. Test env: fresh Debian QEMU vm with 4.19 stable kernel. Test process: - Added two debug logs to freeze_task: bool freeze_task(struct task_struct *p) { unsigned long flags; [snip] pr_info("freezing a task"); [snip] if (freezer_should_skip(p)) { pr_info("skeeping a task"); return false; } [snip] } - Triggered manual freeze: # echo freezer > /sys/power/pm_test # echo test_resume > /sys/power/disk # echo disk > /sys/power/state - grep -c to get the number of "freezing a task" and "skeeping a task" lines in kern.log. Results: Without my patch: 448 calls freeze_task, 12 skipped. With my patch: 448 calls, 32 skipped. 2.6x more tasks skipped. Not sure this is the best way to test this patch, though. Any advice? regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
Re: [PATCH 1/3] sched/wait: use freezable_schedule when possible
Hi Joel, > I'm curious did you try the freezing process and see if pointless wakeups are > reduced? That would be an added bonus if you did. I'm currently testing these changes. I hope to be able to come back with more concrete results soon. Also, I just noticed that the third patch removes a necessary #include . I will submit an updated version tomorrow. Thanks for the review! regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
Re: [PATCH] sched/wait: introduce wait_event_freezable_hrtimeout
Hi, > I agree, it is probably better to use freezable_schedule() for all freeze > related wait APIs, and keep it consistent. Your analysis is convincing. I have submitted a new patchset which migrates the wait api to freezable_schedule() and splits the changes from the previous patch. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
[PATCH 3/3] staging/android: simplify handle_vsoc_cond_wait
simplify handle_vsoc_cond_wait (drivers/staging/android/vsoc.c) using newly added wait_event_freezable_hrtimeout helper and remove useless includes. Signed-off-by: Hugo Lefeuvre --- drivers/staging/android/vsoc.c | 69 +- 1 file changed, 10 insertions(+), 59 deletions(-) diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index 22571abcaa4e..7e620e69f39d 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -17,7 +17,6 @@ */ #include -#include #include #include #include @@ -29,7 +28,6 @@ #include #include #include -#include #include #include #include "uapi/vsoc_shm.h" @@ -401,7 +399,6 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) DEFINE_WAIT(wait); u32 region_number = iminor(file_inode(filp)); struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; - struct hrtimer_sleeper timeout, *to = NULL; int ret = 0; struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); atomic_t *address = NULL; @@ -420,69 +417,23 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) /* Ensure that the type of wait is valid */ switch (arg->wait_type) { case VSOC_WAIT_IF_EQUAL: + ret = wait_event_freezable(data->futex_wait_queue, + arg->wakes++ && + atomic_read(address) != arg->value); break; case VSOC_WAIT_IF_EQUAL_TIMEOUT: - to = - break; - default: - return -EINVAL; - } - - if (to) { - /* Copy the user-supplied timesec into the kernel structure. -* We do things this way to flatten differences between 32 bit -* and 64 bit timespecs. -*/ if (arg->wake_time_nsec >= NSEC_PER_SEC) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - - hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); - hrtimer_set_expires_range_ns(>timer, wake_time, -current->timer_slack_ns); - - hrtimer_init_sleeper(to, current); + ret = wait_event_freezable_hrtimeout(data->futex_wait_queue, +arg->wakes++ && +atomic_read(address) != arg->value, +wake_time); + break; + default: + return -EINVAL; } - while (1) { - prepare_to_wait(>futex_wait_queue, , - TASK_INTERRUPTIBLE); - /* -* Check the sentinel value after prepare_to_wait. If the value -* changes after this check the writer will call signal, -* changing the task state from INTERRUPTIBLE to RUNNING. That -* will ensure that schedule() will eventually schedule this -* task. -*/ - if (atomic_read(address) != arg->value) { - ret = 0; - break; - } - if (to) { - hrtimer_start_expires(>timer, HRTIMER_MODE_ABS); - if (likely(to->task)) - freezable_schedule(); - hrtimer_cancel(>timer); - if (!to->task) { - ret = -ETIMEDOUT; - break; - } - } else { - freezable_schedule(); - } - /* Count the number of times that we woke up. This is useful -* for unit testing. -*/ - ++arg->wakes; - if (signal_pending(current)) { - ret = -EINTR; - break; - } - } - finish_wait(>futex_wait_queue, ); - if (to) - destroy_hrtimer_on_stack(>timer); return ret; } -- 2.20.1
[PATCH 2/3] sched/wait: introduce wait_event_freezable_hrtimeout
introduce wait_event_freezable_hrtimeout, an interruptible and freezable version of wait_event_hrtimeout. Among others this helper will allow for simplifications in staging/android/vsoc.c. Signed-off-by: Hugo Lefeuvre --- include/linux/wait.h | 25 + 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 5f3efabc36f4..c4cf5113f58a 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -483,7 +483,7 @@ do { \ __ret; \ }) -#define __wait_event_hrtimeout(wq_head, condition, timeout, state) \ +#define __wait_event_hrtimeout(wq_head, condition, timeout, state, cmd) \ ({ \ int __ret = 0; \ struct hrtimer_sleeper __t; \ @@ -500,7 +500,7 @@ do { \ __ret = -ETIME; \ break; \ } \ - schedule()); \ + cmd); \ \ hrtimer_cancel(&__t.timer); \ destroy_hrtimer_on_stack(&__t.timer); \ @@ -529,7 +529,23 @@ do { \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq_head, condition, timeout, \ - TASK_UNINTERRUPTIBLE); \ + TASK_UNINTERRUPTIBLE, \ + schedule()); \ + __ret; \ +}) + +/* + * like wait_event_hrtimeout() -- except it uses TASK_INTERRUPTIBLE to avoid + * increasing load and is freezable. + */ +#define wait_event_freezable_hrtimeout(wq_head, condition, timeout) \ +({ \ + int __ret = 0; \ + might_sleep(); \ + if (!(condition)) \ + __ret = __wait_event_hrtimeout(wq_head, condition, timeout, \ + TASK_INTERRUPTIBLE, \ + freezable_schedule()); \ __ret; \ }) @@ -555,7 +571,8 @@ do { \ might_sleep(); \ if (!(condition)) \ __ret = __wait_event_hrtimeout(wq, condition, timeout, \ - TASK_INTERRUPTIBLE); \ + TASK_INTERRUPTIBLE, \ + schedule()); \ __ret; \ }) -- 2.20.1
[PATCH 1/3] sched/wait: use freezable_schedule when possible
Replace schedule(); try_to_freeze() by freezable_schedule(). Tasks calling freezable_schedule() set the PF_FREEZER_SKIP flag before calling schedule(). Unlike tasks calling schedule(); try_to_freeze() tasks calling freezable_schedule() are not awaken by try_to_freeze_tasks(). Instead they call try_to_freeze() when they wake up if the freeze is still underway. It is not a problem since sleeping tasks can't do anything which isn't allowed for a frozen task while sleeping. The result is a potential performance gain during freeze, since less tasks have to be awaken. Signed-off-by: Hugo Lefeuvre --- include/linux/wait.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index ed7c122cb31f..5f3efabc36f4 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -308,7 +308,7 @@ do { \ #define __wait_event_freezable(wq_head, condition) \ ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) /** * wait_event_freezable - sleep (or freeze) until a condition gets true @@ -367,7 +367,7 @@ do { \ #define __wait_event_freezable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_INTERRUPTIBLE, 0, timeout, \ - __ret = schedule_timeout(__ret); try_to_freeze()) + __ret = freezable_schedule_timeout(__ret)) /* * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid @@ -588,7 +588,7 @@ do { \ #define __wait_event_freezable_exclusive(wq, condition) \ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ - schedule(); try_to_freeze()) + freezable_schedule()) #define wait_event_freezable_exclusive(wq, condition) \ ({ \ -- 2.20.1
[PATCH 0/3] sched/wait, staging/android: simplification and optimization of freeze related code
This patchset changes the wait api to use freezable_schedule when possible and adds a new wait_event_freezable_hrtimeout method. wait_event_freezable_hrtimeout is then used to greatly simplify handle_vsoc_cond_wait in the android vsoc driver. This reduces the size of the vsoc driver and allows for potential performance gain during freeze in the wait api. This is a follow up of my previous patch "sched/wait: introduce wait_event_freezable_hrtimeout"[0]. More information related to the performance gain by using freezable_schedule can be found in the previous discussion[1]. [0] https://lkml.org/lkml/2019/1/17/877 [1] https://lkml.org/lkml/2019/1/19/58 Hugo Lefeuvre (3): sched/wait: use freezable_schedule when possible sched/wait: introduce wait_event_freezable_hrtimeout staging/android: simplify handle_vsoc_cond_wait drivers/staging/android/vsoc.c | 69 +- include/linux/wait.h | 31 +++ 2 files changed, 34 insertions(+), 66 deletions(-) -- 2.20.1
Re: [PATCH] sched/wait: introduce wait_event_freezable_hrtimeout
> > as far as I understand this code, freezable_schedule() avoids blocking the > > freezer during the schedule() call, but in the end try_to_freeze() is still > > called so the result is the same, right? > > I wonder why wait_event_freezable is not calling freezable_schedule(). > > It could be something subtle in my view. freezable_schedule() actually makes > the freezer code not send a wake up to the sleeping task if a freeze happens, > because the PF_FREEZER_SKIP flag is set, as you pointed. > > Whereas wait_event_freezable() which uses try_to_freeze() does not seem to > have > this behavior and the task will enter the freezer. So I'm a bit skeptical if > your API will behave as expected (or at least consistently with other wait > APIs). oh right, now it is clear to me: - schedule(); try_to_freeze() schedule() is called and the task enters sleep. Since PF_FREEZER_SKIP is not set, the task wakes up as soon as try_to_freeze_tasks() is called. Right after waking up the task calls try_to_freeze() which freezes it. - freezable_schedule() schedule() is called and the task enters sleep. Since PF_FREEZER_SKIP is set, the task does not wake up when try_to_freeze_tasks() is called. This is not a problem, the task can't "do anything which isn't allowed for a frozen task" while sleeping[0]. When the task wakes up (timeout, or whatever other reason) it calls try_to_freeze() which freezes it if the freeze is still underway. So if a freeze is triggered while the task is sleeping, a task executing freezable_schedule() might or might not notice the freeze depending on how long it sleeps. A task executing schedule(); try_to_freeze() will always notice it. I might be wrong on that, but freezable_schedule() just seems like a performance improvement to me. Now I fully agree with you that there should be a uniform definition of "freezable" between wait_event_freezable and wait_event_freezable_hrtimeout. This leaves me to the question: should I modify my definition of wait_event_freezable_hrtimeout, or prepare a patch for wait_event_freezable ? If I am right with the performance thing, the latter might be worth considering? Either way, this will be fixed in the v2. > > That being said, I am not sure that the try_to_freeze() call does anything > > in the vsoc case because there is no call to set_freezable() so the thread > > still has PF_NOFREEZE... > > I traced this, and PF_NOFREEZE is not set by default for tasks. Well, I did not check this in practice and might be confused somewhere but the documentation[1] says "kernel threads are not freezable by default. However, a kernel thread may clear PF_NOFREEZE for itself by calling set_freezable()". Looking at the kthreadd() definition it seems like new tasks have PF_NOFREEZE set by default[2]. I'll take some time to check this in practice in the next days. Anyways, thanks for your time ! regards, Hugo [0] https://elixir.bootlin.com/linux/latest/source/include/linux/freezer.h#L103 [1] https://elixir.bootlin.com/linux/latest/source/Documentation/power/freezing-of-tasks.txt#L90 [2] https://elixir.bootlin.com/linux/latest/source/kernel/kthread.c#L569 -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
Re: [PATCH] sched/wait: introduce wait_event_freezable_hrtimeout
Hi Joel, Thanks for your review. > I believe these should be 2 patches. In the first patch you introduce the > new API, in the second one you would simplify the VSOC driver. > > In fact, in one part of the patch you are using wait_event_freezable which > already exists so that's unrelated to the new API you are adding. Agree, I will split the patch for the v2. > > +/* > > + * like wait_event_hrtimeout() -- except it uses TASK_INTERRUPTIBLE to > > avoid > > + * increasing load and is freezable. > > + */ > > +#define wait_event_freezable_hrtimeout(wq_head, condition, timeout) > > \ > > You should document the variable names in the header comments. Agree. This comment was copy/pasted from wait_event_freezable_timeout, should I fix it there as well? > Also, this new API appears to conflict with definition of 'freezable' in > wait_event_freezable I think, > > wait_event_freezable - sleep or freeze until condition is true. > > wait_event_freezable_hrtimeout - sleep but make sure freezer is not blocked. > (your API) > > It seems to me these are 2 different definitions of 'freezing' (or I could be > completely confused). But in the first case it calls try_to_freeze after > schedule(). In the second case (your new API), it calls freezable_schedule(). > > So I am wondering why is there this difference in the 'meaning of freezable'. > In the very least, any such subtle differences should be documented in the > header comments IMO. It appears that freezable_schedule() and schedule(); try_to_freeze() are almost identical: static inline void freezable_schedule(void) { freezer_do_not_count(); schedule(); freezer_count(); } and static inline void freezer_do_not_count(void) { current->flags |= PF_FREEZER_SKIP; } static inline void freezer_count(void) { current->flags &= ~PF_FREEZER_SKIP; /* * If freezing is in progress, the following paired with smp_mb() * in freezer_should_skip() ensures that either we see %true * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP. */ smp_mb(); try_to_freeze(); } as far as I understand this code, freezable_schedule() avoids blocking the freezer during the schedule() call, but in the end try_to_freeze() is still called so the result is the same, right? I wonder why wait_event_freezable is not calling freezable_schedule(). That being said, I am not sure that the try_to_freeze() call does anything in the vsoc case because there is no call to set_freezable() so the thread still has PF_NOFREEZE... regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
Re: [PATCH] sched/wait: introduce wait_event_freezable_hrtimeout
Hi Greg, > > introduce wait_event_freezable_hrtimeout, an interruptible and freezable > > version of wait_event_hrtimeout. > > > > simplify handle_vsoc_cond_wait (drivers/staging/android/vsoc.c) using this > > newly added helper and remove useless includes. > > > > Signed-off-by: Hugo Lefeuvre > > --- > > drivers/staging/android/vsoc.c | 69 +- > > include/linux/wait.h | 25 ++-- > > code in drivers/staging/ should be self-contained, and not, if at all > possible, ever force additional changes on "core" kernel code. > > Are you sure that the vsoc code can't use one of the current wait > macros? As far as I know there is no macro implementing freezable wait with high resolution timeout. > Why is it so special and unique that no one else in the kernel > has ever needed this before it came along? many wait_event_X() (_exclusive, _interruptible, _timeout) functions have a freezable counterpart. wait_event_hrtimeout() doesn't, probably because it is relatively new (and seemingly quite unused). If there is a wait_event_hrtimeout() function, it makes sense to me to have wait_event_freezable_hrtimeout(). -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
[PATCH] sched/wait: introduce wait_event_freezable_hrtimeout
introduce wait_event_freezable_hrtimeout, an interruptible and freezable version of wait_event_hrtimeout. simplify handle_vsoc_cond_wait (drivers/staging/android/vsoc.c) using this newly added helper and remove useless includes. Signed-off-by: Hugo Lefeuvre --- drivers/staging/android/vsoc.c | 69 +- include/linux/wait.h | 25 ++-- 2 files changed, 31 insertions(+), 63 deletions(-) diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c index 22571abcaa4e..7e620e69f39d 100644 --- a/drivers/staging/android/vsoc.c +++ b/drivers/staging/android/vsoc.c @@ -17,7 +17,6 @@ */ #include -#include #include #include #include @@ -29,7 +28,6 @@ #include #include #include -#include #include #include #include "uapi/vsoc_shm.h" @@ -401,7 +399,6 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) DEFINE_WAIT(wait); u32 region_number = iminor(file_inode(filp)); struct vsoc_region_data *data = vsoc_dev.regions_data + region_number; - struct hrtimer_sleeper timeout, *to = NULL; int ret = 0; struct vsoc_device_region *region_p = vsoc_region_from_filep(filp); atomic_t *address = NULL; @@ -420,69 +417,23 @@ static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg) /* Ensure that the type of wait is valid */ switch (arg->wait_type) { case VSOC_WAIT_IF_EQUAL: + ret = wait_event_freezable(data->futex_wait_queue, + arg->wakes++ && + atomic_read(address) != arg->value); break; case VSOC_WAIT_IF_EQUAL_TIMEOUT: - to = - break; - default: - return -EINVAL; - } - - if (to) { - /* Copy the user-supplied timesec into the kernel structure. -* We do things this way to flatten differences between 32 bit -* and 64 bit timespecs. -*/ if (arg->wake_time_nsec >= NSEC_PER_SEC) return -EINVAL; wake_time = ktime_set(arg->wake_time_sec, arg->wake_time_nsec); - - hrtimer_init_on_stack(>timer, CLOCK_MONOTONIC, - HRTIMER_MODE_ABS); - hrtimer_set_expires_range_ns(>timer, wake_time, -current->timer_slack_ns); - - hrtimer_init_sleeper(to, current); + ret = wait_event_freezable_hrtimeout(data->futex_wait_queue, +arg->wakes++ && +atomic_read(address) != arg->value, +wake_time); + break; + default: + return -EINVAL; } - while (1) { - prepare_to_wait(>futex_wait_queue, , - TASK_INTERRUPTIBLE); - /* -* Check the sentinel value after prepare_to_wait. If the value -* changes after this check the writer will call signal, -* changing the task state from INTERRUPTIBLE to RUNNING. That -* will ensure that schedule() will eventually schedule this -* task. -*/ - if (atomic_read(address) != arg->value) { - ret = 0; - break; - } - if (to) { - hrtimer_start_expires(>timer, HRTIMER_MODE_ABS); - if (likely(to->task)) - freezable_schedule(); - hrtimer_cancel(>timer); - if (!to->task) { - ret = -ETIMEDOUT; - break; - } - } else { - freezable_schedule(); - } - /* Count the number of times that we woke up. This is useful -* for unit testing. -*/ - ++arg->wakes; - if (signal_pending(current)) { - ret = -EINTR; - break; - } - } - finish_wait(>futex_wait_queue, ); - if (to) - destroy_hrtimer_on_stack(>timer); return ret; } diff --git a/include/linux/wait.h b/include/linux/wait.h index ed7c122cb31f..13a454884f8b 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -483,7 +483,7 @@ do { \ __ret; \ }
Re: staging/android: questions regarding TODO entries
> It should probably say "address." Thanks. I'm working on a few patches for staging/android, this issue will be addressed as well. regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C
staging/android: questions regarding TODO entries
Hi, This todo entry from staging/android/TODO intriguates me: vsoc.c, uapi/vsoc_shm.h - The current driver uses the same wait queue for all of the futexes in a region. This will cause false wakeups in regions with a large number of waiting threads. We should eventually use multiple queues and select the queue based on the region. I am not sure to understand it very well. What does "select the queue based on the region" mean here ? We already have one queue per region, right ? What I understand: there is one wait queue per region, meaning that if threads T1 to Tn are waiting at offsets O1 to On (same region), then a wakeup at offset Om will wake them all. In this case there is a perf issue because only Tm (waiting for changes at offset Om) really wants to be waken up here, the rest is a bunch of spurious wakeups. Does the todo suggest to have one queue per offset ? Also, this comment (drivers/staging/android/vsoc.c) mentions a worst case of ten threads: /* * TODO(b/73664181): Use multiple futex wait queues. * We need to wake every sleeper when the condition changes. Typically * only a single thread will be waiting on the condition, but there * are exceptions. The worst case is about 10 threads. */ It is not clear to me how this value has been obtained, nor under which conditions it might be true. There is no maximum to the number of threads fitting in the wait queue, so how can we be sure that at most ten threads will wait at the same offset ? second, unrelated question: In the VSOC_SELF_INTERRUPT ioctl (which might be removed in the future if VSOC_WAIT_FOR_INCOMING_INTERRUPT disappears, right ?), incoming_signalled is set to 1 but at no other place in the driver we reset it to zero. So, once VSOC_SELF_INTERRUPT has been executed once, VSOC_WAIT_FOR_INCOMING_INTERRUPT doesn't work anymore ? Thanks for your work ! cheers, Hugo PS: cc-ing the result of get_maintainer.pl + contacts from todo. Please tell me if this is not the right way to go. -- Hugo Lefeuvre (hle)|www.owl.eu.com RSA4096_ 360B 03B3 BF27 4F4D 7A3F D5E8 14AA 1EB8 A247 3DFD ed25519_ 37B2 6D38 0B25 B8A2 6B9F 3A65 A36F 5357 5F2D DC4C signature.asc Description: PGP signature
pi433: initialization of tx config in pi433_open()
Hi Marcus, I'm currently working on the following TODO: 966 /* setup instance data*/ 967 instance->device = device; 968 instance->tx_cfg.bit_rate = 4711; 969 // TODO: fill instance->tx_cfg; If a user calls write() right after open()-ing an instance, the driver might try to setup the device with uninitialized garbage. In fact nothing really bad will happen because the rf69 interface abstraction will filter out wrong values, but this might be a confusing behavior for the user. What do you think about initializing instance->tx_cfg with the default values of the rf69 datasheet[0] ? Also, is there a specific reason why you chose 4711 as a default value for the bit rate ? I couldn't find it anywhere in the datasheet nor on the internet. Thanks ! Regards, Hugo [0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
pi433: initialization of tx config in pi433_open()
Hi Marcus, I'm currently working on the following TODO: 966 /* setup instance data*/ 967 instance->device = device; 968 instance->tx_cfg.bit_rate = 4711; 969 // TODO: fill instance->tx_cfg; If a user calls write() right after open()-ing an instance, the driver might try to setup the device with uninitialized garbage. In fact nothing really bad will happen because the rf69 interface abstraction will filter out wrong values, but this might be a confusing behavior for the user. What do you think about initializing instance->tx_cfg with the default values of the rf69 datasheet[0] ? Also, is there a specific reason why you chose 4711 as a default value for the bit rate ? I couldn't find it anywhere in the datasheet nor on the internet. Thanks ! Regards, Hugo [0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: rf69_set_deviation in rf69.c (pi433 driver)
Hi Marcus, > > According to the datasheet[0], the deviation should always be smaller > > than 300kHz, and the following equation should be respected: > > > > (1) FDA + BRF/2 =< 500 kHz > > > > Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like > > a bug to me. > > My focus was always on OOK and ASK. PSK was only used for a few > measurements in the laboratory, I engaged to check CE compliance. > Most probably 500kHz was a value, that's common for PSK and I didn't pay > any attention to the datasheet. So I think, you are right: This is a bug > and could be revised. > Never the less: While using it in the lab, the transmission was fine and > the signals over air have been clean and fitted to the recommondations > of the CE. > > > > Concerning the TODO, I can see two solutions currently: > > > > 1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB > >and REG_BITRATE_LSB and returns reconstructed BRF. > >We could use this function in rf69_set_deviation to retrieve the BRF. > > > > + clean > > + intuitive > > - heavy / slow > > Why not: I like clean and intuitive implementations. Since it's used > during configuration, we shouldn't be that squeezed in time, that we > need to hurry. > > > > 2. Store BRF somewhere in rf69.c, initialize it with the default value > >(4.8 kb/s) and update it when rf69_set_bit_rate is called. > > > > + easy > > - dirty, doesn't fit well with the design of rf69.c (do not store > > anything, simply expose API) > > Up to my experience, storing reg values is always a bit problematic, > since the value may be outdated. And if you update the value every time > you want to use it to be sure, it's correct, there is no win in having > the copy of the reg value. > So this wouldn't be my favourite. > > > > I'd really prefer going for the first one, but I wanted to have your opinion > > on this. > > Agree. I'll prepare a patch addressing both issues. However I don't own test devices so it would be really great if you could test it ! I'm currently thinking of adapting this driver for other HopeRf modules like RFM69HCW or RFM12 so I will probably try to find some test equipement in the future. Thanks for your answer ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: rf69_set_deviation in rf69.c (pi433 driver)
Hi Marcus, > > According to the datasheet[0], the deviation should always be smaller > > than 300kHz, and the following equation should be respected: > > > > (1) FDA + BRF/2 =< 500 kHz > > > > Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like > > a bug to me. > > My focus was always on OOK and ASK. PSK was only used for a few > measurements in the laboratory, I engaged to check CE compliance. > Most probably 500kHz was a value, that's common for PSK and I didn't pay > any attention to the datasheet. So I think, you are right: This is a bug > and could be revised. > Never the less: While using it in the lab, the transmission was fine and > the signals over air have been clean and fitted to the recommondations > of the CE. > > > > Concerning the TODO, I can see two solutions currently: > > > > 1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB > >and REG_BITRATE_LSB and returns reconstructed BRF. > >We could use this function in rf69_set_deviation to retrieve the BRF. > > > > + clean > > + intuitive > > - heavy / slow > > Why not: I like clean and intuitive implementations. Since it's used > during configuration, we shouldn't be that squeezed in time, that we > need to hurry. > > > > 2. Store BRF somewhere in rf69.c, initialize it with the default value > >(4.8 kb/s) and update it when rf69_set_bit_rate is called. > > > > + easy > > - dirty, doesn't fit well with the design of rf69.c (do not store > > anything, simply expose API) > > Up to my experience, storing reg values is always a bit problematic, > since the value may be outdated. And if you update the value every time > you want to use it to be sure, it's correct, there is no win in having > the copy of the reg value. > So this wouldn't be my favourite. > > > > I'd really prefer going for the first one, but I wanted to have your opinion > > on this. > > Agree. I'll prepare a patch addressing both issues. However I don't own test devices so it would be really great if you could test it ! I'm currently thinking of adapting this driver for other HopeRf modules like RFM69HCW or RFM12 so I will probably try to find some test equipement in the future. Thanks for your answer ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: [PATCH v2] staging: pi433: fix race condition in pi433_open
On Wed, Jun 20, 2018 at 11:34:39AM +0300, Dan Carpenter wrote: > On Tue, Jun 19, 2018 at 10:33:26PM -0400, Hugo Lefeuvre wrote: > > @@ -1178,6 +1152,11 @@ static int pi433_probe(struct spi_device *spi) > > device->tx_active = false; > > device->interrupt_rx_allowed = false; > > > > + /* init rx buffer */ > > + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); > > + if (!device->rx_buffer) > > + return -ENOMEM; > > We need to free device. Fixed in v3. Thanks ! regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: [PATCH v2] staging: pi433: fix race condition in pi433_open
On Wed, Jun 20, 2018 at 11:34:39AM +0300, Dan Carpenter wrote: > On Tue, Jun 19, 2018 at 10:33:26PM -0400, Hugo Lefeuvre wrote: > > @@ -1178,6 +1152,11 @@ static int pi433_probe(struct spi_device *spi) > > device->tx_active = false; > > device->interrupt_rx_allowed = false; > > > > + /* init rx buffer */ > > + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); > > + if (!device->rx_buffer) > > + return -ENOMEM; > > We need to free device. Fixed in v3. Thanks ! regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
[PATCH v3] staging: pi433: fix race condition in pi433_open
The device structure contains a useless non-atomic users counter which is subject to race conditions. It has probably been created to handle the case where remove is executed while operations are still executing on open fds but this will never happen because of reference counts. Drop the users counter and move rx buffer {de,}allocation to probe() and remove(). Remove associated dead code from open() and release(). Remove related TODO entry from ioctl(). Signed-off-by: Hugo Lefeuvre --- Changes in v3: - add missing free call in probe() (in case of failure during memory allocation for rx buffer). --- diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..0638fd5f14d9 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -78,7 +78,6 @@ struct pi433_device { struct device *dev; struct cdev *cdev; struct spi_device *spi; - unsigned intusers; /* irq related values */ struct gpio_desc*gpiod[NUM_DIO]; @@ -887,9 +886,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != PI433_IOC_MAGIC) return -ENOTTY; - /* TODO? guard against device removal before, or while, -* we issue this ioctl. --> device_get() -*/ instance = filp->private_data; device = instance->device; @@ -963,19 +959,9 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENODEV; } - if (!device->rx_buffer) { - device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); - if (!device->rx_buffer) - return -ENOMEM; - } - - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); - if (!instance) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; + if (!instance) return -ENOMEM; - } /* setup instance data*/ instance->device = device; @@ -992,23 +978,11 @@ static int pi433_open(struct inode *inode, struct file *filp) static int pi433_release(struct inode *inode, struct file *filp) { struct pi433_instance *instance; - struct pi433_device *device; instance = filp->private_data; - device = instance->device; kfree(instance); filp->private_data = NULL; - /* last close? */ - device->users--; - - if (!device->users) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; - if (!device->spi) - kfree(device); - } - return 0; } @@ -1178,6 +1152,13 @@ static int pi433_probe(struct spi_device *spi) device->tx_active = false; device->interrupt_rx_allowed = false; + /* init rx buffer */ + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); + if (!device->rx_buffer) { + retval = -ENOMEM; + goto RX_failed; + } + /* init wait queues */ init_waitqueue_head(>tx_wait_queue); init_waitqueue_head(>rx_wait_queue); @@ -1280,6 +1261,8 @@ static int pi433_probe(struct spi_device *spi) minor_failed: free_gpio(device); GPIO_failed: + kfree(device->rx_buffer); +RX_failed: kfree(device); return retval; @@ -1303,8 +1286,8 @@ static int pi433_remove(struct spi_device *spi) pi433_free_minor(device); - if (device->users == 0) - kfree(device); + kfree(device->rx_buffer); + kfree(device); return 0; } -- 2.17.1
[PATCH v3] staging: pi433: fix race condition in pi433_open
The device structure contains a useless non-atomic users counter which is subject to race conditions. It has probably been created to handle the case where remove is executed while operations are still executing on open fds but this will never happen because of reference counts. Drop the users counter and move rx buffer {de,}allocation to probe() and remove(). Remove associated dead code from open() and release(). Remove related TODO entry from ioctl(). Signed-off-by: Hugo Lefeuvre --- Changes in v3: - add missing free call in probe() (in case of failure during memory allocation for rx buffer). --- diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..0638fd5f14d9 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -78,7 +78,6 @@ struct pi433_device { struct device *dev; struct cdev *cdev; struct spi_device *spi; - unsigned intusers; /* irq related values */ struct gpio_desc*gpiod[NUM_DIO]; @@ -887,9 +886,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != PI433_IOC_MAGIC) return -ENOTTY; - /* TODO? guard against device removal before, or while, -* we issue this ioctl. --> device_get() -*/ instance = filp->private_data; device = instance->device; @@ -963,19 +959,9 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENODEV; } - if (!device->rx_buffer) { - device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); - if (!device->rx_buffer) - return -ENOMEM; - } - - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); - if (!instance) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; + if (!instance) return -ENOMEM; - } /* setup instance data*/ instance->device = device; @@ -992,23 +978,11 @@ static int pi433_open(struct inode *inode, struct file *filp) static int pi433_release(struct inode *inode, struct file *filp) { struct pi433_instance *instance; - struct pi433_device *device; instance = filp->private_data; - device = instance->device; kfree(instance); filp->private_data = NULL; - /* last close? */ - device->users--; - - if (!device->users) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; - if (!device->spi) - kfree(device); - } - return 0; } @@ -1178,6 +1152,13 @@ static int pi433_probe(struct spi_device *spi) device->tx_active = false; device->interrupt_rx_allowed = false; + /* init rx buffer */ + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); + if (!device->rx_buffer) { + retval = -ENOMEM; + goto RX_failed; + } + /* init wait queues */ init_waitqueue_head(>tx_wait_queue); init_waitqueue_head(>rx_wait_queue); @@ -1280,6 +1261,8 @@ static int pi433_probe(struct spi_device *spi) minor_failed: free_gpio(device); GPIO_failed: + kfree(device->rx_buffer); +RX_failed: kfree(device); return retval; @@ -1303,8 +1286,8 @@ static int pi433_remove(struct spi_device *spi) pi433_free_minor(device); - if (device->users == 0) - kfree(device); + kfree(device->rx_buffer); + kfree(device); return 0; } -- 2.17.1
[PATCH v2] staging: pi433: fix race condition in pi433_open
The device structure contains a useless non-atomic users counter which is subject to race conditions. It has probably been created to handle the case where remove is executed while operations are still executing on open fds but this will never happen because of reference counts. Drop the users counter and move rx buffer {de,}allocation to probe() and remove(). Remove associated dead code from open() and release(). Remove related TODO entry from ioctl(). Signed-off-by: Hugo Lefeuvre --- Changes in v2: - Remove useless users counter. - Remove unneeded TODO entry in ioctl(). - Move rx buffer {de,}allocation to probe() and remove(). --- diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..a5aa9c5bc6fd 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -78,7 +78,6 @@ struct pi433_device { struct device *dev; struct cdev *cdev; struct spi_device *spi; - unsigned intusers; /* irq related values */ struct gpio_desc*gpiod[NUM_DIO]; @@ -887,9 +886,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != PI433_IOC_MAGIC) return -ENOTTY; - /* TODO? guard against device removal before, or while, -* we issue this ioctl. --> device_get() -*/ instance = filp->private_data; device = instance->device; @@ -963,19 +959,9 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENODEV; } - if (!device->rx_buffer) { - device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); - if (!device->rx_buffer) - return -ENOMEM; - } - - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); - if (!instance) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; + if (!instance) return -ENOMEM; - } /* setup instance data*/ instance->device = device; @@ -992,23 +978,11 @@ static int pi433_open(struct inode *inode, struct file *filp) static int pi433_release(struct inode *inode, struct file *filp) { struct pi433_instance *instance; - struct pi433_device *device; instance = filp->private_data; - device = instance->device; kfree(instance); filp->private_data = NULL; - /* last close? */ - device->users--; - - if (!device->users) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; - if (!device->spi) - kfree(device); - } - return 0; } @@ -1178,6 +1152,11 @@ static int pi433_probe(struct spi_device *spi) device->tx_active = false; device->interrupt_rx_allowed = false; + /* init rx buffer */ + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); + if (!device->rx_buffer) + return -ENOMEM; + /* init wait queues */ init_waitqueue_head(>tx_wait_queue); init_waitqueue_head(>rx_wait_queue); @@ -1280,6 +1259,7 @@ static int pi433_probe(struct spi_device *spi) minor_failed: free_gpio(device); GPIO_failed: + kfree(device->rx_buffer); kfree(device); return retval; @@ -1303,8 +1283,8 @@ static int pi433_remove(struct spi_device *spi) pi433_free_minor(device); - if (device->users == 0) - kfree(device); + kfree(device->rx_buffer); + kfree(device); return 0; } -- 2.17.1
[PATCH v2] staging: pi433: fix race condition in pi433_open
The device structure contains a useless non-atomic users counter which is subject to race conditions. It has probably been created to handle the case where remove is executed while operations are still executing on open fds but this will never happen because of reference counts. Drop the users counter and move rx buffer {de,}allocation to probe() and remove(). Remove associated dead code from open() and release(). Remove related TODO entry from ioctl(). Signed-off-by: Hugo Lefeuvre --- Changes in v2: - Remove useless users counter. - Remove unneeded TODO entry in ioctl(). - Move rx buffer {de,}allocation to probe() and remove(). --- diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..a5aa9c5bc6fd 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -78,7 +78,6 @@ struct pi433_device { struct device *dev; struct cdev *cdev; struct spi_device *spi; - unsigned intusers; /* irq related values */ struct gpio_desc*gpiod[NUM_DIO]; @@ -887,9 +886,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) if (_IOC_TYPE(cmd) != PI433_IOC_MAGIC) return -ENOTTY; - /* TODO? guard against device removal before, or while, -* we issue this ioctl. --> device_get() -*/ instance = filp->private_data; device = instance->device; @@ -963,19 +959,9 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENODEV; } - if (!device->rx_buffer) { - device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); - if (!device->rx_buffer) - return -ENOMEM; - } - - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); - if (!instance) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; + if (!instance) return -ENOMEM; - } /* setup instance data*/ instance->device = device; @@ -992,23 +978,11 @@ static int pi433_open(struct inode *inode, struct file *filp) static int pi433_release(struct inode *inode, struct file *filp) { struct pi433_instance *instance; - struct pi433_device *device; instance = filp->private_data; - device = instance->device; kfree(instance); filp->private_data = NULL; - /* last close? */ - device->users--; - - if (!device->users) { - kfree(device->rx_buffer); - device->rx_buffer = NULL; - if (!device->spi) - kfree(device); - } - return 0; } @@ -1178,6 +1152,11 @@ static int pi433_probe(struct spi_device *spi) device->tx_active = false; device->interrupt_rx_allowed = false; + /* init rx buffer */ + device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); + if (!device->rx_buffer) + return -ENOMEM; + /* init wait queues */ init_waitqueue_head(>tx_wait_queue); init_waitqueue_head(>rx_wait_queue); @@ -1280,6 +1259,7 @@ static int pi433_probe(struct spi_device *spi) minor_failed: free_gpio(device); GPIO_failed: + kfree(device->rx_buffer); kfree(device); return retval; @@ -1303,8 +1283,8 @@ static int pi433_remove(struct spi_device *spi) pi433_free_minor(device); - if (device->users == 0) - kfree(device); + kfree(device->rx_buffer); + kfree(device); return 0; } -- 2.17.1
Re: [PATCH] staging: pi433: fix race condition in pi433_open
> > It would be great to get rid of this counter, indeed. But how to do it > > properly without breaking things ? It seems to be useful to me... > > These things are refcounted so you can't unload the module while a file > is open. When we do an open it does a cdev_get(). When we call the > delete_module syscall, it checks the refcount in try_stop_module() -> > try_release_module_ref(). It will still let us force a release but > at that point you're expecting use after frees. Then I'd like to understand why this counter was introduced if remove always gets executed after the last release call returned. This was probably unclear to the original developer. This TODO seems to be related to this misunderstanding too: 890 /* TODO? guard against device removal before, or while, 891 * we issue this ioctl. --> device_get() 892 */ Device removal can't happen before or during the ioctl execution. > > Really ? device->spi is NULL-ed in remove() so that operations on > > remaining fds can detect remove() was already called and free remaining > > resources: > > > > 1296 /* make sure ops on existing fds can abort cleanly */ > > 1297 device->spi = NULL; > > That's when we're unloading the module so there aren't any users left. I'll submit an updated version of my patch getting rid of the counter and addressing the remaining race conditions. Thanks ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: [PATCH] staging: pi433: fix race condition in pi433_open
> > It would be great to get rid of this counter, indeed. But how to do it > > properly without breaking things ? It seems to be useful to me... > > These things are refcounted so you can't unload the module while a file > is open. When we do an open it does a cdev_get(). When we call the > delete_module syscall, it checks the refcount in try_stop_module() -> > try_release_module_ref(). It will still let us force a release but > at that point you're expecting use after frees. Then I'd like to understand why this counter was introduced if remove always gets executed after the last release call returned. This was probably unclear to the original developer. This TODO seems to be related to this misunderstanding too: 890 /* TODO? guard against device removal before, or while, 891 * we issue this ioctl. --> device_get() 892 */ Device removal can't happen before or during the ioctl execution. > > Really ? device->spi is NULL-ed in remove() so that operations on > > remaining fds can detect remove() was already called and free remaining > > resources: > > > > 1296 /* make sure ops on existing fds can abort cleanly */ > > 1297 device->spi = NULL; > > That's when we're unloading the module so there aren't any users left. I'll submit an updated version of my patch getting rid of the counter and addressing the remaining race conditions. Thanks ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
Re: [PATCH] staging: pi433: fix race condition in pi433_open
Hi Dan, > We need to decrement device->users-- on the error paths as well. > This function was already slightly broken with respect to counting the > users, but let's not make it worse. > > I think it's still a tiny bit racy because it's not an atomic type. Oh right, I missed that. I'll fix it in the v2. :) > I'm not sure the error handling in open() works either. It's releasing > device->rx_buffer but there could be another user. Agree. > The ->rx_buffer > should be allocated in probe() instead of open() probably, no? And then > freed in pi433_remove(). Then once we put that in the right layer > it means we can just get rid of ->users... It would be great to get rid of this counter, indeed. But how to do it properly without breaking things ? It seems to be useful to me... For example, how do you handle the case where remove() is called but some operations are still running on existing fds ? What if remove frees the rx_buffer while a read() call executes this ? copy_to_user(buf, device->rx_buffer, bytes_received) rx_buffer is freed by release() because it's the only buffer from the device structure used in read/write/ioctl, meaning that we can only free it when we are sure that it isn't used anywhere anymore. So, we can't do it in remove() unless remove() is delayed until the last release() has returned. > The lines: > > 1008 if (!device->spi) > 1009 kfree(device); > > make no sort of sense at all... Fortunately it's not posssible for > device->spi to be NULL so it's dead code. Really ? device->spi is NULL-ed in remove() so that operations on remaining fds can detect remove() was already called and free remaining resources: 1296 /* make sure ops on existing fds can abort cleanly */ 1297 device->spi = NULL; Thanks for your time ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA
Re: [PATCH] staging: pi433: fix race condition in pi433_open
Hi Dan, > We need to decrement device->users-- on the error paths as well. > This function was already slightly broken with respect to counting the > users, but let's not make it worse. > > I think it's still a tiny bit racy because it's not an atomic type. Oh right, I missed that. I'll fix it in the v2. :) > I'm not sure the error handling in open() works either. It's releasing > device->rx_buffer but there could be another user. Agree. > The ->rx_buffer > should be allocated in probe() instead of open() probably, no? And then > freed in pi433_remove(). Then once we put that in the right layer > it means we can just get rid of ->users... It would be great to get rid of this counter, indeed. But how to do it properly without breaking things ? It seems to be useful to me... For example, how do you handle the case where remove() is called but some operations are still running on existing fds ? What if remove frees the rx_buffer while a read() call executes this ? copy_to_user(buf, device->rx_buffer, bytes_received) rx_buffer is freed by release() because it's the only buffer from the device structure used in read/write/ioctl, meaning that we can only free it when we are sure that it isn't used anywhere anymore. So, we can't do it in remove() unless remove() is delayed until the last release() has returned. > The lines: > > 1008 if (!device->spi) > 1009 kfree(device); > > make no sort of sense at all... Fortunately it's not posssible for > device->spi to be NULL so it's dead code. Really ? device->spi is NULL-ed in remove() so that operations on remaining fds can detect remove() was already called and free remaining resources: 1296 /* make sure ops on existing fds can abort cleanly */ 1297 device->spi = NULL; Thanks for your time ! Regards, Hugo -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA
[PATCH] staging: pi433: fix race condition in pi433_open
Whenever pi433_open and pi433_remove execute concurrently, a race condition potentially resulting in use-after-free might happen. Let T1 and T2 be two kernel threads. 1. T1 executes pi433_open and stops before "device->users++". 2. The pi433 device was removed inbetween, so T2 executes pi433_remove and frees device because the user count has not been incremented yet. 3. T1 executes "device->users++" (use-after-free). This race condition happens because the check of minor number and user count increment does not happen atomically. Fix: Extend scope of minor_lock in pi433_open(). Signed-off-by: Hugo Lefeuvre --- drivers/staging/pi433/pi433_if.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..73c511249f7f 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -957,11 +957,13 @@ static int pi433_open(struct inode *inode, struct file *filp) mutex_lock(_lock); device = idr_find(_idr, iminor(inode)); - mutex_unlock(_lock); if (!device) { + mutex_unlock(_lock); pr_debug("device: minor %d unknown.\n", iminor(inode)); return -ENODEV; } + device->users++; + mutex_unlock(_lock); if (!device->rx_buffer) { device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); @@ -969,7 +971,6 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENOMEM; } - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); if (!instance) { kfree(device->rx_buffer); -- 2.17.1
[PATCH] staging: pi433: fix race condition in pi433_open
Whenever pi433_open and pi433_remove execute concurrently, a race condition potentially resulting in use-after-free might happen. Let T1 and T2 be two kernel threads. 1. T1 executes pi433_open and stops before "device->users++". 2. The pi433 device was removed inbetween, so T2 executes pi433_remove and frees device because the user count has not been incremented yet. 3. T1 executes "device->users++" (use-after-free). This race condition happens because the check of minor number and user count increment does not happen atomically. Fix: Extend scope of minor_lock in pi433_open(). Signed-off-by: Hugo Lefeuvre --- drivers/staging/pi433/pi433_if.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c index 94e0bfcec991..73c511249f7f 100644 --- a/drivers/staging/pi433/pi433_if.c +++ b/drivers/staging/pi433/pi433_if.c @@ -957,11 +957,13 @@ static int pi433_open(struct inode *inode, struct file *filp) mutex_lock(_lock); device = idr_find(_idr, iminor(inode)); - mutex_unlock(_lock); if (!device) { + mutex_unlock(_lock); pr_debug("device: minor %d unknown.\n", iminor(inode)); return -ENODEV; } + device->users++; + mutex_unlock(_lock); if (!device->rx_buffer) { device->rx_buffer = kmalloc(MAX_MSG_SIZE, GFP_KERNEL); @@ -969,7 +971,6 @@ static int pi433_open(struct inode *inode, struct file *filp) return -ENOMEM; } - device->users++; instance = kzalloc(sizeof(*instance), GFP_KERNEL); if (!instance) { kfree(device->rx_buffer); -- 2.17.1
rf69_set_deviation in rf69.c (pi433 driver)
Hi Marcus, I have been taking a look at the pi433 driver these last days, and started working on the remaining TODOs. I just stumbled across the following one (drivers/staging/pi433/rf69.c): 245 // TODO: Dependency to bitrate 246 if (deviation < 600 || deviation > 50) { 247 dev_dbg(>dev, "set_deviation: illegal input param"); 248 return -EINVAL; 249 } According to the datasheet[0], the deviation should always be smaller than 300kHz, and the following equation should be respected: (1) FDA + BRF/2 =< 500 kHz Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like a bug to me. Concerning the TODO, I can see two solutions currently: 1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB and REG_BITRATE_LSB and returns reconstructed BRF. We could use this function in rf69_set_deviation to retrieve the BRF. + clean + intuitive - heavy / slow 2. Store BRF somewhere in rf69.c, initialize it with the default value (4.8 kb/s) and update it when rf69_set_bit_rate is called. + easy - dirty, doesn't fit well with the design of rf69.c (do not store anything, simply expose API) I'd really prefer going for the first one, but I wanted to have your opinion on this. Thanks for your work ! Best regards, Hugo [0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf [CC-ing Valentin Vidic, he was quite active on the pi433 driver these last months] -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature
rf69_set_deviation in rf69.c (pi433 driver)
Hi Marcus, I have been taking a look at the pi433 driver these last days, and started working on the remaining TODOs. I just stumbled across the following one (drivers/staging/pi433/rf69.c): 245 // TODO: Dependency to bitrate 246 if (deviation < 600 || deviation > 50) { 247 dev_dbg(>dev, "set_deviation: illegal input param"); 248 return -EINVAL; 249 } According to the datasheet[0], the deviation should always be smaller than 300kHz, and the following equation should be respected: (1) FDA + BRF/2 =< 500 kHz Why did you choose 500kHz as max for FDA, instead of 300kHz ? It looks like a bug to me. Concerning the TODO, I can see two solutions currently: 1. Introduce a new rf69_get_bit_rate function which reads REG_BITRATE_MSB and REG_BITRATE_LSB and returns reconstructed BRF. We could use this function in rf69_set_deviation to retrieve the BRF. + clean + intuitive - heavy / slow 2. Store BRF somewhere in rf69.c, initialize it with the default value (4.8 kb/s) and update it when rf69_set_bit_rate is called. + easy - dirty, doesn't fit well with the design of rf69.c (do not store anything, simply expose API) I'd really prefer going for the first one, but I wanted to have your opinion on this. Thanks for your work ! Best regards, Hugo [0] http://www.hoperf.com/upload/rf/RFM69CW-V1.1.pdf [CC-ing Valentin Vidic, he was quite active on the pi433 driver these last months] -- Hugo Lefeuvre (hle)|www.owl.eu.com 4096/ 9C4F C8BF A4B0 8FC5 48EB 56B8 1962 765B B9A8 BACA signature.asc Description: PGP signature