Re: KASAN: use-after-free Read in ceph_mdsc_destroy
On 2020/7/23 14:27, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    f932d58a Merge tag 'scsi-fixes' of git://git.kernel.org/pu..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=152c6a8090
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a160d1053fc89af5
> dashboard link: https://syzkaller.appspot.com/bug?extid=b57f46d8d6ea51960b8c
> compiler:       gcc (GCC) 10.1.0-syz 20200507
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+b57f46d8d6ea51960...@syzkaller.appspotmail.com
>
> ==
> BUG: KASAN: use-after-free in timer_is_static_object+0x7a/0x90 kernel/time/timer.c:611
> Read of size 8 at addr 88809e482380 by task syz-executor.3/15653

If ceph_mdsc_init() fails, it has already freed the mdsc; after freeing it we should set the pointer to NULL. I will fix it. Thanks for reporting.

> CPU: 0 PID: 15653 Comm: syz-executor.3 Not tainted 5.8.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  print_address_description.constprop.0.cold+0xae/0x436 mm/kasan/report.c:383
>  __kasan_report mm/kasan/report.c:513 [inline]
>  kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
>  timer_is_static_object+0x7a/0x90 kernel/time/timer.c:611
>  debug_object_assert_init lib/debugobjects.c:866 [inline]
>  debug_object_assert_init+0x1df/0x2e0 lib/debugobjects.c:841
>  debug_timer_assert_init kernel/time/timer.c:728 [inline]
>  debug_assert_init kernel/time/timer.c:773 [inline]
>  del_timer+0x6d/0x110 kernel/time/timer.c:1196
>  try_to_grab_pending kernel/workqueue.c:1249 [inline]
>  __cancel_work_timer+0x12d/0x700 kernel/workqueue.c:3092
>  ceph_mdsc_stop fs/ceph/mds_client.c:4660 [inline]
>  ceph_mdsc_destroy+0x50/0x140 fs/ceph/mds_client.c:4679
>  destroy_fs_client+0x13/0x200 fs/ceph/super.c:720
>  ceph_get_tree+0x9e5/0x1660 fs/ceph/super.c:1110
>  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
>  do_new_mount fs/namespace.c:2875 [inline]
>  do_mount+0x1592/0x1fe0 fs/namespace.c:3200
>  __do_sys_mount fs/namespace.c:3410 [inline]
>  __se_sys_mount fs/namespace.c:3387 [inline]
>  __x64_sys_mount+0x18f/0x230 fs/namespace.c:3387
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x45c1d9
> Code: Bad RIP value.
> RSP: 002b:7f33d2bc0c78 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 0001f400 RCX: 0045c1d9
> RDX: 2140 RSI: 20c0 RDI: 25c0
> RBP: 0078bf50 R08: R09: R10:
> R11: 0246 R12: 0078bf0c R13: 7fffaad3cc8f
> R14: 7f33d2bc19c0 R15: 0078bf0c
>
> Allocated by task 15653:
>  save_stack+0x1b/0x40 mm/kasan/common.c:48
>  set_track mm/kasan/common.c:56 [inline]
>  __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:494
>  kmem_cache_alloc_trace+0x14f/0x2d0 mm/slab.c:3551
>  kmalloc include/linux/slab.h:555 [inline]
>  kzalloc include/linux/slab.h:669 [inline]
>  ceph_mdsc_init+0x47/0xf10 fs/ceph/mds_client.c:4351
>  ceph_get_tree+0x4fe/0x1660 fs/ceph/super.c:1063
>  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
>  do_new_mount fs/namespace.c:2875 [inline]
>  do_mount+0x1592/0x1fe0 fs/namespace.c:3200
>  __do_sys_mount fs/namespace.c:3410 [inline]
>  __se_sys_mount fs/namespace.c:3387 [inline]
>  __x64_sys_mount+0x18f/0x230 fs/namespace.c:3387
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Freed by task 15653:
>  save_stack+0x1b/0x40 mm/kasan/common.c:48
>  set_track mm/kasan/common.c:56 [inline]
>  kasan_set_free_info mm/kasan/common.c:316 [inline]
>  __kasan_slab_free+0xf5/0x140 mm/kasan/common.c:455
>  __cache_free mm/slab.c:3426 [inline]
>  kfree+0x103/0x2c0 mm/slab.c:3757
>  ceph_mdsc_init+0xc64/0xf10 fs/ceph/mds_client.c:4422
>  ceph_get_tree+0x4fe/0x1660 fs/ceph/super.c:1063
>  vfs_get_tree+0x89/0x2f0 fs/super.c:1547
>  do_new_mount fs/namespace.c:2875 [inline]
>  do_mount+0x1592/0x1fe0 fs/namespace.c:3200
>  __do_sys_mount fs/namespace.c:3410 [inline]
>  __se_sys_mount fs/namespace.c:3387 [inline]
>  __x64_sys_mount+0x18f/0x230 fs/namespace.c:3387
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The buggy address belongs to the object at 88809e482000
>  which belongs to the cache kmalloc-4k of size 4096
> The buggy address is located 896 bytes inside of
>  4096-byte region [88809e482000, 88809e483000)
> The buggy address belongs to the page:
> page:ea0002792080 refcount:1 mapcount:0 mapping: index:0x0 head:ea0002792080 order:1 compound_mapcount:0
> flags: 0xfffe010200(slab|head)
> raw: 00fffe010200 ea0002792008 ea000241dc08
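The fix described above can be sketched in plain userspace C. All names here (the struct types, the simulate_failure knob) are hypothetical simplifications, not the actual ceph code; the sketch only illustrates the pattern: when an init function frees the object it was building, the caller-visible pointer must be reset to NULL so the destroy path cannot touch freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for ceph_fs_client / ceph_mds_client. */
struct mds_client { int dummy; };
struct fs_client { struct mds_client *mdsc; };

/* On failure, init frees the object itself, so it must also clear the
 * caller-visible pointer -- the missing step behind the KASAN report. */
static int mdsc_init(struct fs_client *fsc, int simulate_failure)
{
	struct mds_client *mdsc = calloc(1, sizeof(*mdsc));
	if (!mdsc)
		return -1;
	fsc->mdsc = mdsc;
	if (simulate_failure) {
		free(mdsc);
		fsc->mdsc = NULL;	/* without this, fsc->mdsc dangles */
		return -1;
	}
	return 0;
}

static void mdsc_destroy(struct fs_client *fsc)
{
	if (!fsc->mdsc)		/* safe: a failed init left NULL, not a dangling pointer */
		return;
	free(fsc->mdsc);
	fsc->mdsc = NULL;
}
```

With the NULL reset in place, the destroy path called from the mount error-unwinding code becomes a harmless no-op after a failed init.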
Re: netfilter: does the API break or something else ?
On 2020/5/14 18:54, Phil Sutter wrote:
> Hi,
>
> On Wed, May 13, 2020 at 11:20:35PM +0800, Xiubo Li wrote:
>> Recently I hit one netfilter issue, it seems the API breaks or something else.
>
> Just for the record, this was caused by a misconfigured kernel.
>
> Cheers, Phil

Yeah, thanks Phil for your help.

BRs
Xiubo
netfilter: does the API break or something else ?
Hi Experts,

Recently I hit one netfilter issue; it seems the API has changed, or something else is wrong.

On CentOS 8.1 with a recent upstream kernel built from source, such as 5.6.0-rc6 or 5.7.0-rc4, running the following command fails:

$ sudo bash -c 'iptables -A FORWARD -o enp3s0f1 -i ceph-brx -j ACCEPT'
iptables v1.8.2 (nf_tables): CHAIN_ADD failed (Operation not supported): chain INPUT

The same happens with the nftables command:

$ sudo nft add chain ip filter INPUT { type filter hook input priority 0\; }
Error: Could not process rule: Operation not supported
add chain ip filter INPUT { type filter hook input priority 0; }

$ sudo nft add chain ip filter FORWARD { type filter hook forward priority 0\; }
Error: Could not process rule: Operation not supported
add chain ip filter FORWARD { type filter hook forward priority 0; }

$ sudo nft add chain ip filter OUTPUT { type filter hook output priority 0\; }
Error: Could not process rule: Operation not supported
add chain ip filter OUTPUT { type filter hook output priority 0; }

With the downstream kernel 4.18.0-147.8.1.el8_1.x86_64 they all work well. The nftables/libnftnl packages are:

$ rpm -qa | grep nft
nftables-0.9.0-14.el8.x86_64
libnftnl-1.1.1-4.el8.x86_64

We have also tried v5.7.0-rc4+ with the Fedora 31 userspace, and everything works well there too. From the above I suspect an API break. Could someone kindly point out which change, and where?

Thanks
BRs
Xiubo
Re: [PATCH RESEND] nbd: avoid losing pointer to reallocated config->socks in nbd_add_socket
On 2019/9/21 0:06, Eugene Syromiatnikov wrote:
> In the (very unlikely) case of config->socks reallocation success and nsock allocation failure, config->socks will not get updated with the new pointer to the socks array. Fix it by updating config->socks right after the reallocation success check.
>
> Fixes: 9561a7ade0c2 ("nbd: add multi-connection support")
> Signed-off-by: Eugene Syromiatnikov
> Cc: sta...@vger.kernel.org # 4.10+
> ---
>  drivers/block/nbd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index a8e3815..a04c686 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -987,14 +987,14 @@ static int nbd_add_socket(struct nbd_device *nbd, unsigned long arg,
>  		sockfd_put(sock);
>  		return -ENOMEM;
>  	}
> +	config->socks = socks;
> +
>  	nsock = kzalloc(sizeof(struct nbd_sock), GFP_KERNEL);
>  	if (!nsock) {
>  		sockfd_put(sock);
>  		return -ENOMEM;
>  	}
>
> -	config->socks = socks;
> -

This makes sense. If the socks allocation succeeds, the old config->socks is freed by krealloc(), which returns the new array; but if the nsock allocation then fails, config->socks still holds the released memory, which may crash the kernel.

Thanks
BRs

>  	nsock->fallback_index = -1;
>  	nsock->dead = false;
>  	mutex_init(&nsock->tx_lock);
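The same hazard exists with plain realloc() in userspace, and the fix pattern is identical: commit the new pointer immediately after the success check, before anything else can fail. This is a minimal sketch with hypothetical names (struct sockent, add_socket, the fail_next_alloc knob), not the nbd code itself:

```c
#include <assert.h>
#include <stdlib.h>

struct sockent { int fd; };
struct config { struct sockent **socks; int num; };

/* Once realloc() succeeds, the old array may already be freed, so
 * cfg->socks must be updated before any later failure path can return
 * with the stale pointer still stored. */
static int add_socket(struct config *cfg, int fd, int fail_next_alloc)
{
	struct sockent **socks, *ns;

	socks = realloc(cfg->socks, (cfg->num + 1) * sizeof(*socks));
	if (!socks)
		return -1;
	cfg->socks = socks;	/* commit right after the success check */

	/* fail_next_alloc simulates the nsock kzalloc() failure */
	ns = fail_next_alloc ? NULL : calloc(1, sizeof(*ns));
	if (!ns)
		return -1;	/* cfg->socks is still valid here */

	ns->fd = fd;
	cfg->socks[cfg->num++] = ns;
	return 0;
}
```

In the buggy ordering, an error return between the realloc and the pointer update leaves cfg->socks pointing at memory that realloc() may have already released.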
Re: [RFC PATCH] memalloc_noio: update the comment to make it cleaner
On 2019/9/18 16:14, Michal Hocko wrote:
> On Wed 18-09-19 16:02:52, Xiubo Li wrote:
>> On 2019/9/18 15:25, Michal Hocko wrote:
>>> On Wed 18-09-19 04:58:20, xiu...@redhat.com wrote:
>>>> From: Xiubo Li
>>>>
>>>> GFP_NOIO means all further allocations will implicitly drop both the
>>>> __GFP_IO and __GFP_FS flags, so they are safe for both the IO critical
>>>> section and the critical section from the allocation recursion point
>>>> of view. It is not only __GFP_IO, which is a bit confusing when
>>>> reading the code or using the save/restore pair.
>>> Historically GFP_NOIO has always implied GFP_NOFS as well. I can
>>> imagine that this might come as a surprise for somebody not familiar
>>> with the code though.
>> Yeah, that's true.
>>> I am wondering whether your update of the documentation would be
>>> better off at the __GFP_FS/__GFP_IO resp. GFP_NOFS/GFP_NOIO level.
>>> This interface is simply a way to set a scoped NO{IO,FS} context.
>> "Documentation/core-api/gfp_mask-from-fs-io.rst" already covers them
>> all in detail. This fix just makes sure it won't surprise someone who
>> is quickly reading through the code and is not familiar with the
>> details. Maybe it does not make much sense?
> Ohh, I do not think this would be senseless. I just think that NOIO
> implying NOFS as well should be described at the level where they are
> documented rather than in the API you have chosen.

Hmm, yeah, totally agree :-)

Thanks
BRs
Re: [RFC PATCH] memalloc_noio: update the comment to make it cleaner
On 2019/9/18 15:25, Michal Hocko wrote:
> On Wed 18-09-19 04:58:20, xiu...@redhat.com wrote:
>> From: Xiubo Li
>>
>> GFP_NOIO means all further allocations will implicitly drop both the
>> __GFP_IO and __GFP_FS flags, so they are safe for both the IO critical
>> section and the critical section from the allocation recursion point
>> of view. It is not only __GFP_IO, which is a bit confusing when
>> reading the code or using the save/restore pair.
> Historically GFP_NOIO has always implied GFP_NOFS as well. I can
> imagine that this might come as a surprise for somebody not familiar
> with the code though.

Yeah, that's true.

> I am wondering whether your update of the documentation would be
> better off at the __GFP_FS/__GFP_IO resp. GFP_NOFS/GFP_NOIO level.
> This interface is simply a way to set a scoped NO{IO,FS} context.

"Documentation/core-api/gfp_mask-from-fs-io.rst" already covers them all in detail. This fix just makes sure it won't surprise someone who is quickly reading through the code and is not familiar with the details. Maybe it does not make much sense?

Thanks,
BRs
Xiubo

>> Signed-off-by: Xiubo Li
>> ---
>>  include/linux/sched/mm.h | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
>> index 4a7944078cc3..9bdc97e52de1 100644
>> --- a/include/linux/sched/mm.h
>> +++ b/include/linux/sched/mm.h
>> @@ -211,10 +211,11 @@ static inline void fs_reclaim_release(gfp_t gfp_mask) { }
>>   * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
>>   *
>>   * This functions marks the beginning of the GFP_NOIO allocation scope.
>> - * All further allocations will implicitly drop __GFP_IO flag and so
>> - * they are safe for the IO critical section from the allocation recursion
>> - * point of view. Use memalloc_noio_restore to end the scope with flags
>> - * returned by this function.
>> + * All further allocations will implicitly drop __GFP_IO and __GFP_FS
>> + * flags and so they are safe for both the IO critical section and the
>> + * critical section from the allocation recursion point of view. Use
>> + * memalloc_noio_restore to end the scope with flags returned by this
>> + * function.
>>   *
>>   * This function is safe to be used from any context.
>>   */
>> --
>> 2.21.0
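The behavior the comment update documents can be mimicked in a tiny userspace model. The flag values and function names below are hypothetical stand-ins for the real definitions in include/linux/gfp.h and include/linux/sched/mm.h; the sketch only shows the point under discussion, that within a NOIO scope the effective mask drops both __GFP_IO and __GFP_FS:

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define __GFP_IO  0x1u
#define __GFP_FS  0x2u
#define PF_MEMALLOC_NOIO 0x4u

static unsigned int current_flags; /* stand-in for the task's scope state */

/* Mimics memalloc_noio_save(): enter the scoped NOIO context and return
 * the previous state so nesting works. */
static unsigned int noio_save(void)
{
	unsigned int old = current_flags & PF_MEMALLOC_NOIO;
	current_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Mimics memalloc_noio_restore(): end the scope with the saved state. */
static void noio_restore(unsigned int old)
{
	current_flags = (current_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Inside the scope, allocation masks drop BOTH __GFP_IO and __GFP_FS --
 * the fact the comment fix is spelling out. */
static unsigned int effective_gfp(unsigned int gfp)
{
	if (current_flags & PF_MEMALLOC_NOIO)
		return gfp & ~(__GFP_IO | __GFP_FS);
	return gfp;
}
```

This is why GFP_NOIO implies GFP_NOFS: clearing __GFP_IO without also clearing __GFP_FS would still allow filesystem reclaim recursion from within the IO path.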
Re: [PATCH v3] driver: uio: fix possible memory leak in uio_open
On 2019/1/3 22:28, liujian wrote:
> Fixes: 57c5f4df0a5a ("uio: fix crash after the device is unregistered")
> Signed-off-by: liujian
> ---
> v1->v2: rename the "err_infoopen" to "err_idev_info"
> v2->v3: put the extra info after the "--"

Looks good to me. Thanks.

BRs

>  drivers/uio/uio.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 5c10fc7..aab3520 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -496,18 +496,19 @@ static int uio_open(struct inode *inode, struct file *filep)
>  	if (!idev->info) {
>  		mutex_unlock(&idev->info_lock);
>  		ret = -EINVAL;
> -		goto err_alloc_listener;
> +		goto err_idev_info;
>  	}
>
>  	if (idev->info && idev->info->open)
>  		ret = idev->info->open(idev->info, inode);
>  	mutex_unlock(&idev->info_lock);
>  	if (ret)
> -		goto err_infoopen;
> +		goto err_idev_info;
>
>  	return 0;
>
> -err_infoopen:
> +err_idev_info:
> +	filep->private_data = NULL;
>  	kfree(listener);
>
> err_alloc_listener:
Re: [PATCH] driver: uio: fix possible memory leak in uio_open
On 2019/1/3 0:26, liujian wrote:
> Fixes: 57c5f4df0a5a ("uio: fix crash after the device is unregistered")
> Signed-off-by: liujian
> ---
>  drivers/uio/uio.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 5c10fc7..bde7d7a 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -496,7 +496,7 @@ static int uio_open(struct inode *inode, struct file *filep)
>  	if (!idev->info) {
>  		mutex_unlock(&idev->info_lock);
>  		ret = -EINVAL;
> -		goto err_alloc_listener;
> +		goto err_infoopen;
>  	}
>
>  	if (idev->info && idev->info->open)
> @@ -508,6 +508,7 @@ static int uio_open(struct inode *inode, struct file *filep)
>  	return 0;
>
>  err_infoopen:

Maybe we should rename "err_infoopen" to something like "err_idev_info"...

Thanks.
BRs

> +	filep->private_data = NULL;
>  	kfree(listener);
>
> err_alloc_listener:
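The leak being fixed in this thread follows a common shape: an open() handler allocates a context, stashes it in filep->private_data early, and then a later failure path must both free the allocation and clear the stale pointer. A minimal userspace sketch of that pattern, with hypothetical simplified types (not the real uio structures):

```c
#include <assert.h>
#include <stdlib.h>

struct file { void *private_data; };

/* Sketch of the uio_open() error unwinding: the listener is allocated and
 * stored in filep->private_data before later checks can fail, so every
 * later failure path must free it AND reset the pointer. */
static int dev_open(struct file *filep, int info_present)
{
	void *listener = malloc(16);
	if (!listener)
		return -1;
	filep->private_data = listener;

	if (!info_present) {
		/* Error path: without the free, the listener leaks; without
		 * the reset, release() could later use a dangling pointer. */
		filep->private_data = NULL;
		free(listener);
		return -1;
	}
	return 0;
}
```

Jumping to the wrong label (the bug here) skips the kfree and leaks the listener on every failed open.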
Re: Problems with uio_pci_generic in v4.18-rc8
On 2018/8/12 19:03, gre...@linuxfoundation.org wrote:
> On Sun, Aug 12, 2018 at 06:55:58PM +0800, Xiubo Li wrote:
>> Hi Harris,
>>
>> Since the lock protection in the uio irq handler is currently useless,
>> as discussed on the mailing list, I will revert this commit with small
>> changes; the testing based on LIO/TCMU has now been finished.
> Can you send the fixup patch so I can apply it please?

Yeah, sure. Thanks.

> thanks,
> greg k-h
Re: Problems with uio_pci_generic in v4.18-rc8
Hi Harris,

Since the lock protection in the uio irq handler is currently useless, as discussed on the mailing list, I will revert this commit with small changes; the testing based on LIO/TCMU has now been finished.

Thanks,
BRs

On 2018/8/11 5:12, Harris, James R wrote:
> Hi,
>
> Using Linux kernel v4.18-rc8, I am unable to bind a PCI device to uio_pci_generic. This works fine with v4.18-rc4. dmesg shows:
>
> [ 336.221585] genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 33
> [ 336.221681] uio_pci_generic: probe of :04:00.0 failed with error -22
>
> This problem has been reproduced on multiple systems.
>
> This seems to be related to commit 9421e45f5 ("uio: use request_threaded_irq instead") which was merged in -rc5. Reverting this commit fixes the issue (I also reverted 543af5861f and 57c5f4df0a which also touch uio.c - just to be safe).
>
> This seems like a serious regression to uio_pci_generic, but any tips or pointers would be appreciated.
>
> Thanks,
> Jim Harris
Re: [PATCH] uio: fix wrong return value from uio_mmap()
On 2018/7/20 8:31, Hailong Liu wrote:
> uio_mmap has multiple failure paths that set a nonzero return value and then goto out. However, it always returns *0* from *out* at the end, which will mislead callers that check the return value of this function.
>
> Fixes: 57c5f4df0a5a0ee ("uio: fix crash after the device is unregistered")
> CC: Xiubo Li
> Signed-off-by: Hailong Liu
> Signed-off-by: Jiang Biao
> ---
>  drivers/uio/uio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
> index 5d421d7..0ddfda2 100644
> --- a/drivers/uio/uio.c
> +++ b/drivers/uio/uio.c
> @@ -814,7 +814,7 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
>
>  out:
>  	mutex_unlock(&idev->info_lock);
> -	return 0;
> +	return ret;

Hi Hailong,

Good catch, thanks.

BRs

>  }
>
>  static const struct file_operations uio_fops = {
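The bug class this one-liner fixes is easy to reproduce in isolation: an error path sets ret and jumps to the common unlock label, but the function ends with a hard-coded return 0 that swallows the error. A hypothetical minimal sketch (the -22 is just -EINVAL's value, used for illustration):

```c
#include <assert.h>

/* Buggy shape: ret is set on the error path, then dropped at the end. */
static int mmap_buggy(int fail)
{
	int ret = 0;
	if (fail) {
		ret = -22;	/* -EINVAL */
		goto out;
	}
out:
	/* unlock would go here */
	(void)ret;	/* the bug: ret is computed but never returned */
	return 0;
}

/* Fixed shape: the common exit label propagates ret. */
static int mmap_fixed(int fail)
{
	int ret = 0;
	if (fail) {
		ret = -22;
		goto out;
	}
out:
	/* unlock would go here */
	return ret;
}
```

The buggy variant makes every failed mmap look successful to the caller, which then proceeds with an unmapped region.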
Re: [PATCH v4 0/3] uio: fix potential crash bug
On 2018/7/13 4:34, Hamish Martin wrote:
> Hi Xiubo,
>
> Tested-by: Hamish Martin
>
> I see these were already merged into Linus' tree, but I wanted to let you know that I tested v4.18-rc4 (which contains these three patches): the issue which led to my original series is still fixed, and this patch series of yours has caused no regression in our scenario. This was tested on a 64-bit PPC system (NXP T2081).

Hi Hamish,

Cool, and thanks very much for testing this.

BRs
Xiubo

> Thanks for resolving that.
>
> Regards,
> Hamish M.
>
> On 07/07/2018 02:05 PM, xiu...@redhat.com wrote:
>> From: Xiubo Li
>>
>> V2:
>> - resend it with some small fix
>> V3:
>> - switch to use request_threaded_irq
>> V4:
>> - remove useless checking code, Thanks Mike.
>> - Thanks very much for the review from Hamish and Mike.
>>
>> Xiubo Li (3):
>>   uio: use request_threaded_irq instead
>>   uio: change to use the mutex lock instead of the spin lock
>>   uio: fix crash after the device is unregistered
>>
>>  drivers/uio/uio.c          | 139 +
>>  include/linux/uio_driver.h |   2 +-
>>  2 files changed, 104 insertions(+), 37 deletions(-)
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/10 1:06, Mike Christie wrote:
> On 07/06/2018 08:28 PM, Xiubo Li wrote:
>> On 2018/7/7 2:23, Mike Christie wrote:
>>> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>>>  static irqreturn_t uio_interrupt(int irq, void *dev_id)
>>>>  {
>>>>  	struct uio_device *idev = (struct uio_device *)dev_id;
>>>> -	irqreturn_t ret = idev->info->handler(irq, idev->info);
>>>> +	irqreturn_t ret;
>>>> +
>>>> +	mutex_lock(&idev->info_lock);
>>>> +	if (!idev->info) {
>>>> +		ret = IRQ_NONE;
>>>> +		goto out;
>>>> +	}
>>>> +	ret = idev->info->handler(irq, idev->info);
>>>>  	if (ret == IRQ_HANDLED)
>>>>  		uio_event_notify(idev->info);
>>>> +out:
>>>> +	mutex_unlock(&idev->info_lock);
>>>>  	return ret;
>>>>  }
>>> Do you need the interrupt related changes in this patch and the first one?
>> Actually, the NULL check is not a must; we can remove it. But the lock/unlock is needed.
>>> When we do uio_unregister_device -> free_irq, does free_irq return when
>>> there are no longer running interrupt handlers that we requested? If
>>> that is not the case then I think we can hit a similar bug. We do:
>>>
>>> __uio_register_device -> device_register -> device's refcount goes to
>>> zero so we do -> uio_device_release -> kfree(idev)
>>>
>>> and if it is possible the interrupt handler could still run after
>>> free_irq then we would end up doing:
>>>
>>> uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory
>> I think this shouldn't happen, because free_irq does not return until
>> any executing interrupts for this IRQ have completed.
> If free_irq returns after executing interrupts and does not allow new
> executions, what is the lock protecting in uio_interrupt?

I meant idev->info->handler(irq, idev->info); it perhaps should be protected by the related lock in each driver.

Thanks,
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/10 0:40, Mike Christie wrote:
> On 07/06/2018 08:47 PM, Xiubo Li wrote:
>> On 2018/7/7 2:58, Mike Christie wrote:
>>> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>>>  void uio_event_notify(struct uio_info *info)
>>>>  {
>>>> -	struct uio_device *idev = info->uio_dev;
>>>> +	struct uio_device *idev;
>>>> +
>>>> +	if (!info)
>>>> +		return;
>>>> +
>>>> +	idev = info->uio_dev;
>>> For this one too, I am not sure if it is needed.
>>>
>>> uio_interrupt -> uio_event_notify. See other mail.
>>>
>>> driver XYZ -> uio_event_notify. I think drivers need to handle this
>>> and set some bits and/or perform some cleanup to make sure they are
>>> not calling uio_event_notify after it has called uio_unregister_device.
>>> The problem with the above test is if they do not, they could have
>>> called uio_unregister_device right after the info test, so you could
>>> still hit the problem.
>> When we are in tcmu_destroy_device(), if the netlink notify event to
>> userspace is not successful, TCMU will call the uio unregister, which
>> will set idev->info = NULL, without closing and deleting the device in
>> userspace. But TCMU could still queue cmds to the ring buffer, and then
>> uio_event_notify will be called.
> Before tcmu_destroy_device is called, LIO will have stopped new IO and
> waited for outstanding IO to be completed, so it would never be called
> after uio_unregister_device. If it does, it's a bug in LIO.

Yeah, so this should be fixed now. I only hit one crash that seemed related to this uio code, and that was with an older kernel.

Thanks,
BRs
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/7 2:58, Mike Christie wrote:
> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>  void uio_event_notify(struct uio_info *info)
>>  {
>> -	struct uio_device *idev = info->uio_dev;
>> +	struct uio_device *idev;
>> +
>> +	if (!info)
>> +		return;
>> +
>> +	idev = info->uio_dev;
> For this one too, I am not sure if it is needed.
>
> uio_interrupt -> uio_event_notify. See other mail.
>
> driver XYZ -> uio_event_notify. I think drivers need to handle this and
> set some bits and/or perform some cleanup to make sure they are not
> calling uio_event_notify after it has called uio_unregister_device. The
> problem with the above test is if they do not, they could have called
> uio_unregister_device right after the info test, so you could still hit
> the problem.

When we are in tcmu_destroy_device(), if the netlink notify event to userspace is not successful, TCMU will call the uio unregister, which will set idev->info = NULL, without closing and deleting the device in userspace. But TCMU could still queue cmds to the ring buffer, and then uio_event_notify will be called.

This would only happen when using idev->info, and currently there is no need to check it, so I will remove the check for now.

Thanks,
BRs
Re: [PATCH v3 3/3] uio: fix crash after the device is unregistered
On 2018/7/7 2:23, Mike Christie wrote:
> On 07/05/2018 09:57 PM, xiu...@redhat.com wrote:
>>  static irqreturn_t uio_interrupt(int irq, void *dev_id)
>>  {
>>  	struct uio_device *idev = (struct uio_device *)dev_id;
>> -	irqreturn_t ret = idev->info->handler(irq, idev->info);
>> +	irqreturn_t ret;
>> +
>> +	mutex_lock(&idev->info_lock);
>> +	if (!idev->info) {
>> +		ret = IRQ_NONE;
>> +		goto out;
>> +	}
>> +	ret = idev->info->handler(irq, idev->info);
>>  	if (ret == IRQ_HANDLED)
>>  		uio_event_notify(idev->info);
>> +out:
>> +	mutex_unlock(&idev->info_lock);
>>  	return ret;
>>  }
> Do you need the interrupt related changes in this patch and the first one?

Actually, the NULL check is not a must; we can remove it. But the lock/unlock is needed.

> When we do uio_unregister_device -> free_irq, does free_irq return when
> there are no longer running interrupt handlers that we requested? If
> that is not the case then I think we can hit a similar bug. We do:
>
> __uio_register_device -> device_register -> device's refcount goes to
> zero so we do -> uio_device_release -> kfree(idev)
>
> and if it is possible the interrupt handler could still run after
> free_irq then we would end up doing:
>
> uio_interrupt -> mutex_lock(&idev->info_lock) -> idev access freed memory.

I think this shouldn't happen, because free_irq does not return until any executing interrupts for this IRQ have completed.

Thanks,
BRs
Re: [PATCH v3 0/3] uio: fix potential crash bug
On 2018/7/6 11:31, Hamish Martin wrote:
> Hi Xiubo,
>
> Thanks for your patch that addresses the regression found with my earlier commit. I will take your code and run it in our scenario that showed the bug that led to my commit. Unfortunately I won't be able to get that done until mid-next week. I intend to report back to you by July 13th.

Hi Hamish,

Thanks very much for your quick review and reply. Take your time :-)

BRs
Xiubo

> Thanks,
> Hamish M.
>
> On 07/06/2018 02:57 PM, xiu...@redhat.com wrote:
>> From: Xiubo Li
>>
>> The V2 patch set may not have been posted successfully, so post it again with one new update.
>>
>> V2:
>> - resend it with some small fix
>> V3:
>> - switch to use request_threaded_irq
>>
>> Xiubo Li (3):
>>   uio: use request_threaded_irq instead
>>   uio: change to use the mutex lock instead of the spin lock
>>   uio: fix crash after the device is unregistered
>>
>>  drivers/uio/uio.c          | 151 ++---
>>  include/linux/uio_driver.h |   2 +-
>>  2 files changed, 115 insertions(+), 38 deletions(-)
Re: [PATCH 2/2] uio: fix crash after the device is unregistered
On 2018/7/6 4:56, Jann Horn wrote:
> On Thu, Jul 5, 2018 at 10:53 PM wrote:
>> From: Xiubo Li
>>
>> For the target_core_user use case, after the device is unregistered
>> it may still be opened in user space, and then the kernel will crash,
>> like:
>>
>> [...]
>>
>> Signed-off-by: Xiubo Li
>> ---
>>  drivers/uio/uio.c | 101 ++
>>  1 file changed, 86 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
>> index 33c3bfe..2b9268a 100644
>> --- a/drivers/uio/uio.c
>> +++ b/drivers/uio/uio.c
>> [...]
>> @@ -720,30 +775,46 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
>>
>>  	vma->vm_private_data = idev;
>>
>> +	mutex_lock(&idev->info_lock);
>> +	if (!idev->info) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>>  	mi = uio_find_mem_index(vma);
>> -	if (mi < 0)
>> -		return -EINVAL;
>> +	if (mi < 0) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>>
>>  	requested_pages = vma_pages(vma);
>>  	actual_pages = ((idev->info->mem[mi].addr & ~PAGE_MASK)
>>  			+ idev->info->mem[mi].size + PAGE_SIZE - 1) >> PAGE_SHIFT;
>> -	if (requested_pages > actual_pages)
>> -		return -EINVAL;
>> +	if (requested_pages > actual_pages) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>>
>>  	if (idev->info->mmap) {
>>  		ret = idev->info->mmap(idev->info, vma);
>> -		return ret;
>> +		goto out;
>>  	}
>>
>>  	switch (idev->info->mem[mi].memtype) {
>>  	case UIO_MEM_PHYS:
>> -		return uio_mmap_physical(vma);
>> +		ret = uio_mmap_physical(vma);
>> +		break;
>>  	case UIO_MEM_LOGICAL:
>>  	case UIO_MEM_VIRTUAL:
>> -		return uio_mmap_logical(vma);
>> +		ret = uio_mmap_logical(vma);
>> +		break;
>>  	default:
>> -		return -EINVAL;
>> +		ret = -EINVAL;
>>  	}
>> +
>> +out:
>> +	mutex_lock(&idev->info_lock);
>
> This is probably supposed to be mutex_unlock(...)?

Yeah, right, good catch :-) I had fixed this locally, but after building and testing I just forgot to amend it. Will fix it.

Thanks very much.
BRs
Re: [PATCH 1/2] uio: change to use the mutex lock instead of the spin lock
On 2018/7/6 0:33, Greg KH wrote:
> On Thu, Jul 05, 2018 at 12:27:27PM -0400, xiu...@redhat.com wrote:
>> From: Xiubo Li
>>
>> We are hitting a regression with the following commit:
>>
>> commit a93e7b331568227500186a465fee3c2cb5dffd1f
>> Author: Hamish Martin
>> Date:   Mon May 14 13:32:23 2018 +1200
>>
>>     uio: Prevent device destruction while fds are open
>>
>> The problem is the addition of spin_lock_irqsave in uio_write. This
>> leads to hitting uio_write -> copy_from_user -> _copy_from_user ->
>> might_fault and the logs filling up with sleeping warnings.
>>
>> I also noticed some uio drivers allocate memory, sleep, and grab
>> mutexes from callouts like open() and release, and uio is now doing
>> spin_lock_irqsave while calling them.
>>
>> Reported-by: Mike Christie
>> Signed-off-by: Xiubo Li
>> ---
>>  drivers/uio/uio.c          | 33 ++---
>>  include/linux/uio_driver.h |  2 +-
>>  2 files changed, 15 insertions(+), 20 deletions(-)
>
> Any specific reason you did not also cc: the author of the above patch
> to review this one?

Ah, no, sorry for that. I just used the following script to get the maintainer and the list, and it was too late at night here and I was a little dizzy, so I forgot that:

[root@gblock2 scsi]# ./scripts/get_maintainer.pl drivers/uio/
Greg Kroah-Hartman (maintainer:USERSPACE I/O (UIO))
linux-kernel@vger.kernel.org (open list)
[root@gblock2 scsi]#

All the others are my team members; because this is a high priority bug fix, I cced them to let them know the status of this patch set.

I will resend it right now. Thanks very much for reminding me of this.

BRs
Xiubo

> Please fix up and resend.
>
> greg k-h
Re: [PATCH] tcmu: Add fifo type waiter list support to avoid starvation
Hi Nic & Mike,

I will update this right after the issue reported by Bryant in his environment has been fixed.

Thanks,
BRs
Xiubo

On 2017年06月04日 12:11, Mike Christie wrote:
> On 05/04/2017 09:51 PM, lixi...@cmss.chinamobile.com wrote:
>> From: Xiubo Li <lixi...@cmss.chinamobile.com>
>>
>> The fifo type waiter list will hold the udevs which are waiting for
>> blocks from the data global pool. The unmap thread will try to feed
>> the first udevs in the waiter list; if the global free blocks
>> available are not enough, it will stop traversing the list and abort
>> waking up the others.
>>
>> Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com>
>> ---
>>  drivers/target/target_core_user.c | 82 +++
>>  1 file changed, 66 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
>> index 9045837..10c99e7 100644
>> --- a/drivers/target/target_core_user.c
>> +++ b/drivers/target/target_core_user.c
>> @@ -97,6 +97,7 @@ struct tcmu_hba {
>>  struct tcmu_dev {
>>  	struct list_head node;
>> +	struct list_head waiter;
>>
>>  	struct se_device se_dev;
>> @@ -123,7 +124,7 @@ struct tcmu_dev {
>>  	wait_queue_head_t wait_cmdr;
>>  	struct mutex cmdr_lock;
>>
>> -	bool waiting_global;
>> +	uint32_t waiting_blocks;
>>
>>  	uint32_t dbi_max;
>>  	uint32_t dbi_thresh;
>>  	DECLARE_BITMAP(data_bitmap, DATA_BLOCK_BITS);
>> @@ -165,6 +166,10 @@ struct tcmu_cmd {
>>  static DEFINE_MUTEX(root_udev_mutex);
>>  static LIST_HEAD(root_udev);
>>
>> +/* The data blocks global pool waiter list */
>> +static DEFINE_MUTEX(root_udev_waiter_mutex);
>
> Sorry for the delay.
>
> Could you just add a comment about the lock ordering. It will be
> helpful to new engineers/reviewers to check for errors.

>> +static LIST_HEAD(root_udev_waiter);
>> +
>>  static atomic_t global_db_count = ATOMIC_INIT(0);
>>
>>  static struct kmem_cache *tcmu_cmd_cache;
>> @@ -195,6 +200,11 @@ enum tcmu_multicast_groups {
>>  #define tcmu_cmd_set_dbi(cmd, index) ((cmd)->dbi[(cmd)->dbi_cur++] = (index))
>>  #define tcmu_cmd_get_dbi(cmd) ((cmd)->dbi[(cmd)->dbi_cur++])
>>
>> +static inline bool is_in_waiter_list(struct tcmu_dev *udev)
>> +{
>> +	return !!udev->waiting_blocks;
>> +}
>> +
>>  static void tcmu_cmd_free_data(struct tcmu_cmd *tcmu_cmd, uint32_t len)
>>  {
>>  	struct tcmu_dev *udev = tcmu_cmd->tcmu_dev;
>> @@ -250,8 +260,6 @@ static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
>>  {
>>  	int i;
>>
>> -	udev->waiting_global = false;
>> -
>>  	for (i = tcmu_cmd->dbi_cur; i < tcmu_cmd->dbi_cnt; i++) {
>>  		if (!tcmu_get_empty_block(udev, tcmu_cmd))
>>  			goto err;
>>  	}
>>  	return true;
>>
>>  err:
>> -	udev->waiting_global = true;
>> -	/* Try to wake up the unmap thread */
>> -	wake_up(&unmap_wait);
>> +	udev->waiting_blocks += tcmu_cmd->dbi_cnt - i;
>>  	return false;
>>  }
>> @@ -671,6 +677,7 @@ static inline size_t tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd,
>>  	while (!is_ring_space_avail(udev, tcmu_cmd, command_size, data_length)) {
>>  		int ret;
>> +		bool is_waiting_blocks = !!udev->waiting_blocks;
>
> You can use your helper is_in_waiter_list().

>>  		DEFINE_WAIT(__wait);
>>
>>  		prepare_to_wait(&udev->wait_cmdr, &__wait, TASK_INTERRUPTIBLE);
>> @@ -688,6 +695,18 @@ static inline size_t tcmu_cmd_get_cmd_size(struct tcmu_cmd *tcmu_cmd,
>>  			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
>>  		}
>>
>> +		/*
>> +		 * Try to insert the current udev to waiter list and
>> +		 * then wake up the unmap thread
>> +		 */
>> +		if (is_waiting_blocks) {
>> +			mutex_lock(&root_udev_waiter_mutex);
>> +			list_add_tail(&udev->waiter, &root_udev_waiter);
>> +			mutex_unlock(&root_udev_waiter_mutex);
>> +
>> +			wake_up(&unmap_wait);
>> +		}
>
> Is this supposed to go before the schedule_timeout call? If we are here
> and schedule_timeout returned non zero then our wait_cmdr has been
> woken up from the unmap thread because it freed some space, so we do
> not want to again add ourselves and call unmap again.

>> +
>>  		mutex_lock(&udev->cmdr_lock);
>>
>>  		/* We dropped cmdr_lock, cmd_head is stale */
>> @@ -902,8 +921,6 @@ static unsigned int tcmu_handle_completions(struct tcmu_dev *udev)
>>  	if (mb->cmd_tail == mb->cmd_head)
>>  		del_timer(&udev->timeout); /* no more pending cmds */
>>
>> -	wake_up(&udev->wait_cmdr);
>> -
>>  	return handled;
>>  }
>> @@ -996,7 +1013,17 @@ static int tcmu_irqcontrol(struct uio_info *info, s32 irq_on)
>>  	struct tcmu_dev *tcmu_dev = container_of(info, struct tcmu_dev, uio_info);
>>
>>  	mutex_lock(&tcmu_dev->cmdr_lock);
>> -	tcmu_handle_completions(tcmu_dev);
>> +	/*
>> +	 * If the current udev is also in wait
Re: [PATCH v6 2/2] tcmu: Add global data block pool support
[...]

>> +
>> +static bool tcmu_get_empty_blocks(struct tcmu_dev *udev,
>> +				  struct tcmu_cmd *tcmu_cmd,
>> +				  uint32_t blocks_needed)
>
> Can drop blocks_needed.

Will fix it.

[...]

>> -static void *tcmu_get_block_addr(struct tcmu_dev *udev, uint32_t dbi)
>> +static inline struct page *
>> +tcmu_get_block_page(struct tcmu_dev *udev, uint32_t dbi)
>>  {
>>  	return radix_tree_lookup(&udev->data_blocks, dbi);
>>  }
>> @@ -365,17 +417,20 @@ static int alloc_and_scatter_data_area(struct tcmu_dev *udev,
>
> We don't actually alloc the area here any more. Could probably rename
> it to be similar to gather_data_area.

Yes, will rename it to scatter_data_area.

[...]

>> @@ -616,6 +711,7 @@ static bool is_ring_space_avail(struct tcmu_dev *udev, size_t cmd_size, size_t d
>>  			, &iov_cnt, copy_to_data_area);
>>  	if (ret) {
>>  		pr_err("tcmu: alloc and scatter data failed\n");
>> +		mutex_unlock(&udev->cmdr_lock);
>
> I think here and in the error case below, you need to do a
> tcmu_cmd_free_data.

Yes, good catch. The tcmu_cmd_free_data is needed here to free the bits in udev->data_bitmap, and the unlock will be added in the first patch.

[...]

>> +static struct page *tcmu_try_get_block_page(struct tcmu_dev *udev, uint32_t dbi)
>> +{
>> +	struct page *page;
>> +	int ret;
>> +
>> +	mutex_lock(&udev->cmdr_lock);
>> +	page = tcmu_get_block_page(udev, dbi);
>> +	if (likely(page)) {
>> +		mutex_unlock(&udev->cmdr_lock);
>> +		return page;
>> +	}
>> +
>> +	/*
>> +	 * Normally it shouldn't be here:
>> +	 * Only when the userspace has touched the blocks which
>> +	 * are out of the tcmu_cmd's data iov[], and will return
>> +	 * one zeroed page.
>
> Is it a userspace bug when this happens? Do you know when it is
> occurring?

Since UIO maps the whole ring buffer to user space at the beginning, it is legal for userspace to access any block within the limits of the mapped ring area. But when this actually happens it will normally be a bug in userspace. Without this check the kernel would output many page fault dump traces. Maybe outputting some warning message here is a good idea; that would make it easier to debug for userspace.
[...]

>> @@ -1388,6 +1509,81 @@ static ssize_t tcmu_cmd_time_out_store(struct config_item *item, const char *pag
>>  	.tb_dev_attrib_attrs	= NULL,
>>  };
>>
>> +static int unmap_thread_fn(void *data)
>> +{
>> +	struct tcmu_dev *udev;
>> +	loff_t off;
>> +	uint32_t start, end, block;
>> +	struct page *page;
>> +	int i;
>> +
>> +	while (1) {
>> +		DEFINE_WAIT(__wait);
>> +
>> +		prepare_to_wait(&unmap_wait, &__wait, TASK_INTERRUPTIBLE);
>> +		schedule();
>> +		finish_wait(&unmap_wait, &__wait);
>> +
>> +		mutex_lock(&root_udev_mutex);
>> +		list_for_each_entry(udev, &root_udev, node) {
>> +			mutex_lock(&udev->cmdr_lock);
>> +
>> +			/* Try to complete the finished commands first */
>> +			tcmu_handle_completions(udev);
>> +
>> +			/* Skip the udevs waiting the global pool or in idle */
>> +			if (udev->waiting_global || !udev->dbi_thresh) {
>> +				mutex_unlock(&udev->cmdr_lock);
>> +				continue;
>> +			}
>> +
>> +			end = udev->dbi_max + 1;
>> +			block = find_last_bit(udev->data_bitmap, end);
>> +			if (block == udev->dbi_max) {
>> +				/*
>> +				 * The last bit is dbi_max, so there is
>> +				 * no need to shrink any blocks.
>> +				 */
>> +				mutex_unlock(&udev->cmdr_lock);
>> +				continue;
>> +			} else if (block == end) {
>> +				/* The current udev will goto idle state */
>> +				udev->dbi_thresh = start = 0;
>> +				udev->dbi_max = 0;
>> +			} else {
>> +				udev->dbi_thresh = start = block + 1;
>> +				udev->dbi_max = block;
>> +			}
>> +
>> +			/* Here will truncate the data area from off */
>> +			off = udev->data_off + start * DATA_BLOCK_SIZE;
>> +			unmap_mapping_range(udev->inode->i_mapping, off, 0, 1);
>> +
>> +			/* Release the block pages */
>> +			for (i = start; i < end; i++) {
>> +				page = radix_tree_delete(&udev->data_blocks, i);
>> +				if (page) {
>> +					__free_page(page);
>> +					atomic_dec(&global_db_count);
>> +				}
>> +			}
>> +			mutex_unlock(&udev->cmdr_lock);
>> +		}
>> +
Re: [PATCH v6 1/2] tcmu: Add dynamic growing data area feature support
On 2017年04月30日 13:48, Mike Christie wrote:
> On 04/26/2017 01:25 AM, lixi...@cmss.chinamobile.com wrote:
>>  	for_each_sg(data_sg, sg, data_nents, i) {
>> @@ -275,22 +371,26 @@ static void alloc_and_scatter_data_area(struct tcmu_dev *udev,
>>  		from = kmap_atomic(sg_page(sg)) + sg->offset;
>>  		while (sg_remaining > 0) {
>>  			if (block_remaining == 0) {
>> -				block = find_first_zero_bit(udev->data_bitmap,
>> -						DATA_BLOCK_BITS);
>>  				block_remaining = DATA_BLOCK_SIZE;
>> -				set_bit(block, udev->data_bitmap);
>> +				dbi = tcmu_get_empty_block(udev, );
>> +				if (dbi < 0)
>
> I know you fixed the missing kunmap_atomic here and the missing unlock
> in tcmu_queue_cmd_ring in the next patch, but I think normally people
> prefer that one patch does not add a bug and then the next patch fixes
> it.

Do you mean the following kmap_atomic()?

    from = kmap_atomic(sg_page(sg)) + sg->offset;

No new kmap/kunmap is introduced in this patch. This is the old code, and the kunmap is at the end of aasda() [alloc_and_scatter_data_area()]. In this initial patch the memory still comes from the slab cache, but the second patch that follows will convert it to use memory pages from the buddy allocator directly.

Thanks,
BRs
Xiubo Li
Re: how to unmap pages in an anonymous mmap?
On 2017年02月28日 03:32, Andy Grover wrote:
> On 02/26/2017 09:59 PM, Xiubo Li wrote:
>>> But, we likely don't want to release memory from the data area anyways
>>> while active, in any case. How about if we set a timer when active
>>> commands go to zero, and then reduce the data area to some minimum if
>>> no new cmds come in before the timer expires?
>>
>> If I understand correctly: for example, we have 1G (as the minimum)
>> data area and all blocks have been allocated and mapped to the
>> runner's vma, and then we extend it to 1G + 256M as needed. When there
>> are no active cmds and the timer expires, will it reduce the data area
>> back to 1G, and should it then release the reduced 256M data area's
>> memory?
>>
>> If so, after kfree()ing the blocks' memory, it should also try to
>> remove all the ptes which are mapping those pages (like using
>> try_to_unmap()), but something like try_to_unmap() is not exported for
>> modules.
>>
>> Without unmapping the kfree()ed pages' ptes as mentioned above, the
>> reduced 256M vma space couldn't be reused again by the runner process,
>> because the runner has already mapped that reduced vma space to some
>> old physical pages (so no new page fault will be triggered). Then
>> there will be a hole, and the hole will get bigger and bigger.
>>
>> Without unmapping the kfree()ed pages' ptes, the pages' reference
>> counts (page_ref_dec(), which were _inc()ed in the page fault)
>> couldn't be reduced back either.
>
> Let's ask people who will know...
>
> Hi linux-mm,
>
> TCM-User (drivers/target/target_core_user.c) currently uses vmalloc()ed
> memory to back a ring buffer that is mmaped by userspace. We want to
> move to dynamically mapping pages into this region, and also we'd like
> to unmap/free pages when idle. What's the right way to unmap? I see
> unmap_mapping_range() but that mentions an underlying file, which TCMU
> doesn't have. Or maybe zap_page_range()? But it's not exported.

Hi linux-mm,

For the TCMU case, the vm is not an anonymous mapping, and it still has a device file descriptor:

    mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, dev->fd, 0);

If we use unmap_mapping_range() to do the dynamic mapping, is that okay? Are there any other potential risks? Or must the mentioned 'underlying file' be a disk file?

Thanks very much,
BRs
Xiubo

> Any advice? Thanks in advance -- Regards -- Andy
Re: [PATCH] target/user: Add daynmic growing data area feature support
>> For now we will increase the data area size to 1G, and the cmd area
>> size to 128M. The tcmu-runner should mmap() about (128M + 1G) when
>> running, and TCMU will dynamically grow the data area from 0 to the
>> max 1G size.
>
> Cool. This is a good approach for an initial patch but this raises
> concerns about efficiently managing kernel memory usage -- the data
> area grows but never shrinks, and total possible usage increases per
> backstore. (What if there are 1000?) Any ideas how we could also
> improve these aspects of the design? (Global TCMU data area usage
> limit?)

Sorry for the misunderstanding about this on my part before.

If we can't get a feasible way from mm to deal with shrinking the memory, maybe a global TCMU data area usage limit is a good choice:

We can limit the global physical data area size to 2G by default, and export one sysfs file to configure this as needed (such as a 10G size with a possible 1000 targets). Then use one global radix tree to manage all the 2G of physical pages (which will grow from 0 to 2G). Each ring buffer will search its own data area bitmap, and if the current block is being reused, we should get the old page, which has already been mapped to the runner process, from the global radix tree; otherwise we should get one new page (from the global radix tree or the system MM). After getting the page, tcmu in the kernel will use it via kmap(), then free it with kunmap() and insert it into the global radix tree.

BRs
Xiubo
Re: [PATCH] target/user: Add daynmic growing data area feature support
Write throughput is pretty low at around 150 MB/s. What's the original write throughput without this patch? Is it also around 80 MB/s ? It is around 20-30 MB/s. Same fio args except using --rw=write. Got it. Thanks. BRs Xiubo
Re: [PATCH] target/user: Add daynmic growing data area feature support
On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote: From: Xiubo Li <lixi...@cmss.chinamobile.com> Currently for the TCMU, the ring buffer size is fixed to a 64K cmd area + 1M data area, and this will be a bottleneck for high iops.

Hi Xiubo, thanks for your work. daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops improvement it allows? Do you see the data area reaching its fully-allocated size? I tested this patch with Venky's tcmu-runner rbd aio patches, with one 10 gig iscsi session, and for pretty basic fio direct io (64-256K read/writes with a queue depth of 64, numjobs between 1 and 4) tests, read throughput goes from about 80 to 500 MB/s. Write throughput is pretty low at around 150 MB/s. I did not hit the fully allocated size. I did not drive a lot of IO though.

How about dealing with memory shrinking in a follow-up patch series? As the initial patch, we could set the cmd area size to 8MB and the data area size to 512MB. And this could work fine for most cases without using too much memory. On my similar test case using VMs (a low-iops case) with fio, -bs=[64K, 128K, 512K, 1M] -size=20G, -iodepth 1 -numjobs=10, the read bw increases from about 5200KB/s to about 6100KB/s, and the write bw increases from about 3000KB/s to about 3300KB/s. When bs < 64K (from the log, the maximum data length is 64K), the smaller it is, the closer the two bandwidths will be. But for all my test cases, the allocated size is far away from the full size too.

Thanks, BRs Xiubo
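For reference, the read test described above corresponds roughly to an fio invocation like the following (the target device path, runtime, and ioengine are assumptions, not taken from the original report):

```shell
fio --name=tcmu-read --filename=/dev/sdX --direct=1 --ioengine=libaio \
    --rw=read --bs=256k --iodepth=64 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

Swapping --rw=read for --rw=write gives the write case discussed above.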
Re: [PATCH] target/user: Add daynmic growing data area feature support
Cool. This is a good approach for an initial patch but this raises concerns about efficiently managing kernel memory usage -- the data area grows but never shrinks, and total possible usage increases per backstore. (What if there are 1000?) Any ideas how we could also improve these aspects of the design? (Global TCMU data area usage limit?)

Two ways in my mind: The first: How about setting a threshold cmd (a SHRINK cmd), something like the PAD cmd, to tell the userspace runner to try to shrink the memory?

Why should userspace need to know if the kernel is shrinking memory allocated to the data area? Userspace knows about memory described in iovecs for in-flight cmds; we shouldn't need its cooperation to free other allocated parts of the data area.

Yes. But we likely don't want to release memory from the data area anyway while active, in any case. How about if we set a timer when active commands go to zero, and then reduce the data area to some minimum if no new cmds come in before the timer expires?

If I understand correctly: for example, we have a 1G (as the minimum) data area and all blocks have been allocated and mapped into the runner's vma, then we extend it to 1G + 256M as needed. When there are no active cmds and the timer expires, will it reduce the data area back to 1G? And should it then release the reduced 256M data area's memory? If so, after kfree()ing the blocks' memory, it should also try to remove all the ptes which are mapping these pages (like using try_to_unmap()), but something like try_to_unmap() isn't exported to modules. Without unmapping the kfree()ed pages' ptes mentioned above, the reduced 256M vma space couldn't be reused by the runner process, because the runner has already mapped that vma space to some old physical pages (so no new page fault will be triggered). Then there will be a hole, and the hole will grow bigger and bigger.

Without unmapping the kfree()ed pages' ptes mentioned above, the pages' reference counts (page_ref_dec(), which were _inc()ed in the page fault) couldn't be dropped back either. When the runner gets the SHRINK cmd, it will try to remap uio0's ring buffer(?). Then the kernel will get a chance to shrink the memory. The second: Try to extend the data area by using /dev/uio1; we could remap the uio1 device when needed, so it will be easy to get a chance to shrink the memory in uio1.

Userspace should not need to remap the region in order for the kernel to free and unmap pages from the region. The only thing we need to watch out for is if blocks are referenced by in-flight cmds, we can't free them or userspace will segfault.

Yes, agree. So, if we wait until there are no in-flight cmds, then it follows that the kernel can free whatever it wants and userspace will not segfault. So, the problem is how to unmap the kfree()ed pages' ptes. BRs Xiubo

-- Andy
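Andy's idle-timer suggestion might look something like the sketch below (the struct fields and the tcmu_shrink_data_area() helper are hypothetical, and the 5-second timeout is arbitrary; it uses the timer callback signature of that kernel era):

```c
/* Illustrative only: arm a timer when the in-flight count hits zero;
 * if it fires while still idle, shrink the data area to its floor. */
struct tcmu_dev_sketch {
	atomic_t cmds_inflight;
	struct timer_list idle_timer;
};

static void tcmu_idle_timeout(unsigned long data)
{
	struct tcmu_dev_sketch *udev = (struct tcmu_dev_sketch *)data;

	/* Re-check: a command may have arrived after the timer was armed. */
	if (atomic_read(&udev->cmds_inflight) == 0)
		tcmu_shrink_data_area(udev);	/* hypothetical helper */
}

static void tcmu_cmd_completed(struct tcmu_dev_sketch *udev)
{
	if (atomic_dec_and_test(&udev->cmds_inflight))
		mod_timer(&udev->idle_timer, jiffies + 5 * HZ);
}
```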
Re: [PATCH] target/user: Add daynmic growing data area feature support
When N is bigger, the ratio will be smaller. If N >= 1, the ratio will be in [15/1024, 4/1024); for this, the ratio 15 : 1024 will be enough. But maybe some iscsi cmds carry no data, N == 0, so the ratio should be bigger. For now we will increase the data area size to 1G, and the cmd area size to 128M. The tcmu-runner should mmap() about (128M + 1G) when running and TCMU will dynamically grow the data area from 0 to the max 1G size.

Cool. This is a good approach for an initial patch but this raises concerns about efficiently managing kernel memory usage -- the data area grows but never shrinks, and total possible usage increases per backstore. (What if there are 1000?) Any ideas how we could also improve these aspects of the design? (Global TCMU data area usage limit?)

Two ways in my mind: The first: How about setting a threshold cmd (a SHRINK cmd), something like the PAD cmd, to tell the userspace runner to try to shrink the memory? When the runner gets the SHRINK cmd, it will try to remap uio0's ring buffer(?). Then the kernel will get a chance to shrink the memory. The second: Try to extend the data area by using /dev/uio1; we could remap the uio1 device when needed, so it will be easy to get a chance to shrink the memory in uio1. Maybe these are a little ugly; are there other more effective ways? Thanks, BRs Xiubo

The cmd area memory will be allocated through vmalloc(), and the data area's blocks will be allocated individually later when needed.
Re: [PATCH] uio: add UIO_MEM_CUSTOM support
buffer (uio0 --> map0). Currently TCMU uses a fixed small map area as the ring buffer, but this will be the bottleneck for high iops. Since we don't know how large is enough, the new scheme will use a fixed small ring buffer area (about 64M ~ 128M) + a dynamically "growing" ring buffer area (about 1.5G). The following code is in uio_mmap() in uio.c:

if (idev->info->mmap) {
	ret = idev->info->mmap(idev->info, vma);
	return ret;
}

switch (idev->info->mem[mi].memtype) {
case UIO_MEM_PHYS:
	return uio_mmap_physical(vma);
case UIO_MEM_LOGICAL:
case UIO_MEM_VIRTUAL:
	return uio_mmap_logical(vma);
default:
	return -EINVAL;
}

We already have the equivalent of a CUSTOM memtype because TCMU sets the info->mmap fn, overriding uio's default handling choices in favor of its own.

For a driver like TCMU, UIO_MEM_NONE could be used while implementing its own ->mmap() magic, instead of adding a new UIO_MEM_CUSTOM. But would the semantics be clearer using _CUSTOM instead of _NONE? Thanks, BRs Xiubo

HTH -- Regards -- Andy
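Concretely, "TCMU sets the info->mmap fn" means supplying a custom mmap plus a fault handler, along the lines of the sketch below (tcmu_lookup_block_page() is a hypothetical stand-in for the driver's block lookup; the fault signature matches pre-4.11 kernels):

```c
static int tcmu_sketch_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* Map pages in lazily: look up (or allocate) the backing page
	 * for this offset only when userspace first touches it. */
	struct page *page = tcmu_lookup_block_page(vma->vm_private_data,
						   vmf->pgoff);
	if (!page)
		return VM_FAULT_SIGBUS;
	get_page(page);
	vmf->page = page;
	return 0;
}

static const struct vm_operations_struct tcmu_sketch_vm_ops = {
	.fault = tcmu_sketch_fault,
};

static int tcmu_sketch_mmap(struct uio_info *info, struct vm_area_struct *vma)
{
	vma->vm_ops = &tcmu_sketch_vm_ops;
	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_private_data = info->priv;	/* driver state for the handler */
	return 0;
}
```

Because no pages are inserted at mmap() time, every access goes through the fault handler, which is what makes lazy growth (and, with PTE zapping, later shrinking) possible.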
Re: [PATCH] uio: add UIO_MEM_CUSTOM support
For example, TCMU will use the map area as the iSCSI commands & data ring buffer (uio0 --> map0). Currently TCMU uses a fixed small map area as the ring buffer, but this will be the bottleneck for high iops. Since we don't know how large is enough, the new scheme will use a fixed small ring buffer area (about 64M ~ 128M) + a dynamically "growing" ring buffer area (about 1.5G). The following code is in uio_mmap() in uio.c:

if (idev->info->mmap) {
	ret = idev->info->mmap(idev->info, vma);
	return ret;

Yes, just missed this return.

}

switch (idev->info->mem[mi].memtype) {
case UIO_MEM_PHYS:
	return uio_mmap_physical(vma);
case UIO_MEM_LOGICAL:
case UIO_MEM_VIRTUAL:
	return uio_mmap_logical(vma);
default:
	return -EINVAL;
}

We already have the equivalent of a CUSTOM memtype because TCMU sets the info->mmap fn, overriding uio's default handling choices in favor of its own.

HTH -- Regards -- Andy
Re: [PATCH] uio: add UIO_MEM_CUSTOM support
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index fba021f..6ca0ae0 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -708,6 +708,8 @@ static int uio_mmap(struct file *filep, struct vm_area_struct *vma)
 	case UIO_MEM_LOGICAL:
 	case UIO_MEM_VIRTUAL:
 		return uio_mmap_logical(vma);
+	case UIO_MEM_CUSTOM:
+		return 0;

How does this help?

For example, TCMU will use the map area as the iSCSI commands & data ring buffer (uio0 --> map0). Currently TCMU uses a fixed small map area as the ring buffer, but this will be the bottleneck for high iops. Since we don't know how large is enough, the new scheme will use a fixed small ring buffer area (about 64M ~ 128M) + a dynamically "growing" ring buffer area (about 1.5G). The fixed small area will use vmalloc() when initializing, and the dynamically "growing" area must use kmalloc() for some reasons.

...

Can you provide an update to the documentation in Documentation/driver-api/uio-howto.rst to show how to use this? Yes, I will update this. Thanks. BRs Xiubo

thanks, greg k-h
Re: [PATCH] target/user: Fix use-after-free cmd->se_cmd if the cmd is expired
Hi Mike, Thanks very much for your analysis.

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 2e33100..6396581 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -684,7 +684,6 @@ static int tcmu_check_expired_cmd(int id, void *p, void *data)
 	set_bit(TCMU_CMD_BIT_EXPIRED, &cmd->flags);
 	target_complete_cmd(cmd->se_cmd, SAM_STAT_CHECK_CONDITION);
-	cmd->se_cmd = NULL;

How did tcmu_handle_completion get to a point where it was accessing the se_cmd if the TCMU_CMD_BIT_EXPIRED bit was set? Were memory accesses out of order? No, even using -O3, because there is a memory dependency there? CPU1 set the TCMU_CMD_BIT_EXPIRED bit then cleared cmd->se_cmd, but CPU2 copied cmd->se_cmd to se_cmd and saw it was NULL but did not yet see the TCMU_CMD_BIT_EXPIRED bit set? Because the debug rpms for my kernel version were lost, the crash tools couldn't be used for a more accurate analysis.

It looks like, if you apply the above patch, the above function will call target_complete_cmd and tcmu_handle_completion will call it again, so we will have a double free issue. Maybe the best resolution is to move tcmu_handle_completion() between spin_lock(&udev->commands_lock) and spin_unlock(&udev->commands_lock)? Thanks. BRs Xiubo Li
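The proposed move could be sketched like this (illustrative only; the function body is simplified from the real completion path):

```c
/* Illustrative sketch: do the EXPIRED test and the se_cmd handoff under
 * commands_lock, so a timed-out command can never be completed twice. */
static void tcmu_complete_sketch(struct tcmu_dev *udev, struct tcmu_cmd *cmd)
{
	struct se_cmd *se_cmd = NULL;

	spin_lock(&udev->commands_lock);
	if (!test_bit(TCMU_CMD_BIT_EXPIRED, &cmd->flags)) {
		se_cmd = cmd->se_cmd;
		cmd->se_cmd = NULL;	/* hand off exactly once */
	}
	spin_unlock(&udev->commands_lock);

	if (se_cmd)
		target_complete_cmd(se_cmd, SAM_STAT_GOOD);
	/* if EXPIRED was set, the timeout path already completed it */
}
```

Taking the lock around both the flag test and the pointer handoff removes the window in which one CPU can observe the cleared se_cmd without yet observing the EXPIRED bit.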
[PATCH] blk-mq: remove the double hctx->numa_node init code.
The commit f14bbe77a96bb ("blk-mq: pass in suggested NUMA node to ->alloc_hctx()") has already computed and set the suggested NUMA node for the hctx, so this code here is redundant.

Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com>
---
 block/blk-mq.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c0622f..ab86716 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1767,33 +1767,18 @@ static int blk_mq_init_hw_queues(struct request_queue *q,
 	return 1;
 }
 
-static void blk_mq_init_cpu_queues(struct request_queue *q,
-				   unsigned int nr_hw_queues)
+static void blk_mq_init_cpu_queues(struct request_queue *q)
 {
 	unsigned int i;
 
 	for_each_possible_cpu(i) {
 		struct blk_mq_ctx *__ctx = per_cpu_ptr(q->queue_ctx, i);
-		struct blk_mq_hw_ctx *hctx;
 
 		memset(__ctx, 0, sizeof(*__ctx));
 		__ctx->cpu = i;
 		spin_lock_init(&__ctx->lock);
 		INIT_LIST_HEAD(&__ctx->rq_list);
 		__ctx->queue = q;
-
-		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
-			continue;
-
-		hctx = q->mq_ops->map_queue(q, i);
-
-		/*
-		 * Set local node, IFF we have more than one hw queue. If
-		 * not, we remain on the home node of the device
-		 */
-		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
-			hctx->numa_node = local_memory_node(cpu_to_node(i));
 	}
 }
 
@@ -2046,7 +2031,7 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	if (set->ops->complete)
 		blk_queue_softirq_done(q, set->ops->complete);
 
-	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
+	blk_mq_init_cpu_queues(q);
 
 	if (blk_mq_init_hw_queues(q, set))
 		goto err_hctxs;
-- 
1.8.3.1
[PATCH v2] attribute_container: Fix typo
The 't' in "function" was missing; this patch fixes the typo: s/funcion/function/g

Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com>
---
Changes for V2:
- Add changelog text.

 drivers/base/attribute_container.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/attribute_container.c b/drivers/base/attribute_container.c
index 2ba4cac..95e3ef8 100644
--- a/drivers/base/attribute_container.c
+++ b/drivers/base/attribute_container.c
@@ -243,7 +243,7 @@ attribute_container_remove_device(struct device *dev,
  * @dev: The generic device to run the trigger for
  * @fn the function to execute for each classdev.
  *
- * This funcion is for executing a trigger when you need to know both
+ * This function is for executing a trigger when you need to know both
  * the container and the classdev. If you only care about the
  * container, then use attribute_container_trigger() instead.
  */
-- 
1.8.3.1
Re: [PATCH] attribute_container: Fix typo
On 31/05/2016 23:26, Greg KH wrote: On Tue, May 31, 2016 at 04:17:15PM +0800, Xiubo Li wrote: Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com> I can't take patches without any changelog text, sorry. My mistake, I will send the v2 one. Thanks.
[PATCH] attribute_container: Fix typo
Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com>
---
 drivers/base/attribute_container.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/attribute_container.c b/drivers/base/attribute_container.c
index 2ba4cac..95e3ef8 100644
--- a/drivers/base/attribute_container.c
+++ b/drivers/base/attribute_container.c
@@ -243,7 +243,7 @@ attribute_container_remove_device(struct device *dev,
  * @dev: The generic device to run the trigger for
  * @fn the function to execute for each classdev.
  *
- * This funcion is for executing a trigger when you need to know both
+ * This function is for executing a trigger when you need to know both
  * the container and the classdev. If you only care about the
  * container, then use attribute_container_trigger() instead.
  */
-- 
1.8.3.1
[PATCHv2 2/3] regcache: Introduce the index parsing API by stride order
Here we introduce regcache_get_index_by_order() for the regmap cache, which uses the register stride order and bit rotation to improve performance.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/internal.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index c22b04b..5c79526 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -273,4 +273,10 @@ static inline unsigned int regmap_get_offset(const struct regmap *map,
 	return index * map->reg_stride;
 }
 
+static inline unsigned int regcache_get_index_by_order(const struct regmap *map,
+						       unsigned int reg)
+{
+	return reg >> map->reg_stride_order;
+}
+
 #endif
-- 
1.8.3.1

-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCHv2 3/3] regcache: flat: Introduce register stride order
Here we introduce regcache_flat_get_index(), which, using the register stride order and bit rotation, will save some memory space for the flat cache. Though this also loses some access performance, since bit rotation is used to get the index of the cache array, this can be ignored for memory I/O accesses.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache-flat.c | 20 +++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c
index 686c9e0..3ee7255 100644
--- a/drivers/base/regmap/regcache-flat.c
+++ b/drivers/base/regmap/regcache-flat.c
@@ -16,20 +16,30 @@
 
 #include "internal.h"
 
+static inline unsigned int regcache_flat_get_index(const struct regmap *map,
+						   unsigned int reg)
+{
+	return regcache_get_index_by_order(map, reg);
+}
+
 static int regcache_flat_init(struct regmap *map)
 {
 	int i;
 	unsigned int *cache;
 
-	map->cache = kcalloc(map->max_register + 1, sizeof(unsigned int),
-			     GFP_KERNEL);
+	if (!map || map->reg_stride_order < 0)
+		return -EINVAL;
+
+	map->cache = kcalloc(regcache_flat_get_index(map, map->max_register)
+			     + 1, sizeof(unsigned int), GFP_KERNEL);
 	if (!map->cache)
 		return -ENOMEM;
 
 	cache = map->cache;
 	for (i = 0; i < map->num_reg_defaults; i++)
-		cache[map->reg_defaults[i].reg] = map->reg_defaults[i].def;
+		cache[regcache_flat_get_index(map, map->reg_defaults[i].reg)] =
+			map->reg_defaults[i].def;
 
 	return 0;
 }
@@ -47,7 +57,7 @@ static int regcache_flat_read(struct regmap *map,
 {
 	unsigned int *cache = map->cache;
 
-	*value = cache[reg];
+	*value = cache[regcache_flat_get_index(map, reg)];
 
 	return 0;
 }
@@ -57,7 +67,7 @@ static int regcache_flat_write(struct regmap *map, unsigned int reg,
 {
 	unsigned int *cache = map->cache;
 
-	cache[reg] = value;
+	cache[regcache_flat_get_index(map, reg)] = value;
 
 	return 0;
 }
-- 
1.8.3.1
[PATCHv2 0/3] regmap: Introduce register stride order
Please ignore the V1 series.

Changed in V2:
- Introduce the register stride order while continuing to support non-power-of-two strides.

Xiubo Li (3):
  regmap: core: Introduce register stride order
  regcache: Introduce the index parsing API by stride order
  regcache: flat: Introduce register stride order

 drivers/base/regmap/internal.h      | 16
 drivers/base/regmap/regcache-flat.c | 20 +++-
 drivers/base/regmap/regmap.c        | 19 +--
 3 files changed, 44 insertions(+), 11 deletions(-)

--
1.8.3.1
[PATCHv2 1/3] regmap: core: Introduce register stride order
Since the register stride should always be a power of two, and a bit shift is much faster than multiplication and division, introduce the stride order and use a shift to get the offset of a register from its index, improving performance.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/internal.h | 10 ++
 drivers/base/regmap/regmap.c   | 19 +--
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index 3df9770..c22b04b 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -110,6 +110,7 @@ struct regmap {
 	/* number of bits to (left) shift the reg value when formatting*/
 	int reg_shift;
 	int reg_stride;
+	int reg_stride_order;
 
 	/* regcache specific members */
 	const struct regcache_ops *cache_ops;
@@ -263,4 +264,13 @@ static inline const char *regmap_name(const struct regmap *map)
 	return map->name;
 }
 
+static inline unsigned int regmap_get_offset(const struct regmap *map,
+					     unsigned int index)
+{
+	if (map->reg_stride_order >= 0)
+		return index << map->reg_stride_order;
+	else
+		return index * map->reg_stride;
+}
+
 #endif
diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index ee54e84..29d526e 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -638,6 +639,10 @@ struct regmap *__regmap_init(struct device *dev,
 		map->reg_stride = config->reg_stride;
 	else
 		map->reg_stride = 1;
+	if (is_power_of_2(map->reg_stride))
+		map->reg_stride_order = ilog2(map->reg_stride);
+	else
+		map->reg_stride_order = -1;
 	map->use_single_read = config->use_single_rw || !bus || !bus->read;
 	map->use_single_write = config->use_single_rw || !bus || !bus->write;
 	map->can_multi_write = config->can_multi_write && bus && bus->write;
@@ -1308,7 +1313,7 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
 	if (map->writeable_reg)
 		for (i = 0; i < val_len / map->format.val_bytes; i++)
 			if (!map->writeable_reg(map->dev,
-						reg + (i * map->reg_stride)))
+						reg + regmap_get_offset(map, i)))
 				return -EINVAL;
 
 	if (!map->cache_bypass && map->format.parse_val) {
@@ -1316,7 +1321,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
 		int val_bytes = map->format.val_bytes;
 		for (i = 0; i < val_len / val_bytes; i++) {
 			ival = map->format.parse_val(val + (i * val_bytes));
-			ret = regcache_write(map, reg + (i * map->reg_stride),
+			ret = regcache_write(map,
+					     reg + regmap_get_offset(map, i),
 					     ival);
 			if (ret) {
 				dev_err(map->dev,
@@ -1846,8 +1852,9 @@ int regmap_bulk_write(struct regmap *map, unsigned int reg, const void *val,
 				goto out;
 			}
 
-			ret = _regmap_write(map, reg + (i * map->reg_stride),
-					    ival);
+			ret = _regmap_write(map,
+					    reg + regmap_get_offset(map, i),
+					    ival);
 			if (ret != 0)
 				goto out;
 		}
@@ -2416,7 +2423,7 @@ int regmap_raw_read(struct regmap *map, unsigned int reg, void *val,
 		 * cost as we expect to hit the cache.
 		 */
 		for (i = 0; i < val_count; i++) {
-			ret = _regmap_read(map, reg + (i * map->reg_stride),
+			ret = _regmap_read(map, reg + regmap_get_offset(map, i),
 					   );
 			if (ret != 0)
 				goto out;
@@ -2568,7 +2575,7 @@ int regmap_bulk_read(struct regmap *map, unsigned int reg, void *val,
 	} else {
 		for (i = 0; i < val_count; i++) {
 			unsigned int ival;
-			ret = regmap_read(map, reg + (i * map->reg_stride),
+			ret = regmap_read(map, reg + regmap_get_offset(map, i),
 					  );
 			if (ret != 0)
[PATCHv2 2/3] regcache: Introduce the index parsing API by stride order
Introduce regcache_get_index_by_order() for the regmap cache, which uses the register stride order and a bit shift to improve performance.

Signed-off-by: Xiubo Li <lixi...@cmss.chinamobile.com>
---
 drivers/base/regmap/internal.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index c22b04b..5c79526 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -273,4 +273,10 @@ static inline unsigned int regmap_get_offset(const struct regmap *map,
 	return index * map->reg_stride;
 }
 
+static inline unsigned int regcache_get_index_by_order(const struct regmap *map,
+						       unsigned int reg)
+{
+	return reg >> map->reg_stride_order;
+}
+
 #endif
--
1.8.3.1
Re: [PATCH] regmap: flat: introduce register striding to save some memories
On 31/12/2015 01:58, Mark Brown wrote:
> On Fri, Dec 18, 2015 at 04:59:38PM +0800, lixi...@cmss.chinamobile.com wrote:
>
> I think we'll need to continue supporting non power of two strides so
> an unconditional conversion to shifts might be an issue - some weird
> DSP probably does that.

Yes, agreed. IMO this won't happen with MMIO, and for devices using MMIO the register stride should be a power of two. Are there cases I have missed?

> DSPs exposed via I2C and SPI are the main things I'm worried about.
> It's fairly common for DSPs to have unusual word sizes including
> things like three bytes.

Yes, in that case non-power-of-two strides should still be supported. Thanks for your input, and I will think this over carefully.

BRs
Xiubo Li
[PATCH 3/3] regcache: flat: Introduce regcache_get_index()
Introduce regcache_get_index(), which uses the register stride order and a bit shift to save some memory for the flat cache. This loses a little access performance, since a shift is now needed to compute the index into the cache array, but that cost is negligible compared with memory I/O.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache-flat.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c
index 686c9e0..218cf39 100644
--- a/drivers/base/regmap/regcache-flat.c
+++ b/drivers/base/regmap/regcache-flat.c
@@ -21,15 +21,16 @@ static int regcache_flat_init(struct regmap *map)
 	int i;
 	unsigned int *cache;
 
-	map->cache = kcalloc(map->max_register + 1, sizeof(unsigned int),
-			     GFP_KERNEL);
+	map->cache = kcalloc(regcache_get_index(map, map->max_register) + 1,
+			     sizeof(unsigned int), GFP_KERNEL);
 	if (!map->cache)
 		return -ENOMEM;
 
 	cache = map->cache;
 
 	for (i = 0; i < map->num_reg_defaults; i++)
-		cache[map->reg_defaults[i].reg] = map->reg_defaults[i].def;
+		cache[regcache_get_index(map, map->reg_defaults[i].reg)] =
+			map->reg_defaults[i].def;
 
 	return 0;
 }
@@ -47,7 +48,7 @@ static int regcache_flat_read(struct regmap *map,
 {
 	unsigned int *cache = map->cache;
 
-	*value = cache[reg];
+	*value = cache[regcache_get_index(map, reg)];
 
 	return 0;
 }
@@ -57,7 +58,7 @@ static int regcache_flat_write(struct regmap *map, unsigned int reg,
 {
 	unsigned int *cache = map->cache;
 
-	cache[reg] = value;
+	cache[regcache_get_index(map, reg)] = value;
 
 	return 0;
 }
--
1.8.3.1
[PATCH 1/3] regmap: core: Introduce register stride order
Since the register stride should always be a power of two, and a bit shift is much faster than multiplication and division, introduce the stride order and use a shift to get the offset of a register from its index, improving performance.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/internal.h | 7 +++
 drivers/base/regmap/regmap.c   | 15 +--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index 3df9770..d43784e 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -110,6 +110,7 @@ struct regmap {
 	/* number of bits to (left) shift the reg value when formatting*/
 	int reg_shift;
 	int reg_stride;
+	int reg_stride_order;
 
 	/* regcache specific members */
 	const struct regcache_ops *cache_ops;
@@ -263,4 +264,10 @@ static inline const char *regmap_name(const struct regmap *map)
 	return map->name;
 }
 
+static inline unsigned int regmap_get_offset(const struct regmap *map,
+					     unsigned int index)
+{
+	return index << map->reg_stride_order;
+}
+
 #endif
diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index ee54e84..9890d27 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -638,6 +638,7 @@ struct regmap *__regmap_init(struct device *dev,
 		map->reg_stride = config->reg_stride;
 	else
 		map->reg_stride = 1;
+	map->reg_stride_order = get_order(map->reg_stride);
 	map->use_single_read = config->use_single_rw || !bus || !bus->read;
 	map->use_single_write = config->use_single_rw || !bus || !bus->write;
 	map->can_multi_write = config->can_multi_write && bus && bus->write;
@@ -1308,7 +1309,7 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
 	if (map->writeable_reg)
 		for (i = 0; i < val_len / map->format.val_bytes; i++)
 			if (!map->writeable_reg(map->dev,
-						reg + (i * map->reg_stride)))
+						reg + regmap_get_offset(map, i)))
 				return -EINVAL;
 
 	if (!map->cache_bypass && map->format.parse_val) {
@@ -1316,7 +1317,8 @@ int _regmap_raw_write(struct regmap *map, unsigned int reg,
 		int val_bytes = map->format.val_bytes;
 		for (i = 0; i < val_len / val_bytes; i++) {
 			ival = map->format.parse_val(val + (i * val_bytes));
-			ret = regcache_write(map, reg + (i * map->reg_stride),
+			ret = regcache_write(map,
+					     reg + regmap_get_offset(map, i),
 					     ival);
 			if (ret) {
 				dev_err(map->dev,
@@ -1846,8 +1848,9 @@ int regmap_bulk_write(struct regmap *map, unsigned int reg, const void *val,
 				goto out;
 			}
 
-			ret = _regmap_write(map, reg + (i * map->reg_stride),
-					    ival);
+			ret = _regmap_write(map,
+					    reg + regmap_get_offset(map, i),
+					    ival);
 			if (ret != 0)
 				goto out;
 		}
@@ -2416,7 +2419,7 @@ int regmap_raw_read(struct regmap *map, unsigned int reg, void *val,
 		 * cost as we expect to hit the cache.
 		 */
 		for (i = 0; i < val_count; i++) {
-			ret = _regmap_read(map, reg + (i * map->reg_stride),
+			ret = _regmap_read(map, reg + regmap_get_offset(map, i),
 					   );
 			if (ret != 0)
 				goto out;
@@ -2568,7 +2571,7 @@ int regmap_bulk_read(struct regmap *map, unsigned int reg, void *val,
 	} else {
 		for (i = 0; i < val_count; i++) {
 			unsigned int ival;
-			ret = regmap_read(map, reg + (i * map->reg_stride),
+			ret = regmap_read(map, reg + regmap_get_offset(map, i),
 					  );
 			if (ret != 0)
 				return ret;
--
1.8.3.1
[PATCH 0/3] Introduce reg stride order
Xiubo Li (3):
  regmap: core: Introduce register stride order
  regcache: Introduce the index parsing API
  regcache: flat: Introduce regcache_get_index()

 drivers/base/regmap/internal.h      | 13 +
 drivers/base/regmap/regcache-flat.c | 11 ++-
 drivers/base/regmap/regmap.c        | 15 +--
 3 files changed, 28 insertions(+), 11 deletions(-)

--
1.8.3.1
[PATCH 2/3] regcache: Introduce the index parsing API
Introduce regcache_get_index() for the regmap cache, which uses the register stride order and a bit shift.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/internal.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/base/regmap/internal.h b/drivers/base/regmap/internal.h
index d43784e..60699e6 100644
--- a/drivers/base/regmap/internal.h
+++ b/drivers/base/regmap/internal.h
@@ -270,4 +270,10 @@ static inline unsigned int regmap_get_offset(const struct regmap *map,
 	return index << map->reg_stride_order;
 }
 
+static inline unsigned int regcache_get_index(const struct regmap *map,
+					      unsigned int reg)
+{
+	return reg >> map->reg_stride_order;
+}
+
 #endif
--
1.8.3.1
[PATCH] regmap: use IS_ALIGNED instead of % to improve the performance
The stride value should always be a power of two, so we can use a bitmask check via IS_ALIGNED() instead of % to improve performance.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regmap.c | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index d27fe2f..ee54e84 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -1607,7 +1607,7 @@ int regmap_write(struct regmap *map, unsigned int reg, unsigned int val)
 {
 	int ret;
 
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	map->lock(map->lock_arg);
@@ -1634,7 +1634,7 @@ int regmap_write_async(struct regmap *map, unsigned int reg, unsigned int val)
 {
 	int ret;
 
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	map->lock(map->lock_arg);
@@ -1808,7 +1808,7 @@ int regmap_bulk_write(struct regmap *map, unsigned int reg, const void *val,
 	if (map->bus && !map->format.parse_inplace)
 		return -EINVAL;
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	/*
@@ -2077,7 +2077,7 @@ static int _regmap_multi_reg_write(struct regmap *map,
 		int reg = regs[i].reg;
 		if (!map->writeable_reg(map->dev, reg))
 			return -EINVAL;
-		if (reg % map->reg_stride)
+		if (!IS_ALIGNED(reg, map->reg_stride))
 			return -EINVAL;
 	}
 
@@ -2227,7 +2227,7 @@ int regmap_raw_write_async(struct regmap *map, unsigned int reg,
 	if (val_len % map->format.val_bytes)
 		return -EINVAL;
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	map->lock(map->lock_arg);
@@ -2354,7 +2354,7 @@ int regmap_read(struct regmap *map, unsigned int reg, unsigned int *val)
 {
 	int ret;
 
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	map->lock(map->lock_arg);
@@ -2390,7 +2390,7 @@ int regmap_raw_read(struct regmap *map, unsigned int reg, void *val,
 		return -EINVAL;
 	if (val_len % map->format.val_bytes)
 		return -EINVAL;
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 	if (val_count == 0)
 		return -EINVAL;
@@ -2508,7 +2508,7 @@ int regmap_bulk_read(struct regmap *map, unsigned int reg, void *val,
 	size_t val_bytes = map->format.val_bytes;
 	bool vol = regmap_volatile_range(map, reg, val_count);
 
-	if (reg % map->reg_stride)
+	if (!IS_ALIGNED(reg, map->reg_stride))
 		return -EINVAL;
 
 	if (map->bus && map->format.parse_inplace && (vol ||
 	    map->cache_type == REGCACHE_NONE)) {
--
1.8.3.1
[PATCH] regmap: flat: introduce register striding to save some memory
Throughout the regcache-flat code, the raw register address offsets are used to index the flat cache array, which wastes memory. For example, on a 64-bit platform, device A has three registers:

  register offsets: {0x00, 0x08, 0x10}
  max_register: 0x10
  register stride: 8

The size of the flat cache is then:

  (max_register + 1) * sizeof(unsigned int) = (0x10 + 1) * 4 = 68 bytes

Since map->reg_stride has been introduced and every register address is a multiple of it, the address offset divided by the stride can be used as the index instead. The flat cache then shrinks to:

  (max_register / reg_stride + 1) * sizeof(unsigned int) = (0x10 / 8 + 1) * 4 = 12 bytes

The bigger the stride value, the more memory the old scheme wastes, so indexing by stride saves memory for the system.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache-flat.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/base/regmap/regcache-flat.c b/drivers/base/regmap/regcache-flat.c
index 686c9e0..160d636 100644
--- a/drivers/base/regmap/regcache-flat.c
+++ b/drivers/base/regmap/regcache-flat.c
@@ -19,17 +19,19 @@ static int regcache_flat_init(struct regmap *map)
 {
 	int i;
-	unsigned int *cache;
+	unsigned int index, *cache;

-	map->cache = kcalloc(map->max_register + 1, sizeof(unsigned int),
-			     GFP_KERNEL);
+	map->cache = kcalloc(map->max_register / map->reg_stride + 1,
+			     sizeof(unsigned int), GFP_KERNEL);
 	if (!map->cache)
 		return -ENOMEM;

 	cache = map->cache;
-	for (i = 0; i < map->num_reg_defaults; i++)
-		cache[map->reg_defaults[i].reg] = map->reg_defaults[i].def;
+	for (i = 0; i < map->num_reg_defaults; i++) {
+		index = map->reg_defaults[i].reg / map->reg_stride;
+		cache[index] = map->reg_defaults[i].def;
+	}

 	return 0;
 }
@@ -45,9 +47,10 @@ static int regcache_flat_exit(struct regmap *map)
 static int regcache_flat_read(struct regmap *map,
			      unsigned int reg, unsigned int *value)
 {
+	unsigned int index = reg / map->reg_stride;
 	unsigned int *cache = map->cache;

-	*value = cache[reg];
+	*value = cache[index];

 	return 0;
 }
@@ -55,9 +58,10 @@ static int regcache_flat_read(struct regmap *map,
 static int regcache_flat_write(struct regmap *map, unsigned int reg,
			       unsigned int value)
 {
+	unsigned int index = reg / map->reg_stride;
 	unsigned int *cache = map->cache;

-	cache[reg] = value;
+	cache[index] = value;

 	return 0;
 }
--
1.8.3.1
[PATCHv2 1/2] regmap: cache: Add warning info for the cache check
If no cache is used by a driver, the register defaults and the raw register defaults are not needed any more. Check for this and print a warning.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c
index 1c0210a..d4e96dd 100644
--- a/drivers/base/regmap/regcache.c
+++ b/drivers/base/regmap/regcache.c
@@ -100,15 +100,19 @@ int regcache_init(struct regmap *map, const struct regmap_config *config)
 	int i;
 	void *tmp_buf;

-	for (i = 0; i < config->num_reg_defaults; i++)
-		if (config->reg_defaults[i].reg % map->reg_stride)
-			return -EINVAL;
-
 	if (map->cache_type == REGCACHE_NONE) {
+		if (config->reg_defaults || config->num_reg_defaults_raw)
+			dev_warn(map->dev,
+				 "No cache used with register defaults set!\n");
+
 		map->cache_bypass = true;
 		return 0;
 	}

+	for (i = 0; i < config->num_reg_defaults; i++)
+		if (config->reg_defaults[i].reg % map->reg_stride)
+			return -EINVAL;
+
 	for (i = 0; i < ARRAY_SIZE(cache_types); i++)
 		if (cache_types[i]->type == map->cache_type)
 			break;
--
1.8.3.1
[PATCHv2 0/2] regmap: cache: Add invalid cache check warnings
Changed in V2:
- Correct some words in the commit comment and logs.
- Add the raw register defaults check.
- Add [PATCHv2 2/2]

Xiubo Li (2):
  regmap: cache: Add warning info for the cache check
  regmap: cache: Move the num_reg_defaults check as early as possible

 drivers/base/regmap/regcache.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

--
1.8.3.1
[PATCHv2 2/2] regmap: cache: Move the num_reg_defaults check as early as possible
If the register defaults are provided by the driver but the number of defaults is mistakenly left unset, just return an error with a prompt message. Do this as early as possible, so there is no need to verify the register defaults' stride or run the other code that follows.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c
index d4e96dd..348be3a 100644
--- a/drivers/base/regmap/regcache.c
+++ b/drivers/base/regmap/regcache.c
@@ -109,6 +109,12 @@ int regcache_init(struct regmap *map, const struct regmap_config *config)
 		return 0;
 	}

+	if (config->reg_defaults && !config->num_reg_defaults) {
+		dev_err(map->dev,
+			"Register defaults are set without the number!\n");
+		return -EINVAL;
+	}
+
 	for (i = 0; i < config->num_reg_defaults; i++)
 		if (config->reg_defaults[i].reg % map->reg_stride)
 			return -EINVAL;
@@ -142,8 +148,6 @@ int regcache_init(struct regmap *map, const struct regmap_config *config)
 	 * a copy of it.
 	 */
 	if (config->reg_defaults) {
-		if (!map->num_reg_defaults)
-			return -EINVAL;
 		tmp_buf = kmemdup(config->reg_defaults, map->num_reg_defaults *
 				  sizeof(struct reg_default), GFP_KERNEL);
 		if (!tmp_buf)
--
1.8.3.1
Re: [PATCH] regmap: cache: Add warning info for the cache check
On 10/12/2015 20:45, Charles Keepax wrote:
> On Thu, Dec 10, 2015 at 10:40:53AM +0800, Xiubo Li wrote:
>> If there is no cache used for the drivers, the register drfaults
>
> s/drfaults/defaults/

Yes, thanks.

>> are not need any more. This patch will check this and print a warning.
>>
>> Signed-off-by: Xiubo Li
>> ---
>>  drivers/base/regmap/regcache.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c
>> index 1c0210a..bdcd401 100644
>> --- a/drivers/base/regmap/regcache.c
>> +++ b/drivers/base/regmap/regcache.c
>> @@ -100,15 +100,19 @@ int regcache_init(struct regmap *map, const struct regmap_config *config)
>>  	int i;
>>  	void *tmp_buf;
>>
>> -	for (i = 0; i < config->num_reg_defaults; i++)
>> -		if (config->reg_defaults[i].reg % map->reg_stride)
>> -			return -EINVAL;
>> -
>>  	if (map->cache_type == REGCACHE_NONE) {
>> +		if (config->num_reg_defaults)
>> +			dev_warn(map->dev,
>> +				 "No cache used with register defualts set!\n");
>
> s/defualts/defaults/

See the next version.

Thanks,
BRs
Xiubo

> Thanks,
> Charles
[PATCH] regmap: cache: Add warning info for the cache check
If there is no cache used for the drivers, the register drfaults are not need any more. This patch will check this and print a warning.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regcache.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/base/regmap/regcache.c b/drivers/base/regmap/regcache.c
index 1c0210a..bdcd401 100644
--- a/drivers/base/regmap/regcache.c
+++ b/drivers/base/regmap/regcache.c
@@ -100,15 +100,19 @@ int regcache_init(struct regmap *map, const struct regmap_config *config)
 	int i;
 	void *tmp_buf;

-	for (i = 0; i < config->num_reg_defaults; i++)
-		if (config->reg_defaults[i].reg % map->reg_stride)
-			return -EINVAL;
-
 	if (map->cache_type == REGCACHE_NONE) {
+		if (config->num_reg_defaults)
+			dev_warn(map->dev,
+				 "No cache used with register defualts set!\n");
+
 		map->cache_bypass = true;
 		return 0;
 	}

+	for (i = 0; i < config->num_reg_defaults; i++)
+		if (config->reg_defaults[i].reg % map->reg_stride)
+			return -EINVAL;
+
 	for (i = 0; i < ARRAY_SIZE(cache_types); i++)
 		if (cache_types[i]->type == map->cache_type)
 			break;
--
1.8.3.1
Re: [PATCH] regmap: speed up the regcache_init()
On 09/12/2015 23:05, Mark Brown wrote:
> On Wed, Dec 09, 2015 at 11:17:22AM +0800, Xiubo Li wrote:
>> Yes, usually when the register cache is not used, the number of the defaults should be zero, but some drivers like drv2267.c/led_lp8860.c will add the default register values even though the cache type is REGCACHE_NONE, for some reasons.
>
> I can't find either of those in the kernel tree...

They can be found in the for-next branch of the regmap tree.

>> This patch may not be the best, but it will be a bit meaningful for some drivers like drv2267.c/led_lp8860.c for now.
>
> TBH if we're going to do something here it might be as well to print a warning if something is providing register defaults but no cache.

Yes, agreed. I will enhance this later in another patch; please just abandon this one.
[PATCH] regmap: fix the warning about unused variable
The variable 'u64 *u64' should only be visible on 64-bit platforms.

Signed-off-by: Xiubo Li
---
 drivers/base/regmap/regmap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/regmap/regmap.c b/drivers/base/regmap/regmap.c
index 1791180..a0d30a0 100644
--- a/drivers/base/regmap/regmap.c
+++ b/drivers/base/regmap/regmap.c
@@ -2581,7 +2581,9 @@ int regmap_bulk_read(struct regmap *map, unsigned int reg, void *val,
 		 * we assume that the values are native
 		 * endian.
 		 */
+#ifdef CONFIG_64BIT
 		u64 *u64 = val;
+#endif
 		u32 *u32 = val;
 		u16 *u16 = val;
 		u8 *u8 = val;
--
1.8.3.1
Re: [PATCH 2/3] regmap: add 64-bit mode support
On 09/12/2015 16:45, Arnd Bergmann wrote:
> On Thursday 03 December 2015 17:31:52 Xiubo Li wrote:
>> @@ -2488,11 +2581,17 @@ int regmap_bulk_read(struct regmap *map, unsigned int reg, void *val,
>>  		 * we assume that the values are native
>>  		 * endian.
>>  		 */
>> +		u64 *u64 = val;
>>  		u32 *u32 = val;
>>  		u16 *u16 = val;
>>  		u8 *u8 = val;
>>
>>  		switch (map->format.val_bytes) {
>> +#ifdef CONFIG_64BIT
>> +		case 8:
>> +			u64[i] = ival;
>> +			break;
>> +#endif
>>  		case 4:
>>  			u32[i] = ival;
>>  			break;
>
> This now gives me:
>
> drivers/base/regmap/regmap.c: In function 'regmap_bulk_read':
> drivers/base/regmap/regmap.c:2584:10: warning: unused variable 'u64' [-Wunused-variable]
>    u64 *u64 = val;
>         ^

I will fix this. Thanks for your catch.

BRs

> Arnd