Re: [PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service
Laurent and Mauro, I've reviewed Philipp Zabel's 49770 patch, it looks like they equivalent, but take a different path. My version does the unref in uvc_unregister_video, Zabel's does the unref in each caller of uvc_unregister_video. I think I would prefer it to be in fewer places. Any ideas on the sysfs group 'power' not found bug on removal while on use? Thanks, it will be great to get this fixed. On Sat, May 26, 2018 at 07:21:11PM +0300, Laurent Pinchart wrote: > Hi David, > > Thank you for the patch. > > On Saturday, 26 May 2018 18:50:46 EEST David Fries wrote: > > Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic > > variable to a reference count") > > didn't take into account that while the old counter was initialized to > > 0 (no stream open), kref_init starts with a reference of 1. The > > reference count on unplug no longer reaches 0, uvc_delete isn't > > called, and evdev doesn't release /dev/input/event*. Plug and unplug > > enough times and it runs out of device minors preventing any new input > > device and the use of newly plugged in USB video cameras until the > > system is rebooted. > > > > Signed-off-by: David Fries <da...@fries.net> > > Cc: Guennadi Liakhovetski <g.liakhovet...@gmx.de> > > Cc: Laurent Pinchart <laurent.pinch...@ideasonboard.com> > > Cc: Mauro Carvalho Chehab <mche...@s-opensource.com> > > Cc: sta...@vger.kernel.org > > Philipp Zabel has already posted a similar patch, see https:// > patchwork.linuxtv.org/patch/49770/ > > Mauro, > > This is a serious issue so I'd like to get the patch merged in v4.18, but it > could be argued that it's getting late for that, especially given that the > bug > has been there since v4.14. Would you be OK merging this in v4.18 ? If so > could you pick https://patchwork.linuxtv.org/patch/49770/ up, or would you > like me to send a pull request ? > > > --- > > drivers/media/usb/uvc/uvc_driver.c | 11 --- > > 1 file changed, 4 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 > > --- a/drivers/media/usb/uvc/uvc_driver.c > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device > > *dev) { > > struct uvc_streaming *stream; > > > > - /* Unregistering all video devices might result in uvc_delete() being > > -* called from inside the loop if there's no open file handle. To avoid > > -* that, increment the refcount before iterating over the streams and > > -* decrement it when done. > > -*/ > > - kref_get(>ref); > > - > > list_for_each_entry(stream, >streams, list) { > > if (!video_is_registered(>vdev)) > > continue; > > @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device > > *dev) uvc_debugfs_cleanup_stream(stream); > > } > > > > + /* Release the reference implicit in kref_init from uvc_probe, > > +* the UVC device won't be deleted until the last file descriptor > > +* is also closed. > > +*/ > > kref_put(>ref, uvc_delete); > > } > > -- > Regards, > > Laurent Pinchart > > -- David Fries <da...@fries.net>
Re: [PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service
Laurent and Mauro, I've reviewed Philipp Zabel's 49770 patch, it looks like they equivalent, but take a different path. My version does the unref in uvc_unregister_video, Zabel's does the unref in each caller of uvc_unregister_video. I think I would prefer it to be in fewer places. Any ideas on the sysfs group 'power' not found bug on removal while on use? Thanks, it will be great to get this fixed. On Sat, May 26, 2018 at 07:21:11PM +0300, Laurent Pinchart wrote: > Hi David, > > Thank you for the patch. > > On Saturday, 26 May 2018 18:50:46 EEST David Fries wrote: > > Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic > > variable to a reference count") > > didn't take into account that while the old counter was initialized to > > 0 (no stream open), kref_init starts with a reference of 1. The > > reference count on unplug no longer reaches 0, uvc_delete isn't > > called, and evdev doesn't release /dev/input/event*. Plug and unplug > > enough times and it runs out of device minors preventing any new input > > device and the use of newly plugged in USB video cameras until the > > system is rebooted. > > > > Signed-off-by: David Fries > > Cc: Guennadi Liakhovetski > > Cc: Laurent Pinchart > > Cc: Mauro Carvalho Chehab > > Cc: sta...@vger.kernel.org > > Philipp Zabel has already posted a similar patch, see https:// > patchwork.linuxtv.org/patch/49770/ > > Mauro, > > This is a serious issue so I'd like to get the patch merged in v4.18, but it > could be argued that it's getting late for that, especially given that the > bug > has been there since v4.14. Would you be OK merging this in v4.18 ? If so > could you pick https://patchwork.linuxtv.org/patch/49770/ up, or would you > like me to send a pull request ? > > > --- > > drivers/media/usb/uvc/uvc_driver.c | 11 --- > > 1 file changed, 4 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 > > --- a/drivers/media/usb/uvc/uvc_driver.c > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device > > *dev) { > > struct uvc_streaming *stream; > > > > - /* Unregistering all video devices might result in uvc_delete() being > > -* called from inside the loop if there's no open file handle. To avoid > > -* that, increment the refcount before iterating over the streams and > > -* decrement it when done. > > -*/ > > - kref_get(>ref); > > - > > list_for_each_entry(stream, >streams, list) { > > if (!video_is_registered(>vdev)) > > continue; > > @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device > > *dev) uvc_debugfs_cleanup_stream(stream); > > } > > > > + /* Release the reference implicit in kref_init from uvc_probe, > > +* the UVC device won't be deleted until the last file descriptor > > +* is also closed. > > +*/ > > kref_put(>ref, uvc_delete); > > } > > -- > Regards, > > Laurent Pinchart > > -- David Fries
[PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service
Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic variable to a reference count") didn't take into account that while the old counter was initialized to 0 (no stream open), kref_init starts with a reference of 1. The reference count on unplug no longer reaches 0, uvc_delete isn't called, and evdev doesn't release /dev/input/event*. Plug and unplug enough times and it runs out of device minors preventing any new input device and the use of newly plugged in USB video cameras until the system is rebooted. Signed-off-by: David Fries <da...@fries.net> Cc: Guennadi Liakhovetski <g.liakhovet...@gmx.de> Cc: Laurent Pinchart <laurent.pinch...@ideasonboard.com> Cc: Mauro Carvalho Chehab <mche...@s-opensource.com> Cc: sta...@vger.kernel.org --- drivers/media/usb/uvc/uvc_driver.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev) { struct uvc_streaming *stream; - /* Unregistering all video devices might result in uvc_delete() being -* called from inside the loop if there's no open file handle. To avoid -* that, increment the refcount before iterating over the streams and -* decrement it when done. -*/ - kref_get(>ref); - list_for_each_entry(stream, >streams, list) { if (!video_is_registered(>vdev)) continue; @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev) uvc_debugfs_cleanup_stream(stream); } + /* Release the reference implicit in kref_init from uvc_probe, +* the UVC device won't be deleted until the last file descriptor +* is also closed. +*/ kref_put(>ref, uvc_delete); } -- 2.1.4 >From 32a612bc06a2a1b9215f3b7166342c98043bd925 Mon Sep 17 00:00:00 2001 From: David Fries <da...@fries.net> Date: Thu, 24 May 2018 23:43:15 -0500 Subject: [PATCH] uvc_driver: UVC kref never reaches zero leading to denial of service Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic variable to a reference count") didn't take into account that while the counter was initialized to 0 (no stream open), kref_init starts with a reference of 1, leading to the device never reaching 0 and uvc_delete never getting called. This leads to evdev never getting released and /dev/input/event* eventually running out of minors preventing any new event devices and new USB cameras from being usable until the system is rebooted. In my case "disabled by hub (EMI?), re-enabling..." kept removing/inserting the device until days later it ran out of minors and I lost the video security feed. Now that the device is actually being removed other problems are showing up. Specifically the following if the camera is removed or `rmmod ehci_pci` while an application is getting video from it. It doesn't happen if the camera is not in use. How do I track that down? sysfs group 'power' not found for kobject 'event10' sysfs group 'power' not found for kobject 'input32' sysfs group 'id' not found for kobject 'input32' sysfs group 'capabilities' not found for kobject 'input32' sysfs group 'power' not found for kobject 'media0' Signed-off-by: David Fries <da...@fries.net> --- drivers/media/usb/uvc/uvc_driver.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev) { struct uvc_streaming *stream; - /* Unregistering all video devices might result in uvc_delete() being -* called from inside the loop if there's no open file handle. To avoid -* that, increment the refcount before iterating over the streams and -* decrement it when done. -*/ - kref_get(>ref); - list_for_each_entry(stream, >streams, list) { if (!video_is_registered(>vdev)) continue; @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev) uvc_debugfs_cleanup_stream(stream); } + /* Release the reference implicit in kref_init from uvc_probe, +* the UVC device won't be deleted until the last file descriptor +* is also closed. +*/ kref_put(>ref, uvc_delete); } -- 2.1.4
[PATCH 0/1] media: uvc_driver: fix USB Camera ref leak denial of service
The patch in the following e-mail fixes a reference count bug, it seems to me that uvc_unregister_video is a good location to release the final reference, I find it is called once. It may sound like a lot to plug and unplug the USB camera 250 some times, but in my case "disabled by hub (EMI?), re-enabling..." kept unplugging and plugging in the device until days later it ran out of minors and I lost the video security feed. With this patch, now that the device is actually being removed other problems are showing up. Specifically the following if the camera is removed or `rmmod ehci_pci` while an application is getting video from it. It doesn't happen if the camera is not in use. How do I track that down? sysfs group 'power' not found for kobject 'event10' sysfs group 'power' not found for kobject 'input32' sysfs group 'id' not found for kobject 'input32' sysfs group 'capabilities' not found for kobject 'input32' sysfs group 'power' not found for kobject 'media0' -- David Fries <da...@fries.net>
[PATCH 0/1] media: uvc_driver: fix USB Camera ref leak denial of service
The patch in the following e-mail fixes a reference count bug, it seems to me that uvc_unregister_video is a good location to release the final reference, I find it is called once. It may sound like a lot to plug and unplug the USB camera 250 some times, but in my case "disabled by hub (EMI?), re-enabling..." kept unplugging and plugging in the device until days later it ran out of minors and I lost the video security feed. With this patch, now that the device is actually being removed other problems are showing up. Specifically the following if the camera is removed or `rmmod ehci_pci` while an application is getting video from it. It doesn't happen if the camera is not in use. How do I track that down? sysfs group 'power' not found for kobject 'event10' sysfs group 'power' not found for kobject 'input32' sysfs group 'id' not found for kobject 'input32' sysfs group 'capabilities' not found for kobject 'input32' sysfs group 'power' not found for kobject 'media0' -- David Fries
[PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service
Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic variable to a reference count") didn't take into account that while the old counter was initialized to 0 (no stream open), kref_init starts with a reference of 1. The reference count on unplug no longer reaches 0, uvc_delete isn't called, and evdev doesn't release /dev/input/event*. Plug and unplug enough times and it runs out of device minors preventing any new input device and the use of newly plugged in USB video cameras until the system is rebooted. Signed-off-by: David Fries Cc: Guennadi Liakhovetski Cc: Laurent Pinchart Cc: Mauro Carvalho Chehab Cc: sta...@vger.kernel.org --- drivers/media/usb/uvc/uvc_driver.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev) { struct uvc_streaming *stream; - /* Unregistering all video devices might result in uvc_delete() being -* called from inside the loop if there's no open file handle. To avoid -* that, increment the refcount before iterating over the streams and -* decrement it when done. -*/ - kref_get(>ref); - list_for_each_entry(stream, >streams, list) { if (!video_is_registered(>vdev)) continue; @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev) uvc_debugfs_cleanup_stream(stream); } + /* Release the reference implicit in kref_init from uvc_probe, +* the UVC device won't be deleted until the last file descriptor +* is also closed. +*/ kref_put(>ref, uvc_delete); } -- 2.1.4 >From 32a612bc06a2a1b9215f3b7166342c98043bd925 Mon Sep 17 00:00:00 2001 From: David Fries Date: Thu, 24 May 2018 23:43:15 -0500 Subject: [PATCH] uvc_driver: UVC kref never reaches zero leading to denial of service Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic variable to a reference count") didn't take into account that while the counter was initialized to 0 (no stream open), kref_init starts with a reference of 1, leading to the device never reaching 0 and uvc_delete never getting called. This leads to evdev never getting released and /dev/input/event* eventually running out of minors preventing any new event devices and new USB cameras from being usable until the system is rebooted. In my case "disabled by hub (EMI?), re-enabling..." kept removing/inserting the device until days later it ran out of minors and I lost the video security feed. Now that the device is actually being removed other problems are showing up. Specifically the following if the camera is removed or `rmmod ehci_pci` while an application is getting video from it. It doesn't happen if the camera is not in use. How do I track that down? sysfs group 'power' not found for kobject 'event10' sysfs group 'power' not found for kobject 'input32' sysfs group 'id' not found for kobject 'input32' sysfs group 'capabilities' not found for kobject 'input32' sysfs group 'power' not found for kobject 'media0' Signed-off-by: David Fries --- drivers/media/usb/uvc/uvc_driver.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev) { struct uvc_streaming *stream; - /* Unregistering all video devices might result in uvc_delete() being -* called from inside the loop if there's no open file handle. To avoid -* that, increment the refcount before iterating over the streams and -* decrement it when done. -*/ - kref_get(>ref); - list_for_each_entry(stream, >streams, list) { if (!video_is_registered(>vdev)) continue; @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev) uvc_debugfs_cleanup_stream(stream); } + /* Release the reference implicit in kref_init from uvc_probe, +* the UVC device won't be deleted until the last file descriptor +* is also closed. +*/ kref_put(>ref, uvc_delete); } -- 2.1.4
Re: [patch] [patch 2/2 v2] w1: use correct lock on error in w1_seq_show()
On Thu, Jun 11, 2015 at 06:31:00PM +0300, Evgeniy Polyakov wrote: > Hi > > 04.06.2015, 12:04, "Dan Carpenter" : > > I noticed there was a problem here because Smatch complained: > > > > drivers/w1/slaves/w1_therm.c:416 w1_seq_show() warn: > > inconsistent returns 'mutex:>master->mutex'. > > Locked on: line 416 > > Unlocked on: line 413 > > > > The problem is that we lock ->mutex but we unlock ->bus_mutex on error. > > David Fries says that ->bus_mutex is correct and ->mutex is incorrect. > > > > Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') > > Signed-off-by: Dan Carpenter > > Looks good to me, Greg please pull this serie into your tree, if you hadn't > yet. > Am I right that this is a stable tree material too? I would expect the answer to be no. This is a fix to a new feature that is in gregkh/char-misc but not yet in Linus's tree. -- David Fries -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] [patch 2/2 v2] w1: use correct lock on error in w1_seq_show()
On Thu, Jun 11, 2015 at 06:31:00PM +0300, Evgeniy Polyakov wrote: Hi 04.06.2015, 12:04, Dan Carpenter dan.carpen...@oracle.com: I noticed there was a problem here because Smatch complained: drivers/w1/slaves/w1_therm.c:416 w1_seq_show() warn: inconsistent returns 'mutex:sl-master-mutex'. Locked on: line 416 Unlocked on: line 413 The problem is that we lock -mutex but we unlock -bus_mutex on error. David Fries says that -bus_mutex is correct and -mutex is incorrect. Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') Signed-off-by: Dan Carpenter dan.carpen...@oracle.com Looks good to me, Greg please pull this serie into your tree, if you hadn't yet. Am I right that this is a stable tree material too? I would expect the answer to be no. This is a fix to a new feature that is in gregkh/char-misc but not yet in Linus's tree. -- David Fries da...@fries.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
On Tue, Jun 02, 2015 at 09:54:10PM -0400, Matt Campbell wrote: > Thanks for the response, I will try to answer your questions. > > The other function besides temp conversion and EEPROM write that > requires a delay is the initial CHAIN on command. The datasheet says > to "Wait for chain to charge" but it does not offer ANY indication on > how to do so. I did discover that my sleep was a sleep(0) long after > I submitted the patch, but as you observed it was not hurting > anything. Then it should be removed as it gives the wrong impression and that variable shouldn't be used in the slave code anyway. > The 64 was, in all honesty, a starting point that I picked for lack of > finding a better option. I only currently have 10 sensors available > for me to test with, so I was not able to do any significant tests to > see what would happen if I raised that limit and the loop did not exit > properly at the end of the chain. I was also not able to find an edge > case that would cause the loop to not find a match. The w1_seq file > only shows for the one type sensor that supports it so in the absence > of a bus error, a match is guaranteed. That's easy enough to test, comment out your if...break; in the loop. Is there a good reason that the loop doesn't terminate when the device you're after sl->reg_num.id is found? By making your loop terminate on a different condition, W1_42_FINISHED_BYTE, if the slave you are after isn't found you'll still return seq of 0. I'm wondering if seq should be initialized to -1. Do this, make up the right id of the correct family code and manually tell the kernel it is there. That would be the same as if you removed a chip before the kernel's search timed it out. Then do your sequence scan on your devices including this new one which isn't there. cd /sys/devices/w1_bus_master1 echo 42-01234567 > w1_master_add The devices return the rom numbers, I wonder about returning the rom list (which is in the sequence), instead of the sequence number for a device if that would be more or less useful to users. You would be able to do it in one pass. > The output of the function in the event the pins are hooked up as GPIO > and not a chain is undefined. Since the chips are very tiny, > prototyping this function would require me to design and order some > boards that I never was able to use. To my knowledge it is not > possible to determine that things are wired this way in the software. Something like "Output is undefined if all direct sequence sensors on this one wire bus aren't part of one chain. GPIO control of the two pins is not support by this driver." for the comment above w1_seq_show, in addition to the other comments I was asking about. I see now you have some documentation in Documentation/w1/slaves/w1_therm already that I didn't realize earlier. > Apologies, this is my first foray in submitting any sort of kernel > code so I am not sure of exactly how I would submit a fixup patch for > gregkh's tree. If someone wants to walk me through that off list I > would be most appreciative! git remote add gregkh_char-misc git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git git fetch gregkh_char-misc git rebase remotes/gregkh_char-misc/char-misc-next and add a new commit from there. > Matt Campbell > mattrcampb...@gmail.com > > > On Tue, Jun 2, 2015 at 9:07 PM, David Fries wrote: > > The source code is as good as it gets, that's why I posted those two > > lines from the w1.h file. Maybe it could be expanded upon, mutex is > > more for adding or removing slaves and the like modifications to > > w1_master data structure, where bus_mutex is what make sure something > > else doesn't try to talk on the bus while the current device is using > > it. What really matters is how other code is using the locks. For > > instance if you look at w1_therm.c bus_mutex is the only one used. > > Then you look at what your routine is doing, and what the existing > > routine is doing, they are both reading/writing from the one wire bus, > > and then return their output to the user, so then use the same lock. > > If it's the wrong lock then it's at least consistent and easier to > > cleanup when someone figures out it's the wrong lock. If you think > > it's the wrong lock raise a flag and ask. > > > > My patch review comments. > > > > The temperature sensor uses a strong pullup or delay of 750 ms to do > > the temperature conversion. A quick read through of this sensor has > > temperature conversion and EEPROM write as the only operations > > requiring a significant time delay, everything else is in the us, so I > > would expect this to not require any delay. > > > > Read the comments to w1_next_pullup in w1_i
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
The source code is as good as it gets, that's why I posted those two lines from the w1.h file. Maybe it could be expanded upon, mutex is more for adding or removing slaves and the like modifications to w1_master data structure, where bus_mutex is what make sure something else doesn't try to talk on the bus while the current device is using it. What really matters is how other code is using the locks. For instance if you look at w1_therm.c bus_mutex is the only one used. Then you look at what your routine is doing, and what the existing routine is doing, they are both reading/writing from the one wire bus, and then return their output to the user, so then use the same lock. If it's the wrong lock then it's at least consistent and easier to cleanup when someone figures out it's the wrong lock. If you think it's the wrong lock raise a flag and ask. My patch review comments. The temperature sensor uses a strong pullup or delay of 750 ms to do the temperature conversion. A quick read through of this sensor has temperature conversion and EEPROM write as the only operations requiring a significant time delay, everything else is in the us, so I would expect this to not require any delay. Read the comments to w1_next_pullup in w1_io.c as pullup_duration will always be 0 in msleep(sl->master->pullup_duration); which isn't likely what you expect. Apparently it is working for you so that further reinforces my previous point. Where did the 64 come from? I have 14 temperature sensors in my single family residence right now, and it would take another 7 to cover each area/room. I could imagine an actual industrial setting to far exceed the 64 count, but I don't work in that area so I don't know the practical limits. How long does it take to read if it loops through 64 times without finding a match? What is reg_num value if a sensor doesn't reply? If it is all 00's or all ff's would that be a better test for the end of the line? How long would it take to loop 16384 times? Maybe you stop if it reaches 16384 or if the end of line value is read 64 times. I also assume this is pretty much a static answer, so you aren't going to be querying it except once per device, once per program setup and rarely or infrequently after that so the time shouldn't matter too much. Maybe the program queries again if a sensor is detected or removed. I would like to see some documentation (comment) at the top of w1_seq_show as to what the sysfs number returns means. I can see the variable is named seq, and it returns it as a signed number. What does a negative value mean? Are they expected? Will the first device in the chain be 0 or 1, and what defines what end that is? What kind of value is returned if the pins are connected as GPIO and not in the direct sequence configuration? Maybe it needs a comment that all sensors of this type must be in the direct sequence configuration for this routine to give meaningful values. "Place all devices in CHAIN state" that will address all devices including other types of devices, would you expect this to cause a problem with other devices? Line up the constants after #define to the same tab location before w1_seq_show. Do you plan to send in a mutex fixup patch? In my experience gregkh's tree only accepts patches to what is already there, he won't swap out a patch he already has (he could as it isn't in Linus's tree, but he doesn't). On Tue, Jun 02, 2015 at 08:35:41AM -0400, Matt Campbell wrote: > David, not arguing just curious how you decide which lock to use? Is there > a standard documented? > > --m > > Matt Campbell > mattrcampb...@gmail.com > > On Tue, Jun 2, 2015 at 8:12 AM, David Fries wrote: > > > Thanks for including me, I'm going to have to nack this though, it is > > an error, but bus_mutex is correct, mutex is incorrect, so the other > > two lock/unlock in that routine are the ones that need to be changed. > > > > w1.h > > * @mutex: protect most of w1_master > > * @bus_mutex: pretect concurrent bus access > > > > On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote: > > > Smatch complains that we don't unlock "master->mutex" on this error > > > path. It looks like it is a typo and we unlock ->bus_mutext where > > > ->mutex was intended. > > > > > > Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') > > > Signed-off-by: Dan Carpenter > > > > > > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c > > > index 3351be6..79ecaf8 100644 > > > --- a/drivers/w1/slaves/w1_therm.c > > > +++ b/drivers/w1/slaves/w1_therm.c > > > @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device, > > > c -= snprintf(buf + PAGE_SIZE - c, c, &q
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
Thanks for including me, I'm going to have to nack this though, it is an error, but bus_mutex is correct, mutex is incorrect, so the other two lock/unlock in that routine are the ones that need to be changed. w1.h * @mutex: protect most of w1_master * @bus_mutex: pretect concurrent bus access On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote: > Smatch complains that we don't unlock "master->mutex" on this error > path. It looks like it is a typo and we unlock ->bus_mutext where > ->mutex was intended. > > Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') > Signed-off-by: Dan Carpenter > > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c > index 3351be6..79ecaf8 100644 > --- a/drivers/w1/slaves/w1_therm.c > +++ b/drivers/w1/slaves/w1_therm.c > @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device, > c -= snprintf(buf + PAGE_SIZE - c, c, "%d\n", seq); > return PAGE_SIZE - c; > error: > - mutex_unlock(>master->bus_mutex); > + mutex_unlock(>master->mutex); > return -EIO; > } > -- David Fries -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
The source code is as good as it gets, that's why I posted those two lines from the w1.h file. Maybe it could be expanded upon, mutex is more for adding or removing slaves and the like modifications to w1_master data structure, where bus_mutex is what make sure something else doesn't try to talk on the bus while the current device is using it. What really matters is how other code is using the locks. For instance if you look at w1_therm.c bus_mutex is the only one used. Then you look at what your routine is doing, and what the existing routine is doing, they are both reading/writing from the one wire bus, and then return their output to the user, so then use the same lock. If it's the wrong lock then it's at least consistent and easier to cleanup when someone figures out it's the wrong lock. If you think it's the wrong lock raise a flag and ask. My patch review comments. The temperature sensor uses a strong pullup or delay of 750 ms to do the temperature conversion. A quick read through of this sensor has temperature conversion and EEPROM write as the only operations requiring a significant time delay, everything else is in the us, so I would expect this to not require any delay. Read the comments to w1_next_pullup in w1_io.c as pullup_duration will always be 0 in msleep(sl-master-pullup_duration); which isn't likely what you expect. Apparently it is working for you so that further reinforces my previous point. Where did the 64 come from? I have 14 temperature sensors in my single family residence right now, and it would take another 7 to cover each area/room. I could imagine an actual industrial setting to far exceed the 64 count, but I don't work in that area so I don't know the practical limits. How long does it take to read if it loops through 64 times without finding a match? What is reg_num value if a sensor doesn't reply? If it is all 00's or all ff's would that be a better test for the end of the line? How long would it take to loop 16384 times? Maybe you stop if it reaches 16384 or if the end of line value is read 64 times. I also assume this is pretty much a static answer, so you aren't going to be querying it except once per device, once per program setup and rarely or infrequently after that so the time shouldn't matter too much. Maybe the program queries again if a sensor is detected or removed. I would like to see some documentation (comment) at the top of w1_seq_show as to what the sysfs number returns means. I can see the variable is named seq, and it returns it as a signed number. What does a negative value mean? Are they expected? Will the first device in the chain be 0 or 1, and what defines what end that is? What kind of value is returned if the pins are connected as GPIO and not in the direct sequence configuration? Maybe it needs a comment that all sensors of this type must be in the direct sequence configuration for this routine to give meaningful values. Place all devices in CHAIN state that will address all devices including other types of devices, would you expect this to cause a problem with other devices? Line up the constants after #define to the same tab location before w1_seq_show. Do you plan to send in a mutex fixup patch? In my experience gregkh's tree only accepts patches to what is already there, he won't swap out a patch he already has (he could as it isn't in Linus's tree, but he doesn't). On Tue, Jun 02, 2015 at 08:35:41AM -0400, Matt Campbell wrote: David, not arguing just curious how you decide which lock to use? Is there a standard documented? --m Matt Campbell mattrcampb...@gmail.com On Tue, Jun 2, 2015 at 8:12 AM, David Fries da...@fries.net wrote: Thanks for including me, I'm going to have to nack this though, it is an error, but bus_mutex is correct, mutex is incorrect, so the other two lock/unlock in that routine are the ones that need to be changed. w1.h * @mutex: protect most of w1_master * @bus_mutex: pretect concurrent bus access On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote: Smatch complains that we don't unlock master-mutex on this error path. It looks like it is a typo and we unlock -bus_mutext where -mutex was intended. Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 3351be6..79ecaf8 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device, c -= snprintf(buf + PAGE_SIZE - c, c, %d\n, seq); return PAGE_SIZE - c; error: - mutex_unlock(sl-master-bus_mutex); + mutex_unlock(sl-master-mutex); return -EIO; } -- David Fries da...@fries.net -- David Fries da...@fries.net -- To unsubscribe from this list: send the line
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
On Tue, Jun 02, 2015 at 09:54:10PM -0400, Matt Campbell wrote: Thanks for the response, I will try to answer your questions. The other function besides temp conversion and EEPROM write that requires a delay is the initial CHAIN on command. The datasheet says to Wait for chain to charge but it does not offer ANY indication on how to do so. I did discover that my sleep was a sleep(0) long after I submitted the patch, but as you observed it was not hurting anything. Then it should be removed as it gives the wrong impression and that variable shouldn't be used in the slave code anyway. The 64 was, in all honesty, a starting point that I picked for lack of finding a better option. I only currently have 10 sensors available for me to test with, so I was not able to do any significant tests to see what would happen if I raised that limit and the loop did not exit properly at the end of the chain. I was also not able to find an edge case that would cause the loop to not find a match. The w1_seq file only shows for the one type sensor that supports it so in the absence of a bus error, a match is guaranteed. That's easy enough to test, comment out your if...break; in the loop. Is there a good reason that the loop doesn't terminate when the device you're after sl-reg_num.id is found? By making your loop terminate on a different condition, W1_42_FINISHED_BYTE, if the slave you are after isn't found you'll still return seq of 0. I'm wondering if seq should be initialized to -1. Do this, make up the right id of the correct family code and manually tell the kernel it is there. That would be the same as if you removed a chip before the kernel's search timed it out. Then do your sequence scan on your devices including this new one which isn't there. cd /sys/devices/w1_bus_master1 echo 42-01234567 w1_master_add The devices return the rom numbers, I wonder about returning the rom list (which is in the sequence), instead of the sequence number for a device if that would be more or less useful to users. You would be able to do it in one pass. The output of the function in the event the pins are hooked up as GPIO and not a chain is undefined. Since the chips are very tiny, prototyping this function would require me to design and order some boards that I never was able to use. To my knowledge it is not possible to determine that things are wired this way in the software. Something like Output is undefined if all direct sequence sensors on this one wire bus aren't part of one chain. GPIO control of the two pins is not support by this driver. for the comment above w1_seq_show, in addition to the other comments I was asking about. I see now you have some documentation in Documentation/w1/slaves/w1_therm already that I didn't realize earlier. Apologies, this is my first foray in submitting any sort of kernel code so I am not sure of exactly how I would submit a fixup patch for gregkh's tree. If someone wants to walk me through that off list I would be most appreciative! git remote add gregkh_char-misc git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git git fetch gregkh_char-misc git rebase remotes/gregkh_char-misc/char-misc-next and add a new commit from there. Matt Campbell mattrcampb...@gmail.com On Tue, Jun 2, 2015 at 9:07 PM, David Fries da...@fries.net wrote: The source code is as good as it gets, that's why I posted those two lines from the w1.h file. Maybe it could be expanded upon, mutex is more for adding or removing slaves and the like modifications to w1_master data structure, where bus_mutex is what make sure something else doesn't try to talk on the bus while the current device is using it. What really matters is how other code is using the locks. For instance if you look at w1_therm.c bus_mutex is the only one used. Then you look at what your routine is doing, and what the existing routine is doing, they are both reading/writing from the one wire bus, and then return their output to the user, so then use the same lock. If it's the wrong lock then it's at least consistent and easier to cleanup when someone figures out it's the wrong lock. If you think it's the wrong lock raise a flag and ask. My patch review comments. The temperature sensor uses a strong pullup or delay of 750 ms to do the temperature conversion. A quick read through of this sensor has temperature conversion and EEPROM write as the only operations requiring a significant time delay, everything else is in the us, so I would expect this to not require any delay. Read the comments to w1_next_pullup in w1_io.c as pullup_duration will always be 0 in msleep(sl-master-pullup_duration); which isn't likely what you expect. Apparently it is working for you so that further reinforces my previous point. Where did the 64 come from? I have 14 temperature sensors in my single family residence right now, and it would take
Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()
Thanks for including me, I'm going to have to nack this though, it is an error, but bus_mutex is correct, mutex is incorrect, so the other two lock/unlock in that routine are the ones that need to be changed. w1.h * @mutex: protect most of w1_master * @bus_mutex: pretect concurrent bus access On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote: Smatch complains that we don't unlock master-mutex on this error path. It looks like it is a typo and we unlock -bus_mutext where -mutex was intended. Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm') Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 3351be6..79ecaf8 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device, c -= snprintf(buf + PAGE_SIZE - c, c, %d\n, seq); return PAGE_SIZE - c; error: - mutex_unlock(sl-master-bus_mutex); + mutex_unlock(sl-master-mutex); return -EIO; } -- David Fries da...@fries.net -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] w1_therm reference count family data
A temperature conversion can take 750 ms and when possible the w1_therm slave driver drops the bus_mutex to allow other bus operations, but that includes operations such as a periodic slave search, which can remove this slave when it is no longer detected. If that happens the sl->family_data will be freed and set to NULL causing w1_slave_show to crash when it wakes up. Signed-off-by: David Fries Reported-By: Thorsten Bschorr Tested-by: Thorsten Bschorr Acked-by: Evgeniy Polyakov --- This should be applied to the stable series as well. In the name of full disclosure, this just narrows the race window, from crashing in normal operation on the reporters system to no longer crashing with multiple readers and another process hammering on inserting/removing the slave device. This patch has been tested for some weeks by those who were affected, Evgeniy Polyakov (maintainer) has approved this fix with the intention of switching to sysfs device reference counting (as w1_slave_show is called through sysfs). drivers/w1/slaves/w1_therm.c | 62 -- 1 file changed, 47 insertions(+), 15 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..55eb86c 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + atomic_t refcnt; +}; + +/* return the address of the refcnt in the family data */ +#define THERM_REFCNT(family_data) \ + (&((struct w1_therm_family_data*)family_data)->refcnt) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl->family_data = kzalloc(9, GFP_KERNEL); + sl->family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); if (!sl->family_data) return -ENOMEM; + atomic_set(THERM_REFCNT(sl->family_data), 1); return 0; } static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl->family_data)); + } kfree(sl->family_data); sl->family_data = NULL; } @@ -194,13 +210,22 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl->master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; + u8 *family_data = sl->family_data; + + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + if(!sl->family_data) + { + ret = -ENODEV; + goto pre_unlock; + } + /* prevent the slave from going away in sleep */ + atomic_inc(THERM_REFCNT(family_data)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -230,17 +255,19 @@ static ssize_t w1_slave_show(struct device *device, mutex_unlock(>bus_mutex); sleep_rem = msleep_interruptible(tm); - if (sleep_rem != 0) - return -EINTR; + if (sleep_rem != 0) { + ret = -EINTR; + goto post_unlock; + } - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; } else if (!w1_strong_pullup) { sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { - mutex_unlock(>bus_mutex); - return -EINTR; + ret = -EINTR; + goto pre_unlock; } } @@ -269,19 +296,24 @@ static ssize_t w1_slave_show(struct device *device, c -= snprintf(buf + PAGE_SIZE - c, c, ": crc=%02x %s\n", crc, (verdict) ? "YES" : "NO"); if (verdict) - memcpy(sl->family_data, rom, si
[PATCH] w1_therm reference count family data
A temperature conversion can take 750 ms and when possible the w1_therm slave driver drops the bus_mutex to allow other bus operations, but that includes operations such as a periodic slave search, which can remove this slave when it is no longer detected. If that happens the sl-family_data will be freed and set to NULL causing w1_slave_show to crash when it wakes up. Signed-off-by: David Fries da...@fries.net Reported-By: Thorsten Bschorr thors...@bschorr.de Tested-by: Thorsten Bschorr thors...@bschorr.de Acked-by: Evgeniy Polyakov z...@ioremap.net --- This should be applied to the stable series as well. In the name of full disclosure, this just narrows the race window, from crashing in normal operation on the reporters system to no longer crashing with multiple readers and another process hammering on inserting/removing the slave device. This patch has been tested for some weeks by those who were affected, Evgeniy Polyakov (maintainer) has approved this fix with the intention of switching to sysfs device reference counting (as w1_slave_show is called through sysfs). drivers/w1/slaves/w1_therm.c | 62 -- 1 file changed, 47 insertions(+), 15 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..55eb86c 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,16 +59,32 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + atomic_t refcnt; +}; + +/* return the address of the refcnt in the family data */ +#define THERM_REFCNT(family_data) \ + (((struct w1_therm_family_data*)family_data)-refcnt) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl-family_data = kzalloc(9, GFP_KERNEL); + sl-family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); if (!sl-family_data) return -ENOMEM; + atomic_set(THERM_REFCNT(sl-family_data), 1); return 0; } static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl-family_data)); + } kfree(sl-family_data); sl-family_data = NULL; } @@ -194,13 +210,22 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl-master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; + u8 *family_data = sl-family_data; + + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + if(!sl-family_data) + { + ret = -ENODEV; + goto pre_unlock; + } + /* prevent the slave from going away in sleep */ + atomic_inc(THERM_REFCNT(family_data)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -230,17 +255,19 @@ static ssize_t w1_slave_show(struct device *device, mutex_unlock(dev-bus_mutex); sleep_rem = msleep_interruptible(tm); - if (sleep_rem != 0) - return -EINTR; + if (sleep_rem != 0) { + ret = -EINTR; + goto post_unlock; + } - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; } else if (!w1_strong_pullup) { sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { - mutex_unlock(dev-bus_mutex); - return -EINTR; + ret = -EINTR; + goto pre_unlock; } } @@ -269,19 +296,24 @@ static ssize_t w1_slave_show(struct device *device, c -= snprintf(buf + PAGE_SIZE - c, c, : crc=%02x %s\n, crc, (verdict) ? YES : NO); if (verdict) - memcpy(sl-family_data, rom, sizeof(rom)); + memcpy
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
It has not been solved. Evgeniy would like to make use of the sysfs device management instead of the current reference counting, however I haven't heard any volunteers to do that work. I posted a quick fix patch, it was very easy to crash without this patch, it doesn't completely solve the race conditions, and I don't think it can be solved in just a slave driver change. Are you up for the challenge? On Wed, Apr 15, 2015 at 09:52:27AM +0200, Jonathan ALIBERT wrote: > Hi, > > Do you know if the problem has been solved ? > > Cheers, > > *Jonathan ALIBERT* > *06 32 26 59 12* > *265, route de Saint Haon* > *42 370 RENAISON* > > > 2015-03-19 1:09 GMT+01:00 David Fries : > > > On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote: > > > Hi > > > > > > 18.03.2015, 07:20, "David Fries" : > > > > static void w1_therm_remove_slave(struct w1_slave *sl) > > > > { > > > > + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data)); > > > > + while(refcnt) { > > > > + msleep(1000); > > > > + refcnt = atomic_read(THERM_REFCNT(sl->family_data)); > > > > + } > > > > kfree(sl->family_data); > > > > sl->family_data = NULL; > > > > } > > > > > > Can we replace this whole atomic manipulations with kref_t and free > > family data in the place > > > which actually drops reference counter to zero? > > > > > > I.e. we return from remove_slave() function potentially leaving family > > data floating around, it will be freed > > > when the last user drops the reference. There is still a race between > > increasing reference when starting > > > reading and removing slave device, i.e. one starts reading, while > > attached slave device is being removed, > > > but that's a different problem. > > > > With the two while loops I posted, I see with two clients reading > > w1_slave, the other command to remove a slave gets permanently stuck > > in w1_therm_remove_slave, which keeps the slave around while the > > clients continue to read. I wouldn't predict things going better by > > keeping family_data around longer, the slave data would still go away > > with readers around. > > > > -- > > David Fries PGP pub CB1EE8F0 > > http://fries.net/~david/ > > -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
It has not been solved. Evgeniy would like to make use of the sysfs device management instead of the current reference counting, however I haven't heard any volunteers to do that work. I posted a quick fix patch, it was very easy to crash without this patch, it doesn't completely solve the race conditions, and I don't think it can be solved in just a slave driver change. Are you up for the challenge? On Wed, Apr 15, 2015 at 09:52:27AM +0200, Jonathan ALIBERT wrote: Hi, Do you know if the problem has been solved ? Cheers, *Jonathan ALIBERT* *06 32 26 59 12* *265, route de Saint Haon* *42 370 RENAISON* 2015-03-19 1:09 GMT+01:00 David Fries da...@fries.net: On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote: Hi 18.03.2015, 07:20, David Fries da...@fries.net: static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl-family_data)); + } kfree(sl-family_data); sl-family_data = NULL; } Can we replace this whole atomic manipulations with kref_t and free family data in the place which actually drops reference counter to zero? I.e. we return from remove_slave() function potentially leaving family data floating around, it will be freed when the last user drops the reference. There is still a race between increasing reference when starting reading and removing slave device, i.e. one starts reading, while attached slave device is being removed, but that's a different problem. With the two while loops I posted, I see with two clients reading w1_slave, the other command to remove a slave gets permanently stuck in w1_therm_remove_slave, which keeps the slave around while the clients continue to read. I wouldn't predict things going better by keeping family_data around longer, the slave data would still go away with readers around. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote: > Hi > > 18.03.2015, 07:20, "David Fries" : > > static void w1_therm_remove_slave(struct w1_slave *sl) > > { > > + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data)); > > + while(refcnt) { > > + msleep(1000); > > + refcnt = atomic_read(THERM_REFCNT(sl->family_data)); > > + } > > kfree(sl->family_data); > > sl->family_data = NULL; > > } > > Can we replace this whole atomic manipulations with kref_t and free family > data in the place > which actually drops reference counter to zero? > > I.e. we return from remove_slave() function potentially leaving family data > floating around, it will be freed > when the last user drops the reference. There is still a race between > increasing reference when starting > reading and removing slave device, i.e. one starts reading, while attached > slave device is being removed, > but that's a different problem. With the two while loops I posted, I see with two clients reading w1_slave, the other command to remove a slave gets permanently stuck in w1_therm_remove_slave, which keeps the slave around while the clients continue to read. I wouldn't predict things going better by keeping family_data around longer, the slave data would still go away with readers around. -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote: Hi 18.03.2015, 07:20, David Fries da...@fries.net: static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl-family_data)); + } kfree(sl-family_data); sl-family_data = NULL; } Can we replace this whole atomic manipulations with kref_t and free family data in the place which actually drops reference counter to zero? I.e. we return from remove_slave() function potentially leaving family data floating around, it will be freed when the last user drops the reference. There is still a race between increasing reference when starting reading and removing slave device, i.e. one starts reading, while attached slave device is being removed, but that's a different problem. With the two while loops I posted, I see with two clients reading w1_slave, the other command to remove a slave gets permanently stuck in w1_therm_remove_slave, which keeps the slave around while the clients continue to read. I wouldn't predict things going better by keeping family_data around longer, the slave data would still go away with readers around. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sat, Mar 14, 2015 at 11:55:16PM +0300, Evgeniy Polyakov wrote: > Hi David > > 12.03.2015, 03:54, "David Fries" : > > Would that be removing all four refcnt, w1_slave, w1_master, > > w1_family, w1_cb_block, or just some of them? It sounds good to me, > > if that had bugs there would be much more than just the w1 system > > relying on it. I don't know enough about that system or have the time > > to code up that change. > > > > I can take another look at and post the reference counting w1_therm > > fix instead of the mutex version as a near term work around until that > > is available if you want. > > Please cook up a quick fix for this problem - this bug really hurts people. > And then we will discuss how 'ideal' life cycle should look Done, I don't like it, I'm not sure anyone else will either, but I'm no longer crashing in testing, so that's an improvement. My "production" system doesn't use w1_therm, so I only see these bugs in development testing it. I've come to the conclusion that in the face of a slave vanishing, w1_therm can't avoid all the race conditions, so the real fix must be elsewhere. >From 51d4024ca667c8b712de462489d125a78e85aa57 Mon Sep 17 00:00:00 2001 From: David Fries Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH] w1_therm, reduce race conditions in w1_slave_show After applying this patch commands such as the following in one process, slave=28-02c95fb1 while true; do echo $slave > /sys/devices/w1_bus_master1/w1_master_add; sleep .1; echo $slave > /sys/devices/w1_bus_master1/w1_master_remove; sleep .1; done and then two at the same time in two other processes, slave=28-02c95fb1 while true; do time cat /sys/devices/w1_bus_master1/$slave/w1_slave ; sleep .1; done then randomly stop all three and repeat. With this patch I no longer see crashes, but at best this patch effectively hiding the result of a race condition. sl->family_data is being freed and set to NULL in the slave removal while the w1_slave_show is then dereferencing it, this holds on to the pointer meaning it's probably clobbering memory now instead of crashing. I wonder if that would make RCU be a fit for this? The original bug report was pointing the problem as unlocking bus_mutex while waiting for the temperature conversion, but I was getting sl->family_data set to NULL more reliable without external power which means bux_mutex was held for the duration of w1_slave_show, which is not to say that the original bug report wasn't correct, it is to say that even with the spinlock, holding bus_mutex on the slave, isn't sufficient to keep the slave from being removed. Reported-By: Thorsten Bschorr --- drivers/w1/slaves/w1_therm.c | 70 +- 1 file changed, 55 insertions(+), 15 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..403285d 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + atomic_t refcnt; +}; + +/* return the address of the refcnt in the family data */ +#define THERM_REFCNT(family_data) \ + (&((struct w1_therm_family_data*)family_data)->refcnt) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl->family_data = kzalloc(9, GFP_KERNEL); + sl->family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); if (!sl->family_data) return -ENOMEM; + atomic_set(THERM_REFCNT(sl->family_data), 1); return 0; } static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl->family_data)); + } kfree(sl->family_data); sl->family_data = NULL; } @@ -194,13 +210,30 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl->master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; + u8 *family_data = sl->family_data; + + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + if(!sl->family_data) + { + ret = -ENODEV; + /* Note for anyoe who actually saw this message, it is a known +* problem with eith
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sat, Mar 14, 2015 at 11:55:16PM +0300, Evgeniy Polyakov wrote: Hi David 12.03.2015, 03:54, David Fries da...@fries.net: Would that be removing all four refcnt, w1_slave, w1_master, w1_family, w1_cb_block, or just some of them? It sounds good to me, if that had bugs there would be much more than just the w1 system relying on it. I don't know enough about that system or have the time to code up that change. I can take another look at and post the reference counting w1_therm fix instead of the mutex version as a near term work around until that is available if you want. Please cook up a quick fix for this problem - this bug really hurts people. And then we will discuss how 'ideal' life cycle should look Done, I don't like it, I'm not sure anyone else will either, but I'm no longer crashing in testing, so that's an improvement. My production system doesn't use w1_therm, so I only see these bugs in development testing it. I've come to the conclusion that in the face of a slave vanishing, w1_therm can't avoid all the race conditions, so the real fix must be elsewhere. From 51d4024ca667c8b712de462489d125a78e85aa57 Mon Sep 17 00:00:00 2001 From: David Fries da...@fries.net Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH] w1_therm, reduce race conditions in w1_slave_show After applying this patch commands such as the following in one process, slave=28-02c95fb1 while true; do echo $slave /sys/devices/w1_bus_master1/w1_master_add; sleep .1; echo $slave /sys/devices/w1_bus_master1/w1_master_remove; sleep .1; done and then two at the same time in two other processes, slave=28-02c95fb1 while true; do time cat /sys/devices/w1_bus_master1/$slave/w1_slave ; sleep .1; done then randomly stop all three and repeat. With this patch I no longer see crashes, but at best this patch effectively hiding the result of a race condition. sl-family_data is being freed and set to NULL in the slave removal while the w1_slave_show is then dereferencing it, this holds on to the pointer meaning it's probably clobbering memory now instead of crashing. I wonder if that would make RCU be a fit for this? The original bug report was pointing the problem as unlocking bus_mutex while waiting for the temperature conversion, but I was getting sl-family_data set to NULL more reliable without external power which means bux_mutex was held for the duration of w1_slave_show, which is not to say that the original bug report wasn't correct, it is to say that even with the spinlock, holding bus_mutex on the slave, isn't sufficient to keep the slave from being removed. Reported-By: Thorsten Bschorr thors...@bschorr.de --- drivers/w1/slaves/w1_therm.c | 70 +- 1 file changed, 55 insertions(+), 15 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..403285d 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,16 +59,32 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + atomic_t refcnt; +}; + +/* return the address of the refcnt in the family data */ +#define THERM_REFCNT(family_data) \ + (((struct w1_therm_family_data*)family_data)-refcnt) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl-family_data = kzalloc(9, GFP_KERNEL); + sl-family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); if (!sl-family_data) return -ENOMEM; + atomic_set(THERM_REFCNT(sl-family_data), 1); return 0; } static void w1_therm_remove_slave(struct w1_slave *sl) { + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data)); + while(refcnt) { + msleep(1000); + refcnt = atomic_read(THERM_REFCNT(sl-family_data)); + } kfree(sl-family_data); sl-family_data = NULL; } @@ -194,13 +210,30 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl-master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; + u8 *family_data = sl-family_data; + + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + if(!sl-family_data) + { + ret = -ENODEV; + /* Note for anyoe who actually saw this message, it is a known +* problem with either slave drivers or this driver in +* particular and the request is only a canary indication as +* to how many people and how often
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Tue, Mar 10, 2015 at 04:52:00PM +0300, Evgeniy Polyakov wrote: > Hi > > 10.03.2015, 02:09, "David Fries" : > > > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c > > index 1f11a20..39a9e6a 100644 > > --- a/drivers/w1/slaves/w1_therm.c > > +++ b/drivers/w1/slaves/w1_therm.c > > @@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" > > __stringify(W1_THERM_DS28EA00)); > > static int w1_strong_pullup = 1; > > module_param_named(strong_pullup, w1_strong_pullup, int, 0); > > > > +struct w1_therm_family_data { > > + uint8_t rom[9]; > > + struct mutex lock; > > +}; > > This approach will not scale to other w1 families, I would rather prefer > solutions on w1 level, > not in particular drivers. What if we drop slave reference counter at all in > favor of automatic sysfs device management? I looked and didn't see any of the other slaves dropping the lock and being in this situation, but that doesn't mean they won't in the future. Personally I'm just using netlink and don't plan on using any of the slave drivers. Would that be removing all four refcnt, w1_slave, w1_master, w1_family, w1_cb_block, or just some of them? It sounds good to me, if that had bugs there would be much more than just the w1 system relying on it. I don't know enough about that system or have the time to code up that change. I can take another look at and post the reference counting w1_therm fix instead of the mutex version as a near term work around until that is available if you want. -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote: > > It looks like your patch runs into dead locks problems: I was testing reading and removing the slave, I didn't test two readers, I'll keep that in mind. For some reason I was thinking it was a try lock after the sleep, it isn't, a signal can cause it to return a value, but yes that's not going to work. I have another version that uses an atomic counter in place of this mutex, it requires a loop in the remove slave. I have a one wire network with 14 temperature sensors on it. What I do is I have a program that talks to the kernel over netlink, which is now non-blocking for one wire after an earlier set of changes I made. It sends out all the conversion request requests, waits, sends the read request packets, then gets back the results, and relays the results to the program that requested them. It doesn't use the w1_therm and avoids all these problems. If I read them sequentially with w1_therm, that would take more than 14*.750 or 10.5 seconds, there's no way with w1_therm to read them at once without having 14 threads or programs doing so, the problem is it takes a lot more programming to do netlink. > > I have a cron job reading my sensors. If I read the sensors on another > > thread (e.g. via cat), the 2nd thread can produce a dead lock: > > > > * thread 1 has bus & sl lock > > * thread 1 drops bus lock, keeps sl locks and sleeps > > * thread 2 get bus lock, waits for sl lock > > * thread 1 returns from sleep, waits for bus lock, but this is help by > > thread 2 > > > Aquiring sl lock before the bus lock prevents this dead lock (no > change of locking-order). > With search enabled, removing a device by w1_search_process_cb then > also worked as intended: > > Mar 10 01:29:04 pi kernel: [ 924.870893] w1_therm_remove_slave about > to lock faily_data->lock > Mar 10 01:29:04 pi kernel: [ 924.870935] w1_therm_remove_slave > unlocked faily_data->lock > > Mar 10 01:29:04 pi kernel: [ 924.871115] w1_therm_remove_slave about > to lock faily_data->lock > Mar 10 01:29:05 pi kernel: [ 925.151242] start sleep d117a600 refcnt 1 ** > Mar 10 01:29:05 pi kernel: [ 925.151277] end sleep d117a600 refcnt 0 ** > Mar 10 01:29:11 pi kernel: [ 931.295344] w1_slave_driver > 10-000802cc045a: Read failed CRC check > Mar 10 01:29:11 pi kernel: [ 931.295437] w1_therm_remove_slave > unlocked faily_data->lock > ** I get the ref-cnt before and after the sleep and only log if they differ. > > > But I am unable to judge if mobing the sl lock at the beginning of > w1_slave_show can cause dead locks in other scenarios. I'm not sure, I would probably switch back to the referencing counting version I wrote earlier, or make the bus mutex lock a timed lock or try lock first. -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote: It looks like your patch runs into dead locks problems: I was testing reading and removing the slave, I didn't test two readers, I'll keep that in mind. For some reason I was thinking it was a try lock after the sleep, it isn't, a signal can cause it to return a value, but yes that's not going to work. I have another version that uses an atomic counter in place of this mutex, it requires a loop in the remove slave. I have a one wire network with 14 temperature sensors on it. What I do is I have a program that talks to the kernel over netlink, which is now non-blocking for one wire after an earlier set of changes I made. It sends out all the conversion request requests, waits, sends the read request packets, then gets back the results, and relays the results to the program that requested them. It doesn't use the w1_therm and avoids all these problems. If I read them sequentially with w1_therm, that would take more than 14*.750 or 10.5 seconds, there's no way with w1_therm to read them at once without having 14 threads or programs doing so, the problem is it takes a lot more programming to do netlink. I have a cron job reading my sensors. If I read the sensors on another thread (e.g. via cat), the 2nd thread can produce a dead lock: * thread 1 has bus sl lock * thread 1 drops bus lock, keeps sl locks and sleeps * thread 2 get bus lock, waits for sl lock * thread 1 returns from sleep, waits for bus lock, but this is help by thread 2 Aquiring sl lock before the bus lock prevents this dead lock (no change of locking-order). With search enabled, removing a device by w1_search_process_cb then also worked as intended: Mar 10 01:29:04 pi kernel: [ 924.870893] w1_therm_remove_slave about to lock faily_data-lock Mar 10 01:29:04 pi kernel: [ 924.870935] w1_therm_remove_slave unlocked faily_data-lock Mar 10 01:29:04 pi kernel: [ 924.871115] w1_therm_remove_slave about to lock faily_data-lock Mar 10 01:29:05 pi kernel: [ 925.151242] start sleep d117a600 refcnt 1 ** Mar 10 01:29:05 pi kernel: [ 925.151277] end sleep d117a600 refcnt 0 ** Mar 10 01:29:11 pi kernel: [ 931.295344] w1_slave_driver 10-000802cc045a: Read failed CRC check Mar 10 01:29:11 pi kernel: [ 931.295437] w1_therm_remove_slave unlocked faily_data-lock ** I get the ref-cnt before and after the sleep and only log if they differ. But I am unable to judge if mobing the sl lock at the beginning of w1_slave_show can cause dead locks in other scenarios. I'm not sure, I would probably switch back to the referencing counting version I wrote earlier, or make the bus mutex lock a timed lock or try lock first. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Tue, Mar 10, 2015 at 04:52:00PM +0300, Evgeniy Polyakov wrote: Hi 10.03.2015, 02:09, David Fries da...@fries.net: diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..39a9e6a 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + struct mutex lock; +}; This approach will not scale to other w1 families, I would rather prefer solutions on w1 level, not in particular drivers. What if we drop slave reference counter at all in favor of automatic sysfs device management? I looked and didn't see any of the other slaves dropping the lock and being in this situation, but that doesn't mean they won't in the future. Personally I'm just using netlink and don't plan on using any of the slave drivers. Would that be removing all four refcnt, w1_slave, w1_master, w1_family, w1_cb_block, or just some of them? It sounds good to me, if that had bugs there would be much more than just the w1 system relying on it. I don't know enough about that system or have the time to code up that change. I can take another look at and post the reference counting w1_therm fix instead of the mutex version as a near term work around until that is available if you want. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Mon, Mar 09, 2015 at 11:47:24PM +0100, Thorsten Bschorr wrote: > > Here's a different strategy, add a w1_therm family_data specific mutex > > so w1_therm_remove_slave won't return while another thread is in > > w1_slave_show. Thoughts? > > > > I included three patches, the first is the proposed fix, the second > > makes it more reproducible (and since my testing doesn't have external > > power I had to ignore that check), the third is just some debugging > > messages. For testing I'm starting a read from w1_slave then manually > > remove the sensor, which seems to do the trick. > > I'll test your patch, but keeping the original sleep-time tm. Oops, sorry, it got lost in the shuffle, here's the first patch again (the others were for debugging and increase that time and so wouldn't go upstream anyway). >From 777f5fd75f5f99a3352863e83d226c7b65ebdaa4 Mon Sep 17 00:00:00 2001 From: David Fries Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH] w1_therm, don't let the slave go away while in w1_slave_show A temperature conversion can take 750 ms to complete, if the sensor is externally powered it releases the bus_mutex while it waits, but if the slave device is removed sl becomes a dangling pointer. The race condition window can be 10 * 750 ms = 7.5 seconds if the crc check fails. Reported-By: Thorsten Bschorr Signed-off-by: David Fries --- drivers/w1/slaves/w1_therm.c | 51 +--- 1 file changed, 38 insertions(+), 13 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..39a9e6a 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + struct mutex lock; +}; + +/* return the address of the lock in the family data */ +#define THERM_LOCK(sl) \ + (&((struct w1_therm_family_data*)sl->family_data)->lock) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl->family_data = kzalloc(9, GFP_KERNEL); + sl->family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); + mutex_init(THERM_LOCK(sl)); if (!sl->family_data) return -ENOMEM; return 0; @@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl) static void w1_therm_remove_slave(struct w1_slave *sl) { + /* Getting the lock means w1_slave_show isn't sleeping and the +* family_data can be freed. +*/ + mutex_lock(THERM_LOCK(sl)); + mutex_unlock(THERM_LOCK(sl)); kfree(sl->family_data); sl->family_data = NULL; } @@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl->master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; + /* prevent the slave from going away in sleep */ + mutex_lock(THERM_LOCK(sl)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -230,17 +248,19 @@ static ssize_t w1_slave_show(struct device *device, mutex_unlock(>bus_mutex); sleep_rem = msleep_interruptible(tm); - if (sleep_rem != 0) - return -EINTR; + if (sleep_rem != 0) { + ret = -EINTR; + goto post_unlock; + } - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; } else if (!w1_strong_pullup) { sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { - mutex_unlock(>bus_mutex); - return -EINTR; + ret = -EINTR; + goto pre_unlock; } } @@ -279,9 +299,14 @@ static ssize_t w1_slave_sh
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Mon, Mar 09, 2015 at 11:47:24PM +0100, Thorsten Bschorr wrote: Here's a different strategy, add a w1_therm family_data specific mutex so w1_therm_remove_slave won't return while another thread is in w1_slave_show. Thoughts? I included three patches, the first is the proposed fix, the second makes it more reproducible (and since my testing doesn't have external power I had to ignore that check), the third is just some debugging messages. For testing I'm starting a read from w1_slave then manually remove the sensor, which seems to do the trick. I'll test your patch, but keeping the original sleep-time tm. Oops, sorry, it got lost in the shuffle, here's the first patch again (the others were for debugging and increase that time and so wouldn't go upstream anyway). From 777f5fd75f5f99a3352863e83d226c7b65ebdaa4 Mon Sep 17 00:00:00 2001 From: David Fries da...@fries.net Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH] w1_therm, don't let the slave go away while in w1_slave_show A temperature conversion can take 750 ms to complete, if the sensor is externally powered it releases the bus_mutex while it waits, but if the slave device is removed sl becomes a dangling pointer. The race condition window can be 10 * 750 ms = 7.5 seconds if the crc check fails. Reported-By: Thorsten Bschorr thors...@bschorr.de Signed-off-by: David Fries da...@fries.net --- drivers/w1/slaves/w1_therm.c | 51 +--- 1 file changed, 38 insertions(+), 13 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..39a9e6a 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + struct mutex lock; +}; + +/* return the address of the lock in the family data */ +#define THERM_LOCK(sl) \ + (((struct w1_therm_family_data*)sl-family_data)-lock) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl-family_data = kzalloc(9, GFP_KERNEL); + sl-family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); + mutex_init(THERM_LOCK(sl)); if (!sl-family_data) return -ENOMEM; return 0; @@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl) static void w1_therm_remove_slave(struct w1_slave *sl) { + /* Getting the lock means w1_slave_show isn't sleeping and the +* family_data can be freed. +*/ + mutex_lock(THERM_LOCK(sl)); + mutex_unlock(THERM_LOCK(sl)); kfree(sl-family_data); sl-family_data = NULL; } @@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl-master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; + /* prevent the slave from going away in sleep */ + mutex_lock(THERM_LOCK(sl)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -230,17 +248,19 @@ static ssize_t w1_slave_show(struct device *device, mutex_unlock(dev-bus_mutex); sleep_rem = msleep_interruptible(tm); - if (sleep_rem != 0) - return -EINTR; + if (sleep_rem != 0) { + ret = -EINTR; + goto post_unlock; + } - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; } else if (!w1_strong_pullup) { sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { - mutex_unlock(dev-bus_mutex); - return -EINTR; + ret = -EINTR; + goto pre_unlock; } } @@ -279,9 +299,14 @@ static ssize_t w1_slave_show(struct device *device, c -= snprintf(buf + PAGE_SIZE - c, c, t=%d\n
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Wed, Mar 04, 2015 at 06:36:41PM +0300, ??? ??? wrote: > Hi David > > 02.03.2015, 03:17, "David Fries" : > > > You are correct, it would be a race condition if it doesn't increment > > the refcnt before unlocking the mutex, and it should get the mutex > > before unref. Here's an updated version, I haven't even tried to > > compile it. > > > > What do you think Evgeniy? > > Sounds correct, it should do the trick, please cook up a patch, Thorsten, can > you test it? I quickly found out that strategy wasn't going to work. w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); which calls sysfs_remove_groups which deadlocks, [] ? wait_woken+0x54/0x54 [] ? kernfs_remove_by_name_ns+0x67/0x82 [] ? remove_files+0x30/0x58 [] ? sysfs_remove_group+0x60/0x7b [] ? sysfs_remove_groups+0x1d/0x2f [] ? w1_unref_slave+0xd9/0x13b [wire] [] ? w1_slave_show+0x11a/0x335 [w1_therm] Here's a different strategy, add a w1_therm family_data specific mutex so w1_therm_remove_slave won't return while another thread is in w1_slave_show. Thoughts? I included three patches, the first is the proposed fix, the second makes it more reproducible (and since my testing doesn't have external power I had to ignore that check), the third is just some debugging messages. For testing I'm starting a read from w1_slave then manually remove the sensor, which seems to do the trick. echo 28-[ds18b20_id] > /sys/devices/w1_bus_master1/w1_master_remove Give it a try and let me know how it goes. My test system host is running 3.16, I'm using kvm to test 3.19 giving it access to the USB one wire dongle, and there's some refcnt problem I've not been able to track down. Once and a while when I have the kvm grab the USB device the host will start printing a refcount problem, which takes a reboot to clear, which is annoying because the reason to test in kvm is to not break the host. w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=1. >From bcde1a8d3d00cc3b13dff7e2476f8c03efc59001 Mon Sep 17 00:00:00 2001 From: David Fries Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH 1/3] w1_therm, don't let the slave go away while in w1_slave_show A temperature conversion can take 750 ms to complete, if the sensor is externally powered it releases the bus_mutex while it waits, but if the slave device is removed sl becomes a dangling pointer. The race condition window can be 10 * 750 ms = 7.5 seconds if the crc check fails. Reported-By: Thorsten Bschorr Signed-off-by: David Fries --- drivers/w1/slaves/w1_therm.c | 52 ++ 1 file changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..6b3ef93 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + struct mutex lock; +}; + +/* return the address of the lock in the family data */ +#define THERM_LOCK(sl) \ + (&((struct w1_therm_family_data*)sl->family_data)->lock) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl->family_data = kzalloc(9, GFP_KERNEL); + sl->family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); + mutex_init(THERM_LOCK(sl)); if (!sl->family_data) return -ENOMEM; return 0; @@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl) static void w1_therm_remove_slave(struct w1_slave *sl) { + /* Getting the lock means w1_slave_show isn't sleeping and the +* family_data can be freed. +*/ + mutex_lock(THERM_LOCK(sl)); + mutex_unlock(THERM_LOCK(sl)); kfree(sl->family_data); sl->family_data = NULL; } @@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl->master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; - i = mutex_lock_interruptible(>bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(>bus_mutex); + if (ret != 0) + goto post_unlock; + /* prevent the slave from going away in sleep */ + mutex_lock(THERM_LOCK(sl)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -229,18 +247,19 @@ static ssize_t w1_slave_show(struct device *device, if (external_power) { mutex_unlock(>bus_mutex);
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Wed, Mar 04, 2015 at 06:36:41PM +0300, ??? ??? wrote: Hi David 02.03.2015, 03:17, David Fries da...@fries.net: You are correct, it would be a race condition if it doesn't increment the refcnt before unlocking the mutex, and it should get the mutex before unref. Here's an updated version, I haven't even tried to compile it. What do you think Evgeniy? Sounds correct, it should do the trick, please cook up a patch, Thorsten, can you test it? I quickly found out that strategy wasn't going to work. w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); which calls sysfs_remove_groups which deadlocks, [81047426] ? wait_woken+0x54/0x54 [810e7d6e] ? kernfs_remove_by_name_ns+0x67/0x82 [810e97c8] ? remove_files+0x30/0x58 [810e9b8a] ? sysfs_remove_group+0x60/0x7b [810e9bc2] ? sysfs_remove_groups+0x1d/0x2f [a01951ae] ? w1_unref_slave+0xd9/0x13b [wire] [a01a516e] ? w1_slave_show+0x11a/0x335 [w1_therm] Here's a different strategy, add a w1_therm family_data specific mutex so w1_therm_remove_slave won't return while another thread is in w1_slave_show. Thoughts? I included three patches, the first is the proposed fix, the second makes it more reproducible (and since my testing doesn't have external power I had to ignore that check), the third is just some debugging messages. For testing I'm starting a read from w1_slave then manually remove the sensor, which seems to do the trick. echo 28-[ds18b20_id] /sys/devices/w1_bus_master1/w1_master_remove Give it a try and let me know how it goes. My test system host is running 3.16, I'm using kvm to test 3.19 giving it access to the USB one wire dongle, and there's some refcnt problem I've not been able to track down. Once and a while when I have the kvm grab the USB device the host will start printing a refcount problem, which takes a reboot to clear, which is annoying because the reason to test in kvm is to not break the host. w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become free: refcnt=1. From bcde1a8d3d00cc3b13dff7e2476f8c03efc59001 Mon Sep 17 00:00:00 2001 From: David Fries da...@fries.net Date: Sat, 7 Mar 2015 22:25:37 -0600 Subject: [PATCH 1/3] w1_therm, don't let the slave go away while in w1_slave_show A temperature conversion can take 750 ms to complete, if the sensor is externally powered it releases the bus_mutex while it waits, but if the slave device is removed sl becomes a dangling pointer. The race condition window can be 10 * 750 ms = 7.5 seconds if the crc check fails. Reported-By: Thorsten Bschorr thors...@bschorr.de Signed-off-by: David Fries da...@fries.net --- drivers/w1/slaves/w1_therm.c | 52 ++ 1 file changed, 38 insertions(+), 14 deletions(-) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..6b3ef93 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00)); static int w1_strong_pullup = 1; module_param_named(strong_pullup, w1_strong_pullup, int, 0); +struct w1_therm_family_data { + uint8_t rom[9]; + struct mutex lock; +}; + +/* return the address of the lock in the family data */ +#define THERM_LOCK(sl) \ + (((struct w1_therm_family_data*)sl-family_data)-lock) + static int w1_therm_add_slave(struct w1_slave *sl) { - sl-family_data = kzalloc(9, GFP_KERNEL); + sl-family_data = kzalloc(sizeof(struct w1_therm_family_data), + GFP_KERNEL); + mutex_init(THERM_LOCK(sl)); if (!sl-family_data) return -ENOMEM; return 0; @@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl) static void w1_therm_remove_slave(struct w1_slave *sl) { + /* Getting the lock means w1_slave_show isn't sleeping and the +* family_data can be freed. +*/ + mutex_lock(THERM_LOCK(sl)); + mutex_unlock(THERM_LOCK(sl)); kfree(sl-family_data); sl-family_data = NULL; } @@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device, struct w1_slave *sl = dev_to_w1_slave(device); struct w1_master *dev = sl-master; u8 rom[9], crc, verdict, external_power; - int i, max_trying = 10; + int i, ret, max_trying = 10; ssize_t c = PAGE_SIZE; - i = mutex_lock_interruptible(dev-bus_mutex); - if (i != 0) - return i; + ret = mutex_lock_interruptible(dev-bus_mutex); + if (ret != 0) + goto post_unlock; + /* prevent the slave from going away in sleep */ + mutex_lock(THERM_LOCK(sl)); memset(rom, 0, sizeof(rom)); while (max_trying--) { @@ -229,18 +247,19 @@ static ssize_t w1_slave_show(struct device *device, if (external_power) { mutex_unlock(dev-bus_mutex
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sun, Mar 01, 2015 at 02:04:53PM +0100, Thorsten Bschorr wrote: > Hi David, > > thanks for your feedback on my first patch, I wasn't aware of checkpatch.pl. > > Initially, I had just if-ed the usage of family-data, which did not > look that nice. I was referring to this proof-of-concept workaround in > my initial bug report. > > The patch I've submitted is different from my proof-of-concept > workaround. Not unlocking the bus before returning clearly is an > error, I did not extensively test this patch. > > > > or just increment it while sleeping, which is when it's needed, which > > also looks simpler. > > > > if (external_power) { > > + int refcnt; > > mutex_unlock(>bus_mutex); > > > > + /* prevent the slave from going away */ > > + atomic_inc(>refcnt); > > sleep_rem = msleep_interruptible(tm); > > + refcnt = w1_unref_slave(sl); > > - if (sleep_rem != 0) > > + if (sleep_rem != 0 || !refcnt) > > return -EINTR; > > > > i = > > mutex_lock_interruptible(>bus_mutex); > > if (i != 0) > > return i; > > } else if (!w1_strong_pullup) { > > > I like this better than my workaround-patch. > > One thought occurred to me when looking at this proposal: wouldn't it > be even better to increase sl->refcnt before unlocking the mutex? > I was asking myself if it is possible that the current thread gets > suspended between mutex_unlock(>bus_mutex); and > atomic_inc(>refcnt); thus leaving another thread the change to > unref the device? > (I'm not that familiar with linux scheduling, so my assumption might be void.) You are correct, it would be a race condition if it doesn't increment the refcnt before unlocking the mutex, and it should get the mutex before unref. Here's an updated version, I haven't even tried to compile it. What do you think Evgeniy? if (external_power) { int refcnt; /* prevent the slave from going away in sleep */ atomic_inc(>refcnt); mutex_unlock(>bus_mutex); sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { w1_unref_slave(sl); return -EINTR; } i = mutex_lock_interruptible(>bus_mutex); refcnt = w1_unref_slave(sl); if (i != 0) { /* failed to lock */ return i; } if (!refcnt) /* got lock, but slave went away */ mutex_unlock(>bus_mutex); return -EINTR; } } else if (!w1_strong_pullup) { -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sun, Mar 01, 2015 at 02:04:53PM +0100, Thorsten Bschorr wrote: Hi David, thanks for your feedback on my first patch, I wasn't aware of checkpatch.pl. Initially, I had just if-ed the usage of family-data, which did not look that nice. I was referring to this proof-of-concept workaround in my initial bug report. The patch I've submitted is different from my proof-of-concept workaround. Not unlocking the bus before returning clearly is an error, I did not extensively test this patch. or just increment it while sleeping, which is when it's needed, which also looks simpler. if (external_power) { + int refcnt; mutex_unlock(dev-bus_mutex); + /* prevent the slave from going away */ + atomic_inc(sl-refcnt); sleep_rem = msleep_interruptible(tm); + refcnt = w1_unref_slave(sl); - if (sleep_rem != 0) + if (sleep_rem != 0 || !refcnt) return -EINTR; i = mutex_lock_interruptible(dev-bus_mutex); if (i != 0) return i; } else if (!w1_strong_pullup) { I like this better than my workaround-patch. One thought occurred to me when looking at this proposal: wouldn't it be even better to increase sl-refcnt before unlocking the mutex? I was asking myself if it is possible that the current thread gets suspended between mutex_unlock(dev-bus_mutex); and atomic_inc(sl-refcnt); thus leaving another thread the change to unref the device? (I'm not that familiar with linux scheduling, so my assumption might be void.) You are correct, it would be a race condition if it doesn't increment the refcnt before unlocking the mutex, and it should get the mutex before unref. Here's an updated version, I haven't even tried to compile it. What do you think Evgeniy? if (external_power) { int refcnt; /* prevent the slave from going away in sleep */ atomic_inc(sl-refcnt); mutex_unlock(dev-bus_mutex); sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { w1_unref_slave(sl); return -EINTR; } i = mutex_lock_interruptible(dev-bus_mutex); refcnt = w1_unref_slave(sl); if (i != 0) { /* failed to lock */ return i; } if (!refcnt) /* got lock, but slave went away */ mutex_unlock(dev-bus_mutex); return -EINTR; } } else if (!w1_strong_pullup) { -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sun, Mar 01, 2015 at 04:48:22AM +0300, ??? ??? wrote: > Hi everyone > > 28.02.2015, 23:18, "David Fries" : > > Thanks for preparing the patch, it looks like it will go another > > round, but that happens to everyone. > > Patch itself doesn't solve the problem, it only adds a workaround. > There is a reference counter issue - reading data via sysfs only holds > a reference to device, w1 slave structure will be deleted by w1 search core. > > I believe it only accidentally doesn't crash because of kernel memory > allocator > hasn't yet reclaimed memory. Good point, it does look like that's the case. w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); which calls sl->family->fops->remove_slave(sl); which kfree and sl->family_data = NULL; In that case what do you think about in w1_slave_show after getting bus_mutex, atomic_inc(>refcnt); then add w1_unref_slave(sl); in each return path, or do a goto and consolidate them. or just increment it while sleeping, which is when it's needed, which also looks simpler. if (external_power) { + int refcnt; mutex_unlock(>bus_mutex); + /* prevent the slave from going away */ + atomic_inc(>refcnt); sleep_rem = msleep_interruptible(tm); + refcnt = w1_unref_slave(sl); - if (sleep_rem != 0) + if (sleep_rem != 0 || !refcnt) return -EINTR; i = mutex_lock_interruptible(>bus_mutex); if (i != 0) return i; } else if (!w1_strong_pullup) { -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
Thanks for preparing the patch, it looks like it will go another round, but that happens to everyone. To simplify to show a problem, mutex_lock_interruptible(>bus_mutex); if (!sl->family_data) return -ENODEV; If family_data is NULL, then dev->bus_mutex is left locked. Your earlier e-mails indicated that you weren't seeing any problems after applying this patch, but I would have thought once this early return was encountered, that bus would not be usable any longer because bus_mutex would be locked. I don't know if the bus controller went away, so there is effectively a new bus, because I assume you are still able to read the temperature once this condition triggers? It looks like you need to add mutex_unlock(>bus_mutex); before return -ENODEV; and that's how it is handled elsewhere in the function, but you might want to put a printk before the return, trigger it and verify it is getting hung up on the bus_mutex, it would also be telling if you can remove the wire and the other w1 modules, they should not let you remove them while a mutex is held. The patch needs to be run through the checkpatch script, and it's listing two issues, adding a space after the if (I also prefer not having it, but this is for consistency), and missing Signed-off-by: That would look like this, only with your information, it's a tracing of where the changes came from and agreeing to the copyright, you can read more in Documentation/SubmittingPatches if you haven't already. Signed-off-by: David Fries scripts/checkpatch.pl /tmp/w1_null_patch.patch Thanks, keep up improving things. On Fri, Feb 27, 2015 at 09:43:14AM +0100, Thorsten.Bschorr wrote: > w1_slave_show unlocks the bus while waiting for the sensor > to convert the temperature. If the sensor gets disconnected > during that time, sl->family_data could get NULL. > Note: the w1_slave pointer inside w1_slave_show is protected by > sysfs-mutex > --- > drivers/w1/slaves/w1_therm.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c > index 1f11a20..3a14cb1 100644 > --- a/drivers/w1/slaves/w1_therm.c > +++ b/drivers/w1/slaves/w1_therm.c > @@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device, > i = mutex_lock_interruptible(>bus_mutex); > if (i != 0) > return i; > + > + /* Ensure that device is still there - > + * it could have gone while we unlocked the bus. > + * Note: sl is still protected by sysfs-mutex */ > + if(!sl->family_data) > + return -ENODEV; > } else if (!w1_strong_pullup) { > sleep_rem = msleep_interruptible(tm); > if (sleep_rem != 0) { > -- > 1.8.4.5 > > > --- > Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft. > http://www.avast.com > -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
Thanks for preparing the patch, it looks like it will go another round, but that happens to everyone. To simplify to show a problem, mutex_lock_interruptible(dev-bus_mutex); if (!sl-family_data) return -ENODEV; If family_data is NULL, then dev-bus_mutex is left locked. Your earlier e-mails indicated that you weren't seeing any problems after applying this patch, but I would have thought once this early return was encountered, that bus would not be usable any longer because bus_mutex would be locked. I don't know if the bus controller went away, so there is effectively a new bus, because I assume you are still able to read the temperature once this condition triggers? It looks like you need to add mutex_unlock(dev-bus_mutex); before return -ENODEV; and that's how it is handled elsewhere in the function, but you might want to put a printk before the return, trigger it and verify it is getting hung up on the bus_mutex, it would also be telling if you can remove the wire and the other w1 modules, they should not let you remove them while a mutex is held. The patch needs to be run through the checkpatch script, and it's listing two issues, adding a space after the if (I also prefer not having it, but this is for consistency), and missing Signed-off-by: That would look like this, only with your information, it's a tracing of where the changes came from and agreeing to the copyright, you can read more in Documentation/SubmittingPatches if you haven't already. Signed-off-by: David Fries da...@fries.net scripts/checkpatch.pl /tmp/w1_null_patch.patch Thanks, keep up improving things. On Fri, Feb 27, 2015 at 09:43:14AM +0100, Thorsten.Bschorr wrote: w1_slave_show unlocks the bus while waiting for the sensor to convert the temperature. If the sensor gets disconnected during that time, sl-family_data could get NULL. Note: the w1_slave pointer inside w1_slave_show is protected by sysfs-mutex --- drivers/w1/slaves/w1_therm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c index 1f11a20..3a14cb1 100644 --- a/drivers/w1/slaves/w1_therm.c +++ b/drivers/w1/slaves/w1_therm.c @@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device, i = mutex_lock_interruptible(dev-bus_mutex); if (i != 0) return i; + + /* Ensure that device is still there - + * it could have gone while we unlocked the bus. + * Note: sl is still protected by sysfs-mutex */ + if(!sl-family_data) + return -ENODEV; } else if (!w1_strong_pullup) { sleep_rem = msleep_interruptible(tm); if (sleep_rem != 0) { -- 1.8.4.5 --- Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft. http://www.avast.com -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
On Sun, Mar 01, 2015 at 04:48:22AM +0300, ??? ??? wrote: Hi everyone 28.02.2015, 23:18, David Fries da...@fries.net: Thanks for preparing the patch, it looks like it will go another round, but that happens to everyone. Patch itself doesn't solve the problem, it only adds a workaround. There is a reference counter issue - reading data via sysfs only holds a reference to device, w1 slave structure will be deleted by w1 search core. I believe it only accidentally doesn't crash because of kernel memory allocator hasn't yet reclaimed memory. Good point, it does look like that's the case. w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); which calls sl-family-fops-remove_slave(sl); which kfree and sl-family_data = NULL; In that case what do you think about in w1_slave_show after getting bus_mutex, atomic_inc(sl-refcnt); then add w1_unref_slave(sl); in each return path, or do a goto and consolidate them. or just increment it while sleeping, which is when it's needed, which also looks simpler. if (external_power) { + int refcnt; mutex_unlock(dev-bus_mutex); + /* prevent the slave from going away */ + atomic_inc(sl-refcnt); sleep_rem = msleep_interruptible(tm); + refcnt = w1_unref_slave(sl); - if (sleep_rem != 0) + if (sleep_rem != 0 || !refcnt) return -EINTR; i = mutex_lock_interruptible(dev-bus_mutex); if (i != 0) return i; } else if (!w1_strong_pullup) { -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl->family_data
Thorsten, Are you planning on submitting a patch? FYI, I load the one wire module with wire search_count=1 in /etc/modules which tells it to search the bus once only. That works for me as devices don't come and go on my bus, and it isn't scanning the bus every few seconds just to find out nothing changed. If you are finding devices vanish even when they aren't maybe it will be useful to you as well, though the bigger problem might be why they are vanishing in the first place. On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote: > Hi, > > I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry > Pi running 3.18.5 with several DS18S20 temperature sensors. I have already > discussed the problem with Evgeniy. > > @Evgeniy & David: sorry for re-sending my message, my email > accidentally contained a HTML part. > > > > w1_therm: w1_slave_show checks if the sensor uses external power. If this > is the case, the mutex on dev->bus_mutex is unlocked while waiting 750 ms > for the sensor to convert the temperature. Before reading the temperature, > the mutex is again locked. > > During this sleep-time, the sensor could get detached, for example by w1.c: > w1_search_process_cb not finding the sensor (*): > !test_bit(W1_SLAVE_ACTIVE, >flags) ==true. > This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and > nulls sl->family_data. > > w1_slave_show does not check if familiy_data is null after re-acquiring the > bus_mutex resulting in a null-ptr access when storing data. > > After I've added checks for family_data!=0, I did not observe any more > crashes; the other data of struct w1_slave seem to be valid as long as any > thread executes w1_slave_show. > > > I have added debug-output to w1.c and w1_therm.c, here's a log of a > potential crash: > > [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb, > !W1_SLAVE_ACTIVE, calling w1_slave_detach > [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach > destroy_now 1 > [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0 > [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching > 10-000802d9c9e4 [d5fb8800]. > [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> > w1_family_notify > [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling > remove_slave > [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave > [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave > refcnt -1 > [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before > sleep), family_data==0 > [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after > sleep), family_data==0 > [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check > [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -> > device_unregister > [184200.931169] w1 w1_unref_slave -> kfree > > Note: When this crash happened, multiple threads were reading the sensor. > > > I could trigger the problem several times, and each time device_unregister > in w1_unref_slave was executed *after* w1_slave_show. In one case, the > logged time-difference between the first family_data==0 message and > device_unregister was about 8 seconds! > I have not observed a w1_slave_show call after w1_unref_slave (as long as > the device was not re-attached again). > > >From my observation, the w1_slave data seem to be valid as long as > w1_slave_show is executed. > My guess is that the call of sysfs_remove_groups in w1_family_notify hits a > mutex (I did not add debug output here). > > > (*) On my tiny raspberry, this happens from time to time with high CPU and > external disc load (timing and/or electrical issues); it seems that the > sensor does not respond in time to the (periodic) search. > > > Please email me if you need further information. > > > > Best regards, > Thorsten Bschorr > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl-family_data
Thorsten, Are you planning on submitting a patch? FYI, I load the one wire module with wire search_count=1 in /etc/modules which tells it to search the bus once only. That works for me as devices don't come and go on my bus, and it isn't scanning the bus every few seconds just to find out nothing changed. If you are finding devices vanish even when they aren't maybe it will be useful to you as well, though the bigger problem might be why they are vanishing in the first place. On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote: Hi, I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry Pi running 3.18.5 with several DS18S20 temperature sensors. I have already discussed the problem with Evgeniy. @Evgeniy David: sorry for re-sending my message, my email accidentally contained a HTML part. w1_therm: w1_slave_show checks if the sensor uses external power. If this is the case, the mutex on dev-bus_mutex is unlocked while waiting 750 ms for the sensor to convert the temperature. Before reading the temperature, the mutex is again locked. During this sleep-time, the sensor could get detached, for example by w1.c: w1_search_process_cb not finding the sensor (*): !test_bit(W1_SLAVE_ACTIVE, sl-flags) ==true. This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and nulls sl-family_data. w1_slave_show does not check if familiy_data is null after re-acquiring the bus_mutex resulting in a null-ptr access when storing data. After I've added checks for family_data!=0, I did not observe any more crashes; the other data of struct w1_slave seem to be valid as long as any thread executes w1_slave_show. I have added debug-output to w1.c and w1_therm.c, here's a log of a potential crash: [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb, !W1_SLAVE_ACTIVE, calling w1_slave_detach [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach destroy_now 1 [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0 [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching 10-000802d9c9e4 [d5fb8800]. [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave - w1_family_notify [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling remove_slave [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave refcnt -1 [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after sleep), family_data==0 [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before sleep), family_data==0 [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after sleep), family_data==0 [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after sleep), family_data==0 [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave - device_unregister [184200.931169] w1 w1_unref_slave - kfree Note: When this crash happened, multiple threads were reading the sensor. I could trigger the problem several times, and each time device_unregister in w1_unref_slave was executed *after* w1_slave_show. In one case, the logged time-difference between the first family_data==0 message and device_unregister was about 8 seconds! I have not observed a w1_slave_show call after w1_unref_slave (as long as the device was not re-attached again). From my observation, the w1_slave data seem to be valid as long as w1_slave_show is executed. My guess is that the call of sysfs_remove_groups in w1_family_notify hits a mutex (I did not add debug output here). (*) On my tiny raspberry, this happens from time to time with high CPU and external disc load (timing and/or electrical issues); it seems that the sensor does not respond in time to the (periodic) search. Please email me if you need further information. Best regards, Thorsten Bschorr -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature
On Tue, Jan 06, 2015 at 05:50:32PM +0100, Richard Weinberger wrote: > Am 06.01.2015 um 17:12 schrieb Mariusz Gorski: > > On Tue, Jan 06, 2015 at 04:01:47PM +0100, Richard Weinberger wrote: > >> On Tue, Jan 6, 2015 at 3:29 PM, Mariusz Gorski > >> wrote: > >>> DS18B20 and it's brothers are pretty popular in the RaspberryPi world > >>> when it comes to temperature measurement. All tutorials on the Internet > >>> use the same way of parsing the output of the w1_slave sysfs file. > >>> These patches add a dedicated sysfs entry called 'temp' whose only job > >>> is to output the current temperature. > >> > >> And what is the benefit of this patches? > > > > Well, instead of having to parse the output of w1_slave: > > > > $ cat w1_slave > > 4d 01 55 00 7f ff 0c 10 fd : crc=fd YES > > 4d 01 55 00 7f ff 0c 10 fd t=20812 > > > > the userspace program gets only the interesting information, > > which usually is the current temparture: > > A sane user would also check the CRC. Yes, and more because all 00's will pass a checksum, but be an invalid reading, because there will be some bit in there outside of the two temperature return bytes that won't be 0, hence the temperature field can't be trusted. All ff's will fail a checksum, but is useful to know because that pretty much means the sensor is no longer there. > IMHO it is up to the user to format the output for its needs. > Some may want "20812", some "20.8", some else maybe "20.8°C". > Let it up to the user. > Any trivial awk/cut/etc... oneliner would do it. I'm not fond of the format of w1_slave printing two sensor readings and such, it's confusing. Back in the day (2008) I at least standardized the t= converted value to millidegrees C. That was because DS18B20 gave degrees C where DS18S20 gave millidegrees C, so it's been worse. > Thanks, > //richard -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] w1: slaves: w1_therm: Add temp attribute
On Tue, Jan 06, 2015 at 03:29:56PM +0100, Mariusz Gorski wrote: > Add new attribute to simplify reading of current temperature. > +static ssize_t temp_show(struct device *device, > + struct device_attribute *attr, char *buf) > +{ > + struct w1_slave *sl = dev_to_w1_slave(device); > + struct w1_master *dev = sl->master; > + u8 rom[9], verdict; > + int temp; > + > + if (mutex_lock_interruptible(>bus_mutex)) > + return -EINTR; > + > + memset(rom, 0, sizeof(rom)); > + > + verdict = read_rom(device, rom); > + > + if (verdict < 0) > + return verdict; I wanted to point out that this returns without unlocking bus_mutex. -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature
On Tue, Jan 06, 2015 at 03:29:54PM +0100, Mariusz Gorski wrote: > DS18B20 and it's brothers are pretty popular in the RaspberryPi world > when it comes to temperature measurement. All tutorials on the Internet > use the same way of parsing the output of the w1_slave sysfs file. > These patches add a dedicated sysfs entry called 'temp' whose only job > is to output the current temperature. > > What could be improved here is the way how dev->bus_mutex is locked/unlocked, > which is split right now between two functions - maybe it would be better > to move all dev->bus_mutex locking into read_rom() and introduce another > mutex for protecting the slave's structures (e.g. family_data). As to adding another mutex, there are enough little mutexes around the one wire system that it already feels like a mess and I tried to get all the race conditions that I could. I think you don't understand what the bus_mutex protects. It protects other devices from accessing the one wire bus. The way you coded it, temp_show will allow only one temperature sensor to be accessed and read at a time, considering it takes 750 ms that's holding up the bus for a long time, and depending on how the devices are connected, isn't needed. If the temperature sensor is parasite powered, nothing else can use the bus while it is converting, if it has dedicated power, other slaves can be accessed at the same time. I second using a standard sensor attribute if there is one appropriate rather than here. I didn't in my application, but then I wasn't adding a new method to get the same data, I was using the existing netlink method. If it does get added to a standard sensor access method, I do request that it be optional (like w1_therm which I don't use), and not sit there polling the sensor for the current temperature every X seconds when no one is using the data (OpenBSD does this, or at least last I saw, and you have no idea how old the sensor data reading is or have any control over how often it will poll it). I have 15 DS18B20 temperature sensors scattered around the house on a single one wire network. I don't use the w1_therm system, I talk through netlink to the kernel. This has the advantage of being an asynchronous socket based interface so I can write a packet to request each sensor to do a conversion, get the write command status, and then after a timeout, write another to get a return, then read back the data. Using the sysfs means blocking while the read is in progress. Reading 15 means either taking at least 11.250 seconds as you read them serially, or having 15 threads blocked on one read each (provided they don't block the other readers). Thanks for Cc'ing me on this. > Mariusz Gorski (2): > w1: slaves: w1_therm: Extract read_rom function > w1: slaves: w1_therm: Add temp attribute > > drivers/w1/slaves/w1_therm.c | 83 > ++-- > 1 file changed, 65 insertions(+), 18 deletions(-) > > -- > 2.2.1 -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature
On Tue, Jan 06, 2015 at 03:29:54PM +0100, Mariusz Gorski wrote: DS18B20 and it's brothers are pretty popular in the RaspberryPi world when it comes to temperature measurement. All tutorials on the Internet use the same way of parsing the output of the w1_slave sysfs file. These patches add a dedicated sysfs entry called 'temp' whose only job is to output the current temperature. What could be improved here is the way how dev-bus_mutex is locked/unlocked, which is split right now between two functions - maybe it would be better to move all dev-bus_mutex locking into read_rom() and introduce another mutex for protecting the slave's structures (e.g. family_data). As to adding another mutex, there are enough little mutexes around the one wire system that it already feels like a mess and I tried to get all the race conditions that I could. I think you don't understand what the bus_mutex protects. It protects other devices from accessing the one wire bus. The way you coded it, temp_show will allow only one temperature sensor to be accessed and read at a time, considering it takes 750 ms that's holding up the bus for a long time, and depending on how the devices are connected, isn't needed. If the temperature sensor is parasite powered, nothing else can use the bus while it is converting, if it has dedicated power, other slaves can be accessed at the same time. I second using a standard sensor attribute if there is one appropriate rather than here. I didn't in my application, but then I wasn't adding a new method to get the same data, I was using the existing netlink method. If it does get added to a standard sensor access method, I do request that it be optional (like w1_therm which I don't use), and not sit there polling the sensor for the current temperature every X seconds when no one is using the data (OpenBSD does this, or at least last I saw, and you have no idea how old the sensor data reading is or have any control over how often it will poll it). I have 15 DS18B20 temperature sensors scattered around the house on a single one wire network. I don't use the w1_therm system, I talk through netlink to the kernel. This has the advantage of being an asynchronous socket based interface so I can write a packet to request each sensor to do a conversion, get the write command status, and then after a timeout, write another to get a return, then read back the data. Using the sysfs means blocking while the read is in progress. Reading 15 means either taking at least 11.250 seconds as you read them serially, or having 15 threads blocked on one read each (provided they don't block the other readers). Thanks for Cc'ing me on this. Mariusz Gorski (2): w1: slaves: w1_therm: Extract read_rom function w1: slaves: w1_therm: Add temp attribute drivers/w1/slaves/w1_therm.c | 83 ++-- 1 file changed, 65 insertions(+), 18 deletions(-) -- 2.2.1 -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature
On Tue, Jan 06, 2015 at 05:50:32PM +0100, Richard Weinberger wrote: Am 06.01.2015 um 17:12 schrieb Mariusz Gorski: On Tue, Jan 06, 2015 at 04:01:47PM +0100, Richard Weinberger wrote: On Tue, Jan 6, 2015 at 3:29 PM, Mariusz Gorski marius.gor...@gmail.com wrote: DS18B20 and it's brothers are pretty popular in the RaspberryPi world when it comes to temperature measurement. All tutorials on the Internet use the same way of parsing the output of the w1_slave sysfs file. These patches add a dedicated sysfs entry called 'temp' whose only job is to output the current temperature. And what is the benefit of this patches? Well, instead of having to parse the output of w1_slave: $ cat w1_slave 4d 01 55 00 7f ff 0c 10 fd : crc=fd YES 4d 01 55 00 7f ff 0c 10 fd t=20812 the userspace program gets only the interesting information, which usually is the current temparture: A sane user would also check the CRC. Yes, and more because all 00's will pass a checksum, but be an invalid reading, because there will be some bit in there outside of the two temperature return bytes that won't be 0, hence the temperature field can't be trusted. All ff's will fail a checksum, but is useful to know because that pretty much means the sensor is no longer there. IMHO it is up to the user to format the output for its needs. Some may want 20812, some 20.8, some else maybe 20.8°C. Let it up to the user. Any trivial awk/cut/etc... oneliner would do it. I'm not fond of the format of w1_slave printing two sensor readings and such, it's confusing. Back in the day (2008) I at least standardized the t= converted value to millidegrees C. That was because DS18B20 gave degrees C where DS18S20 gave millidegrees C, so it's been worse. Thanks, //richard -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] w1: slaves: w1_therm: Add temp attribute
On Tue, Jan 06, 2015 at 03:29:56PM +0100, Mariusz Gorski wrote: Add new attribute to simplify reading of current temperature. +static ssize_t temp_show(struct device *device, + struct device_attribute *attr, char *buf) +{ + struct w1_slave *sl = dev_to_w1_slave(device); + struct w1_master *dev = sl-master; + u8 rom[9], verdict; + int temp; + + if (mutex_lock_interruptible(dev-bus_mutex)) + return -EINTR; + + memset(rom, 0, sizeof(rom)); + + verdict = read_rom(device, rom); + + if (verdict 0) + return verdict; I wanted to point out that this returns without unlocking bus_mutex. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] w1: avoid potential u16 overflow
Reported-by: Dan Carpenter Acked-by: Evgeniy Polyakov Signed-off-by: David Fries --- drivers/w1/w1_netlink.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index dd96562..881597a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct netlink_skb_parms *nsp) msg = (struct w1_netlink_msg *)(cn + 1); if (node_count) { int size; - u16 reply_size = sizeof(*cn) + cn->len + slave_len; + int reply_size = sizeof(*cn) + cn->len + slave_len; if (cn->flags & W1_CN_BUNDLE) { /* bundling duplicats some of the messages */ reply_size += 2 * cmd_count * (sizeof(struct cn_msg) + -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] cn: verify msg->len before making callback
The struct cn_msg len field comes from userspace and needs to be validated. More logical to do so here where the cn_msg pointer is pulled out of the sk_buff than the callback which is passed cn_msg * and might assume no validation is needed. Reported-by: Dan Carpenter Acked-by: Evgeniy Polyakov Signed-off-by: David Fries --- drivers/connector/connector.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index f612d68..30f5228 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send); */ static int cn_call_callback(struct sk_buff *skb) { + struct nlmsghdr *nlh; struct cn_callback_entry *i, *cbq = NULL; struct cn_dev *dev = struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb)); struct netlink_skb_parms *nsp = _CB(skb); int err = -ENODEV; + /* verify msg->len is within skb */ + nlh = nlmsg_hdr(skb); + if (nlh->nlmsg_len < NLMSG_HDRLEN + sizeof(struct cn_msg) + msg->len) + return -EINVAL; + spin_lock_bh(>cbdev->queue_lock); list_for_each_entry(i, >cbdev->queue_list, callback_entry) { if (cn_cb_equal(>id.id, >id)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATHC 0/2] cn: w1: buffer size checks
Greg Kroah-Hartman, These issues were found by Dan Carpenter with a static checker and will check message buffer lengths from userspace or avoid length overflows. Evgeniy Polyakov has given his ack, and they can be applied to the stable branch as well. [PATCH 1/2] cn: verify msg->len before making callback [PATCH 2/2] w1: avoid potential u16 overflow -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATHC 0/2] cn: w1: buffer size checks
Greg Kroah-Hartman, These issues were found by Dan Carpenter with a static checker and will check message buffer lengths from userspace or avoid length overflows. Evgeniy Polyakov has given his ack, and they can be applied to the stable branch as well. [PATCH 1/2] cn: verify msg-len before making callback [PATCH 2/2] w1: avoid potential u16 overflow -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] cn: verify msg-len before making callback
The struct cn_msg len field comes from userspace and needs to be validated. More logical to do so here where the cn_msg pointer is pulled out of the sk_buff than the callback which is passed cn_msg * and might assume no validation is needed. Reported-by: Dan Carpenter dan.carpen...@oracle.com Acked-by: Evgeniy Polyakov z...@ioremap.net Signed-off-by: David Fries da...@fries.net --- drivers/connector/connector.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index f612d68..30f5228 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send); */ static int cn_call_callback(struct sk_buff *skb) { + struct nlmsghdr *nlh; struct cn_callback_entry *i, *cbq = NULL; struct cn_dev *dev = cdev; struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb)); struct netlink_skb_parms *nsp = NETLINK_CB(skb); int err = -ENODEV; + /* verify msg-len is within skb */ + nlh = nlmsg_hdr(skb); + if (nlh-nlmsg_len NLMSG_HDRLEN + sizeof(struct cn_msg) + msg-len) + return -EINVAL; + spin_lock_bh(dev-cbdev-queue_lock); list_for_each_entry(i, dev-cbdev-queue_list, callback_entry) { if (cn_cb_equal(i-id.id, msg-id)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] w1: avoid potential u16 overflow
Reported-by: Dan Carpenter dan.carpen...@oracle.com Acked-by: Evgeniy Polyakov z...@ioremap.net Signed-off-by: David Fries da...@fries.net --- drivers/w1/w1_netlink.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index dd96562..881597a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct netlink_skb_parms *nsp) msg = (struct w1_netlink_msg *)(cn + 1); if (node_count) { int size; - u16 reply_size = sizeof(*cn) + cn-len + slave_len; + int reply_size = sizeof(*cn) + cn-len + slave_len; if (cn-flags W1_CN_BUNDLE) { /* bundling duplicats some of the messages */ reply_size += 2 * cmd_count * (sizeof(struct cn_msg) + -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] cn: verify msg->len before making callback
The struct cn_msg len field comes from userspace and needs to be validated. More logical to do so here where the cn_msg pointer is pulled out of the sk_buff than the callback which is passed cn_msg * and might assume no validation is needed. Reported-by: Dan Carpenter Signed-off-by: David Fries --- drivers/connector/connector.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index f612d68..30f5228 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send); */ static int cn_call_callback(struct sk_buff *skb) { + struct nlmsghdr *nlh; struct cn_callback_entry *i, *cbq = NULL; struct cn_dev *dev = struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb)); struct netlink_skb_parms *nsp = _CB(skb); int err = -ENODEV; + /* verify msg->len is within skb */ + nlh = nlmsg_hdr(skb); + if (nlh->nlmsg_len < NLMSG_HDRLEN + sizeof(struct cn_msg) + msg->len) + return -EINVAL; + spin_lock_bh(>cbdev->queue_lock); list_for_each_entry(i, >cbdev->queue_list, callback_entry) { if (cn_cb_equal(>id.id, >id)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] w1: avoid potential u16 overflow
Reported-by: Dan Carpenter Signed-off-by: David Fries --- drivers/w1/w1_netlink.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index dd96562..881597a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct netlink_skb_parms *nsp) msg = (struct w1_netlink_msg *)(cn + 1); if (node_count) { int size; - u16 reply_size = sizeof(*cn) + cn->len + slave_len; + int reply_size = sizeof(*cn) + cn->len + slave_len; if (cn->flags & W1_CN_BUNDLE) { /* bundling duplicats some of the messages */ reply_size += 2 * cmd_count * (sizeof(struct cn_msg) + -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATHC 0/2] cn: w1: buffer size checks
These issues were found by Dan Carpenter with a static checker. Evgeniy are you okay with these? [PATCH 1/2] cn: verify msg->len before making callback [PATCH 2/2] w1: avoid potential u16 overflow -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/2] w1: avoid potential u16 overflow
Reported-by: Dan Carpenter dan.carpen...@oracle.com Signed-off-by: David Fries da...@fries.net --- drivers/w1/w1_netlink.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index dd96562..881597a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct netlink_skb_parms *nsp) msg = (struct w1_netlink_msg *)(cn + 1); if (node_count) { int size; - u16 reply_size = sizeof(*cn) + cn-len + slave_len; + int reply_size = sizeof(*cn) + cn-len + slave_len; if (cn-flags W1_CN_BUNDLE) { /* bundling duplicats some of the messages */ reply_size += 2 * cmd_count * (sizeof(struct cn_msg) + -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATHC 0/2] cn: w1: buffer size checks
These issues were found by Dan Carpenter with a static checker. Evgeniy are you okay with these? [PATCH 1/2] cn: verify msg-len before making callback [PATCH 2/2] w1: avoid potential u16 overflow -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/2] cn: verify msg-len before making callback
The struct cn_msg len field comes from userspace and needs to be validated. More logical to do so here where the cn_msg pointer is pulled out of the sk_buff than the callback which is passed cn_msg * and might assume no validation is needed. Reported-by: Dan Carpenter dan.carpen...@oracle.com Signed-off-by: David Fries da...@fries.net --- drivers/connector/connector.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index f612d68..30f5228 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send); */ static int cn_call_callback(struct sk_buff *skb) { + struct nlmsghdr *nlh; struct cn_callback_entry *i, *cbq = NULL; struct cn_dev *dev = cdev; struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb)); struct netlink_skb_parms *nsp = NETLINK_CB(skb); int err = -ENODEV; + /* verify msg-len is within skb */ + nlh = nlmsg_hdr(skb); + if (nlh-nlmsg_len NLMSG_HDRLEN + sizeof(struct cn_msg) + msg-len) + return -EINVAL; + spin_lock_bh(dev-cbdev-queue_lock); list_for_each_entry(i, dev-cbdev-queue_list, callback_entry) { if (cn_cb_equal(i-id.id, msg-id)) { -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
Acked-by: David Fries On Wed, May 07, 2014 at 01:26:04AM +0400, Alexey Khoroshilov wrote: > w1_process_callbacks() expects to be called with dev->list_mutex held, > but it is the fact only in w1_process(). __w1_remove_master_device() > calls w1_process_callbacks() after it releases list_mutex. > > The patch fixes __w1_remove_master_device() to acquire list_mutex > for w1_process_callbacks(). > > Found by Linux Driver Verification project (linuxtesting.org). > > Signed-off-by: Alexey Khoroshilov > --- > drivers/w1/w1.c | 2 ++ > drivers/w1/w1_int.c | 4 > 2 files changed, 6 insertions(+) > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index ff52618cafbe..5d7341520544 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -1078,6 +1078,8 @@ static void w1_search_process(struct w1_master *dev, u8 > search_type) > * w1_process_callbacks() - execute each dev->async_list callback entry > * @dev: w1_master device > * > + * The w1 master list_mutex must be held. > + * > * Return: 1 if there were commands to executed 0 otherwise > */ > int w1_process_callbacks(struct w1_master *dev) > diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c > index 9b084db739c7..728039d2efe1 100644 > --- a/drivers/w1/w1_int.c > +++ b/drivers/w1/w1_int.c > @@ -219,9 +219,13 @@ void __w1_remove_master_device(struct w1_master *dev) > > if (msleep_interruptible(1000)) > flush_signals(current); > + mutex_lock(>list_mutex); > w1_process_callbacks(dev); > + mutex_unlock(>list_mutex); > } > + mutex_lock(>list_mutex); > w1_process_callbacks(dev); > + mutex_unlock(>list_mutex); > > memset(, 0, sizeof(msg)); > msg.id.mst.id = dev->id; > -- > 1.8.3.2 -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
On Wed, May 07, 2014 at 01:42:12AM +0400, Alexey Khoroshilov wrote: > On 01.05.2014 07:48, David Fries wrote: > > On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote: > >> w1_process_callbacks() expects to be called with dev->list_mutex held, > >> but it is the fact only in w1_process(). __w1_remove_master_device() > >> calls w1_process_callbacks() after it releases list_mutex. > >> > >> The patch fixes __w1_remove_master_device() to acquire list_mutex > >> for w1_process_callbacks(). It is implemented by moving > >> w1_process_callbacks() functionality to __w1_process_callbacks() > >> and adding a wrapper with the old name. > > Good catch. I guess as it was in the shutdown path it failed the > > unlock silently. > > > > I would prefer a different fix. If w1_process_callbacks was more of a > > public facing API called from multiple locations where the caller > > doesn't have access to the locks, something like this would probably > > be the way to go. This is called from two areas of the code, and not > > likely to be called from any new areas in the future. > > > > Adding a documentation comment is good. I would be tempted in > > __w1_remove_master_device to move the dev->list_mutex down after the > > while loop, but I can't be sure that whatever has a refcnt wouldn't > > need list_mutex. The last w1_process_callbacks after the while loop > > shouldn't be needed I wouldn't think, I was just being defensive. > I do not understand all the details, but I am not sure. > refcnt can became 0, but that does not mean all callbacks have been > processed because they can be added before refcnt decrement but after > previous w1_process_callbacks invocation. > > By > > the time it completes the while loop the reference count is 0 so there > > shouldn't be anything left for it to process and if there is, it's a > > race condition and game over anyway. So just sandwich > > w1_process_callbacks with the lock/unlock in > > __w1_remove_master_device please. > Done. Thanks for finding and fixing this bug. > Another suspicious question for me is how safe is it to call > wake_up_process(dev->thread); > after > kthread_stop(dev->thread); > ? Looks bad, but I would need to convince myself it could actually be a problem. int wake_up_process(struct task_struct *p) { WARN_ON(task_is_stopped_or_traced(p)); return try_to_wake_up(p, TASK_NORMAL, 0); } But I think there are lower hanging fruit, like revisiting all the locks in all the code and testing with the lock debugging code on and get it all to pass or document that it really is correct. Currently it is detecting a locking order problem, but both code paths can only be called from the w1_process function, which can only be called by one thread, and one thread can't dead lock itself with a locking order problem. > >> Found by Linux Driver Verification project (linuxtesting.org). > > Was this found with code inspection or hardware testing with lock > > debugging enabled? > It was found by static verification tools developed within Linux Driver > Verification project. > > > Best regards, > Alexey -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
On Wed, May 07, 2014 at 01:42:12AM +0400, Alexey Khoroshilov wrote: On 01.05.2014 07:48, David Fries wrote: On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote: w1_process_callbacks() expects to be called with dev-list_mutex held, but it is the fact only in w1_process(). __w1_remove_master_device() calls w1_process_callbacks() after it releases list_mutex. The patch fixes __w1_remove_master_device() to acquire list_mutex for w1_process_callbacks(). It is implemented by moving w1_process_callbacks() functionality to __w1_process_callbacks() and adding a wrapper with the old name. Good catch. I guess as it was in the shutdown path it failed the unlock silently. I would prefer a different fix. If w1_process_callbacks was more of a public facing API called from multiple locations where the caller doesn't have access to the locks, something like this would probably be the way to go. This is called from two areas of the code, and not likely to be called from any new areas in the future. Adding a documentation comment is good. I would be tempted in __w1_remove_master_device to move the dev-list_mutex down after the while loop, but I can't be sure that whatever has a refcnt wouldn't need list_mutex. The last w1_process_callbacks after the while loop shouldn't be needed I wouldn't think, I was just being defensive. I do not understand all the details, but I am not sure. refcnt can became 0, but that does not mean all callbacks have been processed because they can be added before refcnt decrement but after previous w1_process_callbacks invocation. By the time it completes the while loop the reference count is 0 so there shouldn't be anything left for it to process and if there is, it's a race condition and game over anyway. So just sandwich w1_process_callbacks with the lock/unlock in __w1_remove_master_device please. Done. Thanks for finding and fixing this bug. Another suspicious question for me is how safe is it to call wake_up_process(dev-thread); after kthread_stop(dev-thread); ? Looks bad, but I would need to convince myself it could actually be a problem. int wake_up_process(struct task_struct *p) { WARN_ON(task_is_stopped_or_traced(p)); return try_to_wake_up(p, TASK_NORMAL, 0); } But I think there are lower hanging fruit, like revisiting all the locks in all the code and testing with the lock debugging code on and get it all to pass or document that it really is correct. Currently it is detecting a locking order problem, but both code paths can only be called from the w1_process function, which can only be called by one thread, and one thread can't dead lock itself with a locking order problem. Found by Linux Driver Verification project (linuxtesting.org). Was this found with code inspection or hardware testing with lock debugging enabled? It was found by static verification tools developed within Linux Driver Verification project. Best regards, Alexey -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
Acked-by: David Fries da...@fries.net On Wed, May 07, 2014 at 01:26:04AM +0400, Alexey Khoroshilov wrote: w1_process_callbacks() expects to be called with dev-list_mutex held, but it is the fact only in w1_process(). __w1_remove_master_device() calls w1_process_callbacks() after it releases list_mutex. The patch fixes __w1_remove_master_device() to acquire list_mutex for w1_process_callbacks(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru --- drivers/w1/w1.c | 2 ++ drivers/w1/w1_int.c | 4 2 files changed, 6 insertions(+) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index ff52618cafbe..5d7341520544 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -1078,6 +1078,8 @@ static void w1_search_process(struct w1_master *dev, u8 search_type) * w1_process_callbacks() - execute each dev-async_list callback entry * @dev: w1_master device * + * The w1 master list_mutex must be held. + * * Return: 1 if there were commands to executed 0 otherwise */ int w1_process_callbacks(struct w1_master *dev) diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c index 9b084db739c7..728039d2efe1 100644 --- a/drivers/w1/w1_int.c +++ b/drivers/w1/w1_int.c @@ -219,9 +219,13 @@ void __w1_remove_master_device(struct w1_master *dev) if (msleep_interruptible(1000)) flush_signals(current); + mutex_lock(dev-list_mutex); w1_process_callbacks(dev); + mutex_unlock(dev-list_mutex); } + mutex_lock(dev-list_mutex); w1_process_callbacks(dev); + mutex_unlock(dev-list_mutex); memset(msg, 0, sizeof(msg)); msg.id.mst.id = dev-id; -- 1.8.3.2 -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote: > w1_process_callbacks() expects to be called with dev->list_mutex held, > but it is the fact only in w1_process(). __w1_remove_master_device() > calls w1_process_callbacks() after it releases list_mutex. > > The patch fixes __w1_remove_master_device() to acquire list_mutex > for w1_process_callbacks(). It is implemented by moving > w1_process_callbacks() functionality to __w1_process_callbacks() > and adding a wrapper with the old name. Good catch. I guess as it was in the shutdown path it failed the unlock silently. I would prefer a different fix. If w1_process_callbacks was more of a public facing API called from multiple locations where the caller doesn't have access to the locks, something like this would probably be the way to go. This is called from two areas of the code, and not likely to be called from any new areas in the future. Adding a documentation comment is good. I would be tempted in __w1_remove_master_device to move the dev->list_mutex down after the while loop, but I can't be sure that whatever has a refcnt wouldn't need list_mutex. The last w1_process_callbacks after the while loop shouldn't be needed I wouldn't think, I was just being defensive. By the time it completes the while loop the reference count is 0 so there shouldn't be anything left for it to process and if there is, it's a race condition and game over anyway. So just sandwich w1_process_callbacks with the lock/unlock in __w1_remove_master_device please. > Found by Linux Driver Verification project (linuxtesting.org). Was this found with code inspection or hardware testing with lock debugging enabled? > Signed-off-by: Alexey Khoroshilov > --- > drivers/w1/w1.c | 8 +--- > drivers/w1/w1.h | 13 - > 2 files changed, 17 insertions(+), 4 deletions(-) > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index ff52618cafbe..e507c51512aa 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -1075,12 +1075,14 @@ static void w1_search_process(struct w1_master *dev, > u8 search_type) > } > > /** > - * w1_process_callbacks() - execute each dev->async_list callback entry > + * __w1_process_callbacks() - execute each dev->async_list callback entry > * @dev: w1_master device > * > + * The w1 master list_mutex must be held. > + * > * Return: 1 if there were commands to executed 0 otherwise > */ > -int w1_process_callbacks(struct w1_master *dev) > +int __w1_process_callbacks(struct w1_master *dev) > { > int ret = 0; > struct w1_async_cmd *async_cmd, *async_n; > @@ -1122,7 +1124,7 @@ int w1_process(void *data) > /* Note, w1_process_callback drops the lock while processing, >* but locks it again before returning. >*/ > - if (!w1_process_callbacks(dev) && jremain) { > + if (!__w1_process_callbacks(dev) && jremain) { > /* a wake up is either to stop the thread, process >* callbacks, or search, it isn't process callbacks, so >* schedule a search. > diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h > index 734dab7fc687..e357321a5e13 100644 > --- a/drivers/w1/w1.h > +++ b/drivers/w1/w1.h > @@ -341,9 +341,20 @@ extern int w1_max_slave_ttl; > extern struct list_head w1_masters; > extern struct mutex w1_mlock; > > -extern int w1_process_callbacks(struct w1_master *dev); > +extern int __w1_process_callbacks(struct w1_master *dev); > extern int w1_process(void *); > > +static inline int w1_process_callbacks(struct w1_master *dev) > +{ > + int ret; > + > + mutex_lock(>list_mutex); > + ret = __w1_process_callbacks(dev); > + mutex_unlock(>list_mutex); > + return ret; > +} > + > + > #endif /* __KERNEL__ */ > > #endif /* __W1_H */ > -- > 1.8.3.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()
On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote: w1_process_callbacks() expects to be called with dev-list_mutex held, but it is the fact only in w1_process(). __w1_remove_master_device() calls w1_process_callbacks() after it releases list_mutex. The patch fixes __w1_remove_master_device() to acquire list_mutex for w1_process_callbacks(). It is implemented by moving w1_process_callbacks() functionality to __w1_process_callbacks() and adding a wrapper with the old name. Good catch. I guess as it was in the shutdown path it failed the unlock silently. I would prefer a different fix. If w1_process_callbacks was more of a public facing API called from multiple locations where the caller doesn't have access to the locks, something like this would probably be the way to go. This is called from two areas of the code, and not likely to be called from any new areas in the future. Adding a documentation comment is good. I would be tempted in __w1_remove_master_device to move the dev-list_mutex down after the while loop, but I can't be sure that whatever has a refcnt wouldn't need list_mutex. The last w1_process_callbacks after the while loop shouldn't be needed I wouldn't think, I was just being defensive. By the time it completes the while loop the reference count is 0 so there shouldn't be anything left for it to process and if there is, it's a race condition and game over anyway. So just sandwich w1_process_callbacks with the lock/unlock in __w1_remove_master_device please. Found by Linux Driver Verification project (linuxtesting.org). Was this found with code inspection or hardware testing with lock debugging enabled? Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru --- drivers/w1/w1.c | 8 +--- drivers/w1/w1.h | 13 - 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index ff52618cafbe..e507c51512aa 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -1075,12 +1075,14 @@ static void w1_search_process(struct w1_master *dev, u8 search_type) } /** - * w1_process_callbacks() - execute each dev-async_list callback entry + * __w1_process_callbacks() - execute each dev-async_list callback entry * @dev: w1_master device * + * The w1 master list_mutex must be held. + * * Return: 1 if there were commands to executed 0 otherwise */ -int w1_process_callbacks(struct w1_master *dev) +int __w1_process_callbacks(struct w1_master *dev) { int ret = 0; struct w1_async_cmd *async_cmd, *async_n; @@ -1122,7 +1124,7 @@ int w1_process(void *data) /* Note, w1_process_callback drops the lock while processing, * but locks it again before returning. */ - if (!w1_process_callbacks(dev) jremain) { + if (!__w1_process_callbacks(dev) jremain) { /* a wake up is either to stop the thread, process * callbacks, or search, it isn't process callbacks, so * schedule a search. diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7fc687..e357321a5e13 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -341,9 +341,20 @@ extern int w1_max_slave_ttl; extern struct list_head w1_masters; extern struct mutex w1_mlock; -extern int w1_process_callbacks(struct w1_master *dev); +extern int __w1_process_callbacks(struct w1_master *dev); extern int w1_process(void *); +static inline int w1_process_callbacks(struct w1_master *dev) +{ + int ret; + + mutex_lock(dev-list_mutex); + ret = __w1_process_callbacks(dev); + mutex_unlock(dev-list_mutex); + return ret; +} + + #endif /* __KERNEL__ */ #endif /* __W1_H */ -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] w1: avoid recursive device_add
__w1_attach_slave_device calls device_add which calls w1_bus_notify which calls the w1_bq27000 slave driver, which calls platform_device_add and device_add and deadlocks on getting &(>bus_notifier)->rwsem as it is still held in the previous device_add. This avoids the problem by processing the family add/remove outside of the slave device_add call. Commit 47eba33a0997fc7362a introduced this deadlock and added a KOBJ_ADD, as the add was already reported in device_register two add events were being sent. This change suppresses the device_register add so that any slave device sysfs entries are setup before the add goes out. Belisko Marek reported this change fixed the deadlock he was seeing on ARM device tree, while testing on my x86-64 system never saw the deadlock. Reported-by: Belisko Marek Signed-off-by: David Fries --- Evgeniy any comments? The patch looks good to me. Dr. H. Nikolaus Schaller, Can you give this patch another test? I don't expect any problems, but it is now only sending one add instead of two which is a change. I verified two KOBJ_ADD notifications were going out prior to this change, and only one after. There should only be one, also verified that the w1_bq27000 add notification went out before the slave device id, showing the suppression was working. Used udevadm monitor --kernel --property for notification monitoring. drivers/w1/w1.c | 32 ++-- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index b96f61b..ff52618 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -614,27 +614,11 @@ end: return err; } -/* - * Handle sysfs file creation and removal here, before userspace is told that - * the device is added / removed from the system - */ -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, -void *data) +static int w1_family_notify(unsigned long action, struct w1_slave *sl) { - struct device *dev = data; - struct w1_slave *sl; struct w1_family_ops *fops; int err; - /* -* Only care about slave devices at the moment. Yes, we should use a -* separate "type" for this, but for now, look at the release function -* to know which type it is... -*/ - if (dev->release != w1_slave_release) - return 0; - - sl = dev_to_w1_slave(dev); fops = sl->family->fops; if (!fops) @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, return 0; } -static struct notifier_block w1_bus_nb = { - .notifier_call = w1_bus_notify, -}; - static int __w1_attach_slave_device(struct w1_slave *sl) { int err; @@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_dbg(>dev, "%s: registering %s as %p.\n", __func__, dev_name(>dev), sl); + /* suppress for w1_family_notify before sending KOBJ_ADD */ + dev_set_uevent_suppress(>dev, true); + err = device_register(>dev); if (err < 0) { dev_err(>dev, @@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_name(>dev), err); return err; } - + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); dev_set_uevent_suppress(>dev, false); kobject_uevent(>dev.kobj, KOBJ_ADD); @@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl) msg.type = W1_SLAVE_REMOVE; w1_netlink_send(sl->master, ); + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); device_unregister(>dev); #ifdef DEBUG memset(sl, 0, sizeof(*sl)); @@ -1186,10 +1170,6 @@ static int __init w1_init(void) goto err_out_exit_init; } - retval = bus_register_notifier(_bus_type, _bus_nb); - if (retval) - goto err_out_bus_unregister; - retval = driver_register(_master_driver); if (retval) { printk(KERN_ERR -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] w1: avoid recursive device_add
__w1_attach_slave_device calls device_add which calls w1_bus_notify which calls the w1_bq27000 slave driver, which calls platform_device_add and device_add and deadlocks on getting (priv-bus_notifier)-rwsem as it is still held in the previous device_add. This avoids the problem by processing the family add/remove outside of the slave device_add call. Commit 47eba33a0997fc7362a introduced this deadlock and added a KOBJ_ADD, as the add was already reported in device_register two add events were being sent. This change suppresses the device_register add so that any slave device sysfs entries are setup before the add goes out. Belisko Marek reported this change fixed the deadlock he was seeing on ARM device tree, while testing on my x86-64 system never saw the deadlock. Reported-by: Belisko Marek marek.beli...@gmail.com Signed-off-by: David Fries da...@fries.net --- Evgeniy any comments? The patch looks good to me. Dr. H. Nikolaus Schaller, Can you give this patch another test? I don't expect any problems, but it is now only sending one add instead of two which is a change. I verified two KOBJ_ADD notifications were going out prior to this change, and only one after. There should only be one, also verified that the w1_bq27000 add notification went out before the slave device id, showing the suppression was working. Used udevadm monitor --kernel --property for notification monitoring. drivers/w1/w1.c | 32 ++-- 1 file changed, 6 insertions(+), 26 deletions(-) diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index b96f61b..ff52618 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -614,27 +614,11 @@ end: return err; } -/* - * Handle sysfs file creation and removal here, before userspace is told that - * the device is added / removed from the system - */ -static int w1_bus_notify(struct notifier_block *nb, unsigned long action, -void *data) +static int w1_family_notify(unsigned long action, struct w1_slave *sl) { - struct device *dev = data; - struct w1_slave *sl; struct w1_family_ops *fops; int err; - /* -* Only care about slave devices at the moment. Yes, we should use a -* separate type for this, but for now, look at the release function -* to know which type it is... -*/ - if (dev-release != w1_slave_release) - return 0; - - sl = dev_to_w1_slave(dev); fops = sl-family-fops; if (!fops) @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action, return 0; } -static struct notifier_block w1_bus_nb = { - .notifier_call = w1_bus_notify, -}; - static int __w1_attach_slave_device(struct w1_slave *sl) { int err; @@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_dbg(sl-dev, %s: registering %s as %p.\n, __func__, dev_name(sl-dev), sl); + /* suppress for w1_family_notify before sending KOBJ_ADD */ + dev_set_uevent_suppress(sl-dev, true); + err = device_register(sl-dev); if (err 0) { dev_err(sl-dev, @@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl) dev_name(sl-dev), err); return err; } - + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl); dev_set_uevent_suppress(sl-dev, false); kobject_uevent(sl-dev.kobj, KOBJ_ADD); @@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl) msg.type = W1_SLAVE_REMOVE; w1_netlink_send(sl-master, msg); + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl); device_unregister(sl-dev); #ifdef DEBUG memset(sl, 0, sizeof(*sl)); @@ -1186,10 +1170,6 @@ static int __init w1_init(void) goto err_out_exit_init; } - retval = bus_register_notifier(w1_bus_type, w1_bus_nb); - if (retval) - goto err_out_bus_unregister; - retval = driver_register(w1_master_driver); if (retval) { printk(KERN_ERR -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: w1: 3.14-rc7 - possible recursive locking detected
Belisko Marek, Here is a possible solution, could you give it a try and report back? Greg Kroah-Hartman, Evgeniy asked me to look into this report. I don't have the reporter's hardware configuration, but I wouldn't think that would be needed, just some w1 bus master (even W1_MASTER_GPIO might work), then loading the slave device and manually adding a slave device with that family id. Even then I didn't reproduce the reported recursive locking error. I saw unrelated locking reports, but not this one. I wrote up the included patch, but that undoes the notify changes that you added earlier in commit 47eba33a0997fc7, and I wanted to ask about that commit. Specifically these two lines, err = device_register(>dev); ... + dev_set_uevent_suppress(>dev, false); + kobject_uevent(>dev.kobj, KOBJ_ADD); Wouldn't the default be to not suppress? Nothing in the W1 system enables suppressing so is that even needed? (I'm fine with saying it's a good idea). device_register at some point must call device_add and device_add calls kobject_uevent(>kobj, KOBJ_ADD); so doesn't the KOBJ_ADD send the add a second time? As in it shouldn't be needed? Can the suppress be called before device_register to avoid the automatic notify, then after it returns setup the slave device as this patch does to avoid this problem report, and then call the KOBJ_ADD to make everything happy? I'm based on commit ce7613db2d8d4d5af2587a. On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote: > Hi, > > On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov wrote: > > Hi everyone > > > > 24.03.2014, 01:50, "David Fries" : > >> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: > >>> booting latest 3.14-rc7 on gta04 board gives following warning: > >>> > >>> [3.101409] Driver for 1-wire Dallas network protocol. > >>> [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > >>> [3.400146]CPU0 > >>> [3.402709] > >>> [3.405273] lock(&(>bus_notifier)->rwsem); > >>> [3.410308] lock(&(>bus_notifier)->rwsem); > >>> [3.415374] > >>> [3.415374] *** DEADLOCK *** > >>> > >>> AFAIK we didn't see that on (at least 3/14-rc2). > >> > >> The last drivers/w1 change went in before v3.14-rc1, so if it is > >> something in the w1 code, it has either been there or something else > >> in the rest of the kernel changed to expose it. I'm using the ds2490 > >> USB w1 dongle, I didn't see that problem when enabling lock debugging > >> on 3.14.0-rc5+ which includes some changes I'm working on. > >> > >> https://github.com/dfries/linux.git w1_rework > > > > Can it be a bug in omap w1 driver? > We test with actual Linus master and it's even worse (it hangs completely). > Trace look little bit different and it only happens when booting via > device tree (not with board file). > > [2.807434] Driver for 1-wire Dallas network protocol. > [2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > Interrupt mode > [3.034271] call_modprobe: w1-family-0x1 > [3.051818] > [3.053405] = > [3.059051] [ INFO: possible recursive locking detected ] > [3.064727] 3.14.0-gta04 #556 Not tainted > [3.068939] - > [3.074615] w1_bus_master1/795 is trying to acquire lock: > [3.080261] (&(>bus_notifier)->rwsem){.+.+.+}, at: > [] __blocking_notifier_call_chain+0x28/0x58 > [3.090911] > [3.090911] but task is already holding lock: > [3.097045] (&(>bus_notifier)->rwsem){.+.+.+}, at: > [] __blocking_notifier_call_chain+0x28/0x58 > [3.107666] > [3.107666] other info that might help us debug this: > [3.114501] Possible unsafe locking scenario: > [3.114501] > [3.120727]CPU0 > [3.123291] > [3.125854] lock(&(>bus_notifier)->rwsem); > [3.130889] lock(&(>bus_notifier)->rwsem); > [3.135925] > [3.135925] *** DEADLOCK *** > [3.135925] > [3.142150] May be due to missing lock nesting notation > [3.142150] > [3.149291] 2 locks held by w1_bus_master1/795: > [3.154052] #0: (>mutex#3){+.+.+.}, at: [] > w1_attach_slave_device+0xc4/0x1c8 > [3.163055] #1: (&(>bus_notifier)->rwsem){.+.+.+}, at: > [] __blocking_notifier_call_chain+0x28/0x58 > [3.174163] > [3.174163] stack backtrace: > [3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted > 3.14.0-gta04 #556 >
Re: w1: 3.14-rc7 - possible recursive locking detected
Belisko Marek, Here is a possible solution, could you give it a try and report back? Greg Kroah-Hartman, Evgeniy asked me to look into this report. I don't have the reporter's hardware configuration, but I wouldn't think that would be needed, just some w1 bus master (even W1_MASTER_GPIO might work), then loading the slave device and manually adding a slave device with that family id. Even then I didn't reproduce the reported recursive locking error. I saw unrelated locking reports, but not this one. I wrote up the included patch, but that undoes the notify changes that you added earlier in commit 47eba33a0997fc7, and I wanted to ask about that commit. Specifically these two lines, err = device_register(sl-dev); ... + dev_set_uevent_suppress(sl-dev, false); + kobject_uevent(sl-dev.kobj, KOBJ_ADD); Wouldn't the default be to not suppress? Nothing in the W1 system enables suppressing so is that even needed? (I'm fine with saying it's a good idea). device_register at some point must call device_add and device_add calls kobject_uevent(dev-kobj, KOBJ_ADD); so doesn't the KOBJ_ADD send the add a second time? As in it shouldn't be needed? Can the suppress be called before device_register to avoid the automatic notify, then after it returns setup the slave device as this patch does to avoid this problem report, and then call the KOBJ_ADD to make everything happy? I'm based on commit ce7613db2d8d4d5af2587a. On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote: Hi, On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov z...@ioremap.net wrote: Hi everyone 24.03.2014, 01:50, David Fries da...@fries.net: On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: booting latest 3.14-rc7 on gta04 board gives following warning: [3.101409] Driver for 1-wire Dallas network protocol. [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in [3.400146]CPU0 [3.402709] [3.405273] lock((priv-bus_notifier)-rwsem); [3.410308] lock((priv-bus_notifier)-rwsem); [3.415374] [3.415374] *** DEADLOCK *** AFAIK we didn't see that on (at least 3/14-rc2). The last drivers/w1 change went in before v3.14-rc1, so if it is something in the w1 code, it has either been there or something else in the rest of the kernel changed to expose it. I'm using the ds2490 USB w1 dongle, I didn't see that problem when enabling lock debugging on 3.14.0-rc5+ which includes some changes I'm working on. https://github.com/dfries/linux.git w1_rework Can it be a bug in omap w1 driver? We test with actual Linus master and it's even worse (it hangs completely). Trace look little bit different and it only happens when booting via device tree (not with board file). [2.807434] Driver for 1-wire Dallas network protocol. [2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in Interrupt mode [3.034271] call_modprobe: w1-family-0x1 [3.051818] [3.053405] = [3.059051] [ INFO: possible recursive locking detected ] [3.064727] 3.14.0-gta04 #556 Not tainted [3.068939] - [3.074615] w1_bus_master1/795 is trying to acquire lock: [3.080261] ((priv-bus_notifier)-rwsem){.+.+.+}, at: [c0058ab0] __blocking_notifier_call_chain+0x28/0x58 [3.090911] [3.090911] but task is already holding lock: [3.097045] ((priv-bus_notifier)-rwsem){.+.+.+}, at: [c0058ab0] __blocking_notifier_call_chain+0x28/0x58 [3.107666] [3.107666] other info that might help us debug this: [3.114501] Possible unsafe locking scenario: [3.114501] [3.120727]CPU0 [3.123291] [3.125854] lock((priv-bus_notifier)-rwsem); [3.130889] lock((priv-bus_notifier)-rwsem); [3.135925] [3.135925] *** DEADLOCK *** [3.135925] [3.142150] May be due to missing lock nesting notation [3.142150] [3.149291] 2 locks held by w1_bus_master1/795: [3.154052] #0: (dev-mutex#3){+.+.+.}, at: [c03ba1c8] w1_attach_slave_device+0xc4/0x1c8 [3.163055] #1: ((priv-bus_notifier)-rwsem){.+.+.+}, at: [c0058ab0] __blocking_notifier_call_chain+0x28/0x58 [3.174163] [3.174163] stack backtrace: [3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted 3.14.0-gta04 #556 [3.186370] [c0013ca0] (unwind_backtrace) from [c0010f68] (show_stack+0x10/0x14) [3.194488] [c0010f68] (show_stack) from [c04f0900] (dump_stack+0x6c/0x88) [3.202117] [c04f0900] (dump_stack) from [c00746bc] (print_deadlock_bug+0xc0/0xf0) [3.210449] [c00746bc] (print_deadlock_bug) from [c0075f80] (validate_chain.isra.29+0x4dc/0x534) [3.220031] [c0075f80] (validate_chain.isra.29) from [c0076dac] (__lock_acquire+0x728/0x834) [3.229248] [c0076dac] (__lock_acquire) from [c00774b8] (lock_acquire+0xf4
Re: [PATCH 1/3] w1: fix netlink refcnt leak on error path
This patch is a bug fix, and I see from the mailing list I'm not the only one to run into this bug, so it would be nice for this patch to make it into this merge window. I didn't tag this one for stable because it doesn't apply cleanly due to previous changes that did make it into the merge window. Let me know if I should rewrite it for stable and which kernel version. The other two patches are more feature based changes. On Tue, Apr 08, 2014 at 10:37:07PM -0500, David Fries wrote: > If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference > is taken when searching for the slave or master device. If there > isn't any following data m->len (mlen is a copy) is 0 and packing up > the message for later execution is skipped leaving nothing to > decrement the reference counts. > > Way back when, m->len was checked before the search that increments the > reference count, but W1_LIST_MASTERS has no additional data, the check > was moved in 9be62e0b2fadaf5ff causing this bug. > > This change reorders to put the check before the reference count is > incremented avoiding the problem. > > Signed-off-by: David Fries > Acked-by: Evgeniy Polyakov -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] w1: optional bundling of netlink kernel replies
Applications can submit a set of commands in one packet to the kernel, and in some cases it is required such as reading the temperature sensor results. This adds an option W1_CN_BUNDLE to the flags of cn_msg to request the kernel to reply in one packet for efficiency. The cn_msg flags now check for unknown flag values and return an error if one is seen. See "Proper handling of unknown flags in system calls" http://lwn.net/Articles/588444/ This corrects the ack values returned as per the protocol standard, namely the original ack for status messages and seq + 1 for all others such as the data returned from a read. Some of the common variable names have been standardized as follows. struct cn_msg *cn struct w1_netlink_msg *msg struct w1_netlink_cmd *cmd struct w1_master *dev When an argument and a function scope variable would collide, add req_ to the argument. Signed-off-by: David Fries Acked-by: Evgeniy Polyakov --- Documentation/connector/connector.txt |2 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 649 - drivers/w1/w1_netlink.h | 36 ++ 6 files changed, 447 insertions(+), 263 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e56abdb..f6215f9 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic index a31c5a2..b2033c6 100644 --- a/Documentation/w1/w1.generic +++ b/Documentation/w1/w1.generic @@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver w1_master_add - Manually register a slave device w1_master_attempts - the number of times a search was attempted w1_master_max_slave_count - - the maximum slaves that may be attached to a master + - maximum number of slaves to search for at a time w1_master_name - the name of the device (w1_bus_masterX) w1_master_pullup - 5V strong pullup 0 enabled, 1 disabled w1_master_remove - Manually remove a slave device diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink index 927a52c..ef27271 100644 --- a/Documentation/w1/w1.netlink +++ b/Documentation/w1/w1.netlink @@ -30,7 +30,7 @@ Protocol. W1_SLAVE_CMD userspace command for slave device (read/write/touch) - __u8 res- reserved + __u8 status - error indication from kernel __u16 len - size of data attached to this header data union { __u8 id[8]; - slave unique device id @@ -44,10 +44,14 @@ Protocol. __u8 cmd- command opcode. W1_CMD_READ - read command W1_CMD_WRITE- write command - W1_CMD_TOUCH- touch command - (write and sample data back to userspace) W1_CMD_SEARCH - search command W1_CMD_ALARM_SEARCH - alarm search command + W1_CMD_TOUCH- touch command + (write and sample data back to userspace) + W1_CMD_RESET- send bus reset + W1_CMD_SLAVE_ADD- add slave to kernel list + W1_CMD_SLAVE_REMOVE - remove slave from kernel list + W1_CMD_LIST_SLAVES - get slaves list from kernel __u8 res- reserved __u16 len - length of data for this command For read command data must be allocated like for write command @@ -87,8 +91,7 @@ format: id0 ... idN Each message is at most 4k in size, so if number of master devices - exceeds this, it will be split into several messages, - cn.seq will be increased for each one. + exceeds this, it will be split into several messages. W1 search and alarm search commands. request: diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count
[PATCH 1/3] w1: fix netlink refcnt leak on error path
If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference is taken when searching for the slave or master device. If there isn't any following data m->len (mlen is a copy) is 0 and packing up the message for later execution is skipped leaving nothing to decrement the reference counts. Way back when, m->len was checked before the search that increments the reference count, but W1_LIST_MASTERS has no additional data, the check was moved in 9be62e0b2fadaf5ff causing this bug. This change reorders to put the check before the reference count is incremented avoiding the problem. Signed-off-by: David Fries Acked-by: Evgeniy Polyakov --- drivers/w1/w1_netlink.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 5234964..a02704a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg, struct w1_netlink_msg *w; u32 *id; - if (mcmd->type != W1_LIST_MASTERS) { - printk(KERN_NOTICE "%s: msg: %x.%x, wrong type: %u, len: %u.\n", - __func__, msg->id.idx, msg->id.val, mcmd->type, mcmd->len); - return -EPROTO; - } - cn = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!cn) return -ENOMEM; @@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd) w1_netlink_send_error(>block->msg, node->m, cmd, node->block->portid, err); + /* ref taken in w1_search_slave or w1_search_master_id when building +* the block +*/ if (sl) w1_unref_slave(sl); else @@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) msg_len = msg->len; while (msg_len && !err) { - struct w1_reg_num id; - u16 mlen = m->len; dev = NULL; sl = NULL; - memcpy(, m->id.id, sizeof(id)); -#if 0 - printk("%s: %02x.%012llx.%02x: type=%02x, len=%u.\n", - __func__, id.family, (unsigned long long)id.id, id.crc, m->type, m->len); -#endif if (m->len + sizeof(struct w1_netlink_msg) > msg_len) { err = -E2BIG; break; } + /* execute on this thread, no need to process later */ + if (m->type == W1_LIST_MASTERS) { + err = w1_process_command_root(msg, m, nsp->portid); + goto out_cont; + } + + /* All following message types require additional data, +* check here before references are taken. +*/ + if (!m->len) { + err = -EPROTO; + goto out_cont; + } + + /* both search calls take reference counts */ if (m->type == W1_MASTER_CMD) { dev = w1_search_master_id(m->id.mst.id); } else if (m->type == W1_SLAVE_CMD) { - sl = w1_search_slave(); + sl = w1_search_slave((struct w1_reg_num *)m->id.id); if (sl) dev = sl->master; } else { - err = w1_process_command_root(msg, m, nsp->portid); + printk(KERN_NOTICE + "%s: msg: %x.%x, wrong type: %u, len: %u.\n", + __func__, msg->id.idx, msg->id.val, + m->type, m->len); + err = -EPROTO; goto out_cont; } @@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) } err = 0; - if (!mlen) - goto out_cont; atomic_inc(>refcnt); node->async.cb = w1_process_cb; @@ -557,7 +564,8 @@ out_cont: if (err) w1_netlink_send_error(msg, m, NULL, nsp->portid, err); msg_len -= sizeof(struct w1_netlink_msg) + m->len; - m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct w1_netlink_msg) + m->len); + m = (struct w1_netlink_msg *)(((u8 *)m) + + sizeof(struct w1_netlink_msg) + m->len); /* * Let's allow requests for nonexisting devices. -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] connector: allow multiple messages to be sent in one packet
This increases the amount of bundling to reduce the number of packets sent. For the one wire use there can be multiple struct w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in struct cn_msg, and with this change multiple of those in a struct nlmsghdr, and at each level the len identifies there being multiple of the next. Signed-off-by: David Fries Acked-by: Evgeniy Polyakov --- Documentation/connector/connector.txt | 13 ++--- drivers/connector/connector.c | 17 +++-- include/linux/connector.h |1 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e5c5f5e..e56abdb 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a significantly easier way: int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); -void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask); +void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask); +void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask); struct cb_id { @@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id); struct cb_id *id - unique connector's user identifier. -int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); +int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __groups, int gfp_mask); +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask); Sends message to the specified groups. It can be safely called from softirq context, but may silently fail under strong memory pressure. If there are no listeners for given group -ESRCH can be returned. struct cn_msg * - message header(with attached data). + u16 len - for *_multi multiple cn_msg messages can be sent + u32 port - destination port. + If non-zero the message will be sent to the + given port, which should be set to the + original sender. u32 __group - destination group. - If __group is zero, then appropriate group will + If port and __group is zero, then appropriate group will be searched through all registered connector users, and message will be delivered to the group which was created for user with the same ID as in msg. diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index b14f1d3..f612d68 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -43,6 +43,8 @@ static struct cn_dev cdev; static int cn_already_initialized; /* + * Sends mult (multiple) cn_msg at a time. + * * msg->seq and msg->ack are used to determine message genealogy. * When someone sends message it puts there locally unique sequence * and random acknowledge numbers. Sequence number may be copied into @@ -62,10 +64,13 @@ static int cn_already_initialized; * the acknowledgement number in the original message + 1, then it is * a new message. * + * If msg->len != len, then additional cn_msg messages are expected following + * the first msg. + * * The message is sent to, the portid if given, the group if given, both if * both, or if both are zero then the group is looked up and sent there. */ -int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; @@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, if (!portid && !netlink_has_listeners(dev->nls, group)) return -ESRCH; - size = sizeof(*msg) + msg->len; + size = sizeof(*msg) + len; skb = nlmsg_new(size, gfp_mask); if (!skb) @@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, gfp_mask); return netlink_unicast(dev->nls, skb, portid, !(gfp_mask&__GFP_WAIT)); } +EXPORT_SYMBOL_GPL(cn_netlink_send_mult); + +/* same as cn_netlink_send_mult except msg->len is used for len */ +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, + gfp_t gfp_mask) +{ + return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask); +} EXPORT_SYMBOL_GPL(cn_netlink_send); /* diff --git a/include/linux/connector.h b/include/linux/connector.h index be9c4747..f8fe863 100644 --- a/include/linux/connect
[PATCH 0/3] w1: fixes and bundling replies
A program can bundle a sequence of commands in one netlink packet but the kernel was having to reply in many different packets, this adds a flag W1_CN_BUNDLE in the cn_msg.flags to allow the kernel to bundle the replies in to one message. This is opt in to avoid breaking programs that aren't expecting additional messages. Netlink connector now has a new call cn_netlink_send_multi, which allow sending multiple cn_msg structures in a nlmsghdr structure. I tested with my client program that will bundle up 14 temperature sensor conversions, then after a delay, bundle up another set of commands to read all 14 with the bundle bit set. I also tested with a two year old version of the software that sends requests two one slave at a time (bundling only the write/read to get the data out), and doesn't have code to read the bundling the this patch adds. Both operate correctly running at the same time. Documentation/connector/connector.txt | 15 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/connector/connector.c | 17 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 673 - drivers/w1/w1_netlink.h | 36 ++ include/linux/connector.h |1 + 8 files changed, 489 insertions(+), 276 deletions(-) [PATCH 1/3] w1: fix netlink refcnt leak on error path [PATCH 2/3] connector: allow multiple messages to be sent in one [PATCH 3/3] w1: optional bundling of netlink kernel replies -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3] w1: fixes and bundling replies
A program can bundle a sequence of commands in one netlink packet but the kernel was having to reply in many different packets, this adds a flag W1_CN_BUNDLE in the cn_msg.flags to allow the kernel to bundle the replies in to one message. This is opt in to avoid breaking programs that aren't expecting additional messages. Netlink connector now has a new call cn_netlink_send_multi, which allow sending multiple cn_msg structures in a nlmsghdr structure. I tested with my client program that will bundle up 14 temperature sensor conversions, then after a delay, bundle up another set of commands to read all 14 with the bundle bit set. I also tested with a two year old version of the software that sends requests two one slave at a time (bundling only the write/read to get the data out), and doesn't have code to read the bundling the this patch adds. Both operate correctly running at the same time. Documentation/connector/connector.txt | 15 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/connector/connector.c | 17 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 673 - drivers/w1/w1_netlink.h | 36 ++ include/linux/connector.h |1 + 8 files changed, 489 insertions(+), 276 deletions(-) [PATCH 1/3] w1: fix netlink refcnt leak on error path [PATCH 2/3] connector: allow multiple messages to be sent in one [PATCH 3/3] w1: optional bundling of netlink kernel replies -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] w1: optional bundling of netlink kernel replies
Applications can submit a set of commands in one packet to the kernel, and in some cases it is required such as reading the temperature sensor results. This adds an option W1_CN_BUNDLE to the flags of cn_msg to request the kernel to reply in one packet for efficiency. The cn_msg flags now check for unknown flag values and return an error if one is seen. See Proper handling of unknown flags in system calls http://lwn.net/Articles/588444/ This corrects the ack values returned as per the protocol standard, namely the original ack for status messages and seq + 1 for all others such as the data returned from a read. Some of the common variable names have been standardized as follows. struct cn_msg *cn struct w1_netlink_msg *msg struct w1_netlink_cmd *cmd struct w1_master *dev When an argument and a function scope variable would collide, add req_ to the argument. Signed-off-by: David Fries da...@fries.net Acked-by: Evgeniy Polyakov z...@ioremap.net --- Documentation/connector/connector.txt |2 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 649 - drivers/w1/w1_netlink.h | 36 ++ 6 files changed, 447 insertions(+), 263 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e56abdb..f6215f9 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic index a31c5a2..b2033c6 100644 --- a/Documentation/w1/w1.generic +++ b/Documentation/w1/w1.generic @@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver w1_master_add - Manually register a slave device w1_master_attempts - the number of times a search was attempted w1_master_max_slave_count - - the maximum slaves that may be attached to a master + - maximum number of slaves to search for at a time w1_master_name - the name of the device (w1_bus_masterX) w1_master_pullup - 5V strong pullup 0 enabled, 1 disabled w1_master_remove - Manually remove a slave device diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink index 927a52c..ef27271 100644 --- a/Documentation/w1/w1.netlink +++ b/Documentation/w1/w1.netlink @@ -30,7 +30,7 @@ Protocol. W1_SLAVE_CMD userspace command for slave device (read/write/touch) - __u8 res- reserved + __u8 status - error indication from kernel __u16 len - size of data attached to this header data union { __u8 id[8]; - slave unique device id @@ -44,10 +44,14 @@ Protocol. __u8 cmd- command opcode. W1_CMD_READ - read command W1_CMD_WRITE- write command - W1_CMD_TOUCH- touch command - (write and sample data back to userspace) W1_CMD_SEARCH - search command W1_CMD_ALARM_SEARCH - alarm search command + W1_CMD_TOUCH- touch command + (write and sample data back to userspace) + W1_CMD_RESET- send bus reset + W1_CMD_SLAVE_ADD- add slave to kernel list + W1_CMD_SLAVE_REMOVE - remove slave from kernel list + W1_CMD_LIST_SLAVES - get slaves list from kernel __u8 res- reserved __u16 len - length of data for this command For read command data must be allocated like for write command @@ -87,8 +91,7 @@ format: id0 ... idN Each message is at most 4k in size, so if number of master devices - exceeds this, it will be split into several messages, - cn.seq will be increased for each one. + exceeds this, it will be split into several messages. W1 search and alarm search commands. request: diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count
[PATCH 1/3] w1: fix netlink refcnt leak on error path
If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference is taken when searching for the slave or master device. If there isn't any following data m-len (mlen is a copy) is 0 and packing up the message for later execution is skipped leaving nothing to decrement the reference counts. Way back when, m-len was checked before the search that increments the reference count, but W1_LIST_MASTERS has no additional data, the check was moved in 9be62e0b2fadaf5ff causing this bug. This change reorders to put the check before the reference count is incremented avoiding the problem. Signed-off-by: David Fries da...@fries.net Acked-by: Evgeniy Polyakov z...@ioremap.net --- drivers/w1/w1_netlink.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 5234964..a02704a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg, struct w1_netlink_msg *w; u32 *id; - if (mcmd-type != W1_LIST_MASTERS) { - printk(KERN_NOTICE %s: msg: %x.%x, wrong type: %u, len: %u.\n, - __func__, msg-id.idx, msg-id.val, mcmd-type, mcmd-len); - return -EPROTO; - } - cn = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!cn) return -ENOMEM; @@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd) w1_netlink_send_error(node-block-msg, node-m, cmd, node-block-portid, err); + /* ref taken in w1_search_slave or w1_search_master_id when building +* the block +*/ if (sl) w1_unref_slave(sl); else @@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) msg_len = msg-len; while (msg_len !err) { - struct w1_reg_num id; - u16 mlen = m-len; dev = NULL; sl = NULL; - memcpy(id, m-id.id, sizeof(id)); -#if 0 - printk(%s: %02x.%012llx.%02x: type=%02x, len=%u.\n, - __func__, id.family, (unsigned long long)id.id, id.crc, m-type, m-len); -#endif if (m-len + sizeof(struct w1_netlink_msg) msg_len) { err = -E2BIG; break; } + /* execute on this thread, no need to process later */ + if (m-type == W1_LIST_MASTERS) { + err = w1_process_command_root(msg, m, nsp-portid); + goto out_cont; + } + + /* All following message types require additional data, +* check here before references are taken. +*/ + if (!m-len) { + err = -EPROTO; + goto out_cont; + } + + /* both search calls take reference counts */ if (m-type == W1_MASTER_CMD) { dev = w1_search_master_id(m-id.mst.id); } else if (m-type == W1_SLAVE_CMD) { - sl = w1_search_slave(id); + sl = w1_search_slave((struct w1_reg_num *)m-id.id); if (sl) dev = sl-master; } else { - err = w1_process_command_root(msg, m, nsp-portid); + printk(KERN_NOTICE + %s: msg: %x.%x, wrong type: %u, len: %u.\n, + __func__, msg-id.idx, msg-id.val, + m-type, m-len); + err = -EPROTO; goto out_cont; } @@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) } err = 0; - if (!mlen) - goto out_cont; atomic_inc(block-refcnt); node-async.cb = w1_process_cb; @@ -557,7 +564,8 @@ out_cont: if (err) w1_netlink_send_error(msg, m, NULL, nsp-portid, err); msg_len -= sizeof(struct w1_netlink_msg) + m-len; - m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct w1_netlink_msg) + m-len); + m = (struct w1_netlink_msg *)(((u8 *)m) + + sizeof(struct w1_netlink_msg) + m-len); /* * Let's allow requests for nonexisting devices. -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] connector: allow multiple messages to be sent in one packet
This increases the amount of bundling to reduce the number of packets sent. For the one wire use there can be multiple struct w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in struct cn_msg, and with this change multiple of those in a struct nlmsghdr, and at each level the len identifies there being multiple of the next. Signed-off-by: David Fries da...@fries.net Acked-by: Evgeniy Polyakov z...@ioremap.net --- Documentation/connector/connector.txt | 13 ++--- drivers/connector/connector.c | 17 +++-- include/linux/connector.h |1 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e5c5f5e..e56abdb 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a significantly easier way: int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); -void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask); +void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask); +void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask); struct cb_id { @@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id); struct cb_id *id - unique connector's user identifier. -int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); +int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __groups, int gfp_mask); +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask); Sends message to the specified groups. It can be safely called from softirq context, but may silently fail under strong memory pressure. If there are no listeners for given group -ESRCH can be returned. struct cn_msg * - message header(with attached data). + u16 len - for *_multi multiple cn_msg messages can be sent + u32 port - destination port. + If non-zero the message will be sent to the + given port, which should be set to the + original sender. u32 __group - destination group. - If __group is zero, then appropriate group will + If port and __group is zero, then appropriate group will be searched through all registered connector users, and message will be delivered to the group which was created for user with the same ID as in msg. diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index b14f1d3..f612d68 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -43,6 +43,8 @@ static struct cn_dev cdev; static int cn_already_initialized; /* + * Sends mult (multiple) cn_msg at a time. + * * msg-seq and msg-ack are used to determine message genealogy. * When someone sends message it puts there locally unique sequence * and random acknowledge numbers. Sequence number may be copied into @@ -62,10 +64,13 @@ static int cn_already_initialized; * the acknowledgement number in the original message + 1, then it is * a new message. * + * If msg-len != len, then additional cn_msg messages are expected following + * the first msg. + * * The message is sent to, the portid if given, the group if given, both if * both, or if both are zero then the group is looked up and sent there. */ -int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; @@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, if (!portid !netlink_has_listeners(dev-nls, group)) return -ESRCH; - size = sizeof(*msg) + msg-len; + size = sizeof(*msg) + len; skb = nlmsg_new(size, gfp_mask); if (!skb) @@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, gfp_mask); return netlink_unicast(dev-nls, skb, portid, !(gfp_mask__GFP_WAIT)); } +EXPORT_SYMBOL_GPL(cn_netlink_send_mult); + +/* same as cn_netlink_send_mult except msg-len is used for len */ +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, + gfp_t gfp_mask) +{ + return cn_netlink_send_mult(msg, msg-len, portid, __group, gfp_mask); +} EXPORT_SYMBOL_GPL(cn_netlink_send); /* diff --git a/include/linux/connector.h b/include/linux/connector.h index be9c4747..f8fe863 100644 --- a/include/linux/connector.h +++ b/include
Re: [PATCH 1/3] w1: fix netlink refcnt leak on error path
This patch is a bug fix, and I see from the mailing list I'm not the only one to run into this bug, so it would be nice for this patch to make it into this merge window. I didn't tag this one for stable because it doesn't apply cleanly due to previous changes that did make it into the merge window. Let me know if I should rewrite it for stable and which kernel version. The other two patches are more feature based changes. On Tue, Apr 08, 2014 at 10:37:07PM -0500, David Fries wrote: If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference is taken when searching for the slave or master device. If there isn't any following data m-len (mlen is a copy) is 0 and packing up the message for later execution is skipped leaving nothing to decrement the reference counts. Way back when, m-len was checked before the search that increments the reference count, but W1_LIST_MASTERS has no additional data, the check was moved in 9be62e0b2fadaf5ff causing this bug. This change reorders to put the check before the reference count is incremented avoiding the problem. Signed-off-by: David Fries da...@fries.net Acked-by: Evgeniy Polyakov z...@ioremap.net -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] w1: fixes and bundling replies
Evgeniy, Could you review this set of patches in this thread? Thanks. On Sat, Mar 22, 2014 at 08:27:44PM -0500, David Fries wrote: > On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote: > > Your approach and patch seem correct, but I worry about how old > > commands are processed. Do I get it right, that replies to old > > non-bundle commands are queued back instead of sending immediately, > > but since it is not bundle but single command, queued reply is being > > sent 'almost' immediately? Basically, nothing changed to old > > clients, but there is new way to send batch requests now? > > Yes, with the previous patch set, if a client sent one command per > message they would get the replies in individual packets. However in > some cases clients had to bundle multiple commands in one message. > For instance when using a slave command to read a temperature sensor > it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then > another W1_CMD_READ command to read the data back and they have to be > sent in the same W1_SLAVE_CMD because there can't be a reset between > the two write and read, and you can't do that with a slave command. > You can with master commands, but if you are going to issue individual > reset, write, and read commands why would you not bundle them? > > Here is a new set of patches making bundling opt in. > > In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags > enables the kernel to bundle the reply, otherwise replies aren't > bundled. > > The previous patch separated the data replies and status replies > because they have a different ack value which is in the cn_msg and > cn_netlink_send could only send one cn_msg in a call. I didn't much > like that solution because now status and data replies were out of > order. This version creates cn_netlink_send_mult which takes a length > which allows multiple cn_msg structures to be sent at once (inside one > nlmsghdr, so clients can see there are more cn_msg structures to > read), so now status and data messages can be sent in order. > > This is somewhat suboptimal as each command has a status reply and > possibly a data reply, each requiring a different ack, so even if a > client sent multiple commands in one w1_netlink_msg like, > > cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read] > > the response can't pack the commands into one w1_netlink_msg because > of the ack, so there's duplicate of cn_msg and w1_netlink_msg > structures, > > cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] > cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] > cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] > cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] > > cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate > packets with all the context switches, select overhead and such, > bundling is going to be well worth it. > > Another alternative could be one cn_msg [seq+1] with the data, and > followed by cn_msg [ack ] with the status messages, but they are back > to out of order. > > I wasn't completely sure what to call the new sending function, I > decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len > as another idea, or if there are any better ideas let me know. > > I tested with my client program that will bundle up 14 temperature > sensor conversions, then after a delay, bundle up another set of > commands to read all 14 with the bundle bit set. I also tested with a > two year old version of the software that sends requests two one slave > at a time (bundling only the write/read to get the data out), and > doesn't have code to read the bundling the this patch adds. Both > operate correctly running at the same time. > > Posted before, no changes > > connector support for sending multiple cn_msg > > bundling, > corrects ack value to previous ack or seq + 1 > corrects the variables to be name consistently > > Documentation/connector/connector.txt | 15 +- > Documentation/w1/w1.generic |2 +- > Documentation/w1/w1.netlink | 13 +- > drivers/connector/connector.c | 17 +- > drivers/w1/w1.h |8 - > drivers/w1/w1_netlink.c | 673 > - > drivers/w1/w1_netlink.h | 36 ++ > include/linux/connector.h |1 + > 8 files changed, 489 insertions(+), 276 deletions(-) > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at
Re: [RFC] w1: fixes and bundling replies
Evgeniy, Could you review this set of patches in this thread? Thanks. On Sat, Mar 22, 2014 at 08:27:44PM -0500, David Fries wrote: On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote: Your approach and patch seem correct, but I worry about how old commands are processed. Do I get it right, that replies to old non-bundle commands are queued back instead of sending immediately, but since it is not bundle but single command, queued reply is being sent 'almost' immediately? Basically, nothing changed to old clients, but there is new way to send batch requests now? Yes, with the previous patch set, if a client sent one command per message they would get the replies in individual packets. However in some cases clients had to bundle multiple commands in one message. For instance when using a slave command to read a temperature sensor it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then another W1_CMD_READ command to read the data back and they have to be sent in the same W1_SLAVE_CMD because there can't be a reset between the two write and read, and you can't do that with a slave command. You can with master commands, but if you are going to issue individual reset, write, and read commands why would you not bundle them? Here is a new set of patches making bundling opt in. In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags enables the kernel to bundle the reply, otherwise replies aren't bundled. The previous patch separated the data replies and status replies because they have a different ack value which is in the cn_msg and cn_netlink_send could only send one cn_msg in a call. I didn't much like that solution because now status and data replies were out of order. This version creates cn_netlink_send_mult which takes a length which allows multiple cn_msg structures to be sent at once (inside one nlmsghdr, so clients can see there are more cn_msg structures to read), so now status and data messages can be sent in order. This is somewhat suboptimal as each command has a status reply and possibly a data reply, each requiring a different ack, so even if a client sent multiple commands in one w1_netlink_msg like, cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read] the response can't pack the commands into one w1_netlink_msg because of the ack, so there's duplicate of cn_msg and w1_netlink_msg structures, cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate packets with all the context switches, select overhead and such, bundling is going to be well worth it. Another alternative could be one cn_msg [seq+1] with the data, and followed by cn_msg [ack ] with the status messages, but they are back to out of order. I wasn't completely sure what to call the new sending function, I decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len as another idea, or if there are any better ideas let me know. I tested with my client program that will bundle up 14 temperature sensor conversions, then after a delay, bundle up another set of commands to read all 14 with the bundle bit set. I also tested with a two year old version of the software that sends requests two one slave at a time (bundling only the write/read to get the data out), and doesn't have code to read the bundling the this patch adds. Both operate correctly running at the same time. Posted before, no changes connector support for sending multiple cn_msg bundling, corrects ack value to previous ack or seq + 1 corrects the variables to be name consistently Documentation/connector/connector.txt | 15 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/connector/connector.c | 17 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 673 - drivers/w1/w1_netlink.h | 36 ++ include/linux/connector.h |1 + 8 files changed, 489 insertions(+), 276 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: w1: 3.14-rc7 - possible recursive locking detected
On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: > Hi, > > booting latest 3.14-rc7 on gta04 board gives following warning: > > [3.101409] Driver for 1-wire Dallas network protocol. > [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in > [3.400146]CPU0 > [3.402709] > [3.405273] lock(&(>bus_notifier)->rwsem); > [3.410308] lock(&(>bus_notifier)->rwsem); > [3.415374] > [3.415374] *** DEADLOCK *** > > AFAIK we didn't see that on (at least 3/14-rc2). The last drivers/w1 change went in before v3.14-rc1, so if it is something in the w1 code, it has either been there or something else in the rest of the kernel changed to expose it. I'm using the ds2490 USB w1 dongle, I didn't see that problem when enabling lock debugging on 3.14.0-rc5+ which includes some changes I'm working on. https://github.com/dfries/linux.git w1_rework -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: Fix refcount leak in netlink connector
I've been doing an overhaul of the w1 netlink system and have a patch outstanding to address this issue. It requires patches already accepted in gregkh char-misc-next. w1: fix netlink refcnt leak on error path I avoided the issue by checking the length before taking any references to avoid adding another goto target. Actually the changes in char-misc-next removed one goto target so with my patch series that routine only has one jump target, much easier to track what's going on than having three. Accepted patches (still need to apply the refcnt patch) git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-next Here's my current set of patches on top of char-misc-next. 7cf8baf26cdc2c27bcff562e9b2328ac891dc3a5 https://github.com/dfries/linux.git w1_rework For my setup I have 14 ds18b20 one wire temperature sensors inside and outside of my house connected to the ds2490 USB dongle that I have a program talking over netlink to read every 30 seconds. I'm curious what's your setup? I would appreciate it if I could get some testers on the w1_rework branch and verify it doesn't break anything with other programs. The biggest user visible changes from the w1_rework branch are. w1: new netlink commands, add/remove/list slaves w1: process w1 netlink commands in w1_process thread w1: reply only to the requester portid ds2490 hardware search, much faster search in general, 23 seconds to .3 seconds w1: fix netlink refcnt leak on error path w1: optional bundling of netlink kernel replies With all these changes the program is no longer blocked when issuing commands because the w1_process thread now executes them. That's a pretty big deal if your program has something else to do while waiting for a reply. Then there's the optional bundling, so I can now send one netlink packet which will request a conversion or read a previous conversion for all temperature sensors and the kernel will reply with one netlink packet with all the status or results, instead of something like 56 individual packets. On Sun, Mar 09, 2014 at 12:08:48AM +0100, Richard Weinberger wrote: > If userspace sends a w1 message of length 0 we leak > the refcount. > > Signed-off-by: Richard Weinberger > --- > drivers/w1/w1_netlink.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c > index 40788c9..7131777 100644 > --- a/drivers/w1/w1_netlink.c > +++ b/drivers/w1/w1_netlink.c > @@ -355,7 +355,7 @@ static void w1_cn_callback(struct cn_msg *msg, struct > netlink_skb_parms *nsp) > > err = 0; > if (!mlen) > - goto out_cont; > + goto out_dec; > > mutex_lock(>mutex); > > @@ -384,10 +384,11 @@ static void w1_cn_callback(struct cn_msg *msg, struct > netlink_skb_parms *nsp) > mlen -= cmd->len + sizeof(struct w1_netlink_cmd); > } > out_up: > + mutex_unlock(>mutex); > +out_dec: > atomic_dec(>refcnt); > if (sl) > atomic_dec(>refcnt); > - mutex_unlock(>mutex); > out_cont: > if (!cmd || err) > w1_netlink_send_error(msg, m, cmd, err); > -- > 1.8.4.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] w1: Fix refcount leak in netlink connector
I've been doing an overhaul of the w1 netlink system and have a patch outstanding to address this issue. It requires patches already accepted in gregkh char-misc-next. w1: fix netlink refcnt leak on error path I avoided the issue by checking the length before taking any references to avoid adding another goto target. Actually the changes in char-misc-next removed one goto target so with my patch series that routine only has one jump target, much easier to track what's going on than having three. Accepted patches (still need to apply the refcnt patch) git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-next Here's my current set of patches on top of char-misc-next. 7cf8baf26cdc2c27bcff562e9b2328ac891dc3a5 https://github.com/dfries/linux.git w1_rework For my setup I have 14 ds18b20 one wire temperature sensors inside and outside of my house connected to the ds2490 USB dongle that I have a program talking over netlink to read every 30 seconds. I'm curious what's your setup? I would appreciate it if I could get some testers on the w1_rework branch and verify it doesn't break anything with other programs. The biggest user visible changes from the w1_rework branch are. w1: new netlink commands, add/remove/list slaves w1: process w1 netlink commands in w1_process thread w1: reply only to the requester portid ds2490 hardware search, much faster search in general, 23 seconds to .3 seconds w1: fix netlink refcnt leak on error path w1: optional bundling of netlink kernel replies With all these changes the program is no longer blocked when issuing commands because the w1_process thread now executes them. That's a pretty big deal if your program has something else to do while waiting for a reply. Then there's the optional bundling, so I can now send one netlink packet which will request a conversion or read a previous conversion for all temperature sensors and the kernel will reply with one netlink packet with all the status or results, instead of something like 56 individual packets. On Sun, Mar 09, 2014 at 12:08:48AM +0100, Richard Weinberger wrote: If userspace sends a w1 message of length 0 we leak the refcount. Signed-off-by: Richard Weinberger rich...@nod.at --- drivers/w1/w1_netlink.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 40788c9..7131777 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -355,7 +355,7 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) err = 0; if (!mlen) - goto out_cont; + goto out_dec; mutex_lock(dev-mutex); @@ -384,10 +384,11 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) mlen -= cmd-len + sizeof(struct w1_netlink_cmd); } out_up: + mutex_unlock(dev-mutex); +out_dec: atomic_dec(dev-refcnt); if (sl) atomic_dec(sl-refcnt); - mutex_unlock(dev-mutex); out_cont: if (!cmd || err) w1_netlink_send_error(msg, m, cmd, err); -- 1.8.4.2 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: w1: 3.14-rc7 - possible recursive locking detected
On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote: Hi, booting latest 3.14-rc7 on gta04 board gives following warning: [3.101409] Driver for 1-wire Dallas network protocol. [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in [3.400146]CPU0 [3.402709] [3.405273] lock((priv-bus_notifier)-rwsem); [3.410308] lock((priv-bus_notifier)-rwsem); [3.415374] [3.415374] *** DEADLOCK *** AFAIK we didn't see that on (at least 3/14-rc2). The last drivers/w1 change went in before v3.14-rc1, so if it is something in the w1 code, it has either been there or something else in the rest of the kernel changed to expose it. I'm using the ds2490 USB w1 dongle, I didn't see that problem when enabling lock debugging on 3.14.0-rc5+ which includes some changes I'm working on. https://github.com/dfries/linux.git w1_rework -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 3/3] w1: optional bundling of netlink kernel replies
Applications can submit a set of commands in one packet to the kernel, and in some cases it is required such as reading the temperature sensor results. This adds an option W1_CN_BUNDLE to the flags of cn_msg to request the kernel to reply in one packet for efficiency. The cn_msg flags now check for unknown flag values and return an error if one is seen. See "Proper handling of unknown flags in system calls" http://lwn.net/Articles/588444/ This corrects the ack values returned as per the protocol standard, namely the original ack for status messages and seq + 1 for all others such as the data returned from a read. Signed-off-by: David Fries Some of the common variable names have been standardized as follows. struct cn_msg *cn struct w1_netlink_msg *msg struct w1_netlink_cmd *cmd struct w1_master *dev When an argument and a function scope variable would collide, add req_ to the argument. --- Documentation/connector/connector.txt |2 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 649 - drivers/w1/w1_netlink.h | 36 ++ 6 files changed, 447 insertions(+), 263 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e56abdb..f6215f9 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic index a31c5a2..b2033c6 100644 --- a/Documentation/w1/w1.generic +++ b/Documentation/w1/w1.generic @@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver w1_master_add - Manually register a slave device w1_master_attempts - the number of times a search was attempted w1_master_max_slave_count - - the maximum slaves that may be attached to a master + - maximum number of slaves to search for at a time w1_master_name - the name of the device (w1_bus_masterX) w1_master_pullup - 5V strong pullup 0 enabled, 1 disabled w1_master_remove - Manually remove a slave device diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink index 927a52c..ef27271 100644 --- a/Documentation/w1/w1.netlink +++ b/Documentation/w1/w1.netlink @@ -30,7 +30,7 @@ Protocol. W1_SLAVE_CMD userspace command for slave device (read/write/touch) - __u8 res- reserved + __u8 status - error indication from kernel __u16 len - size of data attached to this header data union { __u8 id[8]; - slave unique device id @@ -44,10 +44,14 @@ Protocol. __u8 cmd- command opcode. W1_CMD_READ - read command W1_CMD_WRITE- write command - W1_CMD_TOUCH- touch command - (write and sample data back to userspace) W1_CMD_SEARCH - search command W1_CMD_ALARM_SEARCH - alarm search command + W1_CMD_TOUCH- touch command + (write and sample data back to userspace) + W1_CMD_RESET- send bus reset + W1_CMD_SLAVE_ADD- add slave to kernel list + W1_CMD_SLAVE_REMOVE - remove slave from kernel list + W1_CMD_LIST_SLAVES - get slaves list from kernel __u8 res- reserved __u16 len - length of data for this command For read command data must be allocated like for write command @@ -87,8 +91,7 @@ format: id0 ... idN Each message is at most 4k in size, so if number of master devices - exceeds this, it will be split into several messages, - cn.seq will be increased for each one. + exceeds this, it will be split into several messages. W1 search and alarm search commands. request: diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count * @priv: private da
[PATCH 2/3] connector: allow multiple messages to be sent in one packet
This increases the amount of bundling to reduce the number of packets sent. For the one wire use there can be multiple struct w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in struct cn_msg, and with this change multiple of those in a struct nlmsghdr, and at each level the len identifies there being multiple of the next. Signed-off-by: David Fries --- Documentation/connector/connector.txt | 13 ++--- drivers/connector/connector.c | 17 +++-- include/linux/connector.h |1 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e5c5f5e..e56abdb 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a significantly easier way: int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); -void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask); +void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask); +void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask); struct cb_id { @@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id); struct cb_id *id - unique connector's user identifier. -int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); +int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __groups, int gfp_mask); +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask); Sends message to the specified groups. It can be safely called from softirq context, but may silently fail under strong memory pressure. If there are no listeners for given group -ESRCH can be returned. struct cn_msg * - message header(with attached data). + u16 len - for *_multi multiple cn_msg messages can be sent + u32 port - destination port. + If non-zero the message will be sent to the + given port, which should be set to the + original sender. u32 __group - destination group. - If __group is zero, then appropriate group will + If port and __group is zero, then appropriate group will be searched through all registered connector users, and message will be delivered to the group which was created for user with the same ID as in msg. diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 77afe74..f4b0cdb 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -43,6 +43,8 @@ static struct cn_dev cdev; static int cn_already_initialized; /* + * Sends mult (multiple) cn_msg at a time. + * * msg->seq and msg->ack are used to determine message genealogy. * When someone sends message it puts there locally unique sequence * and random acknowledge numbers. Sequence number may be copied into @@ -62,10 +64,13 @@ static int cn_already_initialized; * the acknowledgement number in the original message + 1, then it is * a new message. * + * If msg->len != len, then additional cn_msg messages are expected following + * the first msg. + * * The message is sent to, the portid if given, the group if given, both if * both, or if both are zero then the group is looked up and sent there. */ -int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; @@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, if (!portid && !netlink_has_listeners(dev->nls, group)) return -ESRCH; - size = sizeof(*msg) + msg->len; + size = sizeof(*msg) + len; skb = nlmsg_new(size, gfp_mask); if (!skb) @@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, gfp_mask); return netlink_unicast(dev->nls, skb, portid, !(gfp_mask&__GFP_WAIT)); } +EXPORT_SYMBOL_GPL(cn_netlink_send_mult); + +/* same as cn_netlink_send_mult except msg->len is used for len */ +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, + gfp_t gfp_mask) +{ + return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask); +} EXPORT_SYMBOL_GPL(cn_netlink_send); /* diff --git a/include/linux/connector.h b/include/linux/connector.h index be9c4747..f8fe863 100644 --- a/include/linux/connector.h +++ b/include/linux/connect
[PATCH 1/3] w1: fix netlink refcnt leak on error path
If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference is taken when searching for the slave or master device. If there isn't any following data m->len (mlen is a copy) is 0 and packing up the message for later execution is skipped leaving nothing to decrement the reference counts. Way back when, m->len was checked before the search that increments the reference count, but W1_LIST_MASTERS has no additional data, the check was moved in 9be62e0b2fadaf5ff causing this bug. This change reorders to put the check before the reference count is incremented avoiding the problem. Signed-off-by: David Fries --- drivers/w1/w1_netlink.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 5234964..a02704a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg, struct w1_netlink_msg *w; u32 *id; - if (mcmd->type != W1_LIST_MASTERS) { - printk(KERN_NOTICE "%s: msg: %x.%x, wrong type: %u, len: %u.\n", - __func__, msg->id.idx, msg->id.val, mcmd->type, mcmd->len); - return -EPROTO; - } - cn = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!cn) return -ENOMEM; @@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd) w1_netlink_send_error(>block->msg, node->m, cmd, node->block->portid, err); + /* ref taken in w1_search_slave or w1_search_master_id when building +* the block +*/ if (sl) w1_unref_slave(sl); else @@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) msg_len = msg->len; while (msg_len && !err) { - struct w1_reg_num id; - u16 mlen = m->len; dev = NULL; sl = NULL; - memcpy(, m->id.id, sizeof(id)); -#if 0 - printk("%s: %02x.%012llx.%02x: type=%02x, len=%u.\n", - __func__, id.family, (unsigned long long)id.id, id.crc, m->type, m->len); -#endif if (m->len + sizeof(struct w1_netlink_msg) > msg_len) { err = -E2BIG; break; } + /* execute on this thread, no need to process later */ + if (m->type == W1_LIST_MASTERS) { + err = w1_process_command_root(msg, m, nsp->portid); + goto out_cont; + } + + /* All following message types require additional data, +* check here before references are taken. +*/ + if (!m->len) { + err = -EPROTO; + goto out_cont; + } + + /* both search calls take reference counts */ if (m->type == W1_MASTER_CMD) { dev = w1_search_master_id(m->id.mst.id); } else if (m->type == W1_SLAVE_CMD) { - sl = w1_search_slave(); + sl = w1_search_slave((struct w1_reg_num *)m->id.id); if (sl) dev = sl->master; } else { - err = w1_process_command_root(msg, m, nsp->portid); + printk(KERN_NOTICE + "%s: msg: %x.%x, wrong type: %u, len: %u.\n", + __func__, msg->id.idx, msg->id.val, + m->type, m->len); + err = -EPROTO; goto out_cont; } @@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) } err = 0; - if (!mlen) - goto out_cont; atomic_inc(>refcnt); node->async.cb = w1_process_cb; @@ -557,7 +564,8 @@ out_cont: if (err) w1_netlink_send_error(msg, m, NULL, nsp->portid, err); msg_len -= sizeof(struct w1_netlink_msg) + m->len; - m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct w1_netlink_msg) + m->len); + m = (struct w1_netlink_msg *)(((u8 *)m) + + sizeof(struct w1_netlink_msg) + m->len); /* * Let's allow requests for nonexisting devices. -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] w1: fixes and bundling replies
On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote: > Your approach and patch seem correct, but I worry about how old > commands are processed. Do I get it right, that replies to old > non-bundle commands are queued back instead of sending immediately, > but since it is not bundle but single command, queued reply is being > sent 'almost' immediately? Basically, nothing changed to old > clients, but there is new way to send batch requests now? Yes, with the previous patch set, if a client sent one command per message they would get the replies in individual packets. However in some cases clients had to bundle multiple commands in one message. For instance when using a slave command to read a temperature sensor it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then another W1_CMD_READ command to read the data back and they have to be sent in the same W1_SLAVE_CMD because there can't be a reset between the two write and read, and you can't do that with a slave command. You can with master commands, but if you are going to issue individual reset, write, and read commands why would you not bundle them? Here is a new set of patches making bundling opt in. In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags enables the kernel to bundle the reply, otherwise replies aren't bundled. The previous patch separated the data replies and status replies because they have a different ack value which is in the cn_msg and cn_netlink_send could only send one cn_msg in a call. I didn't much like that solution because now status and data replies were out of order. This version creates cn_netlink_send_mult which takes a length which allows multiple cn_msg structures to be sent at once (inside one nlmsghdr, so clients can see there are more cn_msg structures to read), so now status and data messages can be sent in order. This is somewhat suboptimal as each command has a status reply and possibly a data reply, each requiring a different ack, so even if a client sent multiple commands in one w1_netlink_msg like, cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read] the response can't pack the commands into one w1_netlink_msg because of the ack, so there's duplicate of cn_msg and w1_netlink_msg structures, cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate packets with all the context switches, select overhead and such, bundling is going to be well worth it. Another alternative could be one cn_msg [seq+1] with the data, and followed by cn_msg [ack ] with the status messages, but they are back to out of order. I wasn't completely sure what to call the new sending function, I decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len as another idea, or if there are any better ideas let me know. I tested with my client program that will bundle up 14 temperature sensor conversions, then after a delay, bundle up another set of commands to read all 14 with the bundle bit set. I also tested with a two year old version of the software that sends requests two one slave at a time (bundling only the write/read to get the data out), and doesn't have code to read the bundling the this patch adds. Both operate correctly running at the same time. Posted before, no changes connector support for sending multiple cn_msg bundling, corrects ack value to previous ack or seq + 1 corrects the variables to be name consistently Documentation/connector/connector.txt | 15 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/connector/connector.c | 17 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 673 - drivers/w1/w1_netlink.h | 36 ++ include/linux/connector.h |1 + 8 files changed, 489 insertions(+), 276 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC] w1: fixes and bundling replies
On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote: Your approach and patch seem correct, but I worry about how old commands are processed. Do I get it right, that replies to old non-bundle commands are queued back instead of sending immediately, but since it is not bundle but single command, queued reply is being sent 'almost' immediately? Basically, nothing changed to old clients, but there is new way to send batch requests now? Yes, with the previous patch set, if a client sent one command per message they would get the replies in individual packets. However in some cases clients had to bundle multiple commands in one message. For instance when using a slave command to read a temperature sensor it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then another W1_CMD_READ command to read the data back and they have to be sent in the same W1_SLAVE_CMD because there can't be a reset between the two write and read, and you can't do that with a slave command. You can with master commands, but if you are going to issue individual reset, write, and read commands why would you not bundle them? Here is a new set of patches making bundling opt in. In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags enables the kernel to bundle the reply, otherwise replies aren't bundled. The previous patch separated the data replies and status replies because they have a different ack value which is in the cn_msg and cn_netlink_send could only send one cn_msg in a call. I didn't much like that solution because now status and data replies were out of order. This version creates cn_netlink_send_mult which takes a length which allows multiple cn_msg structures to be sent at once (inside one nlmsghdr, so clients can see there are more cn_msg structures to read), so now status and data messages can be sent in order. This is somewhat suboptimal as each command has a status reply and possibly a data reply, each requiring a different ack, so even if a client sent multiple commands in one w1_netlink_msg like, cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read] the response can't pack the commands into one w1_netlink_msg because of the ack, so there's duplicate of cn_msg and w1_netlink_msg structures, cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data] cn_msg [ack ], w1_netlink_msg, w1_netlink_cmd [status] cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate packets with all the context switches, select overhead and such, bundling is going to be well worth it. Another alternative could be one cn_msg [seq+1] with the data, and followed by cn_msg [ack ] with the status messages, but they are back to out of order. I wasn't completely sure what to call the new sending function, I decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len as another idea, or if there are any better ideas let me know. I tested with my client program that will bundle up 14 temperature sensor conversions, then after a delay, bundle up another set of commands to read all 14 with the bundle bit set. I also tested with a two year old version of the software that sends requests two one slave at a time (bundling only the write/read to get the data out), and doesn't have code to read the bundling the this patch adds. Both operate correctly running at the same time. Posted before, no changes connector support for sending multiple cn_msg bundling, corrects ack value to previous ack or seq + 1 corrects the variables to be name consistently Documentation/connector/connector.txt | 15 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/connector/connector.c | 17 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 673 - drivers/w1/w1_netlink.h | 36 ++ include/linux/connector.h |1 + 8 files changed, 489 insertions(+), 276 deletions(-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 1/3] w1: fix netlink refcnt leak on error path
If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference is taken when searching for the slave or master device. If there isn't any following data m-len (mlen is a copy) is 0 and packing up the message for later execution is skipped leaving nothing to decrement the reference counts. Way back when, m-len was checked before the search that increments the reference count, but W1_LIST_MASTERS has no additional data, the check was moved in 9be62e0b2fadaf5ff causing this bug. This change reorders to put the check before the reference count is incremented avoiding the problem. Signed-off-by: David Fries da...@fries.net --- drivers/w1/w1_netlink.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 5234964..a02704a 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg, struct w1_netlink_msg *w; u32 *id; - if (mcmd-type != W1_LIST_MASTERS) { - printk(KERN_NOTICE %s: msg: %x.%x, wrong type: %u, len: %u.\n, - __func__, msg-id.idx, msg-id.val, mcmd-type, mcmd-len); - return -EPROTO; - } - cn = kmalloc(PAGE_SIZE, GFP_KERNEL); if (!cn) return -ENOMEM; @@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct w1_async_cmd *async_cmd) w1_netlink_send_error(node-block-msg, node-m, cmd, node-block-portid, err); + /* ref taken in w1_search_slave or w1_search_master_id when building +* the block +*/ if (sl) w1_unref_slave(sl); else @@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) msg_len = msg-len; while (msg_len !err) { - struct w1_reg_num id; - u16 mlen = m-len; dev = NULL; sl = NULL; - memcpy(id, m-id.id, sizeof(id)); -#if 0 - printk(%s: %02x.%012llx.%02x: type=%02x, len=%u.\n, - __func__, id.family, (unsigned long long)id.id, id.crc, m-type, m-len); -#endif if (m-len + sizeof(struct w1_netlink_msg) msg_len) { err = -E2BIG; break; } + /* execute on this thread, no need to process later */ + if (m-type == W1_LIST_MASTERS) { + err = w1_process_command_root(msg, m, nsp-portid); + goto out_cont; + } + + /* All following message types require additional data, +* check here before references are taken. +*/ + if (!m-len) { + err = -EPROTO; + goto out_cont; + } + + /* both search calls take reference counts */ if (m-type == W1_MASTER_CMD) { dev = w1_search_master_id(m-id.mst.id); } else if (m-type == W1_SLAVE_CMD) { - sl = w1_search_slave(id); + sl = w1_search_slave((struct w1_reg_num *)m-id.id); if (sl) dev = sl-master; } else { - err = w1_process_command_root(msg, m, nsp-portid); + printk(KERN_NOTICE + %s: msg: %x.%x, wrong type: %u, len: %u.\n, + __func__, msg-id.idx, msg-id.val, + m-type, m-len); + err = -EPROTO; goto out_cont; } @@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct netlink_skb_parms *nsp) } err = 0; - if (!mlen) - goto out_cont; atomic_inc(block-refcnt); node-async.cb = w1_process_cb; @@ -557,7 +564,8 @@ out_cont: if (err) w1_netlink_send_error(msg, m, NULL, nsp-portid, err); msg_len -= sizeof(struct w1_netlink_msg) + m-len; - m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct w1_netlink_msg) + m-len); + m = (struct w1_netlink_msg *)(((u8 *)m) + + sizeof(struct w1_netlink_msg) + m-len); /* * Let's allow requests for nonexisting devices. -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 2/3] connector: allow multiple messages to be sent in one packet
This increases the amount of bundling to reduce the number of packets sent. For the one wire use there can be multiple struct w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in struct cn_msg, and with this change multiple of those in a struct nlmsghdr, and at each level the len identifies there being multiple of the next. Signed-off-by: David Fries da...@fries.net --- Documentation/connector/connector.txt | 13 ++--- drivers/connector/connector.c | 17 +++-- include/linux/connector.h |1 + 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e5c5f5e..e56abdb 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a significantly easier way: int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct cn_msg *, struct netlink_skb_parms *)); -void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask); +void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __group, int gfp_mask); +void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int gfp_mask); struct cb_id { @@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id); struct cb_id *id - unique connector's user identifier. -int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); +int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 __groups, int gfp_mask); +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int gfp_mask); Sends message to the specified groups. It can be safely called from softirq context, but may silently fail under strong memory pressure. If there are no listeners for given group -ESRCH can be returned. struct cn_msg * - message header(with attached data). + u16 len - for *_multi multiple cn_msg messages can be sent + u32 port - destination port. + If non-zero the message will be sent to the + given port, which should be set to the + original sender. u32 __group - destination group. - If __group is zero, then appropriate group will + If port and __group is zero, then appropriate group will be searched through all registered connector users, and message will be delivered to the group which was created for user with the same ID as in msg. diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c index 77afe74..f4b0cdb 100644 --- a/drivers/connector/connector.c +++ b/drivers/connector/connector.c @@ -43,6 +43,8 @@ static struct cn_dev cdev; static int cn_already_initialized; /* + * Sends mult (multiple) cn_msg at a time. + * * msg-seq and msg-ack are used to determine message genealogy. * When someone sends message it puts there locally unique sequence * and random acknowledge numbers. Sequence number may be copied into @@ -62,10 +64,13 @@ static int cn_already_initialized; * the acknowledgement number in the original message + 1, then it is * a new message. * + * If msg-len != len, then additional cn_msg messages are expected following + * the first msg. + * * The message is sent to, the portid if given, the group if given, both if * both, or if both are zero then the group is looked up and sent there. */ -int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, +int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group, gfp_t gfp_mask) { struct cn_callback_entry *__cbq; @@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, if (!portid !netlink_has_listeners(dev-nls, group)) return -ESRCH; - size = sizeof(*msg) + msg-len; + size = sizeof(*msg) + len; skb = nlmsg_new(size, gfp_mask); if (!skb) @@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, gfp_mask); return netlink_unicast(dev-nls, skb, portid, !(gfp_mask__GFP_WAIT)); } +EXPORT_SYMBOL_GPL(cn_netlink_send_mult); + +/* same as cn_netlink_send_mult except msg-len is used for len */ +int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, + gfp_t gfp_mask) +{ + return cn_netlink_send_mult(msg, msg-len, portid, __group, gfp_mask); +} EXPORT_SYMBOL_GPL(cn_netlink_send); /* diff --git a/include/linux/connector.h b/include/linux/connector.h index be9c4747..f8fe863 100644 --- a/include/linux/connector.h +++ b/include/linux/connector.h @@ -71,6 +71,7 @@ struct
[PATCH 3/3] w1: optional bundling of netlink kernel replies
Applications can submit a set of commands in one packet to the kernel, and in some cases it is required such as reading the temperature sensor results. This adds an option W1_CN_BUNDLE to the flags of cn_msg to request the kernel to reply in one packet for efficiency. The cn_msg flags now check for unknown flag values and return an error if one is seen. See Proper handling of unknown flags in system calls http://lwn.net/Articles/588444/ This corrects the ack values returned as per the protocol standard, namely the original ack for status messages and seq + 1 for all others such as the data returned from a read. Signed-off-by: David Fries da...@fries.net Some of the common variable names have been standardized as follows. struct cn_msg *cn struct w1_netlink_msg *msg struct w1_netlink_cmd *cmd struct w1_master *dev When an argument and a function scope variable would collide, add req_ to the argument. --- Documentation/connector/connector.txt |2 +- Documentation/w1/w1.generic |2 +- Documentation/w1/w1.netlink | 13 +- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 649 - drivers/w1/w1_netlink.h | 36 ++ 6 files changed, 447 insertions(+), 263 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index e56abdb..f6215f9 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic index a31c5a2..b2033c6 100644 --- a/Documentation/w1/w1.generic +++ b/Documentation/w1/w1.generic @@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver w1_master_add - Manually register a slave device w1_master_attempts - the number of times a search was attempted w1_master_max_slave_count - - the maximum slaves that may be attached to a master + - maximum number of slaves to search for at a time w1_master_name - the name of the device (w1_bus_masterX) w1_master_pullup - 5V strong pullup 0 enabled, 1 disabled w1_master_remove - Manually remove a slave device diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink index 927a52c..ef27271 100644 --- a/Documentation/w1/w1.netlink +++ b/Documentation/w1/w1.netlink @@ -30,7 +30,7 @@ Protocol. W1_SLAVE_CMD userspace command for slave device (read/write/touch) - __u8 res- reserved + __u8 status - error indication from kernel __u16 len - size of data attached to this header data union { __u8 id[8]; - slave unique device id @@ -44,10 +44,14 @@ Protocol. __u8 cmd- command opcode. W1_CMD_READ - read command W1_CMD_WRITE- write command - W1_CMD_TOUCH- touch command - (write and sample data back to userspace) W1_CMD_SEARCH - search command W1_CMD_ALARM_SEARCH - alarm search command + W1_CMD_TOUCH- touch command + (write and sample data back to userspace) + W1_CMD_RESET- send bus reset + W1_CMD_SLAVE_ADD- add slave to kernel list + W1_CMD_SLAVE_REMOVE - remove slave from kernel list + W1_CMD_LIST_SLAVES - get slaves list from kernel __u8 res- reserved __u16 len - length of data for this command For read command data must be allocated like for write command @@ -87,8 +91,7 @@ format: id0 ... idN Each message is at most 4k in size, so if number of master devices - exceeds this, it will be split into several messages, - cn.seq will be increased for each one. + exceeds this, it will be split into several messages. W1 search and alarm search commands. request: diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count * @priv: private data storage
[PATCH] w1: bundle reply if the request was bundled
A program can bundle multiple messages and commands together when making one wire requests, which is going to be much more efficient than sending lots of little packets one write at a time. With this change the kernel will then bundle responses to requests that were bundled, where it's probably even more important because programs will typically execute select or a poll operation between reading packets, which takes additional context switches. Without doing any bundling, reading 14 temperature sensors using slave commands requires the application to send 14 packets for the conversion with one command each, and then later for reading it takes 14 packets with two commands each, one to request reading the scratchpad, and another to read the data back in. The kernel will then generate 14 replies with data, and then one status for each of the three commands per sensor totaling 56 packets or 84 total packets. That's a lot of context switches and not very efficient on the one wire bus. Using master commands allows for merging writes, such as match rom, rom id, and convert temperature into sending one write to the hardware. But that means sending individual commands for reset and write (for convert), or reset, write, read (for reading the result) which would be 28 packets without bundling or 2 packets when bundling. It requires two packets because it must wait for the conversion before reading results. Without this change the kernel would send one read result, and five command status results, so 6*14 = 84 packets. With this set of changes it would send three, a set of status results for the first convert commands, and another status result along with the data read for the second set. This also allows an application to get all the results from a bundle (reading the temperature sensors) at the same time, instead of them being spread out in time as the one wire bus takes time to execute and read each response. Due to the data structure limitation this patch will return packets in a different order. For a given bundle all the replies will be sent first, followed by the status messages. This is because currently the connector takes a cn_msg, and the difference between a reply and status is the ack field of cn_msg. I could add a second cn_netlink_send_nlh, to take a nlmsghdr or maybe a length so multiple cn_msg can be bundled into one packet, then all the messages could be returned in the current order. --- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 531 ++- 2 files changed, 340 insertions(+), 199 deletions(-) diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count * @priv: private data storage - * @priv_size: size allocated * @enable_pullup: allows a strong pullup * @pullup_duration: time for the next strong pullup * @flags: one of w1_master_flags @@ -214,7 +213,6 @@ enum w1_master_flags { * @dev: sysfs device * @bus_master:io operations available * @seq: sequence number used for netlink broadcasts - * @portid:destination for the current netlink command */ struct w1_master { @@ -241,7 +239,6 @@ struct w1_master atomic_trefcnt; void*priv; - int priv_size; /** 5V strong pullup enabled flag, 1 enabled, zero disabled. */ int enable_pullup; @@ -260,11 +257,6 @@ struct w1_master struct w1_bus_master*bus_master; u32 seq; - /* port id to send netlink responses to. The value is temporarily -* stored here while processing a message, set after locking the -* mutex, zero before unlocking the mutex. -*/ - u32 portid; }; /** diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index fa24561..98b9314 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -29,6 +29,198 @@ #include "w1_netlink.h" #if defined(CONFIG_W1_CON) && (defined(CONFIG_CONNECTOR) || (defined(CONFIG_CONNECTOR_MODULE) && defined(CONFIG_W1_MODULE))) + +#define MIN(a, b) (((a) < (b)) ? (a) : (b)) + +/* Bundle together everything required to process a request in one memory + * allocation. + */ +struct w1_cb_block { + atomic_t refcnt; + /* replies, this would be include the data for a read message */ + /* maximum value for reply_msg->len */ + u16 reply_maxlen; + /* pointers to building up the reply message */ + struct cn_msg *reply_msg; + struct w1_netlink_msg *reply_m; + struct w1_netlink_cmd *reply_cmd; + /* status, for w1_netlink_msg.status of 0 success or error,
[PATCH] w1: bundle reply if the request was bundled
A program can bundle multiple messages and commands together when making one wire requests, which is going to be much more efficient than sending lots of little packets one write at a time. With this change the kernel will then bundle responses to requests that were bundled, where it's probably even more important because programs will typically execute select or a poll operation between reading packets, which takes additional context switches. Without doing any bundling, reading 14 temperature sensors using slave commands requires the application to send 14 packets for the conversion with one command each, and then later for reading it takes 14 packets with two commands each, one to request reading the scratchpad, and another to read the data back in. The kernel will then generate 14 replies with data, and then one status for each of the three commands per sensor totaling 56 packets or 84 total packets. That's a lot of context switches and not very efficient on the one wire bus. Using master commands allows for merging writes, such as match rom, rom id, and convert temperature into sending one write to the hardware. But that means sending individual commands for reset and write (for convert), or reset, write, read (for reading the result) which would be 28 packets without bundling or 2 packets when bundling. It requires two packets because it must wait for the conversion before reading results. Without this change the kernel would send one read result, and five command status results, so 6*14 = 84 packets. With this set of changes it would send three, a set of status results for the first convert commands, and another status result along with the data read for the second set. This also allows an application to get all the results from a bundle (reading the temperature sensors) at the same time, instead of them being spread out in time as the one wire bus takes time to execute and read each response. Due to the data structure limitation this patch will return packets in a different order. For a given bundle all the replies will be sent first, followed by the status messages. This is because currently the connector takes a cn_msg, and the difference between a reply and status is the ack field of cn_msg. I could add a second cn_netlink_send_nlh, to take a nlmsghdr or maybe a length so multiple cn_msg can be bundled into one packet, then all the messages could be returned in the current order. --- drivers/w1/w1.h |8 - drivers/w1/w1_netlink.c | 531 ++- 2 files changed, 340 insertions(+), 199 deletions(-) diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h index 734dab7..56a49ba 100644 --- a/drivers/w1/w1.h +++ b/drivers/w1/w1.h @@ -203,7 +203,6 @@ enum w1_master_flags { * @search_id: allows continuing a search * @refcnt:reference count * @priv: private data storage - * @priv_size: size allocated * @enable_pullup: allows a strong pullup * @pullup_duration: time for the next strong pullup * @flags: one of w1_master_flags @@ -214,7 +213,6 @@ enum w1_master_flags { * @dev: sysfs device * @bus_master:io operations available * @seq: sequence number used for netlink broadcasts - * @portid:destination for the current netlink command */ struct w1_master { @@ -241,7 +239,6 @@ struct w1_master atomic_trefcnt; void*priv; - int priv_size; /** 5V strong pullup enabled flag, 1 enabled, zero disabled. */ int enable_pullup; @@ -260,11 +257,6 @@ struct w1_master struct w1_bus_master*bus_master; u32 seq; - /* port id to send netlink responses to. The value is temporarily -* stored here while processing a message, set after locking the -* mutex, zero before unlocking the mutex. -*/ - u32 portid; }; /** diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index fa24561..98b9314 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -29,6 +29,198 @@ #include w1_netlink.h #if defined(CONFIG_W1_CON) (defined(CONFIG_CONNECTOR) || (defined(CONFIG_CONNECTOR_MODULE) defined(CONFIG_W1_MODULE))) + +#define MIN(a, b) (((a) (b)) ? (a) : (b)) + +/* Bundle together everything required to process a request in one memory + * allocation. + */ +struct w1_cb_block { + atomic_t refcnt; + /* replies, this would be include the data for a read message */ + /* maximum value for reply_msg-len */ + u16 reply_maxlen; + /* pointers to building up the reply message */ + struct cn_msg *reply_msg; + struct w1_netlink_msg *reply_m; + struct w1_netlink_cmd *reply_cmd; + /* status, for w1_netlink_msg.status of 0 success or error, can't +
Re: [patch] w1: small type cleanup in sysfs
Dan, I have some other changes in work, how automated is your checkers? How much work is it for me to give a github repository and branch and find out if I introduced any problems before submitting them? I didn't get how you could get a less than one after a check for less than one from the description or patch until I looked at the rest of the source code. Looks good if the description mentions max_slave_count is an int. How about wording it, On 64 bit systems, a large value for "long tmp" is truncated when assigning to "int md->max_slave_count" so we still end up with a value less than one despite the "tmp < 1" check. Acked-by: David Fries On Tue, Feb 11, 2014 at 07:08:26PM +0300, Dan Carpenter wrote: > On 64 bit systems, a large value for "tmp" could be truncated so we > still end up with a ->max_slave_count which is less than one despite the > "tmp < 1" check. > > This is more of a problem for static checkers than a real life issue, > but it's simple enough to fix. > > Signed-off-by: Dan Carpenter > > diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c > index 9eb816b2ea5e..b96f61b15dc6 100644 > --- a/drivers/w1/w1.c > +++ b/drivers/w1/w1.c > @@ -320,10 +320,10 @@ static ssize_t w1_master_attribute_show_timeout(struct > device *dev, struct devic > static ssize_t w1_master_attribute_store_max_slave_count(struct device *dev, > struct device_attribute *attr, const char *buf, size_t count) > { > - long tmp; > + int tmp; > struct w1_master *md = dev_to_w1_master(dev); > > - if (kstrtol(buf, 0, ) == -EINVAL || tmp < 1) > + if (kstrtoint(buf, 0, ) == -EINVAL || tmp < 1) > return -EINVAL; > > mutex_lock(>mutex); -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] w1: small type cleanup in sysfs
Dan, I have some other changes in work, how automated is your checkers? How much work is it for me to give a github repository and branch and find out if I introduced any problems before submitting them? I didn't get how you could get a less than one after a check for less than one from the description or patch until I looked at the rest of the source code. Looks good if the description mentions max_slave_count is an int. How about wording it, On 64 bit systems, a large value for long tmp is truncated when assigning to int md-max_slave_count so we still end up with a value less than one despite the tmp 1 check. Acked-by: David Fries da...@fries.net On Tue, Feb 11, 2014 at 07:08:26PM +0300, Dan Carpenter wrote: On 64 bit systems, a large value for tmp could be truncated so we still end up with a -max_slave_count which is less than one despite the tmp 1 check. This is more of a problem for static checkers than a real life issue, but it's simple enough to fix. Signed-off-by: Dan Carpenter dan.carpen...@oracle.com diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c index 9eb816b2ea5e..b96f61b15dc6 100644 --- a/drivers/w1/w1.c +++ b/drivers/w1/w1.c @@ -320,10 +320,10 @@ static ssize_t w1_master_attribute_show_timeout(struct device *dev, struct devic static ssize_t w1_master_attribute_store_max_slave_count(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { - long tmp; + int tmp; struct w1_master *md = dev_to_w1_master(dev); - if (kstrtol(buf, 0, tmp) == -EINVAL || tmp 1) + if (kstrtoint(buf, 0, tmp) == -EINVAL || tmp 1) return -EINVAL; mutex_lock(md-mutex); -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert ds1wm.c from "w1: hold bus_mutex in netlink and search"
This reverts ds1wm.c from commit d3a8a9dbb903c73a7ec2deae4c9b7d74b6834f4c. Of the three files changed ds1wm.c ds2490.c and w1_netlink.c, it turns out ds1wm.c was locking bus_mutex, but inside the loop and I missed it. Reverting ds1wm.c to the previous version. Signed-off-by: David Fries Reported-by: Dan Carpenter --- drivers/w1/masters/ds1wm.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/w1/masters/ds1wm.c b/drivers/w1/masters/ds1wm.c index b077b8b..02df3b1 100644 --- a/drivers/w1/masters/ds1wm.c +++ b/drivers/w1/masters/ds1wm.c @@ -326,14 +326,13 @@ static void ds1wm_search(void *data, struct w1_master *master_dev, unsigned slaves_found = 0; unsigned int pass = 0; - mutex_lock(_dev->bus_mutex); dev_dbg(_data->pdev->dev, "search begin\n"); while (true) { ++pass; if (pass > 100) { dev_dbg(_data->pdev->dev, "too many attempts (100), search aborted\n"); - break; + return; } mutex_lock(_dev->bus_mutex); @@ -440,7 +439,6 @@ static void ds1wm_search(void *data, struct w1_master *master_dev, dev_dbg(_data->pdev->dev, "pass: %d total: %d search done ms d bit pos: %d\n", pass, slaves_found, ms_discrep_bit); - mutex_unlock(_dev->bus_mutex); } /* - */ -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Revert ds1wm.c from w1: hold bus_mutex in netlink and search
This reverts ds1wm.c from commit d3a8a9dbb903c73a7ec2deae4c9b7d74b6834f4c. Of the three files changed ds1wm.c ds2490.c and w1_netlink.c, it turns out ds1wm.c was locking bus_mutex, but inside the loop and I missed it. Reverting ds1wm.c to the previous version. Signed-off-by: David Fries da...@fries.net Reported-by: Dan Carpenter dan.carpen...@oracle.com --- drivers/w1/masters/ds1wm.c |4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/w1/masters/ds1wm.c b/drivers/w1/masters/ds1wm.c index b077b8b..02df3b1 100644 --- a/drivers/w1/masters/ds1wm.c +++ b/drivers/w1/masters/ds1wm.c @@ -326,14 +326,13 @@ static void ds1wm_search(void *data, struct w1_master *master_dev, unsigned slaves_found = 0; unsigned int pass = 0; - mutex_lock(master_dev-bus_mutex); dev_dbg(ds1wm_data-pdev-dev, search begin\n); while (true) { ++pass; if (pass 100) { dev_dbg(ds1wm_data-pdev-dev, too many attempts (100), search aborted\n); - break; + return; } mutex_lock(master_dev-bus_mutex); @@ -440,7 +439,6 @@ static void ds1wm_search(void *data, struct w1_master *master_dev, dev_dbg(ds1wm_data-pdev-dev, pass: %d total: %d search done ms d bit pos: %d\n, pass, slaves_found, ms_discrep_bit); - mutex_unlock(master_dev-bus_mutex); } /* - */ -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs
On Sat, Feb 08, 2014 at 01:23:43AM +0400, z...@ioremap.net wrote: > Hi > > 07.02.2014, 10:00, "David Fries" : > > > Here's a patch to implement that. Is this what you have in mind? > > > > From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001 > > From: David Fries > > Date: Thu, 6 Feb 2014 23:45:05 -0600 > > Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1 > > > > Netlink messages sent from the kernel consists of kernel generated > > notifications for adds or removes, the error message (also indicates > > the message has been processed), and the messages that have data to > > return. The cn_msg ack is left alone for the first two, and when > > returning data it is the sequence number + 1. Modifying the code to > > the protocol standard. > > > > Signed-off-by: David Fries > > Yes, it looks right. > Can you also check that protocol documentation is correct? Documentation/connector/connector.txt ? I found it a little unclear, I'll see what I can do. -- David Fries PGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs
On Sat, Feb 08, 2014 at 01:23:43AM +0400, z...@ioremap.net wrote: Hi 07.02.2014, 10:00, David Fries da...@fries.net: Here's a patch to implement that. Is this what you have in mind? From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001 From: David Fries da...@fries.net Date: Thu, 6 Feb 2014 23:45:05 -0600 Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1 Netlink messages sent from the kernel consists of kernel generated notifications for adds or removes, the error message (also indicates the message has been processed), and the messages that have data to return. The cn_msg ack is left alone for the first two, and when returning data it is the sequence number + 1. Modifying the code to the protocol standard. Signed-off-by: David Fries da...@fries.net Yes, it looks right. Can you also check that protocol documentation is correct? Documentation/connector/connector.txt ? I found it a little unclear, I'll see what I can do. -- David Fries da...@fries.netPGP pub CB1EE8F0 http://fries.net/~david/ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs
On Wed, Feb 05, 2014 at 03:48:45AM +0400, z...@ioremap.net wrote: > Hi > > 04.02.2014, 09:51, "David Fries" : > > Help me understand what the protocol is supposed to be. Assuming > > there aren't any errors, is there supposed to be a > > w1_netlink_send_error generated reply per netlink packet (cn_msg), per > > w1_netlink_msg, or per w1_netlink_cmd? > > reply should be sent per cmd to specify each command status > If there is no cmd in request or we didn't get to it (like failed to reset > device), we should send error. > > Depending on how w1-msg + (optional) w1-cmd are packed, client can detect > what exact error happend > > > What about the cn_msg seq and ack values? I assume the kernel > > response should carry the same seq number as the request, but what > > should the ack be set to? > > reply ack is seq + 1 > seq is the same to highlight request it belongs to Here's a patch to implement that. Is this what you have in mind? >From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001 From: David Fries Date: Thu, 6 Feb 2014 23:45:05 -0600 Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1 Netlink messages sent from the kernel consists of kernel generated notifications for adds or removes, the error message (also indicates the message has been processed), and the messages that have data to return. The cn_msg ack is left alone for the first two, and when returning data it is the sequence number + 1. Modifying the code to the protocol standard. Signed-off-by: David Fries --- Documentation/connector/connector.txt |2 +- drivers/w1/w1_netlink.c |6 +- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index 9bdfc1a..0bc3522 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -115,7 +115,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 3c81689..d98b550 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -59,7 +59,6 @@ static void w1_send_slave(struct w1_master *dev, u64 rn) avail = dev->priv_size - cmd->len; if (avail < 8) { - msg->ack++; cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL); msg->len = sizeof(struct w1_netlink_msg) + @@ -130,7 +129,6 @@ static int w1_get_slaves(struct w1_master *dev, W1_ALARM_SEARCH : W1_SEARCH, w1_found_send_slave); } - msg->ack = 0; cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL); dev->priv = NULL; @@ -308,7 +306,7 @@ static int w1_process_command_root(struct cn_msg *msg, cn->id.val = CN_W1_VAL; cn->seq = msg->seq; - cn->ack = 1; + cn->ack = msg->seq + 1; cn->len = sizeof(struct w1_netlink_msg); w = (struct w1_netlink_msg *)(cn + 1); @@ -321,7 +319,6 @@ static int w1_process_command_root(struct cn_msg *msg, list_for_each_entry(m, _masters, w1_master_entry) { if (cn->len + sizeof(*id) > PAGE_SIZE - sizeof(struct cn_msg)) { cn_netlink_send(cn, portid, 0, GFP_KERNEL); - cn->ack++; cn->len = sizeof(struct w1_netlink_msg); w->len = 0; id = (u32 *)(w + 1); @@ -332,7 +329,6 @@ static int w1_process_command_root(struct cn_msg *msg, cn->len += sizeof(*id); id++; } - cn->ack = 0; cn_netlink_send(cn, portid, 0, GFP_KERNEL); mutex_unlock(_mlock); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs
On Wed, Feb 05, 2014 at 03:48:45AM +0400, z...@ioremap.net wrote: Hi 04.02.2014, 09:51, David Fries da...@fries.net: Help me understand what the protocol is supposed to be. Assuming there aren't any errors, is there supposed to be a w1_netlink_send_error generated reply per netlink packet (cn_msg), per w1_netlink_msg, or per w1_netlink_cmd? reply should be sent per cmd to specify each command status If there is no cmd in request or we didn't get to it (like failed to reset device), we should send error. Depending on how w1-msg + (optional) w1-cmd are packed, client can detect what exact error happend What about the cn_msg seq and ack values? I assume the kernel response should carry the same seq number as the request, but what should the ack be set to? reply ack is seq + 1 seq is the same to highlight request it belongs to Here's a patch to implement that. Is this what you have in mind? From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001 From: David Fries da...@fries.net Date: Thu, 6 Feb 2014 23:45:05 -0600 Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1 Netlink messages sent from the kernel consists of kernel generated notifications for adds or removes, the error message (also indicates the message has been processed), and the messages that have data to return. The cn_msg ack is left alone for the first two, and when returning data it is the sequence number + 1. Modifying the code to the protocol standard. Signed-off-by: David Fries da...@fries.net --- Documentation/connector/connector.txt |2 +- drivers/w1/w1_netlink.c |6 +- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/Documentation/connector/connector.txt b/Documentation/connector/connector.txt index 9bdfc1a..0bc3522 100644 --- a/Documentation/connector/connector.txt +++ b/Documentation/connector/connector.txt @@ -115,7 +115,7 @@ acknowledge number MUST be the same + 1. If we receive a message and its sequence number is not equal to one we are expecting, then it is a new message. If we receive a message and its sequence number is the same as one we are expecting, but its -acknowledge is not equal to the acknowledge number in the original +acknowledge is not equal to the sequence number in the original message + 1, then it is a new message. Obviously, the protocol header contains the above id. diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c index 3c81689..d98b550 100644 --- a/drivers/w1/w1_netlink.c +++ b/drivers/w1/w1_netlink.c @@ -59,7 +59,6 @@ static void w1_send_slave(struct w1_master *dev, u64 rn) avail = dev-priv_size - cmd-len; if (avail 8) { - msg-ack++; cn_netlink_send(msg, dev-portid, 0, GFP_KERNEL); msg-len = sizeof(struct w1_netlink_msg) + @@ -130,7 +129,6 @@ static int w1_get_slaves(struct w1_master *dev, W1_ALARM_SEARCH : W1_SEARCH, w1_found_send_slave); } - msg-ack = 0; cn_netlink_send(msg, dev-portid, 0, GFP_KERNEL); dev-priv = NULL; @@ -308,7 +306,7 @@ static int w1_process_command_root(struct cn_msg *msg, cn-id.val = CN_W1_VAL; cn-seq = msg-seq; - cn-ack = 1; + cn-ack = msg-seq + 1; cn-len = sizeof(struct w1_netlink_msg); w = (struct w1_netlink_msg *)(cn + 1); @@ -321,7 +319,6 @@ static int w1_process_command_root(struct cn_msg *msg, list_for_each_entry(m, w1_masters, w1_master_entry) { if (cn-len + sizeof(*id) PAGE_SIZE - sizeof(struct cn_msg)) { cn_netlink_send(cn, portid, 0, GFP_KERNEL); - cn-ack++; cn-len = sizeof(struct w1_netlink_msg); w-len = 0; id = (u32 *)(w + 1); @@ -332,7 +329,6 @@ static int w1_process_command_root(struct cn_msg *msg, cn-len += sizeof(*id); id++; } - cn-ack = 0; cn_netlink_send(cn, portid, 0, GFP_KERNEL); mutex_unlock(w1_mlock); -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/