from:"David Fries"

Re: [PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

Laurent and Mauro,

I've reviewed Philipp Zabel's 49770 patch, it looks like they
equivalent, but take a different path.  My version does the unref in
uvc_unregister_video, Zabel's does the unref in each caller of
uvc_unregister_video.  I think I would prefer it to be in fewer
places.

Any ideas on the sysfs group 'power' not found bug on removal while on
use?

Thanks, it will be great to get this fixed.

On Sat, May 26, 2018 at 07:21:11PM +0300, Laurent Pinchart wrote:
> Hi David,
> 
> Thank you for the patch.
> 
> On Saturday, 26 May 2018 18:50:46 EEST David Fries wrote:
> > Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
> > variable to a reference count")
> > didn't take into account that while the old counter was initialized to
> > 0 (no stream open), kref_init starts with a reference of 1.  The
> > reference count on unplug no longer reaches 0, uvc_delete isn't
> > called, and evdev doesn't release /dev/input/event*.  Plug and unplug
> > enough times and it runs out of device minors preventing any new input
> > device and the use of newly plugged in USB video cameras until the
> > system is rebooted.
> > 
> > Signed-off-by: David Fries <da...@fries.net>
> > Cc: Guennadi Liakhovetski <g.liakhovet...@gmx.de>
> > Cc: Laurent Pinchart <laurent.pinch...@ideasonboard.com>
> > Cc: Mauro Carvalho Chehab <mche...@s-opensource.com>
> > Cc: sta...@vger.kernel.org
> 
> Philipp Zabel has already posted a similar patch, see https://
> patchwork.linuxtv.org/patch/49770/
> 
> Mauro,
> 
> This is a serious issue so I'd like to get the patch merged in v4.18, but it 
> could be argued that it's getting late for that, especially given that the 
> bug 
> has been there since v4.14. Would you be OK merging this in v4.18 ? If so 
> could you pick https://patchwork.linuxtv.org/patch/49770/ up, or would you 
> like me to send a pull request ?
> 
> > ---
> >  drivers/media/usb/uvc/uvc_driver.c | 11 ---
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/media/usb/uvc/uvc_driver.c
> > b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644
> > --- a/drivers/media/usb/uvc/uvc_driver.c
> > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device
> > *dev) {
> > struct uvc_streaming *stream;
> > 
> > -   /* Unregistering all video devices might result in uvc_delete() being
> > -* called from inside the loop if there's no open file handle. To avoid
> > -* that, increment the refcount before iterating over the streams and
> > -* decrement it when done.
> > -*/
> > -   kref_get(>ref);
> > -
> > list_for_each_entry(stream, >streams, list) {
> > if (!video_is_registered(>vdev))
> > continue;
> > @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device
> > *dev) uvc_debugfs_cleanup_stream(stream);
> > }
> > 
> > +   /* Release the reference implicit in kref_init from uvc_probe,
> > +* the UVC device won't be deleted until the last file descriptor
> > +* is also closed.
> > +*/
> > kref_put(>ref, uvc_delete);
> >  }
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
> 

-- 
David Fries <da...@fries.net>

Re: [PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

Laurent and Mauro,

I've reviewed Philipp Zabel's 49770 patch, it looks like they
equivalent, but take a different path.  My version does the unref in
uvc_unregister_video, Zabel's does the unref in each caller of
uvc_unregister_video.  I think I would prefer it to be in fewer
places.

Any ideas on the sysfs group 'power' not found bug on removal while on
use?

Thanks, it will be great to get this fixed.

On Sat, May 26, 2018 at 07:21:11PM +0300, Laurent Pinchart wrote:
> Hi David,
> 
> Thank you for the patch.
> 
> On Saturday, 26 May 2018 18:50:46 EEST David Fries wrote:
> > Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
> > variable to a reference count")
> > didn't take into account that while the old counter was initialized to
> > 0 (no stream open), kref_init starts with a reference of 1.  The
> > reference count on unplug no longer reaches 0, uvc_delete isn't
> > called, and evdev doesn't release /dev/input/event*.  Plug and unplug
> > enough times and it runs out of device minors preventing any new input
> > device and the use of newly plugged in USB video cameras until the
> > system is rebooted.
> > 
> > Signed-off-by: David Fries 
> > Cc: Guennadi Liakhovetski 
> > Cc: Laurent Pinchart 
> > Cc: Mauro Carvalho Chehab 
> > Cc: sta...@vger.kernel.org
> 
> Philipp Zabel has already posted a similar patch, see https://
> patchwork.linuxtv.org/patch/49770/
> 
> Mauro,
> 
> This is a serious issue so I'd like to get the patch merged in v4.18, but it 
> could be argued that it's getting late for that, especially given that the 
> bug 
> has been there since v4.14. Would you be OK merging this in v4.18 ? If so 
> could you pick https://patchwork.linuxtv.org/patch/49770/ up, or would you 
> like me to send a pull request ?
> 
> > ---
> >  drivers/media/usb/uvc/uvc_driver.c | 11 ---
> >  1 file changed, 4 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/media/usb/uvc/uvc_driver.c
> > b/drivers/media/usb/uvc/uvc_driver.c index 2469b49..3cbdc87 100644
> > --- a/drivers/media/usb/uvc/uvc_driver.c
> > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > @@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device
> > *dev) {
> > struct uvc_streaming *stream;
> > 
> > -   /* Unregistering all video devices might result in uvc_delete() being
> > -* called from inside the loop if there's no open file handle. To avoid
> > -* that, increment the refcount before iterating over the streams and
> > -* decrement it when done.
> > -*/
> > -   kref_get(>ref);
> > -
> > list_for_each_entry(stream, >streams, list) {
> > if (!video_is_registered(>vdev))
> > continue;
> > @@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device
> > *dev) uvc_debugfs_cleanup_stream(stream);
> > }
> > 
> > +   /* Release the reference implicit in kref_init from uvc_probe,
> > +* the UVC device won't be deleted until the last file descriptor
> > +* is also closed.
> > +*/
> > kref_put(>ref, uvc_delete);
> >  }
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
> 

-- 
David Fries

[PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
variable to a reference count")
didn't take into account that while the old counter was initialized to
0 (no stream open), kref_init starts with a reference of 1.  The
reference count on unplug no longer reaches 0, uvc_delete isn't
called, and evdev doesn't release /dev/input/event*.  Plug and unplug
enough times and it runs out of device minors preventing any new input
device and the use of newly plugged in USB video cameras until the
system is rebooted.

Signed-off-by: David Fries <da...@fries.net>
Cc: Guennadi Liakhovetski <g.liakhovet...@gmx.de>
Cc: Laurent Pinchart <laurent.pinch...@ideasonboard.com>
Cc: Mauro Carvalho Chehab <mche...@s-opensource.com>
Cc: sta...@vger.kernel.org
---
 drivers/media/usb/uvc/uvc_driver.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index 2469b49..3cbdc87 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev)
 {
struct uvc_streaming *stream;
 
-   /* Unregistering all video devices might result in uvc_delete() being
-* called from inside the loop if there's no open file handle. To avoid
-* that, increment the refcount before iterating over the streams and
-* decrement it when done.
-*/
-   kref_get(>ref);
-
list_for_each_entry(stream, >streams, list) {
if (!video_is_registered(>vdev))
continue;
@@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev)
uvc_debugfs_cleanup_stream(stream);
}
 
+   /* Release the reference implicit in kref_init from uvc_probe,
+* the UVC device won't be deleted until the last file descriptor
+* is also closed.
+*/
kref_put(>ref, uvc_delete);
 }
 
-- 
2.1.4


>From 32a612bc06a2a1b9215f3b7166342c98043bd925 Mon Sep 17 00:00:00 2001
From: David Fries <da...@fries.net>
Date: Thu, 24 May 2018 23:43:15 -0500
Subject: [PATCH] uvc_driver: UVC kref never reaches zero leading to denial of
 service

Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
variable to a reference count")
didn't take into account that while the counter was
initialized to 0 (no stream open), kref_init starts with a reference
of 1, leading to the device
never reaching 0 and uvc_delete never getting called.  This leads to
evdev never getting released and /dev/input/event* eventually running
out of minors preventing any new event devices and new USB cameras from
being usable until the system is rebooted.

In my case "disabled by hub (EMI?), re-enabling..." kept
removing/inserting the device until days later it ran out of minors
and I lost the video security feed.

Now that the device is actually being removed other problems are
showing up.  Specifically the following if the camera is removed or
`rmmod ehci_pci` while an application is getting video from it.  It
doesn't happen if the camera is not in use.  How do I track that down?

sysfs group 'power' not found for kobject 'event10'
sysfs group 'power' not found for kobject 'input32'
sysfs group 'id' not found for kobject 'input32'
sysfs group 'capabilities' not found for kobject 'input32'
sysfs group 'power' not found for kobject 'media0'

Signed-off-by: David Fries <da...@fries.net>
---
 drivers/media/usb/uvc/uvc_driver.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index 2469b49..3cbdc87 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev)
 {
struct uvc_streaming *stream;
 
-   /* Unregistering all video devices might result in uvc_delete() being
-* called from inside the loop if there's no open file handle. To avoid
-* that, increment the refcount before iterating over the streams and
-* decrement it when done.
-*/
-   kref_get(>ref);
-
list_for_each_entry(stream, >streams, list) {
if (!video_is_registered(>vdev))
continue;
@@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev)
uvc_debugfs_cleanup_stream(stream);
}
 
+   /* Release the reference implicit in kref_init from uvc_probe,
+* the UVC device won't be deleted until the last file descriptor
+* is also closed.
+*/
kref_put(>ref, uvc_delete);
 }
 
-- 
2.1.4

[PATCH 0/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

The patch in the following e-mail fixes a reference count bug, it
seems to me that uvc_unregister_video is a good location to release
the final reference, I find it is called once.  It may sound like a
lot to plug and unplug the USB camera 250 some times, but in my case
"disabled by hub (EMI?), re-enabling..." kept unplugging and plugging
in the device until days later it ran out of minors and I lost the
video security feed.

With this patch, now that the device is actually being removed other
problems are showing up.  Specifically the following if the camera is
removed or `rmmod ehci_pci` while an application is getting video from
it.  It doesn't happen if the camera is not in use.  How do I track
that down?

sysfs group 'power' not found for kobject 'event10'
sysfs group 'power' not found for kobject 'input32'
sysfs group 'id' not found for kobject 'input32'
sysfs group 'capabilities' not found for kobject 'input32'
sysfs group 'power' not found for kobject 'media0'

-- 
David Fries <da...@fries.net>

[PATCH 0/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

The patch in the following e-mail fixes a reference count bug, it
seems to me that uvc_unregister_video is a good location to release
the final reference, I find it is called once.  It may sound like a
lot to plug and unplug the USB camera 250 some times, but in my case
"disabled by hub (EMI?), re-enabling..." kept unplugging and plugging
in the device until days later it ran out of minors and I lost the
video security feed.

With this patch, now that the device is actually being removed other
problems are showing up.  Specifically the following if the camera is
removed or `rmmod ehci_pci` while an application is getting video from
it.  It doesn't happen if the camera is not in use.  How do I track
that down?

sysfs group 'power' not found for kobject 'event10'
sysfs group 'power' not found for kobject 'input32'
sysfs group 'id' not found for kobject 'input32'
sysfs group 'capabilities' not found for kobject 'input32'
sysfs group 'power' not found for kobject 'media0'

-- 
David Fries

[PATCH 1/1] media: uvc_driver: fix USB Camera ref leak denial of service

2018-05-26 Thread David Fries

Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
variable to a reference count")
didn't take into account that while the old counter was initialized to
0 (no stream open), kref_init starts with a reference of 1.  The
reference count on unplug no longer reaches 0, uvc_delete isn't
called, and evdev doesn't release /dev/input/event*.  Plug and unplug
enough times and it runs out of device minors preventing any new input
device and the use of newly plugged in USB video cameras until the
system is rebooted.

Signed-off-by: David Fries 
Cc: Guennadi Liakhovetski 
Cc: Laurent Pinchart 
Cc: Mauro Carvalho Chehab 
Cc: sta...@vger.kernel.org
---
 drivers/media/usb/uvc/uvc_driver.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index 2469b49..3cbdc87 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev)
 {
struct uvc_streaming *stream;
 
-   /* Unregistering all video devices might result in uvc_delete() being
-* called from inside the loop if there's no open file handle. To avoid
-* that, increment the refcount before iterating over the streams and
-* decrement it when done.
-*/
-   kref_get(>ref);
-
list_for_each_entry(stream, >streams, list) {
if (!video_is_registered(>vdev))
continue;
@@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev)
uvc_debugfs_cleanup_stream(stream);
}
 
+   /* Release the reference implicit in kref_init from uvc_probe,
+* the UVC device won't be deleted until the last file descriptor
+* is also closed.
+*/
kref_put(>ref, uvc_delete);
 }
 
-- 
2.1.4


>From 32a612bc06a2a1b9215f3b7166342c98043bd925 Mon Sep 17 00:00:00 2001
From: David Fries 
Date: Thu, 24 May 2018 23:43:15 -0500
Subject: [PATCH] uvc_driver: UVC kref never reaches zero leading to denial of
 service

Commit 9d15cd958c17 ("media: uvcvideo: Convert from using an atomic
variable to a reference count")
didn't take into account that while the counter was
initialized to 0 (no stream open), kref_init starts with a reference
of 1, leading to the device
never reaching 0 and uvc_delete never getting called.  This leads to
evdev never getting released and /dev/input/event* eventually running
out of minors preventing any new event devices and new USB cameras from
being usable until the system is rebooted.

In my case "disabled by hub (EMI?), re-enabling..." kept
removing/inserting the device until days later it ran out of minors
and I lost the video security feed.

Now that the device is actually being removed other problems are
showing up.  Specifically the following if the camera is removed or
`rmmod ehci_pci` while an application is getting video from it.  It
doesn't happen if the camera is not in use.  How do I track that down?

sysfs group 'power' not found for kobject 'event10'
sysfs group 'power' not found for kobject 'input32'
sysfs group 'id' not found for kobject 'input32'
sysfs group 'capabilities' not found for kobject 'input32'
sysfs group 'power' not found for kobject 'media0'

Signed-off-by: David Fries 
---
 drivers/media/usb/uvc/uvc_driver.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/media/usb/uvc/uvc_driver.c 
b/drivers/media/usb/uvc/uvc_driver.c
index 2469b49..3cbdc87 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1871,13 +1871,6 @@ static void uvc_unregister_video(struct uvc_device *dev)
 {
struct uvc_streaming *stream;
 
-   /* Unregistering all video devices might result in uvc_delete() being
-* called from inside the loop if there's no open file handle. To avoid
-* that, increment the refcount before iterating over the streams and
-* decrement it when done.
-*/
-   kref_get(>ref);
-
list_for_each_entry(stream, >streams, list) {
if (!video_is_registered(>vdev))
continue;
@@ -1888,6 +1881,10 @@ static void uvc_unregister_video(struct uvc_device *dev)
uvc_debugfs_cleanup_stream(stream);
}
 
+   /* Release the reference implicit in kref_init from uvc_probe,
+* the UVC device won't be deleted until the last file descriptor
+* is also closed.
+*/
kref_put(>ref, uvc_delete);
 }
 
-- 
2.1.4

Re: [patch] [patch 2/2 v2] w1: use correct lock on error in w1_seq_show()

2015-06-11 Thread David Fries

On Thu, Jun 11, 2015 at 06:31:00PM +0300, Evgeniy Polyakov wrote:
> Hi
> 
> 04.06.2015, 12:04, "Dan Carpenter" :
> > I noticed there was a problem here because Smatch complained:
> >
> > drivers/w1/slaves/w1_therm.c:416 w1_seq_show() warn:
> > inconsistent returns 'mutex:>master->mutex'.
> >   Locked on: line 416
> >   Unlocked on: line 413
> >
> > The problem is that we lock ->mutex but we unlock ->bus_mutex on error.
> > David Fries says that ->bus_mutex is correct and ->mutex is incorrect.
> >
> > Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
> > Signed-off-by: Dan Carpenter 
> 
> Looks good to me, Greg please pull this serie into your tree, if you hadn't 
> yet.
> Am I right that this is a stable tree material too?

I would expect the answer to be no.  This is a fix to a new feature
that is in gregkh/char-misc but not yet in Linus's tree.

-- 
David Fries 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] [patch 2/2 v2] w1: use correct lock on error in w1_seq_show()

2015-06-11 Thread David Fries

On Thu, Jun 11, 2015 at 06:31:00PM +0300, Evgeniy Polyakov wrote:
 Hi
 
 04.06.2015, 12:04, Dan Carpenter dan.carpen...@oracle.com:
  I noticed there was a problem here because Smatch complained:
 
  drivers/w1/slaves/w1_therm.c:416 w1_seq_show() warn:
  inconsistent returns 'mutex:sl-master-mutex'.
    Locked on: line 416
    Unlocked on: line 413
 
  The problem is that we lock -mutex but we unlock -bus_mutex on error.
  David Fries says that -bus_mutex is correct and -mutex is incorrect.
 
  Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
  Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
 
 Looks good to me, Greg please pull this serie into your tree, if you hadn't 
 yet.
 Am I right that this is a stable tree material too?

I would expect the answer to be no.  This is a fix to a new feature
that is in gregkh/char-misc but not yet in Linus's tree.

-- 
David Fries da...@fries.net
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

On Tue, Jun 02, 2015 at 09:54:10PM -0400, Matt Campbell wrote:
> Thanks for the response, I will try to answer your questions.
> 
> The other function besides temp conversion and EEPROM write that
> requires a delay is the initial CHAIN on command. The datasheet says
> to "Wait for chain to charge" but it does not offer ANY indication on
> how to do so.  I did discover that my sleep was a sleep(0) long after
> I submitted the patch, but as you observed it was not hurting
> anything.

Then it should be removed as it gives the wrong impression and that
variable shouldn't be used in the slave code anyway.

> The 64 was, in all honesty, a starting point that I picked for lack of
> finding a better option.  I only currently have 10 sensors available
> for me to test with, so I was not able to do any significant tests to
> see what would happen if I raised that limit and the loop did not exit
> properly at the end of the chain.  I was also not able to find an edge
> case that would cause the loop to not find a match.  The w1_seq file
> only shows for the one type sensor that supports it so in the absence
> of a bus error, a match is guaranteed.

That's easy enough to test, comment out your if...break; in the loop.

Is there a good reason that the loop doesn't terminate when the device
you're after sl->reg_num.id is found?  By making your loop terminate
on a different condition, W1_42_FINISHED_BYTE, if the slave you are
after isn't found you'll still return seq of 0.  I'm wondering if seq
should be initialized to -1.

Do this, make up the right id of the correct family code and manually
tell the kernel it is there.  That would be the same as if you removed
a chip before the kernel's search timed it out.  Then do your sequence
scan on your devices including this new one which isn't there.
cd /sys/devices/w1_bus_master1
echo 42-01234567 > w1_master_add

The devices return the rom numbers, I wonder about returning the rom
list (which is in the sequence), instead of the sequence number for a
device if that would be more or less useful to users.  You would be
able to do it in one pass.

> The output of the function in the event the pins are hooked up as GPIO
> and not a chain is undefined.  Since the chips are very tiny,
> prototyping this function would require me to design and order some
> boards that I never was able to use.  To my knowledge it is not
> possible to determine that things are wired this way in the software.

Something like "Output is undefined if all direct sequence sensors on
this one wire bus aren't part of one chain.  GPIO control of the two
pins is not support by this driver." for the comment above
w1_seq_show, in addition to the other comments I was asking about.  I
see now you have some documentation in
Documentation/w1/slaves/w1_therm already that I didn't realize
earlier.

> Apologies, this is my first foray in submitting any sort of kernel
> code so I am not sure of exactly how I would submit a fixup patch for
> gregkh's tree.  If someone wants to walk me through that off list I
> would be most appreciative!

git remote add gregkh_char-misc 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
git fetch gregkh_char-misc
git rebase remotes/gregkh_char-misc/char-misc-next
and add a new commit from there.

> Matt Campbell
> mattrcampb...@gmail.com
> 
> 
> On Tue, Jun 2, 2015 at 9:07 PM, David Fries  wrote:
> > The source code is as good as it gets, that's why I posted those two
> > lines from the w1.h file.  Maybe it could be expanded upon, mutex is
> > more for adding or removing slaves and the like modifications to
> > w1_master data structure, where bus_mutex is what make sure something
> > else doesn't try to talk on the bus while the current device is using
> > it.  What really matters is how other code is using the locks.  For
> > instance if you look at w1_therm.c bus_mutex is the only one used.
> > Then you look at what your routine is doing, and what the existing
> > routine is doing, they are both reading/writing from the one wire bus,
> > and then return their output to the user, so then use the same lock.
> > If it's the wrong lock then it's at least consistent and easier to
> > cleanup when someone figures out it's the wrong lock.  If you think
> > it's the wrong lock raise a flag and ask.
> >
> > My patch review comments.
> >
> > The temperature sensor uses a strong pullup or delay of 750 ms to do
> > the temperature conversion.  A quick read through of this sensor has
> > temperature conversion and EEPROM write as the only operations
> > requiring a significant time delay, everything else is in the us, so I
> > would expect this to not require any delay.
> >
> > Read the comments to w1_next_pullup in w1_i

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

The source code is as good as it gets, that's why I posted those two
lines from the w1.h file.  Maybe it could be expanded upon, mutex is
more for adding or removing slaves and the like modifications to
w1_master data structure, where bus_mutex is what make sure something
else doesn't try to talk on the bus while the current device is using
it.  What really matters is how other code is using the locks.  For
instance if you look at w1_therm.c bus_mutex is the only one used.
Then you look at what your routine is doing, and what the existing
routine is doing, they are both reading/writing from the one wire bus,
and then return their output to the user, so then use the same lock.
If it's the wrong lock then it's at least consistent and easier to
cleanup when someone figures out it's the wrong lock.  If you think
it's the wrong lock raise a flag and ask.

My patch review comments.

The temperature sensor uses a strong pullup or delay of 750 ms to do
the temperature conversion.  A quick read through of this sensor has
temperature conversion and EEPROM write as the only operations
requiring a significant time delay, everything else is in the us, so I
would expect this to not require any delay.

Read the comments to w1_next_pullup in w1_io.c as pullup_duration will
always be 0 in msleep(sl->master->pullup_duration); which isn't likely
what you expect.  Apparently it is working for you so that further
reinforces my previous point.

Where did the 64 come from?  I have 14 temperature sensors in my
single family residence right now, and it would take another 7 to
cover each area/room.  I could imagine an actual industrial setting to
far exceed the 64 count, but I don't work in that area so I don't know
the practical limits.  How long does it take to read if it loops
through 64 times without finding a match?  What is reg_num value if a
sensor doesn't reply?  If it is all 00's or all ff's would that be a
better test for the end of the line?  How long would it take to loop
16384 times?  Maybe you stop if it reaches 16384 or if the end of line
value is read 64 times.  I also assume this is pretty much a static
answer, so you aren't going to be querying it except once per device,
once per program setup and rarely or infrequently after that so the
time shouldn't matter too much.  Maybe the program queries again if a
sensor is detected or removed.

I would like to see some documentation (comment) at the top of
w1_seq_show as to what the sysfs number returns means.  I can see the
variable is named seq, and it returns it as a signed number.  What
does a negative value mean?  Are they expected?  Will the first device
in the chain be 0 or 1, and what defines what end that is?  What kind
of value is returned if the pins are connected as GPIO and not in the
direct sequence configuration?  Maybe it needs a comment that all
sensors of this type must be in the direct sequence configuration for
this routine to give meaningful values.  "Place all devices in CHAIN
state" that will address all devices including other types of devices,
would you expect this to cause a problem with other devices?

Line up the constants after #define to the same tab location before
w1_seq_show.

Do you plan to send in a mutex fixup patch?  In my experience gregkh's
tree only accepts patches to what is already there, he won't swap out a
patch he already has (he could as it isn't in Linus's tree, but he
doesn't).

On Tue, Jun 02, 2015 at 08:35:41AM -0400, Matt Campbell wrote:
> David, not arguing just curious how you decide which lock to use?  Is there
> a standard documented?
> 
> --m
> 
> Matt Campbell
> mattrcampb...@gmail.com
> 
> On Tue, Jun 2, 2015 at 8:12 AM, David Fries  wrote:
> 
> > Thanks for including me, I'm going to have to nack this though, it is
> > an error, but bus_mutex is correct, mutex is incorrect, so the other
> > two lock/unlock in that routine are the ones that need to be changed.
> >
> > w1.h
> >  * @mutex:  protect most of w1_master
> >  * @bus_mutex:  pretect concurrent bus access
> >
> > On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote:
> > > Smatch complains that we don't unlock "master->mutex" on this error
> > > path.  It looks like it is a typo and we unlock ->bus_mutext where
> > > ->mutex was intended.
> > >
> > > Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
> > > Signed-off-by: Dan Carpenter 
> > >
> > > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
> > > index 3351be6..79ecaf8 100644
> > > --- a/drivers/w1/slaves/w1_therm.c
> > > +++ b/drivers/w1/slaves/w1_therm.c
> > > @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device,
> > >   c -= snprintf(buf + PAGE_SIZE - c, c, &q

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

Thanks for including me, I'm going to have to nack this though, it is
an error, but bus_mutex is correct, mutex is incorrect, so the other
two lock/unlock in that routine are the ones that need to be changed.

w1.h
 * @mutex:  protect most of w1_master
 * @bus_mutex:  pretect concurrent bus access

On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote:
> Smatch complains that we don't unlock "master->mutex" on this error
> path.  It looks like it is a typo and we unlock ->bus_mutext where
> ->mutex was intended.
> 
> Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
> index 3351be6..79ecaf8 100644
> --- a/drivers/w1/slaves/w1_therm.c
> +++ b/drivers/w1/slaves/w1_therm.c
> @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device,
>   c -= snprintf(buf + PAGE_SIZE - c, c, "%d\n", seq);
>   return PAGE_SIZE - c;
>  error:
> - mutex_unlock(>master->bus_mutex);
> + mutex_unlock(>master->mutex);
>   return -EIO;
>  }
>  

-- 
David Fries 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

The source code is as good as it gets, that's why I posted those two
lines from the w1.h file.  Maybe it could be expanded upon, mutex is
more for adding or removing slaves and the like modifications to
w1_master data structure, where bus_mutex is what make sure something
else doesn't try to talk on the bus while the current device is using
it.  What really matters is how other code is using the locks.  For
instance if you look at w1_therm.c bus_mutex is the only one used.
Then you look at what your routine is doing, and what the existing
routine is doing, they are both reading/writing from the one wire bus,
and then return their output to the user, so then use the same lock.
If it's the wrong lock then it's at least consistent and easier to
cleanup when someone figures out it's the wrong lock.  If you think
it's the wrong lock raise a flag and ask.

My patch review comments.

The temperature sensor uses a strong pullup or delay of 750 ms to do
the temperature conversion.  A quick read through of this sensor has
temperature conversion and EEPROM write as the only operations
requiring a significant time delay, everything else is in the us, so I
would expect this to not require any delay.

Read the comments to w1_next_pullup in w1_io.c as pullup_duration will
always be 0 in msleep(sl-master-pullup_duration); which isn't likely
what you expect.  Apparently it is working for you so that further
reinforces my previous point.

Where did the 64 come from?  I have 14 temperature sensors in my
single family residence right now, and it would take another 7 to
cover each area/room.  I could imagine an actual industrial setting to
far exceed the 64 count, but I don't work in that area so I don't know
the practical limits.  How long does it take to read if it loops
through 64 times without finding a match?  What is reg_num value if a
sensor doesn't reply?  If it is all 00's or all ff's would that be a
better test for the end of the line?  How long would it take to loop
16384 times?  Maybe you stop if it reaches 16384 or if the end of line
value is read 64 times.  I also assume this is pretty much a static
answer, so you aren't going to be querying it except once per device,
once per program setup and rarely or infrequently after that so the
time shouldn't matter too much.  Maybe the program queries again if a
sensor is detected or removed.

I would like to see some documentation (comment) at the top of
w1_seq_show as to what the sysfs number returns means.  I can see the
variable is named seq, and it returns it as a signed number.  What
does a negative value mean?  Are they expected?  Will the first device
in the chain be 0 or 1, and what defines what end that is?  What kind
of value is returned if the pins are connected as GPIO and not in the
direct sequence configuration?  Maybe it needs a comment that all
sensors of this type must be in the direct sequence configuration for
this routine to give meaningful values.  Place all devices in CHAIN
state that will address all devices including other types of devices,
would you expect this to cause a problem with other devices?

Line up the constants after #define to the same tab location before
w1_seq_show.

Do you plan to send in a mutex fixup patch?  In my experience gregkh's
tree only accepts patches to what is already there, he won't swap out a
patch he already has (he could as it isn't in Linus's tree, but he
doesn't).

On Tue, Jun 02, 2015 at 08:35:41AM -0400, Matt Campbell wrote:
 David, not arguing just curious how you decide which lock to use?  Is there
 a standard documented?
 
 --m
 
 Matt Campbell
 mattrcampb...@gmail.com
 
 On Tue, Jun 2, 2015 at 8:12 AM, David Fries da...@fries.net wrote:
 
  Thanks for including me, I'm going to have to nack this though, it is
  an error, but bus_mutex is correct, mutex is incorrect, so the other
  two lock/unlock in that routine are the ones that need to be changed.
 
  w1.h
   * @mutex:  protect most of w1_master
   * @bus_mutex:  pretect concurrent bus access
 
  On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote:
   Smatch complains that we don't unlock master-mutex on this error
   path.  It looks like it is a typo and we unlock -bus_mutext where
   -mutex was intended.
  
   Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
   Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
  
   diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
   index 3351be6..79ecaf8 100644
   --- a/drivers/w1/slaves/w1_therm.c
   +++ b/drivers/w1/slaves/w1_therm.c
   @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device,
 c -= snprintf(buf + PAGE_SIZE - c, c, %d\n, seq);
 return PAGE_SIZE - c;
error:
   - mutex_unlock(sl-master-bus_mutex);
   + mutex_unlock(sl-master-mutex);
 return -EIO;
}
  
 
  --
  David Fries da...@fries.net
 

-- 
David Fries da...@fries.net
--
To unsubscribe from this list: send the line

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

On Tue, Jun 02, 2015 at 09:54:10PM -0400, Matt Campbell wrote:
 Thanks for the response, I will try to answer your questions.
 
 The other function besides temp conversion and EEPROM write that
 requires a delay is the initial CHAIN on command. The datasheet says
 to Wait for chain to charge but it does not offer ANY indication on
 how to do so.  I did discover that my sleep was a sleep(0) long after
 I submitted the patch, but as you observed it was not hurting
 anything.

Then it should be removed as it gives the wrong impression and that
variable shouldn't be used in the slave code anyway.

 The 64 was, in all honesty, a starting point that I picked for lack of
 finding a better option.  I only currently have 10 sensors available
 for me to test with, so I was not able to do any significant tests to
 see what would happen if I raised that limit and the loop did not exit
 properly at the end of the chain.  I was also not able to find an edge
 case that would cause the loop to not find a match.  The w1_seq file
 only shows for the one type sensor that supports it so in the absence
 of a bus error, a match is guaranteed.

That's easy enough to test, comment out your if...break; in the loop.

Is there a good reason that the loop doesn't terminate when the device
you're after sl-reg_num.id is found?  By making your loop terminate
on a different condition, W1_42_FINISHED_BYTE, if the slave you are
after isn't found you'll still return seq of 0.  I'm wondering if seq
should be initialized to -1.

Do this, make up the right id of the correct family code and manually
tell the kernel it is there.  That would be the same as if you removed
a chip before the kernel's search timed it out.  Then do your sequence
scan on your devices including this new one which isn't there.
cd /sys/devices/w1_bus_master1
echo 42-01234567  w1_master_add

The devices return the rom numbers, I wonder about returning the rom
list (which is in the sequence), instead of the sequence number for a
device if that would be more or less useful to users.  You would be
able to do it in one pass.

 The output of the function in the event the pins are hooked up as GPIO
 and not a chain is undefined.  Since the chips are very tiny,
 prototyping this function would require me to design and order some
 boards that I never was able to use.  To my knowledge it is not
 possible to determine that things are wired this way in the software.

Something like Output is undefined if all direct sequence sensors on
this one wire bus aren't part of one chain.  GPIO control of the two
pins is not support by this driver. for the comment above
w1_seq_show, in addition to the other comments I was asking about.  I
see now you have some documentation in
Documentation/w1/slaves/w1_therm already that I didn't realize
earlier.

 Apologies, this is my first foray in submitting any sort of kernel
 code so I am not sure of exactly how I would submit a fixup patch for
 gregkh's tree.  If someone wants to walk me through that off list I
 would be most appreciative!

git remote add gregkh_char-misc 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
git fetch gregkh_char-misc
git rebase remotes/gregkh_char-misc/char-misc-next
and add a new commit from there.

 Matt Campbell
 mattrcampb...@gmail.com
 
 
 On Tue, Jun 2, 2015 at 9:07 PM, David Fries da...@fries.net wrote:
  The source code is as good as it gets, that's why I posted those two
  lines from the w1.h file.  Maybe it could be expanded upon, mutex is
  more for adding or removing slaves and the like modifications to
  w1_master data structure, where bus_mutex is what make sure something
  else doesn't try to talk on the bus while the current device is using
  it.  What really matters is how other code is using the locks.  For
  instance if you look at w1_therm.c bus_mutex is the only one used.
  Then you look at what your routine is doing, and what the existing
  routine is doing, they are both reading/writing from the one wire bus,
  and then return their output to the user, so then use the same lock.
  If it's the wrong lock then it's at least consistent and easier to
  cleanup when someone figures out it's the wrong lock.  If you think
  it's the wrong lock raise a flag and ask.
 
  My patch review comments.
 
  The temperature sensor uses a strong pullup or delay of 750 ms to do
  the temperature conversion.  A quick read through of this sensor has
  temperature conversion and EEPROM write as the only operations
  requiring a significant time delay, everything else is in the us, so I
  would expect this to not require any delay.
 
  Read the comments to w1_next_pullup in w1_io.c as pullup_duration will
  always be 0 in msleep(sl-master-pullup_duration); which isn't likely
  what you expect.  Apparently it is working for you so that further
  reinforces my previous point.
 
  Where did the 64 come from?  I have 14 temperature sensors in my
  single family residence right now, and it would take

Re: [patch 2/2] w1: unlock correct lock on error in w1_seq_show()

2015-06-02 Thread David Fries

Thanks for including me, I'm going to have to nack this though, it is
an error, but bus_mutex is correct, mutex is incorrect, so the other
two lock/unlock in that routine are the ones that need to be changed.

w1.h
 * @mutex:  protect most of w1_master
 * @bus_mutex:  pretect concurrent bus access

On Mon, Jun 01, 2015 at 12:56:32PM +0300, Dan Carpenter wrote:
 Smatch complains that we don't unlock master-mutex on this error
 path.  It looks like it is a typo and we unlock -bus_mutext where
 -mutex was intended.
 
 Fixes: d9411e57dc7f ('w1: Add support for DS28EA00 sequence to w1-therm')
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
 
 diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
 index 3351be6..79ecaf8 100644
 --- a/drivers/w1/slaves/w1_therm.c
 +++ b/drivers/w1/slaves/w1_therm.c
 @@ -412,7 +412,7 @@ static ssize_t w1_seq_show(struct device *device,
   c -= snprintf(buf + PAGE_SIZE - c, c, %d\n, seq);
   return PAGE_SIZE - c;
  error:
 - mutex_unlock(sl-master-bus_mutex);
 + mutex_unlock(sl-master-mutex);
   return -EIO;
  }
  

-- 
David Fries da...@fries.net
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] w1_therm reference count family data

2015-05-08 Thread David Fries

A temperature conversion can take 750 ms and when possible the
w1_therm slave driver drops the bus_mutex to allow other bus
operations, but that includes operations such as a periodic slave
search, which can remove this slave when it is no longer detected.
If that happens the sl->family_data will be freed and set to NULL
causing w1_slave_show to crash when it wakes up.

Signed-off-by: David Fries 
Reported-By: Thorsten Bschorr 
Tested-by: Thorsten Bschorr 
Acked-by: Evgeniy Polyakov 
---
This should be applied to the stable series as well.  In the name of
full disclosure, this just narrows the race window, from crashing in
normal operation on the reporters system to no longer crashing with
multiple readers and another process hammering on inserting/removing
the slave device.

This patch has been tested for some weeks by those who were affected,
Evgeniy Polyakov (maintainer) has approved this fix with the intention
of switching to sysfs device reference counting (as w1_slave_show is
called through sysfs).

 drivers/w1/slaves/w1_therm.c |   62 --
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..55eb86c 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+   (&((struct w1_therm_family_data*)family_data)->refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl->family_data = kzalloc(9, GFP_KERNEL);
+   sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
if (!sl->family_data)
return -ENOMEM;
+   atomic_set(THERM_REFCNT(sl->family_data), 1);
return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
+   while(refcnt) {
+   msleep(1000);
+   refcnt = atomic_read(THERM_REFCNT(sl->family_data));
+   }
kfree(sl->family_data);
sl->family_data = NULL;
 }
@@ -194,13 +210,22 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl->master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
+   u8 *family_data = sl->family_data;
+
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   if(!sl->family_data)
+   {
+   ret = -ENODEV;
+   goto pre_unlock;
+   }
 
+   /* prevent the slave from going away in sleep */
+   atomic_inc(THERM_REFCNT(family_data));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -230,17 +255,19 @@ static ssize_t w1_slave_show(struct device *device,
mutex_unlock(>bus_mutex);
 
sleep_rem = msleep_interruptible(tm);
-   if (sleep_rem != 0)
-   return -EINTR;
+   if (sleep_rem != 0) {
+   ret = -EINTR;
+   goto post_unlock;
+   }
 
-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
-   mutex_unlock(>bus_mutex);
-   return -EINTR;
+   ret = -EINTR;
+   goto pre_unlock;
}
}
 
@@ -269,19 +296,24 @@ static ssize_t w1_slave_show(struct device *device,
c -= snprintf(buf + PAGE_SIZE - c, c, ": crc=%02x %s\n",
   crc, (verdict) ? "YES" : "NO");
if (verdict)
-   memcpy(sl->family_data, rom, si

[PATCH] w1_therm reference count family data

2015-05-08 Thread David Fries

A temperature conversion can take 750 ms and when possible the
w1_therm slave driver drops the bus_mutex to allow other bus
operations, but that includes operations such as a periodic slave
search, which can remove this slave when it is no longer detected.
If that happens the sl-family_data will be freed and set to NULL
causing w1_slave_show to crash when it wakes up.

Signed-off-by: David Fries da...@fries.net
Reported-By: Thorsten Bschorr thors...@bschorr.de
Tested-by: Thorsten Bschorr thors...@bschorr.de
Acked-by: Evgeniy Polyakov z...@ioremap.net
---
This should be applied to the stable series as well.  In the name of
full disclosure, this just narrows the race window, from crashing in
normal operation on the reporters system to no longer crashing with
multiple readers and another process hammering on inserting/removing
the slave device.

This patch has been tested for some weeks by those who were affected,
Evgeniy Polyakov (maintainer) has approved this fix with the intention
of switching to sysfs device reference counting (as w1_slave_show is
called through sysfs).

 drivers/w1/slaves/w1_therm.c |   62 --
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..55eb86c 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+   (((struct w1_therm_family_data*)family_data)-refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl-family_data = kzalloc(9, GFP_KERNEL);
+   sl-family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
if (!sl-family_data)
return -ENOMEM;
+   atomic_set(THERM_REFCNT(sl-family_data), 1);
return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data));
+   while(refcnt) {
+   msleep(1000);
+   refcnt = atomic_read(THERM_REFCNT(sl-family_data));
+   }
kfree(sl-family_data);
sl-family_data = NULL;
 }
@@ -194,13 +210,22 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl-master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
+   u8 *family_data = sl-family_data;
+
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   if(!sl-family_data)
+   {
+   ret = -ENODEV;
+   goto pre_unlock;
+   }
 
+   /* prevent the slave from going away in sleep */
+   atomic_inc(THERM_REFCNT(family_data));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -230,17 +255,19 @@ static ssize_t w1_slave_show(struct device *device,
mutex_unlock(dev-bus_mutex);
 
sleep_rem = msleep_interruptible(tm);
-   if (sleep_rem != 0)
-   return -EINTR;
+   if (sleep_rem != 0) {
+   ret = -EINTR;
+   goto post_unlock;
+   }
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
-   mutex_unlock(dev-bus_mutex);
-   return -EINTR;
+   ret = -EINTR;
+   goto pre_unlock;
}
}
 
@@ -269,19 +296,24 @@ static ssize_t w1_slave_show(struct device *device,
c -= snprintf(buf + PAGE_SIZE - c, c, : crc=%02x %s\n,
   crc, (verdict) ? YES : NO);
if (verdict)
-   memcpy(sl-family_data, rom, sizeof(rom));
+   memcpy

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-04-15 Thread David Fries

It has not been solved.  Evgeniy would like to make use of the sysfs
device management instead of the current reference counting, however I
haven't heard any volunteers to do that work.  I posted a quick fix
patch, it was very easy to crash without this patch, it doesn't
completely solve the race conditions, and I don't think it can be
solved in just a slave driver change.

Are you up for the challenge?

On Wed, Apr 15, 2015 at 09:52:27AM +0200, Jonathan ALIBERT wrote:
> Hi,
> 
> Do you know if the problem has been solved ?
> 
> Cheers,
> 
> *Jonathan ALIBERT*
> *06 32 26 59 12*
> *265, route de Saint Haon*
> *42 370 RENAISON*
> 
> 
> 2015-03-19 1:09 GMT+01:00 David Fries :
> 
> > On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote:
> > > Hi
> > >
> > > 18.03.2015, 07:20, "David Fries" :
> > > >  static void w1_therm_remove_slave(struct w1_slave *sl)
> > > >  {
> > > > + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
> > > > + while(refcnt) {
> > > > + msleep(1000);
> > > > + refcnt = atomic_read(THERM_REFCNT(sl->family_data));
> > > > + }
> > > >  kfree(sl->family_data);
> > > >  sl->family_data = NULL;
> > > >  }
> > >
> > > Can we replace this whole atomic manipulations with kref_t and free
> > family data in the place
> > > which actually drops reference counter to zero?
> > >
> > > I.e. we return from remove_slave() function potentially leaving family
> > data floating around, it will be freed
> > > when the last user drops the reference. There is still a race between
> > increasing reference when starting
> > > reading and removing slave device, i.e. one starts reading, while
> > attached slave device is being removed,
> > > but that's a different problem.
> >
> > With the two while loops I posted, I see with two clients reading
> > w1_slave, the other command to remove a slave gets permanently stuck
> > in w1_therm_remove_slave, which keeps the slave around while the
> > clients continue to read.  I wouldn't predict things going better by
> > keeping family_data around longer, the slave data would still go away
> > with readers around.
> >
> > --
> > David Fries PGP pub CB1EE8F0
> > http://fries.net/~david/
> >

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-04-15 Thread David Fries

It has not been solved.  Evgeniy would like to make use of the sysfs
device management instead of the current reference counting, however I
haven't heard any volunteers to do that work.  I posted a quick fix
patch, it was very easy to crash without this patch, it doesn't
completely solve the race conditions, and I don't think it can be
solved in just a slave driver change.

Are you up for the challenge?

On Wed, Apr 15, 2015 at 09:52:27AM +0200, Jonathan ALIBERT wrote:
 Hi,
 
 Do you know if the problem has been solved ?
 
 Cheers,
 
 *Jonathan ALIBERT*
 *06 32 26 59 12*
 *265, route de Saint Haon*
 *42 370 RENAISON*
 
 
 2015-03-19 1:09 GMT+01:00 David Fries da...@fries.net:
 
  On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote:
   Hi
  
   18.03.2015, 07:20, David Fries da...@fries.net:
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+ int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data));
+ while(refcnt) {
+ msleep(1000);
+ refcnt = atomic_read(THERM_REFCNT(sl-family_data));
+ }
 kfree(sl-family_data);
 sl-family_data = NULL;
 }
  
   Can we replace this whole atomic manipulations with kref_t and free
  family data in the place
   which actually drops reference counter to zero?
  
   I.e. we return from remove_slave() function potentially leaving family
  data floating around, it will be freed
   when the last user drops the reference. There is still a race between
  increasing reference when starting
   reading and removing slave device, i.e. one starts reading, while
  attached slave device is being removed,
   but that's a different problem.
 
  With the two while loops I posted, I see with two clients reading
  w1_slave, the other command to remove a slave gets permanently stuck
  in w1_therm_remove_slave, which keeps the slave around while the
  clients continue to read.  I wouldn't predict things going better by
  keeping family_data around longer, the slave data would still go away
  with readers around.
 
  --
  David Fries da...@fries.netPGP pub CB1EE8F0
  http://fries.net/~david/
 

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-18 Thread David Fries

On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote:
> Hi
> 
> 18.03.2015, 07:20, "David Fries" :
> >  static void w1_therm_remove_slave(struct w1_slave *sl)
> >  {
> > + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
> > + while(refcnt) {
> > + msleep(1000);
> > + refcnt = atomic_read(THERM_REFCNT(sl->family_data));
> > + }
> >  kfree(sl->family_data);
> >  sl->family_data = NULL;
> >  }
> 
> Can we replace this whole atomic manipulations with kref_t and free family 
> data in the place
> which actually drops reference counter to zero?
> 
> I.e. we return from remove_slave() function potentially leaving family data 
> floating around, it will be freed
> when the last user drops the reference. There is still a race between 
> increasing reference when starting
> reading and removing slave device, i.e. one starts reading, while attached 
> slave device is being removed,
> but that's a different problem.

With the two while loops I posted, I see with two clients reading
w1_slave, the other command to remove a slave gets permanently stuck
in w1_therm_remove_slave, which keeps the slave around while the
clients continue to read.  I wouldn't predict things going better by
keeping family_data around longer, the slave data would still go away
with readers around.

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-18 Thread David Fries

On Wed, Mar 18, 2015 at 06:18:53PM +0300, Evgeniy Polyakov wrote:
 Hi
 
 18.03.2015, 07:20, David Fries da...@fries.net:
   static void w1_therm_remove_slave(struct w1_slave *sl)
   {
  + int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data));
  + while(refcnt) {
  + msleep(1000);
  + refcnt = atomic_read(THERM_REFCNT(sl-family_data));
  + }
   kfree(sl-family_data);
   sl-family_data = NULL;
   }
 
 Can we replace this whole atomic manipulations with kref_t and free family 
 data in the place
 which actually drops reference counter to zero?
 
 I.e. we return from remove_slave() function potentially leaving family data 
 floating around, it will be freed
 when the last user drops the reference. There is still a race between 
 increasing reference when starting
 reading and removing slave device, i.e. one starts reading, while attached 
 slave device is being removed,
 but that's a different problem.

With the two while loops I posted, I see with two clients reading
w1_slave, the other command to remove a slave gets permanently stuck
in w1_therm_remove_slave, which keeps the slave around while the
clients continue to read.  I wouldn't predict things going better by
keeping family_data around longer, the slave data would still go away
with readers around.

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-17 Thread David Fries

On Sat, Mar 14, 2015 at 11:55:16PM +0300, Evgeniy Polyakov wrote:
> Hi David
> 
> 12.03.2015, 03:54, "David Fries" :
> > Would that be removing all four refcnt, w1_slave, w1_master,
> > w1_family, w1_cb_block, or just some of them?  It sounds good to me,
> > if that had bugs there would be much more than just the w1 system
> > relying on it.  I don't know enough about that system or have the time
> > to code up that change.
> >
> > I can take another look at and post the reference counting w1_therm
> > fix instead of the mutex version as a near term work around until that
> > is available if you want.
> 
> Please cook up a quick fix for this problem - this bug really hurts people.
> And then we will discuss how 'ideal' life cycle should look

Done, I don't like it, I'm not sure anyone else will either, but I'm
no longer crashing in testing, so that's an improvement.  My
"production" system doesn't use w1_therm, so I only see these bugs in
development testing it.  I've come to the conclusion that in the face
of a slave vanishing, w1_therm can't avoid all the race conditions, so
the real fix must be elsewhere.

>From 51d4024ca667c8b712de462489d125a78e85aa57 Mon Sep 17 00:00:00 2001
From: David Fries 
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH] w1_therm, reduce race conditions in w1_slave_show

After applying this patch commands such as the following in one
process,

slave=28-02c95fb1
while true; do echo $slave > /sys/devices/w1_bus_master1/w1_master_add; sleep 
.1; echo $slave > /sys/devices/w1_bus_master1/w1_master_remove; sleep .1; done

and then two at the same time in two other processes,
slave=28-02c95fb1
while true; do time cat /sys/devices/w1_bus_master1/$slave/w1_slave ; sleep .1; 
done

then randomly stop all three and repeat.

With this patch I no longer see crashes, but at best this patch
effectively hiding the result of a race condition.  sl->family_data is
being freed and set to NULL in the slave removal while the
w1_slave_show is then dereferencing it, this holds on to the pointer
meaning it's probably clobbering memory now instead of crashing.  I
wonder if that would make RCU be a fit for this?  The original bug
report was pointing the problem as unlocking bus_mutex while waiting
for the temperature conversion, but I was getting sl->family_data set
to NULL more reliable without external power which means bux_mutex was
held for the duration of w1_slave_show, which is not to say that the
original bug report wasn't correct, it is to say that even with the
spinlock, holding bus_mutex on the slave, isn't sufficient to keep the
slave from being removed.

Reported-By: Thorsten Bschorr 
---
 drivers/w1/slaves/w1_therm.c |   70 +-
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..403285d 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+   (&((struct w1_therm_family_data*)family_data)->refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl->family_data = kzalloc(9, GFP_KERNEL);
+   sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
if (!sl->family_data)
return -ENOMEM;
+   atomic_set(THERM_REFCNT(sl->family_data), 1);
return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
+   while(refcnt) {
+   msleep(1000);
+   refcnt = atomic_read(THERM_REFCNT(sl->family_data));
+   }
kfree(sl->family_data);
sl->family_data = NULL;
 }
@@ -194,13 +210,30 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl->master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
+   u8 *family_data = sl->family_data;
+
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   if(!sl->family_data)
+   {
+   ret = -ENODEV;
+   /* Note for anyoe who actually saw this message, it is a known
+* problem with eith

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-17 Thread David Fries

On Sat, Mar 14, 2015 at 11:55:16PM +0300, Evgeniy Polyakov wrote:
 Hi David
 
 12.03.2015, 03:54, David Fries da...@fries.net:
  Would that be removing all four refcnt, w1_slave, w1_master,
  w1_family, w1_cb_block, or just some of them?  It sounds good to me,
  if that had bugs there would be much more than just the w1 system
  relying on it.  I don't know enough about that system or have the time
  to code up that change.
 
  I can take another look at and post the reference counting w1_therm
  fix instead of the mutex version as a near term work around until that
  is available if you want.
 
 Please cook up a quick fix for this problem - this bug really hurts people.
 And then we will discuss how 'ideal' life cycle should look

Done, I don't like it, I'm not sure anyone else will either, but I'm
no longer crashing in testing, so that's an improvement.  My
production system doesn't use w1_therm, so I only see these bugs in
development testing it.  I've come to the conclusion that in the face
of a slave vanishing, w1_therm can't avoid all the race conditions, so
the real fix must be elsewhere.

From 51d4024ca667c8b712de462489d125a78e85aa57 Mon Sep 17 00:00:00 2001
From: David Fries da...@fries.net
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH] w1_therm, reduce race conditions in w1_slave_show

After applying this patch commands such as the following in one
process,

slave=28-02c95fb1
while true; do echo $slave  /sys/devices/w1_bus_master1/w1_master_add; sleep 
.1; echo $slave  /sys/devices/w1_bus_master1/w1_master_remove; sleep .1; done

and then two at the same time in two other processes,
slave=28-02c95fb1
while true; do time cat /sys/devices/w1_bus_master1/$slave/w1_slave ; sleep .1; 
done

then randomly stop all three and repeat.

With this patch I no longer see crashes, but at best this patch
effectively hiding the result of a race condition.  sl-family_data is
being freed and set to NULL in the slave removal while the
w1_slave_show is then dereferencing it, this holds on to the pointer
meaning it's probably clobbering memory now instead of crashing.  I
wonder if that would make RCU be a fit for this?  The original bug
report was pointing the problem as unlocking bus_mutex while waiting
for the temperature conversion, but I was getting sl-family_data set
to NULL more reliable without external power which means bux_mutex was
held for the duration of w1_slave_show, which is not to say that the
original bug report wasn't correct, it is to say that even with the
spinlock, holding bus_mutex on the slave, isn't sufficient to keep the
slave from being removed.

Reported-By: Thorsten Bschorr thors...@bschorr.de
---
 drivers/w1/slaves/w1_therm.c |   70 +-
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..403285d 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+   (((struct w1_therm_family_data*)family_data)-refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl-family_data = kzalloc(9, GFP_KERNEL);
+   sl-family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
if (!sl-family_data)
return -ENOMEM;
+   atomic_set(THERM_REFCNT(sl-family_data), 1);
return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   int refcnt = atomic_sub_return(1, THERM_REFCNT(sl-family_data));
+   while(refcnt) {
+   msleep(1000);
+   refcnt = atomic_read(THERM_REFCNT(sl-family_data));
+   }
kfree(sl-family_data);
sl-family_data = NULL;
 }
@@ -194,13 +210,30 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl-master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
+   u8 *family_data = sl-family_data;
+
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   if(!sl-family_data)
+   {
+   ret = -ENODEV;
+   /* Note for anyoe who actually saw this message, it is a known
+* problem with either slave drivers or this driver in
+* particular and the request is only a canary indication as
+* to how many people and how often

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-11 Thread David Fries

On Tue, Mar 10, 2015 at 04:52:00PM +0300, Evgeniy Polyakov wrote:
> Hi
> 
> 10.03.2015, 02:09, "David Fries" :
> 
> > diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
> > index 1f11a20..39a9e6a 100644
> > --- a/drivers/w1/slaves/w1_therm.c
> > +++ b/drivers/w1/slaves/w1_therm.c
> > @@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" 
> > __stringify(W1_THERM_DS28EA00));
> >  static int w1_strong_pullup = 1;
> >  module_param_named(strong_pullup, w1_strong_pullup, int, 0);
> >
> > +struct w1_therm_family_data {
> > + uint8_t rom[9];
> > + struct mutex lock;
> > +};
> 
> This approach will not scale to other w1 families, I would rather prefer 
> solutions on w1 level,
> not in particular drivers. What if we drop slave reference counter at all in 
> favor of automatic sysfs device management?

I looked and didn't see any of the other slaves dropping the lock and
being in this situation, but that doesn't mean they won't in the
future.  Personally I'm just using netlink and don't plan on using any
of the slave drivers.

Would that be removing all four refcnt, w1_slave, w1_master,
w1_family, w1_cb_block, or just some of them?  It sounds good to me,
if that had bugs there would be much more than just the w1 system
relying on it.  I don't know enough about that system or have the time
to code up that change.

I can take another look at and post the reference counting w1_therm
fix instead of the mutex version as a near term work around until that
is available if you want.

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-11 Thread David Fries

On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote:
> > It looks like your patch runs into dead locks problems:

I was testing reading and removing the slave, I didn't test two
readers, I'll keep that in mind.

For some reason I was thinking it was a try lock after the sleep, it
isn't, a signal can cause it to return a value, but yes that's not
going to work.  I have another version that uses an atomic counter in
place of this mutex, it requires a loop in the remove slave.

I have a one wire network with 14 temperature sensors on it.  What I
do is I have a program that talks to the kernel over netlink, which is
now non-blocking for one wire after an earlier set of changes I made.
It sends out all the conversion request requests, waits, sends the
read request packets, then gets back the results, and relays the
results to the program that requested them.  It doesn't use the
w1_therm and avoids all these problems.  If I read them sequentially
with w1_therm, that would take more than 14*.750 or 10.5 seconds,
there's no way with w1_therm to read them at once without having 14
threads or programs doing so, the problem is it takes a lot more
programming to do netlink.

> > I have a cron job reading my sensors. If I read the sensors on another
> > thread (e.g. via cat), the 2nd thread can produce a dead lock:
> >
> > * thread 1 has bus & sl lock
> > * thread 1 drops bus lock, keeps sl locks and sleeps
> > * thread 2 get bus lock, waits for sl lock
> > * thread 1 returns from sleep, waits for bus lock, but this is help by 
> > thread 2
> 
> 
> Aquiring  sl lock before the bus lock prevents this dead lock (no
> change of locking-order).
> With search enabled, removing a device by w1_search_process_cb then
> also worked as intended:
> 
> Mar 10 01:29:04 pi kernel: [  924.870893] w1_therm_remove_slave about
> to lock faily_data->lock
> Mar 10 01:29:04 pi kernel: [  924.870935] w1_therm_remove_slave
> unlocked faily_data->lock
> 
> Mar 10 01:29:04 pi kernel: [  924.871115] w1_therm_remove_slave about
> to lock faily_data->lock
> Mar 10 01:29:05 pi kernel: [  925.151242] start sleep d117a600 refcnt 1 **
> Mar 10 01:29:05 pi kernel: [  925.151277] end sleep d117a600 refcnt 0 **
> Mar 10 01:29:11 pi kernel: [  931.295344] w1_slave_driver
> 10-000802cc045a: Read failed CRC check
> Mar 10 01:29:11 pi kernel: [  931.295437] w1_therm_remove_slave
> unlocked faily_data->lock
> ** I get the ref-cnt before and after the sleep and only log if they differ.
> 
> 
> But I am unable to judge if mobing the sl lock at the beginning of
> w1_slave_show can cause dead locks in other scenarios.

I'm not sure, I would probably switch back to the referencing counting
version I wrote earlier, or make the bus mutex lock a timed lock or
try lock first.

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-11 Thread David Fries

On Tue, Mar 10, 2015 at 01:34:14AM +0100, Thorsten Bschorr wrote:
  It looks like your patch runs into dead locks problems:

I was testing reading and removing the slave, I didn't test two
readers, I'll keep that in mind.

For some reason I was thinking it was a try lock after the sleep, it
isn't, a signal can cause it to return a value, but yes that's not
going to work.  I have another version that uses an atomic counter in
place of this mutex, it requires a loop in the remove slave.

I have a one wire network with 14 temperature sensors on it.  What I
do is I have a program that talks to the kernel over netlink, which is
now non-blocking for one wire after an earlier set of changes I made.
It sends out all the conversion request requests, waits, sends the
read request packets, then gets back the results, and relays the
results to the program that requested them.  It doesn't use the
w1_therm and avoids all these problems.  If I read them sequentially
with w1_therm, that would take more than 14*.750 or 10.5 seconds,
there's no way with w1_therm to read them at once without having 14
threads or programs doing so, the problem is it takes a lot more
programming to do netlink.

  I have a cron job reading my sensors. If I read the sensors on another
  thread (e.g. via cat), the 2nd thread can produce a dead lock:
 
  * thread 1 has bus  sl lock
  * thread 1 drops bus lock, keeps sl locks and sleeps
  * thread 2 get bus lock, waits for sl lock
  * thread 1 returns from sleep, waits for bus lock, but this is help by 
  thread 2
 
 
 Aquiring  sl lock before the bus lock prevents this dead lock (no
 change of locking-order).
 With search enabled, removing a device by w1_search_process_cb then
 also worked as intended:
 
 Mar 10 01:29:04 pi kernel: [  924.870893] w1_therm_remove_slave about
 to lock faily_data-lock
 Mar 10 01:29:04 pi kernel: [  924.870935] w1_therm_remove_slave
 unlocked faily_data-lock
 
 Mar 10 01:29:04 pi kernel: [  924.871115] w1_therm_remove_slave about
 to lock faily_data-lock
 Mar 10 01:29:05 pi kernel: [  925.151242] start sleep d117a600 refcnt 1 **
 Mar 10 01:29:05 pi kernel: [  925.151277] end sleep d117a600 refcnt 0 **
 Mar 10 01:29:11 pi kernel: [  931.295344] w1_slave_driver
 10-000802cc045a: Read failed CRC check
 Mar 10 01:29:11 pi kernel: [  931.295437] w1_therm_remove_slave
 unlocked faily_data-lock
 ** I get the ref-cnt before and after the sleep and only log if they differ.
 
 
 But I am unable to judge if mobing the sl lock at the beginning of
 w1_slave_show can cause dead locks in other scenarios.

I'm not sure, I would probably switch back to the referencing counting
version I wrote earlier, or make the bus mutex lock a timed lock or
try lock first.

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-11 Thread David Fries

On Tue, Mar 10, 2015 at 04:52:00PM +0300, Evgeniy Polyakov wrote:
 Hi
 
 10.03.2015, 02:09, David Fries da...@fries.net:
 
  diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
  index 1f11a20..39a9e6a 100644
  --- a/drivers/w1/slaves/w1_therm.c
  +++ b/drivers/w1/slaves/w1_therm.c
  @@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- 
  __stringify(W1_THERM_DS28EA00));
   static int w1_strong_pullup = 1;
   module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
  +struct w1_therm_family_data {
  + uint8_t rom[9];
  + struct mutex lock;
  +};
 
 This approach will not scale to other w1 families, I would rather prefer 
 solutions on w1 level,
 not in particular drivers. What if we drop slave reference counter at all in 
 favor of automatic sysfs device management?

I looked and didn't see any of the other slaves dropping the lock and
being in this situation, but that doesn't mean they won't in the
future.  Personally I'm just using netlink and don't plan on using any
of the slave drivers.

Would that be removing all four refcnt, w1_slave, w1_master,
w1_family, w1_cb_block, or just some of them?  It sounds good to me,
if that had bugs there would be much more than just the w1 system
relying on it.  I don't know enough about that system or have the time
to code up that change.

I can take another look at and post the reference counting w1_therm
fix instead of the mutex version as a near term work around until that
is available if you want.

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-09 Thread David Fries

On Mon, Mar 09, 2015 at 11:47:24PM +0100, Thorsten Bschorr wrote:
> > Here's a different strategy, add a w1_therm family_data specific mutex
> > so w1_therm_remove_slave won't return while another thread is in
> > w1_slave_show.  Thoughts?
> >
> > I included three patches, the first is the proposed fix, the second
> > makes it more reproducible (and since my testing doesn't have external
> > power I had to ignore that check), the third is just some debugging
> > messages.  For testing I'm starting a read from w1_slave then manually
> > remove the sensor, which seems to do the trick.
> 
> I'll test your patch, but keeping the original sleep-time tm.

Oops, sorry, it got lost in the shuffle, here's the first patch again
(the others were for debugging and increase that time and so wouldn't
go upstream anyway).

>From 777f5fd75f5f99a3352863e83d226c7b65ebdaa4 Mon Sep 17 00:00:00 2001
From: David Fries 
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH] w1_therm, don't let the slave go away while in w1_slave_show

A temperature conversion can take 750 ms to complete, if the sensor is
externally powered it releases the bus_mutex while it waits, but if
the slave device is removed sl becomes a dangling pointer.
The race condition window can be 10 * 750 ms = 7.5 seconds if the crc
check fails.

Reported-By: Thorsten Bschorr 
Signed-off-by: David Fries 
---
 drivers/w1/slaves/w1_therm.c | 51 +---
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..39a9e6a 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   struct mutex lock;
+};
+
+/* return the address of the lock in the family data */
+#define THERM_LOCK(sl) \
+   (&((struct w1_therm_family_data*)sl->family_data)->lock)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl->family_data = kzalloc(9, GFP_KERNEL);
+   sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
+   mutex_init(THERM_LOCK(sl));
if (!sl->family_data)
return -ENOMEM;
return 0;
@@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl)
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   /* Getting the lock means w1_slave_show isn't sleeping and the
+* family_data can be freed.
+*/
+   mutex_lock(THERM_LOCK(sl));
+   mutex_unlock(THERM_LOCK(sl));
kfree(sl->family_data);
sl->family_data = NULL;
 }
@@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl->master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
 
-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
+   /* prevent the slave from going away in sleep */
+   mutex_lock(THERM_LOCK(sl));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -230,17 +248,19 @@ static ssize_t w1_slave_show(struct device *device,
mutex_unlock(>bus_mutex);
 
sleep_rem = msleep_interruptible(tm);
-   if (sleep_rem != 0)
-   return -EINTR;
+   if (sleep_rem != 0) {
+   ret = -EINTR;
+   goto post_unlock;
+   }
 
-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
-   mutex_unlock(>bus_mutex);
-   return -EINTR;
+   ret = -EINTR;
+   goto pre_unlock;
}
}
 
@@ -279,9 +299,14 @@ static ssize_t w1_slave_sh

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-09 Thread David Fries

On Mon, Mar 09, 2015 at 11:47:24PM +0100, Thorsten Bschorr wrote:
  Here's a different strategy, add a w1_therm family_data specific mutex
  so w1_therm_remove_slave won't return while another thread is in
  w1_slave_show.  Thoughts?
 
  I included three patches, the first is the proposed fix, the second
  makes it more reproducible (and since my testing doesn't have external
  power I had to ignore that check), the third is just some debugging
  messages.  For testing I'm starting a read from w1_slave then manually
  remove the sensor, which seems to do the trick.
 
 I'll test your patch, but keeping the original sleep-time tm.

Oops, sorry, it got lost in the shuffle, here's the first patch again
(the others were for debugging and increase that time and so wouldn't
go upstream anyway).

From 777f5fd75f5f99a3352863e83d226c7b65ebdaa4 Mon Sep 17 00:00:00 2001
From: David Fries da...@fries.net
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH] w1_therm, don't let the slave go away while in w1_slave_show

A temperature conversion can take 750 ms to complete, if the sensor is
externally powered it releases the bus_mutex while it waits, but if
the slave device is removed sl becomes a dangling pointer.
The race condition window can be 10 * 750 ms = 7.5 seconds if the crc
check fails.

Reported-By: Thorsten Bschorr thors...@bschorr.de
Signed-off-by: David Fries da...@fries.net
---
 drivers/w1/slaves/w1_therm.c | 51 +---
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..39a9e6a 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   struct mutex lock;
+};
+
+/* return the address of the lock in the family data */
+#define THERM_LOCK(sl) \
+   (((struct w1_therm_family_data*)sl-family_data)-lock)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl-family_data = kzalloc(9, GFP_KERNEL);
+   sl-family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
+   mutex_init(THERM_LOCK(sl));
if (!sl-family_data)
return -ENOMEM;
return 0;
@@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl)
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   /* Getting the lock means w1_slave_show isn't sleeping and the
+* family_data can be freed.
+*/
+   mutex_lock(THERM_LOCK(sl));
+   mutex_unlock(THERM_LOCK(sl));
kfree(sl-family_data);
sl-family_data = NULL;
 }
@@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl-master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
+   /* prevent the slave from going away in sleep */
+   mutex_lock(THERM_LOCK(sl));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -230,17 +248,19 @@ static ssize_t w1_slave_show(struct device *device,
mutex_unlock(dev-bus_mutex);
 
sleep_rem = msleep_interruptible(tm);
-   if (sleep_rem != 0)
-   return -EINTR;
+   if (sleep_rem != 0) {
+   ret = -EINTR;
+   goto post_unlock;
+   }
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
} else if (!w1_strong_pullup) {
sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
-   mutex_unlock(dev-bus_mutex);
-   return -EINTR;
+   ret = -EINTR;
+   goto pre_unlock;
}
}
 
@@ -279,9 +299,14 @@ static ssize_t w1_slave_show(struct device *device,
 
c -= snprintf(buf + PAGE_SIZE - c, c, t=%d\n

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-08 Thread David Fries

On Wed, Mar 04, 2015 at 06:36:41PM +0300, ??? ??? wrote:
> Hi David
> 
> 02.03.2015, 03:17, "David Fries" :
> 
> > You are correct, it would be a race condition if it doesn't increment
> > the refcnt before unlocking the mutex, and it should get the mutex
> > before unref.  Here's an updated version, I haven't even tried to
> > compile it.
> >
> > What do you think Evgeniy?
> 
> Sounds correct, it should do the trick, please cook up a patch, Thorsten, can 
> you test it?

I quickly found out that strategy wasn't going to work.
w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sysfs_remove_groups which deadlocks,

[] ? wait_woken+0x54/0x54
[] ? kernfs_remove_by_name_ns+0x67/0x82
[] ? remove_files+0x30/0x58
[] ? sysfs_remove_group+0x60/0x7b
[] ? sysfs_remove_groups+0x1d/0x2f
[] ? w1_unref_slave+0xd9/0x13b [wire]
[] ? w1_slave_show+0x11a/0x335 [w1_therm]

Here's a different strategy, add a w1_therm family_data specific mutex
so w1_therm_remove_slave won't return while another thread is in
w1_slave_show.  Thoughts?

I included three patches, the first is the proposed fix, the second
makes it more reproducible (and since my testing doesn't have external
power I had to ignore that check), the third is just some debugging
messages.  For testing I'm starting a read from w1_slave then manually
remove the sensor, which seems to do the trick.

echo 28-[ds18b20_id] > /sys/devices/w1_bus_master1/w1_master_remove

Give it a try and let me know how it goes.

My test system host is running 3.16, I'm using kvm to test 3.19 giving
it access to the USB one wire dongle, and there's some refcnt problem
I've not been able to track down.  Once and a while when I have the
kvm grab the USB device the host will start printing a refcount
problem, which takes a reboot to clear, which is annoying because the
reason to test in kvm is to not break the host.

w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become
free: refcnt=1.

>From bcde1a8d3d00cc3b13dff7e2476f8c03efc59001 Mon Sep 17 00:00:00 2001
From: David Fries 
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH 1/3] w1_therm, don't let the slave go away while in
 w1_slave_show

A temperature conversion can take 750 ms to complete, if the sensor is
externally powered it releases the bus_mutex while it waits, but if
the slave device is removed sl becomes a dangling pointer.
The race condition window can be 10 * 750 ms = 7.5 seconds if the crc
check fails.

Reported-By: Thorsten Bschorr 
Signed-off-by: David Fries 
---
 drivers/w1/slaves/w1_therm.c |   52 ++
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..6b3ef93 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,9 +59,20 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);

+struct w1_therm_family_data {
+   uint8_t rom[9];
+   struct mutex lock;
+};
+
+/* return the address of the lock in the family data */
+#define THERM_LOCK(sl) \
+   (&((struct w1_therm_family_data*)sl->family_data)->lock)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl->family_data = kzalloc(9, GFP_KERNEL);
+   sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
+   mutex_init(THERM_LOCK(sl));
if (!sl->family_data)
return -ENOMEM;
return 0;
@@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl)

 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   /* Getting the lock means w1_slave_show isn't sleeping and the
+* family_data can be freed.
+*/
+   mutex_lock(THERM_LOCK(sl));
+   mutex_unlock(THERM_LOCK(sl));
kfree(sl->family_data);
sl->family_data = NULL;
 }
@@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl->master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;

-   i = mutex_lock_interruptible(>bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(>bus_mutex);
+   if (ret != 0)
+   goto post_unlock;

+   /* prevent the slave from going away in sleep */
+   mutex_lock(THERM_LOCK(sl));
memset(rom, 0, sizeof(rom));

while (max_trying--) {
@@ -229,18 +247,19 @@ static ssize_t w1_slave_show(struct device *device,
if (external_power) {
mutex_unlock(>bus_mutex);

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-08 Thread David Fries

On Wed, Mar 04, 2015 at 06:36:41PM +0300, ??? ??? wrote:
 Hi David
 
 02.03.2015, 03:17, David Fries da...@fries.net:
 
  You are correct, it would be a race condition if it doesn't increment
  the refcnt before unlocking the mutex, and it should get the mutex
  before unref.  Here's an updated version, I haven't even tried to
  compile it.
 
  What do you think Evgeniy?
 
 Sounds correct, it should do the trick, please cook up a patch, Thorsten, can 
 you test it?


I quickly found out that strategy wasn't going to work.
w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sysfs_remove_groups which deadlocks,

[81047426] ? wait_woken+0x54/0x54
[810e7d6e] ? kernfs_remove_by_name_ns+0x67/0x82
[810e97c8] ? remove_files+0x30/0x58
[810e9b8a] ? sysfs_remove_group+0x60/0x7b
[810e9bc2] ? sysfs_remove_groups+0x1d/0x2f
[a01951ae] ? w1_unref_slave+0xd9/0x13b [wire]
[a01a516e] ? w1_slave_show+0x11a/0x335 [w1_therm]

Here's a different strategy, add a w1_therm family_data specific mutex
so w1_therm_remove_slave won't return while another thread is in
w1_slave_show.  Thoughts?

I included three patches, the first is the proposed fix, the second
makes it more reproducible (and since my testing doesn't have external
power I had to ignore that check), the third is just some debugging
messages.  For testing I'm starting a read from w1_slave then manually
remove the sensor, which seems to do the trick.

echo 28-[ds18b20_id]  /sys/devices/w1_bus_master1/w1_master_remove

Give it a try and let me know how it goes.

My test system host is running 3.16, I'm using kvm to test 3.19 giving
it access to the USB one wire dongle, and there's some refcnt problem
I've not been able to track down.  Once and a while when I have the
kvm grab the USB device the host will start printing a refcount
problem, which takes a reboot to clear, which is annoying because the
reason to test in kvm is to not break the host.

w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become
free: refcnt=1.

From bcde1a8d3d00cc3b13dff7e2476f8c03efc59001 Mon Sep 17 00:00:00 2001
From: David Fries da...@fries.net
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH 1/3] w1_therm, don't let the slave go away while in
 w1_slave_show

A temperature conversion can take 750 ms to complete, if the sensor is
externally powered it releases the bus_mutex while it waits, but if
the slave device is removed sl becomes a dangling pointer.
The race condition window can be 10 * 750 ms = 7.5 seconds if the crc
check fails.

Reported-By: Thorsten Bschorr thors...@bschorr.de
Signed-off-by: David Fries da...@fries.net
---
 drivers/w1/slaves/w1_therm.c |   52 ++
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..6b3ef93 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,9 +59,20 @@ MODULE_ALIAS(w1-family- __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+   uint8_t rom[9];
+   struct mutex lock;
+};
+
+/* return the address of the lock in the family data */
+#define THERM_LOCK(sl) \
+   (((struct w1_therm_family_data*)sl-family_data)-lock)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-   sl-family_data = kzalloc(9, GFP_KERNEL);
+   sl-family_data = kzalloc(sizeof(struct w1_therm_family_data),
+   GFP_KERNEL);
+   mutex_init(THERM_LOCK(sl));
if (!sl-family_data)
return -ENOMEM;
return 0;
@@ -69,6 +80,11 @@ static int w1_therm_add_slave(struct w1_slave *sl)
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+   /* Getting the lock means w1_slave_show isn't sleeping and the
+* family_data can be freed.
+*/
+   mutex_lock(THERM_LOCK(sl));
+   mutex_unlock(THERM_LOCK(sl));
kfree(sl-family_data);
sl-family_data = NULL;
 }
@@ -194,13 +210,15 @@ static ssize_t w1_slave_show(struct device *device,
struct w1_slave *sl = dev_to_w1_slave(device);
struct w1_master *dev = sl-master;
u8 rom[9], crc, verdict, external_power;
-   int i, max_trying = 10;
+   int i, ret, max_trying = 10;
ssize_t c = PAGE_SIZE;
 
-   i = mutex_lock_interruptible(dev-bus_mutex);
-   if (i != 0)
-   return i;
+   ret = mutex_lock_interruptible(dev-bus_mutex);
+   if (ret != 0)
+   goto post_unlock;
 
+   /* prevent the slave from going away in sleep */
+   mutex_lock(THERM_LOCK(sl));
memset(rom, 0, sizeof(rom));
 
while (max_trying--) {
@@ -229,18 +247,19 @@ static ssize_t w1_slave_show(struct device *device,
if (external_power) {
mutex_unlock(dev-bus_mutex

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-01 Thread David Fries

On Sun, Mar 01, 2015 at 02:04:53PM +0100, Thorsten Bschorr wrote:
> Hi David,
> 
> thanks for your feedback on my first patch, I wasn't aware of checkpatch.pl.
> 
> Initially, I had just if-ed the usage of family-data, which did not
> look that nice. I was referring to this proof-of-concept workaround in
> my initial bug report.
> 
> The patch I've submitted is different from my proof-of-concept
> workaround. Not unlocking the bus before returning clearly is an
> error, I did not extensively test this patch.
> 
> 
> > or just increment it while sleeping, which is when it's needed, which
> > also looks simpler.
> >
> > if (external_power) {
> > +   int refcnt;
> > mutex_unlock(>bus_mutex);
> >
> > +   /* prevent the slave from going away */
> > +   atomic_inc(>refcnt);
> > sleep_rem = msleep_interruptible(tm);
> > +   refcnt = w1_unref_slave(sl);
> > -   if (sleep_rem != 0)
> > +   if (sleep_rem != 0 || !refcnt)
> > return -EINTR;
> >
> > i = 
> > mutex_lock_interruptible(>bus_mutex);
> > if (i != 0)
> > return i;
> > } else if (!w1_strong_pullup) {
> 
> 
> I like this better than my workaround-patch.
> 
> One thought occurred to me when looking at this proposal: wouldn't it
> be even better to increase sl->refcnt before unlocking the mutex?
> I was asking myself if it is possible that the current thread gets
> suspended between mutex_unlock(>bus_mutex); and
> atomic_inc(>refcnt); thus leaving another thread the change to
> unref the device?
> (I'm not that familiar with linux scheduling, so my assumption might be void.)

You are correct, it would be a race condition if it doesn't increment
the refcnt before unlocking the mutex, and it should get the mutex
before unref.  Here's an updated version, I haven't even tried to
compile it.

What do you think Evgeniy?

if (external_power) {
int refcnt;
/* prevent the slave from going away in sleep */
atomic_inc(>refcnt);
mutex_unlock(>bus_mutex);

sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
w1_unref_slave(sl);
return -EINTR;
}

i = mutex_lock_interruptible(>bus_mutex);
refcnt = w1_unref_slave(sl);
if (i != 0) {
/* failed to lock */
return i;
}
if (!refcnt)
/* got lock, but slave went away */
        mutex_unlock(>bus_mutex);
return -EINTR;
}
} else if (!w1_strong_pullup) {


-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-03-01 Thread David Fries

On Sun, Mar 01, 2015 at 02:04:53PM +0100, Thorsten Bschorr wrote:
 Hi David,
 
 thanks for your feedback on my first patch, I wasn't aware of checkpatch.pl.
 
 Initially, I had just if-ed the usage of family-data, which did not
 look that nice. I was referring to this proof-of-concept workaround in
 my initial bug report.
 
 The patch I've submitted is different from my proof-of-concept
 workaround. Not unlocking the bus before returning clearly is an
 error, I did not extensively test this patch.
 
 
  or just increment it while sleeping, which is when it's needed, which
  also looks simpler.
 
  if (external_power) {
  +   int refcnt;
  mutex_unlock(dev-bus_mutex);
 
  +   /* prevent the slave from going away */
  +   atomic_inc(sl-refcnt);
  sleep_rem = msleep_interruptible(tm);
  +   refcnt = w1_unref_slave(sl);
  -   if (sleep_rem != 0)
  +   if (sleep_rem != 0 || !refcnt)
  return -EINTR;
 
  i = 
  mutex_lock_interruptible(dev-bus_mutex);
  if (i != 0)
  return i;
  } else if (!w1_strong_pullup) {
 
 
 I like this better than my workaround-patch.
 
 One thought occurred to me when looking at this proposal: wouldn't it
 be even better to increase sl-refcnt before unlocking the mutex?
 I was asking myself if it is possible that the current thread gets
 suspended between mutex_unlock(dev-bus_mutex); and
 atomic_inc(sl-refcnt); thus leaving another thread the change to
 unref the device?
 (I'm not that familiar with linux scheduling, so my assumption might be void.)

You are correct, it would be a race condition if it doesn't increment
the refcnt before unlocking the mutex, and it should get the mutex
before unref.  Here's an updated version, I haven't even tried to
compile it.

What do you think Evgeniy?

if (external_power) {
int refcnt;
/* prevent the slave from going away in sleep */
atomic_inc(sl-refcnt);
mutex_unlock(dev-bus_mutex);

sleep_rem = msleep_interruptible(tm);
if (sleep_rem != 0) {
w1_unref_slave(sl);
return -EINTR;
}

i = mutex_lock_interruptible(dev-bus_mutex);
refcnt = w1_unref_slave(sl);
if (i != 0) {
/* failed to lock */
return i;
}
if (!refcnt)
/* got lock, but slave went away */
mutex_unlock(dev-bus_mutex);
return -EINTR;
}
} else if (!w1_strong_pullup) {


-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-02-28 Thread David Fries

On Sun, Mar 01, 2015 at 04:48:22AM +0300, ??? ??? wrote:
> Hi everyone
> 
> 28.02.2015, 23:18, "David Fries" :
> > Thanks for preparing the patch, it looks like it will go another
> > round, but that happens to everyone.
> 
> Patch itself doesn't solve the problem, it only adds a workaround.
> There is a reference counter issue - reading data via sysfs only holds
> a reference to device, w1 slave structure will be deleted by w1 search core.
> 
> I believe it only accidentally doesn't crash because of kernel memory 
> allocator
> hasn't yet reclaimed memory.

Good point, it does look like that's the case.

w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sl->family->fops->remove_slave(sl);
which kfree and sl->family_data = NULL;

In that case what do you think about in w1_slave_show after getting
bus_mutex,
atomic_inc(>refcnt);

then add
w1_unref_slave(sl);
in each return path, or do a goto and consolidate them.

or just increment it while sleeping, which is when it's needed, which
also looks simpler.

if (external_power) {
+   int refcnt;
mutex_unlock(>bus_mutex);

+   /* prevent the slave from going away */
+   atomic_inc(>refcnt);
sleep_rem = msleep_interruptible(tm);
+   refcnt = w1_unref_slave(sl);
-   if (sleep_rem != 0)
+   if (sleep_rem != 0 || !refcnt)
return -EINTR;

i = mutex_lock_interruptible(>bus_mutex);
if (i != 0)
    return i;
} else if (!w1_strong_pullup) {


-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-02-28 Thread David Fries

Thanks for preparing the patch, it looks like it will go another
round, but that happens to everyone.

To simplify to show a problem,

mutex_lock_interruptible(>bus_mutex);

if (!sl->family_data)
return -ENODEV;

If family_data is NULL, then dev->bus_mutex is left locked.  Your
earlier e-mails indicated that you weren't seeing any problems after
applying this patch, but I would have thought once this early return
was encountered, that bus would not be usable any longer because
bus_mutex would be locked.  I don't know if the bus controller went
away, so there is effectively a new bus, because I assume you are
still able to read the temperature once this condition triggers?

It looks like you need to add mutex_unlock(>bus_mutex); before
return -ENODEV; and that's how it is handled elsewhere in the
function, but you might want to put a printk before the return,
trigger it and verify it is getting hung up on the bus_mutex, it would
also be telling if you can remove the wire and the other w1 modules,
they should not let you remove them while a mutex is held.

The patch needs to be run through the checkpatch script, and it's
listing two issues, adding a space after the if (I also prefer not
having it, but this is for consistency), and missing Signed-off-by:

That would look like this, only with your information, it's a
tracing of where the changes came from and agreeing to the copyright,
you can read more in Documentation/SubmittingPatches if you haven't
already.
Signed-off-by: David Fries 

scripts/checkpatch.pl /tmp/w1_null_patch.patch

Thanks, keep up improving things.

On Fri, Feb 27, 2015 at 09:43:14AM +0100, Thorsten.Bschorr wrote:
> w1_slave_show unlocks the bus while waiting for the sensor
> to convert the temperature. If the sensor gets disconnected
> during that time, sl->family_data could get NULL.
> Note: the w1_slave pointer inside w1_slave_show is protected by
> sysfs-mutex
> ---
>  drivers/w1/slaves/w1_therm.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
> index 1f11a20..3a14cb1 100644
> --- a/drivers/w1/slaves/w1_therm.c
> +++ b/drivers/w1/slaves/w1_therm.c
> @@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device,
>   i = mutex_lock_interruptible(>bus_mutex);
>   if (i != 0)
>   return i;
> +
> + /* Ensure that device is still there -
> +  * it could have gone while we unlocked the bus.
> +  * Note: sl is still protected by sysfs-mutex */
> + if(!sl->family_data)
> + return -ENODEV;
>   } else if (!w1_strong_pullup) {
>   sleep_rem = msleep_interruptible(tm);
>   if (sleep_rem != 0) {
> -- 
> 1.8.4.5
> 
> 
> ---
> Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
> http://www.avast.com
> 

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-02-28 Thread David Fries

Thanks for preparing the patch, it looks like it will go another
round, but that happens to everyone.

To simplify to show a problem,

mutex_lock_interruptible(dev-bus_mutex);

if (!sl-family_data)
return -ENODEV;

If family_data is NULL, then dev-bus_mutex is left locked.  Your
earlier e-mails indicated that you weren't seeing any problems after
applying this patch, but I would have thought once this early return
was encountered, that bus would not be usable any longer because
bus_mutex would be locked.  I don't know if the bus controller went
away, so there is effectively a new bus, because I assume you are
still able to read the temperature once this condition triggers?

It looks like you need to add mutex_unlock(dev-bus_mutex); before
return -ENODEV; and that's how it is handled elsewhere in the
function, but you might want to put a printk before the return,
trigger it and verify it is getting hung up on the bus_mutex, it would
also be telling if you can remove the wire and the other w1 modules,
they should not let you remove them while a mutex is held.


The patch needs to be run through the checkpatch script, and it's
listing two issues, adding a space after the if (I also prefer not
having it, but this is for consistency), and missing Signed-off-by:

That would look like this, only with your information, it's a
tracing of where the changes came from and agreeing to the copyright,
you can read more in Documentation/SubmittingPatches if you haven't
already.
Signed-off-by: David Fries da...@fries.net

scripts/checkpatch.pl /tmp/w1_null_patch.patch

Thanks, keep up improving things.

On Fri, Feb 27, 2015 at 09:43:14AM +0100, Thorsten.Bschorr wrote:
 w1_slave_show unlocks the bus while waiting for the sensor
 to convert the temperature. If the sensor gets disconnected
 during that time, sl-family_data could get NULL.
 Note: the w1_slave pointer inside w1_slave_show is protected by
 sysfs-mutex
 ---
  drivers/w1/slaves/w1_therm.c | 6 ++
  1 file changed, 6 insertions(+)
 
 diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
 index 1f11a20..3a14cb1 100644
 --- a/drivers/w1/slaves/w1_therm.c
 +++ b/drivers/w1/slaves/w1_therm.c
 @@ -236,6 +236,12 @@ static ssize_t w1_slave_show(struct device *device,
   i = mutex_lock_interruptible(dev-bus_mutex);
   if (i != 0)
   return i;
 +
 + /* Ensure that device is still there -
 +  * it could have gone while we unlocked the bus.
 +  * Note: sl is still protected by sysfs-mutex */
 + if(!sl-family_data)
 + return -ENODEV;
   } else if (!w1_strong_pullup) {
   sleep_rem = msleep_interruptible(tm);
   if (sleep_rem != 0) {
 -- 
 1.8.4.5
 
 
 ---
 Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
 http://www.avast.com
 

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

2015-02-28 Thread David Fries

On Sun, Mar 01, 2015 at 04:48:22AM +0300, ??? ??? wrote:
 Hi everyone
 
 28.02.2015, 23:18, David Fries da...@fries.net:
  Thanks for preparing the patch, it looks like it will go another
  round, but that happens to everyone.
 
 Patch itself doesn't solve the problem, it only adds a workaround.
 There is a reference counter issue - reading data via sysfs only holds
 a reference to device, w1 slave structure will be deleted by w1 search core.
 
 I believe it only accidentally doesn't crash because of kernel memory 
 allocator
 hasn't yet reclaimed memory.

Good point, it does look like that's the case.

w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sl-family-fops-remove_slave(sl);
which kfree and sl-family_data = NULL;

In that case what do you think about in w1_slave_show after getting
bus_mutex,
atomic_inc(sl-refcnt);

then add
w1_unref_slave(sl);
in each return path, or do a goto and consolidate them.

or just increment it while sleeping, which is when it's needed, which
also looks simpler.

if (external_power) {
+   int refcnt;
mutex_unlock(dev-bus_mutex);

+   /* prevent the slave from going away */
+   atomic_inc(sl-refcnt);
sleep_rem = msleep_interruptible(tm);
+   refcnt = w1_unref_slave(sl);
-   if (sleep_rem != 0)
+   if (sleep_rem != 0 || !refcnt)
return -EINTR;

i = mutex_lock_interruptible(dev-bus_mutex);
if (i != 0)
return i;
} else if (!w1_strong_pullup) {


-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl->family_data

2015-02-23 Thread David Fries

Thorsten,
Are you planning on submitting a patch?

FYI, I load the one wire module with
wire search_count=1
in
/etc/modules
which tells it to search the bus once only.  That works for me as
devices don't come and go on my bus, and it isn't scanning the bus
every few seconds just to find out nothing changed.  If you are
finding devices vanish even when they aren't maybe it will be useful
to you as well, though the bigger problem might be why they are
vanishing in the first place.

On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote:
> Hi,
> 
> I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry
> Pi running 3.18.5 with several DS18S20 temperature sensors. I have already
> discussed the problem with Evgeniy.
> 
> @Evgeniy & David: sorry for re-sending my message, my email
> accidentally contained a HTML part.
> 
> 
> 
> w1_therm: w1_slave_show checks if the sensor uses external power. If this
> is the case, the mutex on dev->bus_mutex is unlocked while waiting 750 ms
> for the sensor to convert the temperature. Before reading the temperature,
> the mutex is again locked.
> 
> During this sleep-time, the sensor could get detached, for example by w1.c:
> w1_search_process_cb not finding the sensor (*):
>  !test_bit(W1_SLAVE_ACTIVE, >flags) ==true.
> This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and
> nulls sl->family_data.
> 
> w1_slave_show does not check if familiy_data is null after re-acquiring the
> bus_mutex resulting in a null-ptr access when storing data.
> 
> After I've added checks for family_data!=0, I did not observe any more
> crashes; the other data of struct w1_slave seem to be valid as long as any
> thread executes w1_slave_show.
> 
> 
> I have added debug-output to w1.c and w1_therm.c, here's a log of a
> potential crash:
> 
> [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb,
> !W1_SLAVE_ACTIVE, calling w1_slave_detach
> [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach
> destroy_now 1
> [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0
> [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching
> 10-000802d9c9e4 [d5fb8800].
> [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave ->
> w1_family_notify
> [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling
> remove_slave
> [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave
> [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave
> refcnt -1
> [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
> sleep), family_data==0
> [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before
> sleep), family_data==0
> [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
> sleep), family_data==0
> [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
> sleep), family_data==0
> [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check
> [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave ->
> device_unregister
> [184200.931169] w1 w1_unref_slave -> kfree
> 
> Note: When this crash happened, multiple threads were reading the sensor.
> 
> 
> I could trigger the problem several times, and each time device_unregister
> in w1_unref_slave was executed  *after*  w1_slave_show. In one case, the
> logged time-difference between the first family_data==0 message and
> device_unregister was about 8 seconds!
> I have not observed a w1_slave_show call after w1_unref_slave (as long as
> the device was not re-attached again).
> 
> >From my observation, the w1_slave data seem to be valid as long as
> w1_slave_show is executed.
> My guess is that the call of sysfs_remove_groups in w1_family_notify hits a
> mutex (I did not add debug output here).
> 
> 
> (*) On my tiny raspberry, this happens from time to time with high CPU and
> external disc load (timing and/or electrical issues); it seems that the
> sensor does not respond in time to the (periodic) search.
> 
> 
> Please email me if you need further information.
> 
> 
> 
> Best regards,
> Thorsten Bschorr
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Fwd: w1/slaves/w1_therm: null-ptr access of sl-family_data

2015-02-23 Thread David Fries

Thorsten,
Are you planning on submitting a patch?

FYI, I load the one wire module with
wire search_count=1
in
/etc/modules
which tells it to search the bus once only.  That works for me as
devices don't come and go on my bus, and it isn't scanning the bus
every few seconds just to find out nothing changed.  If you are
finding devices vanish even when they aren't maybe it will be useful
to you as well, though the bigger problem might be why they are
vanishing in the first place.

On Mon, Feb 23, 2015 at 06:09:16PM +0100, Thorsten Bschorr wrote:
 Hi,
 
 I have observed a null-pointer access in w1/slaves/w1_therm on my Raspberry
 Pi running 3.18.5 with several DS18S20 temperature sensors. I have already
 discussed the problem with Evgeniy.
 
 @Evgeniy  David: sorry for re-sending my message, my email
 accidentally contained a HTML part.
 
 
 
 w1_therm: w1_slave_show checks if the sensor uses external power. If this
 is the case, the mutex on dev-bus_mutex is unlocked while waiting 750 ms
 for the sensor to convert the temperature. Before reading the temperature,
 the mutex is again locked.
 
 During this sleep-time, the sensor could get detached, for example by w1.c:
 w1_search_process_cb not finding the sensor (*):
  !test_bit(W1_SLAVE_ACTIVE, sl-flags) ==true.
 This triggers w1_slave_detach, and hence w1_therm_remove_slave frees and
 nulls sl-family_data.
 
 w1_slave_show does not check if familiy_data is null after re-acquiring the
 bus_mutex resulting in a null-ptr access when storing data.
 
 After I've added checks for family_data!=0, I did not observe any more
 crashes; the other data of struct w1_slave seem to be valid as long as any
 thread executes w1_slave_show.
 
 
 I have added debug-output to w1.c and w1_therm.c, here's a log of a
 potential crash:
 
 [184199.510227] w1_master_driver w1_bus_master2: w1_search_process_cb,
 !W1_SLAVE_ACTIVE, calling w1_slave_detach
 [184199.510276] w1_slave_driver 10-000802d9c9e4: w1_slave_detach
 destroy_now 1
 [184199.510297] w1_slave_driver 10-000802d9c9e4: w1_unref_slave refcnt 0
 [184199.510321] w1_slave_driver 10-000802d9c9e4: w1_unref_slave: detaching
 10-000802d9c9e4 [d5fb8800].
 [184199.510347] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -
 w1_family_notify
 [184199.510365] w1_slave_driver 10-000802d9c9e4: w1_family_notify calling
 remove_slave
 [184199.510382] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave
 [184199.510400] w1_slave_driver 10-000802d9c9e4: w1_therm_remove_slave
 refcnt -1
 [184200.049745] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
 sleep), family_data==0
 [184200.137133] w1_slave_driver 10-000802d9c9e4: w1_slave_show (before
 sleep), family_data==0
 [184200.889551] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
 sleep), family_data==0
 [184200.930866] w1_slave_driver 10-000802d9c9e4: w1_slave_show (after
 sleep), family_data==0
 [184200.930907] w1_slave_driver 10-000802d9c9e4: Read failed CRC check
 [184200.931002] w1_slave_driver 10-000802d9c9e4: w1_unref_slave -
 device_unregister
 [184200.931169] w1 w1_unref_slave - kfree
 
 Note: When this crash happened, multiple threads were reading the sensor.
 
 
 I could trigger the problem several times, and each time device_unregister
 in w1_unref_slave was executed  *after*  w1_slave_show. In one case, the
 logged time-difference between the first family_data==0 message and
 device_unregister was about 8 seconds!
 I have not observed a w1_slave_show call after w1_unref_slave (as long as
 the device was not re-attached again).
 
 From my observation, the w1_slave data seem to be valid as long as
 w1_slave_show is executed.
 My guess is that the call of sysfs_remove_groups in w1_family_notify hits a
 mutex (I did not add debug output here).
 
 
 (*) On my tiny raspberry, this happens from time to time with high CPU and
 external disc load (timing and/or electrical issues); it seems that the
 sensor does not respond in time to the (periodic) search.
 
 
 Please email me if you need further information.
 
 
 
 Best regards,
 Thorsten Bschorr
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 05:50:32PM +0100, Richard Weinberger wrote:
> Am 06.01.2015 um 17:12 schrieb Mariusz Gorski:
> > On Tue, Jan 06, 2015 at 04:01:47PM +0100, Richard Weinberger wrote:
> >> On Tue, Jan 6, 2015 at 3:29 PM, Mariusz Gorski  
> >> wrote:
> >>> DS18B20 and it's brothers are pretty popular in the RaspberryPi world
> >>> when it comes to temperature measurement. All tutorials on the Internet
> >>> use the same way of parsing the output of the w1_slave sysfs file.
> >>> These patches add a dedicated sysfs entry called 'temp' whose only job
> >>> is to output the current temperature.
> >>
> >> And what is the benefit of this patches?
> > 
> > Well, instead of having to parse the output of w1_slave:
> > 
> > $ cat w1_slave 
> > 4d 01 55 00 7f ff 0c 10 fd : crc=fd YES
> > 4d 01 55 00 7f ff 0c 10 fd t=20812
> > 
> > the userspace program gets only the interesting information,
> > which usually is the current temparture:
> 
> A sane user would also check the CRC.

Yes, and more because all 00's will pass a checksum, but be an invalid
reading, because there will be some bit in there outside of the two
temperature return bytes that won't be 0, hence the temperature field
can't be trusted.  All ff's will fail a checksum, but is useful to
know because that pretty much means the sensor is no longer there.

> IMHO it is up to the user to format the output for its needs.
> Some may want "20812", some "20.8", some else maybe "20.8°C".
> Let it up to the user.
> Any trivial awk/cut/etc... oneliner would do it.

I'm not fond of the format of w1_slave printing two sensor readings
and such, it's confusing.  Back in the day (2008) I at least
standardized the t= converted value to millidegrees C.  That was
because DS18B20 gave degrees C where DS18S20 gave millidegrees C, so
it's been worse.

> Thanks,
> //richard

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] w1: slaves: w1_therm: Add temp attribute

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 03:29:56PM +0100, Mariusz Gorski wrote:
> Add new attribute to simplify reading of current temperature.

> +static ssize_t temp_show(struct device *device,
> + struct device_attribute *attr, char *buf)
> +{
> + struct w1_slave *sl = dev_to_w1_slave(device);
> + struct w1_master *dev = sl->master;
> + u8 rom[9], verdict;
> + int temp;
> +
> + if (mutex_lock_interruptible(>bus_mutex))
> + return -EINTR;
> +
> + memset(rom, 0, sizeof(rom));
> +
> + verdict = read_rom(device, rom);
> +
> + if (verdict < 0)
> + return verdict;

I wanted to point out that this returns without unlocking bus_mutex.

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 03:29:54PM +0100, Mariusz Gorski wrote:
> DS18B20 and it's brothers are pretty popular in the RaspberryPi world
> when it comes to temperature measurement. All tutorials on the Internet
> use the same way of parsing the output of the w1_slave sysfs file.
> These patches add a dedicated sysfs entry called 'temp' whose only job
> is to output the current temperature.
> 
> What could be improved here is the way how dev->bus_mutex is locked/unlocked,
> which is split right now between two functions - maybe it would be better
> to move all dev->bus_mutex locking into read_rom() and introduce another
> mutex for protecting the slave's structures (e.g. family_data).

As to adding another mutex, there are enough little mutexes around the
one wire system that it already feels like a mess and I tried to get 
all the race conditions that I could.

I think you don't understand what the bus_mutex protects.  It protects
other devices from accessing the one wire bus.  The way you coded it,
temp_show will allow only one temperature sensor to be accessed and
read at a time, considering it takes 750 ms that's holding up the bus
for a long time, and depending on how the devices are connected, isn't
needed.  If the temperature sensor is parasite powered, nothing else
can use the bus while it is converting, if it has dedicated power,
other slaves can be accessed at the same time.

I second using a standard sensor attribute if there is one appropriate
rather than here.  I didn't in my application, but then I wasn't 
adding a new method to get the same data, I was using the existing 
netlink method.  If it does get added to a standard sensor access
method, I do request that it be optional (like w1_therm which I don't
use), and not sit there polling the sensor for the current temperature
every X seconds when no one is using the data (OpenBSD does this, or
at least last I saw, and you have no idea how old the sensor data
reading is or have any control over how often it will poll it).

I have 15 DS18B20 temperature sensors scattered around the house on a 
single one wire network.  I don't use the w1_therm system, I talk
through netlink to the kernel.  This has the advantage of being an
asynchronous socket based interface so I can write a packet to request
each sensor to do a conversion, get the write command status, and then
after a timeout, write another to get a return, then read back the
data.

Using the sysfs means blocking while the read is in progress.  Reading
15 means either taking at least 11.250 seconds as you read them
serially, or having 15 threads blocked on one read each (provided they
don't block the other readers).

Thanks for Cc'ing me on this.

> Mariusz Gorski (2):
>   w1: slaves: w1_therm: Extract read_rom function
>   w1: slaves: w1_therm: Add temp attribute
> 
>  drivers/w1/slaves/w1_therm.c | 83 
> ++--
>  1 file changed, 65 insertions(+), 18 deletions(-)
> 
> -- 
> 2.2.1

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 03:29:54PM +0100, Mariusz Gorski wrote:
 DS18B20 and it's brothers are pretty popular in the RaspberryPi world
 when it comes to temperature measurement. All tutorials on the Internet
 use the same way of parsing the output of the w1_slave sysfs file.
 These patches add a dedicated sysfs entry called 'temp' whose only job
 is to output the current temperature.
 
 What could be improved here is the way how dev-bus_mutex is locked/unlocked,
 which is split right now between two functions - maybe it would be better
 to move all dev-bus_mutex locking into read_rom() and introduce another
 mutex for protecting the slave's structures (e.g. family_data).

As to adding another mutex, there are enough little mutexes around the
one wire system that it already feels like a mess and I tried to get 
all the race conditions that I could.

I think you don't understand what the bus_mutex protects.  It protects
other devices from accessing the one wire bus.  The way you coded it,
temp_show will allow only one temperature sensor to be accessed and
read at a time, considering it takes 750 ms that's holding up the bus
for a long time, and depending on how the devices are connected, isn't
needed.  If the temperature sensor is parasite powered, nothing else
can use the bus while it is converting, if it has dedicated power,
other slaves can be accessed at the same time.

I second using a standard sensor attribute if there is one appropriate
rather than here.  I didn't in my application, but then I wasn't 
adding a new method to get the same data, I was using the existing 
netlink method.  If it does get added to a standard sensor access
method, I do request that it be optional (like w1_therm which I don't
use), and not sit there polling the sensor for the current temperature
every X seconds when no one is using the data (OpenBSD does this, or
at least last I saw, and you have no idea how old the sensor data
reading is or have any control over how often it will poll it).

I have 15 DS18B20 temperature sensors scattered around the house on a 
single one wire network.  I don't use the w1_therm system, I talk
through netlink to the kernel.  This has the advantage of being an
asynchronous socket based interface so I can write a packet to request
each sensor to do a conversion, get the write command status, and then
after a timeout, write another to get a return, then read back the
data.

Using the sysfs means blocking while the read is in progress.  Reading
15 means either taking at least 11.250 seconds as you read them
serially, or having 15 threads blocked on one read each (provided they
don't block the other readers).

Thanks for Cc'ing me on this.

 Mariusz Gorski (2):
   w1: slaves: w1_therm: Extract read_rom function
   w1: slaves: w1_therm: Add temp attribute
 
  drivers/w1/slaves/w1_therm.c | 83 
 ++--
  1 file changed, 65 insertions(+), 18 deletions(-)
 
 -- 
 2.2.1

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/2] w1: slaves: w1_therm: Add sysfs entry for current temperature

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 05:50:32PM +0100, Richard Weinberger wrote:
 Am 06.01.2015 um 17:12 schrieb Mariusz Gorski:
  On Tue, Jan 06, 2015 at 04:01:47PM +0100, Richard Weinberger wrote:
  On Tue, Jan 6, 2015 at 3:29 PM, Mariusz Gorski marius.gor...@gmail.com 
  wrote:
  DS18B20 and it's brothers are pretty popular in the RaspberryPi world
  when it comes to temperature measurement. All tutorials on the Internet
  use the same way of parsing the output of the w1_slave sysfs file.
  These patches add a dedicated sysfs entry called 'temp' whose only job
  is to output the current temperature.
 
  And what is the benefit of this patches?
  
  Well, instead of having to parse the output of w1_slave:
  
  $ cat w1_slave 
  4d 01 55 00 7f ff 0c 10 fd : crc=fd YES
  4d 01 55 00 7f ff 0c 10 fd t=20812
  
  the userspace program gets only the interesting information,
  which usually is the current temparture:
 
 A sane user would also check the CRC.

Yes, and more because all 00's will pass a checksum, but be an invalid
reading, because there will be some bit in there outside of the two
temperature return bytes that won't be 0, hence the temperature field
can't be trusted.  All ff's will fail a checksum, but is useful to
know because that pretty much means the sensor is no longer there.

 IMHO it is up to the user to format the output for its needs.
 Some may want 20812, some 20.8, some else maybe 20.8°C.
 Let it up to the user.
 Any trivial awk/cut/etc... oneliner would do it.

I'm not fond of the format of w1_slave printing two sensor readings
and such, it's confusing.  Back in the day (2008) I at least
standardized the t= converted value to millidegrees C.  That was
because DS18B20 gave degrees C where DS18S20 gave millidegrees C, so
it's been worse.

 Thanks,
 //richard

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] w1: slaves: w1_therm: Add temp attribute

2015-01-06 Thread David Fries

On Tue, Jan 06, 2015 at 03:29:56PM +0100, Mariusz Gorski wrote:
 Add new attribute to simplify reading of current temperature.

 +static ssize_t temp_show(struct device *device,
 + struct device_attribute *attr, char *buf)
 +{
 + struct w1_slave *sl = dev_to_w1_slave(device);
 + struct w1_master *dev = sl-master;
 + u8 rom[9], verdict;
 + int temp;
 +
 + if (mutex_lock_interruptible(dev-bus_mutex))
 + return -EINTR;
 +
 + memset(rom, 0, sizeof(rom));
 +
 + verdict = read_rom(device, rom);
 +
 + if (verdict  0)
 + return verdict;

I wanted to point out that this returns without unlocking bus_mutex.

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] w1: avoid potential u16 overflow

2014-11-10 Thread David Fries

Reported-by: Dan Carpenter 
Acked-by: Evgeniy Polyakov 
Signed-off-by: David Fries 
---
 drivers/w1/w1_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index dd96562..881597a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct 
netlink_skb_parms *nsp)
msg = (struct w1_netlink_msg *)(cn + 1);
if (node_count) {
int size;
-   u16 reply_size = sizeof(*cn) + cn->len + slave_len;
+   int reply_size = sizeof(*cn) + cn->len + slave_len;
if (cn->flags & W1_CN_BUNDLE) {
/* bundling duplicats some of the messages */
reply_size += 2 * cmd_count * (sizeof(struct cn_msg) +
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] cn: verify msg->len before making callback

2014-11-10 Thread David Fries

The struct cn_msg len field comes from userspace and needs to be
validated.  More logical to do so here where the cn_msg pointer is
pulled out of the sk_buff than the callback which is passed cn_msg *
and might assume no validation is needed.

Reported-by: Dan Carpenter 
Acked-by: Evgeniy Polyakov 
Signed-off-by: David Fries 
---
 drivers/connector/connector.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index f612d68..30f5228 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send);
  */
 static int cn_call_callback(struct sk_buff *skb)
 {
+   struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
struct cn_dev *dev = 
struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
struct netlink_skb_parms *nsp = _CB(skb);
int err = -ENODEV;
 
+   /* verify msg->len is within skb */
+   nlh = nlmsg_hdr(skb);
+   if (nlh->nlmsg_len < NLMSG_HDRLEN + sizeof(struct cn_msg) + msg->len)
+   return -EINVAL;
+
spin_lock_bh(>cbdev->queue_lock);
list_for_each_entry(i, >cbdev->queue_list, callback_entry) {
if (cn_cb_equal(>id.id, >id)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATHC 0/2] cn: w1: buffer size checks

2014-11-10 Thread David Fries

Greg Kroah-Hartman,
These issues were found by Dan Carpenter with a static checker and will check
message buffer lengths from userspace or avoid length overflows.  Evgeniy
Polyakov has given his ack, and they can be applied to the stable branch as
well.

[PATCH 1/2] cn: verify msg->len before making callback
[PATCH 2/2] w1: avoid potential u16 overflow
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATHC 0/2] cn: w1: buffer size checks

2014-11-10 Thread David Fries

Greg Kroah-Hartman,
These issues were found by Dan Carpenter with a static checker and will check
message buffer lengths from userspace or avoid length overflows.  Evgeniy
Polyakov has given his ack, and they can be applied to the stable branch as
well.

[PATCH 1/2] cn: verify msg-len before making callback
[PATCH 2/2] w1: avoid potential u16 overflow
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] cn: verify msg-len before making callback

2014-11-10 Thread David Fries

The struct cn_msg len field comes from userspace and needs to be
validated.  More logical to do so here where the cn_msg pointer is
pulled out of the sk_buff than the callback which is passed cn_msg *
and might assume no validation is needed.

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Acked-by: Evgeniy Polyakov z...@ioremap.net
Signed-off-by: David Fries da...@fries.net
---
 drivers/connector/connector.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index f612d68..30f5228 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send);
  */
 static int cn_call_callback(struct sk_buff *skb)
 {
+   struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
struct cn_dev *dev = cdev;
struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
struct netlink_skb_parms *nsp = NETLINK_CB(skb);
int err = -ENODEV;
 
+   /* verify msg-len is within skb */
+   nlh = nlmsg_hdr(skb);
+   if (nlh-nlmsg_len  NLMSG_HDRLEN + sizeof(struct cn_msg) + msg-len)
+   return -EINVAL;
+
spin_lock_bh(dev-cbdev-queue_lock);
list_for_each_entry(i, dev-cbdev-queue_list, callback_entry) {
if (cn_cb_equal(i-id.id, msg-id)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] w1: avoid potential u16 overflow

2014-11-10 Thread David Fries

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Acked-by: Evgeniy Polyakov z...@ioremap.net
Signed-off-by: David Fries da...@fries.net
---
 drivers/w1/w1_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index dd96562..881597a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct 
netlink_skb_parms *nsp)
msg = (struct w1_netlink_msg *)(cn + 1);
if (node_count) {
int size;
-   u16 reply_size = sizeof(*cn) + cn-len + slave_len;
+   int reply_size = sizeof(*cn) + cn-len + slave_len;
if (cn-flags  W1_CN_BUNDLE) {
/* bundling duplicats some of the messages */
reply_size += 2 * cmd_count * (sizeof(struct cn_msg) +
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] cn: verify msg->len before making callback

2014-11-09 Thread David Fries

The struct cn_msg len field comes from userspace and needs to be
validated.  More logical to do so here where the cn_msg pointer is
pulled out of the sk_buff than the callback which is passed cn_msg *
and might assume no validation is needed.

Reported-by: Dan Carpenter 
Signed-off-by: David Fries 
---
 drivers/connector/connector.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index f612d68..30f5228 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send);
  */
 static int cn_call_callback(struct sk_buff *skb)
 {
+   struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
struct cn_dev *dev = 
struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
struct netlink_skb_parms *nsp = _CB(skb);
int err = -ENODEV;
 
+   /* verify msg->len is within skb */
+   nlh = nlmsg_hdr(skb);
+   if (nlh->nlmsg_len < NLMSG_HDRLEN + sizeof(struct cn_msg) + msg->len)
+   return -EINVAL;
+
spin_lock_bh(>cbdev->queue_lock);
list_for_each_entry(i, >cbdev->queue_list, callback_entry) {
if (cn_cb_equal(>id.id, >id)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] w1: avoid potential u16 overflow

2014-11-09 Thread David Fries

Reported-by: Dan Carpenter 
Signed-off-by: David Fries 
---
 drivers/w1/w1_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index dd96562..881597a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct 
netlink_skb_parms *nsp)
msg = (struct w1_netlink_msg *)(cn + 1);
if (node_count) {
int size;
-   u16 reply_size = sizeof(*cn) + cn->len + slave_len;
+   int reply_size = sizeof(*cn) + cn->len + slave_len;
if (cn->flags & W1_CN_BUNDLE) {
/* bundling duplicats some of the messages */
reply_size += 2 * cmd_count * (sizeof(struct cn_msg) +
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATHC 0/2] cn: w1: buffer size checks

2014-11-09 Thread David Fries

These issues were found by Dan Carpenter with a static checker.
Evgeniy are you okay with these?

[PATCH 1/2] cn: verify msg->len before making callback
[PATCH 2/2] w1: avoid potential u16 overflow
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/2] w1: avoid potential u16 overflow

2014-11-09 Thread David Fries

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: David Fries da...@fries.net
---
 drivers/w1/w1_netlink.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index dd96562..881597a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -598,7 +598,7 @@ static void w1_cn_callback(struct cn_msg *cn, struct 
netlink_skb_parms *nsp)
msg = (struct w1_netlink_msg *)(cn + 1);
if (node_count) {
int size;
-   u16 reply_size = sizeof(*cn) + cn-len + slave_len;
+   int reply_size = sizeof(*cn) + cn-len + slave_len;
if (cn-flags  W1_CN_BUNDLE) {
/* bundling duplicats some of the messages */
reply_size += 2 * cmd_count * (sizeof(struct cn_msg) +
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATHC 0/2] cn: w1: buffer size checks

2014-11-09 Thread David Fries

These issues were found by Dan Carpenter with a static checker.
Evgeniy are you okay with these?

[PATCH 1/2] cn: verify msg-len before making callback
[PATCH 2/2] w1: avoid potential u16 overflow
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/2] cn: verify msg-len before making callback

2014-11-09 Thread David Fries

The struct cn_msg len field comes from userspace and needs to be
validated.  More logical to do so here where the cn_msg pointer is
pulled out of the sk_buff than the callback which is passed cn_msg *
and might assume no validation is needed.

Reported-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: David Fries da...@fries.net
---
 drivers/connector/connector.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index f612d68..30f5228 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -141,12 +141,18 @@ EXPORT_SYMBOL_GPL(cn_netlink_send);
  */
 static int cn_call_callback(struct sk_buff *skb)
 {
+   struct nlmsghdr *nlh;
struct cn_callback_entry *i, *cbq = NULL;
struct cn_dev *dev = cdev;
struct cn_msg *msg = nlmsg_data(nlmsg_hdr(skb));
struct netlink_skb_parms *nsp = NETLINK_CB(skb);
int err = -ENODEV;
 
+   /* verify msg-len is within skb */
+   nlh = nlmsg_hdr(skb);
+   if (nlh-nlmsg_len  NLMSG_HDRLEN + sizeof(struct cn_msg) + msg-len)
+   return -EINVAL;
+
spin_lock_bh(dev-cbdev-queue_lock);
list_for_each_entry(i, dev-cbdev-queue_list, callback_entry) {
if (cn_cb_equal(i-id.id, msg-id)) {
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-05-06 Thread David Fries

Acked-by: David Fries 

On Wed, May 07, 2014 at 01:26:04AM +0400, Alexey Khoroshilov wrote:
> w1_process_callbacks() expects to be called with dev->list_mutex held,
> but it is the fact only in w1_process(). __w1_remove_master_device()
> calls w1_process_callbacks() after it releases list_mutex.
> 
> The patch fixes __w1_remove_master_device() to acquire list_mutex
> for w1_process_callbacks().
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 
> ---
>  drivers/w1/w1.c | 2 ++
>  drivers/w1/w1_int.c | 4 
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index ff52618cafbe..5d7341520544 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -1078,6 +1078,8 @@ static void w1_search_process(struct w1_master *dev, u8 
> search_type)
>   * w1_process_callbacks() - execute each dev->async_list callback entry
>   * @dev: w1_master device
>   *
> + * The w1 master list_mutex must be held.
> + *
>   * Return: 1 if there were commands to executed 0 otherwise
>   */
>  int w1_process_callbacks(struct w1_master *dev)
> diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c
> index 9b084db739c7..728039d2efe1 100644
> --- a/drivers/w1/w1_int.c
> +++ b/drivers/w1/w1_int.c
> @@ -219,9 +219,13 @@ void __w1_remove_master_device(struct w1_master *dev)
>  
>   if (msleep_interruptible(1000))
>   flush_signals(current);
> + mutex_lock(>list_mutex);
>   w1_process_callbacks(dev);
> + mutex_unlock(>list_mutex);
>   }
> + mutex_lock(>list_mutex);
>   w1_process_callbacks(dev);
> + mutex_unlock(>list_mutex);
>  
>   memset(, 0, sizeof(msg));
>   msg.id.mst.id = dev->id;
> -- 
> 1.8.3.2

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-05-06 Thread David Fries

On Wed, May 07, 2014 at 01:42:12AM +0400, Alexey Khoroshilov wrote:
> On 01.05.2014 07:48, David Fries wrote:
> > On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote:
> >> w1_process_callbacks() expects to be called with dev->list_mutex held,
> >> but it is the fact only in w1_process(). __w1_remove_master_device()
> >> calls w1_process_callbacks() after it releases list_mutex.
> >>
> >> The patch fixes __w1_remove_master_device() to acquire list_mutex
> >> for w1_process_callbacks(). It is implemented by moving
> >> w1_process_callbacks() functionality to __w1_process_callbacks()
> >> and adding a wrapper with the old name.
> > Good catch.  I guess as it was in the shutdown path it failed the
> > unlock silently.
> >
> > I would prefer a different fix.  If w1_process_callbacks was more of a
> > public facing API called from multiple locations where the caller
> > doesn't have access to the locks, something like this would probably
> > be the way to go.  This is called from two areas of the code, and not
> > likely to be called from any new areas in the future.
> >
> > Adding a documentation comment is good.  I would be tempted in
> > __w1_remove_master_device to move the dev->list_mutex down after the
> > while loop, but I can't be sure that whatever has a refcnt wouldn't
> > need list_mutex.  The last w1_process_callbacks after the while loop
> > shouldn't be needed I wouldn't think, I was just being defensive.
> I do not understand all the details, but I am not sure.
> refcnt can became 0, but that does not mean all callbacks have been
> processed because they can be added before refcnt decrement but after
> previous w1_process_callbacks invocation.
> >   By
> > the time it completes the while loop the reference count is 0 so there
> > shouldn't be anything left for it to process and if there is, it's a
> > race condition and game over anyway.  So just sandwich
> > w1_process_callbacks with the lock/unlock in
> > __w1_remove_master_device please.
> Done.

Thanks for finding and fixing this bug.

> Another suspicious question for me is how safe is it to call
>   wake_up_process(dev->thread);
> after
>   kthread_stop(dev->thread);
> ?

Looks bad, but I would need to convince myself it could actually be a
problem.
int wake_up_process(struct task_struct *p)
{   
WARN_ON(task_is_stopped_or_traced(p));
return try_to_wake_up(p, TASK_NORMAL, 0);
}

But I think there are lower hanging fruit, like revisiting all the
locks in all the code and testing with the lock debugging code on and
get it all to pass or document that it really is correct.  Currently
it is detecting a locking order problem, but both code paths can only
be called from the w1_process function, which can only be called by
one thread, and one thread can't dead lock itself with a locking order
problem.

> >> Found by Linux Driver Verification project (linuxtesting.org).
> > Was this found with code inspection or hardware testing with lock
> > debugging enabled?
> It was found by static verification tools developed within Linux Driver
> Verification project.
> 
> 
> Best regards,
> Alexey

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-05-06 Thread David Fries

On Wed, May 07, 2014 at 01:42:12AM +0400, Alexey Khoroshilov wrote:
 On 01.05.2014 07:48, David Fries wrote:
  On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote:
  w1_process_callbacks() expects to be called with dev-list_mutex held,
  but it is the fact only in w1_process(). __w1_remove_master_device()
  calls w1_process_callbacks() after it releases list_mutex.
 
  The patch fixes __w1_remove_master_device() to acquire list_mutex
  for w1_process_callbacks(). It is implemented by moving
  w1_process_callbacks() functionality to __w1_process_callbacks()
  and adding a wrapper with the old name.
  Good catch.  I guess as it was in the shutdown path it failed the
  unlock silently.
 
  I would prefer a different fix.  If w1_process_callbacks was more of a
  public facing API called from multiple locations where the caller
  doesn't have access to the locks, something like this would probably
  be the way to go.  This is called from two areas of the code, and not
  likely to be called from any new areas in the future.
 
  Adding a documentation comment is good.  I would be tempted in
  __w1_remove_master_device to move the dev-list_mutex down after the
  while loop, but I can't be sure that whatever has a refcnt wouldn't
  need list_mutex.  The last w1_process_callbacks after the while loop
  shouldn't be needed I wouldn't think, I was just being defensive.
 I do not understand all the details, but I am not sure.
 refcnt can became 0, but that does not mean all callbacks have been
 processed because they can be added before refcnt decrement but after
 previous w1_process_callbacks invocation.
By
  the time it completes the while loop the reference count is 0 so there
  shouldn't be anything left for it to process and if there is, it's a
  race condition and game over anyway.  So just sandwich
  w1_process_callbacks with the lock/unlock in
  __w1_remove_master_device please.
 Done.

Thanks for finding and fixing this bug.

 Another suspicious question for me is how safe is it to call
   wake_up_process(dev-thread);
 after
   kthread_stop(dev-thread);
 ?

Looks bad, but I would need to convince myself it could actually be a
problem.
int wake_up_process(struct task_struct *p)
{   
WARN_ON(task_is_stopped_or_traced(p));
return try_to_wake_up(p, TASK_NORMAL, 0);
}

But I think there are lower hanging fruit, like revisiting all the
locks in all the code and testing with the lock debugging code on and
get it all to pass or document that it really is correct.  Currently
it is detecting a locking order problem, but both code paths can only
be called from the w1_process function, which can only be called by
one thread, and one thread can't dead lock itself with a locking order
problem.

  Found by Linux Driver Verification project (linuxtesting.org).
  Was this found with code inspection or hardware testing with lock
  debugging enabled?
 It was found by static verification tools developed within Linux Driver
 Verification project.
 
 
 Best regards,
 Alexey

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH v2] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-05-06 Thread David Fries

Acked-by: David Fries da...@fries.net

On Wed, May 07, 2014 at 01:26:04AM +0400, Alexey Khoroshilov wrote:
 w1_process_callbacks() expects to be called with dev-list_mutex held,
 but it is the fact only in w1_process(). __w1_remove_master_device()
 calls w1_process_callbacks() after it releases list_mutex.
 
 The patch fixes __w1_remove_master_device() to acquire list_mutex
 for w1_process_callbacks().
 
 Found by Linux Driver Verification project (linuxtesting.org).
 
 Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru
 ---
  drivers/w1/w1.c | 2 ++
  drivers/w1/w1_int.c | 4 
  2 files changed, 6 insertions(+)
 
 diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
 index ff52618cafbe..5d7341520544 100644
 --- a/drivers/w1/w1.c
 +++ b/drivers/w1/w1.c
 @@ -1078,6 +1078,8 @@ static void w1_search_process(struct w1_master *dev, u8 
 search_type)
   * w1_process_callbacks() - execute each dev-async_list callback entry
   * @dev: w1_master device
   *
 + * The w1 master list_mutex must be held.
 + *
   * Return: 1 if there were commands to executed 0 otherwise
   */
  int w1_process_callbacks(struct w1_master *dev)
 diff --git a/drivers/w1/w1_int.c b/drivers/w1/w1_int.c
 index 9b084db739c7..728039d2efe1 100644
 --- a/drivers/w1/w1_int.c
 +++ b/drivers/w1/w1_int.c
 @@ -219,9 +219,13 @@ void __w1_remove_master_device(struct w1_master *dev)
  
   if (msleep_interruptible(1000))
   flush_signals(current);
 + mutex_lock(dev-list_mutex);
   w1_process_callbacks(dev);
 + mutex_unlock(dev-list_mutex);
   }
 + mutex_lock(dev-list_mutex);
   w1_process_callbacks(dev);
 + mutex_unlock(dev-list_mutex);
  
   memset(msg, 0, sizeof(msg));
   msg.id.mst.id = dev-id;
 -- 
 1.8.3.2

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-04-30 Thread David Fries

On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote:
> w1_process_callbacks() expects to be called with dev->list_mutex held,
> but it is the fact only in w1_process(). __w1_remove_master_device()
> calls w1_process_callbacks() after it releases list_mutex.
> 
> The patch fixes __w1_remove_master_device() to acquire list_mutex
> for w1_process_callbacks(). It is implemented by moving
> w1_process_callbacks() functionality to __w1_process_callbacks()
> and adding a wrapper with the old name.

Good catch.  I guess as it was in the shutdown path it failed the
unlock silently.

I would prefer a different fix.  If w1_process_callbacks was more of a
public facing API called from multiple locations where the caller
doesn't have access to the locks, something like this would probably
be the way to go.  This is called from two areas of the code, and not
likely to be called from any new areas in the future.

Adding a documentation comment is good.  I would be tempted in
__w1_remove_master_device to move the dev->list_mutex down after the
while loop, but I can't be sure that whatever has a refcnt wouldn't
need list_mutex.  The last w1_process_callbacks after the while loop
shouldn't be needed I wouldn't think, I was just being defensive.  By
the time it completes the while loop the reference count is 0 so there
shouldn't be anything left for it to process and if there is, it's a
race condition and game over anyway.  So just sandwich
w1_process_callbacks with the lock/unlock in
__w1_remove_master_device please.

> Found by Linux Driver Verification project (linuxtesting.org).

Was this found with code inspection or hardware testing with lock
debugging enabled?

> Signed-off-by: Alexey Khoroshilov 
> ---
>  drivers/w1/w1.c |  8 +---
>  drivers/w1/w1.h | 13 -
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index ff52618cafbe..e507c51512aa 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -1075,12 +1075,14 @@ static void w1_search_process(struct w1_master *dev, 
> u8 search_type)
>  }
>  
>  /**
> - * w1_process_callbacks() - execute each dev->async_list callback entry
> + * __w1_process_callbacks() - execute each dev->async_list callback entry
>   * @dev: w1_master device
>   *
> + * The w1 master list_mutex must be held.
> + *
>   * Return: 1 if there were commands to executed 0 otherwise
>   */
> -int w1_process_callbacks(struct w1_master *dev)
> +int __w1_process_callbacks(struct w1_master *dev)
>  {
>   int ret = 0;
>   struct w1_async_cmd *async_cmd, *async_n;
> @@ -1122,7 +1124,7 @@ int w1_process(void *data)
>   /* Note, w1_process_callback drops the lock while processing,
>* but locks it again before returning.
>*/
> - if (!w1_process_callbacks(dev) && jremain) {
> + if (!__w1_process_callbacks(dev) && jremain) {
>   /* a wake up is either to stop the thread, process
>* callbacks, or search, it isn't process callbacks, so
>* schedule a search.
> diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
> index 734dab7fc687..e357321a5e13 100644
> --- a/drivers/w1/w1.h
> +++ b/drivers/w1/w1.h
> @@ -341,9 +341,20 @@ extern int w1_max_slave_ttl;
>  extern struct list_head w1_masters;
>  extern struct mutex w1_mlock;
>  
> -extern int w1_process_callbacks(struct w1_master *dev);
> +extern int __w1_process_callbacks(struct w1_master *dev);
>  extern int w1_process(void *);
>  
> +static inline int w1_process_callbacks(struct w1_master *dev)
> +{
> + int ret;
> +
> + mutex_lock(>list_mutex);
> + ret = __w1_process_callbacks(dev);
> + mutex_unlock(>list_mutex);
> + return ret;
> +}
> +
> +
>  #endif /* __KERNEL__ */
>  
>  #endif /* __W1_H */
> -- 
> 1.8.3.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: do not unlock unheld list_mutex in __w1_remove_master_device()

2014-04-30 Thread David Fries

On Thu, May 01, 2014 at 12:37:58AM +0400, Alexey Khoroshilov wrote:
 w1_process_callbacks() expects to be called with dev-list_mutex held,
 but it is the fact only in w1_process(). __w1_remove_master_device()
 calls w1_process_callbacks() after it releases list_mutex.
 
 The patch fixes __w1_remove_master_device() to acquire list_mutex
 for w1_process_callbacks(). It is implemented by moving
 w1_process_callbacks() functionality to __w1_process_callbacks()
 and adding a wrapper with the old name.

Good catch.  I guess as it was in the shutdown path it failed the
unlock silently.

I would prefer a different fix.  If w1_process_callbacks was more of a
public facing API called from multiple locations where the caller
doesn't have access to the locks, something like this would probably
be the way to go.  This is called from two areas of the code, and not
likely to be called from any new areas in the future.

Adding a documentation comment is good.  I would be tempted in
__w1_remove_master_device to move the dev-list_mutex down after the
while loop, but I can't be sure that whatever has a refcnt wouldn't
need list_mutex.  The last w1_process_callbacks after the while loop
shouldn't be needed I wouldn't think, I was just being defensive.  By
the time it completes the while loop the reference count is 0 so there
shouldn't be anything left for it to process and if there is, it's a
race condition and game over anyway.  So just sandwich
w1_process_callbacks with the lock/unlock in
__w1_remove_master_device please.

 Found by Linux Driver Verification project (linuxtesting.org).

Was this found with code inspection or hardware testing with lock
debugging enabled?

 Signed-off-by: Alexey Khoroshilov khoroshi...@ispras.ru
 ---
  drivers/w1/w1.c |  8 +---
  drivers/w1/w1.h | 13 -
  2 files changed, 17 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
 index ff52618cafbe..e507c51512aa 100644
 --- a/drivers/w1/w1.c
 +++ b/drivers/w1/w1.c
 @@ -1075,12 +1075,14 @@ static void w1_search_process(struct w1_master *dev, 
 u8 search_type)
  }
  
  /**
 - * w1_process_callbacks() - execute each dev-async_list callback entry
 + * __w1_process_callbacks() - execute each dev-async_list callback entry
   * @dev: w1_master device
   *
 + * The w1 master list_mutex must be held.
 + *
   * Return: 1 if there were commands to executed 0 otherwise
   */
 -int w1_process_callbacks(struct w1_master *dev)
 +int __w1_process_callbacks(struct w1_master *dev)
  {
   int ret = 0;
   struct w1_async_cmd *async_cmd, *async_n;
 @@ -1122,7 +1124,7 @@ int w1_process(void *data)
   /* Note, w1_process_callback drops the lock while processing,
* but locks it again before returning.
*/
 - if (!w1_process_callbacks(dev)  jremain) {
 + if (!__w1_process_callbacks(dev)  jremain) {
   /* a wake up is either to stop the thread, process
* callbacks, or search, it isn't process callbacks, so
* schedule a search.
 diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
 index 734dab7fc687..e357321a5e13 100644
 --- a/drivers/w1/w1.h
 +++ b/drivers/w1/w1.h
 @@ -341,9 +341,20 @@ extern int w1_max_slave_ttl;
  extern struct list_head w1_masters;
  extern struct mutex w1_mlock;
  
 -extern int w1_process_callbacks(struct w1_master *dev);
 +extern int __w1_process_callbacks(struct w1_master *dev);
  extern int w1_process(void *);
  
 +static inline int w1_process_callbacks(struct w1_master *dev)
 +{
 + int ret;
 +
 + mutex_lock(dev-list_mutex);
 + ret = __w1_process_callbacks(dev);
 + mutex_unlock(dev-list_mutex);
 + return ret;
 +}
 +
 +
  #endif /* __KERNEL__ */
  
  #endif /* __W1_H */
 -- 
 1.8.3.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] w1: avoid recursive device_add

2014-04-16 Thread David Fries

__w1_attach_slave_device calls device_add which calls w1_bus_notify
which calls the w1_bq27000 slave driver, which calls
platform_device_add and device_add and deadlocks on getting
&(>bus_notifier)->rwsem as it is still held in the previous
device_add.  This avoids the problem by processing the family
add/remove outside of the slave device_add call.

Commit 47eba33a0997fc7362a introduced this deadlock and added
a KOBJ_ADD, as the add was already reported in device_register two add
events were being sent.  This change suppresses the device_register
add so that any slave device sysfs entries are setup before the add
goes out.

Belisko Marek reported this change fixed the deadlock he was seeing on
ARM device tree, while testing on my x86-64 system never saw the
deadlock.

Reported-by: Belisko Marek 
Signed-off-by: David Fries 
---
Evgeniy any comments?  The patch looks good to me.
Dr. H. Nikolaus Schaller, Can you give this patch another test?  I
don't expect any problems, but it is now only sending one add instead
of two which is a change.
I verified two KOBJ_ADD notifications were going out prior to this
change, and only one after.  There should only be one, also verified
that the w1_bq27000 add notification went out before the slave device
id, showing the suppression was working.  Used
udevadm monitor --kernel --property
for notification monitoring.

 drivers/w1/w1.c |   32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index b96f61b..ff52618 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -614,27 +614,11 @@ end:
return err;
 }
 
-/*
- * Handle sysfs file creation and removal here, before userspace is told that
- * the device is added / removed from the system
- */
-static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
-void *data)
+static int w1_family_notify(unsigned long action, struct w1_slave *sl)
 {
-   struct device *dev = data;
-   struct w1_slave *sl;
struct w1_family_ops *fops;
int err;
 
-   /*
-* Only care about slave devices at the moment.  Yes, we should use a
-* separate "type" for this, but for now, look at the release function
-* to know which type it is...
-*/
-   if (dev->release != w1_slave_release)
-   return 0;
-
-   sl = dev_to_w1_slave(dev);
fops = sl->family->fops;
 
if (!fops)
@@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, 
unsigned long action,
return 0;
 }
 
-static struct notifier_block w1_bus_nb = {
-   .notifier_call = w1_bus_notify,
-};
-
 static int __w1_attach_slave_device(struct w1_slave *sl)
 {
int err;
@@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
dev_dbg(>dev, "%s: registering %s as %p.\n", __func__,
dev_name(>dev), sl);
 
+   /* suppress for w1_family_notify before sending KOBJ_ADD */
+   dev_set_uevent_suppress(>dev, true);
+
err = device_register(>dev);
if (err < 0) {
dev_err(>dev,
@@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
dev_name(>dev), err);
return err;
}
-
+   w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
 
dev_set_uevent_suppress(>dev, false);
kobject_uevent(>dev.kobj, KOBJ_ADD);
@@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl)
msg.type = W1_SLAVE_REMOVE;
w1_netlink_send(sl->master, );
 
+   w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
device_unregister(>dev);
#ifdef DEBUG
memset(sl, 0, sizeof(*sl));
@@ -1186,10 +1170,6 @@ static int __init w1_init(void)
goto err_out_exit_init;
}
 
-   retval = bus_register_notifier(_bus_type, _bus_nb);
-   if (retval)
-   goto err_out_bus_unregister;
-
retval = driver_register(_master_driver);
if (retval) {
printk(KERN_ERR
-- 
1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] w1: avoid recursive device_add

2014-04-16 Thread David Fries

__w1_attach_slave_device calls device_add which calls w1_bus_notify
which calls the w1_bq27000 slave driver, which calls
platform_device_add and device_add and deadlocks on getting
(priv-bus_notifier)-rwsem as it is still held in the previous
device_add.  This avoids the problem by processing the family
add/remove outside of the slave device_add call.

Commit 47eba33a0997fc7362a introduced this deadlock and added
a KOBJ_ADD, as the add was already reported in device_register two add
events were being sent.  This change suppresses the device_register
add so that any slave device sysfs entries are setup before the add
goes out.

Belisko Marek reported this change fixed the deadlock he was seeing on
ARM device tree, while testing on my x86-64 system never saw the
deadlock.

Reported-by: Belisko Marek marek.beli...@gmail.com
Signed-off-by: David Fries da...@fries.net
---
Evgeniy any comments?  The patch looks good to me.
Dr. H. Nikolaus Schaller, Can you give this patch another test?  I
don't expect any problems, but it is now only sending one add instead
of two which is a change.
I verified two KOBJ_ADD notifications were going out prior to this
change, and only one after.  There should only be one, also verified
that the w1_bq27000 add notification went out before the slave device
id, showing the suppression was working.  Used
udevadm monitor --kernel --property
for notification monitoring.

 drivers/w1/w1.c |   32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
index b96f61b..ff52618 100644
--- a/drivers/w1/w1.c
+++ b/drivers/w1/w1.c
@@ -614,27 +614,11 @@ end:
return err;
 }
 
-/*
- * Handle sysfs file creation and removal here, before userspace is told that
- * the device is added / removed from the system
- */
-static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
-void *data)
+static int w1_family_notify(unsigned long action, struct w1_slave *sl)
 {
-   struct device *dev = data;
-   struct w1_slave *sl;
struct w1_family_ops *fops;
int err;
 
-   /*
-* Only care about slave devices at the moment.  Yes, we should use a
-* separate type for this, but for now, look at the release function
-* to know which type it is...
-*/
-   if (dev-release != w1_slave_release)
-   return 0;
-
-   sl = dev_to_w1_slave(dev);
fops = sl-family-fops;
 
if (!fops)
@@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, 
unsigned long action,
return 0;
 }
 
-static struct notifier_block w1_bus_nb = {
-   .notifier_call = w1_bus_notify,
-};
-
 static int __w1_attach_slave_device(struct w1_slave *sl)
 {
int err;
@@ -698,6 +678,9 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
dev_dbg(sl-dev, %s: registering %s as %p.\n, __func__,
dev_name(sl-dev), sl);
 
+   /* suppress for w1_family_notify before sending KOBJ_ADD */
+   dev_set_uevent_suppress(sl-dev, true);
+
err = device_register(sl-dev);
if (err  0) {
dev_err(sl-dev,
@@ -705,7 +688,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
dev_name(sl-dev), err);
return err;
}
-
+   w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
 
dev_set_uevent_suppress(sl-dev, false);
kobject_uevent(sl-dev.kobj, KOBJ_ADD);
@@ -799,6 +782,7 @@ int w1_unref_slave(struct w1_slave *sl)
msg.type = W1_SLAVE_REMOVE;
w1_netlink_send(sl-master, msg);
 
+   w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
device_unregister(sl-dev);
#ifdef DEBUG
memset(sl, 0, sizeof(*sl));
@@ -1186,10 +1170,6 @@ static int __init w1_init(void)
goto err_out_exit_init;
}
 
-   retval = bus_register_notifier(w1_bus_type, w1_bus_nb);
-   if (retval)
-   goto err_out_bus_unregister;
-
retval = driver_register(w1_master_driver);
if (retval) {
printk(KERN_ERR
-- 
1.7.10.4
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: w1: 3.14-rc7 - possible recursive locking detected

2014-04-13 Thread David Fries

Belisko Marek,
Here is a possible solution, could you give it a try and report back?

Greg Kroah-Hartman,
Evgeniy asked me to look into this report.  I don't have the
reporter's hardware configuration, but I wouldn't think that would be
needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
loading the slave device and manually adding a slave device with that
family id.  Even then I didn't reproduce the reported recursive
locking error.  I saw unrelated locking reports, but not this one.  I
wrote up the included patch, but that undoes the notify changes that
you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
that commit.  Specifically these two lines,

err = device_register(>dev);
...
+   dev_set_uevent_suppress(>dev, false);
+   kobject_uevent(>dev.kobj, KOBJ_ADD);

Wouldn't the default be to not suppress?  Nothing in the W1 system
enables suppressing so is that even needed?  (I'm fine with saying
it's a good idea).
device_register at some point must call device_add and device_add
calls kobject_uevent(>kobj, KOBJ_ADD);
so doesn't the KOBJ_ADD send the add a second time?  As in it
shouldn't be needed?
Can the suppress be called before device_register to avoid the
automatic notify, then after it returns setup the slave device as this
patch does to avoid this problem report, and then call the KOBJ_ADD to
make everything happy?

I'm based on commit ce7613db2d8d4d5af2587a.

On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
> Hi,
> 
> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov  wrote:
> > Hi everyone
> >
> > 24.03.2014, 01:50, "David Fries" :
> >> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
> >>>  booting latest 3.14-rc7 on gta04 board gives following warning:
> >>>
> >>>  [3.101409] Driver for 1-wire Dallas network protocol.
> >>>  [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> >>>  [3.400146]CPU0
> >>>  [3.402709]
> >>>  [3.405273]   lock(&(>bus_notifier)->rwsem);
> >>>  [3.410308]   lock(&(>bus_notifier)->rwsem);
> >>>  [3.415374]
> >>>  [3.415374]  *** DEADLOCK ***
> >>>
> >>>  AFAIK we didn't see that on (at least 3/14-rc2).
> >>
> >> The last drivers/w1 change went in before v3.14-rc1, so if it is
> >> something in the w1 code, it has either been there or something else
> >> in the rest of the kernel changed to expose it.  I'm using the ds2490
> >> USB w1 dongle, I didn't see that problem when enabling lock debugging
> >> on 3.14.0-rc5+ which includes some changes I'm working on.
> >>
> >> https://github.com/dfries/linux.git w1_rework
> >
> > Can it be a bug in omap w1 driver?
> We test with actual Linus master and it's even worse (it hangs completely).
> Trace look little bit different and it only happens when booting via
> device tree (not with board file).
> 
> [2.807434] Driver for 1-wire Dallas network protocol.
> [2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> Interrupt mode
> [3.034271] call_modprobe: w1-family-0x1
> [3.051818]
> [3.053405] =
> [3.059051] [ INFO: possible recursive locking detected ]
> [3.064727] 3.14.0-gta04 #556 Not tainted
> [3.068939] -
> [3.074615] w1_bus_master1/795 is trying to acquire lock:
> [3.080261]  (&(>bus_notifier)->rwsem){.+.+.+}, at:
> [] __blocking_notifier_call_chain+0x28/0x58
> [3.090911]
> [3.090911] but task is already holding lock:
> [3.097045]  (&(>bus_notifier)->rwsem){.+.+.+}, at:
> [] __blocking_notifier_call_chain+0x28/0x58
> [3.107666]
> [3.107666] other info that might help us debug this:
> [3.114501]  Possible unsafe locking scenario:
> [3.114501]
> [3.120727]CPU0
> [3.123291]
> [3.125854]   lock(&(>bus_notifier)->rwsem);
> [3.130889]   lock(&(>bus_notifier)->rwsem);
> [3.135925]
> [3.135925]  *** DEADLOCK ***
> [3.135925]
> [3.142150]  May be due to missing lock nesting notation
> [3.142150]
> [3.149291] 2 locks held by w1_bus_master1/795:
> [3.154052]  #0:  (>mutex#3){+.+.+.}, at: []
> w1_attach_slave_device+0xc4/0x1c8
> [3.163055]  #1:  (&(>bus_notifier)->rwsem){.+.+.+}, at:
> [] __blocking_notifier_call_chain+0x28/0x58
> [3.174163]
> [3.174163] stack backtrace:
> [3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
> 3.14.0-gta04 #556
>

Re: w1: 3.14-rc7 - possible recursive locking detected

2014-04-13 Thread David Fries

Belisko Marek,
Here is a possible solution, could you give it a try and report back?

Greg Kroah-Hartman,
Evgeniy asked me to look into this report.  I don't have the
reporter's hardware configuration, but I wouldn't think that would be
needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
loading the slave device and manually adding a slave device with that
family id.  Even then I didn't reproduce the reported recursive
locking error.  I saw unrelated locking reports, but not this one.  I
wrote up the included patch, but that undoes the notify changes that
you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
that commit.  Specifically these two lines,

err = device_register(sl-dev);
...
+   dev_set_uevent_suppress(sl-dev, false);
+   kobject_uevent(sl-dev.kobj, KOBJ_ADD);

Wouldn't the default be to not suppress?  Nothing in the W1 system
enables suppressing so is that even needed?  (I'm fine with saying
it's a good idea).
device_register at some point must call device_add and device_add
calls kobject_uevent(dev-kobj, KOBJ_ADD);
so doesn't the KOBJ_ADD send the add a second time?  As in it
shouldn't be needed?
Can the suppress be called before device_register to avoid the
automatic notify, then after it returns setup the slave device as this
patch does to avoid this problem report, and then call the KOBJ_ADD to
make everything happy?

I'm based on commit ce7613db2d8d4d5af2587a.

On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
 Hi,
 
 On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov z...@ioremap.net wrote:
  Hi everyone
 
  24.03.2014, 01:50, David Fries da...@fries.net:
  On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
   booting latest 3.14-rc7 on gta04 board gives following warning:
 
   [3.101409] Driver for 1-wire Dallas network protocol.
   [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
   [3.400146]CPU0
   [3.402709]
   [3.405273]   lock((priv-bus_notifier)-rwsem);
   [3.410308]   lock((priv-bus_notifier)-rwsem);
   [3.415374]
   [3.415374]  *** DEADLOCK ***
 
   AFAIK we didn't see that on (at least 3/14-rc2).
 
  The last drivers/w1 change went in before v3.14-rc1, so if it is
  something in the w1 code, it has either been there or something else
  in the rest of the kernel changed to expose it.  I'm using the ds2490
  USB w1 dongle, I didn't see that problem when enabling lock debugging
  on 3.14.0-rc5+ which includes some changes I'm working on.
 
  https://github.com/dfries/linux.git w1_rework
 
  Can it be a bug in omap w1 driver?
 We test with actual Linus master and it's even worse (it hangs completely).
 Trace look little bit different and it only happens when booting via
 device tree (not with board file).
 
 [2.807434] Driver for 1-wire Dallas network protocol.
 [2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
 Interrupt mode
 [3.034271] call_modprobe: w1-family-0x1
 [3.051818]
 [3.053405] =
 [3.059051] [ INFO: possible recursive locking detected ]
 [3.064727] 3.14.0-gta04 #556 Not tainted
 [3.068939] -
 [3.074615] w1_bus_master1/795 is trying to acquire lock:
 [3.080261]  ((priv-bus_notifier)-rwsem){.+.+.+}, at:
 [c0058ab0] __blocking_notifier_call_chain+0x28/0x58
 [3.090911]
 [3.090911] but task is already holding lock:
 [3.097045]  ((priv-bus_notifier)-rwsem){.+.+.+}, at:
 [c0058ab0] __blocking_notifier_call_chain+0x28/0x58
 [3.107666]
 [3.107666] other info that might help us debug this:
 [3.114501]  Possible unsafe locking scenario:
 [3.114501]
 [3.120727]CPU0
 [3.123291]
 [3.125854]   lock((priv-bus_notifier)-rwsem);
 [3.130889]   lock((priv-bus_notifier)-rwsem);
 [3.135925]
 [3.135925]  *** DEADLOCK ***
 [3.135925]
 [3.142150]  May be due to missing lock nesting notation
 [3.142150]
 [3.149291] 2 locks held by w1_bus_master1/795:
 [3.154052]  #0:  (dev-mutex#3){+.+.+.}, at: [c03ba1c8]
 w1_attach_slave_device+0xc4/0x1c8
 [3.163055]  #1:  ((priv-bus_notifier)-rwsem){.+.+.+}, at:
 [c0058ab0] __blocking_notifier_call_chain+0x28/0x58
 [3.174163]
 [3.174163] stack backtrace:
 [3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
 3.14.0-gta04 #556
 [3.186370] [c0013ca0] (unwind_backtrace) from [c0010f68]
 (show_stack+0x10/0x14)
 [3.194488] [c0010f68] (show_stack) from [c04f0900]
 (dump_stack+0x6c/0x88)
 [3.202117] [c04f0900] (dump_stack) from [c00746bc]
 (print_deadlock_bug+0xc0/0xf0)
 [3.210449] [c00746bc] (print_deadlock_bug) from [c0075f80]
 (validate_chain.isra.29+0x4dc/0x534)
 [3.220031] [c0075f80] (validate_chain.isra.29) from [c0076dac]
 (__lock_acquire+0x728/0x834)
 [3.229248] [c0076dac] (__lock_acquire) from [c00774b8]
 (lock_acquire+0xf4

Re: [PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-04-08 Thread David Fries

This patch is a bug fix, and I see from the mailing list I'm not the
only one to run into this bug, so it would be nice for this patch to
make it into this merge window.  I didn't tag this one for stable
because it doesn't apply cleanly due to previous changes that did make
it into the merge window.  Let me know if I should rewrite it for
stable and which kernel version.

The other two patches are more feature based changes.

On Tue, Apr 08, 2014 at 10:37:07PM -0500, David Fries wrote:
> If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
> is taken when searching for the slave or master device.  If there
> isn't any following data m->len (mlen is a copy) is 0 and packing up
> the message for later execution is skipped leaving nothing to
> decrement the reference counts.
> 
> Way back when, m->len was checked before the search that increments the
> reference count, but W1_LIST_MASTERS has no additional data, the check
> was moved in 9be62e0b2fadaf5ff causing this bug.
> 
> This change reorders to put the check before the reference count is
> incremented avoiding the problem.
> 
> Signed-off-by: David Fries 
> Acked-by: Evgeniy Polyakov 

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] w1: optional bundling of netlink kernel replies

2014-04-08 Thread David Fries

Applications can submit a set of commands in one packet to the kernel,
and in some cases it is required such as reading the temperature
sensor results.  This adds an option W1_CN_BUNDLE to the flags of
cn_msg to request the kernel to reply in one packet for efficiency.

The cn_msg flags now check for unknown flag values and return an error
if one is seen.  See "Proper handling of unknown flags in system
calls" http://lwn.net/Articles/588444/

This corrects the ack values returned as per the protocol standard,
namely the original ack for status messages and seq + 1 for all others
such as the data returned from a read.

Some of the common variable names have been standardized as follows.
struct cn_msg *cn
struct w1_netlink_msg *msg
struct w1_netlink_cmd *cmd
struct w1_master *dev

When an argument and a function scope variable would collide, add req_
to the argument.

Signed-off-by: David Fries 
Acked-by: Evgeniy Polyakov 
---
 Documentation/connector/connector.txt |2 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  649 -
 drivers/w1/w1_netlink.h   |   36 ++
 6 files changed, 447 insertions(+), 263 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e56abdb..f6215f9 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic
index a31c5a2..b2033c6 100644
--- a/Documentation/w1/w1.generic
+++ b/Documentation/w1/w1.generic
@@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver
 w1_master_add  - Manually register a slave device
 w1_master_attempts - the number of times a search was attempted
 w1_master_max_slave_count
-   - the maximum slaves that may be attached to a master
+   - maximum number of slaves to search for at a time
 w1_master_name - the name of the device (w1_bus_masterX)
 w1_master_pullup   - 5V strong pullup 0 enabled, 1 disabled
 w1_master_remove   - Manually remove a slave device
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index 927a52c..ef27271 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -30,7 +30,7 @@ Protocol.
W1_SLAVE_CMD
userspace command for slave device
(read/write/touch)
-   __u8 res- reserved
+   __u8 status - error indication from kernel
__u16 len   - size of data attached to this header data
union {
__u8 id[8];  - slave unique device id
@@ -44,10 +44,14 @@ Protocol.
__u8 cmd- command opcode.
W1_CMD_READ - read command
W1_CMD_WRITE- write command
-   W1_CMD_TOUCH- touch command
-   (write and sample data back to userspace)
W1_CMD_SEARCH   - search command
W1_CMD_ALARM_SEARCH - alarm search command
+   W1_CMD_TOUCH- touch command
+   (write and sample data back to userspace)
+   W1_CMD_RESET- send bus reset
+   W1_CMD_SLAVE_ADD- add slave to kernel list
+   W1_CMD_SLAVE_REMOVE - remove slave from kernel list
+   W1_CMD_LIST_SLAVES  - get slaves list from kernel
__u8 res- reserved
__u16 len   - length of data for this command
For read command data must be allocated like for write command
@@ -87,8 +91,7 @@ format:
id0 ... idN
 
Each message is at most 4k in size, so if number of master devices
-   exceeds this, it will be split into several messages,
-   cn.seq will be increased for each one.
+   exceeds this, it will be split into several messages.
 
 W1 search and alarm search commands.
 request:
diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count

[PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-04-08 Thread David Fries

If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
is taken when searching for the slave or master device.  If there
isn't any following data m->len (mlen is a copy) is 0 and packing up
the message for later execution is skipped leaving nothing to
decrement the reference counts.

Way back when, m->len was checked before the search that increments the
reference count, but W1_LIST_MASTERS has no additional data, the check
was moved in 9be62e0b2fadaf5ff causing this bug.

This change reorders to put the check before the reference count is
incremented avoiding the problem.

Signed-off-by: David Fries 
Acked-by: Evgeniy Polyakov 
---
 drivers/w1/w1_netlink.c |   44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 5234964..a02704a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg,
struct w1_netlink_msg *w;
u32 *id;
 
-   if (mcmd->type != W1_LIST_MASTERS) {
-   printk(KERN_NOTICE "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
-   __func__, msg->id.idx, msg->id.val, mcmd->type, 
mcmd->len);
-   return -EPROTO;
-   }
-
cn = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!cn)
return -ENOMEM;
@@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
w1_netlink_send_error(>block->msg, node->m, cmd,
node->block->portid, err);
 
+   /* ref taken in w1_search_slave or w1_search_master_id when building
+* the block
+*/
if (sl)
w1_unref_slave(sl);
else
@@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 
msg_len = msg->len;
while (msg_len && !err) {
-   struct w1_reg_num id;
-   u16 mlen = m->len;
 
dev = NULL;
sl = NULL;
 
-   memcpy(, m->id.id, sizeof(id));
-#if 0
-   printk("%s: %02x.%012llx.%02x: type=%02x, len=%u.\n",
-   __func__, id.family, (unsigned long long)id.id, 
id.crc, m->type, m->len);
-#endif
if (m->len + sizeof(struct w1_netlink_msg) > msg_len) {
err = -E2BIG;
break;
}
 
+   /* execute on this thread, no need to process later */
+   if (m->type == W1_LIST_MASTERS) {
+   err = w1_process_command_root(msg, m, nsp->portid);
+   goto out_cont;
+   }
+
+   /* All following message types require additional data,
+* check here before references are taken.
+*/
+   if (!m->len) {
+   err = -EPROTO;
+   goto out_cont;
+   }
+
+   /* both search calls take reference counts */
if (m->type == W1_MASTER_CMD) {
dev = w1_search_master_id(m->id.mst.id);
} else if (m->type == W1_SLAVE_CMD) {
-   sl = w1_search_slave();
+   sl = w1_search_slave((struct w1_reg_num *)m->id.id);
if (sl)
dev = sl->master;
} else {
-   err = w1_process_command_root(msg, m, nsp->portid);
+   printk(KERN_NOTICE
+   "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
+   __func__, msg->id.idx, msg->id.val,
+   m->type, m->len);
+   err = -EPROTO;
goto out_cont;
}
 
@@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
}
 
err = 0;
-   if (!mlen)
-   goto out_cont;
 
atomic_inc(>refcnt);
node->async.cb = w1_process_cb;
@@ -557,7 +564,8 @@ out_cont:
if (err)
w1_netlink_send_error(msg, m, NULL, nsp->portid, err);
msg_len -= sizeof(struct w1_netlink_msg) + m->len;
-   m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct 
w1_netlink_msg) + m->len);
+   m = (struct w1_netlink_msg *)(((u8 *)m) +
+   sizeof(struct w1_netlink_msg) + m->len);
 
/*
 * Let's allow requests for nonexisting devices.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/3] connector: allow multiple messages to be sent in one packet

2014-04-08 Thread David Fries

This increases the amount of bundling to reduce the number of packets
sent.  For the one wire use there can be multiple struct
w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in
struct cn_msg, and with this change multiple of those in a struct
nlmsghdr, and at each level the len identifies there being multiple of
the next.

Signed-off-by: David Fries 
Acked-by: Evgeniy Polyakov 
---
 Documentation/connector/connector.txt |   13 ++---
 drivers/connector/connector.c |   17 +++--
 include/linux/connector.h |1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e5c5f5e..e56abdb 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a 
significantly
 easier way:
 
 int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
+void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__group, int gfp_mask);
+void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int 
gfp_mask);
 
 struct cb_id
 {
@@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id);
  struct cb_id *id  - unique connector's user identifier.
 
 
-int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__groups, int gfp_mask);
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int 
gfp_mask);
 
  Sends message to the specified groups.  It can be safely called from
  softirq context, but may silently fail under strong memory pressure.
  If there are no listeners for given group -ESRCH can be returned.
 
  struct cn_msg *   - message header(with attached data).
+ u16 len   - for *_multi multiple cn_msg messages can be 
sent
+ u32 port  - destination port.
+ If non-zero the message will be sent to the
+ given port, which should be set to the
+ original sender.
  u32 __group   - destination group.
- If __group is zero, then appropriate group 
will
+ If port and __group is zero, then appropriate 
group will
  be searched through all registered connector 
users,
  and message will be delivered to the group 
which was
  created for user with the same ID as in msg.
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index b14f1d3..f612d68 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -43,6 +43,8 @@ static struct cn_dev cdev;
 static int cn_already_initialized;
 
 /*
+ * Sends mult (multiple) cn_msg at a time.
+ *
  * msg->seq and msg->ack are used to determine message genealogy.
  * When someone sends message it puts there locally unique sequence
  * and random acknowledge numbers.  Sequence number may be copied into
@@ -62,10 +64,13 @@ static int cn_already_initialized;
  * the acknowledgement number in the original message + 1, then it is
  * a new message.
  *
+ * If msg->len != len, then additional cn_msg messages are expected following
+ * the first msg.
+ *
  * The message is sent to, the portid if given, the group if given, both if
  * both, or if both are zero then the group is looked up and sent there.
  */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
gfp_t gfp_mask)
 {
struct cn_callback_entry *__cbq;
@@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
if (!portid && !netlink_has_listeners(dev->nls, group))
return -ESRCH;
 
-   size = sizeof(*msg) + msg->len;
+   size = sizeof(*msg) + len;
 
skb = nlmsg_new(size, gfp_mask);
if (!skb)
@@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
 gfp_mask);
return netlink_unicast(dev->nls, skb, portid, !(gfp_mask&__GFP_WAIT));
 }
+EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
+
+/* same as cn_netlink_send_mult except msg->len is used for len */
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+   gfp_t gfp_mask)
+{
+   return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask);
+}
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
diff --git a/include/linux/connector.h b/include/linux/connector.h
index be9c4747..f8fe863 100644
--- a/include/linux/connect

[PATCH 0/3] w1: fixes and bundling replies

2014-04-08 Thread David Fries

A program can bundle a sequence of commands in one netlink packet but
the kernel was having to reply in many different packets, this adds a
flag W1_CN_BUNDLE in the cn_msg.flags to allow the kernel to bundle
the replies in to one message.  This is opt in to avoid breaking
programs that aren't expecting additional messages.

Netlink connector now has a new call cn_netlink_send_multi, which
allow sending multiple cn_msg structures in a nlmsghdr structure.

I tested with my client program that will bundle up 14 temperature
sensor conversions, then after a delay, bundle up another set of
commands to read all 14 with the bundle bit set.  I also tested with a
two year old version of the software that sends requests two one slave
at a time (bundling only the write/read to get the data out), and
doesn't have code to read the bundling the this patch adds.  Both
operate correctly running at the same time.

 Documentation/connector/connector.txt |   15 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/connector/connector.c |   17 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  673 -
 drivers/w1/w1_netlink.h   |   36 ++
 include/linux/connector.h |1 +
 8 files changed, 489 insertions(+), 276 deletions(-)

[PATCH 1/3] w1: fix netlink refcnt leak on error path
[PATCH 2/3] connector: allow multiple messages to be sent in one
[PATCH 3/3] w1: optional bundling of netlink kernel replies
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 0/3] w1: fixes and bundling replies

2014-04-08 Thread David Fries

A program can bundle a sequence of commands in one netlink packet but
the kernel was having to reply in many different packets, this adds a
flag W1_CN_BUNDLE in the cn_msg.flags to allow the kernel to bundle
the replies in to one message.  This is opt in to avoid breaking
programs that aren't expecting additional messages.

Netlink connector now has a new call cn_netlink_send_multi, which
allow sending multiple cn_msg structures in a nlmsghdr structure.

I tested with my client program that will bundle up 14 temperature
sensor conversions, then after a delay, bundle up another set of
commands to read all 14 with the bundle bit set.  I also tested with a
two year old version of the software that sends requests two one slave
at a time (bundling only the write/read to get the data out), and
doesn't have code to read the bundling the this patch adds.  Both
operate correctly running at the same time.

 Documentation/connector/connector.txt |   15 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/connector/connector.c |   17 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  673 -
 drivers/w1/w1_netlink.h   |   36 ++
 include/linux/connector.h |1 +
 8 files changed, 489 insertions(+), 276 deletions(-)

[PATCH 1/3] w1: fix netlink refcnt leak on error path
[PATCH 2/3] connector: allow multiple messages to be sent in one
[PATCH 3/3] w1: optional bundling of netlink kernel replies
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] w1: optional bundling of netlink kernel replies

2014-04-08 Thread David Fries

Applications can submit a set of commands in one packet to the kernel,
and in some cases it is required such as reading the temperature
sensor results.  This adds an option W1_CN_BUNDLE to the flags of
cn_msg to request the kernel to reply in one packet for efficiency.

The cn_msg flags now check for unknown flag values and return an error
if one is seen.  See Proper handling of unknown flags in system
calls http://lwn.net/Articles/588444/

This corrects the ack values returned as per the protocol standard,
namely the original ack for status messages and seq + 1 for all others
such as the data returned from a read.

Some of the common variable names have been standardized as follows.
struct cn_msg *cn
struct w1_netlink_msg *msg
struct w1_netlink_cmd *cmd
struct w1_master *dev

When an argument and a function scope variable would collide, add req_
to the argument.

Signed-off-by: David Fries da...@fries.net
Acked-by: Evgeniy Polyakov z...@ioremap.net
---
 Documentation/connector/connector.txt |2 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  649 -
 drivers/w1/w1_netlink.h   |   36 ++
 6 files changed, 447 insertions(+), 263 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e56abdb..f6215f9 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic
index a31c5a2..b2033c6 100644
--- a/Documentation/w1/w1.generic
+++ b/Documentation/w1/w1.generic
@@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver
 w1_master_add  - Manually register a slave device
 w1_master_attempts - the number of times a search was attempted
 w1_master_max_slave_count
-   - the maximum slaves that may be attached to a master
+   - maximum number of slaves to search for at a time
 w1_master_name - the name of the device (w1_bus_masterX)
 w1_master_pullup   - 5V strong pullup 0 enabled, 1 disabled
 w1_master_remove   - Manually remove a slave device
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index 927a52c..ef27271 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -30,7 +30,7 @@ Protocol.
W1_SLAVE_CMD
userspace command for slave device
(read/write/touch)
-   __u8 res- reserved
+   __u8 status - error indication from kernel
__u16 len   - size of data attached to this header data
union {
__u8 id[8];  - slave unique device id
@@ -44,10 +44,14 @@ Protocol.
__u8 cmd- command opcode.
W1_CMD_READ - read command
W1_CMD_WRITE- write command
-   W1_CMD_TOUCH- touch command
-   (write and sample data back to userspace)
W1_CMD_SEARCH   - search command
W1_CMD_ALARM_SEARCH - alarm search command
+   W1_CMD_TOUCH- touch command
+   (write and sample data back to userspace)
+   W1_CMD_RESET- send bus reset
+   W1_CMD_SLAVE_ADD- add slave to kernel list
+   W1_CMD_SLAVE_REMOVE - remove slave from kernel list
+   W1_CMD_LIST_SLAVES  - get slaves list from kernel
__u8 res- reserved
__u16 len   - length of data for this command
For read command data must be allocated like for write command
@@ -87,8 +91,7 @@ format:
id0 ... idN
 
Each message is at most 4k in size, so if number of master devices
-   exceeds this, it will be split into several messages,
-   cn.seq will be increased for each one.
+   exceeds this, it will be split into several messages.
 
 W1 search and alarm search commands.
 request:
diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count

[PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-04-08 Thread David Fries

If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
is taken when searching for the slave or master device.  If there
isn't any following data m-len (mlen is a copy) is 0 and packing up
the message for later execution is skipped leaving nothing to
decrement the reference counts.

Way back when, m-len was checked before the search that increments the
reference count, but W1_LIST_MASTERS has no additional data, the check
was moved in 9be62e0b2fadaf5ff causing this bug.

This change reorders to put the check before the reference count is
incremented avoiding the problem.

Signed-off-by: David Fries da...@fries.net
Acked-by: Evgeniy Polyakov z...@ioremap.net
---
 drivers/w1/w1_netlink.c |   44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 5234964..a02704a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg,
struct w1_netlink_msg *w;
u32 *id;
 
-   if (mcmd-type != W1_LIST_MASTERS) {
-   printk(KERN_NOTICE %s: msg: %x.%x, wrong type: %u, len: %u.\n,
-   __func__, msg-id.idx, msg-id.val, mcmd-type, 
mcmd-len);
-   return -EPROTO;
-   }
-
cn = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!cn)
return -ENOMEM;
@@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
w1_netlink_send_error(node-block-msg, node-m, cmd,
node-block-portid, err);
 
+   /* ref taken in w1_search_slave or w1_search_master_id when building
+* the block
+*/
if (sl)
w1_unref_slave(sl);
else
@@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 
msg_len = msg-len;
while (msg_len  !err) {
-   struct w1_reg_num id;
-   u16 mlen = m-len;
 
dev = NULL;
sl = NULL;
 
-   memcpy(id, m-id.id, sizeof(id));
-#if 0
-   printk(%s: %02x.%012llx.%02x: type=%02x, len=%u.\n,
-   __func__, id.family, (unsigned long long)id.id, 
id.crc, m-type, m-len);
-#endif
if (m-len + sizeof(struct w1_netlink_msg)  msg_len) {
err = -E2BIG;
break;
}
 
+   /* execute on this thread, no need to process later */
+   if (m-type == W1_LIST_MASTERS) {
+   err = w1_process_command_root(msg, m, nsp-portid);
+   goto out_cont;
+   }
+
+   /* All following message types require additional data,
+* check here before references are taken.
+*/
+   if (!m-len) {
+   err = -EPROTO;
+   goto out_cont;
+   }
+
+   /* both search calls take reference counts */
if (m-type == W1_MASTER_CMD) {
dev = w1_search_master_id(m-id.mst.id);
} else if (m-type == W1_SLAVE_CMD) {
-   sl = w1_search_slave(id);
+   sl = w1_search_slave((struct w1_reg_num *)m-id.id);
if (sl)
dev = sl-master;
} else {
-   err = w1_process_command_root(msg, m, nsp-portid);
+   printk(KERN_NOTICE
+   %s: msg: %x.%x, wrong type: %u, len: %u.\n,
+   __func__, msg-id.idx, msg-id.val,
+   m-type, m-len);
+   err = -EPROTO;
goto out_cont;
}
 
@@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
}
 
err = 0;
-   if (!mlen)
-   goto out_cont;
 
atomic_inc(block-refcnt);
node-async.cb = w1_process_cb;
@@ -557,7 +564,8 @@ out_cont:
if (err)
w1_netlink_send_error(msg, m, NULL, nsp-portid, err);
msg_len -= sizeof(struct w1_netlink_msg) + m-len;
-   m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct 
w1_netlink_msg) + m-len);
+   m = (struct w1_netlink_msg *)(((u8 *)m) +
+   sizeof(struct w1_netlink_msg) + m-len);
 
/*
 * Let's allow requests for nonexisting devices.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/3] connector: allow multiple messages to be sent in one packet

2014-04-08 Thread David Fries

This increases the amount of bundling to reduce the number of packets
sent.  For the one wire use there can be multiple struct
w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in
struct cn_msg, and with this change multiple of those in a struct
nlmsghdr, and at each level the len identifies there being multiple of
the next.

Signed-off-by: David Fries da...@fries.net
Acked-by: Evgeniy Polyakov z...@ioremap.net
---
 Documentation/connector/connector.txt |   13 ++---
 drivers/connector/connector.c |   17 +++--
 include/linux/connector.h |1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e5c5f5e..e56abdb 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a 
significantly
 easier way:
 
 int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
+void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__group, int gfp_mask);
+void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int 
gfp_mask);
 
 struct cb_id
 {
@@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id);
  struct cb_id *id  - unique connector's user identifier.
 
 
-int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__groups, int gfp_mask);
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int 
gfp_mask);
 
  Sends message to the specified groups.  It can be safely called from
  softirq context, but may silently fail under strong memory pressure.
  If there are no listeners for given group -ESRCH can be returned.
 
  struct cn_msg *   - message header(with attached data).
+ u16 len   - for *_multi multiple cn_msg messages can be 
sent
+ u32 port  - destination port.
+ If non-zero the message will be sent to the
+ given port, which should be set to the
+ original sender.
  u32 __group   - destination group.
- If __group is zero, then appropriate group 
will
+ If port and __group is zero, then appropriate 
group will
  be searched through all registered connector 
users,
  and message will be delivered to the group 
which was
  created for user with the same ID as in msg.
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index b14f1d3..f612d68 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -43,6 +43,8 @@ static struct cn_dev cdev;
 static int cn_already_initialized;
 
 /*
+ * Sends mult (multiple) cn_msg at a time.
+ *
  * msg-seq and msg-ack are used to determine message genealogy.
  * When someone sends message it puts there locally unique sequence
  * and random acknowledge numbers.  Sequence number may be copied into
@@ -62,10 +64,13 @@ static int cn_already_initialized;
  * the acknowledgement number in the original message + 1, then it is
  * a new message.
  *
+ * If msg-len != len, then additional cn_msg messages are expected following
+ * the first msg.
+ *
  * The message is sent to, the portid if given, the group if given, both if
  * both, or if both are zero then the group is looked up and sent there.
  */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
gfp_t gfp_mask)
 {
struct cn_callback_entry *__cbq;
@@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
if (!portid  !netlink_has_listeners(dev-nls, group))
return -ESRCH;
 
-   size = sizeof(*msg) + msg-len;
+   size = sizeof(*msg) + len;
 
skb = nlmsg_new(size, gfp_mask);
if (!skb)
@@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
 gfp_mask);
return netlink_unicast(dev-nls, skb, portid, !(gfp_mask__GFP_WAIT));
 }
+EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
+
+/* same as cn_netlink_send_mult except msg-len is used for len */
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+   gfp_t gfp_mask)
+{
+   return cn_netlink_send_mult(msg, msg-len, portid, __group, gfp_mask);
+}
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
diff --git a/include/linux/connector.h b/include/linux/connector.h
index be9c4747..f8fe863 100644
--- a/include/linux/connector.h
+++ b/include

Re: [PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-04-08 Thread David Fries

This patch is a bug fix, and I see from the mailing list I'm not the
only one to run into this bug, so it would be nice for this patch to
make it into this merge window.  I didn't tag this one for stable
because it doesn't apply cleanly due to previous changes that did make
it into the merge window.  Let me know if I should rewrite it for
stable and which kernel version.

The other two patches are more feature based changes.

On Tue, Apr 08, 2014 at 10:37:07PM -0500, David Fries wrote:
 If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
 is taken when searching for the slave or master device.  If there
 isn't any following data m-len (mlen is a copy) is 0 and packing up
 the message for later execution is skipped leaving nothing to
 decrement the reference counts.
 
 Way back when, m-len was checked before the search that increments the
 reference count, but W1_LIST_MASTERS has no additional data, the check
 was moved in 9be62e0b2fadaf5ff causing this bug.
 
 This change reorders to put the check before the reference count is
 incremented avoiding the problem.
 
 Signed-off-by: David Fries da...@fries.net
 Acked-by: Evgeniy Polyakov z...@ioremap.net

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [RFC] w1: fixes and bundling replies

2014-04-07 Thread David Fries

Evgeniy,
Could you review this set of patches in this thread?  Thanks.

On Sat, Mar 22, 2014 at 08:27:44PM -0500, David Fries wrote:
> On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote:
> > Your approach and patch seem correct, but I worry about how old
> > commands are processed.  Do I get it right, that replies to old
> > non-bundle commands are queued back instead of sending immediately,
> > but since it is not bundle but single command, queued reply is being
> > sent 'almost' immediately?  Basically, nothing changed to old
> > clients, but there is new way to send batch requests now?
> 
> Yes, with the previous patch set, if a client sent one command per
> message they would get the replies in individual packets.  However in
> some cases clients had to bundle multiple commands in one message.
> For instance when using a slave command to read a temperature sensor
> it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then
> another W1_CMD_READ command to read the data back and they have to be
> sent in the same W1_SLAVE_CMD because there can't be a reset between
> the two write and read, and you can't do that with a slave command.
> You can with master commands, but if you are going to issue individual
> reset, write, and read commands why would you not bundle them?
> 
> Here is a new set of patches making bundling opt in.
> 
> In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags
> enables the kernel to bundle the reply, otherwise replies aren't
> bundled.
> 
> The previous patch separated the data replies and status replies
> because they have a different ack value which is in the cn_msg and
> cn_netlink_send could only send one cn_msg in a call.  I didn't much
> like that solution because now status and data replies were out of
> order.  This version creates cn_netlink_send_mult which takes a length
> which allows multiple cn_msg structures to be sent at once (inside one
> nlmsghdr, so clients can see there are more cn_msg structures to
> read), so now status and data messages can be sent in order.
> 
> This is somewhat suboptimal as each command has a status reply and
> possibly a data reply, each requiring a different ack, so even if a
> client sent multiple commands in one w1_netlink_msg like,
> 
> cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read]
> 
> the response can't pack the commands into one w1_netlink_msg because
> of the ack, so there's duplicate of cn_msg and w1_netlink_msg
> structures,
> 
> cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
> cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
> cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
> cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
> 
> cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate
> packets with all the context switches, select overhead and such,
> bundling is going to be well worth it.
> 
> Another alternative could be one cn_msg [seq+1] with the data, and
> followed by cn_msg [ack ] with the status messages, but they are back
> to out of order.
> 
> I wasn't completely sure what to call the new sending function, I
> decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len
> as another idea, or if there are any better ideas let me know.
> 
> I tested with my client program that will bundle up 14 temperature
> sensor conversions, then after a delay, bundle up another set of
> commands to read all 14 with the bundle bit set.  I also tested with a
> two year old version of the software that sends requests two one slave
> at a time (bundling only the write/read to get the data out), and
> doesn't have code to read the bundling the this patch adds.  Both
> operate correctly running at the same time.
> 
> Posted before, no changes
> 
> connector support for sending multiple cn_msg
> 
> bundling,
> corrects ack value to previous ack or seq + 1
> corrects the variables to be name consistently
> 
>  Documentation/connector/connector.txt |   15 +-
>  Documentation/w1/w1.generic   |2 +-
>  Documentation/w1/w1.netlink   |   13 +-
>  drivers/connector/connector.c |   17 +-
>  drivers/w1/w1.h   |8 -
>  drivers/w1/w1_netlink.c   |  673 
> -
>  drivers/w1/w1_netlink.h   |   36 ++
>  include/linux/connector.h |1 +
>  8 files changed, 489 insertions(+), 276 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at

Re: [RFC] w1: fixes and bundling replies

2014-04-07 Thread David Fries

Evgeniy,
Could you review this set of patches in this thread?  Thanks.

On Sat, Mar 22, 2014 at 08:27:44PM -0500, David Fries wrote:
 On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote:
  Your approach and patch seem correct, but I worry about how old
  commands are processed.  Do I get it right, that replies to old
  non-bundle commands are queued back instead of sending immediately,
  but since it is not bundle but single command, queued reply is being
  sent 'almost' immediately?  Basically, nothing changed to old
  clients, but there is new way to send batch requests now?
 
 Yes, with the previous patch set, if a client sent one command per
 message they would get the replies in individual packets.  However in
 some cases clients had to bundle multiple commands in one message.
 For instance when using a slave command to read a temperature sensor
 it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then
 another W1_CMD_READ command to read the data back and they have to be
 sent in the same W1_SLAVE_CMD because there can't be a reset between
 the two write and read, and you can't do that with a slave command.
 You can with master commands, but if you are going to issue individual
 reset, write, and read commands why would you not bundle them?
 
 Here is a new set of patches making bundling opt in.
 
 In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags
 enables the kernel to bundle the reply, otherwise replies aren't
 bundled.
 
 The previous patch separated the data replies and status replies
 because they have a different ack value which is in the cn_msg and
 cn_netlink_send could only send one cn_msg in a call.  I didn't much
 like that solution because now status and data replies were out of
 order.  This version creates cn_netlink_send_mult which takes a length
 which allows multiple cn_msg structures to be sent at once (inside one
 nlmsghdr, so clients can see there are more cn_msg structures to
 read), so now status and data messages can be sent in order.
 
 This is somewhat suboptimal as each command has a status reply and
 possibly a data reply, each requiring a different ack, so even if a
 client sent multiple commands in one w1_netlink_msg like,
 
 cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read]
 
 the response can't pack the commands into one w1_netlink_msg because
 of the ack, so there's duplicate of cn_msg and w1_netlink_msg
 structures,
 
 cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
 cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
 cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
 cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
 
 cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate
 packets with all the context switches, select overhead and such,
 bundling is going to be well worth it.
 
 Another alternative could be one cn_msg [seq+1] with the data, and
 followed by cn_msg [ack ] with the status messages, but they are back
 to out of order.
 
 I wasn't completely sure what to call the new sending function, I
 decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len
 as another idea, or if there are any better ideas let me know.
 
 I tested with my client program that will bundle up 14 temperature
 sensor conversions, then after a delay, bundle up another set of
 commands to read all 14 with the bundle bit set.  I also tested with a
 two year old version of the software that sends requests two one slave
 at a time (bundling only the write/read to get the data out), and
 doesn't have code to read the bundling the this patch adds.  Both
 operate correctly running at the same time.
 
 Posted before, no changes
 
 connector support for sending multiple cn_msg
 
 bundling,
 corrects ack value to previous ack or seq + 1
 corrects the variables to be name consistently
 
  Documentation/connector/connector.txt |   15 +-
  Documentation/w1/w1.generic   |2 +-
  Documentation/w1/w1.netlink   |   13 +-
  drivers/connector/connector.c |   17 +-
  drivers/w1/w1.h   |8 -
  drivers/w1/w1_netlink.c   |  673 
 -
  drivers/w1/w1_netlink.h   |   36 ++
  include/linux/connector.h |1 +
  8 files changed, 489 insertions(+), 276 deletions(-)
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: w1: 3.14-rc7 - possible recursive locking detected

2014-03-23 Thread David Fries

On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
> Hi,
> 
> booting latest 3.14-rc7 on gta04 board gives following warning:
> 
> [3.101409] Driver for 1-wire Dallas network protocol.
> [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
> [3.400146]CPU0
> [3.402709]
> [3.405273]   lock(&(>bus_notifier)->rwsem);
> [3.410308]   lock(&(>bus_notifier)->rwsem);
> [3.415374]
> [3.415374]  *** DEADLOCK ***
> 
> AFAIK we didn't see that on (at least 3/14-rc2).

The last drivers/w1 change went in before v3.14-rc1, so if it is
something in the w1 code, it has either been there or something else
in the rest of the kernel changed to expose it.  I'm using the ds2490
USB w1 dongle, I didn't see that problem when enabling lock debugging
on 3.14.0-rc5+ which includes some changes I'm working on.

https://github.com/dfries/linux.git w1_rework

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: Fix refcount leak in netlink connector

2014-03-23 Thread David Fries

I've been doing an overhaul of the w1 netlink system and have a patch
outstanding to address this issue.  It requires patches already
accepted in gregkh char-misc-next.

w1: fix netlink refcnt leak on error path

I avoided the issue by checking the length before taking any
references to avoid adding another goto target.  Actually the changes
in char-misc-next removed one goto target so with my patch series that
routine only has one jump target, much easier to track what's going on
than having three.

Accepted patches (still need to apply the refcnt patch)
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git 
char-misc-next

Here's my current set of patches on top of char-misc-next.
7cf8baf26cdc2c27bcff562e9b2328ac891dc3a5
https://github.com/dfries/linux.git w1_rework

For my setup I have 14 ds18b20 one wire temperature sensors inside and
outside of my house connected to the ds2490 USB dongle that I have a
program talking over netlink to read every 30 seconds.  I'm curious
what's your setup?  I would appreciate it if I could get some testers
on the w1_rework branch and verify it doesn't break anything with
other programs.

The biggest user visible changes from the w1_rework branch are.
w1: new netlink commands, add/remove/list slaves
w1: process w1 netlink commands in w1_process thread
w1: reply only to the requester portid
ds2490 hardware search, much faster search in general, 23 seconds to .3 seconds
w1: fix netlink refcnt leak on error path
w1: optional bundling of netlink kernel replies

With all these changes the program is no longer blocked when issuing
commands because the w1_process thread now executes them.  That's a
pretty big deal if your program has something else to do while waiting
for a reply.  Then there's the optional bundling, so I can now send
one netlink packet which will request a conversion or read a previous
conversion for all temperature sensors and the kernel will reply with
one netlink packet with all the status or results, instead of
something like 56 individual packets.

On Sun, Mar 09, 2014 at 12:08:48AM +0100, Richard Weinberger wrote:
> If userspace sends a w1 message of length 0 we leak
> the refcount.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  drivers/w1/w1_netlink.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
> index 40788c9..7131777 100644
> --- a/drivers/w1/w1_netlink.c
> +++ b/drivers/w1/w1_netlink.c
> @@ -355,7 +355,7 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
> netlink_skb_parms *nsp)
>  
>   err = 0;
>   if (!mlen)
> - goto out_cont;
> + goto out_dec;
>  
>   mutex_lock(>mutex);
>  
> @@ -384,10 +384,11 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
> netlink_skb_parms *nsp)
>   mlen -= cmd->len + sizeof(struct w1_netlink_cmd);
>   }
>  out_up:
> + mutex_unlock(>mutex);
> +out_dec:
>   atomic_dec(>refcnt);
>   if (sl)
>   atomic_dec(>refcnt);
> - mutex_unlock(>mutex);
>  out_cont:
>   if (!cmd || err)
>   w1_netlink_send_error(msg, m, cmd, err);
> -- 
> 1.8.4.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] w1: Fix refcount leak in netlink connector

2014-03-23 Thread David Fries

I've been doing an overhaul of the w1 netlink system and have a patch
outstanding to address this issue.  It requires patches already
accepted in gregkh char-misc-next.

w1: fix netlink refcnt leak on error path

I avoided the issue by checking the length before taking any
references to avoid adding another goto target.  Actually the changes
in char-misc-next removed one goto target so with my patch series that
routine only has one jump target, much easier to track what's going on
than having three.

Accepted patches (still need to apply the refcnt patch)
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git 
char-misc-next

Here's my current set of patches on top of char-misc-next.
7cf8baf26cdc2c27bcff562e9b2328ac891dc3a5
https://github.com/dfries/linux.git w1_rework

For my setup I have 14 ds18b20 one wire temperature sensors inside and
outside of my house connected to the ds2490 USB dongle that I have a
program talking over netlink to read every 30 seconds.  I'm curious
what's your setup?  I would appreciate it if I could get some testers
on the w1_rework branch and verify it doesn't break anything with
other programs.

The biggest user visible changes from the w1_rework branch are.
w1: new netlink commands, add/remove/list slaves
w1: process w1 netlink commands in w1_process thread
w1: reply only to the requester portid
ds2490 hardware search, much faster search in general, 23 seconds to .3 seconds
w1: fix netlink refcnt leak on error path
w1: optional bundling of netlink kernel replies

With all these changes the program is no longer blocked when issuing
commands because the w1_process thread now executes them.  That's a
pretty big deal if your program has something else to do while waiting
for a reply.  Then there's the optional bundling, so I can now send
one netlink packet which will request a conversion or read a previous
conversion for all temperature sensors and the kernel will reply with
one netlink packet with all the status or results, instead of
something like 56 individual packets.

On Sun, Mar 09, 2014 at 12:08:48AM +0100, Richard Weinberger wrote:
 If userspace sends a w1 message of length 0 we leak
 the refcount.
 
 Signed-off-by: Richard Weinberger rich...@nod.at
 ---
  drivers/w1/w1_netlink.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
 index 40788c9..7131777 100644
 --- a/drivers/w1/w1_netlink.c
 +++ b/drivers/w1/w1_netlink.c
 @@ -355,7 +355,7 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
 netlink_skb_parms *nsp)
  
   err = 0;
   if (!mlen)
 - goto out_cont;
 + goto out_dec;
  
   mutex_lock(dev-mutex);
  
 @@ -384,10 +384,11 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
 netlink_skb_parms *nsp)
   mlen -= cmd-len + sizeof(struct w1_netlink_cmd);
   }
  out_up:
 + mutex_unlock(dev-mutex);
 +out_dec:
   atomic_dec(dev-refcnt);
   if (sl)
   atomic_dec(sl-refcnt);
 - mutex_unlock(dev-mutex);
  out_cont:
   if (!cmd || err)
   w1_netlink_send_error(msg, m, cmd, err);
 -- 
 1.8.4.2
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: w1: 3.14-rc7 - possible recursive locking detected

2014-03-23 Thread David Fries

On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
 Hi,
 
 booting latest 3.14-rc7 on gta04 board gives following warning:
 
 [3.101409] Driver for 1-wire Dallas network protocol.
 [3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
 [3.400146]CPU0
 [3.402709]
 [3.405273]   lock((priv-bus_notifier)-rwsem);
 [3.410308]   lock((priv-bus_notifier)-rwsem);
 [3.415374]
 [3.415374]  *** DEADLOCK ***
 
 AFAIK we didn't see that on (at least 3/14-rc2).

The last drivers/w1 change went in before v3.14-rc1, so if it is
something in the w1 code, it has either been there or something else
in the rest of the kernel changed to expose it.  I'm using the ds2490
USB w1 dongle, I didn't see that problem when enabling lock debugging
on 3.14.0-rc5+ which includes some changes I'm working on.

https://github.com/dfries/linux.git w1_rework

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 3/3] w1: optional bundling of netlink kernel replies

2014-03-22 Thread David Fries

Applications can submit a set of commands in one packet to the kernel,
and in some cases it is required such as reading the temperature
sensor results.  This adds an option W1_CN_BUNDLE to the flags of
cn_msg to request the kernel to reply in one packet for efficiency.

The cn_msg flags now check for unknown flag values and return an error
if one is seen.  See "Proper handling of unknown flags in system
calls" http://lwn.net/Articles/588444/

This corrects the ack values returned as per the protocol standard,
namely the original ack for status messages and seq + 1 for all others
such as the data returned from a read.

Signed-off-by: David Fries 

Some of the common variable names have been standardized as follows.
struct cn_msg *cn
struct w1_netlink_msg *msg
struct w1_netlink_cmd *cmd
struct w1_master *dev

When an argument and a function scope variable would collide, add req_
to the argument.
---
 Documentation/connector/connector.txt |2 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  649 -
 drivers/w1/w1_netlink.h   |   36 ++
 6 files changed, 447 insertions(+), 263 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e56abdb..f6215f9 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic
index a31c5a2..b2033c6 100644
--- a/Documentation/w1/w1.generic
+++ b/Documentation/w1/w1.generic
@@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver
 w1_master_add  - Manually register a slave device
 w1_master_attempts - the number of times a search was attempted
 w1_master_max_slave_count
-   - the maximum slaves that may be attached to a master
+   - maximum number of slaves to search for at a time
 w1_master_name - the name of the device (w1_bus_masterX)
 w1_master_pullup   - 5V strong pullup 0 enabled, 1 disabled
 w1_master_remove   - Manually remove a slave device
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index 927a52c..ef27271 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -30,7 +30,7 @@ Protocol.
W1_SLAVE_CMD
userspace command for slave device
(read/write/touch)
-   __u8 res- reserved
+   __u8 status - error indication from kernel
__u16 len   - size of data attached to this header data
union {
__u8 id[8];  - slave unique device id
@@ -44,10 +44,14 @@ Protocol.
__u8 cmd- command opcode.
W1_CMD_READ - read command
W1_CMD_WRITE- write command
-   W1_CMD_TOUCH- touch command
-   (write and sample data back to userspace)
W1_CMD_SEARCH   - search command
W1_CMD_ALARM_SEARCH - alarm search command
+   W1_CMD_TOUCH- touch command
+   (write and sample data back to userspace)
+   W1_CMD_RESET- send bus reset
+   W1_CMD_SLAVE_ADD- add slave to kernel list
+   W1_CMD_SLAVE_REMOVE - remove slave from kernel list
+   W1_CMD_LIST_SLAVES  - get slaves list from kernel
__u8 res- reserved
__u16 len   - length of data for this command
For read command data must be allocated like for write command
@@ -87,8 +91,7 @@ format:
id0 ... idN
 
Each message is at most 4k in size, so if number of master devices
-   exceeds this, it will be split into several messages,
-   cn.seq will be increased for each one.
+   exceeds this, it will be split into several messages.
 
 W1 search and alarm search commands.
 request:
diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count
  * @priv:  private da

[PATCH 2/3] connector: allow multiple messages to be sent in one packet

2014-03-22 Thread David Fries

This increases the amount of bundling to reduce the number of packets
sent.  For the one wire use there can be multiple struct
w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in
struct cn_msg, and with this change multiple of those in a struct
nlmsghdr, and at each level the len identifies there being multiple of
the next.

Signed-off-by: David Fries 
---
 Documentation/connector/connector.txt |   13 ++---
 drivers/connector/connector.c |   17 +++--
 include/linux/connector.h |1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e5c5f5e..e56abdb 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a 
significantly
 easier way:
 
 int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
+void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__group, int gfp_mask);
+void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int 
gfp_mask);
 
 struct cb_id
 {
@@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id);
  struct cb_id *id  - unique connector's user identifier.
 
 
-int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__groups, int gfp_mask);
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int 
gfp_mask);
 
  Sends message to the specified groups.  It can be safely called from
  softirq context, but may silently fail under strong memory pressure.
  If there are no listeners for given group -ESRCH can be returned.
 
  struct cn_msg *   - message header(with attached data).
+ u16 len   - for *_multi multiple cn_msg messages can be 
sent
+ u32 port  - destination port.
+ If non-zero the message will be sent to the
+ given port, which should be set to the
+ original sender.
  u32 __group   - destination group.
- If __group is zero, then appropriate group 
will
+ If port and __group is zero, then appropriate 
group will
  be searched through all registered connector 
users,
  and message will be delivered to the group 
which was
  created for user with the same ID as in msg.
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 77afe74..f4b0cdb 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -43,6 +43,8 @@ static struct cn_dev cdev;
 static int cn_already_initialized;
 
 /*
+ * Sends mult (multiple) cn_msg at a time.
+ *
  * msg->seq and msg->ack are used to determine message genealogy.
  * When someone sends message it puts there locally unique sequence
  * and random acknowledge numbers.  Sequence number may be copied into
@@ -62,10 +64,13 @@ static int cn_already_initialized;
  * the acknowledgement number in the original message + 1, then it is
  * a new message.
  *
+ * If msg->len != len, then additional cn_msg messages are expected following
+ * the first msg.
+ *
  * The message is sent to, the portid if given, the group if given, both if
  * both, or if both are zero then the group is looked up and sent there.
  */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
gfp_t gfp_mask)
 {
struct cn_callback_entry *__cbq;
@@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
if (!portid && !netlink_has_listeners(dev->nls, group))
return -ESRCH;
 
-   size = sizeof(*msg) + msg->len;
+   size = sizeof(*msg) + len;
 
skb = nlmsg_new(size, gfp_mask);
if (!skb)
@@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
 gfp_mask);
return netlink_unicast(dev->nls, skb, portid, !(gfp_mask&__GFP_WAIT));
 }
+EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
+
+/* same as cn_netlink_send_mult except msg->len is used for len */
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+   gfp_t gfp_mask)
+{
+   return cn_netlink_send_mult(msg, msg->len, portid, __group, gfp_mask);
+}
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
diff --git a/include/linux/connector.h b/include/linux/connector.h
index be9c4747..f8fe863 100644
--- a/include/linux/connector.h
+++ b/include/linux/connect

[PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-03-22 Thread David Fries

If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
is taken when searching for the slave or master device.  If there
isn't any following data m->len (mlen is a copy) is 0 and packing up
the message for later execution is skipped leaving nothing to
decrement the reference counts.

Way back when, m->len was checked before the search that increments the
reference count, but W1_LIST_MASTERS has no additional data, the check
was moved in 9be62e0b2fadaf5ff causing this bug.

This change reorders to put the check before the reference count is
incremented avoiding the problem.

Signed-off-by: David Fries 
---
 drivers/w1/w1_netlink.c |   44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 5234964..a02704a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg,
struct w1_netlink_msg *w;
u32 *id;
 
-   if (mcmd->type != W1_LIST_MASTERS) {
-   printk(KERN_NOTICE "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
-   __func__, msg->id.idx, msg->id.val, mcmd->type, 
mcmd->len);
-   return -EPROTO;
-   }
-
cn = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!cn)
return -ENOMEM;
@@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
w1_netlink_send_error(>block->msg, node->m, cmd,
node->block->portid, err);
 
+   /* ref taken in w1_search_slave or w1_search_master_id when building
+* the block
+*/
if (sl)
w1_unref_slave(sl);
else
@@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 
msg_len = msg->len;
while (msg_len && !err) {
-   struct w1_reg_num id;
-   u16 mlen = m->len;
 
dev = NULL;
sl = NULL;
 
-   memcpy(, m->id.id, sizeof(id));
-#if 0
-   printk("%s: %02x.%012llx.%02x: type=%02x, len=%u.\n",
-   __func__, id.family, (unsigned long long)id.id, 
id.crc, m->type, m->len);
-#endif
if (m->len + sizeof(struct w1_netlink_msg) > msg_len) {
err = -E2BIG;
break;
}
 
+   /* execute on this thread, no need to process later */
+   if (m->type == W1_LIST_MASTERS) {
+   err = w1_process_command_root(msg, m, nsp->portid);
+   goto out_cont;
+   }
+
+   /* All following message types require additional data,
+* check here before references are taken.
+*/
+   if (!m->len) {
+   err = -EPROTO;
+   goto out_cont;
+   }
+
+   /* both search calls take reference counts */
if (m->type == W1_MASTER_CMD) {
dev = w1_search_master_id(m->id.mst.id);
} else if (m->type == W1_SLAVE_CMD) {
-   sl = w1_search_slave();
+   sl = w1_search_slave((struct w1_reg_num *)m->id.id);
if (sl)
dev = sl->master;
} else {
-   err = w1_process_command_root(msg, m, nsp->portid);
+   printk(KERN_NOTICE
+   "%s: msg: %x.%x, wrong type: %u, len: %u.\n",
+   __func__, msg->id.idx, msg->id.val,
+   m->type, m->len);
+   err = -EPROTO;
goto out_cont;
}
 
@@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
}
 
err = 0;
-   if (!mlen)
-   goto out_cont;
 
atomic_inc(>refcnt);
node->async.cb = w1_process_cb;
@@ -557,7 +564,8 @@ out_cont:
if (err)
w1_netlink_send_error(msg, m, NULL, nsp->portid, err);
msg_len -= sizeof(struct w1_netlink_msg) + m->len;
-   m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct 
w1_netlink_msg) + m->len);
+   m = (struct w1_netlink_msg *)(((u8 *)m) +
+   sizeof(struct w1_netlink_msg) + m->len);
 
/*
 * Let's allow requests for nonexisting devices.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] w1: fixes and bundling replies

2014-03-22 Thread David Fries

On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote:
> Your approach and patch seem correct, but I worry about how old
> commands are processed.  Do I get it right, that replies to old
> non-bundle commands are queued back instead of sending immediately,
> but since it is not bundle but single command, queued reply is being
> sent 'almost' immediately?  Basically, nothing changed to old
> clients, but there is new way to send batch requests now?

Yes, with the previous patch set, if a client sent one command per
message they would get the replies in individual packets.  However in
some cases clients had to bundle multiple commands in one message.
For instance when using a slave command to read a temperature sensor
it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then
another W1_CMD_READ command to read the data back and they have to be
sent in the same W1_SLAVE_CMD because there can't be a reset between
the two write and read, and you can't do that with a slave command.
You can with master commands, but if you are going to issue individual
reset, write, and read commands why would you not bundle them?

Here is a new set of patches making bundling opt in.

In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags
enables the kernel to bundle the reply, otherwise replies aren't
bundled.

The previous patch separated the data replies and status replies
because they have a different ack value which is in the cn_msg and
cn_netlink_send could only send one cn_msg in a call.  I didn't much
like that solution because now status and data replies were out of
order.  This version creates cn_netlink_send_mult which takes a length
which allows multiple cn_msg structures to be sent at once (inside one
nlmsghdr, so clients can see there are more cn_msg structures to
read), so now status and data messages can be sent in order.

This is somewhat suboptimal as each command has a status reply and
possibly a data reply, each requiring a different ack, so even if a
client sent multiple commands in one w1_netlink_msg like,

cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read]

the response can't pack the commands into one w1_netlink_msg because
of the ack, so there's duplicate of cn_msg and w1_netlink_msg
structures,

cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]

cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate
packets with all the context switches, select overhead and such,
bundling is going to be well worth it.

Another alternative could be one cn_msg [seq+1] with the data, and
followed by cn_msg [ack ] with the status messages, but they are back
to out of order.

I wasn't completely sure what to call the new sending function, I
decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len
as another idea, or if there are any better ideas let me know.

I tested with my client program that will bundle up 14 temperature
sensor conversions, then after a delay, bundle up another set of
commands to read all 14 with the bundle bit set.  I also tested with a
two year old version of the software that sends requests two one slave
at a time (bundling only the write/read to get the data out), and
doesn't have code to read the bundling the this patch adds.  Both
operate correctly running at the same time.

Posted before, no changes

connector support for sending multiple cn_msg

bundling,
corrects ack value to previous ack or seq + 1
corrects the variables to be name consistently

 Documentation/connector/connector.txt |   15 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/connector/connector.c |   17 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  673 -
 drivers/w1/w1_netlink.h   |   36 ++
 include/linux/connector.h |1 +
 8 files changed, 489 insertions(+), 276 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] w1: fixes and bundling replies

2014-03-22 Thread David Fries

On Fri, Feb 21, 2014 at 01:07:28AM +0400, Evgeniy Polyakov wrote:
 Your approach and patch seem correct, but I worry about how old
 commands are processed.  Do I get it right, that replies to old
 non-bundle commands are queued back instead of sending immediately,
 but since it is not bundle but single command, queued reply is being
 sent 'almost' immediately?  Basically, nothing changed to old
 clients, but there is new way to send batch requests now?

Yes, with the previous patch set, if a client sent one command per
message they would get the replies in individual packets.  However in
some cases clients had to bundle multiple commands in one message.
For instance when using a slave command to read a temperature sensor
it takes one W1_CMD_WRITE command to send a W1_READ_SCRATCHPAD, then
another W1_CMD_READ command to read the data back and they have to be
sent in the same W1_SLAVE_CMD because there can't be a reset between
the two write and read, and you can't do that with a slave command.
You can with master commands, but if you are going to issue individual
reset, write, and read commands why would you not bundle them?

Here is a new set of patches making bundling opt in.

In this version of the patch, W1_CN_BUNDLE set in the cn_msg.flags
enables the kernel to bundle the reply, otherwise replies aren't
bundled.

The previous patch separated the data replies and status replies
because they have a different ack value which is in the cn_msg and
cn_netlink_send could only send one cn_msg in a call.  I didn't much
like that solution because now status and data replies were out of
order.  This version creates cn_netlink_send_mult which takes a length
which allows multiple cn_msg structures to be sent at once (inside one
nlmsghdr, so clients can see there are more cn_msg structures to
read), so now status and data messages can be sent in order.

This is somewhat suboptimal as each command has a status reply and
possibly a data reply, each requiring a different ack, so even if a
client sent multiple commands in one w1_netlink_msg like,

cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [read], w1_netlink_cmd [read]

the response can't pack the commands into one w1_netlink_msg because
of the ack, so there's duplicate of cn_msg and w1_netlink_msg
structures,

cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]
cn_msg [seq+1], w1_netlink_msg, w1_netlink_cmd [data]
cn_msg [ack  ], w1_netlink_msg, w1_netlink_cmd [status]

cn_msg is 20 bytes, w1_netlink_msg 12, so compared to sending separate
packets with all the context switches, select overhead and such,
bundling is going to be well worth it.

Another alternative could be one cn_msg [seq+1] with the data, and
followed by cn_msg [ack ] with the status messages, but they are back
to out of order.

I wasn't completely sure what to call the new sending function, I
decided on mult for multiple cn_netlink_send_mult, cn_netlink_send_len
as another idea, or if there are any better ideas let me know.

I tested with my client program that will bundle up 14 temperature
sensor conversions, then after a delay, bundle up another set of
commands to read all 14 with the bundle bit set.  I also tested with a
two year old version of the software that sends requests two one slave
at a time (bundling only the write/read to get the data out), and
doesn't have code to read the bundling the this patch adds.  Both
operate correctly running at the same time.

Posted before, no changes

connector support for sending multiple cn_msg

bundling,
corrects ack value to previous ack or seq + 1
corrects the variables to be name consistently

 Documentation/connector/connector.txt |   15 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/connector/connector.c |   17 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  673 -
 drivers/w1/w1_netlink.h   |   36 ++
 include/linux/connector.h |1 +
 8 files changed, 489 insertions(+), 276 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 1/3] w1: fix netlink refcnt leak on error path

2014-03-22 Thread David Fries

If the message type is W1_MASTER_CMD or W1_SLAVE_CMD, then a reference
is taken when searching for the slave or master device.  If there
isn't any following data m-len (mlen is a copy) is 0 and packing up
the message for later execution is skipped leaving nothing to
decrement the reference counts.

Way back when, m-len was checked before the search that increments the
reference count, but W1_LIST_MASTERS has no additional data, the check
was moved in 9be62e0b2fadaf5ff causing this bug.

This change reorders to put the check before the reference count is
incremented avoiding the problem.

Signed-off-by: David Fries da...@fries.net
---
 drivers/w1/w1_netlink.c |   44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 5234964..a02704a 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -300,12 +300,6 @@ static int w1_process_command_root(struct cn_msg *msg,
struct w1_netlink_msg *w;
u32 *id;
 
-   if (mcmd-type != W1_LIST_MASTERS) {
-   printk(KERN_NOTICE %s: msg: %x.%x, wrong type: %u, len: %u.\n,
-   __func__, msg-id.idx, msg-id.val, mcmd-type, 
mcmd-len);
-   return -EPROTO;
-   }
-
cn = kmalloc(PAGE_SIZE, GFP_KERNEL);
if (!cn)
return -ENOMEM;
@@ -441,6 +435,9 @@ static void w1_process_cb(struct w1_master *dev, struct 
w1_async_cmd *async_cmd)
w1_netlink_send_error(node-block-msg, node-m, cmd,
node-block-portid, err);
 
+   /* ref taken in w1_search_slave or w1_search_master_id when building
+* the block
+*/
if (sl)
w1_unref_slave(sl);
else
@@ -503,30 +500,42 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
 
msg_len = msg-len;
while (msg_len  !err) {
-   struct w1_reg_num id;
-   u16 mlen = m-len;
 
dev = NULL;
sl = NULL;
 
-   memcpy(id, m-id.id, sizeof(id));
-#if 0
-   printk(%s: %02x.%012llx.%02x: type=%02x, len=%u.\n,
-   __func__, id.family, (unsigned long long)id.id, 
id.crc, m-type, m-len);
-#endif
if (m-len + sizeof(struct w1_netlink_msg)  msg_len) {
err = -E2BIG;
break;
}
 
+   /* execute on this thread, no need to process later */
+   if (m-type == W1_LIST_MASTERS) {
+   err = w1_process_command_root(msg, m, nsp-portid);
+   goto out_cont;
+   }
+
+   /* All following message types require additional data,
+* check here before references are taken.
+*/
+   if (!m-len) {
+   err = -EPROTO;
+   goto out_cont;
+   }
+
+   /* both search calls take reference counts */
if (m-type == W1_MASTER_CMD) {
dev = w1_search_master_id(m-id.mst.id);
} else if (m-type == W1_SLAVE_CMD) {
-   sl = w1_search_slave(id);
+   sl = w1_search_slave((struct w1_reg_num *)m-id.id);
if (sl)
dev = sl-master;
} else {
-   err = w1_process_command_root(msg, m, nsp-portid);
+   printk(KERN_NOTICE
+   %s: msg: %x.%x, wrong type: %u, len: %u.\n,
+   __func__, msg-id.idx, msg-id.val,
+   m-type, m-len);
+   err = -EPROTO;
goto out_cont;
}
 
@@ -536,8 +545,6 @@ static void w1_cn_callback(struct cn_msg *msg, struct 
netlink_skb_parms *nsp)
}
 
err = 0;
-   if (!mlen)
-   goto out_cont;
 
atomic_inc(block-refcnt);
node-async.cb = w1_process_cb;
@@ -557,7 +564,8 @@ out_cont:
if (err)
w1_netlink_send_error(msg, m, NULL, nsp-portid, err);
msg_len -= sizeof(struct w1_netlink_msg) + m-len;
-   m = (struct w1_netlink_msg *)(((u8 *)m) + sizeof(struct 
w1_netlink_msg) + m-len);
+   m = (struct w1_netlink_msg *)(((u8 *)m) +
+   sizeof(struct w1_netlink_msg) + m-len);
 
/*
 * Let's allow requests for nonexisting devices.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 2/3] connector: allow multiple messages to be sent in one packet

2014-03-22 Thread David Fries

This increases the amount of bundling to reduce the number of packets
sent.  For the one wire use there can be multiple struct
w1_netlink_cmd in a struct w1_netlink_msg and multiple of those in
struct cn_msg, and with this change multiple of those in a struct
nlmsghdr, and at each level the len identifies there being multiple of
the next.

Signed-off-by: David Fries da...@fries.net
---
 Documentation/connector/connector.txt |   13 ++---
 drivers/connector/connector.c |   17 +++--
 include/linux/connector.h |1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e5c5f5e..e56abdb 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -24,7 +24,8 @@ netlink based networking for inter-process communication in a 
significantly
 easier way:
 
 int cn_add_callback(struct cb_id *id, char *name, void (*callback) (struct 
cn_msg *, struct netlink_skb_parms *));
-void cn_netlink_send(struct cn_msg *msg, u32 __group, int gfp_mask);
+void cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__group, int gfp_mask);
+void cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group, int 
gfp_mask);
 
 struct cb_id
 {
@@ -71,15 +72,21 @@ void cn_del_callback(struct cb_id *id);
  struct cb_id *id  - unique connector's user identifier.
 
 
-int cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);
+int cn_netlink_send_multi(struct cn_msg *msg, u16 len, u32 portid, u32 
__groups, int gfp_mask);
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __groups, int 
gfp_mask);
 
  Sends message to the specified groups.  It can be safely called from
  softirq context, but may silently fail under strong memory pressure.
  If there are no listeners for given group -ESRCH can be returned.
 
  struct cn_msg *   - message header(with attached data).
+ u16 len   - for *_multi multiple cn_msg messages can be 
sent
+ u32 port  - destination port.
+ If non-zero the message will be sent to the
+ given port, which should be set to the
+ original sender.
  u32 __group   - destination group.
- If __group is zero, then appropriate group 
will
+ If port and __group is zero, then appropriate 
group will
  be searched through all registered connector 
users,
  and message will be delivered to the group 
which was
  created for user with the same ID as in msg.
diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 77afe74..f4b0cdb 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -43,6 +43,8 @@ static struct cn_dev cdev;
 static int cn_already_initialized;
 
 /*
+ * Sends mult (multiple) cn_msg at a time.
+ *
  * msg-seq and msg-ack are used to determine message genealogy.
  * When someone sends message it puts there locally unique sequence
  * and random acknowledge numbers.  Sequence number may be copied into
@@ -62,10 +64,13 @@ static int cn_already_initialized;
  * the acknowledgement number in the original message + 1, then it is
  * a new message.
  *
+ * If msg-len != len, then additional cn_msg messages are expected following
+ * the first msg.
+ *
  * The message is sent to, the portid if given, the group if given, both if
  * both, or if both are zero then the group is looked up and sent there.
  */
-int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 portid, u32 __group,
gfp_t gfp_mask)
 {
struct cn_callback_entry *__cbq;
@@ -98,7 +103,7 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
if (!portid  !netlink_has_listeners(dev-nls, group))
return -ESRCH;
 
-   size = sizeof(*msg) + msg-len;
+   size = sizeof(*msg) + len;
 
skb = nlmsg_new(size, gfp_mask);
if (!skb)
@@ -121,6 +126,14 @@ int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 
__group,
 gfp_mask);
return netlink_unicast(dev-nls, skb, portid, !(gfp_mask__GFP_WAIT));
 }
+EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
+
+/* same as cn_netlink_send_mult except msg-len is used for len */
+int cn_netlink_send(struct cn_msg *msg, u32 portid, u32 __group,
+   gfp_t gfp_mask)
+{
+   return cn_netlink_send_mult(msg, msg-len, portid, __group, gfp_mask);
+}
 EXPORT_SYMBOL_GPL(cn_netlink_send);
 
 /*
diff --git a/include/linux/connector.h b/include/linux/connector.h
index be9c4747..f8fe863 100644
--- a/include/linux/connector.h
+++ b/include/linux/connector.h
@@ -71,6 +71,7 @@ struct

[PATCH 3/3] w1: optional bundling of netlink kernel replies

2014-03-22 Thread David Fries

Applications can submit a set of commands in one packet to the kernel,
and in some cases it is required such as reading the temperature
sensor results.  This adds an option W1_CN_BUNDLE to the flags of
cn_msg to request the kernel to reply in one packet for efficiency.

The cn_msg flags now check for unknown flag values and return an error
if one is seen.  See Proper handling of unknown flags in system
calls http://lwn.net/Articles/588444/

This corrects the ack values returned as per the protocol standard,
namely the original ack for status messages and seq + 1 for all others
such as the data returned from a read.

Signed-off-by: David Fries da...@fries.net

Some of the common variable names have been standardized as follows.
struct cn_msg *cn
struct w1_netlink_msg *msg
struct w1_netlink_cmd *cmd
struct w1_master *dev

When an argument and a function scope variable would collide, add req_
to the argument.
---
 Documentation/connector/connector.txt |2 +-
 Documentation/w1/w1.generic   |2 +-
 Documentation/w1/w1.netlink   |   13 +-
 drivers/w1/w1.h   |8 -
 drivers/w1/w1_netlink.c   |  649 -
 drivers/w1/w1_netlink.h   |   36 ++
 6 files changed, 447 insertions(+), 263 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index e56abdb..f6215f9 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -118,7 +118,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/Documentation/w1/w1.generic b/Documentation/w1/w1.generic
index a31c5a2..b2033c6 100644
--- a/Documentation/w1/w1.generic
+++ b/Documentation/w1/w1.generic
@@ -82,7 +82,7 @@ driver - (standard) symlink to the w1 driver
 w1_master_add  - Manually register a slave device
 w1_master_attempts - the number of times a search was attempted
 w1_master_max_slave_count
-   - the maximum slaves that may be attached to a master
+   - maximum number of slaves to search for at a time
 w1_master_name - the name of the device (w1_bus_masterX)
 w1_master_pullup   - 5V strong pullup 0 enabled, 1 disabled
 w1_master_remove   - Manually remove a slave device
diff --git a/Documentation/w1/w1.netlink b/Documentation/w1/w1.netlink
index 927a52c..ef27271 100644
--- a/Documentation/w1/w1.netlink
+++ b/Documentation/w1/w1.netlink
@@ -30,7 +30,7 @@ Protocol.
W1_SLAVE_CMD
userspace command for slave device
(read/write/touch)
-   __u8 res- reserved
+   __u8 status - error indication from kernel
__u16 len   - size of data attached to this header data
union {
__u8 id[8];  - slave unique device id
@@ -44,10 +44,14 @@ Protocol.
__u8 cmd- command opcode.
W1_CMD_READ - read command
W1_CMD_WRITE- write command
-   W1_CMD_TOUCH- touch command
-   (write and sample data back to userspace)
W1_CMD_SEARCH   - search command
W1_CMD_ALARM_SEARCH - alarm search command
+   W1_CMD_TOUCH- touch command
+   (write and sample data back to userspace)
+   W1_CMD_RESET- send bus reset
+   W1_CMD_SLAVE_ADD- add slave to kernel list
+   W1_CMD_SLAVE_REMOVE - remove slave from kernel list
+   W1_CMD_LIST_SLAVES  - get slaves list from kernel
__u8 res- reserved
__u16 len   - length of data for this command
For read command data must be allocated like for write command
@@ -87,8 +91,7 @@ format:
id0 ... idN
 
Each message is at most 4k in size, so if number of master devices
-   exceeds this, it will be split into several messages,
-   cn.seq will be increased for each one.
+   exceeds this, it will be split into several messages.
 
 W1 search and alarm search commands.
 request:
diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count
  * @priv:  private data storage

[PATCH] w1: bundle reply if the request was bundled

2014-02-18 Thread David Fries

A program can bundle multiple messages and commands together when
making one wire requests, which is going to be much more efficient
than sending lots of little packets one write at a time.  With this
change the kernel will then bundle responses to requests that were
bundled, where it's probably even more important because programs will
typically execute select or a poll operation between reading packets,
which takes additional context switches.

Without doing any bundling, reading 14 temperature sensors using slave
commands requires the application to send 14 packets for the
conversion with one command each, and then later for reading it takes
14 packets with two commands each, one to request reading the
scratchpad, and another to read the data back in.  The kernel will
then generate 14 replies with data, and then one status for each of
the three commands per sensor totaling 56 packets or 84 total packets.

That's a lot of context switches and not very efficient on the one
wire bus.  Using master commands allows for merging writes, such as
match rom, rom id, and convert temperature into sending one write to
the hardware.  But that means sending individual commands for reset
and write (for convert), or reset, write, read (for reading the
result) which would be 28 packets without bundling or 2 packets when
bundling.  It requires two packets because it must wait for the
conversion before reading results.  Without this change the kernel
would send one read result, and five command status results, so 6*14 =
84 packets.  With this set of changes it would send three, a set of
status results for the first convert commands, and another status
result along with the data read for the second set.

This also allows an application to get all the results from a bundle
(reading the temperature sensors) at the same time, instead of them
being spread out in time as the one wire bus takes time to execute and
read each response.

Due to the data structure limitation this patch will return packets in
a different order.  For a given bundle all the replies will be sent
first, followed by the status messages.  This is because currently the
connector takes a cn_msg, and the difference between a reply and
status is the ack field of cn_msg.  I could add a second
cn_netlink_send_nlh, to take a nlmsghdr or maybe a length so multiple
cn_msg can be bundled into one packet, then all the messages could be
returned in the current order.
---
 drivers/w1/w1.h |8 -
 drivers/w1/w1_netlink.c |  531 ++-
 2 files changed, 340 insertions(+), 199 deletions(-)

diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count
  * @priv:  private data storage
- * @priv_size: size allocated
  * @enable_pullup: allows a strong pullup
  * @pullup_duration:   time for the next strong pullup
  * @flags: one of w1_master_flags
@@ -214,7 +213,6 @@ enum w1_master_flags {
  * @dev:   sysfs device
  * @bus_master:io operations available
  * @seq:   sequence number used for netlink broadcasts
- * @portid:destination for the current netlink command
  */
 struct w1_master
 {
@@ -241,7 +239,6 @@ struct w1_master
atomic_trefcnt;
 
void*priv;
-   int priv_size;
 
/** 5V strong pullup enabled flag, 1 enabled, zero disabled. */
int enable_pullup;
@@ -260,11 +257,6 @@ struct w1_master
struct w1_bus_master*bus_master;
 
u32 seq;
-   /* port id to send netlink responses to.  The value is temporarily
-* stored here while processing a message, set after locking the
-* mutex, zero before unlocking the mutex.
-*/
-   u32 portid;
 };
 
 /**
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index fa24561..98b9314 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -29,6 +29,198 @@
 #include "w1_netlink.h"
 
 #if defined(CONFIG_W1_CON) && (defined(CONFIG_CONNECTOR) || 
(defined(CONFIG_CONNECTOR_MODULE) && defined(CONFIG_W1_MODULE)))
+
+#define MIN(a, b)   (((a) < (b)) ? (a) : (b))
+
+/* Bundle together everything required to process a request in one memory
+ * allocation.
+ */
+struct w1_cb_block {
+   atomic_t refcnt;
+   /* replies, this would be include the data for a read message */
+   /* maximum value for reply_msg->len */
+   u16 reply_maxlen;
+   /* pointers to building up the reply message */
+   struct cn_msg *reply_msg;
+   struct w1_netlink_msg *reply_m;
+   struct w1_netlink_cmd *reply_cmd;
+   /* status, for w1_netlink_msg.status of 0 success or error,

[PATCH] w1: bundle reply if the request was bundled

2014-02-18 Thread David Fries

A program can bundle multiple messages and commands together when
making one wire requests, which is going to be much more efficient
than sending lots of little packets one write at a time.  With this
change the kernel will then bundle responses to requests that were
bundled, where it's probably even more important because programs will
typically execute select or a poll operation between reading packets,
which takes additional context switches.

Without doing any bundling, reading 14 temperature sensors using slave
commands requires the application to send 14 packets for the
conversion with one command each, and then later for reading it takes
14 packets with two commands each, one to request reading the
scratchpad, and another to read the data back in.  The kernel will
then generate 14 replies with data, and then one status for each of
the three commands per sensor totaling 56 packets or 84 total packets.

That's a lot of context switches and not very efficient on the one
wire bus.  Using master commands allows for merging writes, such as
match rom, rom id, and convert temperature into sending one write to
the hardware.  But that means sending individual commands for reset
and write (for convert), or reset, write, read (for reading the
result) which would be 28 packets without bundling or 2 packets when
bundling.  It requires two packets because it must wait for the
conversion before reading results.  Without this change the kernel
would send one read result, and five command status results, so 6*14 =
84 packets.  With this set of changes it would send three, a set of
status results for the first convert commands, and another status
result along with the data read for the second set.

This also allows an application to get all the results from a bundle
(reading the temperature sensors) at the same time, instead of them
being spread out in time as the one wire bus takes time to execute and
read each response.

Due to the data structure limitation this patch will return packets in
a different order.  For a given bundle all the replies will be sent
first, followed by the status messages.  This is because currently the
connector takes a cn_msg, and the difference between a reply and
status is the ack field of cn_msg.  I could add a second
cn_netlink_send_nlh, to take a nlmsghdr or maybe a length so multiple
cn_msg can be bundled into one packet, then all the messages could be
returned in the current order.
---
 drivers/w1/w1.h |8 -
 drivers/w1/w1_netlink.c |  531 ++-
 2 files changed, 340 insertions(+), 199 deletions(-)

diff --git a/drivers/w1/w1.h b/drivers/w1/w1.h
index 734dab7..56a49ba 100644
--- a/drivers/w1/w1.h
+++ b/drivers/w1/w1.h
@@ -203,7 +203,6 @@ enum w1_master_flags {
  * @search_id: allows continuing a search
  * @refcnt:reference count
  * @priv:  private data storage
- * @priv_size: size allocated
  * @enable_pullup: allows a strong pullup
  * @pullup_duration:   time for the next strong pullup
  * @flags: one of w1_master_flags
@@ -214,7 +213,6 @@ enum w1_master_flags {
  * @dev:   sysfs device
  * @bus_master:io operations available
  * @seq:   sequence number used for netlink broadcasts
- * @portid:destination for the current netlink command
  */
 struct w1_master
 {
@@ -241,7 +239,6 @@ struct w1_master
atomic_trefcnt;
 
void*priv;
-   int priv_size;
 
/** 5V strong pullup enabled flag, 1 enabled, zero disabled. */
int enable_pullup;
@@ -260,11 +257,6 @@ struct w1_master
struct w1_bus_master*bus_master;
 
u32 seq;
-   /* port id to send netlink responses to.  The value is temporarily
-* stored here while processing a message, set after locking the
-* mutex, zero before unlocking the mutex.
-*/
-   u32 portid;
 };
 
 /**
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index fa24561..98b9314 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -29,6 +29,198 @@
 #include w1_netlink.h
 
 #if defined(CONFIG_W1_CON)  (defined(CONFIG_CONNECTOR) || 
(defined(CONFIG_CONNECTOR_MODULE)  defined(CONFIG_W1_MODULE)))
+
+#define MIN(a, b)   (((a)  (b)) ? (a) : (b))
+
+/* Bundle together everything required to process a request in one memory
+ * allocation.
+ */
+struct w1_cb_block {
+   atomic_t refcnt;
+   /* replies, this would be include the data for a read message */
+   /* maximum value for reply_msg-len */
+   u16 reply_maxlen;
+   /* pointers to building up the reply message */
+   struct cn_msg *reply_msg;
+   struct w1_netlink_msg *reply_m;
+   struct w1_netlink_cmd *reply_cmd;
+   /* status, for w1_netlink_msg.status of 0 success or error, can't
+

Re: [patch] w1: small type cleanup in sysfs

2014-02-11 Thread David Fries

Dan,

I have some other changes in work, how automated is your checkers?
How much work is it for me to give a github repository and branch and
find out if I introduced any problems before submitting them?

I didn't get how you could get a less than one after a check for less
than one from the description or patch until I looked at the rest of
the source code.  Looks good if the description mentions
max_slave_count is an int.

How about wording it,

On 64 bit systems, a large value for "long tmp" is truncated when
assigning to "int md->max_slave_count" so we still end up with a value
less than one despite the "tmp < 1" check.

Acked-by: David Fries 

On Tue, Feb 11, 2014 at 07:08:26PM +0300, Dan Carpenter wrote:
> On 64 bit systems, a large value for "tmp" could be truncated so we
> still end up with a ->max_slave_count which is less than one despite the
> "tmp < 1" check.
> 
> This is more of a problem for static checkers than a real life issue,
> but it's simple enough to fix.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index 9eb816b2ea5e..b96f61b15dc6 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -320,10 +320,10 @@ static ssize_t w1_master_attribute_show_timeout(struct 
> device *dev, struct devic
>  static ssize_t w1_master_attribute_store_max_slave_count(struct device *dev,
>   struct device_attribute *attr, const char *buf, size_t count)
>  {
> - long tmp;
> + int tmp;
>   struct w1_master *md = dev_to_w1_master(dev);
>  
> - if (kstrtol(buf, 0, ) == -EINVAL || tmp < 1)
> + if (kstrtoint(buf, 0, ) == -EINVAL || tmp < 1)
>   return -EINVAL;
>  
>   mutex_lock(>mutex);

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [patch] w1: small type cleanup in sysfs

2014-02-11 Thread David Fries

Dan,

I have some other changes in work, how automated is your checkers?
How much work is it for me to give a github repository and branch and
find out if I introduced any problems before submitting them?


I didn't get how you could get a less than one after a check for less
than one from the description or patch until I looked at the rest of
the source code.  Looks good if the description mentions
max_slave_count is an int.

How about wording it,

On 64 bit systems, a large value for long tmp is truncated when
assigning to int md-max_slave_count so we still end up with a value
less than one despite the tmp  1 check.

Acked-by: David Fries da...@fries.net

On Tue, Feb 11, 2014 at 07:08:26PM +0300, Dan Carpenter wrote:
 On 64 bit systems, a large value for tmp could be truncated so we
 still end up with a -max_slave_count which is less than one despite the
 tmp  1 check.
 
 This is more of a problem for static checkers than a real life issue,
 but it's simple enough to fix.
 
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
 
 diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
 index 9eb816b2ea5e..b96f61b15dc6 100644
 --- a/drivers/w1/w1.c
 +++ b/drivers/w1/w1.c
 @@ -320,10 +320,10 @@ static ssize_t w1_master_attribute_show_timeout(struct 
 device *dev, struct devic
  static ssize_t w1_master_attribute_store_max_slave_count(struct device *dev,
   struct device_attribute *attr, const char *buf, size_t count)
  {
 - long tmp;
 + int tmp;
   struct w1_master *md = dev_to_w1_master(dev);
  
 - if (kstrtol(buf, 0, tmp) == -EINVAL || tmp  1)
 + if (kstrtoint(buf, 0, tmp) == -EINVAL || tmp  1)
   return -EINVAL;
  
   mutex_lock(md-mutex);

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Revert ds1wm.c from "w1: hold bus_mutex in netlink and search"

2014-02-08 Thread David Fries

This reverts ds1wm.c from commit d3a8a9dbb903c73a7ec2deae4c9b7d74b6834f4c.
Of the three files changed ds1wm.c ds2490.c and w1_netlink.c, it turns out
ds1wm.c was locking bus_mutex, but inside the loop and I missed it.
Reverting ds1wm.c to the previous version.

Signed-off-by: David Fries 
Reported-by: Dan Carpenter 
---
 drivers/w1/masters/ds1wm.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/w1/masters/ds1wm.c b/drivers/w1/masters/ds1wm.c
index b077b8b..02df3b1 100644
--- a/drivers/w1/masters/ds1wm.c
+++ b/drivers/w1/masters/ds1wm.c
@@ -326,14 +326,13 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
unsigned slaves_found = 0;
unsigned int pass = 0;
 
-   mutex_lock(_dev->bus_mutex);
dev_dbg(_data->pdev->dev, "search begin\n");
while (true) {
++pass;
if (pass > 100) {
dev_dbg(_data->pdev->dev,
"too many attempts (100), search aborted\n");
-   break;
+   return;
}
 
mutex_lock(_dev->bus_mutex);
@@ -440,7 +439,6 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
dev_dbg(_data->pdev->dev,
"pass: %d total: %d search done ms d bit pos: %d\n", pass,
slaves_found, ms_discrep_bit);
-   mutex_unlock(_dev->bus_mutex);
 }
 
 /* - */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Revert ds1wm.c from w1: hold bus_mutex in netlink and search

2014-02-08 Thread David Fries

This reverts ds1wm.c from commit d3a8a9dbb903c73a7ec2deae4c9b7d74b6834f4c.
Of the three files changed ds1wm.c ds2490.c and w1_netlink.c, it turns out
ds1wm.c was locking bus_mutex, but inside the loop and I missed it.
Reverting ds1wm.c to the previous version.

Signed-off-by: David Fries da...@fries.net
Reported-by: Dan Carpenter dan.carpen...@oracle.com
---
 drivers/w1/masters/ds1wm.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/w1/masters/ds1wm.c b/drivers/w1/masters/ds1wm.c
index b077b8b..02df3b1 100644
--- a/drivers/w1/masters/ds1wm.c
+++ b/drivers/w1/masters/ds1wm.c
@@ -326,14 +326,13 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
unsigned slaves_found = 0;
unsigned int pass = 0;
 
-   mutex_lock(master_dev-bus_mutex);
dev_dbg(ds1wm_data-pdev-dev, search begin\n);
while (true) {
++pass;
if (pass  100) {
dev_dbg(ds1wm_data-pdev-dev,
too many attempts (100), search aborted\n);
-   break;
+   return;
}
 
mutex_lock(master_dev-bus_mutex);
@@ -440,7 +439,6 @@ static void ds1wm_search(void *data, struct w1_master 
*master_dev,
dev_dbg(ds1wm_data-pdev-dev,
pass: %d total: %d search done ms d bit pos: %d\n, pass,
slaves_found, ms_discrep_bit);
-   mutex_unlock(master_dev-bus_mutex);
 }
 
 /* - */
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

2014-02-07 Thread David Fries

On Sat, Feb 08, 2014 at 01:23:43AM +0400, z...@ioremap.net wrote:
> Hi
> 
> 07.02.2014, 10:00, "David Fries" :
> 
> > Here's a patch to implement that.  Is this what you have in mind?
> >
> > From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
> > From: David Fries 
> > Date: Thu, 6 Feb 2014 23:45:05 -0600
> > Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1
> >
> > Netlink messages sent from the kernel consists of kernel generated
> > notifications for adds or removes, the error message (also indicates
> > the message has been processed), and the messages that have data to
> > return.  The cn_msg ack is left alone for the first two, and when
> > returning data it is the sequence number + 1.  Modifying the code to
> > the protocol standard.
> >
> > Signed-off-by: David Fries 
> 
> Yes, it looks right.
> Can you also check that protocol documentation is correct?

Documentation/connector/connector.txt ?  I found it a little unclear,
I'll see what I can do.

-- 
David Fries PGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

2014-02-07 Thread David Fries

On Sat, Feb 08, 2014 at 01:23:43AM +0400, z...@ioremap.net wrote:
 Hi

 07.02.2014, 10:00, David Fries da...@fries.net:

  Here's a patch to implement that.  Is this what you have in mind?

  From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
  From: David Fries da...@fries.net
  Date: Thu, 6 Feb 2014 23:45:05 -0600
  Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1

  Netlink messages sent from the kernel consists of kernel generated
  notifications for adds or removes, the error message (also indicates
  the message has been processed), and the messages that have data to
  return.  The cn_msg ack is left alone for the first two, and when
  returning data it is the sequence number + 1.  Modifying the code to
  the protocol standard.

  Signed-off-by: David Fries da...@fries.net

 Yes, it looks right.
 Can you also check that protocol documentation is correct?

Documentation/connector/connector.txt ?  I found it a little unclear,
I'll see what I can do.

-- 
David Fries da...@fries.netPGP pub CB1EE8F0
http://fries.net/~david/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

2014-02-06 Thread David Fries

On Wed, Feb 05, 2014 at 03:48:45AM +0400, z...@ioremap.net wrote:
> Hi
> 
> 04.02.2014, 09:51, "David Fries" :
> > Help me understand what the protocol is supposed to be.  Assuming
> > there aren't any errors, is there supposed to be a
> > w1_netlink_send_error generated reply per netlink packet (cn_msg), per
> > w1_netlink_msg, or per w1_netlink_cmd?
> 
> reply should be sent per cmd to specify each command status
> If there is no cmd in request or we didn't get to it (like failed to reset 
> device), we should send error.
> 
> Depending on how w1-msg + (optional) w1-cmd are packed, client can detect 
> what exact error happend
> 
> > What about the cn_msg seq and ack values?  I assume the kernel
> > response should carry the same seq number as the request, but what
> > should the ack be set to?
> 
> reply ack is seq + 1
> seq is the same to highlight request it belongs to

Here's a patch to implement that.  Is this what you have in mind?

>From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
From: David Fries 
Date: Thu, 6 Feb 2014 23:45:05 -0600
Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1

Netlink messages sent from the kernel consists of kernel generated
notifications for adds or removes, the error message (also indicates
the message has been processed), and the messages that have data to
return.  The cn_msg ack is left alone for the first two, and when
returning data it is the sequence number + 1.  Modifying the code to
the protocol standard.

Signed-off-by: David Fries 
---
 Documentation/connector/connector.txt |2 +-
 drivers/w1/w1_netlink.c   |6 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index 9bdfc1a..0bc3522 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -115,7 +115,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 3c81689..d98b550 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -59,7 +59,6 @@ static void w1_send_slave(struct w1_master *dev, u64 rn)
avail = dev->priv_size - cmd->len;
 
if (avail < 8) {
-   msg->ack++;
cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL);
 
msg->len = sizeof(struct w1_netlink_msg) +
@@ -130,7 +129,6 @@ static int w1_get_slaves(struct w1_master *dev,
W1_ALARM_SEARCH : W1_SEARCH, w1_found_send_slave);
}
 
-   msg->ack = 0;
cn_netlink_send(msg, dev->portid, 0, GFP_KERNEL);
 
dev->priv = NULL;
@@ -308,7 +306,7 @@ static int w1_process_command_root(struct cn_msg *msg,
cn->id.val = CN_W1_VAL;
 
cn->seq = msg->seq;
-   cn->ack = 1;
+   cn->ack = msg->seq + 1;
cn->len = sizeof(struct w1_netlink_msg);
w = (struct w1_netlink_msg *)(cn + 1);
 
@@ -321,7 +319,6 @@ static int w1_process_command_root(struct cn_msg *msg,
list_for_each_entry(m, _masters, w1_master_entry) {
if (cn->len + sizeof(*id) > PAGE_SIZE - sizeof(struct cn_msg)) {
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
-   cn->ack++;
cn->len = sizeof(struct w1_netlink_msg);
w->len = 0;
id = (u32 *)(w + 1);
@@ -332,7 +329,6 @@ static int w1_process_command_root(struct cn_msg *msg,
cn->len += sizeof(*id);
id++;
}
-   cn->ack = 0;
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
mutex_unlock(_mlock);
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 0/4] w1: refcnt fix, skip non-error send, docs

2014-02-06 Thread David Fries

On Wed, Feb 05, 2014 at 03:48:45AM +0400, z...@ioremap.net wrote:
 Hi
 
 04.02.2014, 09:51, David Fries da...@fries.net:
  Help me understand what the protocol is supposed to be.  Assuming
  there aren't any errors, is there supposed to be a
  w1_netlink_send_error generated reply per netlink packet (cn_msg), per
  w1_netlink_msg, or per w1_netlink_cmd?
 
 reply should be sent per cmd to specify each command status
 If there is no cmd in request or we didn't get to it (like failed to reset 
 device), we should send error.
 
 Depending on how w1-msg + (optional) w1-cmd are packed, client can detect 
 what exact error happend
 
  What about the cn_msg seq and ack values?  I assume the kernel
  response should carry the same seq number as the request, but what
  should the ack be set to?
 
 reply ack is seq + 1
 seq is the same to highlight request it belongs to

Here's a patch to implement that.  Is this what you have in mind?

From 4ed65d81b0121a8c191a9833d041484e9097198b Mon Sep 17 00:00:00 2001
From: David Fries da...@fries.net
Date: Thu, 6 Feb 2014 23:45:05 -0600
Subject: [PATCH] w1: correct cn_msg ack, no change or seq + 1

Netlink messages sent from the kernel consists of kernel generated
notifications for adds or removes, the error message (also indicates
the message has been processed), and the messages that have data to
return.  The cn_msg ack is left alone for the first two, and when
returning data it is the sequence number + 1.  Modifying the code to
the protocol standard.

Signed-off-by: David Fries da...@fries.net
---
 Documentation/connector/connector.txt |2 +-
 drivers/w1/w1_netlink.c   |6 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/Documentation/connector/connector.txt 
b/Documentation/connector/connector.txt
index 9bdfc1a..0bc3522 100644
--- a/Documentation/connector/connector.txt
+++ b/Documentation/connector/connector.txt
@@ -115,7 +115,7 @@ acknowledge number MUST be the same + 1.
 If we receive a message and its sequence number is not equal to one we
 are expecting, then it is a new message.  If we receive a message and
 its sequence number is the same as one we are expecting, but its
-acknowledge is not equal to the acknowledge number in the original
+acknowledge is not equal to the sequence number in the original
 message + 1, then it is a new message.
 
 Obviously, the protocol header contains the above id.
diff --git a/drivers/w1/w1_netlink.c b/drivers/w1/w1_netlink.c
index 3c81689..d98b550 100644
--- a/drivers/w1/w1_netlink.c
+++ b/drivers/w1/w1_netlink.c
@@ -59,7 +59,6 @@ static void w1_send_slave(struct w1_master *dev, u64 rn)
avail = dev-priv_size - cmd-len;
 
if (avail  8) {
-   msg-ack++;
cn_netlink_send(msg, dev-portid, 0, GFP_KERNEL);
 
msg-len = sizeof(struct w1_netlink_msg) +
@@ -130,7 +129,6 @@ static int w1_get_slaves(struct w1_master *dev,
W1_ALARM_SEARCH : W1_SEARCH, w1_found_send_slave);
}
 
-   msg-ack = 0;
cn_netlink_send(msg, dev-portid, 0, GFP_KERNEL);
 
dev-priv = NULL;
@@ -308,7 +306,7 @@ static int w1_process_command_root(struct cn_msg *msg,
cn-id.val = CN_W1_VAL;
 
cn-seq = msg-seq;
-   cn-ack = 1;
+   cn-ack = msg-seq + 1;
cn-len = sizeof(struct w1_netlink_msg);
w = (struct w1_netlink_msg *)(cn + 1);
 
@@ -321,7 +319,6 @@ static int w1_process_command_root(struct cn_msg *msg,
list_for_each_entry(m, w1_masters, w1_master_entry) {
if (cn-len + sizeof(*id)  PAGE_SIZE - sizeof(struct cn_msg)) {
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
-   cn-ack++;
cn-len = sizeof(struct w1_netlink_msg);
w-len = 0;
id = (u32 *)(w + 1);
@@ -332,7 +329,6 @@ static int w1_process_command_root(struct cn_msg *msg,
cn-len += sizeof(*id);
id++;
}
-   cn-ack = 0;
cn_netlink_send(cn, portid, 0, GFP_KERNEL);
mutex_unlock(w1_mlock);
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 >

1 - 100 of 228 matches

Mail list logo